About me

I'm a Certified Data Scientist and Data Engineer with 3+ years of experience building end-to-end data pipelines and machine learning systems, currently pursuing an M.Tech in Computer Science and Engineering at IIT Kanpur. I thrive at the intersection of data, algorithms, and real-world impact, transforming raw data into actionable insights and scalable AI solutions.

Throughout my professional journey, I collaborated closely with leadership and architected scalable ETL workflows. In addition, I developed machine learning models that reduced operational costs, enabled data-driven decision-making, and solved critical business problems.

My expertise spans Python for ML and backend development, Big Data tools (Airflow, Spark), LLM fine-tuning (PEFT, LoRA), and spatio-temporal forecasting and kriging. I’m especially passionate about Deep Learning, Generative AI, LLMs, RAG, and agentic AI systems.

When I'm not training models or optimizing pipelines, you’ll find me trekking, hitting the gym, traveling to offbeat destinations or spending time with my furry friends 🐶.

I’m open to collaborations, research, and roles that challenge me to create meaningful impact through data and AI.

My Craft

  • Data Engineering & Backend Development

    I architect scalable data pipelines and backend systems that turn data into actionable insights and integrate services for real-world impact. Experienced with ETL using Airflow, Dataproc, and PySpark, and with REST APIs (FastAPI, Flask, etc.).

  • Data Science & ML Engineering

    I design and train ML models for predictive analytics and optimizations. Skilled in handling large-scale datasets, feature engineering, and applying advanced methods to deliver measurable improvements across domains.

  • Deep Learning & Computer Vision

    I build state-of-the-art deep learning solutions for image understanding. My work spans CNNs, Vision Transformers, and hybrid models for sketch recognition, detection, and classification tasks.

  • LLMs, Agentic AI & RAG

    I specialize in NLP, fine-tuning LLMs, designing RAG pipelines, and building multi-agent systems. I also develop intelligent chatbots and personalized assistants that blend reasoning, knowledge grounding, and conversational fluency.

  • Spatio-Temporal Forecasting at Centre of Excellence - ATMAN, IIT Kanpur

    At the Centre of Excellence - ATMAN, National Aerosol Facility, IIT Kanpur, I work on research in spatio-temporal forecasting and environmental data modeling. My focus is on developing Graph Neural Network and GRU/Transformer-based models for PM2.5 prediction, enabling actionable insights for air quality and climate research.

Recommendation

💬

"If there was one word to describe Aryan, it would be - Ninja! He is super fast in delivering high-impact outcomes, and even faster and more enthusiastic in learning new things on the go. A genuine hustler with a smart and sharp mind, combined with an insane sense of ownership that he walks with, Aryan is the perfect teammate anyone could ask for. Working alongside him on the very many projects has been an absolute delight, and I am sure that he will achieve big in life."


Resume

Achievements

  1. Academic Excellence Award

    2024

    Received for exceptional performance at IIT Kanpur

  2. GATE CS 2024 - AIR 231

    2024

    Secured All India Rank 231 among 1.23 Lakh candidates

  3. ESOP Grant

    2022

    Awarded at Classplus for delivering a high-impact Moonshot project

  4. Third Rank in BE Program

    2020

    Secured third rank in Bachelor of Engineering degree program

  5. Promising Secretary Award by Rotary International

    2019

    Awarded by the Rotary International District 3190 Council for leading multiple community service projects

  6. Distinction and Scholarship in Class XII Exams

    2016

    Received a Distinction Award and Scholarship from Brij Bhushan Lal Public School for the Class XII examination

Education

  1. Master of Technology, Computer Science and Engineering

    Indian Institute of Technology, Kanpur
    2024 — Present
    • CPI: 9.57/10
    • Received Academic Excellence Award for exceptional performance at IIT Kanpur (2024).
    • Courses explored: Design and Analysis of Algorithms, Large Language Models, Deep Learning for Computer Vision, Data Mining, Introduction to Machine Learning, Parallel Computing, and Big Data and Visualization.
  2. Bachelor of Engineering

    R.V. College of Engineering, Bengaluru (Visvesvaraya Technological University, Belagavi)
    2016 — 2020
    • CGPA: 8.82/10
    • Third Rank Holder in UG Degree Program

Work Experience

  1. Data Engineer — Classplus

    May 2023 — Feb 2024
    • Designed and implemented scalable ETL pipelines and automation scripts using Airflow that processed events for 5M+ DAU with 99% uptime, thus reducing operational overhead.
    • Modeled the CRM scoring algorithm, which boosted sales by nearly 15% through predictive modeling of user behavior.
  2. Jr. Data Scientist - Classplus

    Aug 2022 — Apr 2023
    • Collaborated directly with the Founders’ office to design and deploy machine learning and analytical models that optimized operational workflows and drove new revenue streams. Key projects included Hedwig (text classification, 94% accuracy) and Exam Seasonality Analysis (forecast-driven sales planning).
  3. Business Analyst - Classplus

    Sep 2021 — Aug 2022
    • Handled day-to-day product operations, analytics for CXO-level reporting, and automated delivery of daily reports to 20k+ clients, saving 10+ analyst hours/week.
    • Analyzed new product feature requirements, adoption, and troubleshooting using SQL, NoSQL, Python and BI tools.

My skills

  • PyTorch, Keras, TensorFlow, LangChain, PyKrige, Folium
    85%
  • Python automation and backend development
    90%
  • Deep Learning: Computer Vision, Natural Language Processing, Time Series Forecasting
    95%
  • LLM Fine-tuning / Agentic AI / RAG
    75%
  • Product Recommendation Systems and other Machine Learning algorithms
    90%
  • Airflow / PySpark / Dataproc
    80%
  • MongoDB (NoSQL) / SQL / Data Warehousing (OLAP)
    95%
  • Cloud Tools: Dataproc, GCP tools
    80%
  • FastAPI
    75%
  • Data Analytics and Visualization: Tableau, BI tools, WebEngage
    90%

Projects

Projects at IIT Kanpur

Spatio-temporal Kriging & Forecasting (Thesis)

Air Quality Demo
Graph Network made over Bihar
  • Designed spatio-temporal PM2.5 forecasting and interpolation pipelines for sensor networks across Bihar and Uttar Pradesh, using ST-LLM, geostatistical kriging, tree-based methods (XGBoost, etc.), and diffusion-driven inductive GNNs.
  • Developed and deployed production-ready APIs powering a real-time pollution mapping and hotspot detection system on a 500 m x 500 m grid, to enable early-warning alerts and forecast-assisted policy interventions.
  • Currently extending the research by benchmarking additional foundation models to improve generalizability. Latest experiments show RMSE below 40 μg/m³ and MAPE ≈ 30% on real sensor data, indicating strong predictive reliability.
    Forecasting using ST-LLM + Graph Neural Networks + PEFT-LoRA Fine-tuning on GPT-2

  • Hybrid ST-LLM + GNN Framework: Designed a novel pipeline where a Spatio-Temporal LLM captures long-range temporal dependencies, while a Graph Neural Network (GNN) encodes spatial correlations across sensors. This joint modeling significantly improved PM2.5 forecasting accuracy compared to standalone temporal or spatial models.
  • Parameter-Efficient Fine-Tuning (PEFT): Applied LoRA-based PEFT on ST-LLM layers instead of full fine-tuning, reducing trainable parameters by >90% while preserving forecasting performance. This enabled efficient adaptation to domain-specific air quality data on limited GPU resources.
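The parameter savings from LoRA follow from simple arithmetic, sketched below for a single weight matrix. The matrix shape (the fused attention projection of GPT-2 small, 768 to 2304 features) and the rank of 8 are illustrative assumptions, not the configuration used in the thesis, and the helper name `lora_param_counts` is hypothetical.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_in * d_out            # full fine-tuning updates every weight
    lora = rank * (d_in + d_out)   # LoRA trains A (rank x d_in) and B (d_out x rank)
    return full, lora


# Illustrative values only: an assumed rank-8 adapter on a 768 -> 2304 projection.
full, lora = lora_param_counts(d_in=768, d_out=2304, rank=8)
reduction = 1 - lora / full
print(f"full={full:,} lora={lora:,} reduction={reduction:.1%}")
```

Biases and any layers left fully trainable are ignored here, so the overall reduction for a whole model will differ from this per-matrix figure.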
Conditional GAN Imputation

    Data Imputation with Generative AI: Conditional GANs

  • Built a cGAN-based imputation framework where the generator reconstructs missing PM2.5, temperature, and relative humidity values conditioned on spatio-temporal features (latitude, longitude, time encodings, and neighbor sensor averages). The imputed data was then used as input for the downstream ML models.

    Time Series Clustering and Airsheds

  • Built an LSTM-based temporal embedding model for PM2.5 time series, extracting latent pollution dynamics from multiple sensor locations to represent long-term spatio-temporal dependencies.
  • Applied hierarchical agglomerative clustering (Ward’s linkage with dendrogram cuts) on LSTM embeddings to identify natural airsheds, grouping locations with shared pollution signatures beyond political boundaries.
  • Developed interactive Plotly dashboards and dendrogram visualizations to explore seasonal airsheds, enabling policymakers to interpret clusters and design regionally coordinated clean-air strategies.
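The clustering step above can be sketched in plain Python. This is a naive illustration of Ward's criterion, not the project code: the 2-D points stand in for LSTM embeddings of sensor locations, and `ward_clusters` is a hypothetical helper. In practice a library routine such as SciPy's hierarchical clustering would be the usual choice.

```python
def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion, stopping at k clusters.

    Each step merges the pair of clusters whose union gives the smallest
    increase in within-cluster sum of squares:
        n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2
    """
    # Each cluster is (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        n = len(ma) + len(mb)
        centroid = [(len(ma) * x + len(mb) * y) / n for x, y in zip(ca, cb)]
        clusters[a] = (ma + mb, centroid)
        del clusters[b]
    return [sorted(members) for members, _ in clusters]


# Toy embeddings (assumed): two tight pairs and one isolated location.
embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
airsheds = ward_clusters(embeddings, k=3)
print(airsheds)
```

Choosing `k` corresponds to cutting the dendrogram at a given height; locations that end up in the same group share a similar embedding regardless of administrative boundaries.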
Tech stack: PyTorch, Generative AI (GAN), LLM Fine-tuning, Transformers, PEFT, GraphSAGE, PyTorch Geometric, XGBoost, GNNs, Kriging, FastAPI

Agent-Based Intelligent Tutor Chatbot

  • Built a conversational AI chatbot with LLaMA 3.2 via Ollama and LangChain.
  • Backed it with a RAG pipeline (real-time web search and uploaded PDFs indexed with FAISS) to answer students’ academic queries in a personalized way.
  • Established a modular agent-based system for multi-hop reasoning and tool use (web search, file reader).
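The retrieval step of a RAG pipeline like the one above can be sketched as follows. To stay self-contained, this toy version swaps in word-count vectors and a linear scan where the project uses learned embeddings and a FAISS index; the function names and sample chunks are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Ground the question in retrieved context before it is sent to the LLM."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


chunks = [
    "binary search runs in logarithmic time",
    "quicksort average case is n log n",
    "photosynthesis happens in chloroplasts",
]
print(build_prompt("how fast is binary search", chunks, k=1))
```

The resulting prompt would then be passed to the language model, so the answer is grounded in the retrieved text rather than in the model's memory alone.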
Tech stack: LangChain, LLaMA, Ollama, FAISS, RAG, Streamlit

Doodle-Vision: Air-Drawn Sketch Recognition

How Stroke Info is captured by model
Air Drawn sketch Captured and Classified
  • Built real-time sketch classification on the QuickDraw dataset with hybrid ConvLSTM, Vision Transformer, and CNN-Transformer architectures for multimodal learning. Achieved 85% accuracy on 50+ classes.
  • Created a real-time drawing interface that captures hand-drawn sketches from a webcam for object recognition; also explored lightweight models such as MobileNet to reach accuracy suitable for mobile deployment.
  • The work is being prepared for submission to the WACV 2025 conference.
Tech stack: PyTorch, TensorFlow, OpenCV, Vision Transformers, Pre-Trained Models (VGG, ResNet50)

Tuning LLM via PEFT for IMDB Reviews

  • Fine-tuned T5-small with PEFT methods (Prompt Tuning, LoRA) for sentiment classification.
  • Reduced trainable parameters by ~80% with minimal accuracy loss, making it suitable for edge inference.
Tech stack: PEFT, LoRA, Hugging Face, Transformers, Adapters

Topic Mining on IRCTC Reviews

  • Scraped Google Play Store reviews of the IRCTC app using Selenium and Beautiful Soup.
  • Implemented topic mining using LDA, BERT embeddings, UMAP dimensionality reduction, and HDBSCAN clustering to identify key user concerns and improve service; identified 66+ niche topics and visualized them using t-SNE plots.
Tech stack: Python, BERT, LDA, HDBSCAN, t-SNE, Transformers

AmazoLens: Amazon Sales Analysis

  • Built recommendation systems, market basket analysis (Apriori, FP-Growth), and customer segmentation from an Amazon sales dataset.
  • Created interactive dashboards with Plotly/D3 and a ClickHouse backend.
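The idea behind Apriori-style market basket analysis can be shown with a small sketch covering the first two levels of the search: items that fail the support threshold are pruned before any pairs are counted. The baskets and the `frequent_pairs` helper are made up for illustration and are not taken from the project.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs meeting the support threshold."""
    n = len(baskets)
    # Level 1: keep only items that are frequent on their own (Apriori pruning).
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}
    # Level 2: count pairs built only from frequent items.
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(items, 2))
    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}


# Toy transactions (assumed data).
baskets = [
    ["phone", "case"],
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["case"],
]
print(frequent_pairs(baskets, min_support=0.5))
```

Pairs that pass the threshold are the candidates for association rules such as "customers who buy a phone also buy a case".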
Tech stack: Python, Plotly, D3.js, ClickHouse, Folium, Flask, React

Work Projects

CRM Scoring Improvement (Classplus, Noida)

  • Designed a predictive scoring model for telesales at Classplus/Testbook.
  • Increased sales conversion by 15%+ by identifying high-intent student leads.
Tech stack: Python, ML, NoSQL MongoDB, PySpark, Dataproc, GCP, Airflow

Project Hedwig: Text Classification (Classplus, Noida)

  • Deployed a model that classifies educational content into 200+ categories with 94% accuracy, reducing manual effort and saving KYC costs.
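TF-IDF, listed in this project's tech stack, turns each text into weighted term features that a classifier can consume. The sketch below shows the weighting only, using the plain `tf * log(N / df)` variant on made-up documents; the production features and classifier are not described in the source, so none are assumed here.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


# Toy content titles (assumed data).
docs = ["algebra practice test", "organic chemistry test"]
vectors = tfidf(docs)
print(vectors[0])
```

Terms shared by every document, like "test" here, get zero weight, while terms that separate categories, like "algebra", keep a positive weight.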
Tech stack: Python, NLP, TF-IDF, Scripting, SQL

Exam Seasonality Analysis (Classplus, Noida)

Trend Analysis
Final Suggestions Learnt with Peaks superimposing
  • Built a demand forecasting pipeline for exam-specific materials. Triggered sales alerts from Google API and transactional data to optimize sales strategies.
Tech stack: Python, Forecasting, APIs, SQL, Trend Analysis
