pathak-ashutosh/README.md

Ashutosh Pathak

I work on large-scale machine learning systems, focusing on the design, training, and deployment of models that operate reliably under real-world conditions. My interests include large language models, multimodal architectures, retrieval-augmented generation, and the data/compute infrastructure required to support them.

My work spans the entire lifecycle of modern ML systems: dataset construction, training pipelines, evaluation methodology, and inference optimization. I care about clarity in system design, reproducibility, and empirical rigor. I also build tools that make model behavior more interpretable and controllable.

I hold Bachelor's and Master's degrees in Computer Science (Machine Learning). I write about ML, systems, and experimentation at https://thenumbercrunch.com/.

Side projects include HiveHaven, a lightweight platform for international students seeking housing in the U.S., and PolNet, a data visualization tool for analyzing U.S. congressional caucus memberships and political network data.

Current reading: “Build a Large Language Model (From Scratch)” by Sebastian Raschka.


Focus areas

Predictive Modeling, Large Language Models, Multimodal Models, Generative Modeling
Retrieval-Augmented Generation, Vector Search, Data-Centric Evaluation
Training Pipelines, MLOps, Distributed Systems, High-Throughput Inference


Toolchain

JavaScript, Python, C/C++, C#, SQL
PyTorch, TensorFlow, Scikit-learn
LangChain, LangGraph, Elasticsearch, Neo4j
Apache Spark, Databricks, Hadoop (HDFS), Postgres, BigQuery
AWS SageMaker, Amazon Bedrock, Vertex AI, Azure ML
Docker, Kubernetes, Git, DVC


Contact

LinkedIn: https://www.linkedin.com/in/pathak-ash/
X: https://x.com/pathak_jeee
Email: ashutoshpathak@thenumbercrunch.com
Writing: https://thenumbercrunch.com/

Pinned

  1. econberta Public

    Robust Extraction of Named Entities in Economics

    Jupyter Notebook 2

  2. clinical-risk-prediction Public

    Clinical Risk Prediction using EHRs

    Jupyter Notebook 2

  3. spark-movie-recommendation Public

    A movie recommendation system built on the MovieLens 25M dataset using Python and Apache Spark

    Python 3

  4. liver-segmentation Public

    Liver segmentation using a U-Net architecture. I built this anonymously for a senior as his final-year project during my undergrad.

    Jupyter Notebook 2

  5. sentiment-analysis-yelp-reviews Public

    Sentiment analysis on the Yelp dataset with Apache Spark

    Python 1