Mihir Purwar

AI Engineer Specializing in Generative AI, NLP, and AI Product Excellence
  • Role

    Senior AI Engineer

  • Years of Experience

    6 years

Skillsets

  • XGBoost
  • Machine Learning
  • Generative AI
  • Microsoft Excel
  • NLP
  • Docker
  • Scikit-learn - 6 Years
  • LLMs - 4 Years
  • TensorFlow - 6 Years
  • PyTorch - 6 Years
  • Python - 6 Years
  • LLM inference
  • LLM-as-a-judge
  • A/B testing
  • Transformers - 6 Years
  • Statistics
  • Random Forest
  • RAG
  • Prompt Engineering
  • OpenAI
  • LightGBM
  • LangGraph
  • Embeddings
  • Clustering
  • Classification
  • CatBoost
  • BERT
  • Anthropic
  • Agentic frameworks

Professional Summary

6 Years
  • Dec, 2021 - Present (3 yr 10 months)

    Senior Data Scientist

    Innovaccer
  • Jul, 2019 - Dec, 2021 (2 yr 5 months)

    Data Science Associate Consultant

    ZS Associates

Work History

6 Years

Senior Data Scientist

Innovaccer
Dec, 2021 - Present (3 yr 10 months)

    Led a team of 2 MLEs and 1 SDE in Analytics R&D, taking GenAI and NLP initiatives from POC to production, and fostered strong collaboration across Product, Engineering, GTM, and key business stakeholders to align technical execution with product vision.

    Architected a GenAI-powered Pop Health Copilot using Chain-of-Thought prompting and multi-agent orchestration for NL2SQL, insights generation, and visualizations in a conversational interface.

    Boosted NL2SQL accuracy with 2,500+ custom instructions, dynamic guideline selection, and abstraction over 37+ tables, achieving 89% query acceptance and reducing latency from 130s to 40s.

    Engineered a scalable chatbot architecture with MongoDB (conversation logs), Redis (session management), AWS S3 (knowledge base), and Snowflake (insights), ensuring 99% uptime and 56% faster response time.

    Led evaluation of vector DBs (Pinecone, FAISS, Milvus, ChromaDB), selecting Milvus and cutting resource utilization by 82% via optimized data loading strategies.

    Built a prompt and config versioning system with CI/CD integration, reducing release cycles from 2-3 days to under 30 mins and enabling rapid, agile NLP experimentation.

    Accelerated OCR inference by optimizing the PARSeq model with TensorRT and deploying it via Triton Inference Server (FP16 quantized), achieving 23-25 RPS with <1s response time - a 2.8x speedup over the baseline.

    Carried out POCs for small language models (SLMs) such as Qwen 2.5 and NuExtract 1.5 using vLLM, Triton, SageMaker, and RunPod to assess future GenAI scalability, cost, and performance.

    Evaluated LLM observability frameworks (LiteLLM, TrueFoundry) with senior architects to enhance traceability and observability.

Data Science Associate Consultant

ZS Associates
Jul, 2019 - Dec, 2021 (2 yr 5 months)

    Built an NLP-based information retrieval system using BioBERT and a fine-tuned spaCy NER model to extract insights from clinical trial documents (PDFs, ClinicalTrials.gov, PubMed); reduced manual effort by 60% and cut trial design time from weeks to hours; featured on Cision PR Newswire.

    Developed a sales optimization pipeline on Dataiku using patient-level data, feature engineering, XGBoost, and unconstrained optimization to generate timely triggers and rank physicians, achieving 74% recall and boosting sales by 18%.

Major Projects

3 Projects

Simulate fish swarm behaviour using Q-Learning

    Developed an algorithm to simulate fish swarm behaviour using Q-Learning as part of a final-year project.

Automatic Text Summarization using Deep Learning

Jul, 2018 - Present (7 yr 3 months)

    Automatic Text Summarization using Deep Learning

Python

Nov, 2018 - Dec, 2018 (1 month)

    Solution of Differential Equation using the Newton-Raphson Method - Python

Education

  • Bachelor of Electronics and Communication

    Jaypee Institute of Information Technology (2019)