Dixant Mittal ☕️

Dixant works at the intersection of research and industry — currently as a Lead Data Scientist at Paytm, where he focuses on large language models and generative AI, and previously as a Research Scientist at Moovita, building decision-making systems for autonomous vehicles.

He received his Ph.D. in Computer Science from the National University of Singapore, advised by Professor Wee Sun Lee. He also holds a Master’s degree in Computer Science from the same university, where he was advised by Professor David Hsu, and a Bachelor’s degree in Information Technology from the National Institute of Technology Kurukshetra.

His research interests span reinforcement learning, planning & search, and large language models — broadly, how intelligent systems can reason and act effectively in complex, uncertain environments.

Recent Publications

(2025). Learning to Search from Demonstration Sequences. In ICLR.

(2024). EVaDE: Event-Based Variational Thompson Sampling for Model-Based Reinforcement Learning. In ACML.

(2023). ExPoSe: Combining State-Based Exploration with Gradient-Based Online Search. In AAMAS.

(2020). INGRESS: Interactive visual grounding of referring expressions. In IJRR.

Experience
 
Lead Data Scientist
May 2025 – Present Gurugram, India

Fine-tuned Small Language Models (SLMs) for Paytm Chatbot

  • Owned end-to-end development of a 4B-parameter SLM that replaced the production Llama-70B chatbot stack, beating GPT-OSS 120B by 2.5 points on Paytm’s customer-care eval while cutting annual inference cost by ~97% ($2M → $60K).
  • Designed and trained a 3-phase pipeline: (1) teacher-aligned supervised fine-tuning; (2) generalized knowledge distillation (GKD) with dataset aggregation (DAgger) to counter distribution shift; and (3) RL alignment using a modified PPO against human (CSAT) and LLM-judge reward signals.
  • Built the LLM-as-Judge evaluation framework — a composite score of 6 instruction-following metrics — to benchmark model quality and drive data-driven iteration between training phases.
  • Architected multi-GPU training and unified-LoRA serving via vLLM across 40 business verticals; production inference sustains a p95 time-to-first-token (TTFT) of 120 ms on 10K-token prompts, 25 ms inter-token latency (ITL), and 3,000 tokens/s peak throughput at a concurrency of 64 on 2×H200.
  • Engineered a continual-learning loop that fine-tunes the live model on a blended CSAT and eval-score reward, with a staged rollout policy (5% → 25% → 50% → 100%) gated on eval-score gain.
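The staged rollout in the last bullet can be pictured with a small sketch. Only the 5% → 25% → 50% → 100% stages come from the text above; the function name `next_traffic_share`, the `min_gain` threshold, and the choice to hold (rather than roll back) when the gate fails are assumptions made for illustration, not details of the production system.

```python
# Hypothetical sketch of a staged rollout gated on eval-score gain.
# Stage values come from the bullet above; the threshold and the
# hold-on-failure rule are illustrative assumptions.
STAGES = (0.05, 0.25, 0.50, 1.00)


def next_traffic_share(current: float, eval_gain: float, min_gain: float = 0.0) -> float:
    """Return the traffic share to use for the next rollout step.

    Advances one stage when the candidate model's eval-score gain over the
    baseline exceeds `min_gain`; otherwise keeps the current share.
    """
    if current not in STAGES:
        raise ValueError(f"unknown rollout stage: {current}")
    idx = STAGES.index(current)
    gate_passed = eval_gain > min_gain
    if gate_passed and idx < len(STAGES) - 1:
        return STAGES[idx + 1]
    return current


if __name__ == "__main__":
    share = STAGES[0]
    # Made-up eval-score gains observed at successive checkpoints.
    for gain in (1.2, -0.3, 0.8, 0.5):
        share = next_traffic_share(share, gain)
        print(f"gain={gain:+.1f} -> serve {share:.0%} of traffic")
```

A gate of this shape keeps a regression from reaching more users than the stage it was detected at, which is the usual reason for tying promotion to a measured gain rather than to elapsed time.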

Paytm Playback — Personalized Rap Song Generation

  • Owned model development of Paytm Playback, a personalized rap-song generator that turns a user’s recent transactions into custom Hindi lyrics and music; ~4M songs generated since launch.
  • Fine-tuned the open-source ACE-Step diffusion transformer on a custom synthetic audio–lyric dataset to fix poor Hindi pronunciation in the base model, yielding production-quality Hindi rap generation.
  • Engineered a scalable inference pipeline using Ray Serve with dynamic batching, sustaining real-time low-latency generation through launch-day traffic spikes.
  • Reduced per-song generation cost from ₹1.92 to ₹0.25 (~87%) through model optimization, mixed-precision inference, and batch-level parallelization.
  • Delivered measurable product lift: +15% engagement, +12% retention, and a 40% song-share rate.
 
Research Scientist
October 2023 – April 2025 Singapore

Scene Foundation Model for Autonomous Vehicles

  • Designed a foundation model for driving-scene understanding, representing the road network, vehicle positions, curbs, and other items of interest as a heterogeneous graph with typed nodes and relational edges (e.g., vehicle-on-lane, lane-leads-to-lane).
  • Built a graph neural network backbone over the scene graph to learn unified embeddings capturing both geometric and topological relationships between scene entities.
  • Pre-trained via self-supervised learning using a dual masking objective — masked node-attribute reconstruction and masked edge-existence prediction — to learn rich, transferable scene representations without labelled data.

Probabilistic Intention-aware Decision Maker

  • Developed a neural decision-making module that explicitly models uncertainty over surrounding actors’ intentions (turn left/right, go straight, yield) and selects ego high-level actions conditioned on this uncertainty.
  • Designed a two-headed transformer: one head maintains a Bayesian posterior over neighbouring agents’ intentions, updated via Bayes’ rule as new observations arrive; the second head outputs the ego decision conditioned on the current scene and the posterior.
  • Achieved dynamic conservatism (cautious when actor intentions were uncertain, assertive when confident), yielding faster mission-completion time and smoother negotiation at collision rates comparable to those of the production planner.
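As a rough illustration of the Bayes'-rule update mentioned above, the sketch below maintains a categorical posterior over one neighbouring agent's intention. It is a plain tabular update, not the transformer head described in the bullets; the intention labels follow the text, while the function name `bayes_update` and all likelihood values are made up for the example.

```python
from typing import Dict

# Intention labels taken from the bullet above.
INTENTIONS = ("turn_left", "turn_right", "go_straight", "yield")


def bayes_update(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Apply Bayes' rule: posterior(i) is proportional to likelihood(obs | i) * prior(i)."""
    unnormalised = {i: prior[i] * likelihood[i] for i in prior}
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under every intention")
    return {i: p / total for i, p in unnormalised.items()}


if __name__ == "__main__":
    # Start from a uniform belief over the four intentions.
    belief = {i: 1.0 / len(INTENTIONS) for i in INTENTIONS}
    # Hypothetical likelihood of observing "drifting towards the left lane edge"
    # under each intention.
    obs_likelihood = {
        "turn_left": 0.70,
        "turn_right": 0.05,
        "go_straight": 0.15,
        "yield": 0.10,
    }
    for step in range(2):
        belief = bayes_update(belief, obs_likelihood)
        print(step, {i: round(p, 3) for i, p in belief.items()})
```

Repeated consistent observations sharpen the belief, which is what lets a downstream decision rule be cautious while the posterior is flat and assertive once it concentrates.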

Overtake Confirmation Network

  • Built a neural decision network based on a scene transformer architecture that ingests the full surrounding environment — lanes, vehicles, pedestrians, and traffic-light signals — and outputs a discrete action: overtake-left, overtake-right, or maintain lane.
  • Pre-trained via imitation learning on expert driving trajectories, then fine-tuned with RL using a modified asynchronous PPO designed to hide simulator latency and keep GPU utilisation high.
  • Outperformed the production deterministic planner on all key metrics: lower collision rate, smoother motion profile, greater distance covered, and faster mission-completion time.
 
Lecturer
July 2020 – September 2024 Singapore

Developer Toolkit #2 – Backend Programming

Co-lectured a 6-session course on backend development for the NUS-FintechSG programme, covering the full stack of components needed to build and expose a backend system to client applications.
 
Research Intern
October 2021 – May 2022 Singapore

Differentiable Online Search in Latent State Space

  • Designed a neural network architecture that bakes the inductive bias of Monte Carlo Tree Search directly into its computation graph, performing online search over a learnt latent world model.
  • The full search procedure — node selection, expansion, rollout, and back-up — was implemented as differentiable operations, yielding an end-to-end differentiable architecture whose search behaviour could be optimised jointly with the policy and Q-value heads.
  • Evaluated on grid-world navigation and Sokoban planning tasks, where the model consistently outperformed a vanilla baseline with identical capacity but no built-in search inductive bias.
 
Teaching Assistant
January 2019 – May 2022 Singapore
  • CS3243 – Introduction to Artificial Intelligence: Teaching and grading assignments.
  • CS5339 – Theory and Algorithms for Machine Learning: Grading assignments and course consultation.
  • IS5006 – Intelligent System Deployment: Grading assignments and creating course material and demo code.
 
Research Intern
December 2017 – September 2021 Singapore

Intuitive Motion Prediction Network

  • Implemented a pedestrian motion-prediction module operating on a 2D bird’s-eye-view grid around the ego vehicle, producing goal-conditioned future trajectory distributions for downstream planning.
  • Coupled a custom Value Iteration Network for grid-based path computation with a Bayesian filter that maintains and updates a posterior over each pedestrian’s hidden goal location as new observations arrive.
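For readers unfamiliar with the grid-based path computation that a Value Iteration Network approximates, here is a minimal sketch of classical tabular value iteration on a 4-connected grid. It is not the custom network described above: the reward of 1 at the goal, the discount `gamma`, and the function name `value_iteration` are illustrative assumptions.

```python
from typing import List, Tuple


def value_iteration(
    obstacles: List[List[bool]],
    goal: Tuple[int, int],
    gamma: float = 0.95,
    iters: int = 100,
) -> List[List[float]]:
    """Classical value iteration on a 4-connected grid.

    The goal cell is worth 1.0, obstacle cells stay at 0.0, and every other
    cell takes the discounted value of its best free neighbour.
    """
    rows, cols = len(obstacles), len(obstacles[0])
    moves = ((-1, 0), (1, 0), (0, -1), (0, 1))
    values = [[0.0] * cols for _ in range(rows)]
    for _ in range(iters):
        new_values = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if obstacles[r][c]:
                    continue  # blocked cells are never entered
                if (r, c) == goal:
                    new_values[r][c] = 1.0
                    continue
                best = 0.0
                for dr, dc in moves:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and not obstacles[nr][nc]:
                        best = max(best, gamma * values[nr][nc])
                new_values[r][c] = best
        values = new_values
    return values


if __name__ == "__main__":
    # A 3x3 bird's-eye-view grid with a two-cell wall in the middle column.
    grid = [
        [False, True, False],
        [False, True, False],
        [False, False, False],
    ]
    for row in value_iteration(grid, goal=(0, 0)):
        print([round(v, 3) for v in row])
```

After convergence each free cell holds `gamma` raised to its shortest-path distance from the goal, so following the steepest increase in value traces a shortest path around obstacles.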

Object Detection Network

  • Developed a custom YOLO-based object-detection network for autonomous vehicles, pruning the class taxonomy to AV-relevant categories to free up model capacity for the classes that matter.
  • Re-architected the backbone for embedded deployment: pruned layers and channel widths, removed heavy ConvNeXt blocks, and added a Feature Pyramid Network (FPN) for multi-scale detection, trading depth for latency without sacrificing recall on small objects.
  • Optimised and deployed on Google’s Coral Edge TPU, achieving real-time on-vehicle detection at 50 FPS and 0.72 mAP.

High Level Path Generation

  • Developed a high-level online path planner for autonomous vehicles using the A* algorithm, leveraging the road-network graph and real-time vehicle location to produce route plans on demand.
 
Senior Software Engineer
March 2017 – July 2017 Gurgaon, India

Trending Train Searches

  • Built a service in Java Spring Boot that consumed the live train-search event stream from Kafka and mined the most frequently searched trains and stations, surfacing them as “Trending Searches” on the Ixigo UI.
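The counting step behind such a feature can be sketched in a few lines. The original service was written in Java with Spring Boot and read from Kafka; the Python below is only an illustration that replaces the Kafka stream with an in-memory list, and the class name `TrendingCounter` and the train numbers are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


class TrendingCounter:
    """Keeps running counts of searched items and reports the most frequent ones."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def consume(self, events: Iterable[str]) -> None:
        """Fold a batch of search events (e.g. train numbers) into the counts."""
        self._counts.update(events)

    def top(self, k: int) -> List[Tuple[str, int]]:
        """Return the k most frequently searched items with their counts."""
        return self._counts.most_common(k)


if __name__ == "__main__":
    counter = TrendingCounter()
    # Stand-in for events that would arrive from the search event stream.
    counter.consume(["12951", "12301", "12951", "12002", "12951", "12301"])
    print(counter.top(2))
```

A production version would typically also expire old events (for example with a time window) so that "trending" reflects recent searches rather than all-time totals; that part is left out here.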

Scalable Task Scheduler

  • Designed and built a horizontally scalable task scheduler in which executor workers pull the next available task from a Kafka queue and run it, allowing consumers to be scaled up or down on demand.
  • Designed the scheduler as a generic, task-agnostic primitive that any team could plug into for deferred execution, scheduled retries, or long-running async workloads, removing the need for service-specific scheduling logic.
 
Software Engineer
July 2015 – March 2017 Gurgaon, India

Identity Management System (IMS)

  • Contributed to Snapdeal’s in-house identity platform (IMS) serving a user base of 25 million.
  • Shipped OAuth integration for third-party login and built a reusable data validation framework with structured error mapping for the IMS APIs.

Education
 
Doctor of Philosophy (Ph.D.) in Computer Science
January 2019 – June 2024 Singapore

Thesis: Combining Planning and Learning to Improve Decision Making

Advisor: Prof. Wee Sun Lee
 
Master of Computing (M.Comp.) in Computer Science
July 2017 – December 2018 Singapore

Thesis: Active Information Gathering to Disambiguate Referring Expressions

Advisor: Prof. David Hsu
 
Bachelor of Technology (B.Tech.) in Information Technology
July 2011 – May 2015 India

Projects

Contact