About Me

I am Mohammad Mozaffari, an ML Researcher at ElastixAI. I received my PhD in Computer Science from the University of Toronto, where I was supervised by Professor Maryam Mehri Dehnavi, and my B.Sc. in Electrical Engineering, with a minor in Computer Engineering, from the University of Tehran.

My research focuses on the "Compression Trinity" for Large Language Models: the interplay of sparsity, quantization, and low-rank approximation to make LLMs faster and smaller. My work has been featured by NVIDIA Research and on the official PyTorch blog. You can explore it here: The Compression Trinity for LLMs.
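To give a concrete flavor of two legs of this trinity, the sketch below shows (in plain Python, purely illustrative and not code from my projects) how 2:4 semi-structured sparsity keeps the two largest-magnitude weights in each group of four, and how symmetric int8 quantization then maps the survivors to 8-bit integers plus a single scale factor:

```python
# Illustrative sketch only: toy versions of 2:4 semi-structured pruning
# and symmetric per-tensor int8 quantization. Function names and the
# example weights are made up for demonstration.

def prune_2_4(weights):
    """Zero out the 2 smallest-magnitude weights in every group of 4."""
    out = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # Indices of the two largest-magnitude entries in this group
        keep = sorted(range(len(group)), key=lambda j: abs(group[j]))[-2:]
        out.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return out

def quantize_int8(weights):
    """Symmetric per-tensor quantization to the int8 range [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

w = [0.9, -0.1, 0.05, -1.2, 0.3, 0.7, -0.02, 0.4]
sparse = prune_2_4(w)             # half the entries become exactly zero
q, scale = quantize_int8(sparse)  # small integers plus one fp scale
dequant = [qi * scale for qi in q]
```

In practice these steps interact: pruning changes the weight distribution that quantization must cover, which is exactly the kind of interplay the Compression Trinity studies.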

Publications

Invited Talks

  • PATCH: Learnable Tile-Level Hybrid Sparsity for LLMs — NVIDIA Research, Seattle, Oct 2025
  • Compression Trinity: Interplay of Sparsity, Quantization, and Low-Rank Approximation for LLMs — Cerebras, Toronto, Mar 2025
  • Efficient LLM Training and Inference: Sparsity, Quantization, and Low-Rank Approximation — Google DeepMind, Seattle, Mar 2025
  • Enabling Semi-structured Sparsity in LLMs — NVIDIA Research, Seattle, Mar 2024
  • Communication-Efficient Second-Order Optimization Methods — Rutgers University, New Jersey, Nov 2023

Media & Outreach

Mentorship

Mentored 7 undergraduate and master's students on projects related to LLM compression; two mentees were admitted to Stanford for graduate studies.

Experience

ML Researcher at ElastixAI Dec 2025 – Present

  • Research and develop compression techniques for efficient deployment of large language models.
  • Investigate Mixture-of-Experts architectures, including token routing, kernel design, and dispatch optimization.
  • Collaborate directly with the CTO and CEO on research direction and production integration.

Research Intern at Autodesk Aug 2022 – Dec 2022

  • Reduced multi-GPU simulation time from 4h to 3.2h via CUDA kernel optimization and profiling.
  • Designed kernel fusion and memory coalescing strategies, reducing memory traffic by 30%.
  • Profiled inter-GPU synchronization and dataflow with Nsight Systems to guide the optimizations above.