
Satya Krishna Gorti

Senior Machine Learning Scientist @ Layer6 AI
Large Language Models • Multimodal Learning • Representation Learning • Applied ML Systems
MSc in Applied Computing, University of Toronto • satyag [at] cs [dot] toronto [dot] edu

Bio

I am a Senior Machine Learning Scientist at Layer6 AI, where I lead the LLM Training team. My work focuses on building and training large language models that power internal retrieval-augmented generation (RAG) and agentic systems across the organization. Previously at Layer6, I designed and built machine-learning frameworks for training, deploying, and monitoring models in production cloud environments, and worked on applied ML use cases including credit risk modeling and vehicle loss prediction from images. I hold an MSc in Applied Computing from the University of Toronto. Earlier in my career, I was a Research Intern at Uber Advanced Technologies Group (ATG), where I worked on multi-object tracking using LiDAR and radar sensors for autonomous driving systems. My research interests span large language models, representation learning, multimodal understanding, and applied ML systems.


Research

NAACL 2025 (Oral) • Albuquerque, NM • Text-to-SQL

MSc-SQL: Multi-Sample Critiquing Small Language Models for Text-to-SQL Translation

We develop small, efficient, open-source text-to-SQL models by sampling multiple candidate SQL generations and critiquing them using schema and metadata. The critiquing model evaluates multiple outputs jointly, achieving state-of-the-art performance among open models while remaining cost-effective.

An abridged version was presented at the NeurIPS 2024 Table Representation Learning Workshop (workshop oral) in Vancouver, BC.
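
A minimal sketch of the generate-then-critique loop described above, assuming hypothetical generate_sql and critique callables that stand in for the small generator and critic models (these names are placeholders, not the paper's actual interfaces):

    from typing import Callable, List

    def pick_sql(question: str, schema: str,
                 generate_sql: Callable[[str, str], str],
                 critique: Callable[[str, str, List[str]], List[float]],
                 n_samples: int = 4) -> str:
        # Sample several candidate SQL queries from the small generator model.
        candidates = [generate_sql(question, schema) for _ in range(n_samples)]
        # The critic scores all candidates jointly, conditioned on the question
        # and schema/metadata; the highest-scoring candidate is returned.
        scores = critique(question, schema, candidates)
        best = max(range(len(candidates)), key=lambda i: scores[i])
        return candidates[best]
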
NeurIPS 2024 Workshop • Vancouver, BC • Diffusion / ODE

Inconsistencies in Consistency Models: Better ODE Solving Does Not Imply Better Samples

Consistency model distillation targets the probability flow ODE of a diffusion model. We show that improving ODE-solving accuracy can significantly worsen sample quality, challenging common intuitions about why consistency models work in practice.

Also presented at the Attributing Model Behavior at Scale and the Fine-Tuning in Modern Machine Learning workshops in Vancouver, BC.
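
For context, a minimal sketch of what "solving the probability flow ODE" means in the common Karras-style parameterization, where denoise stands in for a pretrained diffusion denoiser (an assumption, not this paper's code). A consistency model distills this whole trajectory into a single step; the result above is that making the integration below more accurate does not guarantee better samples:

    def pf_ode_solve(x, denoise, t_steps):
        # Integrate dx/dt = (x - D(x, t)) / t from high noise down to t ~ 0
        # with explicit Euler steps; finer t_steps give a more accurate solve.
        for t_cur, t_next in zip(t_steps[:-1], t_steps[1:]):
            d = (x - denoise(x, t_cur)) / t_cur
            x = x + (t_next - t_cur) * d
        return x
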
CVPR 2024 (Spotlight) • Seattle, WA • Multimodal

Data-Efficient Multimodal Fusion on a Single GPU

FuseMix is a multimodal augmentation method operating in the latent spaces of arbitrary pre-trained unimodal encoders. It achieves competitive retrieval performance with dramatically reduced compute and data.

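A minimal sketch of the latent-space mixing idea, assuming precomputed embeddings from frozen unimodal encoders; the Beta-distributed coefficient follows the usual mixup recipe and the function name is illustrative:

    import numpy as np

    def fusemix_batch(img_latents, txt_latents, alpha=1.0, rng=None):
        # Mix paired (image, text) latents with a shared coefficient and a
        # shared permutation so mixed pairs stay aligned across modalities.
        rng = rng or np.random.default_rng()
        lam = rng.beta(alpha, alpha)
        perm = rng.permutation(len(img_latents))
        mixed_img = lam * img_latents + (1 - lam) * img_latents[perm]
        mixed_txt = lam * txt_latents + (1 - lam) * txt_latents[perm]
        return mixed_img, mixed_txt
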
ICML 2023 • Honolulu, HI • Generative Modeling

TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation

TR0N turns pre-trained unconditional generators into conditional models using a lightweight stochastic translator and Langevin refinement, enabling zero-shot conditional generation.

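A minimal sketch of the plug-and-play recipe under simplifying assumptions: a translator proposes a generator latent for the condition, and Langevin steps refine it against an energy such as negative CLIP similarity (translator, generator, and energy_grad are placeholder names, not TR0N's API):

    import numpy as np

    def tr0n_sample(cond, translator, generator, energy_grad,
                    n_steps=50, step_size=0.01, rng=None):
        rng = rng or np.random.default_rng()
        z = translator(cond)  # stochastic initial guess for the latent
        for _ in range(n_steps):
            # Langevin dynamics: energy gradient step plus Gaussian noise.
            g = energy_grad(z, cond)
            z = z - step_size * g + np.sqrt(2 * step_size) * rng.standard_normal(z.shape)
        return generator(z)
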
CVPR 2022 • New Orleans, LA • Text–Video Retrieval

X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval

X-Pool allows text queries to attend to the most relevant video frames via cross-modal attention, producing a text-conditioned video representation with strong retrieval performance.

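A minimal sketch of the text-conditioned pooling: the text embedding queries the frame embeddings with scaled dot-product attention, and the attention weights aggregate the frames. Learned projections are omitted here, which is a simplification of the actual model:

    import numpy as np

    def text_conditioned_pool(text_emb, frame_embs):
        # frame_embs: (num_frames, d); text_emb: (d,)
        scores = frame_embs @ text_emb / np.sqrt(text_emb.shape[-1])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()          # softmax over frames
        return weights @ frame_embs       # text-conditioned video embedding
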
CVPR 2021 • Nashville, TN • Video Understanding • Weak Supervision

Weakly Supervised Action Selection Learning in Video

We propose Action Selection Learning (ASL) for temporally localizing actions in untrimmed videos using only video-level class labels as weak supervision. ASL improves over strong baselines on THUMOS-14 and ActivityNet-1.2.

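A minimal sketch of the weakly supervised setup ASL builds on: per-snippet class scores are pooled (here, a top-k mean, a common choice and an assumption on my part) into a video-level prediction that can be trained with video-level labels alone; at test time, snippet scores are thresholded to localize actions:

    import numpy as np

    def video_level_logits(snippet_scores, k=8):
        # snippet_scores: (num_snippets, num_classes) per-snippet class logits.
        k = min(k, snippet_scores.shape[0])
        topk = np.sort(snippet_scores, axis=0)[-k:]  # k highest snippets per class
        return topk.mean(axis=0)                     # video-level class logits
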
ICCV 2019 Workshop • Seoul, South Korea • YouTube-8M • Temporal Localization

Cross-Class Relevance Learning for Information Fusion in Temporal Concept Localization

We present a framework for temporal concept localization that leverages cross-class relevance learning for information fusion, achieving strong results on YouTube-8M.

NeurIPS 2019 (Oral) • Vancouver, BC • Image Retrieval • Graph Learning

Guided Similarity Separation for Image Retrieval

We propose a graph convolutional approach that incorporates neighborhood information into image descriptors for retrieval. We introduce an unsupervised objective that encourages pairwise separation of image similarities, improving retrieval quality.

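A minimal sketch of how neighborhood information enters the descriptors: one graph-convolution step over a kNN similarity graph. The unsupervised separation objective itself is omitted, and details such as self-loops are simplifications:

    import numpy as np

    def gcn_refine(desc, k=10):
        # L2-normalize so dot products are cosine similarities.
        desc = desc / np.linalg.norm(desc, axis=1, keepdims=True)
        sim = desc @ desc.T
        # Sparse adjacency: keep each image's k most similar neighbours.
        adj = np.zeros_like(sim)
        idx = np.argsort(-sim, axis=1)[:, :k]
        np.put_along_axis(adj, idx, np.take_along_axis(sim, idx, axis=1), axis=1)
        adj = np.maximum(adj, adj.T)  # symmetrize the graph
        refined = adj @ desc          # aggregate neighbour descriptors
        return refined / np.linalg.norm(refined, axis=1, keepdims=True)
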
CVPR 2019 Workshop • Long Beach, CA • Image Retrieval • Semi-Supervised

Semi-Supervised Traversal in Image Retrieval

We introduce a semi-supervised extension to Explore-Exploit Graph Traversal (EGT) for image retrieval, enabling improved traversal and retrieval performance when only limited supervision is available.

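A simplified sketch of the traversal this work extends: best-first expansion over a similarity graph starting from the query's neighbours, following the strongest edge out of the visited set at each step. EGT's explore/exploit thresholding and the semi-supervised extension itself are omitted:

    import heapq

    def graph_traverse(query_id, edges, budget=100):
        # edges: dict mapping node -> list of (similarity, neighbour) pairs.
        visited, frontier = {query_id}, []
        for sim, nb in edges.get(query_id, []):
            heapq.heappush(frontier, (-sim, nb))
        ranked = []
        while frontier and len(ranked) < budget:
            neg_sim, node = heapq.heappop(frontier)
            if node in visited:
                continue
            visited.add(node)
            ranked.append((node, -neg_sim))
            for sim, nb in edges.get(node, []):  # expand new node's neighbours
                if nb not in visited:
                    heapq.heappush(frontier, (-sim, nb))
        return ranked  # retrieval list ordered by traversal
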
arXiv Preprint • Optimization • SGD

Online Algorithm for Adaptive Learning Rate

We study an online approach to adapt learning rates in stochastic gradient descent using first- and second-order approximations, analyzing behavior across convex and non-convex settings.
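
A minimal sketch of the first-order variant of this idea (a hypergradient-style update; hyperparameter names are illustrative, not the paper's): the learning rate is nudged by the alignment of consecutive gradients:

    import numpy as np

    def sgd_adaptive_lr(grad_fn, w, lr=0.1, beta=1e-3, steps=100):
        g_prev = np.zeros_like(w)
        for _ in range(steps):
            g = grad_fn(w)
            # d(loss)/d(lr) ~ -g . g_prev: grow lr when consecutive gradients
            # agree, shrink it when they point in opposing directions.
            lr = lr + beta * float(g @ g_prev)
            w = w - lr * g
            g_prev = g
        return w, lr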

arXiv Preprint • Generative Modeling • GAN

Text-to-Image-to-Text Translation Using Cycle-Consistent Adversarial Networks

We improve text-to-image synthesis using cycle consistency by mapping generated images back into text space, encouraging semantic alignment between text and visual domains.
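
A minimal sketch of the cycle-consistency term, with text_to_image and image_to_text as placeholder names for the two translation networks: the generated image is mapped back to text space and penalized for drifting from the source caption embedding:

    import numpy as np

    def cycle_loss(caption_emb, text_to_image, image_to_text):
        image = text_to_image(caption_emb)   # forward translation: text -> image
        recovered = image_to_text(image)     # map the image back to text space
        return float(np.mean((recovered - caption_emb) ** 2))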

Qualitative results (shown inline as the project figure): ground-truth caption → generated image → generated caption.
  • "the flower has long yellow petals that are thin and a yellow stamen" → [generated flower example 1] → "this flower has petals that are yellow and very thin"
  • "there are many long and narrow floppy pink petals surrounding many red stamen and a green stigma" → [generated flower example 2] → "this flower has petals that are red with pointed tips"

Talks & Presentations

  • TMLS 2019, Toronto, ON — Temporal Concept Localization on YouTube-8M — Video
  • ICCV 2019, Seoul, South Korea — 1st-place presentation for the YouTube-8M challenge
  • CVPR 2019, Long Beach, CA — Semi-supervised EGT for landmark retrieval — Slides
  • Review of GANs for Sequences of Discrete Elements with Gumbel-Softmax Distribution — Slides