Satya Krishna Gorti

MSc. in Applied Computing

satyag [at] cs [dot] toronto [dot] edu

Brief Bio

I graduated with an MSc. in Applied Computing from the University of Toronto in 2018. My main interests lie in the areas of Machine Learning, Deep Learning and Computer Vision. I am currently a Senior ML Research Scientist at Layer6 AI, where I work on Representation Learning and Multi-Modal Understanding. I also lead the ML Frameworks team, where we build frameworks for training, deploying and monitoring ML models in production on the cloud. Prior to this, I was a Research Intern at Uber ATG, working on multi-object tracking using LiDAR and radar sensors for self-driving vehicles.

Research

  • Data-Efficient Multimodal Fusion on a Single GPU
    We propose FuseMix, a multimodal augmentation scheme that operates on the latent spaces of arbitrary pre-trained unimodal encoders. Using FuseMix for multimodal alignment, we achieve competitive performance, and in certain cases outperform state-of-the-art methods, in both image-text and audio-text retrieval, with orders of magnitude less compute and data: for example, we outperform CLIP on the Flickr30K text-to-image retrieval task with ∼600× fewer GPU days and ∼80× fewer image-text pairs. A toy sketch of the latent mixing step appears below this list.
    CVPR 2024 - Seattle, WA
    [Paper][Code]
  • TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation
    We propose TR0N, a highly general framework that turns pre-trained unconditional generative models, such as GANs and VAEs, into conditional models. The conditioning can be highly arbitrary and requires only a pre-trained auxiliary model. We show how to turn unconditional models into class-conditional ones with the help of a classifier, and into text-to-image models by leveraging CLIP. TR0N learns a lightweight stochastic mapping that translates between the space of conditions and the latent space of the generative model. The translated latent samples are then further refined through Langevin dynamics, yielding higher-quality data samples (a toy sketch of this refinement step appears below this list). TR0N requires neither training data nor fine-tuning, yet achieves a zero-shot FID of 10.9 on MS-COCO, outperforming competing alternatives not only on this metric but also in sampling speed.
    ICML 2023 - Honolulu, HI
    [Paper][Code]
  • XPool: Cross-Modal Language-Video Attention for Text-Video Retrieval
    We propose a cross-modal attention model called XPool that reasons between a text and the frames of a video. Our core mechanism is scaled dot-product attention that lets a text attend to its most semantically similar frames. We then generate an aggregated video representation conditioned on the text’s attention weights over the frames (a minimal sketch of this text-conditioned pooling appears below this list). We evaluate our method on three benchmark datasets, MSR-VTT, MSVD and LSMDC, achieving new state-of-the-art results with up to 8% relative improvement in Recall@1.
    CVPR 2022 - New Orleans, LA
    [Paper][Code]
  • Weakly Supervised Action Selection Learning in Video
    We propose Action Selection Learning (ASL), an approach for temporally localizing actions in untrimmed videos using video-level class labels as weak supervision. Empirically, we show that ASL outperforms leading baselines on two popular benchmarks, THUMOS-14 and ActivityNet-1.2, with 12.3% and 5.7% relative improvement, respectively.
    CVPR 2021 - Nashville, TN
    [Paper]
  • Cross-Class Relevance Learning for Information Fusion in Temporal Concept Localization
    We present a framework for temporal concept localization that achieves state-of-the-art results on the YouTube-8M dataset.
    ICCV 2019 - The 3rd Workshop on YouTube-8M Large-Scale Video Understanding - Seoul, South Korea
    [Paper][Workshop]
  • Guided Similarity Separation for Image Retrieval
    We propose a graph convolutional network to directly encode neighbour information into image descriptors for image retrieval. We further leverage ideas from clustering and manifold learning, and introduce an unsupervised loss based on pairwise separation of image similarities.
    NeurIPS 2019 - Vancouver, BC
    [Paper]
  • Semi-Supervised Traversal in Image Retrieval
    A novel semi-supervised extension to Explore-Exploit Graph Traversal (EGT) for image retrieval.
    CVPR 2019 - Landmark Recognition Workshop - Long Beach, CA
    [Paper][Workshop]
  • Online algorithm for adaptive learning rate
    An online algorithm for learning the learning rate in stochastic gradient descent using first- and second-order approximation methods, with a study of its effects on convex and non-convex machine learning problems. A generic sketch of this style of update appears below this list.
    [arXiv][GitHub]
  • Text-to-Image-to-Text translation using cycle consistent adversarial networks
    Improving text-to-image synthesis using cycle consistency.
    [arXiv][GitHub]
    Example results (generated images omitted):
      Ground truth caption: "the flower has long yellow petals that are thin and a yellow stamen"
      Generated caption:    "this flower has petals that are yellow and very thin"
      Ground truth caption: "there are many long and narrow floppy pink petals surrounding many red stamen and a green stigma on this flower"
      Generated caption:    "this flower has petals that are red with pointed tips"
  • ReGAN: RE[LAX|BAR|INFORCE] based Sequence Generation using GANs
    A comparative study of gradient estimators for sequence generation using GANs. A toy REINFORCE sketch appears below this list.
    [arXiv][GitHub]
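
Code sketches

The snippets below are minimal, illustrative sketches of some of the ideas above. They are not the released implementations, and any function, parameter or shape not stated in the papers is an assumption.

A sketch of a mixup-style augmentation on frozen encoder latents, in the spirit of FuseMix. The Beta mixing parameter and the index-wise pairing convention are assumptions; the mixed latents would then feed lightweight fusion adapters trained contrastively.

    import torch

    def fusemix_style_augment(img_latents, txt_latents, alpha=1.0):
        """Interpolate paired (image, text) latents with a shared mixing ratio.

        img_latents, txt_latents: (batch, dim) outputs of frozen unimodal
        encoders, paired index-wise (illustrative shapes).
        """
        lam = torch.distributions.Beta(alpha, alpha).sample()
        perm = torch.randperm(img_latents.size(0))
        # The same ratio and permutation are applied to both modalities so
        # that the mixed image/text latents stay semantically paired.
        mixed_img = lam * img_latents + (1 - lam) * img_latents[perm]
        mixed_txt = lam * txt_latents + (1 - lam) * txt_latents[perm]
        return mixed_img, mixed_txt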
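
A sketch of the Langevin-style refinement step described for TR0N, assuming a generic energy function (for example, a negative CLIP similarity or classifier log-probability evaluated on the decoded sample); the step size and number of iterations are illustrative.

    import torch

    def langevin_refine(z, energy, steps=20, step_size=0.01):
        """Refine latent codes z by noisy gradient descent on energy(z),
        a per-sample score of how well the generated sample matches the condition."""
        z = z.clone().requires_grad_(True)
        for _ in range(steps):
            e = energy(z).sum()
            grad, = torch.autograd.grad(e, z)
            noise = torch.randn_like(z)
            # Gradient step toward lower energy plus scaled Gaussian noise.
            z = (z - step_size * grad + (2 * step_size) ** 0.5 * noise)
            z = z.detach().requires_grad_(True)
        return z.detach()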
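
A sketch of the text-conditioned frame pooling at the core of XPool: the text queries the frames with scaled dot-product attention, and the frames are aggregated under those attention weights. The projection names and shapes are illustrative, not the paper's code.

    import torch
    import torch.nn.functional as F

    def text_conditioned_pool(text_emb, frame_embs, q_proj, k_proj, v_proj):
        """text_emb: (batch, dim); frame_embs: (batch, frames, dim);
        q_proj/k_proj/v_proj: torch.nn.Linear projections (hypothetical names)."""
        q = q_proj(text_emb).unsqueeze(1)    # (batch, 1, dim) text query
        k = k_proj(frame_embs)               # (batch, frames, dim) frame keys
        v = v_proj(frame_embs)               # (batch, frames, dim) frame values
        # Scaled dot-product attention of the text over the frames.
        attn = F.softmax(q @ k.transpose(1, 2) / q.size(-1) ** 0.5, dim=-1)
        # Aggregated video representation conditioned on the text.
        return (attn @ v).squeeze(1)         # (batch, dim)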
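
A generic first-order sketch of adapting the learning rate online during SGD, in the style studied in the adaptive learning rate project (the exact update rules in the report may differ): the step size grows when successive stochastic gradients agree and shrinks when they oppose each other.

    import numpy as np

    def sgd_with_adaptive_lr(grad_fn, w, lr=0.01, meta_lr=1e-4, steps=1000):
        """grad_fn(w) returns a stochastic gradient of the loss at w,
        where w is a 1-D parameter vector."""
        prev_grad = np.zeros_like(w)
        for _ in range(steps):
            g = grad_fn(w)
            # Hypergradient-style update: increase lr when g and prev_grad
            # point the same way, decrease it when they conflict.
            lr += meta_lr * float(np.dot(g, prev_grad))
            w = w - lr * g
            prev_grad = g
        return w, lr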
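
A toy REINFORCE (score-function) estimator of the kind compared in the ReGAN study, using the discriminator score as the return and a mean baseline for variance reduction; the tensor shapes are assumptions.

    import torch

    def reinforce_loss(logits, actions, rewards):
        """logits: (batch, seq_len, vocab) generator outputs;
        actions: (batch, seq_len) sampled token ids;
        rewards: (batch,) discriminator scores, treated as constants."""
        log_probs = torch.distributions.Categorical(logits=logits).log_prob(actions)
        advantage = (rewards - rewards.mean()).detach().unsqueeze(1)  # (batch, 1)
        # Negative sign: minimizing this loss maximizes the expected reward.
        return -(advantage * log_probs).sum(dim=1).mean()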

Presentations

    • TMLS 2019, Toronto, Ontario - Temporal Concept Localization on YouTube-8M [Video]
    • ICCV 2019, Seoul, South Korea - YouTube-8M 1st place challenge presentation
    • CVPR 2019, Long Beach, CA - Semi-supervised EGT for landmark retrieval [Slides]
    • Review of GANs for Sequences of Discrete Elements with Gumbel-Softmax Distribution [Slides]

Resume

You can find my full resume here.