Satya Krishna Gorti

MSc. in Applied Computing

satyag [at] cs [dot] toronto [dot] edu

Brief Bio

I graduated with an MSc. in Applied Computing from the University of Toronto. My main interests lie in Machine Learning, Deep Learning and Computer Vision. I am currently a Sr. Machine Learning Research Scientist at Layer6 AI, where I work on large-scale image retrieval and video understanding. Previously, I was a Research Intern at Uber ATG, working on multi-object tracking using LIDAR and RADAR sensors for self-driving vehicles.

Courses taken at University of Toronto

Research

  • XPool: Cross-Modal Language-Video Attention for Text-Video Retrieval
    We propose XPool, a cross-modal attention model that reasons between a text and the frames of a video. Our core mechanism is scaled dot-product attention that lets a text attend to its most semantically similar frames. We then generate an aggregated video representation conditioned on the text’s attention weights over the frames. We evaluate our method on three benchmark datasets, MSRVTT, MSVD and LSMDC, achieving new state-of-the-art results with up to 8% relative improvement in Recall@1.
    CVPR 2022 - New Orleans, LA
    [Paper][Code]
  • Weakly Supervised Action Selection Learning in Video
    We propose Action Selection Learning (ASL), an approach to temporally localize actions in untrimmed videos using video-level class labels as weak supervision. Empirically, we show that ASL outperforms leading baselines on two popular benchmarks, THUMOS-14 and ActivityNet-1.2, with 12.3% and 5.7% relative improvement respectively.
    CVPR 2021 - Nashville, TN
    [Paper]
  • Cross-Class Relevance Learning for Information Fusion in Temporal Concept Localization
    We present a framework for temporal concept localization and hold state-of-the-art results on the YouTube-8M dataset.
    ICCV 2019 - The 3rd Workshop on YouTube-8M Large-Scale Video Understanding - Seoul, South Korea
    [Paper][Workshop]
  • Guided Similarity Separation for Image Retrieval
    We propose a graph convolutional network to directly encode neighbour information into image descriptors for image retrieval. We further leverage ideas from clustering and manifold learning, and introduce an unsupervised loss based on pairwise separation of image similarities.
    NeurIPS 2019 - Vancouver, BC
    [Paper]
  • Semi-Supervised Traversal in Image Retrieval
    A novel semi-supervised graph traversal extension to Explore-Exploit Graph Traversal (EGT) for image retrieval.
    CVPR 2019 - Landmark Recognition Workshop - Long Beach, CA
    [Paper][Workshop]
  • Online algorithm for adaptive learning rate
    An online algorithm for learning the learning rate in stochastic gradient descent using first-order and second-order approximation methods, with a study of its effects on convex and non-convex machine learning problems.
    [arXiv][GitHub]
  • Text-to-Image-to-Text translation using cycle consistent adversarial networks
    Improving text to image synthesis using cycle consistency.
    [arXiv][GitHub]
    Ground Truth Caption | Generated Image | Generated Caption
    the flower has long yellow petals that are thin and a yellow stamen | [image] | this flower has petals that are yellow and very thin
    there are many long and narrow floppy pink petals surrounding many red stamen and a green stigma on this flower | [image] | this flower has petals that are red with pointed tips
  • ReGAN: RE[LAX|BAR|INFORCE] based Sequence Generation using GANs
    A comparative study of gradient estimators for sequence generation using GANs.
    [arXiv][GitHub]
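
The text-conditioned pooling described in the XPool entry above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the text and frame embeddings already live in a shared space, omits any learned projections, and the function name `text_conditioned_pooling` is hypothetical.

```python
import math


def text_conditioned_pooling(text_emb, frame_embs):
    """Pool frame embeddings with weights given by scaled dot-product attention.

    text_emb:   list of d floats (the query; assumed already embedded).
    frame_embs: list of frames, each a list of d floats (keys and values).
    Returns (pooled, weights): a d-dimensional video vector and one
    attention weight per frame.
    """
    d = len(text_emb)
    # Scaled dot-product score between the text and every frame.
    scores = [
        sum(t * f for t, f in zip(text_emb, frame)) / math.sqrt(d)
        for frame in frame_embs
    ]
    # Numerically stable softmax over frames.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Aggregate the frames using the text's attention weights.
    pooled = [
        sum(w * frame[i] for w, frame in zip(weights, frame_embs))
        for i in range(d)
    ]
    return pooled, weights


# Toy example: three one-hot "frames" and a text aligned with the second one.
frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
text = [0.0, 5.0, 0.0]
pooled, weights = text_conditioned_pooling(text, frames)
```

In this toy setting the second frame receives the largest weight, so the pooled video vector is dominated by the frame most similar to the text.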

Presentations

    • TMLS 2019, Toronto, Ontario - Temporal Concept Localization on YouTube-8M [Video]
    • ICCV 2019, Seoul, South Korea - YouTube-8M challenge 1st place presentation
    • CVPR 2019, Long Beach, CA - Semi-supervised EGT for landmark retrieval [Slides]
    • Review of GANs for Sequences of Discrete Elements with Gumbel-Softmax Distribution [Slides]

Resume

You can find my full resume here.