Rishit Dagli
CS, Math Undergrad at UofT
I am very interested in learning algorithms, computer vision, graphics, learning theory, and math (number theory and geometry).
I am currently on a break from my undergrad, working at NVIDIA at the intersection of AI, vision, and graphics research. After switching to research, I interned at Qualcomm AI Research in 2024 with Roland Memisevic (on VLMs) and Guillaume Berger, and at Civo in 2023 with Josh Mesout (on improving inference performance of multimodal models).
In a past life, I worked on software engineering and building robot hardware. I used to contribute extensively to and maintain some popular open-source projects, which can be found on my GitHub and software.
I am looking for a PhD position starting Fall 2026. Please reach out if you can help in any way.
news
| Date | Update |
|---|---|
| Oct 3, 2024 | We released a new 7B VLM and large-scale dataset for video understanding. arXiv. Dataset. (code release soon, in the hands of corporate overlords) |
| Jun 18, 2023 | We released the first vision (images and video)-spatial audio model as a step towards complete generation. arXiv. Code and Web Demo. |
selected publications
- Can Vision-Language Models Answer Face to Face Questions in the Real-World? arXiv 2025 (* joint first authors)
- SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound. SIGGRAPH Posters 2025