Noël Vouitsis

About

I received both my Master of Science (MSc) in Applied Computing and my Bachelor of Applied Science (BASc) in Computer Engineering from the University of Toronto, with a focus on machine learning. I am currently a Machine Learning Research Scientist at Layer 6 AI, where my research centers on multimodal representation learning and generative modeling. My broader interests include deep learning, representation learning, multimodal learning, computer vision, and generative modeling.

Publications

Refereed

(*) denotes equal contribution

  • Conformal Prediction Sets Improve Human Decision Making
    In ICLR 2024 Bridging the Gap Between Practice and Theory in Deep Learning Workshop
    Jesse C. Cresswell, Yi Sui, Bhargava Kumar, Noël Vouitsis

    We study the usefulness of conformal prediction sets as an aid for human decision making by conducting a pre-registered randomized controlled trial in which human subjects are provided with conformal prediction sets. With statistical significance, we find that when humans are given conformal prediction sets, their accuracy on tasks improves compared to fixed-size prediction sets with the same coverage guarantee. The results show that quantifying model uncertainty with conformal prediction is helpful for human-in-the-loop decision making and human-AI teams.
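
    The aid being evaluated is built with split conformal prediction. As a rough sketch of that general recipe (using the common "1 - softmax of the true class" score; not necessarily the exact setup from the study), prediction sets with a coverage guarantee can be computed from a held-out calibration set:

        import numpy as np

        def conformal_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
            # Nonconformity score on the calibration set: 1 - softmax(true class).
            n = len(cal_labels)
            scores = 1.0 - cal_probs[np.arange(n), cal_labels]
            # Finite-sample-corrected quantile of the calibration scores.
            q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
            qhat = np.quantile(scores, q_level, method="higher")
            # Include every class whose score clears the threshold; the sets
            # contain the true label with probability at least 1 - alpha.
            return test_probs >= 1.0 - qhat  # (num_test, num_classes) boolean mask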

  • Data-Efficient Multimodal Fusion on a Single GPU
    In CVPR 2024 (Highlight, top 3%)
    Noël Vouitsis*, Zhaoyan Liu*, Satya Krishna Gorti*, Valentin Villecroze, Jesse C. Cresswell, Guangwei Yu, Gabriel Loaiza-Ganem, Maksims Volkovs

    We propose FuseMix, a multimodal augmentation scheme that operates on the latent spaces of arbitrary pre-trained unimodal encoders. Using FuseMix for multimodal alignment, we achieve competitive performance – and in certain cases outperform state-of-the-art methods – in both image-text and audio-text retrieval, with orders of magnitude less compute and data: for example, we outperform CLIP on the Flickr30K text-to-image retrieval task with ∼600× fewer GPU days and ∼80× fewer image-text pairs. Additionally, we show how our method can be applied to convert pre-trained text-to-image generative models into audio-to-image ones.
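
    The core augmentation is mixup applied to pre-computed latents rather than to raw inputs. A minimal PyTorch sketch (the Beta hyperparameter and tensor shapes are assumptions; the learned fusion adapters and contrastive objective are omitted):

        import torch

        def fusemix(z_a, z_b, alpha=1.0):
            # z_a, z_b: aligned batches of latents from two frozen unimodal
            # encoders (e.g. image and text), each of shape (batch, dim).
            lam = torch.distributions.Beta(alpha, alpha).sample()
            perm = torch.randperm(z_a.size(0))
            # A shared mixing coefficient and permutation keep the mixed
            # pairs semantically aligned across modalities.
            z_a_mix = lam * z_a + (1 - lam) * z_a[perm]
            z_b_mix = lam * z_b + (1 - lam) * z_b[perm]
            return z_a_mix, z_b_mix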

  • TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation
    In ICML 2023
    Zhaoyan Liu*, Noël Vouitsis*, Satya Krishna Gorti, Jimmy Ba, Gabriel Loaiza-Ganem

    TR0N is a highly general framework for adding any type of conditioning (e.g. classes, free-form text, images) to pre-trained unconditional generative models (e.g. GANs, VAEs). TR0N is simple, efficient, and zero-shot: it requires no provided dataset to train. We show impressive quantitative and qualitative results across tasks, and TR0N is highly competitive with DALL·E 2 in terms of FID on MS-COCO.
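
    In essence, the translator learns to map condition embeddings back to generator latents using only synthetic pairs. A simplified sketch of that zero-shot training loop (in the full method the translator outputs a distribution over latents and sampling is refined at inference time; here it regresses a point estimate):

        import torch

        def train_translator(G, E, T, optimizer, steps, batch_size, z_dim, device):
            # G: frozen pre-trained generator; E: frozen condition encoder
            # (e.g. CLIP); T: trainable translator from conditions to latents.
            for _ in range(steps):
                z = torch.randn(batch_size, z_dim, device=device)
                with torch.no_grad():
                    c = E(G(z))  # condition of a generated sample: a free (c, z) pair
                loss = (T(c) - z).pow(2).mean()
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()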

  • X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
    In CVPR 2022
    Satya Krishna Gorti*, Noël Vouitsis*, Junwei Ma*, Keyvan Golestan, Maksims Volkovs, Animesh Garg, Guangwei Yu

    X-Pool is a cross-modal attention model that reasons between a text and the frames of a video. Our core mechanism is a scaled dot-product attention that allows a text to attend to its most semantically similar frames. We then generate an aggregated video representation conditioned on the text’s attention weights over the frames. We evaluate our method on three benchmark datasets, MSR-VTT, MSVD and LSMDC, achieving new state-of-the-art results with up to a 12% relative improvement in Recall@1.
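
    A minimal sketch of that attention step in PyTorch (learned projections, temperature, and the surrounding model are omitted; shapes are assumptions):

        import torch

        def text_conditioned_pool(text_emb, frame_embs):
            # text_emb: (batch, dim) text embeddings;
            # frame_embs: (batch, frames, dim) per-frame video embeddings.
            q = text_emb.unsqueeze(1)                # the text acts as the query
            scores = q @ frame_embs.transpose(1, 2)  # (batch, 1, frames)
            weights = (scores / frame_embs.size(-1) ** 0.5).softmax(dim=-1)
            # Aggregate frames weighted by their semantic relevance to the text.
            return (weights @ frame_embs).squeeze(1)  # (batch, dim)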