Visual Learning using Synthetic Data

Abstract

Deep learning has revolutionized several fields, including computer vision. Yet, creating computer vision datasets to train deep neural networks with supervised learning is time-consuming and expensive, typically requiring the collection and annotation of large amounts of sensor data. This process becomes even more complex and costly when the task involves risky events, demands highly precise annotation, or requires expert knowledge. An alternative is to train deep neural networks on synthetic data generated with graphics simulators. With synthetic data, we can obtain a virtually unlimited number of perfectly labeled training samples, including long-tail rare events, safety-critical scenarios, and cases that humans would only annotate noisily. However, models trained on synthetic data frequently generalize poorly to complex real-world scenes. This thesis presents a collection of work toward overcoming this issue. Our core idea is to interpret learning from simulator-generated data as a domain adaptation problem and to develop novel adaptation techniques that improve transfer performance on real-world datasets. The proposed algorithms align synthetic and real-world data distributions with a novel adversarial framework that minimizes a chosen f-divergence and is both practical and theoretically motivated. We also show how to handle complications that arise in this framework, such as complex optimization dynamics and label-distribution mismatch. We demonstrate the effectiveness of our methods both in controlled settings using standard datasets and in more practical scenarios using an open-source self-driving simulator with multi-sensor data. Finally, we conclude the thesis by summarizing its key contributions and discussing directions for future research.
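As a brief sketch of the kind of objective the abstract refers to, adversarial f-divergence minimization is commonly built on the variational (Fenchel-conjugate) lower bound used in f-GAN-style training. The notation below is illustrative rather than taken from the thesis: g_\theta denotes a feature extractor, T a variational (critic) function, f a convex generator function with conjugate f^*, and P_syn and P_real the synthetic and real feature distributions. The thesis's exact objective and regularization may differ; this is only the generic form of f-divergence-based adversarial alignment.

\[
D_f(P \,\|\, Q) \;\ge\; \sup_{T} \Big( \mathbb{E}_{x \sim P}\big[T(x)\big] \;-\; \mathbb{E}_{x \sim Q}\big[f^{*}\big(T(x)\big)\big] \Big)
\]
\[
\min_{\theta} \; \max_{T} \;\; \mathbb{E}_{x \sim P_{\mathrm{syn}}}\Big[T\big(g_\theta(x)\big)\Big] \;-\; \mathbb{E}_{x \sim P_{\mathrm{real}}}\Big[f^{*}\Big(T\big(g_\theta(x)\big)\Big)\Big]
\]

In words: the inner maximization over T estimates the chosen f-divergence between the two feature distributions, and the outer minimization over \theta updates the feature extractor to shrink that estimate, aligning synthetic and real features.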

Type
Publication
PhD Thesis