Sim-to-Real Generalization of Computer Vision with Domain Adaptation, Style Randomization, and Multi-Task Learning
Date
2020
Publisher
Tartu Ülikool
Abstract
In recent years, supervised deep learning has been very successful in computer vision
applications. This success, however, comes at the cost of the large amounts of labeled data
required to train artificial neural networks, and manual labeling can be very expensive.
Semantic segmentation, the task of pixel-wise classification of images, requires
painstaking pixel-level annotation. This particular difficulty of manual labeling for
semantic segmentation motivates research into alternatives.
One solution is to use simulations, which can generate semantic segmentation ground
truth automatically. Unfortunately, in practice, simulation-trained models have been
shown to generalize poorly to the real world.
This work uses a simulation environment to train models for semantic segmentation
and real-world environments to evaluate their generalization. Three approaches
to improving generalization from simulation to reality are studied.
The first uses a generative image-to-image model to make the simulation look realistic.
The second uses style randomization, a form of data augmentation based on style transfer,
to make the model more robust to changes in visual style. The third uses depth estimation
as an auxiliary task to encourage learning of scene geometry.
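To make the second approach concrete, style randomization can be approximated by swapping per-channel feature statistics between a content image and a randomly chosen style image, as in adaptive instance normalization. The following is a minimal NumPy sketch, not code from the thesis; the function name and the statistic-swap formulation are illustrative assumptions.

```python
import numpy as np

def style_randomize(content, style, eps=1e-5):
    """Illustrative AdaIN-style statistic swap (not the thesis implementation).

    Normalizes `content` per channel, then rescales and shifts it with the
    per-channel std/mean of `style`. Arrays are channels-last (H, W, C).
    """
    c_mean = content.mean(axis=(0, 1), keepdims=True)
    c_std = content.std(axis=(0, 1), keepdims=True) + eps  # eps avoids div by zero
    s_mean = style.mean(axis=(0, 1), keepdims=True)
    s_std = style.std(axis=(0, 1), keepdims=True)
    # Output carries the content structure but the style's color statistics
    return (content - c_mean) / c_std * s_std + s_mean
```

During training, pairing each simulated image with a random style source in this way exposes the model to many visual styles while leaving the segmentation labels unchanged.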
Our results show that the first method, image-to-image translation, improves performance
in environments similar to the simulation. With style randomization, the
trained models generalized better to completely new environments. The auxiliary depth
estimation task did not improve performance on its own, yielding only a small gain when
combined with style randomization.
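The third approach trains one network on two objectives at once: cross-entropy for segmentation and a regression loss for the auxiliary depth output. A minimal NumPy sketch of such a combined loss is shown below; the function name, the L1 depth term, and the fixed weighting are assumptions for illustration, not the thesis's exact formulation.

```python
import numpy as np

def multitask_loss(seg_logits, seg_labels, depth_pred, depth_gt, depth_weight=0.5):
    """Illustrative multi-task loss: segmentation cross-entropy plus a
    weighted L1 depth term. Shapes: seg_logits (H, W, C), seg_labels (H, W)
    integer class ids, depth_pred/depth_gt (H, W)."""
    # Numerically stable softmax cross-entropy over the class axis
    shifted = seg_logits - seg_logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    ce = -np.take_along_axis(log_probs, seg_labels[..., None], axis=-1).mean()
    # L1 regression loss for the auxiliary depth task
    l1 = np.abs(depth_pred - depth_gt).mean()
    return ce + depth_weight * l1
```

Because both heads share the same backbone, gradients from the depth term can push the shared features toward encoding scene geometry, which is the motivation given for the auxiliary task.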
Keywords
Computer Vision, Machine Learning, Deep Learning, Domain Adaptation, Data Augmentation, Multi-Task Learning, Convolutional Neural Networks