Synthetic data is increasingly used to train deep neural networks in computer vision. Different strategies, such as domain randomization and domain adaptation, exist to bridge the gap between synthetic training data and the real application. Despite recent progress and growing knowledge in this area, the following question remains: how much adjustment to reality is required, and what degree of randomization is useful, for transferring precise object detectors to real use cases? In this paper, we present a detailed study with more than 100 datasets and 2,700 trained convolutional neural networks (CNNs), comparing the influence of different degrees of manual optimization (scene engineering) and domain randomization techniques. To distinguish precision from robustness, the trained object detectors are evaluated on different domain shifts with respect to scene environment and object appearance. Using the example of robot-based industrial item picking, we show that scene context and structure, as well as realistic textures, are crucial for simulation-to-reality transfer. Combining these with well-chosen randomization parameters, especially lighting and distractor objects, improves the robustness of the CNNs under larger domain shifts.
