Abstract
Synthetic data is increasingly used to train deep neural networks in computer vision. Different strategies, such as domain randomization and domain adaptation, exist to bridge the gap between synthetic training data and the real application. Despite recent progress in this area, a central question remains: how much adjustment to reality is required, and which degree of randomization is useful, for transferring precise object detectors to real use cases? In this paper, we present a detailed study with more than 100 datasets and 2,700 trained convolutional neural networks (CNNs), comparing the influence of different degrees of manual optimization (scene engineering) and domain randomization techniques. To distinguish precision from robustness, the trained object detectors are evaluated under different domain shifts with respect to scene environment and object appearance. Using the example of robot-based industrial item picking, we show that scene context and structure, as well as realistic textures, are crucial for simulation-to-reality transfer. Combining these with well-chosen randomization parameters, especially lighting and distractor objects, improves the robustness of the CNNs under larger domain shifts.
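To make the idea of "well-chosen randomization parameters" concrete, the following is a minimal, hypothetical sketch (not the authors' pipeline) of how per-image domain randomization parameters might be sampled for a synthetic picking scene. The parameter names, ranges, and the `sample_randomization` helper are illustrative assumptions; the paper identifies lighting and distractor objects as especially influential, so those appear alongside texture choice and camera jitter.

```python
import random

def sample_randomization(rng, max_distractors=8):
    """Draw one illustrative set of scene randomization parameters
    for rendering a single synthetic training image."""
    return {
        # Lighting: intensity and color temperature varied per image.
        "light_intensity": rng.uniform(0.3, 3.0),
        "light_color_temp_k": rng.uniform(3000, 8000),
        # Distractors: number of task-irrelevant objects added to the bin.
        "num_distractors": rng.randint(0, max_distractors),
        # Object appearance: realistic scanned texture vs. random flat color.
        "texture": rng.choice(["realistic_scan", "random_color"]),
        # Small camera pose jitter around the nominal picking viewpoint.
        "camera_jitter_deg": rng.uniform(-5.0, 5.0),
    }

# Sample parameters for a batch of 100 synthetic images.
rng = random.Random(42)
params = [sample_randomization(rng) for _ in range(100)]
```

In a real pipeline, each parameter set would be passed to a renderer; the study's "scene engineering" side would instead fix context, structure, and textures to match the target environment rather than randomizing them.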
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Weisenböhler, M., Hein, B., Wurll, C. (2023). On Scene Engineering and Domain Randomization: Synthetic Data for Industrial Item Picking. In: Petrovic, I., Menegatti, E., Marković, I. (eds) Intelligent Autonomous Systems 17. IAS 2022. Lecture Notes in Networks and Systems, vol 577. Springer, Cham. https://doi.org/10.1007/978-3-031-22216-0_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22215-3
Online ISBN: 978-3-031-22216-0
eBook Packages: Intelligent Technologies and Robotics