DOI: 10.1145/3460319.3464802
research-article
Open access

Automatic test suite generation for key-points detection DNNs using many-objective search (experience paper)

Published: 11 July 2021

Abstract

Automatically detecting the positions of key-points (e.g., facial key-points or finger key-points) in an image is an essential problem in many applications, such as driver's gaze detection and drowsiness detection in automated driving systems. With the recent advances of Deep Neural Networks (DNNs), Key-Points detection DNNs (KP-DNNs) have been increasingly employed for that purpose. Nevertheless, KP-DNN testing and validation remain challenging because KP-DNNs predict many independent key-points at the same time, each of which may be critical in the targeted application, and because images can vary a great deal according to many factors.
In this paper, we present an approach to automatically generate test data for KP-DNNs using many-objective search. In our experiments, focused on facial key-points detection DNNs developed for an industrial automotive application, we show that our approach can generate test suites that cause the DNN to severely mispredict, on average, more than 93% of all key-points. In comparison, test data generation based on random search severely mispredicts only 41% of them. Many of these mispredictions, however, are not avoidable and should therefore not be considered failures. We also empirically compare state-of-the-art many-objective search algorithms and their variants, tailored for test suite generation. Furthermore, we investigate and demonstrate how to learn specific conditions, based on image characteristics (e.g., head posture and skin color), that lead to severe mispredictions. Such conditions serve as a basis for risk analysis or DNN retraining.
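To make the many-objective framing concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes that each key-point is treated as its own objective, that the objective value is the Euclidean distance between the predicted and actual position, and that a key-point counts as "severely mispredicted" once that distance reaches a chosen threshold. The function names, the archive layout, and the threshold are all hypothetical and introduced only for illustration.

```python
import math


def keypoint_errors(predicted, actual):
    """Return one error per key-point: the Euclidean distance between
    the predicted and the actual (x, y) position (assumed metric)."""
    return [math.dist(p, a) for p, a in zip(predicted, actual)]


def update_archive(archive, candidate, errors):
    """Keep, for every key-point, the candidate test image that has
    produced the largest error so far.

    archive is a list with one slot per key-point; each slot is either
    None or a (candidate, error) pair. The list is updated in place.
    """
    for k, err in enumerate(errors):
        if archive[k] is None or err > archive[k][1]:
            archive[k] = (candidate, err)
    return archive


def severe_ratio(archive, threshold):
    """Fraction of key-points whose best archived test reaches the
    (hypothetical) severity threshold."""
    hits = sum(1 for slot in archive if slot is not None and slot[1] >= threshold)
    return hits / len(archive)
```

In this reading, the test suite is the set of distinct images left in the archive after the search, and a figure such as "93% of all key-points" corresponds to `severe_ratio` evaluated on the final archive.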

      Published In

      ISSTA 2021: Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis
      July 2021
      685 pages
      ISBN:9781450384599
      DOI:10.1145/3460319
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Key-point detection
      2. deep neural network
      3. many-objective search algorithm
      4. software testing

      Conference

      ISSTA '21

      Acceptance Rates

      Overall Acceptance Rate 58 of 213 submissions, 27%

      Cited By

      • (2024) Supporting Safety Analysis of Image-processing DNNs through Clustering-based Approaches. ACM Transactions on Software Engineering and Methodology 33:5 (1-48). DOI: 10.1145/3643671. Online publication date: 3-Jun-2024
      • (2024) ChatGPT vs SBST: A Comparative Assessment of Unit Test Suite Generation. IEEE Transactions on Software Engineering 50:6 (1340-1359). DOI: 10.1109/TSE.2024.3382365. Online publication date: 29-Mar-2024
      • (2024) Coverage Goal Selector for Combining Multiple Criteria in Search-Based Unit Test Generation. IEEE Transactions on Software Engineering 50:4 (854-883). DOI: 10.1109/TSE.2024.3366613. Online publication date: 16-Feb-2024
      • (2023) Software Test Case Generation Tools and Techniques: A Review. International Journal of Mathematical, Engineering and Management Sciences 8:2 (293-315). DOI: 10.33889/IJMEMS.2023.8.2.018. Online publication date: 1-Apr-2023
      • (2023) Input Distribution Coverage: Measuring Feature Interaction Adequacy in Neural Network Testing. ACM Transactions on Software Engineering and Methodology 32:3 (1-48). DOI: 10.1145/3576040. Online publication date: 26-Apr-2023
      • (2023) Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-based Safety-critical Systems. ACM Transactions on Software Engineering and Methodology 32:4 (1-47). DOI: 10.1145/3569935. Online publication date: 27-May-2023
      • (2023) Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction and Clustering. ACM Transactions on Software Engineering and Methodology 32:3 (1-40). DOI: 10.1145/3550271. Online publication date: 26-Apr-2023
      • (2023) A Search-Based Testing Approach for Deep Reinforcement Learning Agents. IEEE Transactions on Software Engineering 49:7 (3715-3735). DOI: 10.1109/TSE.2023.3269804. Online publication date: 1-Jul-2023
      • (2023) Ergo, SMIRK is safe: a safety case for a machine learning component in a pedestrian automatic emergency brake system. Software Quality Journal 31:2 (335-403). DOI: 10.1007/s11219-022-09613-1. Online publication date: 1-Mar-2023
      • (2023) Machine learning testing in an ADAS case study using simulation‐integrated bio‐inspired search‐based testing. Journal of Software: Evolution and Process 36:5. DOI: 10.1002/smr.2591. Online publication date: 21-Jun-2023
