DOI: 10.1109/ASE.2019.00078

A study of oracle approximations in testing deep learning libraries

Published: 07 February 2020

Abstract

Due to the increasing popularity of deep learning (DL) applications, testing DL libraries is becoming increasingly important. Unlike testing general software, where output is often asserted exactly (e.g., an output is compared with an oracle for equality), testing DL libraries often requires oracle approximations, i.e., the output is allowed to fall within a restricted range of the oracle. However, oracle approximation practices have not been studied in prior empirical work, which focuses on traditional testing practices. The prevalence, common practices, and maintenance and evolution challenges of oracle approximations remain unknown in the literature.
In this work, we study oracle approximation assertions implemented to test four popular DL libraries. Our study shows that a non-negligible portion of the assertions used to test DL libraries leverage oracle approximation. Through a comprehensive manual study, we also identify the common sources of the oracles on which oracle approximations are performed. Moreover, we find that developers frequently modify code related to oracle approximations, i.e., by using a different approximation API, modifying the oracle or the output from the code under test, or using a different approximation threshold. Last, we perform an in-depth study to understand the reasons behind the evolution of oracle approximation assertions. Our findings reveal important maintenance challenges that developers may face when maintaining oracle approximation practices as code evolves in DL libraries.
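
As an illustration only, and not code from the studied libraries, the difference between an exact assertion and an oracle approximation can be sketched in Python. The function `add` is a hypothetical stand-in for code under test, and the tolerance values are assumptions chosen for the example. Test suites for numerical code often rely on helpers such as `numpy.testing.assert_allclose`; this sketch uses the standard library's `math.isclose` so that it stays self-contained.

```python
import math


def add(a, b):
    """Hypothetical code under test: floating-point addition."""
    return a + b


output = add(0.1, 0.2)
oracle = 0.3  # expected value worked out by hand

# Exact assertion (traditional style): the output must equal the oracle.
# Floating-point rounding makes this comparison fail here.
exact_match = (output == oracle)

# Oracle approximation: the output may fall within a restricted range
# of the oracle, controlled by a threshold (here a relative tolerance).
approx_match = math.isclose(output, oracle, rel_tol=1e-6)

# Changing the threshold changes the verdict: a tolerance below the
# rounding error rejects the same output.
too_tight = math.isclose(output, oracle, rel_tol=1e-17)

print(exact_match, approx_match, too_tight)
```

The last comparison suggests why the choice of threshold may become a maintenance concern: the same output passes or fails depending only on the tolerance the developer picked.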

Published In

ASE '19: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering
November 2019
1333 pages
ISBN: 9781728125084

In-Cooperation

  • IEEE CS

Publisher

IEEE Press

Author Tags

  1. software quality assurance
  2. software testing
  3. test oracle
  4. testing deep learning libraries

Qualifiers

  • Research-article

Conference

ASE '19
Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Cited By

  • (2024) Fairness Testing: A Comprehensive Survey and Analysis of Trends. ACM Transactions on Software Engineering and Methodology 33:5 (1-59). DOI: 10.1145/3652155. Online publication date: 4-Jun-2024
  • (2024) Flaky Tests in the AI Domain. Proceedings of the 1st International Workshop on Flaky Tests (20-21). DOI: 10.1145/3643656.3643897. Online publication date: 14-Apr-2024
  • (2024) Security for Machine Learning-based Software Systems: A Survey of Threats, Practices, and Challenges. ACM Computing Surveys 56:6 (1-38). DOI: 10.1145/3638531. Online publication date: 23-Feb-2024
  • (2024) Beyond Accuracy: An Empirical Study on Unit Testing in Open-source Deep Learning Projects. ACM Transactions on Software Engineering and Methodology 33:4 (1-22). DOI: 10.1145/3638245. Online publication date: 18-Apr-2024
  • (2024) A Post-training Framework for Improving the Performance of Deep Learning Models via Model Transformation. ACM Transactions on Software Engineering and Methodology 33:3 (1-41). DOI: 10.1145/3630011. Online publication date: 15-Mar-2024
  • (2023) Virtual Reality (VR) Automated Testing in the Wild: A Case Study on Unity-Based VR Applications. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (1269-1281). DOI: 10.1145/3597926.3598134. Online publication date: 12-Jul-2023
  • (2023) ACETest: Automated Constraint Extraction for Testing Deep Learning Operators. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (690-702). DOI: 10.1145/3597926.3598088. Online publication date: 12-Jul-2023
  • (2023) Toward Understanding Deep Learning Framework Bugs. ACM Transactions on Software Engineering and Methodology 32:6 (1-31). DOI: 10.1145/3587155. Online publication date: 29-Sep-2023
  • (2023) COMET: Coverage-guided Model Generation For Deep Learning Library Testing. ACM Transactions on Software Engineering and Methodology 32:5 (1-34). DOI: 10.1145/3583566. Online publication date: 21-Jul-2023
  • (2023) An annotation-based approach for finding bugs in neural network programs. Journal of Systems and Software 201:C. DOI: 10.1016/j.jss.2023.111669. Online publication date: 1-Jul-2023
