research-article

Open access

DeepSample: DNN sampling-based testing for operational accuracy assessment

Authors: Antonio Guerriero, Roberto Pietrantuono, Stefano Russo

ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering

Article No.: 120, Pages 1 - 12

https://doi.org/10.1145/3597503.3639584

Published: 12 April 2024

Abstract

Deep Neural Networks (DNNs) are core components of many software systems, where they perform classification and regression tasks. Companies incur high costs to test DNNs with datasets representative of the inputs expected in operation, because these inputs must be manually labelled. The challenge is to select a representative set of test inputs that is as small as possible, to reduce the labelling cost, yet sufficient to yield unbiased, high-confidence estimates of the expected DNN accuracy. At the same time, testers want to expose as many DNN mispredictions as possible in order to improve the DNN. Techniques are therefore needed that pursue a threefold aim: small dataset size, trustworthy estimates, and misprediction exposure.
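To make the estimation side of this trade-off concrete, the simplest probabilistic design is simple random sampling without replacement. Label n of the N operational inputs chosen uniformly at random and use the fraction of correct predictions as the accuracy estimate. The sketch below is a textbook baseline, not one of the DeepSample techniques. The function name `srs_accuracy_estimate`, the boolean `correct` list (which stands in for the manual labels and is available here only for simulation), and the normal-approximation 95% interval are assumptions made for illustration.

```python
import math
import random


def srs_accuracy_estimate(correct, n, z=1.96, seed=None):
    """Accuracy estimate from a simple random sample without replacement.

    correct: one boolean per operational input, True if the DNN prediction
             is right. In practice only the n sampled inputs would be
             labelled; the full list stands in for the oracle in simulation.
    n:       labelling budget (number of test inputs to select).
    z:       normal quantile, 1.96 for an approximate 95% interval.
    """
    N = len(correct)
    if not 1 < n <= N:
        raise ValueError("need 1 < n <= N")
    rng = random.Random(seed)
    sample = rng.sample(range(N), n)               # uniform, without replacement
    p_hat = sum(correct[i] for i in sample) / n    # unbiased for the true accuracy
    # Estimated variance of p_hat, including the finite population correction
    var = (N - n) / N * p_hat * (1 - p_hat) / (n - 1)
    half = z * math.sqrt(var)
    return p_hat, (max(0.0, p_hat - half), min(1.0, p_hat + half))


# Hypothetical operational dataset: 10,000 inputs, 90% classified correctly
population = [True] * 9000 + [False] * 1000
estimate, (low, high) = srs_accuracy_estimate(population, n=200, seed=42)
print(f"accuracy ~ {estimate:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

While n is small relative to N, the interval half-width shrinks roughly in proportion to 1/√n, so each gain in confidence costs more labels. Uniform sampling also exposes mispredictions only at their base rate: for a DNN that is 90% accurate, about one labelled input in ten is expected to be a misprediction.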

This study presents DeepSample, a family of DNN testing techniques for cost-effective accuracy assessment based on probabilistic sampling. We investigate whether, to what extent, and under which conditions probabilistic sampling can help tackle the outlined challenge. We implement five new sampling-based testing techniques and compare them comprehensively with three further state-of-the-art techniques on both DNN classification and regression tasks. The results provide guidance on how best to use sampling-based testing to obtain faithful, high-confidence estimates of DNN accuracy in operation at low cost.
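One way probabilistic sampling can serve both estimation and misprediction exposure is unequal-probability sampling. Inputs are drawn with probability proportional to an auxiliary score that flags likely mispredictions, and each draw is reweighted by its inverse selection probability so that the estimate stays unbiased. The sketch below uses the classical Hansen-Hurwitz estimator for sampling with replacement. It illustrates the general idea and is not claimed to be any of the five DeepSample techniques. The score (for example, one minus the top softmax probability), the 10% uniform mixing weight `mix` that keeps every selection probability positive, and the function name `hh_accuracy_estimate` are hypothetical choices.

```python
import math
import random


def hh_accuracy_estimate(correct, scores, n, mix=0.1, z=1.96, seed=None):
    """Hansen-Hurwitz accuracy estimate under unequal-probability sampling
    with replacement.

    correct: one boolean per operational input (True = correct prediction);
             only drawn inputs would be labelled in practice.
    scores:  non-negative auxiliary scores, higher = misprediction more
             likely (hypothetical, e.g. 1 - top softmax probability).
    n:       number of draws (labelling budget).
    mix:     weight of a uniform component that keeps every p_i > 0.
    Returns (accuracy estimate, normal-approximation CI, distinct
    mispredictions exposed). The interval is not clipped to [0, 1].
    """
    N = len(correct)
    if len(scores) != N or n < 2 or not 0 < mix <= 1:
        raise ValueError("invalid arguments")
    total = sum(scores)
    if total > 0:
        probs = [(1 - mix) * s / total + mix / N for s in scores]
    else:
        probs = [1.0 / N] * N
    rng = random.Random(seed)
    draws = rng.choices(range(N), weights=probs, k=n)  # with replacement
    # Each y_i / p_i is an unbiased estimate of the total number of mispredictions
    ratios = [(0.0 if correct[i] else 1.0) / probs[i] for i in draws]
    fail_total = sum(ratios) / n
    var_total = sum((r - fail_total) ** 2 for r in ratios) / (n * (n - 1))
    accuracy = 1.0 - fail_total / N
    half = z * math.sqrt(var_total) / N
    exposed = len({i for i in draws if not correct[i]})
    return accuracy, (accuracy - half, accuracy + half), exposed


# Hypothetical data: 1,000 inputs, 100 mispredictions that the score flags well
population = [True] * 900 + [False] * 100
score = [0.1] * 900 + [0.8] * 100
acc, (low, high), found = hh_accuracy_estimate(population, score, n=50, seed=7)
print(f"accuracy ~ {acc:.3f}, CI [{low:.3f}, {high:.3f}], mispredictions exposed: {found}")
```

With a score that ranks mispredictions well, a large share of the draws land on failing inputs while the weighted estimate still targets the true accuracy. With a poor score, the inverse weights can inflate the variance, which is one reason the conditions under which sampling helps have to be assessed empirically.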


Index Terms

Software and its engineering → Software creation and management → Software verification and validation → Software defect analysis → Software testing and debugging


Published In

ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering

May 2024

2942 pages

ISBN:9798400702174

DOI:10.1145/3597503

Co-chairs: Ana Paiva, Rui Abreu

Program Co-chairs: Abhik Roychoudhury, Margaret Storey

This work is licensed under a Creative Commons Attribution International 4.0 License.

In-Cooperation: Faculty of Engineering of University of Porto

Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

Research Executive Agency

Conference

ICSE '24

Sponsor:

SIGSOFT

ICSE '24: IEEE/ACM 46th International Conference on Software Engineering

April 14 - 20, 2024

Lisbon, Portugal

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%



Bibliometrics

Total citations: 0
Total downloads: 228
Downloads (last 12 months): 228
Downloads (last 6 weeks): 33

Reflects downloads up to 20 Jan 2025.
