DOI: 10.1145/3597503.3639584
Research article, open access

DeepSample: DNN sampling-based testing for operational accuracy assessment

Published: 12 April 2024

Abstract

Deep Neural Networks (DNNs) are core components for classification and regression tasks in many software systems. Companies incur high costs for testing DNNs with datasets representative of the inputs expected in operation, as these need to be manually labelled. The challenge is to select a representative set of test inputs that is as small as possible, to reduce the labelling cost, yet sufficient to yield unbiased, high-confidence estimates of the expected DNN accuracy. At the same time, testers are interested in exposing as many DNN mispredictions as possible to improve the DNN. This calls for techniques that pursue a threefold aim: small dataset size, trustworthy estimates, and exposure of mispredictions.
This study presents DeepSample, a family of DNN testing techniques for cost-effective accuracy assessment based on probabilistic sampling. We investigate whether, to what extent, and under which conditions probabilistic sampling can help to tackle the outlined challenge. We implement five new sampling-based testing techniques and compare them comprehensively with three state-of-the-art techniques, for both DNN classification and regression tasks. The results serve as guidance on how best to use sampling-based testing to obtain faithful, high-confidence estimates of DNN accuracy in operation at low cost.
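
To make the idea of probabilistic sampling for accuracy assessment concrete, the following is a minimal illustrative sketch and not one of the DeepSample techniques. It assumes sampling with replacement with unequal probabilities and the classical Hansen-Hurwitz estimator. The auxiliary variable (prediction confidence), the `floor` constant, the `is_correct` labelling oracle, and the function name are all hypothetical choices made for this example.

```python
import random


def sample_and_estimate(confidences, is_correct, budget, seed=0, floor=0.05):
    """Estimate accuracy from `budget` labelled inputs (illustrative only).

    confidences: one confidence score in [0, 1] per operational input
                 (assumed auxiliary variable, not prescribed by the paper).
    is_correct:  callable index -> bool, standing in for manual labelling.
    Returns (accuracy_estimate, distinct_mispredictions_found).
    """
    rng = random.Random(seed)
    n_pop = len(confidences)

    # Inputs with low confidence get a higher selection probability, so
    # mispredictions are more likely to be drawn. `floor` keeps every
    # probability strictly positive, which the estimator requires.
    weights = [(1.0 - c) + floor for c in confidences]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Draw with replacement, proportionally to the selection probabilities.
    drawn = rng.choices(range(n_pop), weights=probs, k=budget)

    # Hansen-Hurwitz estimator of the population mean: each observation is
    # re-weighted by 1 / (N * p_i) to compensate for unequal probabilities.
    estimate = sum(is_correct(i) / (n_pop * probs[i]) for i in drawn) / budget

    failures = sum(1 for i in set(drawn) if not is_correct(i))
    return estimate, failures
```

The re-weighting is what keeps the estimate unbiased even though the sample deliberately over-represents likely failures. The price is a higher variance for a given budget, which is one face of the trade-off between trustworthy estimates and misprediction exposure described above.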

Published In

ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering
May 2024
2942 pages
ISBN: 9798400702174
DOI: 10.1145/3597503
This work is licensed under a Creative Commons Attribution International 4.0 License.

In-Cooperation

  • Faculty of Engineering of University of Porto

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. software testing
  2. deep neural networks
  3. sampling

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%

Article Metrics

  • Total citations: 0
  • Total downloads: 228
  • Downloads (last 12 months): 228
  • Downloads (last 6 weeks): 33

Reflects downloads up to 20 Jan 2025.
