
A Large-Scale Study of Modern Code Review and Security in Open Source Projects

Published: 08 November 2017

Abstract

Background: Evidence for the relationship between the code review process and software security (and software quality) could help improve code review automation and tools, and could sharpen our understanding of the economics of improving software security and quality. Prior work in this area has largely been limited to case studies of a small handful of software projects.

Aims: We investigate the effect of modern code review on software security, extending and generalizing prior work on code review and software quality.

Method: We gather a very large dataset from GitHub (3,126 projects in 143 languages, with 489,038 issues and 382,771 pull requests) and use a combination of quantification techniques and multiple regression modeling to study how code review coverage and participation relate to software quality and security.

Results: Code review coverage has a significant effect on software security. We confirm prior findings of a relationship between code review coverage and software defects. Most notably, we find evidence of a negative relationship between code review of pull requests and the number of security bugs reported in a project.

Conclusions: Our results suggest that implementing code review policies within the pull request model of development may have a positive effect on the quality and security of software.
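The quantification techniques the abstract mentions estimate the prevalence of a class in a corpus (e.g. the fraction of a project's issues that are security bugs) rather than trying to label each item correctly. One classic technique in this family is adjusted classify-and-count, which corrects a classifier's raw positive rate using its known true- and false-positive rates. The sketch below is illustrative only; it is not necessarily the exact method used in the paper, and the function name is ours:

```python
def adjusted_count(predictions, tpr, fpr):
    """Estimate class prevalence from binary classifier outputs.

    predictions: list of 0/1 labels the classifier assigned.
    tpr, fpr: the classifier's true- and false-positive rates,
    measured beforehand on held-out validation data.
    """
    # Raw classify-and-count: fraction of items labeled positive.
    cc = sum(predictions) / len(predictions)
    if tpr == fpr:
        return cc  # degenerate classifier: no correction possible
    # Correct for classifier bias, then clip to the valid range [0, 1].
    p = (cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))
```

For example, a classifier with tpr = 0.9 and fpr = 0.1 that flags half the issues as security-related yields an adjusted prevalence estimate of 0.5; the correction matters more as the classifier's error rates grow.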



Published In

PROMISE: Proceedings of the 13th International Conference on Predictive Models and Data Analytics in Software Engineering
November 2017
120 pages
ISBN:9781450353052
DOI:10.1145/3127005

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. code review
  2. mining software repositories
  3. multiple regression models
  4. quantification models
  5. software quality
  6. software security

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PROMISE

Acceptance Rates

PROMISE Paper Acceptance Rate: 12 of 25 submissions, 48%
Overall Acceptance Rate: 98 of 213 submissions, 46%


Cited By

  • An Empirical Study of Static Analysis Tools for Secure Code Review. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (2024), 691-703. https://doi.org/10.1145/3650212.3680313
  • Embedded-check a Code Quality Tool for Automatic Firmware Verification. In Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1 (2024), 66-72. https://doi.org/10.1145/3649217.3653577
  • Formal Methods and Validation Techniques for Ensuring Automotive Systems Security. Information 14, 12 (2023), 666. https://doi.org/10.3390/info14120666
  • A Case Study to Evaluate the Introduction of Code Review in a Industrial Project. In Proceedings of the XXII Brazilian Symposium on Software Quality (2023), 188-197. https://doi.org/10.1145/3629479.3629494
  • Modern Code Reviews—Survey of Literature and Practice. ACM Transactions on Software Engineering and Methodology 32, 4 (2023), 1-61. https://doi.org/10.1145/3585004
  • It’s like flossing your teeth: On the Importance and Challenges of Reproducible Builds for Software Supply Chain Security. In 2023 IEEE Symposium on Security and Privacy (SP) (2023), 1527-1544. https://doi.org/10.1109/SP46215.2023.10179320
  • Security Defect Detection via Code Review: A Study of the OpenStack and Qt Communities. In 2023 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) (2023), 1-12. https://doi.org/10.1109/ESEM56168.2023.10304852
  • A characterization study of testing contributors and their contributions in open source projects. In Proceedings of the XXXVI Brazilian Symposium on Software Engineering (2022), 95-105. https://doi.org/10.1145/3555228.3555244
  • Modeling review history for reviewer recommendation. In Proceedings of the 44th International Conference on Software Engineering (2022), 1381-1392. https://doi.org/10.1145/3510003.3510213
  • Committed to Trust: A Qualitative Study on Security & Trust in Open Source Software Projects. In 2022 IEEE Symposium on Security and Privacy (SP) (2022), 1880-1896. https://doi.org/10.1109/SP46214.2022.9833686
