ReBack: recommending backports in social coding environments

Published: 23 February 2024

Abstract

Pull-based development is widely used in popular social coding environments like GitHub and GitLab for both internal and external contributions. When critical bug fixes or features are committed to the main branch of a project, it is often desirable to also port those changes to other stable branches. This process is referred to as backporting, and pull-requests in the process are known as backports. Backports are typically determined after extensive discussion with collaborators, and it may take many days to identify backports, which commonly results in tags and references to the original pull-requests (i.e., pull-requests for the main branch) being missed. To help software development teams better identify and manage backports, we propose ReBack (Recommending Backports), a tool based on a deep-learning model for automatically identifying backports from pull-requests and related reviews, discussions, metadata, and committed code. ReBack predicted backports with 90.98% precision and 91.81% recall from 80,000 pull-requests in 17 GitHub projects. Although the results are promising, more research is required to further support backporting, including research into automatically porting a pull-request to further reduce costs when managing software versions and branches.
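The precision and recall figures above follow the standard definitions for a binary classifier (backport versus non-backport). As a minimal sketch, assuming hypothetical confusion counts that are not taken from the ReBack evaluation, the two metrics and their harmonic mean can be computed as follows. The class and function names are illustrative and are not part of the ReBack tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for the positive class (here: 'is a backport')."""
    true_positives: int   # backports correctly flagged
    false_positives: int  # non-backports wrongly flagged
    false_negatives: int  # backports that were missed


def precision(c: ConfusionCounts) -> float:
    """Fraction of flagged pull-requests that really are backports."""
    flagged = c.true_positives + c.false_positives
    return c.true_positives / flagged if flagged else 0.0


def recall(c: ConfusionCounts) -> float:
    """Fraction of real backports that were flagged."""
    actual = c.true_positives + c.false_negatives
    return c.true_positives / actual if actual else 0.0


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Hypothetical counts chosen for illustration only.
    counts = ConfusionCounts(true_positives=90, false_positives=10, false_negatives=30)
    print(f"precision={precision(counts):.4f}")
    print(f"recall={recall(counts):.4f}")
    print(f"f1={f1_score(counts):.4f}")
```

High precision means few pull-requests are wrongly recommended for backporting, while high recall means few genuine backports are missed; the abstract reports both because either kind of error adds cost for maintainers.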

Information

Published In

Automated Software Engineering, Volume 31, Issue 1
May 2024
1122 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 23 February 2024
Accepted: 15 January 2024
Received: 25 June 2022

Author Tags

  1. Pull-requests
  2. Deep-learning
  3. Backporting
  4. GitHub
  5. Patches

Qualifiers

  • Research-article
