research-article

ReBack: recommending backports in social coding environments

Authors:

Debasish Chakroborti,

Kevin A. Schneider,

Chanchal K. Roy

Automated Software Engineering, Volume 31, Issue 1

https://doi.org/10.1007/s10515-024-00416-1

Published: 23 February 2024

Abstract

Pull-based development is widely used in social coding environments such as GitHub and GitLab for both internal and external contributions. When critical bug fixes or features are committed to the main branch of a project, it is often desirable to port those changes to other stable branches as well. This process is referred to as backporting, and the pull-requests created in the process are known as backports. Backports are typically identified only after extensive discussion among collaborators, which can take many days; as a result, tags and references to the original pull-requests (i.e., the pull-requests for the main branch) are commonly missed. To help software development teams better identify and manage backports, we propose ReBack (Recommending Backports), a tool based on a deep-learning model that automatically identifies backports from pull-requests and their related reviews, discussions, metadata, and committed code. ReBack predicted backports with 90.98% precision and 91.81% recall on 80,000 pull-requests from 17 GitHub projects. Although the results are promising, more research is required to further support backporting, including research into automatically porting a pull-request to further reduce the cost of managing software versions and branches.
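
For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how precision and recall are conventionally computed for a binary "backport / not a backport" classifier. It is not taken from the ReBack implementation; the function names and the example counts are hypothetical and chosen only for illustration.

```python
# Illustrative only: standard definitions of precision, recall, and F1.
# Nothing here comes from the ReBack code base; all names are hypothetical.

def precision(tp: int, fp: int) -> float:
    """Fraction of pull-requests predicted as backports that really are backports."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of actual backports that the classifier managed to find."""
    return tp / (tp + fn)


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical counts: 90 correct predictions, 10 false alarms, 30 missed backports.
example_precision = precision(tp=90, fp=10)
example_recall = recall(tp=90, fn=30)

# Harmonic mean of the two figures quoted in the abstract (a derived value,
# not a number stated in the abstract itself).
derived_f1 = f1(0.9098, 0.9181)
```

Applied to the figures quoted above, the harmonic mean comes out at roughly 91.4%. This is simple arithmetic on the two reported values and should not be read as a result reported by the authors.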

Published In

Automated Software Engineering, Volume 31, Issue 1, May 2024, 1122 pages

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 23 February 2024

Accepted: 15 January 2024

Received: 25 June 2022
