
Aide-mémoire: Improving a Project’s Collective Memory via Pull Request–Issue Links

Published: 29 March 2023

Abstract

Links between pull requests and the issues they address document and accelerate the development of a software project but are often omitted. We present a new tool, Aide-mémoire, that suggests such links when a developer submits a pull request or closes an issue, integrating smoothly into existing workflows. In contrast to previous state-of-the-art approaches that repair related commit histories, Aide-mémoire is designed for continuous, real-time, and long-term use, employing Mondrian forests to adapt over a project’s lifetime and continuously improve traceability. Aide-mémoire is tailored for two specific instances of the general traceability problem, namely commit-to-issue and pull-request-to-issue links (with a focus on the latter), and exploits data inherent to these two problems to outperform tools for general-purpose link recovery. Our approach is online, language-agnostic, and scalable. We evaluate it over a corpus of 213 projects and six programming languages, achieving a mean average precision of 0.95. Adopting Aide-mémoire is both efficient and effective: a programmer need only evaluate a single suggested link 94% of the time, and 16% of all discovered links were originally missed by developers.
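For readers unfamiliar with the headline metric, the following is a minimal sketch of how mean average precision is conventionally computed over ranked link suggestions. It is not the authors' evaluation code; the function names and the issue identifiers are hypothetical and chosen only for illustration.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision of one ranked suggestion list against the true links."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / len(relevant)


def mean_average_precision(
    queries: List[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of the per-query average precision over (ranked, relevant) pairs."""
    if not queries:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Hypothetical data: each pull request has a ranked list of suggested issue IDs
# and the set of issues it truly addresses.
queries = [
    (["#12", "#7", "#3"], {"#12"}),  # correct link ranked first: AP = 1.0
    (["#5", "#9", "#2"], {"#9"}),    # correct link ranked second: AP = 0.5
]
print(mean_average_precision(queries))  # 0.75
```

Under this definition, a score near 1 means the correct issue is almost always at the top of the suggestion list, which fits the abstract's observation that a single suggestion usually suffices.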


Cited By

  • (2024) Improving Issue-PR Link Prediction via Knowledge-Aware Heterogeneous Graph Learning. IEEE Transactions on Software Engineering 50:7 (1901-1920). DOI: 10.1109/TSE.2024.3408448. Online publication date: 1-Jul-2024

Published In

ACM Transactions on Software Engineering and Methodology  Volume 32, Issue 2
March 2023
946 pages
ISSN:1049-331X
EISSN:1557-7392
DOI:10.1145/3586025
Editor: Mauro Pezzè

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 March 2023
Online AM: 06 June 2022
Accepted: 13 May 2022
Revised: 19 April 2022
Received: 17 July 2020
Published in TOSEM Volume 32, Issue 2

Author Tags

  1. Traceability
  2. link inference
  3. missing link

Qualifiers

  • Research-article

Funding Sources

  • EPSRC
