DOI: 10.1145/3540250.3549163

Program merge conflict resolution via neural transformers

Published: 09 November 2022

Abstract

Collaborative software development is an integral part of the modern software development life cycle, essential to the success of large-scale software projects. When multiple developers make concurrent changes around the same lines of code, a merge conflict may occur. Such conflicts stall pull requests and continuous integration pipelines for hours to several days, seriously hurting developer productivity. To address this problem, we introduce MergeBERT, a novel neural program merge framework based on token-level three-way differencing and a transformer encoder model. By exploiting the restricted nature of merge conflict resolutions, we reformulate the task of generating the resolution sequence as a classification task over a set of primitive merge patterns extracted from real-world merge commit data. Our model achieves 63–68% accuracy for merge resolution synthesis, yielding a nearly 3× performance improvement over existing semi-structured merge tools and a 2× improvement over neural program merge tools. Finally, we demonstrate that MergeBERT is sufficiently flexible to work with source code files in the Java, JavaScript, TypeScript, and C# programming languages. To measure the practical use of MergeBERT, we conduct a user study to evaluate MergeBERT suggestions with 25 developers from large OSS projects on 122 real-world conflicts they encountered. Results suggest that in practice, MergeBERT resolutions would be accepted at a higher rate than estimated by automatic metrics for precision and accuracy. Additionally, we use participant feedback to identify future avenues for improvement of MergeBERT.
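
As a rough illustration of the two ideas named in the abstract (working at token granularity, and choosing among a small set of primitive resolutions instead of generating free text), the following Python sketch may help. It is not the authors' implementation. It swaps in plain common prefix/suffix trimming as a much simpler stand-in for token-level three-way differencing. The tokenizer, the function names, and the five pattern labels are all hypothetical and chosen only for illustration; the actual pattern set used by MergeBERT is defined in the paper, not here.

```python
import re
from typing import Dict, List, Tuple


def tokenize(code: str) -> List[str]:
    # Hypothetical tokenizer: runs of word characters, or any single symbol.
    return re.findall(r"\w+|[^\w\s]", code)


def common_prefix_len(seqs: List[List[str]]) -> int:
    # Number of leading positions at which all sequences agree.
    n = 0
    for group in zip(*seqs):
        if len(set(group)) != 1:
            break
        n += 1
    return n


def narrow_conflict(
    a: List[str], base: List[str], b: List[str]
) -> Tuple[List[str], List[str], List[str], List[str], List[str]]:
    # Trim tokens shared by all three versions at the start and at the end,
    # leaving only the token span where the versions actually disagree.
    p = common_prefix_len([a, base, b])
    rest = [s[p:] for s in (a, base, b)]
    s = common_prefix_len([r[::-1] for r in rest])
    a_core, base_core, b_core = (r[: len(r) - s] for r in rest)
    prefix = a[:p]
    suffix = a[len(a) - s:] if s else []
    return prefix, a_core, base_core, b_core, suffix


def candidates(
    a_core: List[str], base_core: List[str], b_core: List[str]
) -> Dict[str, List[str]]:
    # Illustrative (assumed) primitive patterns: each label maps to one
    # candidate resolution, so picking a resolution becomes classification.
    return {
        "take_a": a_core,
        "take_b": b_core,
        "take_base": base_core,
        "concat_ab": a_core + b_core,
        "concat_ba": b_core + a_core,
    }


if __name__ == "__main__":
    a = tokenize("x = foo(1, 2)")
    base = tokenize("x = foo(1)")
    b = tokenize("x = bar(1)")
    prefix, a_core, base_core, b_core, suffix = narrow_conflict(a, base, b)
    for label, core in candidates(a_core, base_core, b_core).items():
        print(label, " ".join(prefix + core + suffix))
```

In this sketch a classifier would only need to output one of the labels; the resolved text is then assembled from the shared prefix, the chosen candidate, and the shared suffix.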

Published In

ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2022
1822 pages
ISBN:9781450394130
DOI:10.1145/3540250
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Software evolution
  2. ml4code
  3. program merge

Qualifiers

  • Research-article

Conference

ESEC/FSE '22

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Article Metrics

  • Downloads (Last 12 months): 95
  • Downloads (Last 6 weeks): 11
Reflects downloads up to 25 Dec 2024

Cited By

  • (2025) Promises and perils of using Transformer-based models for SE research. Neural Networks 184 (107067). https://doi.org/10.1016/j.neunet.2024.107067. Online publication date: Apr-2025
  • (2024) How code composition strategies affect merge conflict resolution? Journal of Software Engineering Research and Development 12:1. https://doi.org/10.5753/jserd.2024.3638. Online publication date: 31-Oct-2024
  • (2024) Evaluation of Version Control Merge Tools. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (831-83). https://doi.org/10.1145/3691620.3695075. Online publication date: 27-Oct-2024
  • (2024) Revisiting the Conflict-Resolving Problem from a Semantic Perspective. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (141-152). https://doi.org/10.1145/3691620.3694993. Online publication date: 27-Oct-2024
  • (2024) Towards Semi-Automated Merge Conflict Resolution: Is It Easier Than We Expected? Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering (282-292). https://doi.org/10.1145/3661167.3661197. Online publication date: 18-Jun-2024
  • (2024) DeepCNN: A Dual Approach to Fault Localization and Repair in Convolutional Neural Networks. IEEE Access 12 (50321-50334). https://doi.org/10.1109/ACCESS.2024.3384981. Online publication date: 2024
  • (2024) Multilingual code refactoring detection based on deep learning. Expert Systems with Applications 258 (125164). https://doi.org/10.1016/j.eswa.2024.125164. Online publication date: Dec-2024
  • (2023) Symbolic Execution to Detect Semantic Merge Conflicts. 2023 IEEE 23rd International Working Conference on Source Code Analysis and Manipulation (SCAM) (186-197). https://doi.org/10.1109/SCAM59687.2023.00028. Online publication date: 2-Oct-2023
  • (2023) Behind Developer Contributions on Conflicting Merge Scenarios. 2023 IEEE 23rd International Working Conference on Source Code Analysis and Manipulation (SCAM) (25-36). https://doi.org/10.1109/SCAM59687.2023.00014. Online publication date: 2-Oct-2023
  • (2023) Git Merge Conflict Resolution Leveraging Strategy Classification and LLM. 2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS) (228-239). https://doi.org/10.1109/QRS60937.2023.00031. Online publication date: 22-Oct-2023
