DOI: 10.1109/ICSE.2019.00089
research-article

Graph-based mining of in-the-wild, fine-grained, semantic code change patterns

Published: 25 May 2019

Abstract

Prior research has exploited the repetitiveness of code changes to enable tasks such as code completion, bug-fix recommendation, and library adaptation. These and other novel applications require accurate detection of semantic changes, but state-of-the-art methods are limited to algorithms that detect specific kinds of changes at the syntactic level. Algorithms that rely on syntactic similarity have lower accuracy and cannot effectively detect semantic change patterns. We introduce a novel graph-based mining approach, CPatMiner, which detects previously unknown repetitive changes in the wild by mining fine-grained semantic code change patterns from a large number of repositories. To overcome unique challenges such as detecting meaningful change patterns and scaling to large repositories, we rely on fine-grained change graphs to capture program dependencies.
We evaluated CPatMiner by mining change patterns in a diverse corpus of 5,000+ open-source projects from GitHub, written by a population of 170,000+ developers. We used three complementary methods. First, we sent the mined patterns to 108 open-source developers. We found that 70% of respondents recognized those patterns as their meaningful frequent changes. Moreover, 79% of respondents even named the patterns, and 44% wanted future IDEs to automate such repetitive changes. We found that the mined change patterns belong to various development activities: adaptive (9%), perfective (20%), corrective (35%), and preventive (36%, including refactorings). Second, we compared our tool with the state-of-the-art AST-based technique and found that CPatMiner detects 2.1x more meaningful patterns. Third, we used CPatMiner to search for patterns in a corpus of 88 GitHub projects with longer histories, consisting of 164M SLOC. It constructed 322K fine-grained change graphs containing 3M nodes and detected 17K instances of change patterns, from which we provide unique insights on the practice of change patterns among individuals and teams. We found that a large percentage (75%) of the change patterns from individual developers are commonly shared with others, and this holds true for teams. Moreover, we found that the patterns are not intermittent but spread widely over time. Thus, we call for a community-based change pattern database to provide important resources for novel applications.

Published In

ICSE '19: Proceedings of the 41st International Conference on Software Engineering
May 2019
1318 pages

Publisher

IEEE Press

Author Tags

  1. graph mining
  2. semantic change pattern mining

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%

Cited By

  • (2024) Lightweight Syntactic API Usage Analysis with UCov. Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension, pages 426-437. DOI: 10.1145/3643916.3644415. Online publication date: 15-Apr-2024
  • (2024) Unprecedented Code Change Automation: The Fusion of LLMs and Transformation by Example. Proceedings of the ACM on Software Engineering 1:FSE, pages 631-653. DOI: 10.1145/3643755. Online publication date: 12-Jul-2024
  • (2024) Compiler-directed Migrating API Callsite of Client Code. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pages 1-12. DOI: 10.1145/3597503.3639084. Online publication date: 20-May-2024
  • (2024) Revealing code change propagation channels by evolution history mining. Journal of Systems and Software 208:C. DOI: 10.1016/j.jss.2023.111912. Online publication date: 1-Feb-2024
  • (2023) A Survey of Learning-based Automated Program Repair. ACM Transactions on Software Engineering and Methodology 33:2, pages 1-69. DOI: 10.1145/3631974. Online publication date: 23-Dec-2023
  • (2023) Client-Specific Upgrade Compatibility Checking via Knowledge-Guided Discovery. ACM Transactions on Software Engineering and Methodology 32:4, pages 1-31. DOI: 10.1145/3582569. Online publication date: 26-May-2023
  • (2023) Views on Edits to Variational Software. Proceedings of the 27th ACM International Systems and Software Product Line Conference - Volume A, pages 141-152. DOI: 10.1145/3579027.3608985. Online publication date: 28-Aug-2023
  • (2023) PyEvolve: Automating Frequent Code Changes in Python ML Systems. Proceedings of the 45th International Conference on Software Engineering, pages 995-1007. DOI: 10.1109/ICSE48619.2023.00091. Online publication date: 14-May-2023
  • (2022) Untangling Composite Commits by Attributed Graph Clustering. Proceedings of the 13th Asia-Pacific Symposium on Internetware, pages 117-126. DOI: 10.1145/3545258.3545267. Online publication date: 11-Jun-2022
  • (2022) Smelly variables in ansible infrastructure code. Proceedings of the 19th International Conference on Mining Software Repositories, pages 61-72. DOI: 10.1145/3524842.3527964. Online publication date: 23-May-2022
