Detecting higher-level similarity patterns in programs

Published: 01 September 2005

Abstract

Cloning in software systems is known to create problems during software maintenance. Several techniques have been proposed to detect identical or similar code fragments in software, so-called simple clones. While knowledge of simple clones is useful, detecting design-level similarities in software could ease maintenance even further and also help identify reuse opportunities. We observed that recurring patterns of simple clones, so-called structural clones, often indicate the presence of interesting design-level similarities, such as patterns of collaborating classes or components. Finding structural clones that signify potentially useful design information requires efficient techniques to analyze the bulk of simple clone data and to make non-trivial inferences from the abstracted information. In this paper, we describe a practical solution to the problem of detecting some basic but useful types of design-level similarities, such as groups of highly similar classes or files. First, we detect simple clones by applying conventional token-based techniques. Then we find patterns of co-occurring clones in different files using the Frequent Itemset Mining (FIM) technique. Finally, we perform file clustering to detect clusters of highly similar files that are likely to contribute to a design-level similarity pattern. The novelty of our approach is the application of data mining techniques to detect design-level similarities. Experiments confirmed that our method finds many useful structural clones and scales up to big programs. The paper describes our method for structural clone detection, a prototype tool called Clone Miner that implements the method, and experimental results.
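
The second step of the abstract, finding sets of simple clones that co-occur across files, can be illustrated with a small sketch. This is not Clone Miner's implementation: it assumes each file has already been reduced to the set of simple clone class IDs it contains, uses a naive level-wise enumeration in place of a real FIM algorithm, and all names (`frequent_clone_sets`, `closed_sets`, the file names and IDs) are hypothetical.

```python
from itertools import combinations


def frequent_clone_sets(file_clones, min_support=2, max_size=3):
    """Map each clone-class set found in >= min_support files to those files.

    Naive level-wise enumeration; fine for a toy input, not for large systems.
    """
    frequent = {}
    for size in range(1, max_size + 1):
        support = {}
        for name, clones in file_clones.items():
            for combo in combinations(sorted(clones), size):
                support.setdefault(frozenset(combo), set()).add(name)
        level = {s: files for s, files in support.items()
                 if len(files) >= min_support}
        if not level:
            break  # no frequent set of this size, so none of a larger size
        frequent.update(level)
    return frequent


def closed_sets(frequent):
    """Drop itemsets that a strictly larger itemset covers with the same files."""
    return {
        items: files
        for items, files in frequent.items()
        if not any(items < other and files == other_files
                   for other, other_files in frequent.items())
    }


# Hypothetical input: file name -> IDs of the simple clone classes it contains.
file_clones = {
    "A.java": {1, 2, 3},
    "B.java": {1, 2, 3},
    "C.java": {1, 2},
    "D.java": {4},
}

patterns = closed_sets(frequent_clone_sets(file_clones))
for items, files in patterns.items():
    print(sorted(items), "co-occur in", sorted(files))
```

Each surviving itemset, together with the files that support it, is a candidate structural clone in the sense of the abstract; files that share large itemsets would be natural inputs to the subsequent clustering step.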


Published In

ESEC/FSE-13: Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering
September 2005
402 pages
ISBN: 1595930140
DOI: 10.1145/1081706
  • ACM SIGSOFT Software Engineering Notes, Volume 30, Issue 5
    September 2005
    462 pages
    ISSN: 0163-5948
    DOI: 10.1145/1095430

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clone detection
  2. similarity patterns
  3. software clones

Qualifiers

  • Article

Conference

ESEC/FSE05

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Cited By

  • (2022) Mining microservice design patterns. Proceedings of the 13th Symposium on Cloud Computing, pp. 190-195. DOI: 10.1145/3542929.3563472. Online publication date: 7-Nov-2022
  • (2022) Deep learning application on code clone detection. Journal of Systems and Software, 184:C. DOI: 10.1016/j.jss.2021.111141. Online publication date: 3-Jan-2022
  • (2021) A Study of Software Clone Detection Techniques for Better Software Maintenance and Reliability. 2021 International Conference on Computing Sciences (ICCS), pp. 249-253. DOI: 10.1109/ICCS54944.2021.00056. Online publication date: Dec-2021
  • (2020) Towards A Novel Conceptual Framework for Analyzing Code Clones to Assist in Software Development and Software Reuse. 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 105-111. DOI: 10.1109/ICICCS48265.2020.9120965. Online publication date: May-2020
  • (2019) Similarity Analysis of Control Software Using Graph Mining. 2019 IEEE 17th International Conference on Industrial Informatics (INDIN), pp. 508-515. DOI: 10.1109/INDIN41052.2019.8972335. Online publication date: Jul-2019
  • (2018) Structural clones: An evolution perspective. 2018 IEEE 12th International Workshop on Software Clones (IWSC), pp. 9-15. DOI: 10.1109/IWSC.2018.8327313. Online publication date: Mar-2018
  • (2017) Mining implicit design templates for actionable code reuse. Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, pp. 394-404. DOI: 10.5555/3155562.3155615. Online publication date: 30-Oct-2017
  • (2017) Mining implicit design templates for actionable code reuse. 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 394-404. DOI: 10.1109/ASE.2017.8115652. Online publication date: Oct-2017
  • (2017) Managing Software Complexity with Power-Generics. Towards a Synergistic Combination of Research and Practice in Software Engineering, pp. 31-48. DOI: 10.1007/978-3-319-65208-5_3. Online publication date: 6-Aug-2017
  • (2016) Clone analysis and detection in android applications. 2016 3rd International Conference on Systems and Informatics (ICSAI), pp. 520-525. DOI: 10.1109/ICSAI.2016.7811010. Online publication date: Nov-2016
