DOI: 10.1145/1985404.1985411

Automated type-3 clone oracle using Levenshtein metric

Published: 23 May 2011

Abstract

Evaluating the quality and performance of clone detection techniques requires a system along with its clone oracle, that is, a reference database of all accepted clones in the investigated system. Many challenges, including finding an adequate clone definition and scaling to industrial-size systems, must be overcome to create good oracles. This paper presents an original method to construct clone oracles based on the Levenshtein metric. Although other oracles exist, this is the largest known oracle for type-3 clones that was created by an automated process on massive data sets. The method behind the creation of the oracle and the characteristics of the actual oracles are presented. A discussion of the results in relation to other ways of building oracles is also provided, along with future research possibilities.
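
To make the Levenshtein metric mentioned in the abstract concrete, the following is a minimal sketch of the standard edit-distance computation over token sequences. It is not the paper's implementation: the abstract does not state the comparison granularity, normalization, or threshold, so the function names, the length-based normalization, and the `threshold=0.3` default are all assumptions chosen only for illustration.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of single-element insertions, deletions, and
    substitutions needed to turn sequence a into sequence b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter one as the row
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(
                previous[j] + 1,         # delete x
                current[j - 1] + 1,      # insert y
                previous[j - 1] + cost,  # substitute x by y (or match)
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Edit distance divided by the longer length (an assumed normalization)."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def is_clone_candidate(a: Sequence[str], b: Sequence[str],
                       threshold: float = 0.3) -> bool:
    """Hypothetical acceptance rule; the threshold is not taken from the paper."""
    return normalized_distance(a, b) <= threshold


# Two token sequences that differ by one renamed identifier.
left = ["int", "x", "=", "0", ";"]
right = ["int", "y", "=", "0", ";"]
print(levenshtein(left, right), is_clone_candidate(left, right))
```

Because the Levenshtein distance satisfies the metric axioms, including the triangle inequality, it can be used with metric-space index structures, which is one reason it is a plausible basis for similarity search at scale.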

    Published In

    IWSC '11: Proceedings of the 5th International Workshop on Software Clones
    May 2011
    92 pages
    ISBN:9781450305884
    DOI:10.1145/1985404

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. clone benchmark
    2. clone detection
    3. software clones
    4. type-3 clones

    Qualifiers

    • Research-article

    Conference

    ICSE11: International Conference on Software Engineering
    May 23, 2011
    Waikiki, Honolulu, HI, USA
