
Finding Clones with Dup: Analysis of an Experiment

Published: 01 September 2007

Abstract

An experiment was carried out by a group of scientists to compare different tools and techniques for detecting duplicated or near-duplicated source code. The overall comparative results are presented elsewhere. This paper takes a closer look at the results for one tool, Dup, which finds code sections that are textually the same or the same except for systematic substitution of parameters such as identifiers and constants. Various factors that influenced the results are identified, and their impact is assessed by rerunning Dup with changed options and modifications. These changes improve the performance of Dup with regard to the experiment and could be incorporated into a postprocessor to be used with other tools.
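
To make "the same except for systematic substitution of parameters" concrete, here is a minimal Python sketch that checks whether two code lines match under a consistent one-to-one renaming of identifiers and constants. It is an illustration only and not Dup's implementation: the tokenizer, the `KEYWORDS` set, and the function names are all assumptions introduced here, and the sketch compares just one pair of lines rather than searching a whole code base.

```python
import re

# Hypothetical, simplified tokenizer: identifiers, integer constants, and
# single punctuation characters. Real lexing is language specific.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")

# Assumed keyword set; keywords must match exactly and are never renamed.
KEYWORDS = {"if", "else", "for", "while", "return", "int"}

def tokens(line):
    return TOKEN.findall(line)

def is_parameter(tok):
    # Identifiers and constants count as parameters in this sketch.
    first = tok[0]
    return (first.isalpha() or first == "_" or first.isdigit()) and tok not in KEYWORDS

def parameterized_match(a, b):
    """Return True if line a equals line b up to a one-to-one renaming
    of parameter tokens (simplified illustration, assumptions above)."""
    ta, tb = tokens(a), tokens(b)
    if len(ta) != len(tb):
        return False
    forward, backward = {}, {}
    for x, y in zip(ta, tb):
        if is_parameter(x) != is_parameter(y):
            return False
        if not is_parameter(x):
            if x != y:          # non-parameter tokens must be identical
                return False
            continue
        # The renaming must be consistent in both directions.
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True

print(parameterized_match("x = y + 1;", "a = b + 2;"))
print(parameterized_match("x = x + y;", "a = b + a;"))
```

The first call reports a match, because `x`, `y`, and `1` map consistently to `a`, `b`, and `2`. The second does not, because `x` would have to map to both `a` and `b`.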

Published In

IEEE Transactions on Software Engineering, Volume 33, Issue 9
September 2007
65 pages

Publisher

IEEE Press

Author Tags

  1. Redundant code
  2. duplicated code
  3. software clones

Qualifiers

  • Research-article

Cited By

  • (2024) Improving AST-Level Code Completion with Graph Retrieval and Multi-Field Attention. Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension, pp. 125-136. doi:10.1145/3643916.3644420. Online publication date: 15-Apr-2024
  • (2022) Accurate and Language Agnostic Code Clone Detection by Measuring Edit Distance of ANTLR Parse Tree. International Journal of Software Innovation, 10:1, pp. 1-22. doi:10.4018/IJSI.297915. Online publication date: 6-May-2022
  • (2019) Proactive clone recommendation system for extract method refactoring. Proceedings of the 3rd International Workshop on Refactoring, pp. 67-70. doi:10.1109/IWoR.2019.00020. Online publication date: 28-May-2019
  • (2018) CCAligner. Proceedings of the 40th International Conference on Software Engineering, pp. 1066-1077. doi:10.1145/3180155.3180179. Online publication date: 27-May-2018
  • (2018) A Behavior-Based Framework for Assessing Product Line-Ability. Advanced Information Systems Engineering, pp. 571-586. doi:10.1007/978-3-319-91563-0_35. Online publication date: 11-Jun-2018
  • (2015) Reusability based program clone detection. Proceedings of the 16th International Conference on Computer Systems and Technologies, pp. 90-97. doi:10.1145/2812428.2812471. Online publication date: 25-Jun-2015
  • (2014) A replication and reproduction of code clone detection studies. Proceedings of the Thirty-Seventh Australasian Computer Science Conference - Volume 147, pp. 105-114. doi:10.5555/2667473.2667486. Online publication date: 20-Jan-2014
  • (2014) Scalable detection of missed cross-function refactorings. Proceedings of the 2014 International Symposium on Software Testing and Analysis, pp. 138-148. doi:10.1145/2610384.2610394. Online publication date: 21-Jul-2014
  • (2014) Detection of semantically similar code. Frontiers of Computer Science: Selected Publications from Chinese Universities, 8:6, pp. 996-1011. doi:10.1007/s11704-014-3430-1. Online publication date: 1-Dec-2014
  • (2013) Large scale multi-language clone analysis in a telecommunication industrial setting. Proceedings of the 7th International Workshop on Software Clones, pp. 69-75. doi:10.5555/2662708.2662723. Online publication date: 19-May-2013
