Is cloned code older than non-cloned code?

Published: 23 May 2011

Abstract

It is still a debated question whether cloned code causes increased maintenance efforts. If cloned code is more stable than non-cloned code, i.e. it is changed less often, it will require less maintenance efforts. The more stable cloned code is, the longer it will not have been changed, so the stability can be estimated through the code's age. This paper presents a study on the average age of cloned code. For three large open source systems, the age of every line of source code is computed as the date of the last change in that line. In addition, every line is categorized whether it belongs to cloned code as detected by a clone detector. The study shows that on average, cloned code is older than non-cloned code. Moreover, if a file has cloned code, the average age of the cloned code of the file is lower than the average age of the non-cloned code in the same file. The results support the previous findings that cloned code is more stable than non-cloned code.
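
The measurement described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the tooling used in the study: it assumes that the last-change date of every line (for example, from a blame or annotate command) and the line ranges reported by a clone detector are already available as plain Python data. All names (`Line`, `is_cloned`, `mean_last_change`, `compare_ages`, `compare_ages_per_file`) and the sample data are hypothetical. Because age is represented as the date of the last change, "older" here means an earlier average date.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class Line:
    path: str          # file the line belongs to
    number: int        # 1-based line number within the file
    last_change: date  # date of the last change to this line


def is_cloned(line, clone_ranges):
    """True if the line lies inside any (path, first, last) clone range."""
    return any(
        path == line.path and first <= line.number <= last
        for path, first, last in clone_ranges
    )


def mean_last_change(lines):
    """Average last-change date of the given lines, or None if there are none."""
    if not lines:
        return None
    avg_ordinal = mean(l.last_change.toordinal() for l in lines)
    return date.fromordinal(round(avg_ordinal))


def compare_ages(lines, clone_ranges):
    """Return (mean date of cloned lines, mean date of non-cloned lines)."""
    cloned = [l for l in lines if is_cloned(l, clone_ranges)]
    other = [l for l in lines if not is_cloned(l, clone_ranges)]
    return mean_last_change(cloned), mean_last_change(other)


def compare_ages_per_file(lines, clone_ranges):
    """Same comparison, restricted to the lines of each file separately."""
    by_file = {}
    for l in lines:
        by_file.setdefault(l.path, []).append(l)
    return {p: compare_ages(ls, clone_ranges) for p, ls in by_file.items()}


# Hypothetical sample data: lines 1-2 of A.java were reported as a clone.
lines = [
    Line("A.java", 1, date(2005, 1, 1)),
    Line("A.java", 2, date(2005, 1, 3)),
    Line("A.java", 3, date(2009, 6, 1)),
    Line("A.java", 4, date(2009, 6, 3)),
]
clone_ranges = [("A.java", 1, 2)]

cloned_avg, other_avg = compare_ages(lines, clone_ranges)
print("cloned:", cloned_avg, "non-cloned:", other_avg)
print("cloned code is older:", cloned_avg < other_avg)
```

Averaging ordinal day numbers is one simple way to take the mean of dates; a real analysis would also have to decide how to treat files without any cloned lines, which this sketch reports as `None`.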

Reviews

Brian D. Goodman

One method of reducing software development risk is the reuse of proven code: typically, source code that a developer or team has used before, that is formally an asset, or that is trusted to perform in a stable manner when copied and pasted. This is a common practice, since much capability is duplicated from system to system. Krinke's study looks at this phenomenon and tries to understand whether cloned code complicates software over time. If you are part of a development team and have ever wondered whether code cloning is risky, this paper is worth reading.

Krinke presents the approaches of the empirical studies that came before his, and differentiates his own approach by leveraging the source code repositories of three open-source projects (ArgoUML, JBoss, and jEdit). Most version control systems, such as the Concurrent Versions System (CVS) and Subversion, track changes line by line; the premise is that if cloned code changes less often than noncloned code, then the changes in the overall system are occurring in the newly authored code. Krinke's initial findings show that "cloned code is usually older than non-cloned code" and "the cloned code in a file is usually [81 percent] older than the non-cloned code in the same file." In the projects evaluated, changes over time were not focused on the cloned code, and thus it appears to be the more stable code overall.

The study's findings are interesting because it is often thought that introducing new code, or code not authored by the acting development team ("not invented here"), introduces risk. This may still be true, but it is not reflected in changes tracked at the code level.

There are, however, some challenges with this research. For example, how clone detection is implemented can radically affect what is recorded (false positives or false negatives). Formatting is often recorded as a change even though it has no effect on the code itself. The inclusion of test code would likely increase the amount of cloned code detected, since it is often auto-generated. More interesting is Krinke's proposal that developers avoid modifying cloned code when it's being leveraged. One of the reasons the code was lifted in the first place was that it did something well known to the exploiter. If the developers change it, the trust that it will operate the same way may be compromised. This behavior may affect the results of this study.

For additional details on these issues and reviews of related work, be sure to read the whole paper. Otherwise, be on the lookout for a follow-up study that addresses some of the potential shortcomings. As we better understand what happens in practice, more faith and clarity can be brought to bear in future executions.

Online Computing Reviews Service

Published In

IWSC '11: Proceedings of the 5th International Workshop on Software Clones
May 2011
92 pages
ISBN: 9781450305884
DOI: 10.1145/1985404
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clone detection
  2. mining software archives
  3. software evolution

Qualifiers

  • Research-article

Conference

ICSE11: International Conference on Software Engineering
May 23, 2011
Waikiki, Honolulu, HI, USA

Cited By

  • (2023) A Comparative Study of Code Clone Genealogies in Test Code and Production Code. 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 913-920. DOI: 10.1109/SANER56733.2023.00110. Online publication date: Mar-2023.
  • (2022) Exploring and understanding cross-service code clones in microservice projects. Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, pages 449-459. DOI: 10.1145/3524610.3527925. Online publication date: 16-May-2022.
  • (2022) Clones in deep learning code: what, where, and why? Empirical Software Engineering 27:4. DOI: 10.1007/s10664-021-10099-x. Online publication date: 1-Jul-2022.
  • (2022) Semantic Clone Detection via Probabilistic Software Modeling. Fundamental Approaches to Software Engineering, pages 288-309. DOI: 10.1007/978-3-030-99429-7_16. Online publication date: 29-Mar-2022.
  • (2021) Stability in Software Engineering: Survey of the State-of-the-Art and Research Directions. IEEE Transactions on Software Engineering 47:7, pages 1468-1510. DOI: 10.1109/TSE.2019.2925616. Online publication date: 1-Jul-2021.
  • (2021) Toxic Code Snippets on Stack Overflow. IEEE Transactions on Software Engineering 47:3, pages 560-581. DOI: 10.1109/TSE.2019.2900307. Online publication date: 1-Mar-2021.
  • (2021) ID-correspondence: a measure for detecting evolutionary coupling. Empirical Software Engineering 26:1. DOI: 10.1007/s10664-020-09921-9. Online publication date: 12-Jan-2021.
  • (2021) A Summary on the Stability of Code Clones and Current Research Trends. Code Clone Analysis, pages 169-180. DOI: 10.1007/978-981-16-1927-4_12. Online publication date: 4-Aug-2021.
  • (2020) Investigating Near-Miss Micro-Clones in Evolving Software. Proceedings of the 28th International Conference on Program Comprehension, pages 208-218. DOI: 10.1145/3387904.3389262. Online publication date: 13-Jul-2020.
  • (2020) Is Static Analysis Able to Identify Unnecessary Source Code? ACM Transactions on Software Engineering and Methodology 29:1, pages 1-23. DOI: 10.1145/3368267. Online publication date: 30-Jan-2020.
