Article
DOI: 10.1145/1137983.1138002

Constructing universal version history

Published: 22 May 2006
    Abstract

    Developers often copy code from parts of a product, or from an entire product, to start a new product or a new release. To understand the software change history and to determine code authorship, we propose to construct a universal version history from multiple version control repositories. To that end, we create two practical code copy detection methods that operate at the level of the source code file: the prefix-postfix algorithm and the prefix algorithm. The full pathname of a file and its version history are used to construct the universal version history of the file by linking together the change histories of files that had the same code at any point in the past. Both algorithms assume that developers often duplicate files by copying entire directories. Once the copying is identified, we propose an algorithm to link version histories from multiple repositories in order to construct the universal version history. The results show that about 41.32% of source files (in the repository containing more than 6M versions of around 2M files) were duplicated among Avaya's source code repositories for more than ten different projects. The prefix-postfix algorithm is more suitable than the prefix algorithm because its error rates were reasonable when validated against known copying behaviors.
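The linking step described above (joining the change histories of files that had the same code at any point in the past) can be illustrated with a small sketch. It is not the paper's prefix or prefix-postfix algorithm and does not use pathnames to detect directory copies. It assumes each version has already been reduced to a hypothetical record of repository, full pathname, timestamp and content hash, and it links two files whenever any of their versions share identical content, using a union-find structure so that chains of copies collapse into one history. The `Version` record, the `universal_histories` function and the sample data are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One committed version of a file (hypothetical record layout)."""
    repo: str          # repository the version was committed to
    path: str          # full pathname of the file within that repository
    timestamp: int     # commit time, e.g. seconds since the epoch
    content_hash: str  # hash of the file content at this version


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)  # register unseen elements as singletons
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def universal_histories(versions):
    """Group files that ever shared identical content and merge their histories.

    A file is identified by (repo, path). Two files land in the same group if
    a version of one has the same content hash as a version of the other,
    directly or through a chain of such matches. Each returned group is a
    time-ordered list of versions (a sketch of a universal version history).
    """
    uf = UnionFind()
    first_file_with_hash = {}
    for v in versions:
        file_id = (v.repo, v.path)
        uf.find(file_id)  # make sure files with no matches still form a group
        owner = first_file_with_hash.setdefault(v.content_hash, file_id)
        uf.union(owner, file_id)  # no-op when the file is its own owner

    groups = defaultdict(list)
    for v in versions:
        groups[uf.find((v.repo, v.path))].append(v)
    return [sorted(g, key=lambda v: (v.timestamp, v.repo, v.path))
            for g in groups.values()]


# Hypothetical example: productB copied socket.c from productA's second version.
history = [
    Version("productA", "src/net/socket.c", 100, "h1"),
    Version("productA", "src/net/socket.c", 200, "h2"),
    Version("productB", "lib/net/socket.c", 250, "h2"),
    Version("productB", "lib/net/socket.c", 300, "h3"),
    Version("productB", "lib/util/log.c", 260, "h9"),
]
for group in universal_histories(history):
    print([(v.repo, v.path, v.timestamp) for v in group])
```

In this example the two `socket.c` files are linked through the shared hash `h2`, so their four versions form one time-ordered history, while `log.c` stays on its own. Matching on content alone would likely over-link unrelated files that happen to be identical (for instance empty files or shared boilerplate), which is one reason a pathname-based filter such as the directory-copy assumption above would be applied before the union step; that filter is not modeled here.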

    Information

    Published In

    MSR '06: Proceedings of the 2006 international workshop on Mining software repositories
    May 2006
    191 pages
    ISBN:1595933972
    DOI:10.1145/1137983
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. change history
    2. cloning
    3. code authorship
    4. code copying
    5. version control

    Qualifiers

    • Article

    Conference

    ICSE06
    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 2
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 10 Aug 2024.

    Cited By

    • (2023) Applying the Universal Version History Concept to Help De-Risk Copy-Based Code Reuse. 2023 IEEE 23rd International Working Conference on Source Code Analysis and Manipulation (SCAM), pp. 1-12. DOI: 10.1109/SCAM59687.2023.00012. Online publication date: 2-Oct-2023.
    • (2020) A Complete Set of Related Git Repositories Identified via Community Detection Approaches Based on Shared Commits. Proceedings of the 17th International Conference on Mining Software Repositories, pp. 513-517. DOI: 10.1145/3379597.3387499. Online publication date: 29-Jun-2020.
    • (2019) A Methodology for Measuring FLOSS Ecosystems. Towards Engineering Free/Libre Open Source Software (FLOSS) Ecosystems for Impact and Sustainability, pp. 1-29. DOI: 10.1007/978-981-13-7099-1_1. Online publication date: 6-Jul-2019.
    • (2013) Risky files: an approach to focus quality improvement effort. Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, pp. 691-694. DOI: 10.1145/2491411.2494572. Online publication date: 18-Aug-2013.
    • (2009) Amassing and indexing a large sample of version control systems. Proceedings of the 2009 6th IEEE International Working Conference on Mining Software Repositories, pp. 11-20. DOI: 10.1109/MSR.2009.5069476. Online publication date: 16-May-2009.
    • (2008) Evaluation of source code copy detection methods on FreeBSD. Proceedings of the 2008 International Working Conference on Mining Software Repositories, pp. 61-66. DOI: 10.1145/1370750.1370766. Online publication date: 10-May-2008.
    • (2007) Large-Scale Code Reuse in Open Source Software. Proceedings of the First International Workshop on Emerging Trends in FLOSS Research and Development. DOI: 10.1109/FLOSS.2007.10. Online publication date: 20-May-2007.
