DOI: 10.1109/ITNG.2007.17
Article

A Probabilistic Approach to Source Code Authorship Identification

Published: 02 April 2007

Abstract

There exists a need for tools to help identify the authorship of source code. This includes situations in which the ownership of code is questionable, such as in plagiarism or intellectual property infringement disputes. Authorship identification can also be used to assist in the apprehension of the creators of malware. In this paper we present an approach to identifying the authors of source code. We begin by computing a set of metrics to build profiles for a population of known authors using code samples that are verified to be authentic. We then compute metrics on unidentified source code to determine the closest matching profile. We demonstrate our approach on a case study that involves two kinds of software: one based on open source developers working on various projects, and another based on students working on assignments with the same requirements. In our case study we are able to determine authorship with greater than 70% accuracy in choosing the single nearest match and greater than 90% accuracy in choosing the top three ordered nearest matches.
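
The profile-and-match workflow described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the paper's method: the abstract does not say which metrics or which similarity measure are used, so the character 3-gram frequencies, the L1 distance, the function names (`metric_vector`, `build_profiles`, `distance`, `rank_authors`), and the toy code samples are all assumptions made for this example.

```python
from collections import Counter


def metric_vector(source, n=3):
    """Relative frequencies of character n-grams (an assumed stand-in metric)."""
    grams = Counter(source[i:i + n] for i in range(len(source) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()} if total else {}


def build_profiles(samples_by_author, n=3):
    """Build one profile per known author from that author's verified samples."""
    return {
        author: metric_vector("\n".join(samples), n)
        for author, samples in samples_by_author.items()
    }


def distance(p, q):
    """L1 distance between two sparse frequency vectors (an assumed measure)."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in p.keys() | q.keys())


def rank_authors(profiles, unknown_source, n=3):
    """Return author names ordered from nearest to farthest profile."""
    target = metric_vector(unknown_source, n)
    return sorted(profiles, key=lambda author: distance(profiles[author], target))


# Toy, made-up samples purely for illustration (hypothetical authors).
known = {
    "alice": ["for(int i=0;i<n;i++){sum+=a[i];}", "if(x==0){return y;}"],
    "bob": ["for (int idx = 0; idx < n; idx++) {\n    total += a[idx];\n}"],
    "carol": ["int i = 0;\nwhile (i < n)\n{\n\tsum += a[i];\n\ti++;\n}"],
}
profiles = build_profiles(known)
ranking = rank_authors(profiles, "for(int j=0;j<m;j++){acc+=b[j];}")

top1 = ranking[0]    # single nearest match
top3 = ranking[:3]   # top three ordered nearest matches
```

Here `top1` corresponds to the "single nearest match" evaluation and `top3` to the "top three ordered nearest matches" evaluation mentioned in the abstract; an identification would count as correct in the latter case if the true author appears anywhere in `top3`.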


Published In

Guide Proceedings
ITNG '07: Proceedings of the International Conference on Information Technology
April 2007
1099 pages
ISBN: 0769527760

Publisher

IEEE Computer Society

United States

Cited By

  • (2024) Enhancing Robustness of Code Authorship Attribution through Expert Feature Knowledge. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 199-209. DOI: 10.1145/3650212.3652121. Online publication date: 11-Sep-2024
  • (2021) Authorship attribution of source code: a language-agnostic approach and applicability in software engineering. Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 932-944. DOI: 10.1145/3468264.3468606. Online publication date: 20-Aug-2021
  • (2019) Code Authorship Attribution. ACM Computing Surveys 52:1, pp. 1-36. DOI: 10.1145/3292577. Online publication date: 13-Feb-2019
  • (2019) Adversarial Authorship Attribution in Open-Source Projects. Proceedings of the Ninth ACM Conference on Data and Application Security and Privacy, pp. 291-302. DOI: 10.1145/3292006.3300032. Online publication date: 13-Mar-2019
  • (2019) ASAP. International Journal on Software Tools for Technology Transfer (STTT) 21:4, pp. 471-484. DOI: 10.1007/s10009-019-00517-3. Online publication date: 1-Aug-2019
  • (2018) Integration of Static and Dynamic Code Stylometry Analysis for Programmer De-anonymization. Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security, pp. 74-84. DOI: 10.1145/3270101.3270110. Online publication date: 15-Oct-2018
  • (2018) Android authorship attribution through string analysis. Proceedings of the 13th International Conference on Availability, Reliability and Security, pp. 1-10. DOI: 10.1145/3230833.3230849. Online publication date: 27-Aug-2018
  • (2015) De-anonymizing programmers via code stylometry. Proceedings of the 24th USENIX Conference on Security Symposium, pp. 255-270. DOI: 10.5555/2831143.2831160. Online publication date: 12-Aug-2015
