On approximate matching of programs for protecting libre software

Published: 16 October 2006

Abstract

    Libre software licensing schemes are sometimes abused by companies or individuals. To encourage open-source development, it is necessary to build tools that help identify open-source licensing violations rapidly. This paper describes an attempt to build such a tool. We introduce a framework for approximate matching of programs and describe an implementation for Java byte-code programs. First, we statically analyze a program to remove dead code and simplify expressions, and then extract slices, which are generated from assignment statements. We then compare programs by matching their sets of slices using a distance function. We demonstrate the effectiveness of our method by running experiments on programs generated by two compilers and transformed by two commercial-grade control-flow obfuscators. Our method achieves an acceptable level of precision.
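
The comparison step described in the abstract (matching two sets of slices under a distance function) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: slices are modelled as tuples of instruction tokens, the distance is a length-normalised Levenshtein distance, the matching is greedy one-to-one, and the names `edit_distance`, `slice_distance`, `match_score` and the `threshold` value are hypothetical. The static analysis that produces the slices (dead-code removal, expression simplification, slicing) is not shown.

```python
from typing import List, Sequence, Tuple

# Assumption: a slice is modelled as a sequence of instruction tokens.
Slice = Tuple[str, ...]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def slice_distance(a: Slice, b: Slice) -> float:
    """Edit distance scaled to [0, 1] by the longer slice (illustrative choice)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def match_score(suspect: List[Slice], original: List[Slice],
                threshold: float = 0.25) -> float:
    """Fraction of original slices that find a close partner in the suspect.

    Greedy one-to-one matching: each suspect slice is used at most once.
    The threshold is a hypothetical tuning parameter.
    """
    unused = list(suspect)
    matched = 0
    for s in original:
        if not unused:
            break
        best = min(unused, key=lambda t: slice_distance(s, t))
        if slice_distance(s, best) <= threshold:
            unused.remove(best)
            matched += 1
    return matched / len(original) if original else 0.0


if __name__ == "__main__":
    original = [("load", "load", "add", "store"),
                ("load", "const", "mul", "store")]
    # The suspect carries one inserted no-op and one unrelated slice.
    suspect = [("load", "const", "mul", "store"),
               ("load", "load", "add", "nop", "store"),
               ("call",)]
    print(match_score(suspect, original))
```

A score near 1 suggests that most slices of the original program reappear, possibly perturbed, in the suspect program; a score near 0 suggests little overlap. A greedy matching is used here only for brevity; an optimal assignment between the two sets would be a natural alternative.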

    Published In

    CASCON '06: Proceedings of the 2006 conference of the Center for Advanced Studies on Collaborative research
    October 2006
    388 pages

    Sponsors

    • IBM Toronto Lab
    • CAS

    Publisher

    IBM Corp.

    United States

    Qualifiers

    • Article

    Acceptance Rates

    CASCON '06 Paper Acceptance Rate 24 of 90 submissions, 27%;
    Overall Acceptance Rate 24 of 90 submissions, 27%
