DOI: 10.1145/3485447.3512235

HiddenCPG: Large-Scale Vulnerable Clone Detection Using Subgraph Isomorphism of Code Property Graphs

Published: 25 April 2022

Abstract

A code property graph (CPG) is a joint representation of the syntax, control flows, and data flows of a target application. Recent studies have demonstrated the promising efficacy of leveraging CPGs to identify vulnerabilities. This approach recasts the problem of implementing a specific static analysis for a target vulnerability as a graph query composition problem, which requires devising coarse-grained graph queries that model vulnerable code patterns. Unfortunately, such coarse-grained queries often leave vulnerabilities caused by faulty input sanitization undetected. In this paper, we propose HiddenCPG, a scalable system designed to identify various web vulnerabilities, including bugs that stem from incorrect sanitization. We designed HiddenCPG to find a subgraph in a target CPG that matches a given CPG query having a known vulnerability; this task is known as the subgraph isomorphism problem. To address the scalability challenge that stems from the NP-complete nature of this problem, HiddenCPG leverages optimization techniques designed to boost the efficiency of matching vulnerable subgraphs. HiddenCPG found confirmed vulnerabilities, including CVEs, among 2,464 potential vulnerabilities in real-world CPGs having a combined total of 1 billion nodes and 1.2 billion edges.
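
To make the matching task concrete, the following is a minimal sketch of labeled subgraph matching by plain backtracking. It is not HiddenCPG's implementation and includes none of its optimizations. The `LabeledGraph` class, the node labels, and the `DATA_FLOW` edge kind are hypothetical names chosen for illustration, and the matcher only requires that every query edge exist in the target (non-induced matching).

```python
from typing import Dict, Iterator, Set, Tuple

Node = str
Edge = Tuple[Node, Node, str]  # (source, destination, edge kind)


class LabeledGraph:
    """Toy stand-in for a CPG: labeled nodes plus typed, directed edges."""

    def __init__(self, labels: Dict[Node, str], edges: Set[Edge]) -> None:
        self.labels = labels
        self.edges = edges


def find_matches(query: LabeledGraph, target: LabeledGraph) -> Iterator[Dict[Node, Node]]:
    """Yield every injective mapping of query nodes onto target nodes that
    preserves node labels and every query edge (naive backtracking)."""
    order = sorted(query.labels)  # fixed visiting order for query nodes

    def consistent(mapping: Dict[Node, Node]) -> bool:
        # Every query edge whose endpoints are both mapped must exist in the target.
        for src, dst, kind in query.edges:
            if src in mapping and dst in mapping:
                if (mapping[src], mapping[dst], kind) not in target.edges:
                    return False
        return True

    def extend(mapping: Dict[Node, Node], used: frozenset) -> Iterator[Dict[Node, Node]]:
        if len(mapping) == len(order):
            yield dict(mapping)
            return
        q = order[len(mapping)]
        for t, label in target.labels.items():
            if t in used or label != query.labels[q]:
                continue  # candidate already taken, or labels differ
            mapping[q] = t
            if consistent(mapping):
                yield from extend(mapping, used | {t})
            del mapping[q]  # backtrack

    yield from extend({}, frozenset())


# Query: user input flows directly into an output call (hypothetical labels).
query = LabeledGraph(
    labels={"q_src": "SOURCE:$_GET", "q_sink": "CALL:echo"},
    edges={("q_src", "q_sink", "DATA_FLOW")},
)

# Target: n1 reaches echo only through a sanitizer; n4 reaches it directly.
target = LabeledGraph(
    labels={
        "n1": "SOURCE:$_GET",
        "n2": "CALL:htmlspecialchars",
        "n3": "CALL:echo",
        "n4": "SOURCE:$_GET",
    },
    edges={
        ("n1", "n2", "DATA_FLOW"),
        ("n2", "n3", "DATA_FLOW"),
        ("n4", "n3", "DATA_FLOW"),
    },
)

matches = list(find_matches(query, target))
print(matches)
```

In this toy target, only the unsanitized flow from `n4` to `n3` matches the query. The search tries every label-compatible candidate for each query node, so its worst-case cost grows exponentially with the query size, which is the scalability problem the abstract refers to.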



Published In

WWW '22: Proceedings of the ACM Web Conference 2022
April 2022
3764 pages
ISBN:9781450390965
DOI:10.1145/3485447
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clone detection
  2. subgraph isomorphism
  3. web vulnerabilities

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Institute of Information & communications Technology Planning & Evaluation (IITP)

Conference

WWW '22: The ACM Web Conference 2022
April 25 - 29, 2022
Virtual Event, Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

  • (2024) Malicious Package Detection using Metadata Information. Proceedings of the ACM Web Conference 2024, 1779–1789. DOI: 10.1145/3589334.3645543. Online publication date: 13-May-2024.
  • (2024) RecurScan: Detecting Recurring Vulnerabilities in PHP Web Applications. Proceedings of the ACM Web Conference 2024, 1746–1755. DOI: 10.1145/3589334.3645530. Online publication date: 13-May-2024.
  • (2024) Enhancing vulnerability detection via AST decomposition and neural sub-tree encoding. Expert Systems with Applications: An International Journal 238:PB. DOI: 10.1016/j.eswa.2023.121865. Online publication date: 27-Feb-2024.
  • (2022) Precise (Un)Affected Version Analysis for Web Vulnerabilities. Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 1–13. DOI: 10.1145/3551349.3556933. Online publication date: 10-Oct-2022.
  • (2022) A Fine-Grained Approach for Vulnerabilities Discovery Using Augmented Vulnerability Signatures. Knowledge Science, Engineering and Management, 27–38. DOI: 10.1007/978-3-031-10989-8_3. Online publication date: 6-Aug-2022.
