Abstract
Vulnerability detection has long been an important issue in software security. Existing methods mainly rely on experts to define vulnerability rules and features, which is time-consuming and laborious and often yields poor accuracy. Automatic vulnerability detection methods based on code representation graphs and Graph Neural Networks (GNNs) have therefore been proposed. Because they effectively capture both the semantic and the structural information of source code, they perform better. However, these methods ignore the redundant information in the graph and in the GNN model, so their performance remains unsatisfactory. To alleviate this problem, we propose an attention-based automatic vulnerability detection approach built on the Gated Graph Sequence Neural Network (GGNN). First, we introduce two preprocessing methods, pruning and symbolization, to reduce the redundant information in the input code representation graph, and then feed the graph into the GGNN layer to update the node features. Next, attention-based pooling layers extract key subgraphs and aggregate global features. Finally, a linear classifier produces the classification result. The experimental results demonstrate the effectiveness of the proposed preprocessing methods and attention-based pooling layers, in particular higher Accuracy and F1-score than state-of-the-art automatic vulnerability detection approaches.
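To make the pipeline stages above concrete, here is a minimal, stdlib-only sketch: symbolization, one gated (GRU-style) propagation step, top-k attention pooling for key-node selection, an attention readout, and a linear classifier. It is not the authors' implementation, and several parts are assumptions made for brevity. The keyword whitelist `KEEP` is hypothetical. The GRU uses scalar (shared, diagonal) weights instead of full matrices. Messages are plain neighbour sums without edge-type matrices. Node scores come from a linear projection rather than the GCN used in Self-Attention Graph Pooling. Pruning is omitted, and all parameters are hand-set instead of learned.

```python
import math
import re

# Assumption: a small whitelist of C keywords and library calls kept verbatim.
KEEP = {"int", "char", "void", "if", "else", "for", "while", "return",
        "sizeof", "strcpy", "memcpy", "malloc", "free"}


def symbolize(code):
    """Replace user-defined identifiers with generic names (FUN1, VAR1, ...)."""
    names = {}
    counters = {"FUN": 0, "VAR": 0}

    def repl(match):
        ident, call = match.group(1), match.group(2) or ""
        if ident in KEEP:
            return match.group(0)
        if ident not in names:
            kind = "FUN" if call else "VAR"  # followed by "(" => function name
            counters[kind] += 1
            names[ident] = f"{kind}{counters[kind]}"
        return names[ident] + call

    return re.sub(r"\b([A-Za-z_]\w*)\b(\s*\()?", repl, code)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ggnn_step(h, edges, p):
    """One gated propagation step. h: node vectors; edges: (src, dst) pairs.

    Simplification: element-wise GRU with scalar weights p[...] instead of
    the full weight matrices of a real GGNN layer.
    """
    d = len(h[0])
    msg = [[0.0] * d for _ in h]
    for src, dst in edges:  # sum incoming messages along directed edges
        for k in range(d):
            msg[dst][k] += h[src][k]
    out = []
    for hv, mv in zip(h, msg):
        z = [sigmoid(p["wz"] * m + p["uz"] * x) for m, x in zip(mv, hv)]  # update gate
        r = [sigmoid(p["wr"] * m + p["ur"] * x) for m, x in zip(mv, hv)]  # reset gate
        cand = [math.tanh(p["w"] * m + p["u"] * ri * x) for m, ri, x in zip(mv, r, hv)]
        out.append([(1 - zi) * x + zi * c for zi, x, c in zip(z, hv, cand)])
    return out


def topk_pool(h, w, ratio=0.5):
    """Keep the ceil(ratio * N) highest-scoring nodes, gated by tanh(score)."""
    scores = [sum(a * b for a, b in zip(hv, w)) for hv in h]
    k = max(1, math.ceil(ratio * len(h)))
    keep = sorted(range(len(h)), key=lambda i: scores[i], reverse=True)[:k]
    keep.sort()  # preserve original node order
    return [[x * math.tanh(scores[i]) for x in h[i]] for i in keep], keep


def attention_pool(h, w):
    """Soft-attention readout: softmax over h_v . w, then a weighted sum."""
    scores = [sum(a * b for a, b in zip(hv, w)) for hv in h]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    alphas = [e / sum(exps) for e in exps]
    d = len(h[0])
    return [sum(a * hv[k] for a, hv in zip(alphas, h)) for k in range(d)], alphas


def classify(g, weights, bias):
    """Linear layer plus sigmoid: estimated probability of 'vulnerable'."""
    return sigmoid(sum(a * b for a, b in zip(g, weights)) + bias)


print(symbolize("int foo(char *buf) { char dst[10]; strcpy(dst, buf); return 0; }"))
# prints: int FUN1(char *VAR1) { char VAR2[10]; strcpy(VAR2, VAR1); return 0; }

# Toy 3-node graph with 2-dimensional features and hand-set parameters.
h0 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
edges = [(0, 1), (1, 2), (2, 0)]
params = {"wz": 0.5, "uz": 0.5, "wr": 0.5, "ur": 0.5, "w": 1.0, "u": 1.0}
h1 = ggnn_step(h0, edges, params)
sub, kept = topk_pool(h1, [1.0, -1.0], ratio=0.5)
g, alphas = attention_pool(sub, [0.3, 0.7])
prob = classify(g, [1.0, -1.0], 0.0)
```

In a full model, the edge list would also be filtered to the kept nodes before any further propagation. In practice, each stage would be a trainable layer in a GNN library rather than hand-written loops.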
Acknowledgements
This work was supported by Nature Science Foundation of China (Grant No. 61872104), Fundamental Research Funds for the Central Universities in China (Grant No. 3072020CF0603), National Natural Science Foundation of China (Grant No. 62106150), CAAC Key Laboratory of Civil Aviation Wide Surveillance and Safety Operation Management and Control Technology (Grant No. 202102), and CCF-NSFOCUS (Grant No. 2021001).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tang, G., Yang, L., Zhang, L. et al. An attention-based automatic vulnerability detection approach with GGNN. Int. J. Mach. Learn. & Cyber. 14, 3113–3127 (2023). https://doi.org/10.1007/s13042-023-01824-7
DOI: https://doi.org/10.1007/s13042-023-01824-7