Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Multi-relational Instruction Association Graph for Cross-Architecture Binary Similarity Comparison

  • Conference paper
  • First Online:
Security and Privacy in Communication Networks (SecureComm 2022)

Abstract

Cross-architecture binary similarity comparison is essential in many security applications. Recently, researchers have proposed learning-based approaches to improve comparison performance. They adopted a paradigm of instruction pre-training, individual binary encoding, and distance-based similarity comparison. However, instruction embeddings pre-trained on external code corpus are not universal in diverse real-world applications. And separately encoding cross-architecture binaries will accumulate the semantic gap of instruction sets, limiting the comparison accuracy. This paper proposes a novel cross-architecture binary similarity comparison approach with multi-relational instruction association graph. We associate mono-architecture instruction tokens with context relevance and cross-architecture tokens with potential semantic correlations from different perspectives. Then we exploit the relational graph convolutional network (R-GCN) to perform type-specific graph information propagation. Our approach can bridge the gap in the cross-architecture instruction representation spaces while avoiding the external pre-training workload. We conduct extensive experiments on basic block-level and function-level datasets to prove the superiority of our approach. Furthermore, evaluations on a large-scale real-world IoT malware reuse function collection show that our approach is valuable for identifying malware propagated on IoT devices of various architectures.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 99.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 129.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Antonakakis, M., et al.: Understanding the mirai botnet. In: 26th USENIX Security Symposium (USENIX Security 17), pp. 1093–1110 (2017)

    Google Scholar 

  2. Cesare, S., Xiang, Y., Zhou, W.: Control flow-based malware variantdetection. IEEE Trans. Dependable Secure Comput. 11(4), 307–317 (2013)

    Article  Google Scholar 

  3. Chandramohan, M., Xue, Y., Xu, Z., Liu, Y., Cho, C.Y., Tan, H.B.K.: BinGo: cross-architecture cross-OS binary search. In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 678–689 (2016)

    Google Scholar 

  4. Conneau, A., Kiela, D., Schwenk, H., Barrault, L., Bordes, A.: Supervised learning of universal sentence representations from natural language inference data. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, 9–11 September 2017 (2017)

    Google Scholar 

  5. Costin, A., Zaddach, J.: IoT malware: comprehensive survey, analysis framework and case studies. BlackHat USA 1(1), 1–9 (2018)

    Google Scholar 

  6. Cozzi, E., Vervier, P.A., Dell’Amico, M., Shen, Y., Bilge, L., Balzarotti, D.: The tangled genealogy of IoT malware. In: Annual Computer Security Applications Conference, pp. 1–16 (2020)

    Google Scholar 

  7. David, Y., Partush, N., Yahav, E.: Statistical similarity of binaries. ACM SIGPLAN Not. 51(6), 266–280 (2016)

    Article  Google Scholar 

  8. Ding, S.H., Fung, B.C., Charland, P.: Asm2Vec: boosting static representation robustness for binary clone search against code obfuscation and compiler optimization. In: 2019 IEEE Symposium on Security and Privacy (SP), pp. 472–489. IEEE (2019)

    Google Scholar 

  9. Duan, Y., Li, X., Wang, J., Yin, H.: DeepBinDiff: learning program-wide code representations for binary diffing. In: Proceedings of the 27th Annual Network and Distributed System Security Symposium (NDSS 2020) (2020)

    Google Scholar 

  10. Eschweiler, S., Yakdan, K., Gerhards-Padilla, E.: discovRE: efficient cross-architecture identification of bugs in binary code. In: NDSS (2016)

    Google Scholar 

  11. Farhadi, M.R., Fung, B.C., Charland, P., Debbabi, M.: BinClone: detecting code clones in malware. In: 2014 Eighth International Conference on Software Security and Reliability (SERE), pp. 78–87. IEEE (2014)

    Google Scholar 

  12. Feng, Q., Wang, M., Zhang, M., Zhou, R., Henderson, A., Yin, H.: Extracting conditional formulas for cross-platform bug search. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 346–359 (2017)

    Google Scholar 

  13. Feng, Q., Zhou, R., Xu, C., Cheng, Y., Testa, B., Yin, H.: Scalable graph-based bug search for firmware images. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 480–491 (2016)

    Google Scholar 

  14. Fey, M., Lenssen, J.E.: Fast graph representation learning with PyTorch Geometric. In: ICLR Workshop on Representation Learning on Graphs and Manifolds (2019)

    Google Scholar 

  15. Gao, J., Yang, X., Fu, Y., Jiang, Y., Sun, J.: VulSeeker: a semantic learning based vulnerability seeker for cross-platform binary. In: 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 896–899. IEEE (2018)

    Google Scholar 

  16. Herwig, S., Harvey, K., Hughey, G., Roberts, R., Levin, D.: Measurement and analysis of Hajime, a peer-to-peer IoT botnet. In: Network and Distributed Systems Security (NDSS) Symposium (2019)

    Google Scholar 

  17. Hu, X., Shin, K.G., Bhatkar, S., Griffin, K.: MutantX-S: scalable malware clustering based on static features. In: 2013 USENIX Annual Technical Conference (USENIX ATC 2013), pp. 187–198 (2013)

    Google Scholar 

  18. Hu, Y., Zhang, Y., Li, J., Gu, D.: Cross-architecture binary semantics understanding via similar code comparison. In: 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), vol. 1, pp. 57–67. IEEE (2016)

    Google Scholar 

  19. Huang, H., Youssef, A.M., Debbabi, M.: BinSequence: fast, accurate and scalable binary code reuse detection. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 155–166 (2017)

    Google Scholar 

  20. Kargén, U., Shahmehri, N.: Towards robust instruction-level trace alignment of binary code. In: 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 342–352. IEEE (2017)

    Google Scholar 

  21. Khoo, W.M., Mycroft, A., Anderson, R.: Rendezvous: a search engine for binary code. In: 2013 10th Working Conference on Mining Software Repositories (MSR), pp. 329–338. IEEE (2013)

    Google Scholar 

  22. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks (2016)

    Google Scholar 

  23. Lageman, N., Kilmer, E.D., Walls, R.J., McDaniel, P.D.: BinDNN: resilient function matching using deep learning. In: Deng, R., Weng, J., Ren, K., Yegneswaran, V. (eds.) SecureComm 2016. LNICST, vol. 198, pp. 517–537. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59608-2_29

    Chapter  Google Scholar 

  24. Lee, Y.R., Kang, B., Im, E.G.: Function matching-based binary-level software similarity calculation. In: Research in Adaptive and Convergent Systems, RACS 2013, Montreal, QC, Canada, 1–4 October 2013, pp. 322–327. ACM (2013)

    Google Scholar 

  25. python Levenshtein. https://pypi.org/project/python-Levenshtein/

  26. Liang, H., Xie, Z., Chen, Y., Ning, H., Wang, J.: FIT: inspect vulnerabilities in cross-architecture firmware by deep learning and bipartite matching. Comput. Secur. 99, 102032 (2020)

    Article  Google Scholar 

  27. Lindorfer, M., Di Federico, A., Maggi, F., Comparetti, P.M., Zanero, S.: Lines of malicious code: Insights into the malicious software industry. In: Proceedings of the 28th Annual Computer Security Applications Conference, pp. 349–358 (2012)

    Google Scholar 

  28. Liu, B., et al.: \(\alpha \)diff: cross-version binary code similarity detection with DNN. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, pp. 667–678 (2018)

    Google Scholar 

  29. Luo, L., Ming, J., Wu, D., Liu, P., Zhu, S.: Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 389–400 (2014)

    Google Scholar 

  30. Massarelli, L., Di Luna, G.A., Petroni, F., Querzoni, L., Baldoni, R.: Investigating graph embedding neural networks with unsupervised features extraction for binary analysis. In: Proceedings of the 2nd Workshop on Binary Analysis Research (BAR) (2019)

    Google Scholar 

  31. Massarelli, L., Di Luna, G.A., Petroni, F., Baldoni, R., Querzoni, L.: SAFE: self-attentive function embeddings for binary similarity. In: Perdisci, R., Maurice, C., Giacinto, G., Almgren, M. (eds.) DIMVA 2019. LNCS, vol. 11543, pp. 309–329. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22038-9_15

    Chapter  Google Scholar 

  32. Ming, J., Xu, D., Jiang, Y., Wu, D.: BinSim: trace-based semantic binary diffing via system call sliced segment equivalence checking. In: 26th USENIX Security Symposium (USENIX Security 2017), pp. 253–270 (2017)

    Google Scholar 

  33. Ming, J., Xu, D., Wu, D.: Memoized semantics-based binary diffing with application to malware lineage inference. In: Federrath, H., Gollmann, D. (eds.) SEC 2015. IAICT, vol. 455, pp. 416–430. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-18467-8_28

    Chapter  Google Scholar 

  34. Ng, B.H., Prakash, A.: Expose: discovering potential binary code re-use. In: 2013 IEEE 37th Annual Computer Software and Applications Conference, pp. 492–501. IEEE (2013)

    Google Scholar 

  35. Pewny, J., Garmany, B., Gawlik, R., Rossow, C., Holz, T.: Cross-architecture bug search in binary executables. In: 2015 IEEE Symposium on Security and Privacy, pp. 709–724. IEEE (2015)

    Google Scholar 

  36. Pewny, J., Schuster, F., Bernhard, L., Holz, T., Rossow, C.: Leveraging semantic signatures for bug search in binary programs. In: Proceedings of the 30th Annual Computer Security Applications Conference, pp. 406–415 (2014)

    Google Scholar 

  37. Qiao, Y., Yun, X., Zhang, Y.: Fast reused function retrieval method based on simhash and inverted index. In: 2016 IEEE Trustcom/BigDataSE/ISPA, pp. 937–944. IEEE (2016)

    Google Scholar 

  38. radare2. https://www.radare.org/n/radare2.html

  39. Redmond, K., Luo, L., Zeng, Q.: https://github.com/nlp-code-analysis/cross-arch-instr-model/

  40. Redmond, K., Luo, L., Zeng, Q.: A cross-architecture instruction embedding model for natural language processing-inspired binary code analysis. arXiv preprint arXiv:1812.09652 (2018)

  41. Ruttenberg, B., et al.: Identifying shared software components to support malware forensics. In: Dietrich, S. (ed.) DIMVA 2014. LNCS, vol. 8550, pp. 21–40. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08509-8_2

    Chapter  Google Scholar 

  42. Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G.: The graph neural network model. IEEE Trans. Neural Networks 20(1), 61–80 (2008)

    Article  Google Scholar 

  43. Schlichtkrull, M., Kipf, T.N., Bloem, P., van den Berg, R., Titov, I., Welling, M.: Modeling relational data with graph convolutional networks. In: Gangemi, A., et al. (eds.) ESWC 2018. LNCS, vol. 10843, pp. 593–607. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93417-4_38

    Chapter  Google Scholar 

  44. Wang, B., Dou, Y., Sang, Y., Zhang, Y., Huang, J.: IoTCMal: towards a hybrid IoT honeypot for capturing and analyzing malware. In: ICC 2020–2020 IEEE International Conference on Communications (ICC), pp. 1–7. IEEE (2020)

    Google Scholar 

  45. Wang, S., Wu, D.: In-memory fuzzing for binary code similarity analysis. In: 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 319–330. IEEE (2017)

    Google Scholar 

  46. Xu, X., Liu, C., Feng, Q., Yin, H., Song, L., Song, D.: Neural network-based graph embedding for cross-platform binary code similarity detection. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 363–376 (2017)

    Google Scholar 

  47. Xu, Z., Chen, B., Chandramohan, M., Liu, Y., Song, F.: Spain: security patch analysis for binaries towards understanding the pain and pills. In: 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), pp. 462–472. IEEE (2017)

    Google Scholar 

  48. Yu, Z., Cao, R., Tang, Q., Nie, S., Huang, J., Wu, S.: Order matters: semantic-aware neural networks for binary code similarity detection. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 1145–1152 (2020)

    Google Scholar 

  49. Zuo, F., Li, X., Young, P., Luo, L., Zeng, Q., Zhang, Z.: https://nmt4binaries.github.io/

  50. Zuo, F., Li, X., Young, P., Luo, L., Zeng, Q., Zhang, Z.: Neural machine translation inspired binary code similarity comparison beyond function pairs. In: 26th Annual Network and Distributed System Security Symposium, NDSS 2019, San Diego, California, USA, 24–27 February 2019 (2018)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Shuhao Li .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Song, Q., Zhang, Y., Li, S. (2023). Multi-relational Instruction Association Graph for Cross-Architecture Binary Similarity Comparison. In: Li, F., Liang, K., Lin, Z., Katsikas, S.K. (eds) Security and Privacy in Communication Networks. SecureComm 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 462. Springer, Cham. https://doi.org/10.1007/978-3-031-25538-0_11

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-25538-0_11

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25537-3

  • Online ISBN: 978-3-031-25538-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics