Learning Types for Binaries

  • Conference paper
Formal Methods and Software Engineering (ICFEM 2017)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10610)

Abstract

Type inference for binary code is a challenging problem, due partly to the fact that much type-related information is lost during compilation from high-level source code. Most existing research on binary code type inference resorts to program analysis techniques, which can be too conservative to infer types with high accuracy or too heavyweight to be viable in practice. In this paper, we propose a new approach to learning types for recovered variables from their related representative instructions. Our idea is motivated by “duck typing”, where the type of a variable is determined by its features and properties. Our approach first learns a classifier from existing binaries with debug information and then uses this classifier to predict types for new, unseen binaries. We have implemented our approach in a tool called BITY and used it to conduct experiments on the well-known benchmark coreutils (v8.4). The results show that our tool is more precise than the commercial tool Hex-Rays, both in terms of correct types and compatible types.
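The learn-then-predict workflow described in the abstract can be illustrated with a minimal sketch. It is not BITY's implementation: the classifier here is a simple nearest-centroid model with cosine similarity, swapped in so that the example needs only the Python standard library, and the function names (`featurize`, `train`, `predict`), the mnemonic-count features, and the tiny training set are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def featurize(instructions):
    """Turn a variable's related instructions into an L2-normalised
    bag-of-mnemonics vector (assumed feature choice, for illustration)."""
    counts = Counter(ins.split()[0] for ins in instructions)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {m: c / norm for m, c in counts.items()}


def train(labelled):
    """Average the feature vectors of each type into one centroid per type.
    `labelled` plays the role of variables taken from binaries with debug info."""
    sums = defaultdict(Counter)
    sizes = Counter()
    for instructions, type_name in labelled:
        for mnemonic, weight in featurize(instructions).items():
            sums[type_name][mnemonic] += weight
        sizes[type_name] += 1
    return {t: {m: w / sizes[t] for m, w in vec.items()} for t, vec in sums.items()}


def predict(centroids, instructions):
    """Return the type whose centroid is most cosine-similar to the variable."""
    vec = featurize(instructions)

    def score(centroid):
        dot = sum(w * centroid.get(m, 0.0) for m, w in vec.items())
        norm = math.sqrt(sum(w * w for w in centroid.values()))
        return dot / norm if norm else 0.0

    return max(centroids, key=lambda t: score(centroids[t]))


# Hypothetical training data: (instructions touching a variable, its debug-info type).
TRAINING = [
    (["mov eax, [ebp-8]", "add eax, 1", "cmp eax, 10"], "int"),
    (["mov eax, [ebp-12]", "imul eax, 4", "sub eax, 2"], "int"),
    (["fld dword [ebp-16]", "fadd dword [ebp-20]", "fstp dword [ebp-16]"], "float"),
    (["fld dword [ebp-4]", "fmul dword [ebp-8]", "fstp dword [ebp-4]"], "float"),
    (["movzx eax, byte [ebp-1]", "cmp al, 0"], "char"),
    (["movsx eax, byte [ebp-2]", "mov byte [ebp-3], al"], "char"),
]

CENTROIDS = train(TRAINING)

if __name__ == "__main__":
    # A variable from a new, stripped binary, seen only through its instructions.
    print(predict(CENTROIDS, ["fld dword [ebp-8]", "fstp dword [ebp-12]"]))
```

The sketch mirrors the duck-typing intuition: a variable is typed by how the instructions around it behave, not by any declared type. A production classifier would use richer features and a stronger model than the centroid comparison assumed here.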

Notes

  1. In the FASTCALL convention, the first two parameters are passed in ECX and EDX.

  2. Hex-Rays makes use of debug information, so we run both our tool and Hex-Rays on stripped binaries.

  3. Theoretically, we can use the ratio of the number of common levels to the maximum number of levels between \(t_1\) and \(t_2\) [4]. Since we consider 3 levels in practice, we use one half here.

References

  1. Lin, Z., Zhang, X., Xu, D.: Automatic reverse engineering of data structures from binary execution. In: Network and Distributed System Security Symposium (2010)

  2. Lee, J.H., Avgerinos, T., Brumley, D.: Tie: principled reverse engineering of types in binary programs. In: Network and Distributed System Security Symposium (2011)

  3. Fokin, A., Derevenetc, E., Chernov, A., Troshina, K.: SmartDec: approaching C++ decompilation. In: Reverse Engineering, pp. 347–356 (2011)

  4. Elwazeer, K., Anand, K., Kotha, A., Smithson, M., Barua, R.: Scalable variable and data type detection in a binary rewriter. In: ACM Sigplan Conference on Programming Language Design and Implementation, pp. 51–60 (2013)

  5. Noonan, M., Loginov, A., Cok, D.: Polymorphic type inference for machine code. In: ACM Sigplan Conference on Programming Language Design and Implementation, pp. 27–41 (2016)

  6. The IDA Pro and Hex-Rays. http://www.hex-rays.com/idapro/

  7. Balakrishnan, G., Reps, T.: Analyzing memory accesses in x86 binary executables. University of Wisconsin-Madison Department of Computer Sciences (2012)

  8. Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Disc. 2(2), 121–167 (1998)

  9. Smola, A.J., Schölkopf, B.: On a kernel-based method for pattern recognition, regression, approximation, and operator inversion. Algorithmica 22(1), 211–231 (1998)

  10. Intel Corporation: Intel 64 and IA-32 Architectures Software Developer Manuals, December 2016

  11. Crnic, J.: Introduction to Modern Information Retrieval. McGraw-Hill, New York (1983)

  12. Salton, G.: A vector space model for automatic indexing. Commun. ACM 18(11), 613–620 (1975)

  13. Kang, S., Cho, S., Kang, P.: Constructing a multi-class classifier using one-against-one approach with different binary classifiers. Neurocomputing 149(PB), 677–682 (2015)

  14. LIBSVM. http://www.csie.ntu.edu.tw/~cjlin/libsvm/

  15. 178 Algorithm C language source code. http://www.codeforge.com/article/220463

  16. Xu, S.: Commonly Used Algorithm Assembly (C Language Description). Tsinghua University Press, Beijing (2004). (in Chinese)

  17. Robbins, E., Howe, J.M., King, A.: Theory propagation and rational-trees. In: Symposium on Principles and Practice of Declarative Programming, pp. 193–204 (2013)

  18. Caballero, J., Lin, Z.: Type inference on executables. ACM Comput. Surv. 48(4), 65 (2016)

  19. Zhang, M., Prakash, A., Li, X., Liang, Z., Yin, H.: Identifying and analyzing pointer misuses for sophisticated memory-corruption exploit diagnosis. Proc. West. Pharmacol. Soc. 47(47), 46–49 (2013)

  20. Yan, Q., McCamant, S.: Conservative signed/unsigned type inference for binaries using minimum cut. Technical report, University of Minnesota (2014)

  21. Slowinska, A., Stancescu, T., Bos, H.: Howard: a dynamic excavator for reverse engineering data structures. In: Network and Distributed System Security Symposium (2011)

  22. Elwazeer, K., Anand, K., Kotha, A., Smithson, M., Barua, R.: Artiste: automatic generation of hybrid data structure signatures from binary code executions. Technical report TRIMDEA-SW-2012-001, IMDEA Software Institute (2012)

  23. Haller, I., Slowinska, A., Bos, H.: MemPick: high-level data structure detection in C/C++ binaries. In: Reverse Engineering, pp. 32–41 (2013)

  24. Jin, W., Cohen, C., Gennari, J., Hines, C., Chaki, S., Gurfinkel, A., Havrilla, J., Narasimhan, P.: Recovering C++ objects from binaries using inter-procedural data-flow analysis. In: ACM Sigplan on Program Protection and Reverse Engineering Workshop, p. 1 (2014)

  25. Yoo, K., Barua, R.: Recovery of object oriented features from C++ binaries. In: Asia-Pacific Software Engineering Conference, pp. 231–238 (2014)

  26. Katz, O., El-Yaniv, R., Yahav, E.: Estimating types in binaries using predictive modeling. In: ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 313–326 (2016)

  27. Raychev, V., Vechev, M., Krause, A.: Predicting program properties from “big code”. In: The ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 111–124 (2015)

Acknowledgements

The authors would like to thank the anonymous reviewers for their helpful comments. This work was partially supported by the National Natural Science Foundation of China under Grants No. 61502308 and 61373033, and by the Science and Technology Foundation of Shenzhen City under Grant No. JCYJ20170302153712968.

Author information

Corresponding author

Correspondence to Zhiwu Xu.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Xu, Z., Wen, C., Qin, S. (2017). Learning Types for Binaries. In: Duan, Z., Ong, L. (eds) Formal Methods and Software Engineering. ICFEM 2017. Lecture Notes in Computer Science, vol 10610. Springer, Cham. https://doi.org/10.1007/978-3-319-68690-5_26

  • DOI: https://doi.org/10.1007/978-3-319-68690-5_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68689-9

  • Online ISBN: 978-3-319-68690-5

  • eBook Packages: Computer Science, Computer Science (R0)
