DOI: 10.5555/2671225.2671279

BYTEWEIGHT: learning to recognize functions in binary code

Published: 20 August 2014

Abstract

Function identification is a fundamental challenge in reverse engineering and binary program analysis. For instance, binary rewriting and control flow integrity rely on accurate function detection and identification in binaries. Although many binary program analyses assume functions can be identified a priori, identifying functions in stripped binaries remains a challenge.
In this paper, we propose BYTEWEIGHT, a new automatic function identification algorithm. Our approach automatically learns key features for recognizing functions and can therefore easily be adapted to different platforms, new compilers, and new optimizations. We evaluated our tool against three well-known tools that feature function identification: IDA, BAP, and Dyninst. Our data set consists of 2,200 binaries created with three different compilers, at four different optimization levels, and across two different operating systems. In our experiments, BYTEWEIGHT missed 44,621 functions, compared with the 266,672 functions missed by the industry-leading tool IDA. Furthermore, while IDA misidentified 459,247 functions, BYTEWEIGHT misidentified only 43,992 functions.
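
To put the reported counts side by side, the short sketch below computes how many times more functions IDA missed and misidentified than BYTEWEIGHT. It uses only the four figures stated in the abstract. The variable and function names are illustrative and do not come from the paper. Because the abstract does not give the total number of functions in the data set, precision and recall cannot be derived from these figures alone.

```python
# Counts reported in the abstract for the 2,200-binary data set.
# The dictionary and function names are illustrative, not from the paper.
MISSED = {"BYTEWEIGHT": 44_621, "IDA": 266_672}          # functions not found
MISIDENTIFIED = {"BYTEWEIGHT": 43_992, "IDA": 459_247}   # spurious functions reported


def ratio(counts, baseline="IDA", tool="BYTEWEIGHT"):
    """Return how many times larger the baseline's count is than the tool's."""
    return counts[baseline] / counts[tool]


missed_ratio = ratio(MISSED)
misidentified_ratio = ratio(MISIDENTIFIED)

print(f"IDA missed {missed_ratio:.1f}x as many functions as BYTEWEIGHT")
print(f"IDA misidentified {misidentified_ratio:.1f}x as many functions as BYTEWEIGHT")
```

On these figures, IDA missed about six times as many functions as BYTEWEIGHT and misidentified about ten times as many.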


Published In

SEC'14: Proceedings of the 23rd USENIX conference on Security Symposium
August 2014
1067 pages
ISBN:9781931971157
  • Program Chair: Kevin Fu

Sponsors

  • Akamai
  • Google Inc.
  • IBM Research
  • NSF
  • Microsoft Research
  • USENIX Association

Publisher

USENIX Association

United States
