StackBERT: Machine Learning Assisted Static Stack Frame Size Recovery on Stripped and Optimized Binaries

Authors: Chinmay Deshpande, David Gens, Michael Franz

AISec '21: Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security

Pages 85 - 95

https://doi.org/10.1145/3474369.3486865

Published: 15 November 2021

Abstract

The call stack is one of the core abstractions that compiler-generated programs use to organize binary execution at runtime. For many use cases, reasoning about the stack accesses of binary functions is crucial: security-sensitive applications may require patching even after deployment, and binary instrumentation, rewriting, and lifting all require detailed knowledge of the function frame layout of the affected program. Because no comprehensive solution to the stack symbolization problem exists to date, existing approaches resort to workarounds such as emulated stack environments, which increase runtime overhead.

In this paper we present StackBERT, a framework to statically reason about and reliably recover stack frame information of binary functions in stripped and highly optimized programs. The core idea behind our approach is to formulate binary analysis as a self-supervised learning problem by automatically generating ground truth data from a large corpus of open-source programs. We train a state-of-the-art Transformer model with self-attention and fine-tune it for stack frame size prediction. We show that our fine-tuned model yields highly accurate estimates of a binary function's stack size from its function body alone across different instruction-set architectures, compiler toolchains, and optimization levels. We verify the static estimates against runtime data through dynamic executions of standard benchmarks and additional studies, demonstrating that StackBERT's predictions generalize to 93.44% of stripped and highly optimized test binaries not seen during training. We expect these results to help improve binary rewriting and lifting approaches in the future.
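
To make the prediction target concrete, the following minimal sketch shows a naive pattern-matching baseline for reading a frame size out of x86-64 disassembly. It is not StackBERT's method and is not taken from the paper: the function name `naive_frame_size`, the Intel-syntax input format, and the choice to count prologue pushes plus the first `sub rsp, imm` as the "frame size" are all assumptions made for illustration.

```python
import re

# Hypothetical helper patterns (not from the paper); Intel syntax is assumed.
_PUSH = re.compile(r"^\s*push\s+\w+\s*$")
_SET_FP = re.compile(r"^\s*mov\s+rbp\s*,\s*rsp\s*$")
_SUB_RSP = re.compile(r"^\s*sub\s+rsp\s*,\s*(0x[0-9a-fA-F]+|\d+)\s*$")


def naive_frame_size(disassembly: list[str]) -> int:
    """Estimate the bytes a function prologue reserves on the stack.

    Counts 8 bytes per leading `push`, skips `mov rbp, rsp`, adds the
    immediate of the first `sub rsp, imm`, and stops at any other
    instruction. This is an illustrative heuristic only.
    """
    size = 0
    for line in disassembly:
        if _PUSH.match(line):
            size += 8  # each push moves rsp down by one 8-byte slot
            continue
        if _SET_FP.match(line):
            continue  # frame-pointer setup does not change rsp
        match = _SUB_RSP.match(line)
        if match:
            token = match.group(1)
            base = 16 if token.lower().startswith("0x") else 10
            size += int(token, base)
        break  # the prologue ends at the first sub rsp or other instruction
    return size


if __name__ == "__main__":
    framed = ["push rbp", "mov rbp, rsp", "sub rsp, 0x20", "mov eax, 0", "leave", "ret"]
    leaf = ["mov eax, edi", "add eax, esi", "ret"]
    print(naive_frame_size(framed))  # 8 bytes for the push + 32 for the sub
    print(naive_frame_size(leaf))    # no prologue adjustment found
```

A heuristic like this only sees explicit prologue adjustments. It misses, for example, leaf functions that use the System V x86-64 red zone without moving `rsp`, and it cannot follow stack adjustments made later in the function body, which is the kind of gap a learned model operating on the whole function body is meant to address.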

Supplementary Material

MP4 File (AISec21-fp21.mp4)

In this talk, we present StackBERT, a framework to statically reason about and reliably recover stack frame information of binary functions in stripped and optimized binaries. We observe that the function call stack is a critical abstraction for binary lifting engines to reason about. To aid this task, we focus on statically predicting a function's stack frame size given its raw disassembly. StackBERT formulates this as a supervised learning problem by automatically generating ground truth data from a large corpus of open-source programs. We train a state-of-the-art Transformer model and fine-tune it for stack frame size prediction. We show that our fine-tuned model yields highly accurate estimates of a binary function's stack size from its function body alone across different instruction-set architectures, compiler toolchains, and optimization levels. We demonstrate that StackBERT's predictions generalize to 93.44% of test binaries not seen during training.

