Program comprehension helps analysts understand the primary behavior of a binary and makes reverse engineering more efficient. Existing work focuses on instruction translation and function name prediction, which offers limited insight into the program as a whole. Recovered source file information can reveal the primary behavior of a binary and serve as a high-level program summary. However, the files recovered by function-clustering-based approaches contain discontinuously distributed binary functions, which results in low accuracy. Moreover, no existing research addresses predicting the names of these recovered files. To this end, we propose DeLink, a framework for recovering source file information from binaries. DeLink first applies a file structure recovery approach based on boundary location to identify the files within a binary, and then uses an encoder-decoder model to predict the names of these files. Experimental results show that our file structure recovery approach achieves an average improvement of 14% across six evaluation metrics and requires only 16.74 seconds on average, outperforming the state-of-the-art work in both recovery quality and efficiency. In addition, our file name prediction model achieves 70.09% precision and 63.91% recall. Finally, we demonstrate the effective application of DeLink to malware homology analysis.

