poster

Towards Robust Detection of PDF-based Malware

Authors:

Kai Yuan Tay,

Shawn Chua,

Melissa Chua,

Vivek Balachandran

CODASPY '22: Proceedings of the Twelfth ACM Conference on Data and Application Security and Privacy

Pages 370 - 372

https://doi.org/10.1145/3508398.3519365

Published: 15 April 2022

Abstract

With the indisputable prevalence of PDFs, several studies of PDF malware and its evasive variants have been conducted to test the robustness of the ML-based PDF classifier frameworks Hidost and Mimicus. As heavily documented, the fundamental difference between them is that Hidost investigates the logical structure of PDFs, while Mimicus detects malicious indicators through their structural features. However, there exist techniques to mutate such features so that malicious PDFs are able to bypass these classifiers. In this work, we investigated three known attacks, Mimicry, Mimicry+, and Reverse Mimicry, to compare how effective they are in evading the classifiers in Hidost and Mimicus. The results show that Mimicry and Mimicry+ are effective in bypassing models in Mimicus but not in Hidost, while Reverse Mimicry is effective against models in both Mimicus and Hidost.
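The comparison described in the abstract can be pictured as measuring how often a classifier still flags malicious samples after an attack has modified them. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: `toy_classifier`, the sample byte strings, and the function names are all hypothetical stand-ins, and neither Hidost nor Mimicus is actually invoked.

```python
from typing import Callable, Sequence


def detection_rate(classify: Callable[[bytes], bool],
                   samples: Sequence[bytes]) -> float:
    """Fraction of (known-malicious) samples that the classifier flags."""
    if not samples:
        raise ValueError("samples must be non-empty")
    flagged = sum(1 for sample in samples if classify(sample))
    return flagged / len(samples)


def evasion_rate(classify: Callable[[bytes], bool],
                 adversarial_samples: Sequence[bytes]) -> float:
    """Fraction of adversarial malicious samples that are NOT flagged."""
    return 1.0 - detection_rate(classify, adversarial_samples)


def toy_classifier(pdf_bytes: bytes) -> bool:
    """Hypothetical stand-in for a real model: flags a single keyword."""
    return b"/JavaScript" in pdf_bytes


# Hypothetical data: four unmodified malicious samples ...
original = [b"%PDF /JavaScript a", b"%PDF /JavaScript b",
            b"%PDF /JavaScript c", b"%PDF /JavaScript d"]
# ... and four variants, three of which no longer expose the keyword.
adversarial = [b"%PDF /JavaScript a", b"%PDF /JS b",
               b"%PDF /JS c", b"%PDF /JS d"]

baseline = detection_rate(toy_classifier, original)
evaded = evasion_rate(toy_classifier, adversarial)
print(f"baseline detection: {baseline:.2f}, evasion after attack: {evaded:.2f}")
```

In a real study, `classify` would wrap a trained model from the framework under test, and the same two rates would be computed per attack and per framework.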

Supplementary Material

MP4 File (codaspy_pdfmalware_vid.mp4)

In this video, we introduce our paper titled Towards Robust Detection of PDF-based Malware. In the first part of the video, we highlight the prevalence of PDFs in enterprise systems, and how adversaries have picked up on the trend and devised methods to propagate malware through PDFs. Subsequently, we describe the methodology, in which we use the machine learning-based PDF classifier frameworks Hidost and Mimicus to classify both original malware and malware manipulated by the three adversarial attacks: Mimicry, Mimicry+, and Reverse Mimicry. We then show how the classification accuracy of Hidost and Mimicus was affected by the adversarial PDFs and discuss our analysis of the results, followed by possible improvements and our concluding statement.


Cited By

Abu Al-Haija Q, Odeh A, Qattous H (2022). PDF Malware Detection Based on Optimizable Decision Trees. Electronics 11:19 (3142). Online publication date: 30-Sep-2022. https://doi.org/10.3390/electronics11193142

Index Terms

Towards Robust Detection of PDF-based Malware
1. Computing methodologies
  1. Machine learning
2. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Malware and its mitigation


Information

Published In

CODASPY '22: Proceedings of the Twelfth ACM Conference on Data and Application Security and Privacy

April 2022

392 pages

ISBN: 9781450392204

DOI: 10.1145/3508398

General Chair: Anupam Joshi, University of Maryland, Baltimore County, USA

Program Chairs: Maribel Fernandez, King's College London, UK; Rakesh M. Verma, University of Houston, USA

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Poster

Conference

CODASPY '22

Sponsor:

SIGSAC

CODASPY '22: Twelfth ACM Conference on Data and Application Security and Privacy

April 24 - 27, 2022

Baltimore, MD, USA

Acceptance Rates

Overall Acceptance Rate 149 of 789 submissions, 19%


Bibliometrics

Total Citations: 1
Total Downloads: 135
Downloads (last 12 months): 29
Downloads (last 6 weeks): 3

Reflects downloads up to 24 Dec 2024
