Research article
DOI: 10.1145/3308558.3313752

Anything to Hide? Studying Minified and Obfuscated Code in the Web

Published: 13 May 2019

Abstract

JavaScript has been used for various attacks on client-side web applications. Code minification and code obfuscation may hide the behavior of a script, which hinders both manual and automated analysis from detecting malicious scripts. Unfortunately, little is currently known about how real-world websites use such code transformations. This paper presents an empirical study of obfuscation and minification in 967,149 scripts (424,023 unique) from the top 100,000 websites. The core of our study is a highly accurate (95%-100%) neural network-based classifier that we train to identify whether obfuscation or minification has been applied and, if so, with which tools. We find that code transformations are very widespread, affecting 38% of all scripts. Most of the transformed code has been minified, whereas advanced obfuscation techniques, such as encoding parts of the code or fetching all strings from a global array, affect less than 1% of all scripts (2,842 unique scripts in total). Studying which code gets obfuscated, we find that obfuscation is particularly common in certain website categories, e.g., adult content. Further analysis of the obfuscated code shows that most of it is similar to the output produced by a single obfuscation tool and that some obfuscated scripts trigger suspicious behavior, such as likely fingerprinting and timing attacks. Finally, we show that obfuscation comes at a cost, because it slows down execution and risks producing code whose behavior differs from the intended behavior. Overall, our study shows that the security community must consider minified and obfuscated JavaScript code, and it provides insights into what kinds of transformations to focus on. Our learned classifiers provide an automated and accurate way to identify obfuscated code, and we release a set of real-world obfuscated scripts for future research.
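
The transformations named in the abstract (minification, fetching strings from a global array, and encoding parts of the code) can be illustrated with a small sketch. Everything below is a hand-written assumption for illustration: the function names, the array name `_0x1a2b`, and the transformed variants are hypothetical and are not output from any tool examined in the study.

```javascript
// Original, human-readable source.
function greetOriginal(name) {
  const greeting = "Hello, ";
  return greeting + name + "!";
}

// Minified (hand-written): whitespace dropped, parameter shortened,
// constant inlined. String literals remain readable.
function greetMinified(n){return"Hello, "+n+"!"}

// Obfuscated (hand-written): string literals are moved into a global
// array and fetched by index. The array name is hypothetical.
var _0x1a2b = ["Hello, ", "!"];
function greetObfuscated(_0x3c) {
  return _0x1a2b[0x0] + _0x3c + _0x1a2b[0x1];
}

// Encoded (hand-written): the function body is stored as character codes
// and only turned back into executable code at run time.
const bodySource = 'return "Hello, " + n + "!";';
const encodedBody = Array.from(bodySource, (ch) => ch.charCodeAt(0));
const greetEncoded = new Function("n", String.fromCharCode(...encodedBody));

// All four variants behave identically; only their readability differs.
for (const greet of [greetOriginal, greetMinified, greetObfuscated, greetEncoded]) {
  console.log(greet("Web")); // prints "Hello, Web!"
}
```

The minified variant still exposes its string literals to a reader, whereas the array-based and encoded variants remove them from the place where they are used. This is one plausible reason why such techniques make both manual inspection and signature-style analysis harder.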

Published In

WWW '19: The World Wide Web Conference
May 2019
3620 pages
ISBN:9781450366748
DOI:10.1145/3308558
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. empirical study
  2. machine learning
  3. obfuscation
  4. web security

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '19: The Web Conference
May 13 - 17, 2019
San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
