Detecting Android Malware and Classifying Its Families in Large-scale Datasets

Published: 18 October 2021

Abstract

To relieve the burden on security analysts, Android malware detection and family classification need to be automated. Many previous works have applied machine (or deep) learning to these two problems, but as the number of mobile applications has grown in recent years, developing a scalable and precise solution has become a new challenge for the security field. Accordingly, in this article, we propose a novel approach that not only improves the performance of both Android malware detection and family classification, but also reduces the running time of the analysis process. Using large-scale datasets obtained from different sources, we demonstrate that our method achieves a high F-measure of 99.71% with a low false positive rate (FPR) of 0.37%. Meanwhile, the computation time for processing a 300K-sample dataset is reduced to nearly 3.3 hours. In addition, in the classification evaluation, we demonstrate that the F-measure, precision, and recall are 97.5%, 96.55%, and 98.64%, respectively, when classifying 28 malware families. Finally, we compare our method with previous studies in both the detection and classification evaluations, and observe that our method performs better in terms of both effectiveness and efficiency.
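
As a quick sanity check on the family-classification figures above, the F-measure is conventionally the harmonic mean of precision and recall. The sketch below assumes that definition; the abstract does not state how the metrics were averaged across the 28 families, and the helper name `f_measure` is illustrative rather than taken from the article.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall quoted in the abstract for family classification.
precision, recall = 0.9655, 0.9864

# Assumption: a single harmonic mean over the reported aggregate values.
f1 = f_measure(precision, recall)
print(f"F-measure from reported precision/recall: {f1:.2%}")
```

Under this assumption the result is roughly 97.6%, close to the reported 97.5%. A small gap of this kind would be expected if the published values were computed per family and then averaged.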

Published In

ACM Transactions on Management Information Systems, Volume 13, Issue 2
June 2022
261 pages
ISSN:2158-656X
EISSN:2158-6578
DOI:10.1145/3483345
Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 October 2021
Accepted: 01 April 2021
Revised: 01 March 2021
Received: 01 November 2019
Published in TMIS Volume 13, Issue 2

Author Tags

  1. Android malware
  2. machine learning
  3. deep learning
  4. natural language processing
  5. semantic-aware

Qualifiers

  • Research-article
  • Refereed

Cited By

  • (2023) Android Malware Category and Family Classification Using Static Analysis. 2023 International Conference on Information Networking (ICOIN), 162-167. DOI: 10.1109/ICOIN56518.2023.10049039. Online publication date: 11-Jan-2023.
  • (2022) Lightweight Privacy-Preserving Ride-Sharing Protocols for Autonomous Cars. Proceedings of the 6th ACM Computer Science in Cars Symposium, 1-11. DOI: 10.1145/3568160.3570234. Online publication date: 8-Dec-2022.
  • (2022) Family Classification of Malicious Applications using Hybrid Analysis and Computationally Economical Machine Learning Techniques. 2022 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), 442-449. DOI: 10.1109/WI-IAT55865.2022.00072. Online publication date: Nov-2022.
  • (2022) Ensemble Framework Combining Family Information for Android Malware Detection. The Computer Journal 66:11, 2721-2740. DOI: 10.1093/comjnl/bxac114. Online publication date: 20-Aug-2022.
