Content-Agnostic Malware Detection in Heterogeneous Malicious Distribution Graph

Authors:

Ibrahim Alabdulmohsin,

XiangLiang Zhang

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management

Pages 2395 - 2400

https://doi.org/10.1145/2983323.2983700

Published: 24 October 2016

Abstract

Malware detection has been widely studied by analysing either file-dropping relationships or characteristics of the file distribution network. This paper, for the first time, studies a global heterogeneous malware delivery graph that fuses file-dropping relationships with the topology of the file distribution network. The integration offers a unique ability to structure the end-to-end distribution relationship. However, it produces large heterogeneous graphs to analyse. In our study, an average daily graph has more than 4 million edges and 2.7 million nodes of different types, such as IPs, URLs, and files. We propose a novel Bayesian label propagation model to unify the multi-source information, including content-agnostic features of different node types and topological information of the heterogeneous network. Our approach needs neither to examine the source code nor to inspect the dynamic behaviour of a binary. Instead, it estimates the maliciousness of a given file through a semi-supervised label propagation procedure, which has linear time complexity with respect to the number of nodes and edges. The evaluation on 567 million real-world download events validates that our proposed approach efficiently detects malware with high accuracy.
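
To make the idea of semi-supervised label propagation over a mixed graph of IPs, URLs, and files concrete, the following is a minimal sketch of a generic propagation scheme. It is not the Bayesian model proposed in the paper, whose details are not given in the abstract. The function name `propagate_labels`, the mixing weight `alpha`, the default prior of 0.5, and the toy node names are all assumptions made for illustration. Each sweep visits every node once and every edge twice, so the cost per iteration grows linearly with the number of nodes and edges.

```python
from collections import defaultdict


def propagate_labels(edges, priors, seeds, alpha=0.5, iterations=20):
    """Generic label propagation sketch (hypothetical, not the paper's model).

    edges  : iterable of (u, v) undirected pairs linking files, URLs, and IPs
    priors : dict node -> prior maliciousness in [0, 1], e.g. derived from
             content-agnostic features; nodes without a prior default to 0.5
    seeds  : dict node -> known label (1.0 malicious, 0.0 benign), kept fixed
    """
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    nodes = set(neighbours) | set(priors) | set(seeds)
    scores = {n: seeds.get(n, priors.get(n, 0.5)) for n in nodes}

    for _ in range(iterations):
        updated = {}
        for n in nodes:
            if n in seeds:
                # Labelled nodes are clamped to their known value.
                updated[n] = seeds[n]
                continue
            nbrs = neighbours.get(n, [])
            if not nbrs:
                # Isolated nodes keep their current score.
                updated[n] = scores[n]
                continue
            mean = sum(scores[m] for m in nbrs) / len(nbrs)
            # Blend the node's own prior with the mean neighbour score.
            updated[n] = (1 - alpha) * priors.get(n, 0.5) + alpha * mean
        scores = updated
    return scores


# Toy graph: two IPs, two URLs, three files (all names are made up).
edges = [
    ("ip:1", "url:a"), ("url:a", "file:x"), ("url:a", "file:y"),
    ("ip:2", "url:b"), ("url:b", "file:z"),
]
seeds = {"file:x": 1.0, "file:z": 0.0}  # one known-bad and one known-good file
scores = propagate_labels(edges, priors={}, seeds=seeds)
```

In this toy graph the unlabelled `file:y` shares a URL with a known-malicious file, so its score drifts above the neutral prior, while `url:b` and `ip:2`, which are connected only to a known-benign file, drift below it.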

Published In

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management

October 2016

2566 pages

ISBN:9781450340731

DOI:10.1145/2983323

General Chairs:
Snehasis Mukhopadhyay, Indiana University Purdue University Indianapolis, USA
ChengXiang Zhai, University of Illinois at Urbana-Champaign, USA

Program Chairs:
Elisa Bertino, Purdue University
Fabio Crestani, University of Lugano
Javed Mostafa, University of North Carolina
Jie Tang, Tsinghua University
Luo Si, Alibaba Group Inc & Purdue University
Xiaofang Zhou, University of Queensland
Yi Chang, Yahoo Research
Yunyao Li, IBM Research - Almaden
Parikshit Sondhi, WalmartLabs

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CIKM'16: ACM Conference on Information and Knowledge Management

October 24 - 28, 2016

Indianapolis, Indiana, USA

Acceptance Rates

CIKM '16 Paper Acceptance Rate 160 of 701 submissions, 23%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Total Citations: 9
Total Downloads: 205
Downloads (Last 12 months): 7
Downloads (Last 6 weeks): 1

Reflects downloads up to 09 Nov 2024

Cited By

Mvula P, Branco P, Jourdan G, Viktor H (2024). A Survey on the Applications of Semi-Supervised Learning to Cyber-Security. ACM Computing Surveys. Online publication date: 11-Apr-2024.
https://dl.acm.org/doi/10.1145/3657647
Yan B, Yang C, Shi C, Fang Y, Li Q, Ye Y, Du J (2023). Graph Mining for Cybersecurity: A Survey. ACM Transactions on Knowledge Discovery from Data 18:2 (1-52). Online publication date: 13-Nov-2023.
https://dl.acm.org/doi/10.1145/3610228
Chae D, Park S, Kim E, Hong J, Kim S (2021). Identifying the Author Group of Malwares through Graph Embedding and Human-in-the-Loop Classification. Applied Sciences 11:14 (6640). Online publication date: 20-Jul-2021.
https://doi.org/10.3390/app11146640
Zhang S, Zhou Z, Li D, Zhong Y, Liu Q, Yang W, Li S (2021). Attributed Heterogeneous Graph Neural Network for Malicious Domain Detection. 2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design (CSCWD) (397-403). Online publication date: 5-May-2021.
https://doi.org/10.1109/CSCWD49262.2021.9437852
Hong J, Park S, Kim T, Noh Y, Kim S, Kim D, Kim W, Hung C, Chen Q, Xie X, Esposito C, Huang J, Park J, Zhang Q (2019). Malware classification for identifying author groups. Proceedings of the Conference on Research in Adaptive and Convergent Systems (169-174). Online publication date: 24-Sep-2019.
https://dl.acm.org/doi/10.1145/3338840.3355684
Mao W, Cai Z, Zeng B, Guan X (2019). Learning edge weights in file co-occurrence graphs for malware detection. Data Mining and Knowledge Discovery 33:1 (168-203). Online publication date: 1-Jan-2019.
https://dl.acm.org/doi/10.1007/s10618-018-0593-7
Tan G, Zhang P, Liu Q, Liu X, Zhu C, Guo L (2018). MalFilter: A Lightweight Real-Time Malicious URL Filtering System in Large-Scale Networks. 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom) (565-571). Online publication date: Dec-2018.
https://doi.org/10.1109/BDCloud.2018.00089
Mao W, Cai Z, Yang Y, Shi X, Guan X (2018). From big data to knowledge. Computers and Security 74:C (167-183). Online publication date: 1-May-2018.
https://dl.acm.org/doi/10.1016/j.cose.2017.12.005
Hong J, Park S, Kim S, Kim D, Kim W (2017). Classifying malwares for identification of author groups. Concurrency and Computation: Practice and Experience 30:3. Online publication date: 31-Jul-2017.
https://doi.org/10.1002/cpe.4197
