research-article

Open access

Blocklist-Forecast: Proactive Domain Blocklisting by Identifying Malicious Hosting Infrastructure

Authors: Udesh Kumarasinghe, Mohamed Nabeel, Charitha Elvitigala

RAID '24: Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses

Pages 35 - 48

https://doi.org/10.1145/3678890.3678925

Published: 30 September 2024

Abstract

Domain blocklists play an important role in preventing malicious domains from reaching users. However, existing blocklists are reactive and slow to respond to attacks, so the damage is already done by the time they react. This is mainly because existing blocklists and reputation systems rely on either website content or user interactions with websites to determine whether a website is malicious. In this work, we explore the possibility of predicting malicious domains proactively, given a seed list of malicious domains from such reactive blocklists. We observe that malicious domains often share infrastructure used in previous attacks and reuse or rotate resources. Leveraging this observation, we selectively crawl passive DNS data to identify domains in the "neighborhood" of seed malicious domains extracted from reactive blocklists. Because of the increased use of cloud hosting, not all domains in the neighborhood are malicious, and further vetting is required to identify unseen malicious domains. Along with proximity, we find that hosting and lexical features help distinguish malicious domains from benign ones. We model the infrastructure as a heterogeneous network graph and design a graph neural network to detect malicious domains. Our approach is blocklist-agnostic in that it can work with any blocklist and detect new malicious domains. We demonstrate our approach using 7 months of longitudinal data from three popular blocklists: PhishTank, OpenPhish, and VirusTotal. Our experimental results show that, for the VirusTotal feed, our approach detects 4.7 unseen malicious domains for every seed malicious domain at a very low false positive rate (FPR) of 0.059. Further, we observe a concerning trend: 47% of the predicted malicious domains that are later flagged in VirusTotal are flagged only more than 3 weeks, and up to months, after our model detects them.
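
The neighborhood idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the records, domain names, and function names below are hypothetical, the graph is assumed to contain only domain and IP nodes, and the graph neural network is left out entirely. The sketch only shows how candidates near seed domains might be collected from passive DNS resolutions and how a few simple lexical features could be computed for later vetting.

```python
import math
from collections import Counter, defaultdict

# Hypothetical passive DNS records (domain, resolved IP), invented for illustration.
PDNS_RECORDS = [
    ("login-secure-pay.example", "203.0.113.10"),
    ("secure-pay-update.example", "203.0.113.10"),
    ("secure-pay-update.example", "203.0.113.11"),
    ("account-verify.example", "203.0.113.11"),
    ("blog.example", "198.51.100.7"),
]


def build_graph(records):
    """Build a bipartite domain-IP graph as two adjacency maps (an assumed schema)."""
    domain_to_ips = defaultdict(set)
    ip_to_domains = defaultdict(set)
    for domain, ip in records:
        domain_to_ips[domain].add(ip)
        ip_to_domains[ip].add(domain)
    return domain_to_ips, ip_to_domains


def neighborhood(seeds, domain_to_ips, ip_to_domains, hops=1):
    """Return domains within `hops` domain-IP-domain steps of the seeds, excluding seeds."""
    seen = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        next_frontier = set()
        for domain in frontier:
            for ip in domain_to_ips.get(domain, set()):
                next_frontier |= ip_to_domains.get(ip, set()) - seen
        seen |= next_frontier
        frontier = next_frontier
    return seen - set(seeds)


def lexical_features(domain):
    """Compute simple lexical features of the leftmost label (an illustrative choice)."""
    name = domain.split(".")[0]
    if not name:
        return {"length": 0, "digits": 0, "hyphens": 0, "entropy": 0.0}
    counts = Counter(name)
    entropy = -sum(
        (c / len(name)) * math.log2(c / len(name)) for c in counts.values()
    )
    return {
        "length": len(name),
        "digits": sum(ch.isdigit() for ch in name),
        "hyphens": name.count("-"),
        "entropy": entropy,
    }


if __name__ == "__main__":
    d2i, i2d = build_graph(PDNS_RECORDS)
    seeds = {"login-secure-pay.example"}
    for candidate in sorted(neighborhood(seeds, d2i, i2d, hops=2)):
        print(candidate, lexical_features(candidate))
```

As the abstract notes, shared cloud hosting means such candidates are not necessarily malicious; in this sketch they would still need a classifier, such as the graph neural network the paper describes, before any blocklisting decision.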

Index Terms

Blocklist-Forecast: Proactive Domain Blocklisting by Identifying Malicious Hosting Infrastructure
1. Networks
  1. Network properties
    1. Network security
      1. Web protocol security
2. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Malware and its mitigation
    2. Social engineering attacks
      1. Phishing

Published In

RAID '24: Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses

September 2024

719 pages

ISBN:9798400709593

DOI:10.1145/3678890

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

RAID '24

RAID '24: The 27th International Symposium on Research in Attacks, Intrusions and Defenses

September 30 - October 2, 2024

Padua, Italy

Acceptance Rates

RAID '24 Paper Acceptance Rate 43 of 173 submissions, 25%;

Overall Acceptance Rate 43 of 173 submissions, 25%

Article Metrics

Total Citations: 0
Total Downloads: 477
Downloads (Last 12 months): 477
Downloads (Last 6 weeks): 210

Reflects downloads up to 19 Feb 2025
