DOI: 10.1145/3678890.3678925
research-article
Open access

Blocklist-Forecast: Proactive Domain Blocklisting by Identifying Malicious Hosting Infrastructure

Published: 30 September 2024

Abstract

Domain blocklists play an important role in preventing malicious domains from reaching users. However, existing blocklists are reactive and slow to respond to attacks; by the time they act, the damage is already done. This is mainly because existing blocklists and reputation systems rely on either website content or user interactions with a website to determine whether it is malicious. In this work, we explore the possibility of predicting malicious domains proactively, given a seed list of malicious domains from such reactive blocklists. We observe that malicious domains often share infrastructure used in previous attacks and reuse or rotate resources. Leveraging this observation, we selectively crawl passive DNS data to identify domains in the "neighborhood" of seed malicious domains extracted from reactive blocklists. Because of the increased use of cloud hosting, not all domains in the neighborhood are malicious, and further vetting is required to identify unseen malicious domains. Along with proximity, we find that hosting and lexical features help distinguish malicious domains from benign ones. We model the infrastructure as a heterogeneous network graph and design a graph neural network to detect malicious domains. Our approach is blocklist-agnostic: it can work with any blocklist and detect new malicious domains. We demonstrate our approach using seven months of longitudinal data from three popular blocklists: PhishTank, OpenPhish, and VirusTotal. Our experimental results show that, for the VirusTotal feed, our approach detects 4.7 unseen malicious domains for every seed malicious domain at a very low FPR of 0.059. Further, we observe a concerning trend: 47% of the predicted malicious domains that are later flagged in VirusTotal are flagged only more than three weeks, and in some cases months, after our model detects them.
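To make the "neighborhood" idea concrete, the following is a minimal sketch of expanding a seed list through shared hosting IPs. It is not the paper's crawler or model: the record list `PDNS_RECORDS`, the helper names `build_indexes` and `neighborhood`, the `hops` parameter, and all domains and IPs are hypothetical, and a real passive DNS source would replace the in-memory list. The vetting step described in the abstract (hosting and lexical features, graph neural network) is deliberately left out.

```python
from collections import defaultdict

# Toy passive DNS resolutions as (domain, ip) pairs.
# All values are invented for illustration (reserved .example names and
# documentation-range IPs); they are not data from the paper.
PDNS_RECORDS = [
    ("login-bank.example", "203.0.113.10"),
    ("secure-pay.example", "203.0.113.10"),
    ("cdn-assets.example", "203.0.113.10"),
    ("secure-pay.example", "198.51.100.7"),
    ("promo-gift.example", "198.51.100.7"),
    ("unrelated.example", "192.0.2.55"),
]


def build_indexes(records):
    """Index resolutions in both directions: domain -> IPs and IP -> domains."""
    domain_to_ips = defaultdict(set)
    ip_to_domains = defaultdict(set)
    for domain, ip in records:
        domain_to_ips[domain].add(ip)
        ip_to_domains[ip].add(domain)
    return domain_to_ips, ip_to_domains


def neighborhood(seeds, records, hops=1):
    """Return domains reachable from the seeds via shared IPs within `hops` steps.

    One hop means domain -> IP -> co-hosted domain. Seeds are excluded from
    the result. The output is only a candidate set: co-hosting on cloud
    infrastructure does not imply maliciousness, so candidates still need vetting.
    """
    domain_to_ips, ip_to_domains = build_indexes(records)
    seen = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        next_frontier = set()
        for domain in frontier:
            for ip in domain_to_ips.get(domain, ()):
                next_frontier |= ip_to_domains[ip] - seen
        seen |= next_frontier
        frontier = next_frontier
    return seen - set(seeds)


if __name__ == "__main__":
    candidates = neighborhood({"login-bank.example"}, PDNS_RECORDS, hops=2)
    print(sorted(candidates))
```

In this toy data, `cdn-assets.example` lands in the candidate set simply because it shares an IP with the seed, which is the kind of case the abstract has in mind when it says that proximity alone is not enough.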

Published In

RAID '24: Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses
September 2024
719 pages
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. domain association
  2. graph learning
  3. malicious domains
  4. passive DNS

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

RAID '24

Acceptance Rates

RAID '24 Paper Acceptance Rate 43 of 173 submissions, 25%;
Overall Acceptance Rate 43 of 173 submissions, 25%
