research-article

DarkVec: automatic analysis of darknet traffic with word embeddings

Authors: Luca Gioacchini, Zied Ben Houidi, Dario Rossi

CoNEXT '21: Proceedings of the 17th International Conference on emerging Networking EXperiments and Technologies

Pages 76 - 89

https://doi.org/10.1145/3485983.3494863

Published: 03 December 2021

Abstract

Darknets are passive probes listening to traffic reaching IP addresses that host no services. Traffic reaching them is unsolicited by nature and often induced by scanners, malicious senders and misconfigured hosts. Its peculiar nature makes it a valuable source of information for learning about malicious activities. However, the massive number of packets and sources that reach darknets makes it hard to extract meaningful insights. In particular, multiple senders contact the darknet while performing similar and coordinated tasks, which are often commanded by common controllers (botnets, crawlers, etc.). How to automatically identify and group the senders that share similar behaviors remains an open problem.

Here we introduce DarkVec, a methodology to identify clusters of senders (i.e., IP addresses) engaged in similar activities on darknets. DarkVec leverages word embedding techniques (e.g., Word2Vec) to capture the co-occurrence patterns of sources hitting the darknets. We extensively test DarkVec and explore its design space in a case study using one month of darknet data. We show that, with a proper definition of service, the generated embeddings can easily be used to (i) associate unknown senders' IP addresses with the correct known labels (more than 96% accuracy), and (ii) identify new attack and scan groups of previously unknown senders. We release the DarkVec source code and datasets to the community, also to encourage the use of word embeddings for automatically learning patterns in generic traffic traces.
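The abstract describes the pipeline only at a high level, so the sketch below illustrates the general idea under explicit assumptions. It assumes that a "service" is a (destination port, protocol) pair, that a "sentence" is the time-ordered list of senders hitting one service within a fixed time window, and that an unknown sender is labelled by a k-nearest-neighbour majority vote using cosine similarity. These choices, the packet tuple layout, and the names `build_corpus`, `cosine` and `knn_label` are hypothetical and are not taken from the DarkVec source code. The Word2Vec training step is shown only as a comment, assuming the Gensim 4 `Word2Vec` API.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical packet record layout: (timestamp_s, src_ip, dst_port, proto).
# Treating (dst_port, proto) as the "service" is an illustrative assumption,
# not necessarily the service definition used by DarkVec.


def build_corpus(packets, window_s=3600):
    """Return a list of 'sentences': for each time window and service,
    the time-ordered list of source IPs that hit that service."""
    buckets = defaultdict(list)
    for ts, src, port, proto in sorted(packets):  # sorted by timestamp first
        key = (int(ts // window_s), (port, proto))
        buckets[key].append(src)
    # Sorting the keys keeps the corpus order deterministic.
    return [buckets[key] for key in sorted(buckets)]


# Training step (not executed here; assumes gensim >= 4 is installed):
#   from gensim.models import Word2Vec
#   model = Word2Vec(sentences=corpus, vector_size=50, window=5,
#                    min_count=1, sg=1)
#   embeddings = {ip: model.wv[ip] for ip in model.wv.index_to_key}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_label(embeddings, labels, ip, k=3):
    """Assign `ip` the majority label among its k most cosine-similar
    labelled senders, excluding `ip` itself (leave-one-out style)."""
    query = embeddings[ip]
    candidates = [o for o in labels if o != ip and o in embeddings]
    candidates.sort(key=lambda o: cosine(query, embeddings[o]), reverse=True)
    votes = Counter(labels[o] for o in candidates[:k])
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    packets = [
        (10.0, "198.51.100.1", 23, "tcp"),
        (12.0, "198.51.100.2", 23, "tcp"),
        (15.0, "203.0.113.9", 445, "tcp"),
        (3700.0, "198.51.100.1", 23, "tcp"),
    ]
    print(build_corpus(packets))
    # [['198.51.100.1', '198.51.100.2'], ['203.0.113.9'], ['198.51.100.1']]

    # Toy 2-D vectors standing in for trained embeddings (hypothetical labels).
    embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.8, 0.2],
                  "d": [0.0, 1.0], "e": [0.1, 0.9]}
    labels = {"b": "scanner-A", "c": "scanner-A",
              "d": "scanner-B", "e": "scanner-B"}
    print(knn_label(embeddings, labels, "a"))  # scanner-A
```

Under these assumptions, senders that keep appearing in the same sentences end up with nearby vectors, which is what lets a nearest-neighbour vote transfer known labels to unknown senders and lets similar unknown senders be grouped together. The window length, service key and value of k used here are placeholders rather than the paper's settings.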

Supplementary Material

3494863-presentation.mp4 (MP4, 253.59 MB): DarkVec: Automatic Analysis of Darknet Traffic with Word Embeddings

3494863-long.mp4 (MP4, 249.56 MB): DarkVec: Automatic Analysis of Darknet Traffic with Word Embeddings - Extended version



Index Terms

DarkVec: automatic analysis of darknet traffic with word embeddings
1. Networks
  1. Network services
    1. Network management
    2. Network monitoring
2. Security and privacy
  1. Network security

Information

Published In

ACM Conferences

CoNEXT '21: Proceedings of the 17th International Conference on emerging Networking EXperiments and Technologies

December 2021

507 pages

ISBN:9781450390989

DOI:10.1145/3485983

General Chairs: Georg Carle (Technical University of Munich, Germany), Jörg Ott (Technical University of Munich, Germany)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGCOMM: ACM Special Interest Group on Data Communication

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Huawei R&D Center (France)
SmartData@PoliTO center for Big Data technologies

Conference

CoNEXT '21

Sponsor:

SIGCOMM

CoNEXT '21: The 17th International Conference on emerging Networking EXperiments and Technologies

December 7 - 10, 2021

Virtual Event, Germany

Acceptance Rates

Overall Acceptance Rate 198 of 789 submissions, 25%

Bibliometrics & Citations

Article Metrics

Total Citations: 17
Total Downloads: 463
Downloads (Last 12 months): 102
Downloads (Last 6 weeks): 10

Reflects downloads up to 30 Aug 2024

Cited By

D. Ravalico, R. Valentim, M. Trevisan, and I. Drago. 2024. Can Blocklists Explain Darknet Traffic? In 2024 8th Network Traffic Measurement and Analysis Conference (TMA). 1--4. Online publication date: 21 May 2024. https://doi.org/10.23919/TMA62044.2024.10558914

L. Gioacchini, M. Mellia, L. Vassio, I. Drago, G. Milan, Z. Ben Houidi, and D. Rossi. 2024. Cross-Network Embeddings Transfer for Traffic Analysis. IEEE Transactions on Network and Service Management 21, 3, 2686--2699. Online publication date: Jun 2024. https://doi.org/10.1109/TNSM.2023.3329442

K. Huang, L. Gioacchini, M. Mellia, and L. Vassio. 2024. Dynamic Cluster Analysis to Detect and Track Novelty in Network Telescopes. In 2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). 287--296. Online publication date: 8 Jul 2024. https://doi.org/10.1109/EuroSPW61312.2024.00037

J. Saleem, R. Islam, and M. Islam. 2024. Darknet Traffic Analysis: A Systematic Literature Review. IEEE Access 12, 42423--42452. https://doi.org/10.1109/ACCESS.2024.3373769

J. Yang, W. Liang, X. Wang, S. Li, X. Jiang, Y. Mu, and S. Zeng. 2024. DarkMor: A framework for darknet traffic detection that integrates local and spatial features. Neurocomputing 607, 128377. Online publication date: Nov 2024. https://doi.org/10.1016/j.neucom.2024.128377

R. Singh Samra and M. Barcellos. 2023. DDoS2Vec: Flow-Level Characterisation of Volumetric DDoS Attacks at Scale. Proceedings of the ACM on Networking 1, CoNEXT3, 1--25. Online publication date: 28 Nov 2023. https://dl.acm.org/doi/10.1145/3629135

B. Yan, C. Yang, C. Shi, Y. Fang, Q. Li, Y. Ye, and J. Du. 2023. Graph Mining for Cybersecurity: A Survey. ACM Transactions on Knowledge Discovery from Data 18, 2, 1--52. Online publication date: 13 Nov 2023. https://dl.acm.org/doi/10.1145/3610228

L. Gioacchini, L. Vassio, M. Mellia, I. Drago, Z. Ben Houidi, and D. Rossi. 2023. i-DarkVec: Incremental Embeddings for Darknet Traffic Analysis. ACM Transactions on Internet Technology 23, 3, 1--28. Online publication date: 21 Aug 2023. https://dl.acm.org/doi/10.1145/3595378

F. Soro, T. Favale, D. Giordano, I. Drago, T. Rescio, M. Mellia, Z. Ben Houidi, and D. Rossi. 2023. Enlightening the Darknets: Augmenting Darknet Visibility With Active Probes. IEEE Transactions on Network and Service Management 20, 4, 5012--5025. Online publication date: 17 Apr 2023. https://dl.acm.org/doi/10.1109/TNSM.2023.3267671

M. Zakroum, J. François, M. Ghogho, and I. Chrisment. 2023. Self-Supervised Latent Representations of Network Flows and Application to Darknet Traffic Classification. IEEE Access 11, 90749--90765. https://doi.org/10.1109/ACCESS.2023.3263206
