DOI: 10.1145/3485983.3494863

DarkVec: automatic analysis of darknet traffic with word embeddings

Published: 03 December 2021

Abstract

Darknets are passive probes that listen to traffic reaching IP addresses that host no services. This traffic is unsolicited by nature and is often generated by scanners, malicious senders and misconfigured hosts, which makes it a valuable source of information about malicious activities. However, the massive number of packets and sources reaching darknets makes it hard to extract meaningful insights. In particular, multiple senders contact the darknet while performing similar and coordinated tasks, often commanded by common controllers (botnets, crawlers, etc.). How to automatically identify and group senders that share similar behaviors remains an open problem.
Here we introduce DarkVec, a methodology to identify clusters of senders (i.e., IP addresses) engaged in similar activities on darknets. DarkVec leverages word embedding techniques (e.g., Word2Vec) to capture the co-occurrence patterns of sources hitting the darknets. We extensively test DarkVec and explore its design space in a case study using one month of darknet data. We show that, with a proper definition of service, the generated embeddings can be easily used to (i) associate unknown senders' IP addresses with the correct known labels (more than 96% accuracy), and (ii) identify new attack and scan groups of previously unknown senders. We contribute the DarkVec source code and datasets to the community, also to stimulate the use of word embeddings to automatically learn patterns in generic traffic traces.
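
To make the idea of sender co-occurrence concrete, the following is a minimal sketch and not the DarkVec implementation. It assumes that packets are grouped into one sequence of sender IPs per service, ordered by arrival time, and it replaces Word2Vec with plain co-occurrence count vectors compared by cosine similarity, so that it runs with the Python standard library only. The toy trace, the port-to-service mapping, the labels, and every function name are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy trace: (timestamp, sender IP, destination port).
PACKETS = [
    (1, "10.0.0.1", 23), (2, "10.0.0.2", 23), (3, "10.0.0.1", 23),
    (4, "10.0.0.3", 23), (5, "10.0.0.2", 23), (6, "10.0.0.3", 23),
    (1, "10.0.1.1", 443), (2, "10.0.1.2", 443), (3, "10.0.1.1", 443),
    (4, "10.0.1.3", 443), (5, "10.0.1.2", 443), (6, "10.0.1.3", 443),
]


def service_of(port):
    """Assumed mapping from a destination port to a coarse 'service'."""
    return {23: "telnet", 443: "https"}.get(port, "other")


def build_corpus(packets):
    """Build one 'sentence' per service: sender IPs in arrival order."""
    sentences = defaultdict(list)
    for _, ip, port in sorted(packets):
        sentences[service_of(port)].append(ip)
    return list(sentences.values())


def cooccurrence_vectors(corpus, window=2):
    """Count, for each sender, the other senders seen within `window` positions.

    This is a simplified stand-in for a learned word embedding.
    """
    vectors = defaultdict(Counter)
    for sentence in corpus:
        for i, ip in enumerate(sentence):
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i and sentence[j] != ip:
                    vectors[ip][sentence[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_label(ip, vectors, labels, k=1):
    """Assign `ip` the majority label among its k most similar labeled senders."""
    ranked = sorted(
        (other for other in labels if other != ip),
        key=lambda other: cosine(vectors[ip], vectors[other]),
        reverse=True,
    )
    votes = Counter(labels[other] for other in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical ground-truth labels for a subset of the senders.
LABELS = {
    "10.0.0.1": "telnet-scanner", "10.0.0.2": "telnet-scanner",
    "10.0.1.1": "https-crawler", "10.0.1.2": "https-crawler",
}
VECTORS = cooccurrence_vectors(build_corpus(PACKETS))
```

With these definitions, `knn_label("10.0.0.3", VECTORS, LABELS)` labels the unlabeled sender by looking at the labeled senders it co-occurs with. A learned embedding such as Word2Vec would replace `cooccurrence_vectors` with dense vectors, while the nearest-neighbor step would stay conceptually the same.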

Supplementary Material

MP4 File (3494863-presentation.mp4)
DarkVec: Automatic Analysis of Darknet Traffic with Word Embeddings
MP4 File (3494863-long.mp4)
DarkVec: Automatic Analysis of Darknet Traffic with Word Embeddings - Extended version

        Published In

        CoNEXT '21: Proceedings of the 17th International Conference on emerging Networking EXperiments and Technologies
        December 2021
        507 pages
        ISBN:9781450390989
        DOI:10.1145/3485983
        General Chairs: Georg Carle, Jörg Ott
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • Research-article

        Funding Sources

        • Huawei R&D Center (France)
        • SmartData@PoliTO center for Big Data technologies

        Conference

        CoNEXT '21
        Acceptance Rates

        Overall Acceptance Rate 198 of 789 submissions, 25%

        Article Metrics

        • Downloads (last 12 months): 102
        • Downloads (last 6 weeks): 10
        Download counts reflect downloads up to 30 Aug 2024.

        Citations

        Cited By

        • (2024) Can Blocklists Explain Darknet Traffic? 2024 8th Network Traffic Measurement and Analysis Conference (TMA), 1-4. DOI: 10.23919/TMA62044.2024.10558914. Online publication date: 21-May-2024
        • (2024) Cross-Network Embeddings Transfer for Traffic Analysis. IEEE Transactions on Network and Service Management 21:3, 2686-2699. DOI: 10.1109/TNSM.2023.3329442. Online publication date: Jun-2024
        • (2024) Dynamic Cluster Analysis to Detect and Track Novelty in Network Telescopes. 2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), 287-296. DOI: 10.1109/EuroSPW61312.2024.00037. Online publication date: 8-Jul-2024
        • (2024) Darknet Traffic Analysis: A Systematic Literature Review. IEEE Access 12, 42423-42452. DOI: 10.1109/ACCESS.2024.3373769. Online publication date: 2024
        • (2024) DarkMor: A framework for darknet traffic detection that integrates local and spatial features. Neurocomputing 607, 128377. DOI: 10.1016/j.neucom.2024.128377. Online publication date: Nov-2024
        • (2023) DDoS2Vec: Flow-Level Characterisation of Volumetric DDoS Attacks at Scale. Proceedings of the ACM on Networking 1:CoNEXT3, 1-25. DOI: 10.1145/3629135. Online publication date: 28-Nov-2023
        • (2023) Graph Mining for Cybersecurity: A Survey. ACM Transactions on Knowledge Discovery from Data 18:2, 1-52. DOI: 10.1145/3610228. Online publication date: 13-Nov-2023
        • (2023) i-DarkVec: Incremental Embeddings for Darknet Traffic Analysis. ACM Transactions on Internet Technology 23:3, 1-28. DOI: 10.1145/3595378. Online publication date: 21-Aug-2023
        • (2023) Enlightening the Darknets: Augmenting Darknet Visibility With Active Probes. IEEE Transactions on Network and Service Management 20:4, 5012-5025. DOI: 10.1109/TNSM.2023.3267671. Online publication date: 17-Apr-2023
        • (2023) Self-Supervised Latent Representations of Network Flows and Application to Darknet Traffic Classification. IEEE Access 11, 90749-90765. DOI: 10.1109/ACCESS.2023.3263206. Online publication date: 2023
