
On Generating Network Traffic Datasets with Synthetic Attacks for Intrusion Detection

Published: 02 January 2021

Abstract

Most research in the field of network intrusion detection relies heavily on datasets. Datasets in this field, however, are scarce and difficult to reproduce. To compare, evaluate, and test related work, researchers usually need the same datasets, or at least datasets with characteristics similar to those used in related work. In this work, we present concepts and the Intrusion Detection Dataset Toolkit (ID2T) to alleviate the problem of reproducing datasets with desired characteristics, enabling accurate replication of scientific results. ID2T facilitates the creation of labeled datasets by injecting synthetic attacks into background traffic. The injected synthetic attacks blend with the background traffic by mimicking its properties.
This article has three core contributions. First, we present a comprehensive survey on intrusion detection datasets. In the survey, we propose a classification to group the negative qualities found in the datasets. Second, the architecture of ID2T is revised, improved, and expanded in comparison to previous work. The architectural changes enable ID2T to inject recent and advanced attacks, such as the EternalBlue exploit or a peer-to-peer botnet. ID2T also provides a set of tests, known as TIDED, that help identify potential defects in the background traffic into which attacks are injected. Third, we illustrate how ID2T is used in different use-case scenarios to replicate scientific results with the help of reproducible datasets. ID2T is open-source software, made available so that the community can expand its arsenal of attacks and capabilities.



      Published In

      ACM Transactions on Privacy and Security, Volume 24, Issue 2
      May 2021
      242 pages
      ISSN: 2471-2566
      EISSN: 2471-2574
      DOI: 10.1145/3446639

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 02 January 2021
      Accepted: 01 September 2020
      Revised: 01 March 2020
      Received: 01 June 2018
      Published in TOPS Volume 24, Issue 2


      Author Tags

      1. Intrusion detection systems
      2. attack injection
      3. datasets
      4. synthetic dataset

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)

