Open access

On Generating Network Traffic Datasets with Synthetic Attacks for Intrusion Detection

Authors: Carlos Garcia Cordero, Emmanouil Vasilomanolakis, Aidmar Wainakh, Max Mühlhäuser, Simin Nadjm-Tehrani

ACM Transactions on Privacy and Security (TOPS), Volume 24, Issue 2

Article No.: 8, Pages 1 - 39

https://doi.org/10.1145/3424155

Published: 02 January 2021

Abstract

Most research in the field of network intrusion detection relies heavily on datasets. Datasets in this field, however, are scarce and difficult to reproduce. To compare, evaluate, and test related work, researchers usually need the same datasets or at least datasets with characteristics similar to the ones used in related work. In this work, we present concepts and the Intrusion Detection Dataset Toolkit (ID2T) to alleviate the problem of reproducing datasets with desired characteristics and thereby enable an accurate replication of scientific results. ID2T facilitates the creation of labeled datasets by injecting synthetic attacks into background traffic. The injected synthetic attacks blend with the background traffic by mimicking its properties.

This article has three core contributions. First, we present a comprehensive survey on intrusion detection datasets. In the survey, we propose a classification to group the negative qualities found in the datasets. Second, the architecture of ID2T is revised, improved, and expanded in comparison to previous work. The architectural changes enable ID2T to inject recent and advanced attacks, such as the EternalBlue exploit or a peer-to-peer botnet. ID2T also provides a set of tests, known as TIDED, that helps identify potential defects in the background traffic into which attacks are injected. Third, we illustrate how ID2T is used in different use-case scenarios to replicate scientific results with the help of reproducible datasets. ID2T is open-source software and is made available to the community to expand its arsenal of attacks and capabilities.
