A review of KDD99 dataset usage in intrusion detection and machine learning between 2010 and 2015
- Published
- Accepted
- Subject Areas
- Artificial Intelligence, Computer Networks and Communications, Data Mining and Machine Learning
- Keywords
- Review, Machine Learning, Supervised Learning, Classification, Intrusion Detection, KDD99
- Copyright
- © 2016 Özgür et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2016. A review of KDD99 dataset usage in intrusion detection and machine learning between 2010 and 2015. PeerJ Preprints 4:e1954v1 https://doi.org/10.7287/peerj.preprints.1954v1
Abstract
Although KDD99 dataset is more than 15 years old, it is still widely used in academic research. To investigate wide usage of this dataset in Machine Learning Research (MLR) and Intrusion Detection Systems (IDS); this study reviews 149 research articles from 65 journals indexed in Science Citation In- dex Expanded and Emerging Sources Citation Index during the last six years (2010–2015). If we include papers presented in other indexes and conferences, number of studies would be tripled. The number of published studies shows that KDD99 is the most used dataset in IDS and machine learning areas, and it is the de facto dataset for these research areas. To show recent usage of KDD99 and the related sub-dataset (NSL-KDD) in IDS and MLR, the following de- scriptive statistics about the reviewed studies are given: main contribution of articles, the applied algorithms, compared classification algorithms, software toolbox usage, the size and type of the used dataset for training and test- ing, and classification output classes (binary, multi-class). In addition to these statistics, a checklist for future researchers that work in this area is provided.
Author Comment
This is a preprint submission to PeerJ Preprints.