DOI: 10.1007/978-3-540-73499-4_31

A Comparative Study of Unsupervised Machine Learning and Data Mining Techniques for Intrusion Detection

Published: 18 July 2007

Abstract

In recent years, machine learning and data mining techniques have received considerable attention from intrusion detection researchers as a way to address the weaknesses of knowledge-based detection techniques. This has led to the application of various supervised and unsupervised techniques to intrusion detection. In this paper, we conduct a set of experiments to analyze the performance of unsupervised techniques in light of their main design choices. These include the heuristics proposed for distinguishing abnormal data from normal data and the distribution of the dataset used for training. We evaluate the techniques with various distributions of training and test datasets constructed from the KDD99 dataset, a widely accepted resource for IDS evaluations. This comparative study is not only a blind comparison of unsupervised techniques; it also gives researchers and practitioners some guidelines for applying these techniques to intrusion detection.
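
The abstract does not spell out which labeling heuristics the paper evaluates. As an illustration only, one heuristic often used in clustering-based unsupervised detection is to treat records that land in very small clusters as anomalous, on the assumption that attacks are rare compared with normal traffic. The sketch below shows that idea on toy 2-D points. The data, the function names `kmeans` and `flag_small_clusters`, and the 10% size threshold are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter
from math import dist


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return assign(points, centroids)


def flag_small_clusters(labels, max_fraction=0.1):
    """Hypothetical heuristic: a record is anomalous if its cluster holds
    at most max_fraction of all records."""
    sizes = Counter(labels)
    total = len(labels)
    return [sizes[lab] / total <= max_fraction for lab in labels]


# Toy data (assumed): two dense "normal" groups and one isolated record.
normal_a = [(0.1 * i, 0.0) for i in range(10)]
normal_b = [(5.0, 5.0 + 0.1 * i) for i in range(9)]
isolated = [(20.0, 20.0)]
points = normal_a + normal_b + isolated

labels = kmeans(points, k=3)
flags = flag_small_clusters(labels)
print(sum(flags), "record(s) flagged as anomalous")
```

A heuristic of this kind depends on the makeup of the data: if attack records formed a large share of the training set, their clusters would no longer be small and they would be labeled normal. That is one way the distribution of the training data, which the abstract names as a design choice, can affect results.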


Published In

MLDM '07: Proceedings of the 5th International Conference on Machine Learning and Data Mining in Pattern Recognition
July 2007
910 pages
ISBN: 9783540734987

Publisher

Springer-Verlag, Berlin, Heidelberg
