Abstract
Ensemble approaches are promising for anomaly detection due to the heterogeneity of network traffic. However, existing ensemble approaches lack applicability and efficiency. We propose ODDITY, a new end-to-end data-driven ensemble framework. ODDITY uses Diverse Autoencoders (DAs), each trained on a pre-clustered subset with contrastive representation learning, to encourage base learners to give distinct predictions. ODDITY then combines the extracted features with a supervised gradient-boosting meta-learner. Experiments on benchmarking and real-world network traffic datasets demonstrate that ODDITY is superior in both efficiency and precision. ODDITY averages 0.8350 AUPRC on benchmarking datasets (10% better than traditional machine learning algorithms and 6% better than the state-of-the-art semi-supervised ensemble method). ODDITY also outperforms the state of the art on real-world datasets in both detection accuracy and speed. Moreover, ODDITY is more resilient to evasion attacks and shows promising potential for unsupervised anomaly detection.
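As a concrete illustration of the pipeline the abstract describes, below is a minimal Python sketch of an ODDITY-style stack. It is not the authors' implementation: KMeans pre-clustering and MLPRegressor-based autoencoders stand in for the paper's Diverse Autoencoders, the contrastive objective is omitted, and synthetic data replaces network traffic.

    # Minimal sketch of an ODDITY-style stack (not the authors' code).
    import numpy as np
    from lightgbm import LGBMClassifier
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor
    from sklearn.preprocessing import StandardScaler

    # Synthetic, imbalanced data standing in for labeled network traffic.
    X, y = make_classification(n_samples=2000, n_features=20, weights=[0.95],
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    scaler = StandardScaler().fit(X_tr)
    X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

    # Pre-cluster the training data so each autoencoder sees a different
    # subset, encouraging the base learners to give distinct predictions.
    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_tr)
    autoencoders = []
    for c in range(4):
        ae = MLPRegressor(hidden_layer_sizes=(8,), max_iter=500, random_state=c)
        ae.fit(X_tr[labels == c], X_tr[labels == c])  # learn to reconstruct input
        autoencoders.append(ae)

    def extracted(X):
        # One reconstruction-error feature per (diverse) autoencoder.
        return np.column_stack(
            [np.mean((ae.predict(X) - X) ** 2, axis=1) for ae in autoencoders])

    # Supervised gradient-boosting meta-learner on original + extracted features.
    meta = LGBMClassifier(random_state=0)
    meta.fit(np.hstack([X_tr, extracted(X_tr)]), y_tr)
    print("test accuracy:", meta.score(np.hstack([X_te, extracted(X_te)]), y_te))

The design point the sketch preserves is that each base learner sees a different cluster of the data, so their reconstruction errors carry complementary signals for the meta-learner.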
References
Cse-cic-ids2018 datasets. https://www.unb.ca/cic/datasets/ids-2018.html. Accessed 23 June 2021
Aggarwal, C.C., Sathe, S.: Theoretical foundations and algorithms for outlier ensembles. ACM SIGKDD Explor. Newsl. 17(1), 24–47 (2015)
Aggarwal, C.C., Sathe, S.: Outlier Ensembles, pp. 1–34. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-54765-7
Bandaragoda, T.R., Ting, K.M., Albrecht, D., Liu, F.T., Wells, J.R.: Efficient anomaly detection by isolation using nearest neighbour ensemble. In: Proceedings of IEEE International Conference on Data Mining Workshop (2014)
Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013). https://doi.org/10.1109/tpami.2013.50
Bow, S.T.: Multilayer perceptron. In: Pattern Recognition and Image Preprocessing, pp. 201–224, November 2002
Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996). https://doi.org/10.1007/BF00058655
Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: identifying density-based local outliers. ACM SIGMOD Rec. 29(2), 93–104 (2000). https://doi.org/10.1145/335191.335388
Carlini, N., Wagner, D.: Adversarial examples are not easily detected: bypassing ten detection methods. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14 (2017)
Chen, J., Sathe, S., Aggarwal, C., Turaga, D.: Outlier detection with autoencoder ensembles. In: Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 90–98, September 2017. https://doi.org/10.1137/1.9781611974973.11
Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794 (2016)
Davis, J., Goadrich, M.: The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 233–240 (2006)
Dua, D., Graff, C.: UCI Machine Learning Repository (2017)
Erfani, S.M., Rajasegarar, S., Karunasekera, S., Leckie, C.: High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recogn. 58, 121–134 (2016). https://doi.org/10.1016/j.patcog.2016.03.028
Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55(1), 119–139 (1997). https://doi.org/10.1006/jcss.1997.1504
Goldstein, M., Dengel, A.: Histogram-based outlier score (HBOS): a fast unsupervised anomaly detection algorithm. In: KI-2012: Poster and Demo Track, pp. 59–63 (2012)
Guo, C., Gardner, J., You, Y., Wilson, A.G., Weinberger, K.: Simple black-box adversarial attacks. In: International Conference on Machine Learning, pp. 2484–2493 (2019)
Hardin, J., Rocke, D.M.: Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator. Comput. Stat. Data Anal. 44(4), 625–638 (2004)
Ke, G., et al.: LightGBM: a highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 30 (2017)
Krogh, A., Vedelsby, J.: Neural network ensembles, cross validation, and active learning. Adv. Neural. Inf. Process. Syst. 7, 231–238 (1994)
Liao, Y., Vemuri, V.: Use of k-nearest neighbor classifier for intrusion detection. Comput. Secur. 21(5), 439–448 (2002)
Liu, F.T., Ting, K.M., Zhou, Z.H.: Isolation forest. In: 2008 Eighth IEEE International Conference on Data Mining (2008). https://doi.org/10.1109/icdm.2008.17
Liu, X., et al.: Self-supervised learning: generative or contrastive. IEEE Trans. Knowl. Data Eng. (2021)
Micenková, B., McWilliams, B., Assent, I.: Learning outlier ensembles: the best of both worlds - supervised and unsupervised. In: Proceedings of the ACM SIGKDD 2014 Workshop on Outlier Detection and Description under Data Diversity (ODD2), New York, pp. 51–54. Citeseer (2014)
Micenková, B., McWilliams, B., Assent, I.: Learning representations for outlier detection on a budget. arXiv preprint arXiv:1507.08104 (2015)
Mirsky, Y., Doitshman, T., Elovici, Y., Shabtai, A.: Kitsune: an ensemble of autoencoders for online network intrusion detection. arXiv:1802.09089 (2018)
Moustafa, N., Slay, J.: UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: 2015 Military Communications and Information Systems Conference (MilCIS) (2015)
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Rayana, S., Akoglu, L.: Less is more: building selective anomaly ensembles with application to event detection in temporal graphs. In: Proceedings of the 2015 SIAM International Conference on Data Mining (2015)
Sarvari, H., Domeniconi, C., Prenkaj, B., Stilo, G.: Unsupervised boosting-based autoencoder ensembles for outlier detection. In: PAKDD 2021. LNCS (LNAI), vol. 12712, pp. 91–103. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-75762-5_8, https://arxiv.org/pdf/1910.09754v1.pdf
Sathe, S., Aggarwal, C.: LODES: local density meets spectral outlier detection. In: Proceedings of the 2016 SIAM International Conference on Data Mining (2016)
Shyu, M.L., Chen, S.C., Sarinnapakorn, K., Chang, L.: A novel anomaly detection scheme based on principal component classifier. Miami Univ. Dept. of Electrical and Computer Engineering, Technical report (2003)
Szegedy, C., et al.: Intriguing properties of neural networks. arXiv:1312.6199 (2013)
Wolpert, D.H.: Stacked generalization. Neural Netw. 5, 241–259 (1992)
Zhao, Y., Hryniewicki, M.K.: XGBOD: improving supervised outlier detection with unsupervised representation learning. In: 2018 International Joint Conference on Neural Networks (2018)
Zhao, Y., Nasrullah, Z., Li, Z.: PyOD: a python toolbox for scalable outlier detection. J. Mach. Learn. Res. 20(96), 1–7 (2019)
Zhou, Z.H.: Ensemble Methods: Foundations and Algorithms. CRC Press, Boca Raton (2012)
Appendices
A Experimental Datasets
Details of each dataset used in the experiments are provided in Table 5. Due to the size of the KDD99, UNSW-NB15, and IDS-2018 datasets, we randomly sample a portion of each.
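A minimal sketch of this subsampling follows; the DataFrame and the 10% fraction are illustrative stand-ins, since the text only states that a portion of each dataset is kept.

    import numpy as np
    import pandas as pd

    # Synthetic frame standing in for a large raw CSV (e.g., IDS-2018).
    full = pd.DataFrame(np.random.default_rng(0).normal(size=(100_000, 5)))
    portion = full.sample(frac=0.10, random_state=42)  # keep a random portion
    print(len(full), "->", len(portion))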
B Feature Importance Map
The feature importance map in Fig. 7 shows the importance of each feature for the Letter dataset, which initially has 32 features (columns 1–32); the DAs in ODDITY extract 20 more features (columns 33–52). As shown in Fig. 7, the final LGBM classifier in ODDITY assigns high importance factors to several features (columns 51, 54, and 48, among others), which results in improved performance.
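A hedged sketch of how such an importance map can be produced with LightGBM is given below; the 52-column augmented matrix is simulated, since the DA-extracted features themselves are not reproduced in this appendix.

    import numpy as np
    from lightgbm import LGBMClassifier
    from sklearn.datasets import make_classification

    # Placeholder for the augmented matrix: 32 original + 20 DA-extracted columns.
    X_aug, y = make_classification(n_samples=1000, n_features=52, random_state=0)

    clf = LGBMClassifier(random_state=0).fit(X_aug, y)
    scores = clf.feature_importances_  # one split-count score per column
    for idx in np.argsort(scores)[::-1][:10]:  # ten most important columns
        print(f"column {idx + 1}: importance {scores[idx]}")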
C ODDITY in an Unsupervised Setting
To extend ODDITY to unsupervised learning, we replace the final supervised meta-learner with an unsupervised classifier. We combine ODDITY with three unsupervised classifiers, namely HBOS [16], Isolation Forest [22], and MCD [18], all implemented using PyOD [36]. Using AUROC as the metric, the hyperparameters and architecture of ODDITY remain the same as in Sect. 5.2. Table 6 summarizes the experimental results, averaged over ten trials. With the diverse features extracted by the DAs, the AUROC of HBOS improves by 0.3%, that of IF improves by 1.3%, and that of MCD improves by 8%. Since ODDITY + MCD outperforms the others, we further compare ODDITY + MCD with other commonly used unsupervised anomaly detection techniques, including kNN, IF, PCA [32], and LOF [8]. ODDITY shows compelling potential for unsupervised anomaly detection by outperforming all other methods (Table 7).
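The following sketch illustrates this unsupervised variant with PyOD's HBOS, IForest, and MCD detectors; synthetic data stands in for the original-plus-extracted feature matrix, so the scores it prints are not the paper's results.

    import numpy as np
    from pyod.models.hbos import HBOS
    from pyod.models.iforest import IForest
    from pyod.models.mcd import MCD
    from sklearn.metrics import roc_auc_score

    # Synthetic stand-in for the augmented feature matrix: 5% anomalies.
    rng = np.random.default_rng(0)
    X_aug = np.vstack([rng.normal(0, 1, (950, 52)), rng.normal(4, 1, (50, 52))])
    y = np.r_[np.zeros(950), np.ones(50)]  # 1 = anomaly (for evaluation only)

    for det in (HBOS(), IForest(random_state=0), MCD(random_state=0)):
        det.fit(X_aug)  # unsupervised: labels are never shown to the detector
        print(type(det).__name__,
              round(roc_auc_score(y, det.decision_scores_), 3))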
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Peng, H. et al. (2022). ODDITY: An Ensemble Framework Leverages Contrastive Representation Learning for Superior Anomaly Detection. In: Alcaraz, C., Chen, L., Li, S., Samarati, P. (eds) Information and Communications Security. ICICS 2022. Lecture Notes in Computer Science, vol 13407. Springer, Cham. https://doi.org/10.1007/978-3-031-15777-6_23
DOI: https://doi.org/10.1007/978-3-031-15777-6_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15776-9
Online ISBN: 978-3-031-15777-6