
Detecting anomalous packets in network transfers: investigations using PCA, autoencoder and isolation forest in TCP

Published: 01 May 2020

Abstract

Large-scale scientific workflows rely heavily on high-performance file transfers. These transfers require strict quality guarantees, such as guaranteed bandwidth, no packet loss, and no data duplication. To keep file transfers successful, methods such as predetermined thresholds and statistical analysis are applied to detect abnormal patterns. Network administrators routinely monitor and analyze network data to diagnose and alleviate these problems, making decisions based on their experience. However, as networks grow and become more complex, the need to monitor large data files and process them quickly makes it difficult to identify and rectify errors. Abnormal file transfers have so far been classified by simply setting alert thresholds, using tools such as perfSONAR and TCP statistics (Tstat). This paper investigates the feasibility of unsupervised feature extraction for identifying network anomaly patterns using three unsupervised classification methods: principal component analysis (PCA), autoencoder, and isolation forest. We collect file transfer statistics from two experiment sets, synthetic iPerf-generated traffic and 1000 Genome workflow runs, with synthetically introduced anomalies. Our results show that while PCA and a simple autoencoder find it difficult to detect clusters, the tree-based isolation forest is able to identify anomalous packets by breaking down TCP traces into tree classes early.
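To make the isolation forest idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It follows the general scheme of Liu et al. (2008): points that random axis-aligned cuts isolate after only a few splits receive a higher anomaly score. The three per-flow features (round-trip time, retransmission rate, throughput), their distributions, and all function names are assumptions made for illustration only.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximates the harmonic number H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Split rows on a random feature at a random cut until isolated or too deep."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:  # all values equal on this feature, so stop here
        return {"size": len(rows)}
    cut = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < cut]
    right = [r for r in rows if r[feature] >= cut]
    return {
        "feature": feature,
        "cut": cut,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(row, node):
    """Number of splits needed to reach a leaf, plus a correction for unsplit leaves."""
    depth = 0
    while "size" not in node:
        node = node["left"] if row[node["feature"]] < node["cut"] else node["right"]
        depth += 1
    return depth + c_factor(node["size"])


def fit_forest(rows, n_trees=100, sample_size=64, seed=0):
    """Grow n_trees isolation trees, each on a random subsample of the rows."""
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(rows, size), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, size


def anomaly_score(row, trees, size):
    """Score in (0, 1); values closer to 1 mean the row is isolated unusually early."""
    mean_path = sum(path_length(row, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(size))


# Hypothetical per-flow features: (rtt_ms, retransmission_pct, throughput_mbps).
data_rng = random.Random(42)
normal = [(data_rng.gauss(20, 2), data_rng.gauss(1, 0.5), data_rng.gauss(900, 30))
          for _ in range(300)]
anomalous = [(data_rng.gauss(80, 5), data_rng.gauss(40, 5), data_rng.gauss(200, 20))
             for _ in range(5)]
flows = normal + anomalous  # the injected anomalies occupy indices 300..304

trees, size = fit_forest(flows)
scores = [anomaly_score(f, trees, size) for f in flows]
top5 = sorted(range(len(flows)), key=lambda i: scores[i], reverse=True)[:5]
print("highest-scoring flow indices:", sorted(top5))
```

The method needs no labels: the five injected flows are recognizable only because random cuts separate them from the bulk of the data after very few splits. Real TCP traces would need feature selection and scaling choices that this sketch does not address.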




Published In

Machine Learning, Volume 109, Issue 5
May 2020
242 pages

Publisher

Kluwer Academic Publishers, United States

Publication History

Published: 01 May 2020
Accepted: 13 February 2020
Revision received: 16 October 2019
Received: 02 February 2019

Author Tags

1. PCA
2. Autoencoders
3. Isolation forest
4. Network traffic

Qualifiers

• Research-article

Funding Sources

• U.S. Department of Energy (US)
