research-article

Detecting anomalous packets in network transfers: investigations using PCA, autoencoder and isolation forest in TCP

Authors:

George Papadimitriou,

Anirban Mandal,

Ewa Deelman

Machine Learning, Volume 109, Issue 5

Pages 1127 - 1143

https://doi.org/10.1007/s10994-020-05870-y

Published: 01 May 2020

Abstract

Large-scale scientific workflows rely heavily on high-performance file transfers. These transfers require strict quality guarantees, such as guaranteed bandwidth and no packet loss or data duplication. To ensure successful file transfers, methods such as predetermined thresholds and statistical analysis are needed to detect abnormal patterns. Network administrators routinely monitor and analyze network data to diagnose and alleviate these problems, making decisions based on their experience. However, as networks grow and become more complex, the need to monitor large data files and process them quickly makes it hard to identify and rectify errors. Abnormal file transfers have so far been classified by simply setting alert thresholds, via tools such as PerfSONAR and TCP statistics (Tstat). This paper investigates the feasibility of unsupervised feature extraction methods for identifying network anomaly patterns with three unsupervised classification methods: principal component analysis (PCA), autoencoder and isolation forest. We collect file transfer statistics from two experiment sets, synthetic iPerf-generated traffic and 1000 Genome workflow runs, with synthetically introduced anomalies. Our results show that while PCA and a simple autoencoder find it difficult to detect clusters, the tree-variant isolation forest is able to identify anomalous packets by breaking down TCP traces into tree classes early.
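
To make the isolation forest idea in the abstract concrete, the following is a minimal pure-Python sketch of isolation-based anomaly scoring. It is not the authors' implementation or pipeline. The three per-flow features (throughput, retransmissions, round-trip time), the synthetic values, and the function names are all assumptions chosen for illustration; the number of trees and the subsample size are values commonly used for isolation forests, not settings taken from the paper.

```python
import math
import random

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0  # exact value, since the harmonic number H(1) = 1
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:  # this feature cannot separate the remaining rows
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {"feature": feature, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(row, node, depth=0):
    """Depth at which row lands in a leaf, plus a correction for unsplit leaf rows."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if row[node["feature"]] < node["split"] else node["right"]
    return path_length(row, child, depth + 1)

def anomaly_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Score each row in (0, 1]; higher means easier to isolate, i.e. more anomalous."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    scores = []
    for row in rows:
        mean_depth = sum(path_length(row, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores

# Hypothetical per-flow features: (throughput_mbps, retransmissions, rtt_ms).
data_rng = random.Random(42)
normal = [(data_rng.gauss(900, 30), data_rng.gauss(2, 1), data_rng.gauss(20, 2))
          for _ in range(200)]
# Synthetic anomalies: low throughput, many retransmissions, high RTT.
anomalies = [(300.0, 60.0, 120.0), (250.0, 80.0, 150.0), (400.0, 45.0, 95.0)]
flows = normal + anomalies

scores = anomaly_scores(flows)
ranked = sorted(range(len(flows)), key=lambda i: scores[i], reverse=True)
print("highest-scoring flow indices:", ranked[:5])
```

Unlike a fixed alert threshold on a single statistic, the score here depends only on how few random splits are needed to separate a flow from the rest, so no per-feature threshold has to be chosen in advance.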


Cited By

Trofti P, Pătraşon A, Hîji A (2022). Unsupervised Abnormal Traffic Detection through Topological Flow Analysis. 2022 14th International Conference on Communications (COMM), pp. 1-6. doi:10.1109/COMM54429.2022.9817285. Online publication date: 16-Jun-2022.
https://dl.acm.org/doi/10.1109/COMM54429.2022.9817285


Published In

Machine Learning, Volume 109, Issue 5

May 2020

242 pages

ISSN:0885-6125

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2020.

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 May 2020

Accepted: 13 February 2020

Revision received: 16 October 2019

Received: 02 February 2019

Funding Sources

U.S. Department of Energy (US)
