DOI: 10.1145/3528082.3544834
research-article

PISketch: finding persistent and infrequent flows

Published: 22 August 2022

Abstract

Finding persistent and infrequent flows is very helpful in practice, for example for detecting intrusion activities. Most of the literature focuses on finding either persistent flows or frequent flows, and no previous work can find flows that are both persistent and infrequent. In this paper, we propose PISketch, a novel sketch data structure that finds persistent and infrequent flows in real time. The key idea of PISketch is to assign each flow a weight, governed by a Reward and Penalty System, that combines and balances information on both persistency and infrequency, and to use a dedicated strategy to keep high-weight flows in limited space. We implement PISketch on P4 and CPU platforms and compare it with two strawman solutions (On-Off + CM sketch, and PIE + CM sketch) at finding persistent and infrequent flows. Compared with the two strawman solutions, respectively, our experimental results show that: 1) the F1 Score of PISketch is around 22.1% and 57.6% higher; 2) the Average Relative Error (ARE) of PISketch is around 820.9 (up to 1188.8) and 126.2 (up to 265.6) times lower; and 3) the insertion throughput of PISketch is around 1.23 and 16.5 times higher. Moreover, we demonstrate two concrete use cases of PISketch through end-to-end experiments. All of our code is available on GitHub at https://github.com/pkufzc/PISketch.
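As a rough illustration of the key idea (a per-flow weight driven by rewards and penalties, kept in bounded space), here is a toy Python sketch. Everything concrete in it is an assumption made for illustration, not PISketch's actual design: the names `ToyPISketch`, `REWARD`, and `PENALTY` are hypothetical; the rule (reward a flow the first time it appears in a time window, penalize each further packet in that window) is a simplified stand-in for the paper's Reward and Penalty System; and the lowest-weight eviction policy is not the paper's space-management strategy. A plain dictionary also stands in for the compact hashed buckets a real sketch would use.

```python
from dataclasses import dataclass, field

# Hypothetical parameters chosen for illustration only; they are not
# the values or the update rule used by PISketch.
REWARD = 10   # added the first time a flow is seen in a time window
PENALTY = 1   # subtracted for each further packet of that flow in the same window


@dataclass
class Entry:
    weight: int = 0
    last_window: int = -1  # most recent window in which the flow appeared


@dataclass
class ToyPISketch:
    """Toy bounded table of per-flow weights (assumed design, not PISketch)."""
    capacity: int
    table: dict = field(default_factory=dict)

    def insert(self, flow: str, window: int) -> None:
        """Process one packet of `flow` arriving in time window `window` (>= 0)."""
        entry = self.table.get(flow)
        if entry is None:
            if len(self.table) >= self.capacity:
                # Assumed replacement policy: evict the lowest-weight flow,
                # but only if it is weaker than a newcomer's first reward.
                victim = min(self.table, key=lambda f: self.table[f].weight)
                if self.table[victim].weight >= REWARD:
                    return  # table keeps its flows; the newcomer is dropped
                del self.table[victim]
            entry = self.table[flow] = Entry()
        if entry.last_window != window:
            entry.weight += REWARD  # persistence: flow seen in a new window
            entry.last_window = window
        else:
            entry.weight -= PENALTY  # infrequency: extra packets cost weight

    def top(self, k: int) -> list:
        """Return up to k tracked flows, highest weight first."""
        return sorted(self.table, key=lambda f: self.table[f].weight, reverse=True)[:k]


# Flow "a": one packet in each of windows 0-3; flow "b": a 50-packet burst in
# window 0; flow "c": a single packet in window 0.
stream = [(0, "a")] + [(0, "b")] * 50 + [(0, "c"), (1, "a"), (2, "a"), (3, "a")]
sketch = ToyPISketch(capacity=3)
for window, flow in stream:
    sketch.insert(flow, window)
print(sketch.top(2))  # ['a', 'c']
```

Under this assumed rule, a flow that appears once in each of many windows accumulates weight, while a flow that sends a burst of packets within one window is driven negative and becomes the first eviction candidate, which mirrors the preference for persistent but infrequent flows described in the abstract.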




    Published In

    FFSPIN '22: Proceedings of the ACM SIGCOMM Workshop on Formal Foundations and Security of Programmable Network Infrastructures
    August 2022
    36 pages
    ISBN:9781450393294
    DOI:10.1145/3528082
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 August 2022


    Author Tags

    1. infrequent items
    2. persistent items
    3. sketch
    4. weight fusion

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China (NSFC)
    • Key-Area Research and Development Program of Guangdong Province

    Conference

    SIGCOMM '22
    SIGCOMM '22: ACM SIGCOMM 2022 Conference
    August 22, 2022
    Amsterdam, Netherlands


    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 48
    • Downloads (last 6 weeks): 9
    Reflects downloads up to 25 Feb 2025


    Cited By

    • (2024) A Generic Framework for Finding Special Quadratic Elements in Data Streams. IEEE/ACM Transactions on Networking, 32(4):3269-3284. DOI: 10.1109/TNET.2024.3392029. Online publication date: Aug-2024.
    • (2024) PIB Sketch: Accurately Tracking Persistent and Infrequent Flows with Bursty Characteristic. 2024 IEEE International Performance, Computing, and Communications Conference (IPCCC), pp. 1-8. DOI: 10.1109/IPCCC59868.2024.10850354. Online publication date: 22-Nov-2024.
    • (2024) BurstDetector: Real-Time and Accurate Across-Period Burst Detection in High-Speed Networks. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications, pp. 2338-2347. DOI: 10.1109/INFOCOM52122.2024.10621114. Online publication date: 20-May-2024.
    • (2024) KTSketch: Finding k-Persistent t-Spread Flows in High-Speed Networks. Web and Big Data, pp. 326-342. DOI: 10.1007/978-981-97-7241-4_21. Online publication date: 28-Aug-2024.
    • (2023) Dichotomy Graph Sketch: Summarizing Graph Streams with High Accuracy Based on Deep Learning. Applied Sciences, 13(24):13306. DOI: 10.3390/app132413306. Online publication date: 16-Dec-2023.
    • (2023) PISketch: Finding Persistent and Infrequent Flows. IEEE/ACM Transactions on Networking, 31(6):3191-3206. DOI: 10.1109/TNET.2023.3272287. Online publication date: 11-May-2023.
    • (2023) Finding Simplex Items in Data Streams. 2023 IEEE 39th International Conference on Data Engineering (ICDE), pp. 1953-1966. DOI: 10.1109/ICDE55515.2023.00152. Online publication date: Apr-2023.
    • (2022) Top-k heavy weight triangles listing on graph stream. World Wide Web, 26(4):1827-1851. DOI: 10.1007/s11280-022-01117-z. Online publication date: 17-Nov-2022.
