Real-time Spread Burst Detection in Data Streaming

Published: 22 May 2023

Abstract

Data streaming has many applications in network monitoring, web services, e-commerce, stock trading, social networks, and distributed sensing. This paper introduces a new problem, real-time burst detection in flow spread, which differs from the traditional problem of burst detection in flow size. The problem is practically significant, with potential applications in cybersecurity, network engineering, and trend identification on the Internet. It is challenging because estimating flow spread requires remembering all past data items, while detecting bursts in real time requires minimizing the spread estimation overhead, which was not the priority in most prior work. This paper provides the first efficient, real-time solution for spread burst detection. The solution is built on a new real-time super spreader identifier, which outperforms the state of the art in both accuracy and processing overhead. The super spreader identifier is in turn based on a new sketch design for real-time spread estimation, which outperforms the best existing sketches.
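
To make the distinction between flow size and flow spread concrete, the following is a minimal sketch in Python, not the sketch design proposed in the paper. It assumes a stream of (flow, element) pairs, takes flow size to be the number of items in a flow and flow spread to be the number of distinct elements in it, and keeps exact per-flow sets, which is precisely the memory cost that compact sketches are meant to avoid. The function names, the epoch-based burst rule, and the parameters ratio and min_spread are hypothetical choices made for illustration only.

```python
from collections import defaultdict


def size_and_spread(stream):
    """Return exact (size, spread) dictionaries for (flow, element) pairs.

    Size counts every item of a flow; spread counts distinct elements,
    so duplicates must be remembered and filtered out.
    """
    size = defaultdict(int)
    seen = defaultdict(set)
    for flow, element in stream:
        size[flow] += 1
        seen[flow].add(element)
    spread = {flow: len(elements) for flow, elements in seen.items()}
    return dict(size), spread


def spread_bursts(epochs, ratio=2.0, min_spread=3):
    """Flag (epoch index, flow) pairs whose spread jumps between epochs.

    Hypothetical rule, assumed for illustration: a flow bursts when its
    spread in an epoch is at least min_spread and at least ratio times
    its spread in the previous epoch.
    """
    bursts = []
    previous = {}
    for index, stream in enumerate(epochs):
        _, current = size_and_spread(stream)
        if index > 0:  # the first epoch has no baseline to compare against
            for flow, spread in current.items():
                if spread >= min_spread and spread >= ratio * previous.get(flow, 0):
                    bursts.append((index, flow))
        previous = current
    return bursts


# Flow "b" quadruples in size but keeps contacting the same element,
# while flow "a" doubles the number of distinct elements it contacts.
epochs = [
    [("a", "x"), ("a", "x"), ("a", "y"), ("b", "p")],
    [("a", "x"), ("a", "y"), ("a", "z"), ("a", "w"),
     ("b", "p"), ("b", "p"), ("b", "p"), ("b", "p")],
]
print(spread_bursts(epochs))  # [(1, 'a')]
```

In this toy input only flow "a" is reported, even though flow "b" grows just as much in size, which is the case a size-based burst detector would not separate.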

Cited By

  • (2024) Enhancing Accuracy for Super Spreader Identification in High-Speed Data Streams. Proceedings of the VLDB Endowment 17:11 (3124-3137). DOI: 10.14778/3681954.3681988. Online publication date: 30-Aug-2024
  • (2024) PmTrack. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7:4 (1-30). DOI: 10.1145/3631433. Online publication date: 12-Jan-2024
  • (2023) Real-time Spread Burst Detection in Data Streaming. ACM SIGMETRICS Performance Evaluation Review 51:1 (51-52). DOI: 10.1145/3606376.3593566. Online publication date: 27-Jun-2023

      Published In

      Proceedings of the ACM on Measurement and Analysis of Computing Systems  Volume 7, Issue 2
      POMACS
      June 2023
      247 pages
      EISSN:2476-1249
      DOI:10.1145/3599176
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. data streaming
      2. real-time
      3. spread burst

      Qualifiers

      • Research-article
