research-article
Open access

Query Planning for Robust and Scalable Hybrid Network Telemetry Systems

Published: 28 March 2024

Abstract

Network telemetry systems have become hybrid combinations of state-of-the-art stream processors and modern programmable data-plane devices. However, existing designs have not focused on making these systems deployable in practice, i.e., able to scale and to cope with the dynamics of real-world traffic and query workloads. Efforts to scale these hybrid systems are hampered by severe constraints on the compute resources available in the data plane (e.g., memory, ALUs). Similarly, the limited runtime programmability of existing hardware data-plane targets critically affects efforts to make these systems robust. This paper presents the design and implementation of DynaMap, a new hybrid telemetry system that is both robust and scalable. By planning telemetry queries dynamically, DynaMap allows stateful dataflow operators to be remapped to data-plane registers at runtime. We formally model the problem of mapping dataflow operators to data-plane targets and develop a new heuristic algorithm for solving it. We implement our algorithm in a prototype and demonstrate its feasibility on existing hardware targets based on Intel Tofino. Using traffic workloads from different real-world production networks, we show that our DynaMap prototype improves performance on average by 1-2 orders of magnitude over state-of-the-art hybrid systems that use only static query planning.


Published In

Proceedings of the ACM on Networking (PACMNET), Volume 2, Issue CoNEXT1
March 2024
95 pages
EISSN: 2834-5509
DOI: 10.1145/3655593
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. analytics
2. programmable switches
3. stream processing

Qualifiers

• Research-article
