DOI: 10.1145/3265723.3265731
research-article

BurstRadar: Practical Real-time Microburst Monitoring for Datacenter Networks

Published: 27 August 2018

Abstract

Microbursts can degrade application performance in datacenters by increasing latency, jitter, and packet loss. Detecting microbursts and identifying the contributing flows is the first step towards mitigating this problem. Unfortunately, microbursts are unpredictable and typically last for tens or hundreds of μs, and the high line rates (> 10 Gbps) in modern datacenter networks further exacerbate the problem. In this paper, we show that modern programmable switching ASICs have made it practical to detect and characterize microbursts at high line rates. Our system, called BurstRadar, operates in the dataplane and monitors microbursts by capturing telemetry information for only the packets involved in microbursts. We have implemented a prototype of BurstRadar on a Barefoot Tofino switch using the P4 programming language. Our evaluation on a multi-gigabit testbed, using microburst traffic distributions from Facebook's production network, shows that BurstRadar incurs 10 times less data collection and processing overhead than existing solutions. Furthermore, BurstRadar can handle simultaneous microburst traffic on multiple egress ports while consuming very few resources in the switching ASIC.
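
The idea of capturing telemetry only for packets involved in a microburst can be illustrated with a small host-side simulation. The sketch below is not the BurstRadar dataplane program: it assumes (the abstract does not say so) that a microburst is flagged when the egress queue depth seen by an arriving packet reaches a fixed threshold, and it models the queue as draining at a constant line rate. The names `Packet`, `capture_burst_telemetry`, `drain_rate_bytes_per_us`, and `threshold_bytes` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Packet:
    flow_id: str
    arrival_us: float  # arrival time at the egress queue, in microseconds
    size_bytes: int


def capture_burst_telemetry(
    packets: List[Packet],
    drain_rate_bytes_per_us: float,
    threshold_bytes: float,
) -> List[Tuple[Packet, float]]:
    """Return (packet, queue depth seen on enqueue) for every packet that
    arrives while the simulated queue depth is at or above the threshold.

    Assumption: a fixed queue-depth threshold marks a microburst. This is an
    illustrative criterion, not one stated in the abstract.
    """
    captured = []
    depth = 0.0
    last_time = None
    for pkt in sorted(packets, key=lambda p: p.arrival_us):
        if last_time is not None:
            # Drain the queue for the time elapsed since the previous arrival.
            elapsed = pkt.arrival_us - last_time
            depth = max(0.0, depth - elapsed * drain_rate_bytes_per_us)
        last_time = pkt.arrival_us
        depth_seen = depth  # depth in front of this packet when it is enqueued
        depth += pkt.size_bytes
        if depth_seen >= threshold_bytes:
            captured.append((pkt, depth_seen))
    return captured


def example_traffic() -> List[Packet]:
    """Light background traffic with one short two-flow burst in the middle."""
    traffic = [Packet("bg", 10.0 * i, 1000) for i in range(10)]
    for i in range(20):
        traffic.append(Packet("a", 100 + 0.5 * i, 1500))
        traffic.append(Packet("b", 100 + 0.5 * i, 1500))
    traffic += [Packet("bg", 200 + 10.0 * i, 1000) for i in range(10)]
    return traffic


# 10 Gbps is 1250 bytes per microsecond; the 10 kB threshold is arbitrary.
captured = capture_burst_telemetry(example_traffic(), 1250.0, 10_000)
burst_flows = sorted({pkt.flow_id for pkt, _ in captured})
print(f"captured {len(captured)} of {len(example_traffic())} packets")
print("flows contributing to the burst:", burst_flows)
```

In this toy trace the background flow never builds a queue, so only packets from the two bursting flows are recorded, which is the selective-capture behaviour the abstract describes. A real deployment would run the equivalent logic in the switch dataplane rather than on a host.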

Published In

APSys '18: Proceedings of the 9th Asia-Pacific Workshop on Systems
August 2018
150 pages
ISBN:9781450360067
DOI:10.1145/3265723
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

APSys '18
APSys '18: 9th Asia-Pacific Workshop on Systems
August 27 - 28, 2018
Jeju Island, Republic of Korea

Acceptance Rates

APSys '18 Paper Acceptance Rate 18 of 48 submissions, 38%;
Overall Acceptance Rate 169 of 430 submissions, 39%

Cited By

  • (2024) μMon: Empowering Microsecond-level Network Monitoring with Wavelets. Proceedings of the ACM SIGCOMM 2024 Conference, pp. 274-290. DOI: 10.1145/3651890.3672236. Online publication date: 4-Aug-2024
  • (2024) In-Network Address Caching for Virtual Networks. Proceedings of the ACM SIGCOMM 2024 Conference, pp. 735-749. DOI: 10.1145/3651890.3672213. Online publication date: 4-Aug-2024
  • (2024) OffsetINT: Achieving High Accuracy and Low Bandwidth for In-Band Network Telemetry. IEEE Transactions on Services Computing 17:3, pp. 1072-1083. DOI: 10.1109/TSC.2023.3323697. Online publication date: May-2024
  • (2024) Bottleneck Identification in Cloudified Mobile Networks based on Distributed Telemetry. IEEE Transactions on Mobile Computing, pp. 1-18. DOI: 10.1109/TMC.2023.3312051. Online publication date: 2024
  • (2023) MFGAD-INT: in-band network telemetry data-driven anomaly detection using multi-feature fusion graph deep learning. Journal of Cloud Computing: Advances, Systems and Applications 12:1. DOI: 10.1186/s13677-023-00492-w. Online publication date: 28-Aug-2023
  • (2023) MARS: Fault Localization in Programmable Networking Systems with Low-cost In-Band Network Telemetry. Proceedings of the 52nd International Conference on Parallel Processing, pp. 347-357. DOI: 10.1145/3605573.3605622. Online publication date: 7-Aug-2023
  • (2023) ChameleMon: Shifting Measurement Attention as Network State Changes. Proceedings of the ACM SIGCOMM 2023 Conference, pp. 881-903. DOI: 10.1145/3603269.3604850. Online publication date: 10-Sep-2023
  • (2023) Advancing SDN from OpenFlow to P4: A Survey. ACM Computing Surveys 55:9, pp. 1-37. DOI: 10.1145/3556973. Online publication date: 16-Jan-2023
  • (2023) Dependable Virtualized Fabric on Programmable Data Plane. IEEE/ACM Transactions on Networking 31:4, pp. 1748-1764. DOI: 10.1109/TNET.2022.3224617. Online publication date: Aug-2023
  • (2023) Buffer-Based High-Coverage and Low-Overhead Request Event Monitoring in the Cloud. IEEE/ACM Transactions on Networking 31:4, pp. 1732-1747. DOI: 10.1109/TNET.2022.3224610. Online publication date: Aug-2023
