DOI: 10.1145/3387514.3405876

Microscope: Queue-based Performance Diagnosis for Network Functions

Published: 30 July 2020

Abstract

By moving monolithic network appliances to software running on commodity hardware, network function virtualization (NFV) allows flexible resource sharing among network functions (NFs) and achieves scalability at low cost. However, resource contention can cause performance problems in NFs that are hard to diagnose. In particular, when many flows traverse a complex topology of NF instances, it is hard to pinpoint the root cause when a flow experiences performance issues such as low throughput or high latency. Simply maintaining resource counters at individual NFs is not sufficient, since the effect of resource contention can propagate across NFs and over time. In this paper, we introduce Microscope, a performance diagnosis tool for network functions that leverages queuing information at NFs to identify root causes (e.g., resources, NFs, or traffic patterns of flows). Our evaluation on realistic NF chains and traffic shows that Microscope correctly captures the root causes behind 89.7% of performance impairments, up to 2.5 times more than state-of-the-art tools, with low overhead.
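To make the queue-based idea concrete, here is a minimal single-NF sketch. It is an illustration under stated assumptions, not the algorithm from the paper: the `Interval` record, the `blame` function, the fixed `peak_rate` parameter, and the two score names are all hypothetical. The sketch assumes per-interval packet counts are available at the NF, reconstructs the queue length, finds the start of the queuing period that contains a victim interval, and splits the backlog into "more input than the NF could ever handle" and "the NF processed below its peak rate".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Interval:
    arrivals: int   # packets that arrived at the NF during this interval
    processed: int  # packets the NF finished processing during this interval


def queue_lengths(trace: List[Interval]) -> List[int]:
    """Reconstruct the queue length at the end of each interval."""
    lengths, queue = [], 0
    for iv in trace:
        queue = max(0, queue + iv.arrivals - iv.processed)
        lengths.append(queue)
    return lengths


def period_start(lengths: List[int], victim: int) -> Optional[int]:
    """First interval of the queuing period containing `victim`.

    A queuing period is a maximal run of intervals with a non-empty queue.
    Returns None if the queue is empty at `victim`.
    """
    if lengths[victim] == 0:
        return None
    start = victim
    while start > 0 and lengths[start - 1] > 0:
        start -= 1
    return start


def blame(trace: List[Interval], victim: int, peak_rate: int) -> Dict[str, int]:
    """Split the backlog seen at `victim` into two hypothetical scores.

    input_burst:     packets that arrived beyond what the NF could process
                     at its assumed peak rate since the period started.
    slow_processing: packets the NF failed to process relative to that
                     assumed peak rate over the same window.
    """
    start = period_start(queue_lengths(trace), victim)
    if start is None:
        return {"input_burst": 0, "slow_processing": 0}
    window = trace[start:victim + 1]
    capacity = peak_rate * len(window)
    arrived = sum(iv.arrivals for iv in window)
    processed = sum(iv.processed for iv in window)
    return {
        "input_burst": max(0, arrived - capacity),
        "slow_processing": max(0, capacity - processed),
    }
```

With an assumed peak rate of 10 packets per interval, a trace in which 30 packets arrive in one interval attributes the backlog to `input_burst`, while a trace in which the NF processes only 4 packets per interval attributes it to `slow_processing`. The sketch covers a single NF only; it does not model the propagation across NFs and over time that the abstract identifies as the hard part of the problem.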

Supplementary Material

MP4 File (3387514.3405876.mp4)
A 20-minute introduction video for the SIGCOMM 2020 paper "Microscope: Queue-based Performance Diagnosis for Network Functions".

Published In

SIGCOMM '20: Proceedings of the Annual conference of the ACM Special Interest Group on Data Communication on the applications, technologies, architectures, and protocols for computer communication
July 2020
814 pages
ISBN: 9781450379557
DOI: 10.1145/3387514
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. NFV
  2. diagnosis
  3. performance

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • NSF

Conference

SIGCOMM '20

Acceptance Rates

Overall Acceptance Rate 462 of 3,389 submissions, 14%

Article Metrics

  • Downloads (Last 12 months): 105
  • Downloads (Last 6 weeks): 5
Reflects downloads up to 03 Sep 2024

Cited By

  • (2024) CollaSFC: An Intelligent Collaborative Approach for In-network SFC Failure Detection in Data Center for AI Computing. Proceedings of the 2024 SIGCOMM Workshop on Networks for AI Computing (41-47). DOI: 10.1145/3672198.3673798. Online publication date: 4-Aug-2024
  • (2024) AIRIC: Orchestration of Virtualized Radio Access Networks With Noisy Neighbours. IEEE Journal on Selected Areas in Communications 42:2 (432-445). DOI: 10.1109/JSAC.2023.3339749. Online publication date: Feb-2024
  • (2024) Non-invasive performance prediction of high-speed softwarized network services with limited knowledge. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications (2328-2337). DOI: 10.1109/INFOCOM52122.2024.10621097. Online publication date: 20-May-2024
  • (2024) Syscall Analysis for Resource Stress Identification for Container Network Functions. 2024 IEEE 17th International Conference on Cloud Computing (CLOUD) (256-266). DOI: 10.1109/CLOUD62652.2024.00037. Online publication date: 7-Jul-2024
  • (2023) ChameleMon: Shifting Measurement Attention as Network State Changes. Proceedings of the ACM SIGCOMM 2023 Conference (881-903). DOI: 10.1145/3603269.3604850. Online publication date: 10-Sep-2023
  • (2023) Buffer-Based High-Coverage and Low-Overhead Request Event Monitoring in the Cloud. IEEE/ACM Transactions on Networking 31:4 (1732-1747). DOI: 10.1109/TNET.2022.3224610. Online publication date: Aug-2023
  • (2023) DTFL: A Digital Twin-Assisted Graph Neural Network Approach for Service Function Chains Failure Localization. IEEE Transactions on Cloud Computing 11:4 (3573-3590). DOI: 10.1109/TCC.2023.3294506. Online publication date: Oct-2023
  • (2022) A Survey of NFV Network Acceleration from ETSI Perspective. Electronics 11:9 (1457). DOI: 10.3390/electronics11091457. Online publication date: 2-May-2022
  • (2022) PrintQueue. Proceedings of the ACM SIGCOMM 2022 Conference (516-529). DOI: 10.1145/3544216.3544257. Online publication date: 22-Aug-2022
  • (2022) ScaleFlux: Efficient Stateful Scaling in NFV. IEEE Transactions on Parallel and Distributed Systems 33:12 (4801-4817). DOI: 10.1109/TPDS.2022.3204209. Online publication date: 1-Dec-2022
