DOI: 10.1145/3386367.3432879
research-article

Chameleon: predictable latency and high utilization with queue-aware and adaptive source routing

Published: 24 November 2020

Abstract

This paper presents Chameleon, a cloud network that provides both predictable latency and high utilization, two goals that typically conflict, especially in multi-tenant datacenters. Chameleon exploits the routing flexibility available in modern communication networks to adapt dynamically to demand, and applies network calculus principles along individual paths. More specifically, Chameleon employs source routing on the "queue-level topology", a network abstraction that accounts for the current state of the network queues and, hence, for the different delays of different paths. Chameleon is based on a simple greedy algorithm and can be deployed at the edge; it does not require any modification of network devices. We implement and evaluate Chameleon in simulations and on a real testbed. Compared to the state of the art, we find that Chameleon can admit and embed significantly more flows (up to 15 times more), improving network utilization while meeting strict latency guarantees.
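
To make the idea of routing on a queue-level topology more concrete, the following is a minimal sketch, not the paper's actual algorithm. It assumes that every physical link is expanded into one edge per priority queue, that each such edge carries a fixed worst-case delay and a spare rate, and that flows are admitted greedily one at a time on the least-delay feasible path. The names `find_path`, `admit`, and the edge layout are hypothetical and chosen only for illustration; a real deployment would derive the per-queue delays from network calculus and account for interactions between queues sharing a link.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Hypothetical queue-level edge: (from_node, to_node, queue_id).
QueueEdge = Tuple[str, str, int]
# Hypothetical edge state: [worst_case_delay_us, spare_rate_mbps].
EdgeState = Dict[QueueEdge, List[float]]


def find_path(edges: EdgeState, src: str, dst: str,
              rate: float, deadline: float) -> Optional[List[QueueEdge]]:
    """Dijkstra on delay over queue-level edges that can still carry `rate`.

    Returns the least-delay path, or None if even that path misses the deadline.
    """
    best = {src: 0.0}
    heap = [(0.0, src, [])]
    while heap:
        delay, node, path = heapq.heappop(heap)
        if delay > best.get(node, float("inf")):
            continue  # stale heap entry
        if node == dst:
            return path if delay <= deadline else None
        for (u, v, q), (edge_delay, spare) in edges.items():
            if u != node or spare < rate:
                continue  # not an outgoing edge, or not enough spare rate
            new_delay = delay + edge_delay
            if new_delay < best.get(v, float("inf")):
                best[v] = new_delay
                heapq.heappush(heap, (new_delay, v, path + [(u, v, q)]))
    return None


def admit(edges: EdgeState, src: str, dst: str,
          rate: float, deadline: float) -> Optional[List[QueueEdge]]:
    """Greedy admission: embed the flow if a feasible path exists, else reject."""
    path = find_path(edges, src, dst, rate, deadline)
    if path is None:
        return None
    for edge in path:
        edges[edge][1] -= rate  # reserve rate on every queue along the path
    return path


# Illustrative topology (assumed values): two queues on the s1 -> s2 link.
edges: EdgeState = {
    ("h1", "s1", 0): [10.0, 200.0],
    ("s1", "s2", 0): [20.0, 100.0],   # high-priority queue, low delay
    ("s1", "s2", 1): [200.0, 100.0],  # low-priority queue, high delay
    ("s2", "h2", 0): [10.0, 200.0],
}
tight = admit(edges, "h1", "h2", rate=80.0, deadline=50.0)
loose = admit(edges, "h1", "h2", rate=80.0, deadline=300.0)
print(tight)
print(loose)
```

In this sketch the first flow takes the low-delay queue, and the second flow, whose deadline is looser, is steered to the slower queue because the faster one no longer has enough spare rate. The source host would then encode the returned sequence of (link, queue) hops in the packet, which is the sense in which the routing is done at the edge.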

Supplementary Material

MP4 File (3386367.3432879.mp4)
Presentation Video

          Published In

          CoNEXT '20: Proceedings of the 16th International Conference on emerging Networking EXperiments and Technologies
          November 2020
          585 pages
          ISBN:9781450379489
          DOI:10.1145/3386367
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Author Tags

          1. latency
          2. network calculus
          3. predictability
          4. reconfigurations

          Qualifiers

          • Research-article

          Conference

          CoNEXT '20

          Acceptance Rates

          Overall Acceptance Rate 198 of 789 submissions, 25%

          Article Metrics

          • Downloads (Last 12 months): 50
          • Downloads (Last 6 weeks): 6
          Reflects downloads up to 08 Feb 2025

          Cited By

          • (2024) An East-Westbound Control Architecture for Multi-Segment Deterministic Networking. 2024 IFIP Networking Conference (IFIP Networking), 732-737. DOI: 10.23919/IFIPNetworking62109.2024.10619797. Online publication date: 3-Jun-2024
          • (2024) Integrating Deterministic Networking with 5G. 2024 20th International Conference on Network and Service Management (CNSM), 1-3. DOI: 10.23919/CNSM62983.2024.10814502. Online publication date: 28-Oct-2024
          • (2024) Modellierung und Validierung von Latenzzeiten in industriellen Agentensystemen mit Digitalen Zwillingen der KI.Fabrik. at - Automatisierungstechnik 72:10, 958-979. DOI: 10.1515/auto-2023-0221. Online publication date: 9-Oct-2024
          • (2024) Routing-Aware Shaping for Feasible Multi-Domain Determinism. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 2419-2424. DOI: 10.1145/3636534.3696727. Online publication date: 4-Dec-2024
          • (2024) Exploiting Queue Information for Scalable Delay-Constrained Routing in Deterministic Networks. IEEE Transactions on Network and Service Management 21:5, 5260-5272. DOI: 10.1109/TNSM.2024.3435769. Online publication date: Oct-2024
          • (2024) On the Benefits of Traffic “Reprofiling” the Multiple Hops Case—Part I. IEEE/ACM Transactions on Networking 32:4, 3421-3436. DOI: 10.1109/TNET.2024.3392030. Online publication date: Aug-2024
          • (2024) The Effects of Topologies on the Performance of Real-Time Networks. 2024 IEEE 10th International Conference on Network Softwarization (NetSoft), 346-350. DOI: 10.1109/NetSoft60951.2024.10588878. Online publication date: 24-Jun-2024
          • (2024) Lightweight Determinism in Large-Scale Networks. IEEE Communications Magazine 62:12, 120-126. DOI: 10.1109/MCOM.003.2300555. Online publication date: Dec-2024
          • (2024) Hirail: Core-Agnostic Deterministic Networks for Long-Distance Time-Sensitive IIoT Applications. IEEE Internet of Things Journal 11:10, 17198-17209. DOI: 10.1109/JIOT.2024.3357893. Online publication date: 15-May-2024
          • (2023) CrossBal: Data and Control Plane Cooperation for Efficient and Scalable Network Load Balancing. 2023 19th International Conference on Network and Service Management (CNSM), 1-9. DOI: 10.23919/CNSM59352.2023.10327790. Online publication date: 30-Oct-2023
