R2C2: A Network Stack for Rack-scale Computers

Published: 17 August 2015

Abstract

Rack-scale computers, comprising a large number of micro-servers connected by a direct-connect topology, are expected to replace servers as the building block in data centers. We focus on the problem of routing and congestion control across the rack's network, and find that high path diversity in rack topologies, in combination with workload diversity across it, means that traditional solutions are inadequate. We introduce R2C2, a network stack for rack-scale computers that provides flexible and efficient routing and congestion control. R2C2 leverages the fact that the scale of rack topologies allows for low-overhead broadcasting to ensure that all nodes in the rack are aware of all network flows. We thus achieve rate-based congestion control without any probing; each node independently determines the sending rate for its flows while respecting the provider's allocation policies. For routing, nodes dynamically choose the routing protocol for each flow in order to maximize overall utility. Through a prototype deployed across a rack emulation platform and a packet-level simulator, we show that R2C2 achieves very low queuing and high throughput for diverse and bursty workloads, and that routing flexibility can provide significant throughput gains.
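The rate computation described in the abstract can be illustrated with a minimal sketch. It assumes that the provider's allocation policy is max-min fairness and that each flow follows a single fixed path; neither assumption comes from the abstract, and the names `max_min_rates` and `RackNode` are hypothetical. Each node keeps a copy of the global flow table, updates it when it receives a broadcast, and recomputes rates locally by progressive filling, so no probing is needed.

```python
from collections import defaultdict


def max_min_rates(flows, capacity):
    """Progressive filling (assumed policy: max-min fairness).

    flows:    dict flow_id -> non-empty list of links on the flow's path
    capacity: dict link -> link capacity
    Returns   dict flow_id -> allocated rate
    """
    rates = {}
    remaining = dict(capacity)   # capacity not yet handed out
    active = dict(flows)         # flows whose rate is not yet fixed
    while active:
        # Count the still-active flows crossing each link.
        load = defaultdict(int)
        for path in active.values():
            for link in path:
                load[link] += 1
        # The bottleneck is the link offering the smallest equal share.
        # Sorting makes tie-breaking identical on every node.
        bottleneck = min(sorted(load), key=lambda l: remaining[l] / load[l])
        share = remaining[bottleneck] / load[bottleneck]
        # Fix the rate of every flow crossing the bottleneck.
        for fid in [f for f, p in active.items() if bottleneck in p]:
            rates[fid] = share
            for link in active[fid]:
                remaining[link] -= share
            del active[fid]
    return rates


class RackNode:
    """Hypothetical per-node state: a local copy of the global flow table."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.flows = {}

    def on_broadcast(self, event, flow_id, path=None):
        # Apply a flow start/finish announcement, then recompute locally.
        if event == "start":
            self.flows[flow_id] = path
        elif event == "finish":
            self.flows.pop(flow_id, None)
        return max_min_rates(self.flows, self.capacity)


capacity = {("A", "B"): 10, ("B", "C"): 4}
node1, node2 = RackNode(capacity), RackNode(capacity)
events = [
    ("start", "f1", [("A", "B"), ("B", "C")]),
    ("start", "f2", [("A", "B")]),
    ("start", "f3", [("B", "C")]),
]
for ev in events:
    rates1 = node1.on_broadcast(*ev)
    rates2 = node2.on_broadcast(*ev)
```

Because both nodes see the same broadcasts and run the same deterministic computation, they arrive at the same rates without exchanging any further messages. In this toy topology, link B-C is the bottleneck, so f1 and f3 share it equally and f2 takes what is left on A-B.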

Supplementary Material

WEBM File (p551-costa.webm)

Published In

ACM SIGCOMM Computer Communication Review, Volume 45, Issue 4
SIGCOMM'15
October 2015
659 pages
ISSN: 0146-4833
DOI: 10.1145/2829988

SIGCOMM '15: Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication
August 2015
684 pages
ISBN: 9781450335423
DOI: 10.1145/2785956
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in SIGCOMM-CCR Volume 45, Issue 4


Author Tags

  1. cloud computing
  2. congestion control
  3. data center networks
  4. networks
  5. rack-scale computers
  6. rack-scale network stack
  7. route selection
  8. transport protocols

Qualifiers

  • Research-article
