research-article

DCell: a scalable and fault-tolerant network structure for data centers

Published: 17 August 2008

Abstract

A fundamental challenge in data center networking is how to efficiently interconnect an exponentially increasing number of servers. This paper presents DCell, a novel network structure that has many desirable features for data center networking. DCell is a recursively defined structure in which a high-level DCell is constructed from many low-level DCells, and DCells at the same level are fully connected with one another. DCell scales doubly exponentially as the node degree increases. DCell is fault tolerant because it has no single point of failure, and its distributed fault-tolerant routing protocol performs near-shortest-path routing even in the presence of severe link or node failures. DCell also provides higher network capacity than the traditional tree-based structure for various types of services. Furthermore, DCell can be incrementally expanded, and a partial DCell provides the same appealing features. Results from theoretical analysis, simulations, and experiments show that DCell is a viable interconnection structure for data centers.
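
The doubly exponential scaling claim can be made concrete with a short sketch. It relies on the construction given in the full paper rather than in this abstract, so treat it as an assumption here: a level-0 DCell holds n servers, and a level-k DCell is built from t_{k-1} + 1 fully connected copies of a level-(k-1) DCell, where t_{k-1} is the server count of one copy. The function name `dcell_server_counts` is hypothetical and used only for illustration.

```python
def dcell_server_counts(n, k):
    """Return [t_0, t_1, ..., t_k], the server count of a DCell at each level.

    Assumed recurrence (taken from the full paper, not this abstract):
        t_0 = n
        t_j = t_{j-1} * (t_{j-1} + 1)
    because a level-j DCell consists of (t_{j-1} + 1) level-(j-1) DCells,
    each containing t_{j-1} servers.
    """
    if n < 1 or k < 0:
        raise ValueError("n must be >= 1 and k must be >= 0")
    counts = [n]
    for _ in range(k):
        t = counts[-1]
        counts.append(t * (t + 1))
    return counts


if __name__ == "__main__":
    # With 4 servers per DCell_0, three levels of recursion are enough
    # to pass 100,000 servers.
    print(dcell_server_counts(4, 3))  # [4, 20, 420, 176820]
```

Each level roughly squares the server count, which is why a small number of ports per server can, under this recurrence, address a very large cluster.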

      Published In

      ACM SIGCOMM Computer Communication Review, Volume 38, Issue 4
      October 2008
      436 pages
      ISSN: 0146-4833
      DOI: 10.1145/1402946

      SIGCOMM '08: Proceedings of the ACM SIGCOMM 2008 conference on Data communication
      August 2008
      452 pages
      ISBN: 9781605581750
      DOI: 10.1145/1402958

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. data center
      2. fault-tolerance
      3. network topology
      4. throughput

      Qualifiers

      • Research-article

      Cited By

      • (2025) 2-Edge Hamiltonian connectedness: Characterization and results in data center networks. Applied Mathematics and Computation 490 (129197). DOI: 10.1016/j.amc.2024.129197. Online publication date: Apr-2025.
      • (2024) Fault-tolerant Hamiltonian cycle strategy for fast node fault diagnosis based on PMC in data center networks. Mathematical Biosciences and Engineering 21:2 (2121-2136). DOI: 10.3934/mbe.2024093. Online publication date: 2024.
      • (2024) An Improved Fault Diagnosis Algorithm for Highly Scalable Data Center Networks. Mathematics 12:4 (597). DOI: 10.3390/math12040597. Online publication date: 17-Feb-2024.
      • (2024) Investigating Data Center Network Protocols. Proceedings of the 2024 Applied Networking Research Workshop (91-93). DOI: 10.1145/3673422.3674897. Online publication date: 23-Jul-2024.
      • (2024) Fault-Tolerant Communication in HSDC: Ensuring Reliable Data Transmission in Smart Cities. IEEE Transactions on Reliability 73:4 (1933-1945). DOI: 10.1109/TR.2024.3371953. Online publication date: Dec-2024.
      • (2024) New Techniques to Route in Folded-Clos Topology Data Center Networks. Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis (819-828). DOI: 10.1109/SCW63240.2024.00116. Online publication date: 17-Nov-2024.
      • (2024) Research Progress on Computing Power Network for the Power Industry. 2024 IEEE 12th International Conference on Information, Communication and Networks (ICICN) (201-210). DOI: 10.1109/ICICN62625.2024.10761814. Online publication date: 21-Aug-2024.
      • (2024) Formal Algebraic Model of a Generic BCube Data Center Network Architecture. 2024 International Conference on Electrical, Communication and Computer Engineering (ICECCE) (1-5). DOI: 10.1109/ICECCE63537.2024.10823531. Online publication date: 30-Oct-2024.
      • (2024) Optical Data Center Networking: A Comprehensive Review on Traffic, Switching, Bandwidth Allocation, and Challenges. IEEE Access 12 (186413-186444). DOI: 10.1109/ACCESS.2024.3513214. Online publication date: 2024.
      • (2024) Fault-tolerant unicast paths constructive algorithms in a family of recursive networks. Journal of the Chinese Institute of Engineers 47:3 (265-272). DOI: 10.1080/02533839.2024.2308235. Online publication date: 7-Feb-2024.
