DOI: 10.1145/3544216.3544265
research-article
Open access

Jupiter evolving: transforming Google's datacenter network via optical circuit switches and software-defined networking

Published: 22 August 2022

Abstract

We present a decade of evolution and production experience with Jupiter datacenter network fabrics. In this period, Jupiter has delivered 5x higher speed and capacity, a 30% reduction in capex, a 41% reduction in power, and incremental deployment and technology refresh, all while serving live production traffic. A key enabler for these improvements is evolving Jupiter from a Clos to a direct-connect topology among the machine aggregation blocks. Critical architectural changes for this include a datacenter interconnection layer employing Micro-Electro-Mechanical Systems (MEMS) based Optical Circuit Switches (OCSes) to enable dynamic topology reconfiguration, centralized Software-Defined Networking (SDN) control for traffic engineering, and automated network operations for incremental capacity delivery and topology engineering. We show that the combination of traffic and topology engineering on direct-connect fabrics achieves throughput similar to that of Clos fabrics for our production traffic patterns. We also optimize for path lengths: 60% of the traffic takes a direct path from source to destination aggregation blocks, while the remainder transits one additional block, achieving an average block-level path length of 1.4 in our fleet today. OCS also achieves 3x faster fabric reconfiguration compared to pre-evolution Clos fabrics that used a patch-panel-based interconnect.
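
The path-length figure in the abstract can be checked with simple arithmetic. Assuming a direct path counts as one block-level hop and a path that transits one additional block counts as two (the abstract does not spell out how the metric is counted, so this is an assumption), 60% direct traffic gives 0.6 x 1 + 0.4 x 2 = 1.4. A minimal Python sketch, in which the function name and hop counts are illustrative and not taken from the paper:

```python
def average_block_path_length(direct_fraction: float) -> float:
    """Traffic-weighted average block-level path length.

    Assumption (not stated in the abstract): a direct path counts as
    1 hop and a path through one transit block counts as 2 hops.
    """
    if not 0.0 <= direct_fraction <= 1.0:
        raise ValueError("direct_fraction must be between 0 and 1")
    transit_fraction = 1.0 - direct_fraction
    direct_hops, transit_hops = 1, 2
    return direct_hops * direct_fraction + transit_hops * transit_fraction


if __name__ == "__main__":
    # 60% of traffic takes the direct path, per the abstract.
    print(f"{average_block_path_length(0.60):.1f}")  # prints 1.4
```

Under the same assumption, the average moves toward 1.0 as more traffic is routed directly and toward 2.0 as more traffic transits an intermediate block.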

Supplementary Material

PDF File (p66-poutievski-supp.pdf)
Supplemental material.

        Published In

        SIGCOMM '22: Proceedings of the ACM SIGCOMM 2022 Conference
        August 2022
        858 pages
        ISBN:9781450394208
        DOI:10.1145/3544216
        This work is licensed under a Creative Commons Attribution International 4.0 License.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. datacenter network
        2. optical circuit switches
        3. software-defined networking
        4. topology engineering
        5. traffic engineering

        Qualifiers

        • Research-article

        Conference

        SIGCOMM '22: ACM SIGCOMM 2022 Conference
        August 22 - 26, 2022
        Amsterdam, Netherlands

        Acceptance Rates

        Overall Acceptance Rate 462 of 3,389 submissions, 14%
