DOI: 10.1145/3651890.3672237
research-article

Transferable Neural WAN TE for Changing Topologies

Published: 04 August 2024

    Abstract

    Recently, researchers have proposed ML-driven traffic engineering (TE) schemes in which a neural network model produces TE decisions in lieu of conventional optimization solvers. Unfortunately, existing ML-based TE schemes are not explicitly designed to be robust to topology changes that may occur due to WAN evolution, failures, or planned maintenance. In this paper, we present HARP, a neural model for TE explicitly capable of handling variations in topology, including those not observed in training. HARP is designed with two principles in mind: (i) ensure invariance to natural input transformations (e.g., permutations of node ids, tunnel reordering); and (ii) align the neural architecture with the optimization model. Evaluations on a multi-week dataset of a large private WAN show that HARP achieves a maximum link utilization (MLU) at most 11% higher than optimal over 98% of the time, despite encountering significantly different topologies in testing relative to training data. Further, comparisons with state-of-the-art ML-based TE schemes indicate the importance of the mechanisms introduced by HARP to handle topology variability. Finally, when predicted traffic matrices are provided, HARP outperforms classic optimization solvers, achieving a median reduction in MLU of 5 to 10% on the true traffic matrix.
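
    As an illustration of the metric quoted above, the following minimal sketch computes MLU for a tunnel-based TE decision and checks that the value is unchanged when node ids are permuted. It is not HARP's implementation: the three-node topology, the capacities, the demand, and the helper names (`max_link_utilization`, `gap_to_optimal`, `relabel`) are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]
Pair = Tuple[str, str]


def max_link_utilization(
    capacity: Dict[Edge, float],
    tunnels: Dict[Pair, List[List[str]]],
    demand: Dict[Pair, float],
    splits: Dict[Pair, List[float]],
) -> float:
    """Route each demand over its tunnels by split ratio; return max load/capacity."""
    load = {edge: 0.0 for edge in capacity}
    for pair, paths in tunnels.items():
        for path, ratio in zip(paths, splits[pair]):
            flow = demand[pair] * ratio
            for edge in zip(path, path[1:]):  # consecutive nodes form a directed link
                load[edge] += flow
    return max(load[e] / capacity[e] for e in capacity)


def gap_to_optimal(mlu: float, optimal_mlu: float) -> float:
    """Relative excess over the optimal MLU (0.11 means 11% higher than optimal)."""
    return mlu / optimal_mlu - 1.0


def relabel(mapping, capacity, tunnels, demand, splits):
    """Apply a node-id permutation to every input (hypothetical helper)."""
    m = mapping
    return (
        {(m[u], m[v]): c for (u, v), c in capacity.items()},
        {(m[s], m[d]): [[m[n] for n in p] for p in ps] for (s, d), ps in tunnels.items()},
        {(m[s], m[d]): x for (s, d), x in demand.items()},
        {(m[s], m[d]): r for (s, d), r in splits.items()},
    )


# Hypothetical toy instance: one demand A->B with a direct and a two-hop tunnel.
capacity = {("A", "B"): 10.0, ("A", "C"): 10.0, ("C", "B"): 10.0}
tunnels = {("A", "B"): [["A", "B"], ["A", "C", "B"]]}
demand = {("A", "B"): 12.0}

balanced = {("A", "B"): [0.5, 0.5]}     # 6 units per tunnel; optimal for this toy case
direct_only = {("A", "B"): [1.0, 0.0]}  # all 12 units on the direct link

mlu_balanced = max_link_utilization(capacity, tunnels, demand, balanced)
mlu_direct = max_link_utilization(capacity, tunnels, demand, direct_only)
gap = gap_to_optimal(mlu_direct, mlu_balanced)

# Permuting node ids changes the labels but should not change the MLU.
permuted = relabel({"A": "n2", "B": "n0", "C": "n1"}, capacity, tunnels, demand, balanced)
mlu_permuted = max_link_utilization(*permuted)
```

    In this toy instance the balanced split gives an MLU of 0.6 and the direct-only split gives 1.2, so the latter is 100% higher than optimal. The abstract's figure corresponds to this gap staying at or below 0.11 in over 98% of test cases.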

    Published In

    ACM SIGCOMM '24: Proceedings of the ACM SIGCOMM 2024 Conference
    August 2024
    1033 pages
    ISBN:9798400706141
    DOI:10.1145/3651890
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. traffic engineering
    2. wide-area networks
    3. network optimization
    4. machine learning

    Conference

    ACM SIGCOMM '24: ACM SIGCOMM 2024 Conference
    August 4 - 8, 2024
    Sydney, NSW, Australia

    Acceptance Rates

    Overall Acceptance Rate 462 of 3,389 submissions, 14%
