DOI: 10.1145/3366423.3380012
research-article

ResQueue: A Smarter Datacenter Flow Scheduler

Published: 20 April 2020

Abstract

Datacenters host a mix of applications: foreground applications perform distributed lookups to service user queries, while background applications perform batch-processing tasks such as data reorganization, backup, and replication. Although background flows carry most of the load, foreground applications generate most of the flows. Because packets from both types of applications compete for bandwidth at switches, application performance is sensitive to the scheduling mechanism. Existing schedulers use flow size to distinguish critical flows from non-critical flows. However, recent studies of datacenter workloads reveal that most flows are small (e.g., most flows consist of only a handful of packets). In light of these findings, we make the key observation that, because most flows are small, flow size alone is not sufficient to distinguish critical flows from non-critical ones, and therefore existing flow schedulers do not achieve the desired prioritization. In this paper, we introduce ResQueue, which uses a combination of flow size and packet history to calculate the priority of each flow. Our evaluation shows that ResQueue improves the tail flow completion times of short flows by up to 60% over state-of-the-art flow scheduling mechanisms.
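The abstract's central argument, that size-based priority collapses when most flows are small, can be illustrated with a toy example. The sketch below is hypothetical: the byte thresholds, the `Flow` type, and the use of a per-flow drop count as "packet history" are assumptions made for illustration only. This page does not specify ResQueue's actual priority function, so the code should not be read as the authors' method.

```python
# Hypothetical sketch: size-only vs. size-plus-history flow priority.
# The byte thresholds, the Flow fields, and the use of a per-flow drop count
# as "packet history" are illustrative assumptions, not ResQueue's design.
from dataclasses import dataclass

# Assumed byte thresholds separating priority levels (0 = highest). A flow
# is demoted one level each time its bytes sent cross a threshold.
SIZE_THRESHOLDS = (10_000, 100_000, 1_000_000)


@dataclass
class Flow:
    name: str
    bytes_sent: int
    drops: int = 0  # assumed stand-in for "packet history"


def size_priority(flow: Flow) -> int:
    """Size-only priority: the first threshold the flow is still below."""
    for level, threshold in enumerate(SIZE_THRESHOLDS):
        if flow.bytes_sent < threshold:
            return level
    return len(SIZE_THRESHOLDS)  # largest flows share the lowest level


def combined_priority(flow: Flow) -> tuple:
    """Hypothetical sort key (lower sorts first): size level, then history.

    Flows in the same size level are ordered so that those that have
    already lost more packets are served first.
    """
    return (size_priority(flow), -flow.drops)


flows = [
    Flow("query-a", bytes_sent=3_000),
    Flow("query-b", bytes_sent=4_500, drops=2),
    Flow("query-c", bytes_sent=1_500),
    Flow("backup", bytes_sent=5_000_000),
]

# Size alone puts every short query in the same top level.
print({f.name: size_priority(f) for f in flows})
# {'query-a': 0, 'query-b': 0, 'query-c': 0, 'backup': 3}

# Adding history separates the query that has already suffered drops.
print([f.name for f in sorted(flows, key=combined_priority)])
# ['query-b', 'query-a', 'query-c', 'backup']
```

In this toy setup the three short queries are indistinguishable under size alone, and only the history term moves `query-b` ahead. Ties that remain keep their input order because Python's `sorted` is stable; a switch-based design would presumably have to map such a key onto a small, fixed number of hardware priority queues instead.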

Cited By

  • (2024) Key Flow First Prioritized Flow Scheduling Strategy in Multi-Tenant Data Centers. IEEE Transactions on Network and Service Management 21:3, 3264-3277. DOI: 10.1109/TNSM.2024.3364149. Online publication date: Jun-2024.
  • (2021) A Buffer Management Algorithm Based on Dynamic Marking Threshold to Restrain MicroBurst in Data Center Network. Information 12:9, 369. DOI: 10.3390/info12090369. Online publication date: 12-Sep-2021.
  • (2021) Smartbuf: An Agile Memory Management for Shared-Memory Switches in Datacenters. 2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQOS), 1-7. DOI: 10.1109/IWQOS52092.2021.9521311. Online publication date: 25-Jun-2021.


        Published In

        ACM Conferences
        WWW '20: Proceedings of The Web Conference 2020
        April 2020
        3143 pages
        ISBN:9781450370233
        DOI:10.1145/3366423
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. Congestion Control
        2. Datacenter Networks
        3. Flow Scheduling

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        WWW '20
        WWW '20: The Web Conference 2020
        April 20 - 24, 2020
        Taipei, Taiwan

        Acceptance Rates

        Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
