DOI: 10.1145/3627703.3650060
research-article

Draconis: Network-Accelerated Scheduling for Microsecond-Scale Workloads

Published: 22 April 2024

Abstract

We present Draconis, a novel scheduler for workloads in the range of tens to hundreds of microseconds. Draconis challenges the popular belief that programmable switches cannot house the complex data structures, such as queues, needed to support an in-network scheduler. Using programmable switches, Draconis achieves the low scheduling tail latency and high throughput needed to support these microsecond-scale workloads on large clusters. Furthermore, Draconis supports a wide range of complex scheduling policies, including locality-aware scheduling, priority-based scheduling, and resource-based scheduling.
Draconis reduces 99th-percentile scheduling latency by 3× to 200× compared to state-of-the-art software-based and network-accelerated schedulers on a range of synthetic workloads. Our evaluation also demonstrates that Draconis has 52× higher throughput than server-based scheduling systems.

Published In

EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems
April 2024
1245 pages
ISBN:9798400704376
DOI:10.1145/3627703
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EuroSys '24

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%
