DOI: 10.1145/3477132.3483546
research-article
Open access

Shard Manager: A Generic Shard Management Framework for Geo-distributed Applications

Published: 26 October 2021

Abstract

Sharding is widely used to scale applications. Despite a decade of effort to build generic sharding frameworks that can be reused across different applications, the extent of their success remains unclear. We attempt to answer a fundamental question: what barriers prevent a sharding framework from being adopted by the majority of sharded applications?
We analyze hundreds of sharded applications at Facebook and identify two major barriers: 1) lack of support for geo-distributed applications, which account for most of Facebook's applications, and 2) inability to maintain application availability during planned events such as software upgrades, which happen ≈1000 times more frequently than unplanned failures. A sharding framework that does not help applications address these fundamental challenges is not attractive enough for most applications to adopt. Other adoption barriers include the burden of supporting many complex applications in a one-size-fits-all sharding framework and the difficulty of supporting sophisticated shard-placement requirements. Theoretically, a constraint solver can handle complex placement requirements, but in practice it is not scalable enough to perform near-real-time shard placement at a global scale.
We have overcome these adoption barriers in Facebook's sharding framework, called Shard Manager. Shard Manager is currently used by hundreds of applications, which run on over one million machines and account for about 54% of all sharded applications at Facebook.


Published In

SOSP '21: Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles
October 2021
899 pages
ISBN: 9781450387095
DOI: 10.1145/3477132
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. availability
  2. shard management
  3. sharding

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SOSP '21

Acceptance Rates

Overall Acceptance Rate 174 of 961 submissions, 18%

Upcoming Conference

SOSP '25
ACM SIGOPS 31st Symposium on Operating Systems Principles
October 13 - 16, 2025
Seoul, Republic of Korea

Article Metrics

  • Downloads (Last 12 months): 2,790
  • Downloads (Last 6 weeks): 175
Reflects downloads up to 03 Feb 2025

Cited By

  • (2025) Cooperative Graceful Degradation in Containerized Clouds. Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pp. 214-232. DOI: 10.1145/3669940.3707244. Online publication date: 3-Feb-2025
  • (2024) Optimizing resource allocation in hyperscale datacenters. Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation, pp. 507-528. DOI: 10.5555/3691938.3691965. Online publication date: 10-Jul-2024
  • (2024) Dirigent: Lightweight Serverless Orchestration. Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, pp. 369-384. DOI: 10.1145/3694715.3695966. Online publication date: 4-Nov-2024
  • (2024) Thinking out of replication for geo-distributing applications: the sharding case. 2024 IEEE 8th International Conference on Fog and Edge Computing (ICFEC), pp. 43-50. DOI: 10.1109/ICFEC61590.2024.00019. Online publication date: 6-May-2024
  • (2023) StreamOps: Cloud-Native Runtime Management for Streaming Services in ByteDance. Proceedings of the VLDB Endowment 16:12, pp. 3501-3514. DOI: 10.14778/3611540.3611543. Online publication date: 1-Aug-2023
  • (2023) Scaling a Declarative Cluster Manager Architecture with Query Optimization Techniques. Proceedings of the VLDB Endowment 16:10, pp. 2618-2631. DOI: 10.14778/3603581.3603599. Online publication date: 1-Jun-2023
  • (2023) Load Balancing Algorithms and Their Impacts on Apache Kafka. 2023 IEEE International Conference on Big Data (BigData), pp. 1726-1735. DOI: 10.1109/BigData59044.2023.10386734. Online publication date: 15-Dec-2023
  • (2022) Parallelism-Optimizing Data Placement for Faster Data-Parallel Computations. Proceedings of the VLDB Endowment 16:4, pp. 760-771. DOI: 10.14778/3574245.3574260. Online publication date: 1-Dec-2022
  • (2022) Meta's next-generation realtime monitoring and analytics platform. Proceedings of the VLDB Endowment 15:12, pp. 3522-3534. DOI: 10.14778/3554821.3554841. Online publication date: 1-Aug-2022
  • (2022) ESDB: Processing Extremely Skewed Workloads in Real-time. Proceedings of the 2022 International Conference on Management of Data, pp. 2286-2298. DOI: 10.1145/3514221.3526051. Online publication date: 10-Jun-2022
