DOI: 10.1145/3623278.3624765
research-article
Open access

λFS: A Scalable and Elastic Distributed File System Metadata Service using Serverless Functions

Published: 07 February 2024

    Abstract

    The metadata service (MDS) sits on the critical path for distributed file system (DFS) operations, and therefore it is key to the overall performance of a large-scale DFS. Common "serverful" MDS architectures, such as a single server or cluster of servers, have a significant shortcoming: either they are not scalable, or they make it difficult to achieve an optimal balance of performance, resource utilization, and cost. A modern MDS requires a novel architecture that addresses this shortcoming.
    To this end, we design and implement λFS, an elastic, high-performance metadata service for large-scale DFSes. λFS scales a DFS metadata cache elastically on a FaaS (Function-as-a-Service) platform and synthesizes a series of techniques to overcome the obstacles encountered when building large, stateful, and performance-sensitive applications on FaaS platforms. λFS takes full advantage of the unique benefits offered by FaaS (elastic scaling and massive parallelism) to realize a highly optimized metadata service capable of sustaining up to 4.13× higher throughput, 90.40% lower latency, 85.99% lower cost, 3.33× better performance-per-cost, and better resource utilization and efficiency than a state-of-the-art DFS for an industrial workload.
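
    The percentage figures above can be easier to compare with the "×" figures when expressed as factors. The following is a minimal sketch of that arithmetic only; the helper name `reduction_to_factor` is hypothetical and not from the paper. Each number is an "up to" figure as reported, so the sketch re-expresses each one on its own and does not relate them to one another.

```python
def reduction_to_factor(percent_lower: float) -> float:
    """Convert an 'X% lower' figure into an 'N times smaller' factor.

    Hypothetical helper for illustration; not part of the paper's artifacts.
    """
    remaining = 1.0 - percent_lower / 100.0  # fraction of the baseline that is left
    return 1.0 / remaining

# Figures quoted in the abstract (reported as "up to" values).
latency_factor = reduction_to_factor(90.40)  # roughly a 10.4x reduction
cost_factor = reduction_to_factor(85.99)     # roughly a 7.1x reduction

print(f"latency: {latency_factor:.2f}x lower, cost: {cost_factor:.2f}x lower")
```

    Read this way, "90.40% lower latency" corresponds to roughly one tenth of the baseline latency, and "85.99% lower cost" to roughly one seventh of the baseline cost.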

    Published In

    ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4
    March 2023
    430 pages
    ISBN:9798400703942
    DOI:10.1145/3623278
    This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Conference

    ASPLOS '23

    Acceptance Rates

    Overall Acceptance Rate 535 of 2,713 submissions, 20%
