research-article
Open access

Smash: Flexible, Fast, and Resource-efficient Placement and Lookup of Distributed Storage

Published: 22 May 2023

Abstract

Large-scale distributed storage systems, such as object stores, usually apply hashing-based placement and lookup methods to achieve scalability and resource efficiency. However, when object locations are determined by hash values, placement becomes inflexible and cannot optimize for or satisfy application requirements such as load balance, failure tolerance, parallelism, and network/system performance. This work presents a novel solution that achieves the best of both worlds: flexibility together with cost-effectiveness and scalability. The proposed method, Smash, is an object placement and lookup method that achieves full placement flexibility, balanced load, low resource cost, and short latency. Smash applies a recent space-efficient data structure to object-location lookups. We implement Smash as a prototype system and evaluate it in a public cloud. The analysis and experimental results show that Smash achieves full placement flexibility, fast storage operations, fast recovery from node dynamics, and lower DRAM cost (<60%) compared to existing hash-based solutions such as Ceph and MapX.
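
The trade-off described in the abstract can be illustrated with a minimal sketch. It is not Smash's design: the abstract says only that Smash uses a space-efficient data structure for lookups, and a plain Python dict stands in for it here. The node names, `hash_placement`, and `DirectoryPlacement` are hypothetical and exist only for this illustration.

```python
import hashlib

# Hypothetical storage nodes, assumed for illustration only.
NODES = ["node-a", "node-b", "node-c", "node-d"]


def hash_placement(object_id: str, nodes=NODES) -> str:
    """Hash-based placement: the location is a pure function of the ID.

    No per-object state is stored, but the system cannot choose the node.
    """
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


class DirectoryPlacement:
    """Flexible placement: any policy may pick the node for an object.

    The cost is one lookup entry per object. A dict is used here as a
    stand-in; it is not the compact structure the paper refers to.
    """

    def __init__(self) -> None:
        self._location: dict[str, str] = {}

    def place(self, object_id: str, node: str) -> None:
        # Record (or overwrite) the node chosen by the placement policy.
        self._location[object_id] = node

    def lookup(self, object_id: str) -> str:
        return self._location[object_id]


directory = DirectoryPlacement()
directory.place("obj-1", "node-c")   # e.g. chosen for load balance
directory.place("obj-1", "node-a")   # the object can be moved freely
```

With `hash_placement`, moving an object to a different node is impossible without changing the hash function or the node list. With the directory, it is a single update, at the price of memory that grows with the number of objects.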

    Published In

    Proceedings of the ACM on Measurement and Analysis of Computing Systems  Volume 7, Issue 2
    POMACS
    June 2023
    247 pages
    EISSN:2476-1249
    DOI:10.1145/3599176
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. distributed storage
    2. indexing algorithm
    3. ludo hashing