research-article

Open access

Smash: Flexible, Fast, and Resource-efficient Placement and Lookup of Distributed Storage

Authors:

Chen Qian

Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 7, Issue 2

Article No.: 33, Pages 1 - 22

https://doi.org/10.1145/3589977

Published: 22 May 2023

Abstract

Large-scale distributed storage systems, such as object stores, usually apply hashing-based placement and lookup methods to achieve scalability and resource efficiency. However, when object locations are determined by hash values, placement becomes inflexible and cannot be tuned to satisfy application requirements such as load balance, failure tolerance, parallelism, and network/system performance. This work presents a novel solution that achieves the best of both worlds: flexibility together with cost-effectiveness and scalability. The proposed method, Smash, is an object placement and lookup method that achieves full placement flexibility, balanced load, low resource cost, and short latency. Smash applies a recent space-efficient data structure to object-location lookups. We implement Smash as a prototype system and evaluate it in a public cloud. The analysis and experimental results show that Smash achieves full placement flexibility, fast storage operations, fast recovery from node dynamics, and lower DRAM cost (<60%) compared to existing hash-based solutions such as Ceph and MapX.
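
To make the contrast in the abstract concrete, the following minimal Python sketch compares hash-determined placement with placement driven by an explicit ID-to-location mapping. It is an illustration only, not Smash's implementation: the names `hash_placement`, `ExplicitPlacement`, and `least_loaded` are hypothetical, and a plain `dict` stands in for the space-efficient lookup structure that the abstract mentions but does not specify.

```python
import hashlib
from typing import Dict, List


def hash_placement(object_id: str, nodes: List[str]) -> str:
    """Hash-based placement: the node is fixed by the object ID alone."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


class ExplicitPlacement:
    """Flexible placement: any policy may choose the node, and the choice
    is recorded in an ID-to-location map.

    Assumption: a plain dict is used here for clarity; it is NOT the
    space-efficient structure used by Smash.
    """

    def __init__(self) -> None:
        self._location: Dict[str, str] = {}

    def place(self, object_id: str, node: str) -> None:
        self._location[object_id] = node

    def lookup(self, object_id: str) -> str:
        return self._location[object_id]


def least_loaded(load: Dict[str, int]) -> str:
    """Hypothetical placement policy: pick the node with the smallest load,
    breaking ties by node name."""
    return min(sorted(load), key=load.get)


if __name__ == "__main__":
    nodes = ["n0", "n1", "n2"]
    load = {"n0": 5, "n1": 1, "n2": 3}

    # Hash-based: the result cannot be steered by current load.
    print("hash-based:", hash_placement("obj-1", nodes))

    # Explicit mapping: the policy decides, and the map remembers.
    placement = ExplicitPlacement()
    placement.place("obj-1", least_loaded(load))
    print("explicit:", placement.lookup("obj-1"))
```

The cost of the second approach is the memory needed to hold one mapping entry per object, which is the DRAM cost the abstract says Smash reduces.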

Index Terms

1. Networks
  1. Network types
    1. Storage area networks

Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems Volume 7, Issue 2

POMACS

June 2023

247 pages

EISSN:2476-1249

DOI:10.1145/3599176

Editors: Augustin Chaintreau (Columbia University), Leana Golubchik (University of Southern California, United States), Zhi-Li Zhang (University of Minnesota, United States)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

NSF (National Science Foundation)

Cited By

Venkateswaran S, Tai C, Garnayak R, Ben-Yehezkel Y, Alpert Y, Sivakumar R, Gamess E (2024). IEEE 802.11ax Target Wake Time: Design and Performance Analysis in ns-3. Proceedings of the 2024 Workshop on ns-3, pp. 10-18. Online publication date: 5-Jun-2024.
https://dl.acm.org/doi/10.1145/3659111.3659115

Lee C, Higuchi T, Ucar S, Kaneko N, Altintas O, Oguchi K, Okoshi T, Ko J, LiKamWa R (2024). Poster: Performance Analysis of TCP CUBIC and BBR over V2V Wi-Fi. Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, pp. 668-669. Online publication date: 3-Jun-2024.
https://dl.acm.org/doi/10.1145/3643832.3661413

Tong J, An Z, Zhao X, Liao S, Yang L, Papavassiliou S, Schmid S (2024). In-Sensor Machine Learning: Radio Frequency Neural Networks for Wireless Sensing. Proceedings of the Twenty-fifth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, pp. 261-270. Online publication date: 14-Oct-2024.
https://dl.acm.org/doi/10.1145/3641512.3686393

Wu Y, Wang F, Zhu Y, Fan Z, Xiong Z, Yang T, Cui B (2024). VisionEmbedder: Bit-Level-Compact Key-Value Storage with Constant Lookup, Rapid Updates, and Rare Failure. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 4248-4261. Online publication date: 13-May-2024.
https://doi.org/10.1109/ICDE60146.2024.00324

Jaliminche L, Chakraborttii C, Choi C, Litz H (2023). Enabling Multi-tenancy on SSDs with Accurate IO Interference Modeling. Proceedings of the 2023 ACM Symposium on Cloud Computing, pp. 216-232. Online publication date: 30-Oct-2023.
https://dl.acm.org/doi/10.1145/3620678.3624657
