Research article · Open access
DOI: 10.1145/3456727.3463784

Reducing write amplification in flash by death-time prediction of logical block addresses

Published: 14 June 2021

Abstract

Flash-based solid state drives (SSDs) do not support in-place updates and therefore rely on a flash translation layer to absorb writes. To do so, SSDs implement a log-structured storage system, which introduces garbage collection and write amplification overheads. In this paper, we present a machine-learning-based approach for reducing write amplification in log-structured file systems via death-time prediction of logical block addresses. We define the death-time of a data element as the number of I/O writes issued before that element is overwritten. We leverage the sequential nature of I/O accesses to train lightweight yet powerful temporal convolutional network (TCN) models that predict the death-times of logical blocks in SSDs. We use the predicted death-times to design ML-DT, a near-optimal data placement technique that minimizes write amplification (WA) in log-structured storage systems. We compare our approach with three state-of-the-art data placement schemes and show that ML-DT achieves the lowest WA by exploiting the I/O death-time patterns learned from real-world storage workloads. ML-DT reduces write amplification by up to 14% compared with the best baseline technique. Additionally, we present a mapping learning technique to test the applicability of our approach to new or unseen workloads, as well as a hyperparameter sensitivity study.
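To illustrate why placement by death-time can lower write amplification, here is a toy log-structured FTL that reports WA = (host page writes + GC relocation writes) / host page writes. It is not ML-DT. Greedy victim selection, the three-block free reserve, the dedicated GC stream, and every name here (`simulate_wa`, `stream_of`, `num_blocks`, `pages_per_block`) are assumptions chosen for illustration. Placement enters only through `stream_of`, which maps the index of each host write to an open block ("stream"). Writes with similar death-times that share a stream tend to be invalidated together, leaving GC fewer valid pages to copy.

```python
from typing import Callable, Dict, List, Set


def simulate_wa(lbas: List[int], stream_of: Callable[[int], int],
                num_blocks: int = 64, pages_per_block: int = 32) -> float:
    """Toy log-structured FTL (hypothetical); returns flash writes / host writes."""
    if not lbas:
        raise ValueError("empty trace")
    GC_STREAM = -1                        # assumption: relocated pages get their own stream id
    valid: List[Set[int]] = [set() for _ in range(num_blocks)]  # live LBAs per block
    used = [0] * num_blocks               # pages programmed since the last erase
    free = list(range(num_blocks))        # erased blocks ready for allocation
    open_blk: Dict[int, int] = {}         # stream id -> partially filled block
    loc: Dict[int, int] = {}              # LBA -> block holding its live copy
    flash_writes = 0

    def program(lba: int, stream: int) -> None:
        nonlocal flash_writes
        if stream not in open_blk:        # open a fresh erased block for this stream
            open_blk[stream] = free.pop()
        b = open_blk[stream]
        if lba in loc:
            valid[loc[lba]].discard(lba)  # out-of-place update: old copy becomes invalid
        valid[b].add(lba)
        loc[lba] = b
        used[b] += 1
        flash_writes += 1
        if used[b] == pages_per_block:    # block is full: close it
            del open_blk[stream]

    def collect() -> None:
        # Greedy GC: reclaim the full block with the fewest live pages
        # until at least 3 erased blocks are in reserve.
        while len(free) < 3:
            full = [b for b in range(num_blocks) if used[b] == pages_per_block]
            victim = min(full, key=lambda b: len(valid[b]))
            if len(valid[victim]) == pages_per_block:
                raise RuntimeError("no reclaimable space; increase over-provisioning")
            for lba in list(valid[victim]):
                program(lba, GC_STREAM)   # each relocation is an extra flash write
            used[victim] = 0              # erase the victim block
            free.append(victim)

    for i, lba in enumerate(lbas):
        collect()
        program(lba, stream_of(i))
    return flash_writes / len(lbas)
```

A hypothetical death-time-aware policy could bucket per-write predictions, for example `stream_of=lambda i: min(pred[i] // 256, 3)` for a list `pred` of predicted death-times, and compare the result against the single-stream baseline `stream_of=lambda i: 0`. The sketch assumes the logical footprint is comfortably smaller than the physical capacity (over-provisioning) and that `stream_of` never returns `-1`. Without enough spare space, GC cannot reclaim blocks and the simulation stops with an error.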

Information

Published In

SYSTOR '21: Proceedings of the 14th ACM International Conference on Systems and Storage
June 2021
226 pages
ISBN:9781450383981
DOI:10.1145/3456727
This work is licensed under a Creative Commons Attribution 4.0 International License.

In-Cooperation

  • Technion, Israel Institute of Technology
  • USENIX Association

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SYSTOR '21

Acceptance Rates

SYSTOR '21 paper acceptance rate: 18 of 63 submissions (29%)
Overall acceptance rate: 108 of 323 submissions (33%)

Article Metrics

  • Downloads (last 12 months): 447
  • Downloads (last 6 weeks): 62
Reflects downloads up to 05 Feb 2025.

Cited By

  • (2024) ZMS. Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference, pp. 173-189. DOI: 10.5555/3691992.3692003. Online publication date: 10-Jul-2024.
  • (2024) FairyWREN. Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation, pp. 745-764. DOI: 10.5555/3691938.3691978. Online publication date: 10-Jul-2024.
  • (2024) Storage Abstractions for SSDs: The Past, Present, and Future. ACM Transactions on Storage 21:1, pp. 1-44. DOI: 10.1145/3708992. Online publication date: 30-Dec-2024.
  • (2024) En4S: Enabling SLOs in Serverless Storage Systems. Proceedings of the 2024 ACM Symposium on Cloud Computing, pp. 160-177. DOI: 10.1145/3698038.3698529. Online publication date: 20-Nov-2024.
  • (2024) Extremely-Compressed SSDs with I/O Behavior Prediction. ACM Transactions on Storage 20:4, pp. 1-38. DOI: 10.1145/3677044. Online publication date: 16-Jul-2024.
  • (2024) A Machine Learning-Empowered Cache Management Scheme for High-Performance SSDs. IEEE Transactions on Computers 73:8, pp. 2066-2080. DOI: 10.1109/TC.2024.3404064. Online publication date: 1-Aug-2024.
  • (2024) TAP: Transformers Driven Adaptive Prefetching for Hybrid Memory Systems. 2024 11th International Conference on Soft Computing & Machine Intelligence (ISCMI), pp. 139-146. DOI: 10.1109/ISCMI63661.2024.10851689. Online publication date: 22-Nov-2024.
  • (2024) MemFlex: A Hybrid Memory System to Boost Cost of Ownership in Data Centers. 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC), pp. 1693-1698. DOI: 10.1109/COMPSAC61105.2024.00267. Online publication date: 2-Jul-2024.
  • (2023) Enabling Multi-tenancy on SSDs with Accurate IO Interference Modeling. Proceedings of the 2023 ACM Symposium on Cloud Computing, pp. 216-232. DOI: 10.1145/3620678.3624657. Online publication date: 30-Oct-2023.
  • (2023) Offline and Online Algorithms for SSD Management. Communications of the ACM 66:7, pp. 129-137. DOI: 10.1145/3596205. Online publication date: 22-Jun-2023.