SSD QoS Improvements through Machine Learning

Poster, SoCC '18: ACM Symposium on Cloud Computing
Published: 11 October 2018
DOI: 10.1145/3267809.3275453

Abstract

The recent deceleration of Moore's law calls for new approaches to resource optimization. Machine learning has been applied to a wide variety of problems across many domains, yet machine learning research for storage optimization remains only lightly explored. In this paper, we focus on learning I/O access patterns with the aim of improving the performance of flash-based devices. Flash-based storage devices provide orders of magnitude better performance than HDDs, but they suffer from high tail latencies caused by garbage collection (GC), which introduces variable I/O latency. In flash devices, GC relocates valid data and erases stale data in order to create empty blocks for new incoming writes. By learning the temporal trends of I/O accesses, we built workload-specific regression models that predict the future time at which the SSD will be in GC mode. We tested our models on synthetic traces (random read/write mixes with a fixed block size) generated by the FIO workload generator. To determine when the SSD is in GC mode, we track I/O completion times and classify completions that take more than 10 times the median completion time as occurring while the SSD is in GC mode. Experiments on the SSD models we tested reveal that a GC phase typically lasts about 400 ms and occurs roughly every 7000 ms on average. Results show that our workload-specific models accurately predict the time of the next GC phase, achieving an RMSE of 10.61.
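
As a rough illustration of the heuristic described above (our own sketch, not the authors' code), the following Python snippet labels completions slower than 10x the median latency as GC events, collapses consecutive events into GC phases, and fits a simple regression to extrapolate the start of the next phase. The function names, the linear-model choice, the 1000 ms phase-merging gap, and the synthetic trace are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def label_gc_events(timestamps, latencies, threshold_factor=10.0):
    """Timestamps of completions whose latency exceeds 10x the median latency."""
    latencies = np.asarray(latencies, dtype=float)
    mask = latencies > threshold_factor * np.median(latencies)
    return np.asarray(timestamps, dtype=float)[mask]

def gc_phase_starts(gc_event_times, min_gap_ms=1000.0):
    """Collapse bursts of slow completions into one start time per GC phase."""
    starts, prev = [gc_event_times[0]], gc_event_times[0]
    for t in gc_event_times[1:]:
        if t - prev > min_gap_ms:
            starts.append(t)
        prev = t
    return np.array(starts)

def predict_next_gc(phase_starts):
    """Fit phase index -> start time and extrapolate one step ahead."""
    idx = np.arange(len(phase_starts)).reshape(-1, 1)
    model = LinearRegression().fit(idx, phase_starts)
    return float(model.predict([[len(phase_starts)]])[0])

# Synthetic trace mimicking the reported behaviour: ~400 ms GC phases every ~7000 ms.
rng = np.random.default_rng(0)
ts = np.cumsum(rng.exponential(1.0, size=50000))   # completion timestamps (ms)
lat = rng.normal(0.2, 0.02, size=ts.size)          # ~0.2 ms baseline latency
lat[(ts % 7000.0) < 400.0] *= 20                   # inflated latency during GC

starts = gc_phase_starts(label_gc_events(ts, lat))
print("next GC phase predicted at ~%.0f ms" % predict_next_gc(starts))
```

In practice the paper's workload-specific models may be more elaborate; the sketch only shows how a threshold-based GC detector can feed a time-series regressor.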
The performance of flash devices can be further improved via efficient prefetching driven by learned I/O access patterns. We use a long short-term memory (LSTM) recurrent neural network (RNN) architecture to learn spatial patterns from block-level I/O traces from SNIA, in order to predict the logical block address (LBA) that will be requested next so that it can be placed in primary memory ahead of time. Preliminary results show that the neural-network-based prefetchers are quite effective at predicting the next requested LBA, achieving up to 82.5% accuracy. Our LSTM models are also very effective at predicting the type of future I/O operations (read/write), achieving 91.6% accuracy. We use a four-layer neural network architecture with an LSTM layer of 512 units followed by three fully connected layers of 256, 64, and 1000 units, respectively. Time-series models such as LSTMs are efficient at learning local temporal trends in data, which is useful for learning storage I/O patterns. This work opens a new direction towards time-series, neural-network-based prefetching and can be applied to a variety of problems in storage systems.
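
A minimal Keras sketch of the four-layer architecture described above (an LSTM layer with 512 units followed by fully connected layers of 256, 64, and 1000 units). The input encoding (a fixed-length window of recent LBA classes), the embedding layer, and the interpretation of the 1000-unit softmax output as the most frequent LBA classes are our assumptions; the abstract does not specify these details.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN = 32     # past accesses the model sees per prediction (assumed)
VOCAB = 1000     # distinct LBA classes, matching the 1000-unit output layer
EMBED_DIM = 64   # embedding size for LBA tokens (assumed)

model = keras.Sequential([
    keras.Input(shape=(SEQ_LEN,)),
    layers.Embedding(input_dim=VOCAB, output_dim=EMBED_DIM),
    layers.LSTM(512),
    layers.Dense(256, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(VOCAB, activation="softmax"),  # probability over the next LBA class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Toy data standing in for a block-level trace: predict the next LBA class
# from the previous SEQ_LEN classes.
trace = np.random.randint(0, VOCAB, size=10000)
X = np.array([trace[i:i + SEQ_LEN] for i in range(len(trace) - SEQ_LEN)])
y = trace[SEQ_LEN:]
model.fit(X, y, epochs=1, batch_size=128, verbose=0)
```

The same sequence-to-class setup can be reused for the read/write prediction task by replacing the 1000-way output with a binary one.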
Unsupervised machine learning techniques can be used to cluster I/O accesses and place offsets in different blocks based on access patterns. The strategy is to cluster offsets and store data in different physical blocks according to the write frequency of those offsets. Separating hot and cold data in this way minimizes the write amplification associated with GC and improves performance. Recently introduced multi-stream SSDs provide a natural opportunity to apply this idea and improve quality of service.
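
One plausible realization of this hot/cold separation (our illustration, not the paper's method) is to count writes per LBA, cluster the LBAs by write frequency, and map each cluster to a stream on a multi-stream SSD. K-means, the log-frequency feature, and the four-stream assumption below are illustrative choices.

```python
from collections import Counter
import numpy as np
from sklearn.cluster import KMeans

def assign_streams(write_lbas, n_streams=4):
    """Cluster LBAs by (log) write frequency and return an LBA -> stream id map."""
    counts = Counter(write_lbas)
    freqs = np.log1p(np.array(list(counts.values()), dtype=float)).reshape(-1, 1)
    labels = KMeans(n_clusters=n_streams, n_init=10, random_state=0).fit_predict(freqs)
    return dict(zip(counts.keys(), labels))

# Skewed synthetic write trace: a few LBAs receive most of the writes (hot data).
rng = np.random.default_rng(0)
trace = rng.zipf(1.5, size=50000) % 4096
streams = assign_streams(trace)
hot_lba = Counter(trace).most_common(1)[0][0]
print("hottest LBA", hot_lba, "maps to stream", streams[hot_lba])
```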

Published In

SoCC '18: Proceedings of the ACM Symposium on Cloud Computing, October 2018, 546 pages
ISBN: 9781450360111
DOI: 10.1145/3267809
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ACM proceedings
  2. LaTeX
  3. prefetching

Qualifiers

  • Poster
  • Research
  • Refereed limited

Conference

SoCC '18: ACM Symposium on Cloud Computing
October 11-13, 2018
Carlsbad, CA, USA

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%
