GPUstore: harnessing GPU computing for storage systems in the OS kernel

Authors:

Matthew L. Curry

SYSTOR '12: Proceedings of the 5th Annual International Systems and Storage Conference

Article No.: 9, Pages 1 - 12

https://doi.org/10.1145/2367589.2367595

Published: 04 June 2012

Abstract

Many storage systems include computationally expensive components. Examples include encryption for confidentiality, checksums for integrity, and error correcting codes for reliability. As storage systems become larger, faster, and serve more clients, the demands placed on their computational components increase and they can become performance bottlenecks. Many of these computational tasks are inherently parallel: they can be run independently for different blocks, files, or I/O requests. This makes them a good fit for GPUs, a class of processor designed specifically for high degrees of parallelism: consumer-grade GPUs have hundreds of cores and are capable of running hundreds of thousands of concurrent threads. However, because the software frameworks built for GPUs have been designed primarily for the long-running, data-intensive workloads seen in graphics or high-performance computing, they are not well-suited to the needs of storage systems.
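The independence claim above can be illustrated with a small CPU-side sketch. This is not GPUstore code and does not use a GPU: it only shows that a per-block checksum gives the same results whether the blocks are processed one after another or concurrently, which is the property that makes such work a candidate for a highly parallel processor. The block size, the function names, and the choice of CRC-32 are assumptions made for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def checksums_serial(blocks):
    """Checksum each block in turn."""
    return [zlib.crc32(block) for block in blocks]


def checksums_parallel(blocks, workers=4):
    """Checksum blocks concurrently; no block depends on any other block."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with the blocks.
        return list(pool.map(zlib.crc32, blocks))


if __name__ == "__main__":
    blocks = split_blocks(bytes(range(256)) * 64)
    print(checksums_serial(blocks) == checksums_parallel(blocks))
```

Because each block's result depends only on that block, the work can be divided among any number of workers without coordination; a GPU applies the same idea with far more threads.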

In this paper, we present GPUstore, a framework for integrating GPU computing into storage systems. GPUstore is designed to match the programming models already used in these systems. We have prototyped GPUstore in the Linux kernel and demonstrate its use in three storage subsystems: file-level encryption, block-level encryption, and RAID 6 data recovery. Comparing our GPU-accelerated drivers with the mature CPU-based implementations in the Linux kernel, we show performance improvements of up to an order of magnitude.
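To make the RAID 6 data recovery workload concrete, the following is a minimal sketch of the arithmetic involved, assuming the common P/Q formulation over GF(2^8) with the reducing polynomial 0x11d and generator 2. It is a plain Python illustration, not the paper's GPU kernel or the Linux driver, and all function names are hypothetical. Every byte position is recovered independently of the others, which is what makes the task data-parallel.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Multiplicative inverse: a^254, since a^255 == 1 for nonzero a."""
    return gf_pow(a, 254)


def raid6_parity(blocks):
    """Return (P, Q): P is the XOR of the blocks, Q weights block i by 2^i."""
    size = len(blocks[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(blocks):
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(blocks, x, y, p, q):
    """Rebuild lost data blocks x and y (entries at x and y are ignored)."""
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom_inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for j in range(len(p)):
        a, b = p[j], q[j]
        for i, block in enumerate(blocks):
            if i in (x, y):
                continue
            a ^= block[j]                          # a becomes Dx ^ Dy
            b ^= gf_mul(gf_pow(2, i), block[j])    # b becomes gx*Dx ^ gy*Dy
        dx[j] = gf_mul(b ^ gf_mul(gy, a), denom_inv)
        dy[j] = a ^ dx[j]
    return bytes(dx), bytes(dy)


if __name__ == "__main__":
    data = [bytes([i * 16 + j for j in range(8)]) for i in range(4)]
    p, q = raid6_parity(data)
    damaged = [data[0], None, data[2], None]
    print(recover_two(damaged, 1, 3, p, q) == (data[1], data[3]))
```

The inner loop over `j` touches each byte offset separately, so in a GPU setting each offset (or group of offsets) could in principle be handled by its own thread.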

Information

Published In

SYSTOR '12: Proceedings of the 5th Annual International Systems and Storage Conference

June 2012

183 pages

ISBN:9781450314480

DOI:10.1145/2367589

General Chair: Michael Vinov, IBM Haifa

Program Chairs: Dan Tsafrir, Technion; Erez Zadok, Stony Brook University

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

The Technion - Israel Institute of Technology

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SYSTOR '12

SYSTOR '12: The 5th Annual International Systems and Storage Conference

June 4 - 6, 2012

Haifa, Israel

Acceptance Rates

Overall Acceptance Rate 108 of 323 submissions, 33%
