DOI: 10.1145/2928275.2928276

Supporting data-driven I/O on GPUs using GPUfs

Published: 06 June 2016

Abstract

Using discrete GPUs for processing very large datasets is challenging, in particular when an algorithm exhibits unpredictable, data-driven access patterns. In this paper we investigate the utility of GPUfs, a library that provides direct access to files from GPU programs, for implementing such algorithms. We analyze the system's bottlenecks and suggest several modifications to the GPUfs design, including a new concurrent hash table for the buffer cache and a highly parallel memory allocator. We also show that implementing the workload in a warp-centric manner improves performance even further. We evaluate our changes by implementing a real image processing application that creates collages from a dataset of 10 million images. The enhanced GPUfs design improves the application's performance by 5.6× on average over the original GPUfs, and outperforms both a 12-core parallel CPU implementation that uses the AVX instruction set and a standard CUDA-based GPU implementation by up to 2.5× and 3×, respectively, while significantly enhancing system programmability and simplifying the application design and implementation.


Published In

SYSTOR '16: Proceedings of the 9th ACM International on Systems and Storage Conference
June 2016
191 pages
ISBN:9781450343817
DOI:10.1145/2928275
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. File Systems
  2. GPGPUs
  3. Operating Systems

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SYSTOR '16
Acceptance Rates

SYSTOR '16 Paper Acceptance Rate 16 of 49 submissions, 33%;
Overall Acceptance Rate 108 of 323 submissions, 33%

Cited By

  • (2019) GAIA. Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference, pages 661-674. DOI: 10.5555/3358807.3358864. Online publication date: 10-Jul-2019
  • (2018) ActivePointers. ACM SIGOPS Operating Systems Review, 52:1, pages 84-95. DOI: 10.1145/3273982.3273990. Online publication date: 28-Aug-2018
  • (2016) ActivePointers. ACM SIGARCH Computer Architecture News, 44:3, pages 596-608. DOI: 10.1145/3007787.3001200. Online publication date: 18-Jun-2016
  • (2016) GPUrdma. Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers, pages 1-8. DOI: 10.1145/2931088.2931091. Online publication date: 1-Jun-2016
  • (2016) ActivePointers. Proceedings of the 43rd International Symposium on Computer Architecture, pages 596-608. DOI: 10.1109/ISCA.2016.58. Online publication date: 18-Jun-2016
