research-article

Collaborative processing of data-intensive algorithms with CPU, intelligent SSD, and GPU

Authors:

Hyunok Oh

SAC '16: Proceedings of the 31st Annual ACM Symposium on Applied Computing

Pages 1865 - 1870

https://doi.org/10.1145/2851613.2851741

Published: 04 April 2016

Abstract

The graphics processing unit (GPU) is a computing resource designed to process graphics-related applications. The intelligent SSD (iSSD) is a solid state device (SSD) equipped with data processing power. Today, a CPU, a GPU, and an SSD are installed together in most processing environments. If the SSD is later replaced with an iSSD, the result is a new processing environment in which three computing resources collaborate with one another to process a huge volume of data (so-called big data) effectively. In this paper, we address how to exploit all of these computing resources for efficient processing of data-intensive algorithms. Through extensive experiments, we verify the effectiveness and potential of the proposed collaborative processing environment by processing data concurrently with multiple computing resources. The results reveal that processing in our environment outperforms processing in the traditional one by up to 3.5 times.
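The idea of processing data concurrently on several computing resources can be illustrated with a small sketch. The abstract does not describe the paper's scheduling method, so the code below is not the authors' algorithm: it is a minimal illustration that assumes a simple split of the input in proportion to each resource's throughput. The function names (`split_by_throughput`, `process_chunk`, `collaborative_sum_of_squares`, `ideal_speedup`), the throughput figures, and the use of threads to stand in for the CPU, iSSD, and GPU are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_by_throughput(n_items, throughputs):
    """Split n_items into contiguous (start, end) ranges, one per resource,
    sized in proportion to each resource's (assumed) relative throughput."""
    total = sum(throughputs.values())
    names = list(throughputs)
    bounds = {}
    start = 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = n_items  # the last resource takes whatever remains
        else:
            end = start + int(n_items * throughputs[name] / total)
        bounds[name] = (start, end)
        start = end
    return bounds


def process_chunk(data, lo, hi):
    """Stand-in workload: sum of squares over data[lo:hi]."""
    return sum(x * x for x in data[lo:hi])


def collaborative_sum_of_squares(data, throughputs):
    """Process every chunk concurrently and merge the partial results.
    Threads only simulate the three devices here."""
    bounds = split_by_throughput(len(data), throughputs)
    with ThreadPoolExecutor(max_workers=len(bounds)) as pool:
        futures = [pool.submit(process_chunk, data, lo, hi)
                   for lo, hi in bounds.values()]
        return sum(f.result() for f in futures)


def ideal_speedup(throughputs, baseline):
    """Upper bound on speedup over the baseline resource working alone,
    ignoring data transfer and synchronization costs."""
    return sum(throughputs.values()) / throughputs[baseline]


if __name__ == "__main__":
    # Hypothetical relative throughputs, not measurements from the paper.
    rates = {"cpu": 1.0, "issd": 1.0, "gpu": 2.0}
    print(split_by_throughput(100, rates))
    print(collaborative_sum_of_squares(list(range(10)), rates))
    print(ideal_speedup(rates, "cpu"))
```

Under these assumptions the ideal speedup is simply the ratio of combined throughput to baseline throughput. A real system would also have to account for data movement between the devices, which this sketch ignores.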



Index Terms

Collaborative processing of data-intensive algorithms with CPU, intelligent SSD, and GPU
1. Computing methodologies
  1. Modeling and simulation
    1. Simulation evaluation


Published In

SAC '16: Proceedings of the 31st Annual ACM Symposium on Applied Computing

April 2016

2360 pages

ISBN: 9781450337397

DOI: 10.1145/2851613

Conference Chair:
Sascha Ossowski
University Rey Juan Carlos, Spain

Copyright © 2016 ACM.


Sponsors

SIGAPP: ACM Special Interest Group on Applied Computing

Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

Samsung
National Research Foundation of Korea

Conference

SAC 2016

Sponsor:

SIGAPP

SAC 2016: Symposium on Applied Computing

April 4 - 8, 2016

Pisa, Italy

Acceptance Rates

SAC '16 Paper Acceptance Rate: 252 of 1,047 submissions, 24%

Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%



