Parallel data mining techniques on Graphics Processing Unit with Compute Unified Device Architecture (CUDA)

Jian, Liheng; Wang, Cheng; Liu, Ying; Liang, Shenshen; Yi, Weidong; Shi, Yong

doi:10.1007/s11227-011-0672-7

Published: 26 August 2011

Volume 64, pages 942–967 (2013)

The Journal of Supercomputing

Liheng Jian¹, Cheng Wang², Ying Liu¹,³, Shenshen Liang¹, Weidong Yi¹ & Yong Shi³,⁴

Abstract

Recent developments in Graphics Processing Units (GPUs) have enabled inexpensive high-performance computing for general-purpose applications. The Compute Unified Device Architecture (CUDA) programming model provides programmers with adequate C-like APIs to better exploit the parallel power of the GPU. Data mining is widely used and has significant applications in various domains. However, current data mining toolkits cannot meet the speed requirements of applications with large-scale databases. In this paper, we propose three techniques to speed up fundamental problems in data mining algorithms on the CUDA platform: a scalable thread scheduling scheme for irregular patterns, a parallel distributed top-k scheme, and a parallel high-dimension reduction scheme. They play a key role in our CUDA-based implementations of three representative data mining algorithms: CU-Apriori, CU-KNN, and CU-K-means. These parallel implementations significantly outperform other state-of-the-art implementations on an HP xw8600 workstation with a Tesla C1060 GPU and a quad-core Intel Xeon CPU. Our results show that the GPU + CUDA parallel architecture is feasible and promising for data mining applications.
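
The abstract names a parallel distributed top-k scheme but does not describe how it works. As a rough illustration of the general idea only, the sketch below shows a generic two-stage top-k selection: each chunk of the input picks its own k best candidates, and the partial results are then merged. It is not the authors' CUDA implementation. It runs sequentially in Python, each chunk merely stands in for the work a thread block might do, and the names `local_top_k`, `distributed_top_k`, and `num_chunks` are hypothetical.

```python
import heapq


def local_top_k(chunk, k):
    """Return the k smallest (distance, index) pairs of one chunk, in ascending order."""
    return heapq.nsmallest(k, chunk)


def distributed_top_k(distances, k, num_chunks):
    """Generic two-stage top-k: select per chunk, then merge the partial results.

    Assumption: each chunk models the share of the input one parallel worker
    would handle; here the chunks are processed one after another.
    """
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    # Pair every distance with its original position so results stay traceable.
    indexed = [(d, i) for i, d in enumerate(distances)]
    # Ceiling division so that every element lands in some chunk.
    size = max(1, -(-len(indexed) // num_chunks))
    chunks = [indexed[s:s + size] for s in range(0, len(indexed), size)]
    # Stage 1: every chunk keeps only its own k best candidates.
    partial = [local_top_k(chunk, k) for chunk in chunks]
    # Stage 2: the global top-k is guaranteed to be among the candidates.
    candidates = [pair for part in partial for pair in part]
    return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    sample = [5.0, 1.0, 4.0, 2.0, 3.0, 0.5]
    print(distributed_top_k(sample, k=3, num_chunks=2))
```

The second stage is correct because any element among the global k smallest must also be among the k smallest of its own chunk, so no chunk needs to forward more than k candidates. In a k-nearest-neighbour setting such as the one CU-KNN addresses, `distances` would hold the distances from one query point to all reference points.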

Author information

Authors and Affiliations

¹ School of Information Science and Engineering, Graduate University of Chinese Academy of Sciences, Beijing, China (Liheng Jian, Ying Liu, Shenshen Liang & Weidong Yi)
² Agilent Technologies Co. Ltd., Beijing, China (Cheng Wang)
³ Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing, China (Ying Liu & Yong Shi)
⁴ University of Nebraska at Omaha, Omaha, USA (Yong Shi)

Corresponding author

Correspondence to Ying Liu.

About this article

Cite this article

Jian, L., Wang, C., Liu, Y. et al. Parallel data mining techniques on Graphics Processing Unit with Compute Unified Device Architecture (CUDA). J Supercomput 64, 942–967 (2013). https://doi.org/10.1007/s11227-011-0672-7

Issue Date: June 2013
