Abstract
Recent developments in Graphics Processing Units (GPUs) have enabled inexpensive high-performance computing for general-purpose applications. The Compute Unified Device Architecture (CUDA) programming model provides programmers with C-like APIs to better exploit the parallel power of the GPU. Data mining is widely used and has significant applications in various domains. However, current data mining toolkits cannot meet the speed requirements of applications with large-scale databases. In this paper, we propose three techniques to speed up fundamental problems in data mining algorithms on the CUDA platform: a scalable thread scheduling scheme for irregular patterns, a parallel distributed top-k scheme, and a parallel high-dimension reduction scheme. They play a key role in our CUDA-based implementations of three representative data mining algorithms: CU-Apriori, CU-KNN, and CU-K-means. These parallel implementations significantly outperform other state-of-the-art implementations on an HP xw8600 workstation with a Tesla C1060 GPU and a quad-core Intel Xeon CPU. Our results show that the GPU + CUDA parallel architecture is feasible and promising for data mining applications.
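The abstract names a parallel distributed top-k scheme without describing it, so the following is only a minimal sketch of the general idea, assuming a two-phase design: each partition of the data selects its own k best candidates independently, and a final pass selects the global k from the merged candidates. It is written as sequential Python rather than CUDA, and the names `local_top_k`, `distributed_top_k`, and `num_chunks` are hypothetical, not taken from the paper.

```python
import heapq


def local_top_k(chunk, k):
    """Return the k smallest (distance, index) pairs within one chunk."""
    return heapq.nsmallest(k, chunk)


def distributed_top_k(distances, k, num_chunks):
    """Two-phase top-k: per-chunk selection, then a merge of the candidates.

    Assumption: each chunk stands in for the work of one thread block;
    here the chunks are processed one after another.
    """
    # Pair each distance with its global index so results stay traceable.
    indexed = [(d, i) for i, d in enumerate(distances)]
    # Ceiling division so that every element falls into some chunk.
    chunk_size = max(1, -(-len(indexed) // num_chunks))

    candidates = []
    for start in range(0, len(indexed), chunk_size):
        chunk = indexed[start:start + chunk_size]
        candidates.extend(local_top_k(chunk, k))

    # The global k smallest must be among the per-chunk k smallest.
    return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    sample = [5.0, 1.0, 4.0, 2.0, 3.0, 0.5]
    print(distributed_top_k(sample, k=3, num_chunks=2))
```

The merge step is correct because any element among the global k smallest is also among the k smallest of its own chunk, so the final pass only needs to examine at most k candidates per chunk.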
Cite this article
Jian, L., Wang, C., Liu, Y. et al. Parallel data mining techniques on Graphics Processing Unit with Compute Unified Device Architecture (CUDA). J Supercomput 64, 942–967 (2013). https://doi.org/10.1007/s11227-011-0672-7
DOI: https://doi.org/10.1007/s11227-011-0672-7