DOI: 10.1145/1565694.1565702
research-article

Frequent itemset mining on graphics processors

Published: 28 June 2009

Abstract

We present two efficient Apriori implementations of Frequent Itemset Mining (FIM) that utilize new-generation graphics processing units (GPUs). Our implementations take advantage of the GPU's massively multi-threaded SIMD (Single Instruction, Multiple Data) architecture. Both implementations employ a bitmap data structure to exploit the GPU's SIMD parallelism and to accelerate the frequency counting operation. One implementation runs entirely on the GPU and eliminates intermediate data transfer between the GPU memory and the CPU memory. The other implementation employs both the GPU and the CPU for processing. It represents itemsets in a trie, and uses the CPU for trie traversing and incremental maintenance. Our preliminary results show that both implementations achieve a speedup of up to two orders of magnitude over optimized CPU Apriori implementations on a PC with an NVIDIA GTX 280 GPU and a quad-core CPU.
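
To make the bitmap idea concrete, here is a minimal CPU-side sketch in Python, not the authors' GPU code. It assumes a vertical layout in which each item owns one bitmap with bit t set when transaction t contains the item; the abstract only says that a bitmap structure is used, so this layout, the function names (build_bitmaps, support, apriori) and the toy data are all illustrative assumptions. Under that assumption, the support of an itemset is the population count of the bitwise AND of its items' bitmaps, which is the kind of uniform, data-parallel work that suits SIMD hardware.

```python
from itertools import combinations


def build_bitmaps(transactions):
    """Return {item: int}, where bit t of the int is set iff transaction t has the item."""
    bitmaps = {}
    for t, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << t)
    return bitmaps


def support(itemset, bitmaps, n_transactions):
    """Count transactions containing every item: AND the bitmaps, then count set bits."""
    acc = (1 << n_transactions) - 1  # start with all transactions selected
    for item in itemset:
        acc &= bitmaps.get(item, 0)
    return bin(acc).count("1")


def apriori(transactions, min_support):
    """Level-wise Apriori; returns {frozenset(itemset): support} for frequent itemsets."""
    bitmaps = build_bitmaps(transactions)
    n = len(transactions)
    frequent = {}
    # Level 1: frequent single items.
    level = {}
    for item in bitmaps:
        count = support([item], bitmaps, n)
        if count >= min_support:
            level[frozenset([item])] = count
    k = 1
    while level:
        frequent.update(level)
        # Join step: unite pairs of frequent k-itemsets that differ in one item.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        next_level = {}
        for cand in candidates:
            # Prune step: every k-subset of a candidate must itself be frequent.
            if all(frozenset(sub) in level for sub in combinations(cand, k)):
                count = support(cand, bitmaps, n)
                if count >= min_support:
                    next_level[cand] = count
        level = next_level
        k += 1
    return frequent


if __name__ == "__main__":
    # Hypothetical toy database of five transactions.
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(apriori(db, 3).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

In this sketch a Python integer stands in for the bitmap. A GPU version would more plausibly store the bitmaps as arrays of fixed-width words and assign words to threads, but that mapping is not described in the abstract and is left out here.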

Published In

DaMoN '09: Proceedings of the Fifth International Workshop on Data Management on New Hardware
June 2009
63 pages
ISBN: 9781605587011
DOI: 10.1145/1565694

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

DaMoN 2009: Data Management on New Hardware
June 28, 2009
Providence, Rhode Island

Acceptance Rates

Overall Acceptance Rate 94 of 127 submissions, 74%
