Relational Learning with GPUs: Accelerating Rule Coverage

Martínez-Angeles, Carlos Alberto; Wu, Haicheng; Dutra, Inês; Costa, Vítor Santos; Buenabad-Chávez, Jorge

doi:10.1007/s10766-015-0364-7

Published: 28 March 2015

International Journal of Parallel Programming, Volume 44, pages 663–685 (2016)

Abstract

Relational learning algorithms mine complex databases for interesting patterns. Usually, the search space of patterns grows very quickly as the data size increases, making it impractical to solve important problems. In this work we present the design of a relational learning system that takes advantage of graphics processing units (GPUs) to perform the learner's most time-consuming function, rule coverage. To evaluate performance, we use four applications: a widely used relational learning benchmark for predicting carcinogenesis in rodents, an application in chemo-informatics, an application in opinion mining, and an application in mining health record data. We compare the GPU version against runs using a single CPU and multiple CPUs in a multicore host. Results show that the GPU version of the learner is up to eight times faster than the best CPU version.
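In ILP systems such as Aleph, the coverage of a candidate rule is the number of positive and negative examples for which the rule body succeeds. The sketch below illustrates only those semantics, on the CPU. It assumes a hypothetical rule `active(M) :- has_atom(M, A), atom_type(A, T)`, hypothetical relation names, and toy data, none of which come from the paper. The body is evaluated as a selection, a join on the shared variable, and a projection; this is the kind of relational-algebra work a GPU engine can parallelize, but the code is not the authors' implementation.

```python
from typing import Iterable, Set, Tuple

# A binary relation stored as a set of tuples (hypothetical representation).
Relation = Set[Tuple[str, str]]


def rule_body_answers(has_atom: Relation, atom_type: Relation, wanted: str) -> Set[str]:
    """Evaluate the body of the hypothetical rule
        active(M) :- has_atom(M, A), atom_type(A, wanted)
    as select, join and project over relations stored as sets of tuples."""
    # Selection: keep only atoms whose type matches the constant in the rule.
    selected = {a for (a, t) in atom_type if t == wanted}
    # Join on the shared variable A, then project onto the head variable M.
    return {m for (m, a) in has_atom if a in selected}


def coverage(has_atom: Relation, atom_type: Relation, wanted: str,
             positives: Iterable[str], negatives: Iterable[str]) -> Tuple[int, int]:
    """Count how many positive and negative examples the rule covers."""
    answers = rule_body_answers(has_atom, atom_type, wanted)
    pos = sum(1 for e in positives if e in answers)
    neg = sum(1 for e in negatives if e in answers)
    return pos, neg


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper's benchmarks.
    has_atom = {("m1", "a1"), ("m1", "a2"), ("m2", "a3"), ("m3", "a4")}
    atom_type = {("a1", "h"), ("a2", "c"), ("a3", "o"), ("a4", "c")}
    print(coverage(has_atom, atom_type, "c", positives={"m1", "m3"}, negatives={"m2"}))
    # (2, 0)
```

A learner scores many candidate rules this way, and each evaluation repeats the join over the whole database, which is why coverage dominates the run time and is a natural target for offloading to a GPU.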

Notes

CUDA is NVIDIA’s general-purpose parallel computing platform and programming model [10].
The modified version of Aleph is available from the authors upon request.

References

[1] Afrati, F.N., Borkar, V., Carey, M., Polyzotis, N., Ullman, J.D.: Cluster computing, recursion and Datalog. In: Proceedings of the First International Conference on Datalog Reloaded, Datalog’10, pp. 120–144. Springer, Berlin (2011)
[2] Beeri, C., Ramakrishnan, R.: On the power of magic. J. Log. Program. 10(3&4), 255–299 (1991)
[3] Bekkerman, R., Bilenko, M., Langford, J. (eds.): Scaling up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, Cambridge (2011)
[4] Chakrabarti, D., Faloutsos, C.: Graph mining: laws, generators, and algorithms. ACM Comput. Surv. 38(1) (2006). doi:10.1145/1132952.1132954
[5] Collins, J.M.: The DTP AIDS antiviral screen program (1999). http://dtp.nci.nih.gov/docs/aids/aidsdata.html
[6] Côrte-Real, J., Dutra, I., Rocha, R.: A map-reduce constructor for Prolog. In: Proceedings of the International Conference on Principles and Practice of Declarative Programming (PPDP) (2013)
[7] Costa, V.S., Sagonas, K., Lopes, R.: Demand-driven indexing of Prolog clauses. In: Dahl, V., Niemelä, I. (eds.) Proceedings of the 23rd International Conference on Logic Programming, volume 4670 of Lecture Notes in Computer Science, pp. 395–409. Springer (2007)
[8] Costa, V.S., Srinivasan, A., Camacho, R., Blockeel, H., Demoen, B., Janssens, G., Struyf, J., Vandecasteele, H., Van Laer, W.: Query transformations for improving the efficiency of ILP systems. J. Mach. Learn. Res. 4, 465–491 (2003)
[9] Costa, V.S., Rocha, R., Damas, L.: The YAP Prolog system. Theory Pract. Log. Program. 12(1–2), 5–34 (2012)
[10] CUDA C programming guide. http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
[11] Dastgeer, U., Li, L., Kessler, C.: Smart containers and skeleton programming for GPU-based systems. In: Proceedings of the 7th International Symposium on High-Level Parallel Programming and Applications (HLPP’14), Amsterdam (2014)
[12] De Raedt, L.: Logical and Relational Learning. Springer, Berlin (2008)
[13] Dehaspe, L., De Raedt, L.: Parallel inductive logic programming. In: Proceedings of the MLnet Familiarization Workshop on Statistics, Machine Learning and Knowledge Discovery in Databases, pp. 112–117 (1995)
[14] Diamos, G., Wu, H., Lele, A., Wang, J., Yalamanchili, S.: Efficient relational algebra algorithms and data structures for GPU. Technical report, Georgia Institute of Technology (2012)
[15] Diamos, G., Wu, H., Wang, J., Lele, A., Yalamanchili, S.: Relational algorithms for multi-bulk-synchronous processors. In: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’13, New York, NY, USA, pp. 301–302. ACM (2013)
[16] Fonseca, N.A., Srinivasan, A., Silva, F.M.A., Camacho, R.: Parallel ILP for distributed-memory architectures. Mach. Learn. 74(3), 257–279 (2009)
[17] Gavanelli, M., Riguzzi, F., Milano, M., Cagnoli, P.: Constraint and optimization techniques for supporting policy making. In: Yu, T., Chawla, N., Simoff, S. (eds.) Computational Intelligent Data Analysis for Sustainable Development, Data Mining and Knowledge Discovery Series, chap. 12, pp. 361–382. Chapman & Hall/CRC, Abingdon (2013)
[18] Green, T.J., Aref, M., Karvounarakis, G.: LogicBlox, platform and language: a tutorial. In: Proceedings of the Second International Conference on Datalog in Academia and Industry, Datalog 2.0’12, pp. 1–8. Springer, Berlin (2012)
[19] Green, O., McColl, R., Bader, D.A.: GPU merge path: a GPU merging algorithm. In: Proceedings of the 26th ACM International Conference on Supercomputing, ICS ’12, New York, NY, USA, pp. 331–340. ACM (2012)
[20] He, B., Mian, L., Yang, K., Fang, R., Govindaraju, N.K., Luo, Q., Sander, P.V.: Relational query coprocessing on graphics processors. ACM Trans. Database Syst. 34(4), 21:1–21:39 (2009)
[21] Huang, S.S., Green, T.J., Loo, B.T.: Datalog and emerging applications: an interactive tutorial. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD ’11, New York, NY, USA, pp. 1213–1216. ACM (2011)
[22] Martínez-Angeles, C.A., Dutra, I., Costa, V.S., Buenabad-Chávez, J.: A Datalog engine for GPUs. In: WFLP-2013: 22nd International Workshop on Functional and (Constraint) Logic Programming, Kiel, Germany, 11–13 Sept, pp. 239–253 (2013)
[23] Muggleton, S.: Inverse entailment and Progol. New Gener. Comput. 13, 245–286 (1995)
[24] Odeh, S., Green, O., Mwassi, Z., Shmueli, O., Birk, Y.: Merge path: parallel merging made simple. In: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, IPDPSW ’12, Washington, DC, USA, IEEE Computer Society, pp. 1611–1618 (2012)
[25] Rajaraman, A., Ullman, J.D.: Mining of Massive Datasets. Cambridge University Press, Cambridge (2012)
[26] Red Fox: a compilation environment for data warehousing. http://gpuocelot.gatech.edu/projects/red-fox-a-compilation-environment-for-data-warehousing/
[27] Ryan, P.B., Schuemie, M.J.: Evaluating performance of risk identification methods through a large-scale simulation of observational data. Drug Saf. 36(1), 171–180 (2013)
[28] Baxter, S.: Modern GPU library: tutorial (2013). http://nvlabs.github.io/moderngpu/index.html (visited in Jan 2015)
[29] Srinivasan, A.: The Aleph manual. University of Oxford, England (2001). http://www.cs.ox.ac.uk/activities/machlearn/Aleph/aleph.html
[30] Srinivasan, A., King, R.D., Muggleton, S.H., Sternberg, M.J.E.: Carcinogenesis predictions using ILP. In: Lavrač, N., Džeroski, S. (eds.) Inductive Logic Programming, volume 1297 of Lecture Notes in Computer Science, pp. 273–287. Springer, Berlin (1997)
[31] Srinivasan, A., Faruquie, T.A., Joshi, S.: Data and task parallelism in ILP using MapReduce. Mach. Learn. 86(1), 141–168 (2012)
[32] Taskar, B., Getoor, L.: Introduction to Statistical Relational Learning. MIT Press, Cambridge (2007)
[33] Tekle, K.T., Liu, Y.A.: More efficient Datalog queries: subsumptive tabling beats magic sets. In: SIGMOD Conference, pp. 661–672 (2011)
[34] Thrust: a parallel template library. http://thrust.github.io/
[35] TPC-H: Transaction Processing Performance Council benchmark H. http://www.tpc.org/tpch/
[36] Ullman, J.D.: Principles of Database and Knowledge-Base Systems, vol. I. Computer Science Press, Rockville (1988)
[37] Ullman, J.D.: Principles of Database and Knowledge-Base Systems, vol. II. Computer Science Press, Rockville (1989)
[38] Weislow, O.S., Kiser, R., Fine, D.L., Bader, J., Shoemaker, R.H., Boyd, M.R.: New soluble-formazan assay for HIV-1 cytopathic effects: application to high-flux screening of synthetic and natural products for AIDS-antiviral activity. J. Natl. Cancer Inst. 81(8), 577–586 (1989)
[39] Wu, H., Diamos, G., Cadambi, S., Yalamanchili, S.: Kernel Weaver: automatically fusing database primitives for efficient GPU computation. In: Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-45, Washington, DC, USA, IEEE Computer Society, pp. 107–118 (2012)
[40] Wu, H., Diamos, G., Sheard, T., Aref, M., Baxter, S., Garland, M., Yalamanchili, S.: Red Fox: an execution environment for relational query processing on GPUs. In: International Symposium on Code Generation and Optimization (CGO) (2014)
[41] Wu, H., Diamos, G., Wang, J., Cadambi, S., Yalamanchili, S., Chakradhar, S.: Optimizing data warehousing applications for GPUs using kernel fusion/fission. In: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, IPDPSW ’12, Washington, DC, USA, IEEE Computer Society, pp. 2433–2442 (2012)
[42] Young, J., Wu, H., Yalamanchili, S.: Satisfying data-intensive queries using GPU clusters. In: 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC), pp. 1314–1314 (2012)

Acknowledgments

The authors gratefully acknowledge the comments from all reviewers, which greatly improved the quality of this paper. We would also like to thank the members of Martínez-Angeles’ M.Sc. and qualification committees for their helpful comments.

Author information

Authors and Affiliations

Departamento de Computación, CINVESTAV-IPN, Av. Instituto Politécnico Nacional 2508, 07360, Mexico, DF, Mexico
Carlos Alberto Martínez-Angeles & Jorge Buenabad-Chávez
Departamento de Ciência de Computadores, CRACS INESC-TEC LA and Universidade do Porto, Rua do Campo Alegre, 1021, 4169-007, Porto, Portugal
Inês Dutra & Vítor Santos Costa
Georgia Institute of Technology, 266 Ferst Drive, Atlanta, GA, 30332, USA
Haicheng Wu

Corresponding author

Correspondence to Inês Dutra.

Additional information

CMA was supported by the University of Porto, the Centre for Research and Postgraduate Studies of the National Polytechnic Institute (CINVESTAV-IPN) of Mexico, and the Council of Science and Technology (CONACyT) of Mexico. ICD and VSC were partially supported by: the European Regional Development Fund (ERDF), COMPETE Programme; the Portuguese Foundation for Science and Technology (FCT), projects ADE (PTDC/EIA-EIA/121686/2010 (FCOMP-01-0124-FEDER-020575)), and ABLe PTDC/EEI-SII/2094/2012.

About this article

Cite this article

Martínez-Angeles, C.A., Wu, H., Dutra, I. et al. Relational Learning with GPUs: Accelerating Rule Coverage. Int J Parallel Prog 44, 663–685 (2016). https://doi.org/10.1007/s10766-015-0364-7

Received: 19 August 2014
Accepted: 10 March 2015
Published: 28 March 2015
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10766-015-0364-7
