Abstract
Relational learning algorithms mine complex databases for interesting patterns. Usually, the search space of patterns grows very quickly as data size increases, making it impractical to solve important problems. In this work we present the design of a relational learning system that takes advantage of graphics processing units (GPUs) to perform the most time-consuming function of the learner, rule coverage. To evaluate performance, we use four applications: a widely used relational learning benchmark for predicting carcinogenesis in rodents, an application in chemo-informatics, an application in opinion mining, and an application in mining health record data. We compare results obtained with a single CPU and with multiple CPUs in a multicore host against those of the GPU version. Results show that the GPU version of the learner is up to eight times faster than the best CPU version.
Notes
CUDA is NVIDIA’s General-Purpose Parallel Computing Platform and Programming Model [10].
The Aleph modified version is available upon request to the authors.
References
Afrati, F.N., Borkar, V., Carey, M., Polyzotis, N., Ullman, J.D.: Cluster computing, recursion and datalog. In: Proceedings of the First International Conference on Datalog Reloaded, Datalog’10, pp. 120–144. Springer, Berlin (2011)
Beeri, C., Ramakrishnan, R.: On the power of magic. J. Log. Program. 10(3&4), 255–299 (1991)
Bekkerman, R., Bilenko, M., Langford, J. (eds.): Scaling up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, Cambridge (2011)
Chakrabarti, D., Faloutsos, C.: Graph mining: laws, generators, and algorithms. ACM Comput. Surv. 38(1) (2006). doi:10.1145/1132952.1132954
Collins, J.M.: The DTP AIDS antiviral screen program (1999). http://dtp.nci.nih.gov/docs/aids/aidsdata.html
Côrte-Real, J., Dutra, I., Rocha, R.: A map-reduce constructor for prolog. In: Proceedings of the International Conference on Principles and Practice of Declarative Programming (PPDP) (2013)
Costa, V.S., Sagonas, K., Lopes, R.: Demand-driven indexing of prolog clauses. In: Dahl, V., Niemelä, I. (eds.) Proceedings of the 23rd International Conference on Logic Programming, volume 4670 of Lecture Notes in Computer Science, pp. 305–409. Springer (2007)
Costa, V.S., Srinivasan, A., Camacho, R., Blockeel, H., Demoen, B., Janssens, G., Struyf, J., Vandecasteele, H., Van Laer, W.: Query transformations for improving the efficiency of ilp systems. J. Mach. Learn. Res. 4, 465–491 (2003)
Costa, V.S., Rocha, R., Damas, L.: The yap prolog system. Theory Pract. Log. Program. 12(1–2), 5–34 (2012)
CUDA C programming guide. http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
Dastgeer, U., Li, L., Kessler, C.: Smart containers and skeleton programming for GPU-based systems. In: Proceedings 7th International Symposium on High-Level Parallel Programming and Applications (HLPP’14), Amsterdam (2014)
De Raedt, L.: Logical and Relational Learning. Springer, Berlin (2008)
Dehaspe, L., De Raedt, L.: Parallel inductive logic programming. In: Proceedings of the MLnet Familiarization Workshop on Statistics, Machine Learning and Knowledge Discovery in Databases, pp. 112–117 (1995)
Diamos, G., Wu, H., Lele, A., Wang, J., Yalamanchili, S.: Efficient relational algebra algorithms and data structures for GPU. Technical report, Georgia Institute of Technology (2012)
Diamos, G., Wu, H., Wang, J., Lele, A., Yalamanchili, S.: Relational algorithms for multi-bulk-synchronous processors. In: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’13, New York, NY, USA, pp. 301–302. ACM (2013)
Fonseca, N.A., Srinivasan, A., Silva, F.M.A., Camacho, R.: Parallel ILP for distributed-memory architectures. Mach. Learn. 74(3), 257–279 (2009)
Gavanelli, M., Riguzzi, F., Milano, M., Cagnoli, P.: Constraint and optimization techniques for supporting policy making. In: Yu, T., Chawla, N., Simoff, S. (eds) Computational Intelligent Data Analysis for Sustainable Development, Data Mining and Knowledge Discovery Series, chap. 12, pp. 361–382. Chapman & Hall/CRC, Abingdon (2013)
Green, T.J., Aref, M., Karvounarakis, G.: Logicblox, platform and language: a tutorial. In: Proceedings of the Second International Conference on Datalog in Academia and Industry, Datalog 2.0’12, pp. 1–8. Springer, Berlin (2012)
Green, O., McColl, R., Bader, D.A.: GPU merge path: a GPU merging algorithm. In: Proceedings of the 26th ACM International Conference on Supercomputing, ICS ’12, New York, NY, USA, pp. 331–340. ACM (2012)
He, B., Mian, L., Yang, K., Fang, R., Govindaraju, N.K., Luo, Q., Sander, P.V.: Relational query coprocessing on graphics processors. ACM Trans. Database Syst. 34(4), 21:1–21:39 (2009)
Huang, S.S., Green, T.J., Loo, B.T.: Datalog and emerging applications: an interactive tutorial. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD ’11, New York, NY, USA, pp. 1213–1216. ACM (2011)
Martínez-Angeles, C.A., Dutra, I., Costa, V.S., Buenabad-Chávez, J.: A datalog engine for GPUs. In: WFLP-2013: 22nd International Workshop on Functional and (Constraint) Logic Programming, Kiel, Germany, 11–13 Sept, pp. 239–253 (2013)
Muggleton, S.: Inverse entailment and progol. New Gener. Comput. 13, 245–286 (1995)
Odeh, S., Green, O., Mwassi, Z., Shmueli, O., Birk, Y.: Merge path—parallel merging made simple. In: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, IPDPSW ’12, Washington, DC, USA, IEEE Computer Society, pp. 1611–1618 (2012)
Rajaraman, A., Ullman, J.D.: Mining of Massive Datasets. Cambridge University Press, Cambridge (2012)
Red fox: a compilation environment for data warehousing. http://gpuocelot.gatech.edu/projects/red-fox-a-compilation-environment-for-data-warehousing/
Ryan, P.B., Schuemie, M.J.: Evaluating performance of risk identification methods through a large-scale simulation of observational data. Drug Saf. 36(1), 171–180 (2013)
Sean Baxter: modern GPU library—tutorial. http://nvlabs.github.io/moderngpu/index.html (visited in Jan 2015) (2013)
Srinivasan, A.: The Aleph manual. University of Oxford, England (2001). http://www.cs.ox.ac.uk/activities/machlearn/Aleph/aleph.html
Srinivasan, A., King, R.D., Muggleton, S.H., Sternberg, M.J.E.: Carcinogenesis predictions using ILP. In: Lavrač, N., Džeroski, S. (eds.) Inductive Logic Programming, volume 1297 of Lecture Notes in Computer Science, pp. 273–287. Springer, Berlin (1997)
Srinivasan, A., Faruquie, T.A., Joshi, S.: Data and task parallelism in ILP using MapReduce. Mach. Learn. 86(1), 141–168 (2012)
Taskar, B., Getoor, L.: Introduction to Statistical Relational Learning. MIT Press, Cambridge (2007)
Tekle, K.T., Liu, Y.A.: More efficient datalog queries: subsumptive tabling beats magic sets. In: SIGMOD Conference, pp. 661–672 (2011)
Thrust: a parallel template library. http://thrust.github.io/
TPC-H transaction processing performance council benchmark H. http://www.tpc.org/tpch/
Ullman, J.D.: Principles of Database and Knowledge-Base Systems, vol. I. Computer Science Press, Rockville (1988)
Ullman, J.D.: Principles of Database and Knowledge-Base Systems, vol. II. Computer Science Press, Rockville (1989)
Weislow, O.S., Kiser, R., Fine, D.L., Bader, J., Shoemaker, R.H., Boyd, M.R.: New soluble-formazan assay for hiv-1 cytopathic effects: application to high-flux screening of synthetic and natural products for aids-antiviral activity. J. Natl. Cancer Inst. 81(8), 577–586 (1989)
Wu, H., Diamos, G., Cadambi, S., Yalamanchili, S.: Kernel weaver: automatically fusing database primitives for efficient GPU computation. In: Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-45, Washington, DC, USA, IEEE Computer Society, pp. 107–118 (2012)
Wu, H., Diamos, G., Sheard, T., Aref, M., Baxter, S., Garland, M., Yalamanchili, S.: Red fox: an execution environment for relational query processing on gpus. In: International Symposium on Code Generation and Optimization (CGO) (2014)
Wu, H., Diamos, G., Wang, J., Cadambi, S., Yalamanchili, S., Chakradhar, S.: Optimizing data warehousing applications for gpus using kernel fusion/fission. In: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, IPDPSW ’12, Washington, DC, USA, IEEE Computer Society, pp. 2433–2442 (2012)
Young, J., Wu, H., Yalamanchili, S.: Satisfying data-intensive queries using GPU clusters. In: 2012 SC Companion High Performance Computing, Networking, Storage and Analysis (SCC), pp. 1314–1314 (2012)
Acknowledgments
The authors gratefully acknowledge the comments from all reviewers, which greatly improved the quality of this paper. We would also like to thank the members of Martínez-Angeles’ M.Sc. and qualification committees for their helpful comments.
Additional information
CMA was supported by the University of Porto, the Centre for Research and Postgraduate Studies of the National Polytechnic Institute (CINVESTAV-IPN) of Mexico, and the Council of Science and Technology (CONACyT) of Mexico. ICD and VSC were partially supported by the European Regional Development Fund (ERDF), COMPETE Programme, and by the Portuguese Foundation for Science and Technology (FCT), projects ADE (PTDC/EIA-EIA/121686/2010 (FCOMP-01-0124-FEDER-020575)) and ABLe (PTDC/EEI-SII/2094/2012).
Cite this article
Martínez-Angeles, C.A., Wu, H., Dutra, I. et al. Relational Learning with GPUs: Accelerating Rule Coverage. Int J Parallel Prog 44, 663–685 (2016). https://doi.org/10.1007/s10766-015-0364-7
DOI: https://doi.org/10.1007/s10766-015-0364-7