Abstract
This article describes an approach to parallelizing data mining algorithms implemented in a functional programming language for distributed data processing on a cluster. It states the requirements that the functions forming these algorithms must satisfy so that the algorithms can be converted to a parallel form. As an example, we describe an implementation of the Naive Bayes algorithm in Common Lisp, its conversion to a parallel form, and its execution on a cluster using MPI.
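The idea of building an algorithm from functions that can be converted to a parallel form can be illustrated with a minimal sketch. It is written in Python rather than Common Lisp, it is not the authors' implementation, and it does not use MPI. All names (`count_partition`, `merge_counts`, `train`, `predict`) and the toy data are hypothetical. The assumption is that Naive Bayes training reduces to counting, so a side-effect-free counting function can be applied to each data partition independently and the partial results combined with an associative merge.

```python
from collections import Counter
from functools import reduce


def count_partition(rows):
    """Pure function: count labels and (feature index, value, label) triples in one partition."""
    class_counts = Counter()
    feature_counts = Counter()
    for features, label in rows:
        class_counts[label] += 1
        for i, value in enumerate(features):
            feature_counts[(i, value, label)] += 1
    return class_counts, feature_counts


def merge_counts(a, b):
    """Associative, commutative merge of two partial results."""
    return a[0] + b[0], a[1] + b[1]


def train(partitions):
    # Each count_partition call is independent of the others, so this map
    # could be replaced by a distributed map (one partition per worker).
    partials = map(count_partition, partitions)
    return reduce(merge_counts, partials, (Counter(), Counter()))


def predict(model, features):
    """Return the most probable label, using add-one smoothing (an assumption of this sketch)."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())

    def score(label):
        p = class_counts[label] / total
        for i, value in enumerate(features):
            # Number of distinct values observed for feature i.
            n_values = len({v for (j, v, _) in feature_counts if j == i})
            p *= (feature_counts[(i, value, label)] + 1) / (class_counts[label] + n_values)
        return p

    return max(class_counts, key=score)


# Toy data set (hypothetical): (features, label) pairs.
data = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("overcast", "hot"), "yes"),
    (("rainy", "cool"), "yes"),
    (("overcast", "cool"), "yes"),
]

# Training on one partition or on several yields the same model.
model_single = train([data])
model_split = train([data[:2], data[2:4], data[4:]])
```

Because `merge_counts` is associative and `count_partition` has no side effects, the model does not depend on how the data is split, which is the property that allows the sequential version to be replaced by a parallel one.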
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Kholod, I., Malov, A., Rodionov, S. (2015). Data Mining Algorithms Parallelizing in Functional Programming Language for Execution in Cluster. In: Balandin, S., Andreev, S., Koucheryavy, Y. (eds) Internet of Things, Smart Spaces, and Next Generation Networks and Systems. ruSMART 2015, NEW2AN 2015. Lecture Notes in Computer Science, vol 9247. Springer, Cham. https://doi.org/10.1007/978-3-319-23126-6_13
DOI: https://doi.org/10.1007/978-3-319-23126-6_13
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23125-9
Online ISBN: 978-3-319-23126-6
eBook Packages: Computer Science, Computer Science (R0)