Data Mining Algorithms Parallelizing in Functional Programming Language for Execution in Cluster

Kholod, Ivan; Malov, Aleksey; Rodionov, Sergey

doi:10.1007/978-3-319-23126-6_13

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 9247)

Abstract

This article describes an approach to parallelizing data mining algorithms implemented in a functional programming language for distributed data processing on a cluster. It states the requirements that the functions forming these algorithms must meet so that they can be converted into parallel form. As an example, we describe an implementation of the Naive Bayes algorithm in Common Lisp, its conversion into parallel form, and its execution on a cluster with MPI.
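
The general idea can be illustrated with a minimal sketch. It is not the authors' Common Lisp implementation and does not use MPI: it is written in Python, the function names (`count_partition`, `merge`, `train`, `split`) and the toy data are hypothetical, and a thread pool stands in for the cluster nodes. It assumes that the counting step of Naive Bayes is a pure function of its data partition and that partial results are merged by addition, so the partitions can be processed independently.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_partition(rows):
    """Pure function: count classes and (attribute index, value, class)
    triples in one data partition. It touches no shared state."""
    class_counts = Counter()
    value_counts = Counter()
    for *attrs, label in rows:
        class_counts[label] += 1
        for i, value in enumerate(attrs):
            value_counts[(i, value, label)] += 1
    return class_counts, value_counts


def merge(a, b):
    """Combine two partial models. Addition of counts is associative and
    commutative, so partial results can be merged in any order."""
    return a[0] + b[0], a[1] + b[1]


def train(partitions):
    """Count every partition independently, then reduce the partial models.
    Assumption: threads stand in for cluster nodes in this sketch."""
    with ThreadPoolExecutor() as pool:
        partial_models = list(pool.map(count_partition, partitions))
    return reduce(merge, partial_models, (Counter(), Counter()))


def split(rows, n):
    """Split rows into n partitions (round-robin)."""
    return [rows[i::n] for i in range(n)]


# Toy data set (hypothetical): two attributes followed by the class label.
ROWS = [
    ("sunny", "hot", "no"),
    ("sunny", "mild", "no"),
    ("rainy", "mild", "yes"),
    ("rainy", "cool", "yes"),
    ("overcast", "hot", "yes"),
    ("sunny", "cool", "yes"),
]

sequential_model = train([ROWS])
parallel_model = train(split(ROWS, 3))
```

Because `count_partition` has no side effects and `merge` is order-independent, `train(split(ROWS, 3))` yields the same counts as `train([ROWS])`. On a cluster, the thread pool would be replaced by distributing partitions to nodes and reducing their partial results.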

Author information

Authors and Affiliations

Saint Petersburg Electrotechnical University “LETI”, ul. Prof. Popova 5, Saint Petersburg, Russia
Ivan Kholod & Sergey Rodionov
Motorola Solutions, Business Centre “T4”, Sedova st., 12, 192019, Saint Petersburg, Russia
Aleksey Malov

Corresponding author

Correspondence to Ivan Kholod.

Editor information

Editors and Affiliations

FRUCT Oy, Helsinki, Finland
Sergey Balandin
Tampere University of Technology, Tampere, Finland
Sergey Andreev
Tampere University of Technology, Tampere, Finland
Yevgeni Koucheryavy

About this paper

Cite this paper

Kholod, I., Malov, A., Rodionov, S. (2015). Data Mining Algorithms Parallelizing in Functional Programming Language for Execution in Cluster. In: Balandin, S., Andreev, S., Koucheryavy, Y. (eds) Internet of Things, Smart Spaces, and Next Generation Networks and Systems. ruSMART 2015, NEW2AN 2015. Lecture Notes in Computer Science, vol 9247. Springer, Cham. https://doi.org/10.1007/978-3-319-23126-6_13

DOI: https://doi.org/10.1007/978-3-319-23126-6_13
Published: 13 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23125-9
Online ISBN: 978-3-319-23126-6
eBook Packages: Computer Science, Computer Science (R0)
