
Data Mining Algorithms Parallelizing in Functional Programming Language for Execution in Cluster

  • Conference paper
Internet of Things, Smart Spaces, and Next Generation Networks and Systems (ruSMART 2015, NEW2AN 2015)

Abstract

This article describes an approach to parallelizing data mining algorithms that are implemented in a functional programming language, so that they can process distributed data on a cluster. It states the requirements that the functions making up these algorithms must satisfy before they can be converted into parallel form. As an example, we describe an implementation of the Naive Bayes algorithm in Common Lisp, its conversion into parallel form, and its execution on a cluster using MPI.
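The abstract does not list the requirements the paper places on functions, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes one commonly used property: per-partition work is a pure function, and partial results are combined by an associative merge. It is written in Python rather than Common Lisp, runs in a single process with no MPI, and the names `count_partition`, `merge_counts`, and `train` are hypothetical.

```python
from collections import Counter
from functools import reduce


def count_partition(rows):
    """Pure function: count labels and (feature index, value, label) triples
    for one partition of the training data. It reads only its argument."""
    class_counts = Counter()
    feature_counts = Counter()
    for features, label in rows:
        class_counts[label] += 1
        for index, value in enumerate(features):
            feature_counts[(index, value, label)] += 1
    return class_counts, feature_counts


def merge_counts(left, right):
    """Associative merge of two partial results (element-wise sum of counts)."""
    return left[0] + right[0], left[1] + right[1]


def train(partitions):
    """Count each partition independently, then fold the partial results.
    Because count_partition is pure, the map step could run on separate
    cluster nodes; here it runs sequentially for illustration."""
    partials = map(count_partition, partitions)
    return reduce(merge_counts, partials, (Counter(), Counter()))


# Toy data set (an assumption for illustration): (features, class label).
data = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "yes"),
    (("rainy", "mild"), "yes"),
    (("rainy", "hot"), "no"),
]

sequential = train([data])           # one partition
split = train([data[:2], data[2:]])  # two partitions, as on two nodes
assert sequential == split           # partitioning does not change the counts
```

The Naive Bayes counts are the same whether the data is processed as one partition or several, which is what allows the counting step to be distributed and the partial results gathered afterwards.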



Author information

Corresponding author

Correspondence to Ivan Kholod.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kholod, I., Malov, A., Rodionov, S. (2015). Data Mining Algorithms Parallelizing in Functional Programming Language for Execution in Cluster. In: Balandin, S., Andreev, S., Koucheryavy, Y. (eds) Internet of Things, Smart Spaces, and Next Generation Networks and Systems. ruSMART 2015, NEW2AN 2015. Lecture Notes in Computer Science, vol 9247. Springer, Cham. https://doi.org/10.1007/978-3-319-23126-6_13


  • DOI: https://doi.org/10.1007/978-3-319-23126-6_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23125-9

  • Online ISBN: 978-3-319-23126-6

  • eBook Packages: Computer Science, Computer Science (R0)
