Abstract
Open source data mining software represents a new trend in data mining research, education and industrial applications, especially in small and medium enterprises (SMEs). With open source software, an enterprise can easily initiate a data mining project using the most current technology. Often the software is available at no cost, allowing the enterprise to focus instead on ensuring that its staff can freely learn the data mining techniques and methods. Open source ensures that staff can understand exactly how the algorithms work by examining the source code, if they so desire, and can also fine-tune the algorithms to suit the specific purposes of the enterprise. However, diversity, instability, scalability and poor documentation can be major concerns in using open source data mining systems. In this paper, we survey open source data mining systems currently available on the Internet. We compare 12 open source systems on several aspects, such as general characteristics, data source accessibility, data mining functionality, and usability. We discuss the advantages and disadvantages of these open source data mining systems.
This paper was supported by the National Natural Science Foundation of China (NSFC) under grant No. 60603066.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, X., Ye, Y., Williams, G., Xu, X. (2007). A Survey of Open Source Data Mining Systems. In: Washio, T., et al. Emerging Technologies in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77018-3_2
DOI: https://doi.org/10.1007/978-3-540-77018-3_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77016-9
Online ISBN: 978-3-540-77018-3
eBook Packages: Computer Science, Computer Science (R0)