Abstract
Data mining is the process of posing queries to large quantities of data and extracting information, often previously unknown, using mathematical, statistical, and machine learning techniques. However, some data mining techniques, such as classification and clustering, cannot deal with numeric attributes, although most real datasets contain some. Continuous attributes must therefore be divided into a small number of distinct ranges, each treated as a nominal value, before these techniques can be applied. Correct discretization makes the dataset succinct and contributes to the high performance of classification algorithms. Several discretization methods have been proposed and applied, but their effectiveness often depends on the application area. In this paper, we propose a weighted hybrid discretization technique based on entropy and the contingency coefficient. We also evaluate its performance against well-known discretization techniques such as equal-width binning, 1R, MDLP, and ChiMerge.
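To make the terms in the abstract concrete, the following minimal Python sketch shows equal-width binning together with the two quantities the proposed technique is said to build on: class entropy and the contingency coefficient. It is not the weighted hybrid method itself, which the abstract does not specify. The function names, the use of Pearson's form of the contingency coefficient, and the toy temperature data are all assumptions made for illustration.

```python
import math
from collections import Counter


def equal_width_bins(values, k):
    """Map each numeric value to one of k equal-width intervals (index 0..k-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / k
    # The maximum value would land on index k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def class_information_entropy(bins, labels):
    """Class entropy in bits, averaged over intervals and weighted by interval size."""
    n = len(labels)
    total = 0.0
    for b in set(bins):
        members = [y for x, y in zip(bins, labels) if x == b]
        counts = Counter(members)
        h = -sum((c / len(members)) * math.log2(c / len(members))
                 for c in counts.values())
        total += len(members) / n * h
    return total


def contingency_coefficient(bins, labels):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n)) (assumed form)."""
    n = len(labels)
    row = Counter(bins)               # interval totals
    col = Counter(labels)             # class totals
    obs = Counter(zip(bins, labels))  # observed (interval, class) counts
    chi2 = 0.0
    for b in row:
        for y in col:
            expected = row[b] * col[y] / n
            chi2 += (obs[(b, y)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (chi2 + n))


# Hypothetical toy data: six temperature readings with a class label each.
temps = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = ["cold", "cold", "cold", "warm", "warm", "warm"]
bins = equal_width_bins(temps, 2)
print(bins, class_information_entropy(bins, labels),
      contingency_coefficient(bins, labels))
```

On this toy data the two intervals separate the classes perfectly, so the weighted class entropy is zero and the contingency coefficient reaches its maximum for a 2x2 table. A hybrid criterion would presumably combine such scores when choosing cut points, but how they are weighted is left to the paper.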
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jung, YG., Kim, K.M., Kwon, Y.M. (2012). Using Weighted Hybrid Discretization Method to Analyze Climate Changes. In: Kim, Th., Cho, Hs., Gervasi, O., Yau, S.S. (eds) Computer Applications for Graphics, Grid Computing, and Industrial Environment. CGAG 2012, GDC 2012, IESH 2012. Communications in Computer and Information Science, vol 351. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35600-1_28
DOI: https://doi.org/10.1007/978-3-642-35600-1_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35599-8
Online ISBN: 978-3-642-35600-1
eBook Packages: Computer Science, Computer Science (R0)