Abstract
Ordinal classification plays an important role in many decision-making tasks, yet it has received far less attention than general classification learning. Shannon’s information entropy and the derived measure of mutual information are fundamental to a number of learning algorithms, including feature evaluation, feature selection, and decision tree construction. These measures, however, are not applicable to ordinal classification because they cannot characterize the consistency of monotonicity in ordinal classification. In this paper, we generalize Shannon’s entropy to crisp ordinal classification and fuzzy ordinal classification, and introduce two information measures: ranking mutual information and fuzzy ranking mutual information. We discuss the properties of these measures and show that both serve as indexes of the consistency of monotonicity in ordinal classification. In addition, the proposed indexes are used to evaluate the degree of monotonicity between features and decision in the context of ordinal classification.
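The claim that Shannon mutual information cannot capture monotonicity can be illustrated with a small sketch. The abstract does not state the definition of ranking mutual information, so the form used below is an assumption: each sample's equivalence class is replaced by its upward dominance set {j : v_j >= v_i}, and the usual log-ratio is averaged over samples. The function names `shannon_mi` and `ranking_mi` and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_mi(a, d):
    """Shannon mutual information (bits) between two discrete sequences."""
    n = len(a)
    count_a, count_d, count_ad = Counter(a), Counter(d), Counter(zip(a, d))
    return sum(
        c / n * math.log2(c * n / (count_a[x] * count_d[y]))
        for (x, y), c in count_ad.items()
    )


def ranking_mi(a, d):
    """Assumed upward ranking mutual information (bits).

    Uses dominance sets {j : v_j >= v_i} in place of equivalence classes.
    This form is an assumption made for illustration only.
    """
    n = len(a)
    total = 0.0
    for i in range(n):
        up_a = {j for j in range(n) if a[j] >= a[i]}
        up_d = {j for j in range(n) if d[j] >= d[i]}
        # The intersection always contains i, so it is never empty.
        total += math.log2(len(up_a) * len(up_d) / (n * len(up_a & up_d)))
    return -total / n


feature = [1, 2, 3, 4]
decision_monotone = [1, 2, 3, 4]   # decision rises with the feature
decision_shuffled = [3, 1, 4, 2]   # same labels, order relation broken

# Shannon MI only sees the one-to-one labelling, so both cases tie.
print(shannon_mi(feature, decision_monotone), shannon_mi(feature, decision_shuffled))
# The assumed ranking measure is larger for the monotone decision.
print(ranking_mi(feature, decision_monotone), ranking_mi(feature, decision_shuffled))
```

In this toy case the two decisions are relabelings of each other, so a measure built on equivalence classes cannot tell them apart, while a measure built on dominance sets can.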
Cite this article
Hu, Q., Guo, M., Yu, D. et al. Information entropy for ordinal classification. Sci. China Inf. Sci. 53, 1188–1200 (2010). https://doi.org/10.1007/s11432-010-3117-7
DOI: https://doi.org/10.1007/s11432-010-3117-7