Abstract
Evaluation functions, which measure the quality of features, strongly influence feature selection algorithms in data mining and knowledge discovery. However, existing evaluation functions often measure candidate features inadequately on cost-sensitive heterogeneous data. To address this problem, an entropy-based evaluation function is first proposed for measuring uncertainty in heterogeneous data. To further evaluate the quality of candidate features, we propose a multi-criteria evaluation function, which attempts to find candidate features that have the minimal total cost while preserving the same information as the whole feature set. On this basis, a cost-sensitive feature selection algorithm for heterogeneous data is developed. Experimental results show that, compared with existing feature selection algorithms, the proposed algorithm finds a feature subset more efficiently without losing classification performance.
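The general idea of trading information against cost can be illustrated with a small sketch. This is not the evaluation function or algorithm from the paper, whose definitions are not given in the abstract. It assumes purely categorical features, uses plain Shannon conditional entropy as a stand-in for the entropy-based measure, and ranks candidates by entropy reduction per unit cost, which is an assumed criterion. All names (`conditional_entropy`, `greedy_cost_sensitive_selection`, the toy data) are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows, labels, features):
    """Shannon conditional entropy H(label | features) on categorical data.

    Stand-in measure (assumption): the paper's entropy for heterogeneous
    data is not specified in the abstract.
    """
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[f] for f in features)
        groups[key][label] += 1
    n = len(rows)
    h = 0.0
    for counts in groups.values():
        size = sum(counts.values())
        for c in counts.values():
            h -= (size / n) * (c / size) * log2(c / size)
    return h


def greedy_cost_sensitive_selection(rows, labels, costs, tol=1e-12):
    """Greedy forward selection (assumed heuristic, costs must be positive).

    Adds the feature with the largest entropy reduction per unit cost until
    the subset carries the same information as the whole feature set.
    """
    all_features = list(costs)
    target = conditional_entropy(rows, labels, all_features)
    selected = []
    current = conditional_entropy(rows, labels, selected)
    while current - target > tol and len(selected) < len(all_features):
        remaining = [f for f in all_features if f not in selected]

        def score(f):
            gain = current - conditional_entropy(rows, labels, selected + [f])
            return gain / costs[f]

        best = max(remaining, key=score)
        selected.append(best)
        current = conditional_entropy(rows, labels, selected)
    return selected


# Toy data: "a" determines the label cheaply, "b" does so at a higher
# cost, and "c" carries no information about the label.
rows = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 2, "c": 0},
    {"a": 1, "b": 3, "c": 1},
]
labels = ["n", "n", "y", "y"]
costs = {"a": 1.0, "b": 5.0, "c": 2.0}

selected = greedy_cost_sensitive_selection(rows, labels, costs)
print(selected, sum(costs[f] for f in selected))
```

A greedy ratio rule like this does not guarantee a minimum-cost subset; it only shows how a stopping condition of "same information as the whole feature set" and a cost term can be combined in one selection loop.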
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Qian, W., Shu, W., Yang, J., Wang, Y. (2015). Cost-Sensitive Feature Selection on Heterogeneous Data. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science, vol 9078. Springer, Cham. https://doi.org/10.1007/978-3-319-18032-8_31
DOI: https://doi.org/10.1007/978-3-319-18032-8_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18031-1
Online ISBN: 978-3-319-18032-8
eBook Packages: Computer Science (R0)