Abstract
Learning an appropriate representation for categorical data is a critical yet challenging task. Current research embeds categorical data into vector or dis/similarity spaces; however, it either ignores the complex interactions within the data or overlooks the relationship between the representation and the learning model it is fed to. In this paper, we propose a model-aware representation learning framework for categorical data with hierarchical couplings, which simultaneously reveals the couplings from value to object and optimizes the fitness of the represented data for the follow-up learning model. An SVM-aware representation learning method is instantiated from this framework. Extensive experiments on ten UCI categorical datasets with diverse characteristics demonstrate that the representation learned by our proposed method can significantly improve learning performance (by up to 18.64%) compared with three competitors.
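To make concrete what "ignoring the interactions within data" can look like, the sketch below embeds categorical values with plain one-hot encoding, a common baseline and not the method proposed in this paper. The helper `one_hot_encode` and the toy colour values are hypothetical, chosen only for illustration. Under this encoding every pair of distinct values ends up at the same Euclidean distance, so no relationship between values is captured.

```python
from itertools import combinations
from math import dist


def one_hot_encode(rows):
    """Map each row of categorical values to a 0/1 vector.

    One slot is allocated per (attribute index, value) pair.
    This is a generic baseline, not the paper's representation.
    """
    n_attrs = len(rows[0])
    # Distinct values per attribute, sorted for a stable slot layout.
    vocab = [sorted({row[j] for row in rows}) for j in range(n_attrs)]
    index = {}
    for j, values in enumerate(vocab):
        for v in values:
            index[(j, v)] = len(index)
    vectors = []
    for row in rows:
        vec = [0.0] * len(index)
        for j, v in enumerate(row):
            vec[index[(j, v)]] = 1.0
        vectors.append(vec)
    return vectors


# Hypothetical toy data: one attribute with three values.
rows = [("red",), ("crimson",), ("blue",)]
vectors = one_hot_encode(rows)

# Euclidean distance between every pair of encoded objects.
pairwise = {
    (rows[a][0], rows[b][0]): dist(vectors[a], vectors[b])
    for a, b in combinations(range(len(rows)), 2)
}

for pair, d in pairwise.items():
    print(pair, round(d, 4))
```

Here "red" is exactly as far from "crimson" as it is from "blue", which is the kind of information loss that coupling-based representations aim to avoid.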
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Song, J., Zhu, C., Zhao, W., Liu, W., Liu, Q. (2017). Model-Aware Representation Learning for Categorical Data with Hierarchical Couplings. In: Lintas, A., Rovetta, S., Verschure, P., Villa, A. (eds) Artificial Neural Networks and Machine Learning – ICANN 2017. ICANN 2017. Lecture Notes in Computer Science(), vol 10614. Springer, Cham. https://doi.org/10.1007/978-3-319-68612-7_28
DOI: https://doi.org/10.1007/978-3-319-68612-7_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68611-0
Online ISBN: 978-3-319-68612-7
eBook Packages: Computer Science, Computer Science (R0)