Abstract
The Multinomial distribution has been widely used to model count data. However, its naive Bayes assumption usually degrades clustering performance, especially when correlation between features is inherent, as in text documents. In this paper, we use the Negative Multinomial distribution to perform clustering based on finite mixture models, where the mixture parameters are estimated using a novel minorization-maximization algorithm that scales well to high-dimensional optimization settings. Furthermore, we integrate a model-based feature selection approach that also determines the optimal number of components in the mixture. To evaluate the clustering performance of the proposed model, three real-world applications are considered, namely, COVID-19 analysis, Web page clustering, and facial expression recognition.
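To make the modeling idea concrete, the following is a minimal sketch (not the authors' implementation) of the negative multinomial log-pmf and of the posterior component probabilities in a finite mixture, the quantity any EM- or MM-style fitting scheme needs at each iteration. The parameterization assumed here is the standard one: a positive dispersion parameter beta and per-category probabilities p with p0 = 1 - sum(p); function names are illustrative.

```python
import numpy as np
from scipy.special import gammaln

def nm_logpmf(x, beta, p):
    """Log-pmf of the negative multinomial distribution.

    x    : integer count vector of length d
    beta : positive dispersion parameter (number of 'failures')
    p    : per-category probabilities with sum(p) < 1;
           p0 = 1 - sum(p) is the failure probability.
    """
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    p0 = 1.0 - p.sum()
    return (gammaln(beta + x.sum()) - gammaln(beta)
            - gammaln(x + 1.0).sum()
            + beta * np.log(p0) + (x * np.log(p)).sum())

def responsibilities(X, weights, betas, ps):
    """Posterior component probabilities for a finite NM mixture."""
    logr = np.array([[np.log(w) + nm_logpmf(x, b, p)
                      for w, b, p in zip(weights, betas, ps)]
                     for x in X])
    logr -= logr.max(axis=1, keepdims=True)  # numerical stability
    r = np.exp(logr)
    return r / r.sum(axis=1, keepdims=True)
```

With d = 1 the distribution reduces to the negative binomial, which gives a quick sanity check that the pmf normalizes. Unlike the multinomial, the shared Gamma term couples the counts across dimensions, which is what lets the model capture positively correlated features.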
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Bregu, O., Zamzami, N., Bouguila, N. (2021). Mixture-Based Unsupervised Learning for Positively Correlated Count Data. In: Nguyen, N.T., Chittayasothorn, S., Niyato, D., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2021. Lecture Notes in Computer Science(), vol 12672. Springer, Cham. https://doi.org/10.1007/978-3-030-73280-6_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73279-0
Online ISBN: 978-3-030-73280-6
eBook Packages: Computer Science (R0)