Abstract
A generative classifier usually models the class conditionals and the class priors separately, and then uses Bayes' theorem to compute the posterior distribution of the classes given the training set, which serves as the decision boundary. We propose an alternative. Because the support vector machine (SVM) is not a probabilistic framework, it is difficult to implement a discriminative classifier based directly on a posterior distribution. As the SVM lacks a full Bayesian analysis, we propose a hybrid (generative–discriminative) technique in which generative topic features obtained from Bayesian learning are fed to the SVM. The standard latent Dirichlet allocation topic model can be described as a Dir–Dir topic model, since it places Dirichlet (Dir) priors on both the document and corpus parameters. Using the generalized Dirichlet (GD) and Beta-Liouville (BL) distributions, which are very flexible conjugate priors to the multinomial, we define two new topic models: BL–GD and GD–BL. We take advantage of the geometric interpretation of our generative topic (latent) models, which associate a K-dimensional manifold (K is the number of topics) embedded in a V-dimensional feature space (the word simplex), where V is the vocabulary size. Under this structure, the low-dimensional topic simplex (the subspace) represents each document as a single point on its manifold and associates each document with a single probability. The SVM, with its kernel trick, operates on these document probabilities for classification, using maximum-margin learning to define the decision boundary. The key idea is that points, or documents, that are close to each other on the manifold should belong to the same class. Experimental results with text documents and images show the merits of the proposed framework.
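To make the idea of feeding document probabilities to a kernel machine concrete, the following is a minimal sketch, not the authors' implementation. It assumes each document has already been reduced to a vector of topic proportions (a point on the topic simplex) and compares documents with the Bhattacharyya kernel, one example of a probabilistic kernel; the abstract does not say which kernel is used, so that choice, the function names, and the toy topic vectors are all illustrative assumptions.

```python
import math


def bhattacharyya_kernel(p, q):
    """Bhattacharyya kernel between two discrete distributions:
    sum_k sqrt(p_k * q_k). It equals 1 when p == q and shrinks
    as the two distributions move apart on the simplex."""
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same length")
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def gram_matrix(docs):
    """Pairwise kernel values for a list of topic-proportion vectors."""
    return [[bhattacharyya_kernel(p, q) for q in docs] for p in docs]


# Hypothetical topic proportions for three documents over K = 3 topics.
# In the hybrid setting these would come from a fitted topic model.
docs = [
    [0.7, 0.2, 0.1],  # document 0
    [0.6, 0.3, 0.1],  # document 1, close to document 0 on the simplex
    [0.1, 0.1, 0.8],  # document 2, dominated by a different topic
]

gram = gram_matrix(docs)
for row in gram:
    print(["%.3f" % value for value in row])
```

Documents 0 and 1 receive a higher kernel value than documents 0 and 2, which mirrors the statement that nearby points on the manifold are expected to share a class. A Gram matrix of this kind could then be handed to any SVM implementation that accepts a precomputed kernel.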
Acknowledgements
The completion of this research was made possible thanks to the Natural Sciences and Engineering Research Council of Canada (NSERC).
Cite this article
Ihou, K.E., Bouguila, N. & Bouachir, W. Efficient integration of generative topic models into discriminative classifiers using robust probabilistic kernels. Pattern Anal Applic 24, 217–241 (2021). https://doi.org/10.1007/s10044-020-00917-1
DOI: https://doi.org/10.1007/s10044-020-00917-1