Abstract
Social studies of the Internet have adopted large-scale text mining for unsupervised discovery of topics related to specific subjects. A recently developed approach to topic modeling, additive regularization of topic models (ARTM), provides fast inference and, through a wide variety of possible regularizers, more control over the topics than developing new LDA extensions. We apply ARTM to mining ethnicity-related content from the Russian-language blogosphere, introduce a new combined regularizer, and compare models derived from ARTM with LDA. Human evaluations show that ARTM is better suited to mining topics on specific subjects, finding more relevant topics of higher or comparable quality. We also include a detailed analysis of how to tune regularization coefficients in ARTM models.
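To make the idea of additive regularization concrete, here is a minimal NumPy sketch. It assumes the standard regularized EM iterations of ARTM and uses only the simplest smoothing/sparsing regularizer, R(Φ) = Σ_t τ_t Σ_w ln φ_wt (plus the analogous term for Θ). For this regularizer, φ_wt ∂R/∂φ_wt = τ_t, so the M-step becomes φ_wt ∝ max(n_wt + τ_t, 0). This is not the combined regularizer introduced in the paper. The function names artm_fit, norm_columns and log_likelihood, and the per-topic coefficient vector tau_phi, are hypothetical illustrations.

```python
import numpy as np


def norm_columns(x):
    """ARTM 'norm' operator: keep the positive part, then make each column sum to 1."""
    x = np.maximum(x, 0.0)
    s = x.sum(axis=0, keepdims=True)
    # A column that becomes all zeros stays zero instead of dividing by zero.
    return np.divide(x, s, out=np.zeros_like(x), where=s > 0)


def log_likelihood(n_dw, phi, theta):
    """Sum of n_dw * ln p(w|d) over observed (d, w) pairs."""
    p_wd = phi @ theta                      # (W, D) model probabilities p(w|d)
    mask = n_dw.T > 0
    return float(np.sum(n_dw.T[mask] * np.log(p_wd[mask])))


def artm_fit(n_dw, num_topics, tau_phi, tau_theta, iters=50, seed=0):
    """Regularized EM for a topic model with a smoothing/sparsing regularizer.

    n_dw      : (D, W) document-word counts.
    tau_phi   : (T,) per-topic coefficients on Phi (>0 smooths, <0 sparsifies).
    tau_theta : scalar coefficient on Theta.
    Returns column-stochastic Phi (W, T) and Theta (T, D).
    With all coefficients equal to zero this is plain PLSA EM.
    """
    rng = np.random.default_rng(seed)
    _, num_words = n_dw.shape
    phi = norm_columns(rng.random((num_words, num_topics)))
    theta = np.full((num_topics, n_dw.shape[0]), 1.0 / num_topics)
    for _ in range(iters):
        # E-step folded into matrix products: p(t|d,w) = phi_wt theta_td / p(w|d).
        p_wd = phi @ theta
        ratio = np.divide(n_dw.T, p_wd, out=np.zeros_like(p_wd), where=p_wd > 0)
        n_wt = phi * (ratio @ theta.T)      # sum_d n_dw p(t|d,w), shape (W, T)
        n_td = theta * (phi.T @ ratio)      # sum_w n_dw p(t|d,w), shape (T, D)
        # M-step with the additive regularizer term added to the counters.
        phi = norm_columns(n_wt + tau_phi[None, :])
        theta = norm_columns(n_td + tau_theta)
    return phi, theta


# Toy usage: two groups of words, mild smoothing on both matrices.
n_dw = np.array([[5, 4, 3, 0, 0, 0],
                 [4, 5, 2, 0, 0, 1],
                 [0, 0, 1, 5, 4, 3],
                 [0, 1, 0, 4, 5, 4]], dtype=float)
phi, theta = artm_fit(n_dw, 2, tau_phi=np.array([0.1, 0.1]), tau_theta=0.1)
```

Because τ is a per-topic vector here, one could, for example, give negative coefficients to subject-specific topics (sparsing them) and positive ones to background topics (smoothing them). Choosing these coefficients is exactly the kind of tuning the abstract refers to; the values above are arbitrary.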
Acknowledgments
This work was supported by the Russian Science Foundation grant no. 15-18-00091.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Apishev, M., Koltcov, S., Koltsova, O., Nikolenko, S., Vorontsov, K. (2017). Additive Regularization for Topic Modeling in Sociological Studies of User-Generated Texts. In: Sidorov, G., Herrera-Alcántara, O. (eds) Advances in Computational Intelligence. MICAI 2016. Lecture Notes in Computer Science, vol. 10061. Springer, Cham. https://doi.org/10.1007/978-3-319-62434-1_14
DOI: https://doi.org/10.1007/978-3-319-62434-1_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62433-4
Online ISBN: 978-3-319-62434-1
eBook Packages: Computer Science; Computer Science (R0)