Additive Regularization for Topic Modeling in Sociological Studies of User-Generated Texts

Apishev, Murat; Koltcov, Sergei; Koltsova, Olessia; Nikolenko, Sergey; Vorontsov, Konstantin

doi:10.1007/978-3-319-62434-1_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10061)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

Social studies of the Internet have adopted large-scale text mining for unsupervised discovery of topics related to specific subjects. A recently developed approach to topic modeling, additive regularization of topic models (ARTM), provides fast inference and, through a wide variety of possible regularizers, more control over the topics than developing LDA extensions does. We apply ARTM to mining ethnic-related content from the Russian-language blogosphere, introduce a new combined regularizer, and compare models derived from ARTM with LDA. Human evaluations show that ARTM is better suited to mining topics on specific subjects: it finds more relevant topics of higher or comparable quality. We also include a detailed analysis of how to tune regularization coefficients in ARTM models.
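
To make the role of a regularization coefficient concrete, here is a minimal NumPy sketch of the general regularized EM update described in the ARTM literature. It is not the combined regularizer introduced in this chapter and not the authors' implementation. It assumes the simplest possible regularizer, a uniform smoothing (positive coefficient) or sparsing (negative coefficient) term on the word-topic matrix Phi and the topic-document matrix Theta. The function names `artm_em` and `_normalize_columns` and the parameters `tau_phi` and `tau_theta` are hypothetical, chosen for illustration only.

```python
import numpy as np


def _normalize_columns(m):
    """Scale each column to sum to 1; all-zero columns stay zero."""
    s = m.sum(axis=0, keepdims=True)
    return np.divide(m, s, out=np.zeros_like(m), where=s > 0)


def artm_em(n_dw, num_topics, tau_phi=0.0, tau_theta=0.0, num_iters=50, seed=0):
    """Regularized EM for a toy topic model (illustrative sketch).

    n_dw: (D, W) array of word counts per document.
    tau_phi, tau_theta: assumed coefficients of a uniform smoothing (> 0)
    or sparsing (< 0) regularizer; both 0 reduces to plain PLSA.
    Returns phi (W, T) with columns p(w|t) and theta (T, D) with columns p(t|d).
    """
    rng = np.random.default_rng(seed)
    num_docs, num_words = n_dw.shape
    phi = _normalize_columns(rng.random((num_words, num_topics)))
    theta = _normalize_columns(rng.random((num_topics, num_docs)))

    for _ in range(num_iters):
        # E-step, folded into expected counts: p(t|d,w) is proportional
        # to phi[w, t] * theta[t, d].
        p_wd = phi @ theta                          # (W, D) model p(w|d)
        ratio = n_dw.T / np.maximum(p_wd, 1e-12)    # (W, D)
        n_wt = phi * (ratio @ theta.T)              # (W, T) expected counts
        n_td = theta * (phi.T @ ratio)              # (T, D) expected counts

        # M-step: add the regularizer term to the counts, clip negative
        # values to zero, then renormalize. For the uniform
        # smoothing/sparsing regularizer assumed here, the term is just tau.
        phi = _normalize_columns(np.maximum(n_wt + tau_phi, 0.0))
        theta = _normalize_columns(np.maximum(n_td + tau_theta, 0.0))
    return phi, theta
```

In this sketch a large positive `tau_phi` pushes every topic toward the uniform distribution over words, while a negative value drives small entries of Phi to exactly zero. That trade-off is the kind of behavior that tuning a regularization coefficient has to balance.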

Acknowledgments

This work was supported by the Russian Science Foundation grant no. 15-18-00091.

Author information

Authors and Affiliations

National Research University Higher School of Economics, St. Petersburg, Russia
Sergei Koltcov, Olessia Koltsova & Sergey Nikolenko
Moscow State University, Moscow, Russia
Murat Apishev
Steklov Institute of Mathematics, St. Petersburg, Russia
Sergey Nikolenko
Yandex, Moscow, Russia
Murat Apishev & Konstantin Vorontsov
Moscow Institute of Physics and Technology, Moscow, Russia
Konstantin Vorontsov

Corresponding author

Correspondence to Sergey Nikolenko.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico
Grigori Sidorov
Universidad Autónoma Metropolitana, Mexico City, Mexico
Oscar Herrera-Alcántara

About this paper

Cite this paper

Apishev, M., Koltcov, S., Koltsova, O., Nikolenko, S., Vorontsov, K. (2017). Additive Regularization for Topic Modeling in Sociological Studies of User-Generated Texts. In: Sidorov, G., Herrera-Alcántara, O. (eds) Advances in Computational Intelligence. MICAI 2016. Lecture Notes in Computer Science, vol 10061. Springer, Cham. https://doi.org/10.1007/978-3-319-62434-1_14

DOI: https://doi.org/10.1007/978-3-319-62434-1_14
Published: 03 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62433-4
Online ISBN: 978-3-319-62434-1
eBook Packages: Computer Science, Computer Science (R0)
