Research article
DOI: 10.1145/2684822.2685324

Exploring the Space of Topic Coherence Measures

Published: 02 February 2015

Abstract

Quantifying the coherence of a set of statements is a long-standing problem with many potential applications that has attracted researchers from different sciences. The special case of measuring the coherence of topics has recently been studied to remedy the problem that topic models give no guarantee on the interpretability of their output. Several benchmark datasets have been produced that record human judgements of the interpretability of topics. We are the first to propose a framework that makes it possible to construct existing word-based coherence measures, as well as new ones, by combining elementary components. We conduct a systematic search of the space of coherence measures using all publicly available topic relevance data for the evaluation. Our results show that new combinations of components outperform existing measures with respect to correlation to human ratings. Finally, we outline how our results can be transferred to further applications in the context of text mining, information retrieval and the World Wide Web.
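
To make the idea of assembling a word-based coherence measure from elementary components more concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not name the components, so the decomposition used here is an assumption: pairing the topic's top words, estimating probabilities from document frequencies, scoring each pair with normalized pointwise mutual information (NPMI), and averaging the scores. All function names and the toy corpus are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def pair_segmentation(top_words):
    """Assumed component 1: split a topic's top words into unordered pairs."""
    return list(combinations(top_words, 2))


def doc_probability(words, documents):
    """Assumed component 2: fraction of documents containing all given words.

    `documents` is a non-empty list of sets of tokens.
    """
    hits = sum(1 for doc in documents if all(w in doc for w in words))
    return hits / len(documents)


def npmi(w1, w2, documents):
    """Assumed component 3: NPMI of a word pair, in the range [-1, 1]."""
    p1 = doc_probability([w1], documents)
    p2 = doc_probability([w2], documents)
    p12 = doc_probability([w1, w2], documents)
    if p12 == 0.0:
        return -1.0  # the words never co-occur (limit value of NPMI)
    if p12 == 1.0:
        return 0.0  # degenerate case; a convention chosen for this sketch
    return math.log(p12 / (p1 * p2)) / -math.log(p12)


def coherence(top_words, documents,
              segment=pair_segmentation, confirm=npmi, aggregate=mean):
    """Combine the components; each one can be swapped independently.

    Requires at least two top words so that at least one pair exists.
    """
    scores = [confirm(w1, w2, documents) for w1, w2 in segment(top_words)]
    return aggregate(scores)


# Toy corpus: two documents about fruit, two about traffic.
toy_documents = [
    {"apple", "banana"},
    {"apple", "banana"},
    {"car", "road"},
    {"car", "road"},
]

print(coherence(["apple", "banana"], toy_documents))         # words that always co-occur
print(coherence(["apple", "banana", "car"], toy_documents))  # a mixed topic scores lower
```

Because `segment`, `confirm` and `aggregate` are plain function arguments, a different measure can be obtained by exchanging a single component, for example passing `aggregate=min` or `statistics.median`, without touching the rest.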



    Published In

    WSDM '15: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining
    February 2015
    482 pages
    ISBN:9781450333177
    DOI:10.1145/2684822

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. topic coherence
    2. topic evaluation
    3. topic model

    Qualifiers

    • Research-article

    Conference

    WSDM 2015

    Acceptance Rates

    WSDM '15 Paper Acceptance Rate 39 of 238 submissions, 16%;
    Overall Acceptance Rate 498 of 2,863 submissions, 17%


    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 1,006
    • Downloads (Last 6 weeks): 111
    Reflects downloads up to 13 Jan 2025

    Citations

    Cited By

    • (2025) What Are the Public’s Concerns About ChatGPT? A Novel Self-Supervised Neural Topic Model Tells You. Mathematics 13:2 (183). DOI: 10.3390/math13020183. Online publication date: 8-Jan-2025
    • (2025) A Cross-Cultural Crash Pattern Analysis in the United States and Jordan Using BERT and SHAP. Electronics 14:2 (272). DOI: 10.3390/electronics14020272. Online publication date: 10-Jan-2025
    • (2025) Hierarchical Deep Document Model. IEEE Transactions on Knowledge and Data Engineering 37:1 (351-364). DOI: 10.1109/TKDE.2024.3487523. Online publication date: Jan-2025
    • (2025) Positive sentiment and expertise predict the diffusion of archaeological content on social media. Scientific Reports 15:1. DOI: 10.1038/s41598-025-85167-z. Online publication date: 15-Jan-2025
    • (2025) Enhancing the efficiency of maritime navigation environmental resource management based on text mining techniques. Regional Studies in Marine Science 81 (103988). DOI: 10.1016/j.rsma.2024.103988. Online publication date: Jan-2025
    • (2025) Identification of innovation drivers based on technology-related news articles. Journal of Open Innovation: Technology, Market, and Complexity (100475). DOI: 10.1016/j.joitmc.2025.100475. Online publication date: Jan-2025
    • (2025) Clustering-based topic modeling for biomedical documents extractive text summarization. The Journal of Supercomputing 81:1. DOI: 10.1007/s11227-024-06640-6. Online publication date: 1-Jan-2025
    • (2025) Topic modelling through the bibliometrics lens and its technique. Artificial Intelligence Review 58:3. DOI: 10.1007/s10462-024-11011-x. Online publication date: 6-Jan-2025
    • (2025) Exploring Alternative Data for Nowcasting: A Case Study on US GDP Using Topic Attention. Machine Learning and Principles and Practice of Knowledge Discovery in Databases (387-401). DOI: 10.1007/978-3-031-74643-7_28. Online publication date: 1-Jan-2025
    • (2024) Analyzing Online Conversations on Reddit: A Study of Stress and Anxiety Through Topic Modeling and Sentiment Analysis. Cureus. DOI: 10.7759/cureus.69030. Online publication date: 9-Sep-2024
