Probabilistic Topic Modeling for Comparative Analysis of Document Collections

Published: 04 March 2020


Probabilistic topic models, which can discover hidden patterns in documents, have been extensively studied. However, rather than learning from a single document collection, numerous real-world applications demand a comprehensive understanding of the relationships among various document sets. To address such needs, this article proposes a new model that can identify the common and discriminative aspects of multiple datasets. Specifically, our proposed method is a Bayesian approach that represents each document as a combination of common topics (shared across all document sets) and distinctive topics (distributions over words that are exclusive to a particular dataset). Through extensive experiments, we demonstrate the effectiveness of our method compared with state-of-the-art models. The proposed model can be useful for “comparative thinking” analysis in real-world document collections.
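The abstract gives no equations, so the following is only an illustrative sketch of the general idea of mixing common and distinctive topics, written as an LDA-style generative process. It is not the article's actual model. The Dirichlet priors, the hyperparameter values, and every name here (generate_corpus, n_common, n_distinct, and so on) are assumptions made for illustration.

```python
import numpy as np

def generate_corpus(n_collections=2, docs_per_collection=3, doc_len=20,
                    vocab_size=30, n_common=2, n_distinct=2, seed=0):
    """Toy generative process (assumed, LDA-style): each document mixes
    common topics shared by all collections with distinctive topics that
    belong only to the document's own collection."""
    rng = np.random.default_rng(seed)
    # Common topics: word distributions shared across every collection.
    common = rng.dirichlet(np.full(vocab_size, 0.5), size=n_common)
    corpus = []
    for c in range(n_collections):
        # Distinctive topics: word distributions drawn separately for
        # collection c and used only by documents of that collection.
        distinct = rng.dirichlet(np.full(vocab_size, 0.5), size=n_distinct)
        topics = np.vstack([common, distinct])
        for _ in range(docs_per_collection):
            # Per-document mixture over common + distinctive topics.
            theta = rng.dirichlet(np.full(n_common + n_distinct, 0.5))
            # Topic assignment for each word position, then the word itself.
            z = rng.choice(len(topics), size=doc_len, p=theta)
            words = np.array([rng.choice(vocab_size, p=topics[k]) for k in z])
            corpus.append({"collection": c, "topics": z, "words": words})
    return common, corpus

common, corpus = generate_corpus()
for doc in corpus[:2]:
    # Topic indices below n_common refer to common topics.
    share = float(np.mean(doc["topics"] < 2))
    print(doc["collection"], round(share, 2))
```

In this sketch, topic indices 0 to n_common - 1 are the shared topics and the remaining indices are collection-specific, so the share of words assigned to low indices is a rough measure of how much a document draws on common themes. Inference (recovering the topics from observed words) is not shown.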




Published In

ACM Transactions on Knowledge Discovery from Data  Volume 14, Issue 2
April 2020
322 pages


Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 March 2020
Accepted: 01 October 2019
Revised: 01 August 2019
Received: 01 March 2018
Published in TKDD Volume 14, Issue 2



Author Tags

  Probabilistic topic modeling
  text mining


  Research-article
  • Research
  • Refereed




