Topic Modeling with Wasserstein Autoencoders

Nan, Feng; Ding, Ran; Nallapati, Ramesh; Xiang, Bing

Computer Science > Information Retrieval

arXiv:1907.12374v1 (cs)

[Submitted on 24 Jul 2019 (this version), latest version 6 Dec 2019 (v2)]

Abstract: We propose a novel neural topic model in the Wasserstein autoencoder (WAE) framework. Unlike existing variational-autoencoder-based models, we directly enforce a Dirichlet prior on the latent document-topic vectors. We exploit the structure of the latent space and apply a suitable kernel when minimizing the Maximum Mean Discrepancy (MMD) to perform distribution matching. We find that MMD performs much better than a Generative Adversarial Network (GAN) at matching the high-dimensional Dirichlet distribution. We further find that injecting randomness into the encoder output during training leads to significantly more coherent topics. To measure the diversity of the produced topics, we propose a simple topic uniqueness metric. Together with the widely used coherence measure NPMI, this offers a more holistic evaluation of topic quality. Experiments on several real datasets show that our model produces significantly better topics than existing topic models.
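The abstract names three computational ingredients: MMD-based matching of encoder outputs against a Dirichlet prior, randomness in the encoder output, and a topic uniqueness score. The following Python/NumPy snippet is a minimal sketch of each under stated assumptions, not the authors' implementation. It assumes the information diffusion kernel as the "suitable kernel" on the probability simplex. The noise step `add_dirichlet_noise` (with hypothetical `alpha` and `mix` parameters) is one plausible way to add randomness while staying on the simplex. `topic_uniqueness` uses a count-based definition that we assume matches the proposed metric. All function names are illustrative.

```python
import numpy as np
from collections import Counter


def diffusion_kernel(x, y):
    """Information diffusion kernel on the simplex (an assumed kernel choice).

    k(x, y) = exp(-arccos(sum_k sqrt(x_k * y_k)) ** 2)
    x has shape (n, K) and y has shape (m, K); each row is a topic-proportion
    vector summing to 1. Returns the (n, m) kernel matrix.
    """
    # Bhattacharyya coefficient for every pair of rows.
    bc = np.sqrt(x) @ np.sqrt(y).T
    # Rounding can push the coefficient slightly above 1; clip before arccos.
    bc = np.clip(bc, 0.0, 1.0)
    return np.exp(-np.arccos(bc) ** 2)


def mmd2_unbiased(x, y, kernel=diffusion_kernel):
    """Unbiased estimate of squared MMD between samples x (n, K) and y (m, K)."""
    n, m = len(x), len(y)
    kxx, kyy, kxy = kernel(x, x), kernel(y, y), kernel(x, y)
    # Drop the diagonal (i == j) terms from the within-sample averages.
    term_xx = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()


def add_dirichlet_noise(theta, alpha=1.0, mix=0.5, rng=None):
    """Hypothetical noise step: blend each row of theta with a Dirichlet draw.

    A convex combination of two points on the simplex stays on the simplex.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.dirichlet(np.full(theta.shape[1], alpha), size=theta.shape[0])
    return (1.0 - mix) * theta + mix * noise


def topic_uniqueness(top_words):
    """Topic uniqueness for K topics, each given as a list of its top-L words.

    TU = (1/K) * sum_k (1/L) * sum_{w in topic k} 1 / cnt(w),
    where cnt(w) is how many topics list w among their top words
    (top words within one topic are assumed to be distinct).
    """
    counts = Counter(w for topic in top_words for w in topic)
    per_topic = [sum(1.0 / counts[w] for w in topic) / len(topic) for topic in top_words]
    return sum(per_topic) / len(top_words)
```

In training, `mmd2_unbiased(encoder_outputs, prior_samples)` would be added to the reconstruction loss, with `prior_samples` drawn from the Dirichlet prior. Under this definition, topic uniqueness equals 1 when no top word is shared between topics and drops to 1/K when all K topics have identical top-word lists.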

Comments:	to appear at ACL 2019
Subjects:	Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:1907.12374 [cs.IR]
	(or arXiv:1907.12374v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.1907.12374

Submission history

From: Feng Nan
[v1] Wed, 24 Jul 2019 14:08:23 UTC (2,869 KB)
[v2] Fri, 6 Dec 2019 21:47:06 UTC (2,869 KB)
