Document Informed Neural Autoregressive Topic Models with Distributional Prior

Gupta, Pankaj; Chaudhary, Yatin; Buettner, Florian; Schütze, Hinrich


arXiv:1809.06709 (cs)

[Submitted on 15 Sep 2018 (v1), last revised 14 Jan 2019 (this version, v2)]

Abstract: We address two challenges in topic models. (1) The context around a word helps determine its actual meaning, e.g., "networks" in "artificial neural networks" vs. "biological neuron networks". Generative topic models infer topic-word distributions while taking little or no context into account. Here, we extend a neural autoregressive topic model to exploit the full context around each word in a document, in a language modeling fashion. The proposed model is named iDocNADE. (2) Topic models are difficult to apply to short texts, which contain few word occurrences (i.e., lack context), and to small corpora, which suffer from data sparsity. We therefore propose a simple and efficient way of incorporating external knowledge into neural autoregressive topic models: we use word embeddings as a distributional prior. The proposed variants are named DocNADEe and iDocNADEe.
We present novel neural autoregressive topic model variants that consistently outperform state-of-the-art generative topic models in terms of generalization, interpretability (topic coherence) and applicability (retrieval and classification) over 7 long-text and 8 short-text datasets from diverse domains.
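
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `context_states`, the `tanh` activation, the mixing weight `lam`, and the matrix shapes are all assumptions made for illustration. Each position gets a hidden vector built from the words before it (forward pass) or after it (backward pass, as a stand-in for the "full context" idea), and a fixed embedding matrix `E` can optionally be added to the learned matrix `W` as a prior.

```python
import numpy as np

def context_states(doc, W, c, E=None, lam=1.0, reverse=False):
    """Hidden vector per position, built only from one side of the context.

    doc: list of word indices; W: (hidden, vocab) learned matrix;
    c: (hidden,) bias; E: optional (hidden, vocab) pretrained embeddings
    used as a prior; lam: assumed mixing weight for E.
    reverse=False uses the words before each position,
    reverse=True uses the words after it.
    """
    seq = doc[::-1] if reverse else doc
    acc = np.zeros(W.shape[0])
    states = []
    for v in seq:
        # The state for this position excludes the word at this position.
        states.append(np.tanh(c + acc))
        acc = acc + W[:, v]
        if E is not None:
            acc = acc + lam * E[:, v]  # embedding prior added to the sum
    if reverse:
        states = states[::-1]  # realign with the original word order
    return np.stack(states)

# Hypothetical toy setup: vocabulary of 5 words, 4 hidden units.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 5))
E = rng.normal(size=(4, 5))
c = np.zeros(4)
doc = [2, 0, 3]

forward = context_states(doc, W, c)                  # preceding context only
backward = context_states(doc, W, c, reverse=True)   # following context only
with_prior = context_states(doc, W, c, E=E, lam=0.5)
```

In a full model each hidden vector would feed a softmax that predicts the word at that position; that step, and training, are omitted here.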

Comments:	AAAI2019. arXiv admin note: substantial text overlap with arXiv:1808.03793
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:1809.06709 [cs.CL]
	(or arXiv:1809.06709v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1809.06709

Submission history

From: Pankaj Gupta
[v1] Sat, 15 Sep 2018 12:48:16 UTC (1,022 KB)
[v2] Mon, 14 Jan 2019 16:25:06 UTC (1,006 KB)
