Latent Tree Models for Hierarchical Topic Detection

Chen, Peixian; Zhang, Nevin L.; Liu, Tengfei; Poon, Leonard K. M.; Chen, Zhourong; Khawar, Farhan

arXiv:1605.06650 (cs)

[Submitted on 21 May 2016 (v1), last revised 21 Dec 2016 (this version, v2)]

Abstract: We present a novel method for hierarchical topic detection where topics are obtained by clustering documents in multiple ways. Specifically, we model document collections using a class of graphical models called hierarchical latent tree models (HLTMs). The variables at the bottom level of an HLTM are observed binary variables that represent the presence/absence of words in a document. The variables at other levels are binary latent variables, with those at the lowest latent level representing word co-occurrence patterns and those at higher levels representing co-occurrence of patterns at the level below. Each latent variable gives a soft partition of the documents, and document clusters in the partitions are interpreted as topics. Latent variables at high levels of the hierarchy capture long-range word co-occurrence patterns and hence give thematically more general topics, while those at low levels of the hierarchy capture short-range word co-occurrence patterns and give thematically more specific topics. Unlike LDA-based topic models, HLTMs do not refer to a document generation process and use word variables instead of token variables. They use a tree structure to model the relationships between topics and words, which is conducive to the discovery of meaningful topics and topic hierarchies.
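The two ideas in the abstract that are easiest to make concrete are the word variables (one binary presence/absence indicator per vocabulary word, rather than one variable per token) and the soft partition that a binary latent variable induces over documents. The Python sketch below illustrates both for a single latent variable with word variables as its children. It is only an illustration under stated assumptions, not the authors' learning algorithm: the vocabulary, the prior `prior_z1`, the conditional probabilities in `p_word_given_z`, and the helper names `binarize` and `posterior_z1` are all hypothetical, and a real HLTM would learn both its multi-level structure and its parameters from data.

```python
from math import prod


def binarize(doc_tokens, vocabulary):
    """Map a tokenized document to one presence/absence indicator per vocabulary word."""
    present = set(doc_tokens)
    return [1 if word in present else 0 for word in vocabulary]


def posterior_z1(x, prior_z1, p_word_given_z):
    """Return P(Z=1 | x) for one binary latent variable Z with word variables as children.

    p_word_given_z[z][i] is P(word i present | Z=z). The word variables are
    conditionally independent given Z, as they would be under a tree structure.
    """
    def joint(z):
        p_z = prior_z1 if z == 1 else 1.0 - prior_z1
        likelihood = prod(
            p if x_i == 1 else 1.0 - p
            for x_i, p in zip(x, p_word_given_z[z])
        )
        return p_z * likelihood

    j1, j0 = joint(1), joint(0)
    return j1 / (j1 + j0)


# Hypothetical vocabulary and hand-picked parameters, for illustration only.
vocabulary = ["nasa", "space", "orbit", "hockey"]
prior_z1 = 0.3
p_word_given_z = {
    0: [0.02, 0.05, 0.01, 0.10],  # P(word present | Z=0)
    1: [0.60, 0.70, 0.40, 0.02],  # P(word present | Z=1)
}

docs = [
    "nasa plans new space mission space".split(),
    "hockey season starts tonight".split(),
]

# "space" occurs twice in the first document but is recorded once:
# the model sees word presence, not token counts.
X = [binarize(doc, vocabulary) for doc in docs]

# Each document gets a membership probability for the cluster Z=1,
# which is what makes the partition soft rather than hard.
soft_partition = [posterior_z1(x, prior_z1, p_word_given_z) for x in X]

for doc, x, p in zip(docs, X, soft_partition):
    print(" ".join(doc), x, round(p, 4))
```

With these made-up parameters the first document falls almost entirely into the Z=1 cluster and the second almost entirely into Z=0; in the paper's terms, the Z=1 cluster would be read as a topic characterized by the words most strongly associated with it.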

Comments:	46 pages
Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1605.06650 [cs.CL]
	(or arXiv:1605.06650v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1605.06650

Submission history

From: Peixian Chen
[v1] Sat, 21 May 2016 14:36:33 UTC (1,886 KB)
[v2] Wed, 21 Dec 2016 08:59:14 UTC (1,740 KB)
