An Information-theoretic Perspective of Hierarchical Clustering

Pan, Yicheng; Zheng, Feng; Fan, Bingchen

arXiv:2108.06036 (cs)

[Submitted on 13 Aug 2021]

Abstract: A combinatorial cost function for hierarchical clustering was introduced by Dasgupta (2016) and later generalized by Cohen-Addad et al. (2019) to a broader class called admissible functions. In this paper, we investigate hierarchical clustering from an information-theoretic perspective and formulate a new objective function. We also establish the relationship between these two perspectives. On the algorithmic side, we depart from the traditional top-down and bottom-up frameworks and propose a new one that recursively stratifies the sparsest level of a cluster tree, guided by our objective function. For practical use, our resulting cluster tree is not binary. Our algorithm, HCSE, outputs a $k$-level cluster tree and chooses $k$ automatically, without any hyper-parameter, through a novel and interpretable mechanism. Our experiments on synthetic datasets show that HCSE has a clear advantage in finding the intrinsic number of hierarchies, and the results on real datasets show that HCSE achieves costs competitive with the popular algorithms LOUVAIN and HLP.
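For readers unfamiliar with the combinatorial cost that the abstract starts from, the following is a minimal sketch of Dasgupta's cost function: each weighted pair (i, j) contributes w(i, j) times the number of leaves under the lowest common ancestor of i and j in the cluster tree. It is not the paper's new information-theoretic objective, nor the HCSE algorithm. The nested-tuple tree encoding, the function names, and the toy weights are assumptions made for illustration only.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels under a node (nested tuples = internal nodes)."""
    if not isinstance(tree, tuple):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def dasgupta_cost(tree, weights):
    """Sum of w(i, j) * |leaves(lca(i, j))| over all pairs with a weight.

    `weights` maps unordered pairs, stored as (i, j) tuples, to similarities.
    Works for non-binary trees as well: any two leaves that sit in different
    children of a node have that node as their lowest common ancestor.
    """
    if not isinstance(tree, tuple):
        return 0.0
    child_leaves = [leaves(child) for child in tree]
    size = sum(len(s) for s in child_leaves)  # leaves under this node
    cost = 0.0
    for left, right in combinations(child_leaves, 2):
        for i in left:
            for j in right:
                w = weights.get((i, j), weights.get((j, i), 0.0))
                cost += w * size
    # Pairs inside a single child are charged deeper in the tree.
    return cost + sum(dasgupta_cost(child, weights) for child in tree)


# Hypothetical toy graph: two tight pairs joined by one weak edge.
weights = {("a", "b"): 3.0, ("c", "d"): 3.0, ("b", "c"): 1.0}
good_tree = (("a", "b"), ("c", "d"))  # splits along the weak edge first
bad_tree = (("a", "c"), ("b", "d"))   # separates the tight pairs at the root
flat_tree = ("a", "b", "c", "d")      # single-level, non-binary tree

print(dasgupta_cost(good_tree, weights))  # 16.0
print(dasgupta_cost(bad_tree, weights))   # 28.0
print(dasgupta_cost(flat_tree, weights))  # 28.0
```

A lower cost means heavy edges are cut lower in the tree, which is why the tree that keeps the tight pairs together scores best on this toy input.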

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2108.06036 [cs.LG]
	(or arXiv:2108.06036v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2108.06036

Submission history

From: Yicheng Pan
[v1] Fri, 13 Aug 2021 03:03:56 UTC (1,551 KB)
