A Split-Merge Framework for Comparing Clusterings

Xiang, Qiaoliang; Mao, Qi; Chai, Kian Ming; Chieu, Hai Leong; Tsang, Ivor; Zhao, Zhendong

arXiv:1206.6475 (cs)

[Submitted on 27 Jun 2012 (v1), last revised 4 Sep 2012 (this version, v2)]

Authors: Qiaoliang Xiang (Nanyang Technological University), Qi Mao (Nanyang Technological University), Kian Ming Chai (DSO National Laboratories), Hai Leong Chieu (DSO National Laboratories), Ivor Tsang (Nanyang Technological University), Zhendong Zhao (Macquarie University)

Abstract: Clustering evaluation measures are frequently used to evaluate the performance of algorithms. However, most measures are not properly normalized and ignore some information in the inherent structure of clusterings. We model the relation between two clusterings as a bipartite graph and propose a general component-based decomposition formula based on the components of the graph. Most existing measures are examples of this formula. In order to satisfy consistency in the component, we further propose a split-merge framework for comparing clusterings of different data sets. Our framework gives measures that are conditionally normalized, and it can make use of data point information, such as feature vectors and pairwise distances. We use an entropy-based instance of the framework and a coreference resolution data set to demonstrate empirically the utility of our framework over other measures.
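
The bipartite-graph view mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and it rests on an assumption the abstract does not spell out: that the graph's nodes are the clusters of the two clusterings, with an edge wherever two clusters share at least one data point. The function name `bipartite_components` and the label-list input format are hypothetical, chosen only for illustration. The sketch finds the connected components that a component-based decomposition would operate on; it does not compute any measure from the paper.

```python
from collections import defaultdict


def bipartite_components(labels_a, labels_b):
    """Group overlapping clusters of two clusterings into connected components.

    Assumption (not taken from the paper): clusters are nodes, and two clusters
    are linked when they share at least one data point.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same data points")

    # Union-find over tagged nodes, so ids from the two clusterings never collide.
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(u, v):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    # Each data point contributes one edge: its cluster in A to its cluster in B.
    for a, b in zip(labels_a, labels_b):
        union(("A", a), ("B", b))

    # Collect the clusters and data point indices belonging to each component.
    components = defaultdict(lambda: {"A": set(), "B": set(), "points": []})
    for index, (a, b) in enumerate(zip(labels_a, labels_b)):
        root = find(("A", a))
        components[root]["A"].add(a)
        components[root]["B"].add(b)
        components[root]["points"].append(index)
    return list(components.values())


if __name__ == "__main__":
    for component in bipartite_components([0, 0, 1, 1, 2, 2], [0, 0, 0, 1, 2, 2]):
        print(component)
```

In this toy input, clusters 0 and 1 of the first clustering overlap clusters 0 and 1 of the second, so they fall into one component, while cluster 2 matches exactly in both clusterings and forms a component of its own.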

Comments:	Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1206.6475 [cs.LG]
	(or arXiv:1206.6475v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1206.6475

Submission history

From: Qiaoliang Xiang [via Amir Globerson as proxy]
[v1] Wed, 27 Jun 2012 19:59:59 UTC (280 KB)
[v2] Tue, 4 Sep 2012 17:42:41 UTC (280 KB)
