Online Cluster Validity Indices for Streaming Data

Moshtaghi, Masud; Bezdek, James C.; Erfani, Sarah M.; Leckie, Christopher; Bailey, James

Statistics > Machine Learning

arXiv:1801.02937 (stat)

[Submitted on 8 Jan 2018]

Abstract: Cluster analysis is used to explore structure in unlabeled data sets in a wide range of applications. An important part of cluster analysis is validating the quality of computationally obtained clusters. A large number of different internal indices have been developed for validation in the offline setting. However, this concept has not been extended to the online setting. A key challenge is to find an efficient incremental formulation of an index that can capture both cohesion and separation of the clusters over potentially infinite data streams. In this paper, we develop two online versions (with and without forgetting factors) of the Xie-Beni and Davies-Bouldin internal validity indices and analyze their characteristics using two streaming clustering algorithms (sk-means and online ellipsoidal clustering). We also illustrate their use in monitoring evolving clusters in streaming data, and show that incremental cluster validity indices are capable of sending a distress signal to online monitors when evolving clusters go awry. Our numerical examples indicate that the incremental Xie-Beni index with forgetting factor is superior to the other three indices tested.
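The abstract does not give the incremental update rules. The Python sketch below shows one way a crisp Xie-Beni value and a Davies-Bouldin value could be maintained over a stream in constant time per point. It is an illustration under stated assumptions and not the authors' formulation.

The sketch makes the following assumptions:

- Clusters come from a MacQueen-style sequential k-means that is seeded with the first k points.
- Each cluster keeps a running centroid, an effective weight, and a within-cluster scatter. These are updated with a weighted Welford step.
- A forgetting factor `gamma` in (0, 1] decays a cluster's past weight each time that cluster receives a point. Setting `gamma = 1` means no forgetting.
- Xie-Beni is computed as total scatter divided by (total weight × minimum squared centroid separation).
- Davies-Bouldin uses root-mean-square scatter per cluster.

All names (`ClusterStats`, `OnlineValidity`, `xie_beni`, `davies_bouldin`) are hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


@dataclass
class ClusterStats:
    """Running statistics for one cluster (hypothetical helper)."""
    mean: List[float]     # current centroid
    weight: float = 1.0   # effective point count (equals n when gamma == 1)
    scatter: float = 0.0  # weighted sum of squared distances to the centroid

    def update(self, x: Sequence[float], gamma: float) -> None:
        # Weighted Welford step (assumption): past weight decays by gamma,
        # the new point has weight 1. With gamma == 1 this is the exact SSE.
        diff = [xi - mi for xi, mi in zip(x, self.mean)]
        new_weight = gamma * self.weight + 1.0
        self.scatter = (gamma * self.scatter
                        + gamma * self.weight * sum(d * d for d in diff) / new_weight)
        self.mean = [mi + d / new_weight for mi, d in zip(self.mean, diff)]
        self.weight = new_weight


class OnlineValidity:
    """Sequential k-means with incrementally maintained validity values."""

    def __init__(self, k: int, gamma: float = 1.0) -> None:
        if k < 2:
            raise ValueError("need at least two clusters")
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must lie in (0, 1]")
        self.k = k
        self.gamma = gamma
        self.clusters: List[ClusterStats] = []

    def add(self, x: Sequence[float]) -> int:
        """Assign x to its nearest centroid, update that cluster, return its index."""
        if len(self.clusters) < self.k:
            # Seed one cluster per incoming point until k clusters exist.
            self.clusters.append(ClusterStats(mean=list(x)))
            return len(self.clusters) - 1
        j = min(range(self.k), key=lambda i: sq_dist(x, self.clusters[i].mean))
        self.clusters[j].update(x, self.gamma)
        return j

    def xie_beni(self) -> Optional[float]:
        """Total scatter / (total weight * min squared centroid separation)."""
        if len(self.clusters) < 2:
            return None
        compactness = sum(c.scatter for c in self.clusters)
        total_weight = sum(c.weight for c in self.clusters)
        min_sep = min(sq_dist(a.mean, b.mean)
                      for a, b in combinations(self.clusters, 2))
        return math.inf if min_sep == 0.0 else compactness / (total_weight * min_sep)

    def davies_bouldin(self) -> Optional[float]:
        """Mean over clusters of the worst (s_i + s_j) / d(v_i, v_j) ratio."""
        cs = self.clusters
        if len(cs) < 2:
            return None
        s = [math.sqrt(c.scatter / c.weight) for c in cs]  # RMS scatter per cluster
        total = 0.0
        for i in range(len(cs)):
            ratios = []
            for j in range(len(cs)):
                if j != i:
                    d = math.sqrt(sq_dist(cs[i].mean, cs[j].mean))
                    ratios.append(math.inf if d == 0.0 else (s[i] + s[j]) / d)
            total += max(ratios)
        return total / len(cs)


stream = [(0, 0), (10, 10), (1, 0), (9, 10), (0, 1), (10, 9)]
monitor = OnlineValidity(k=2, gamma=1.0)
for point in stream:
    monitor.add(point)
    print(monitor.xie_beni(), monitor.davies_bouldin())
```

Feeding the stream above with `k = 2` and `gamma = 1` gives final values of 1/392 (about 0.0026) for Xie-Beni and √2/14 (about 0.101) for Davies-Bouldin. For both indices, lower values indicate compact, well-separated clusters. A monitor built on this sketch could therefore watch for sustained increases in either value. With `gamma < 1`, each cluster's effective weight stays below 1/(1 − `gamma`). As a result, old points stop dominating the statistics and the indices track recent behaviour.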

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:1801.02937 [stat.ML]
	(or arXiv:1801.02937v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1801.02937

Submission history

From: Masud Moshtaghi
[v1] Mon, 8 Jan 2018 18:43:00 UTC (2,515 KB)