Nearest-neighbors medians clustering

Authors:

Júlia Viladomat,

Ruben Zamar

Statistical Analysis and Data Mining, Volume 5, Issue 4

Pages 349 - 362

https://doi.org/10.1002/sam.11149

Published: 01 August 2012

Abstract

We propose a nonparametric clustering algorithm based on local medians. Each observation is replaced by its local median, so that the new observation moves toward the peaks and away from the valleys of the distribution. The process is repeated until each observation converges to a fixpoint, and the convergence points define a partition of the sample. Given the proportion α of neighbors, the algorithm determines both the number of clusters and the partition of the observations. We also propose a fast version of the algorithm in which only a subset of the observations in the sample is processed. For the univariate case, we prove that each point converges to its closest fixpoint and that a unique fixpoint exists in a neighborhood of each mode. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2012
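
The iteration described in the abstract can be illustrated with a minimal univariate sketch. It is not the authors' implementation, and several details are assumptions made here because the abstract does not specify them: the local median of a point is taken as the median of its k nearest values in the original sample, with k = round(α·n); iteration stops when no point moves or after a hypothetical `max_iter` cap; and convergence points closer than a hypothetical `merge_tol` are treated as the same fixpoint. The function names are illustrative only.

```python
from statistics import median


def local_median(x, sample, k):
    """Median of the k values in `sample` that are closest to x."""
    nearest = sorted(sample, key=lambda s: abs(s - x))[:k]
    return median(nearest)


def nn_median_clustering(sample, alpha, max_iter=100, merge_tol=1e-8):
    """Univariate sketch: move each point to its local median until
    nothing changes, then group points by their convergence value.

    Assumptions (not taken from the paper): neighbors are always drawn
    from the original sample, k = round(alpha * n), and `max_iter` and
    `merge_tol` are illustrative safeguards.
    """
    n = len(sample)
    k = max(1, round(alpha * n))
    points = list(sample)
    for _ in range(max_iter):
        moved = [local_median(x, sample, k) for x in points]
        if moved == points:  # every point sits at a fixpoint
            break
        points = moved

    # Walk through the convergence values in increasing order and open a
    # new cluster whenever a value lies more than merge_tol above the
    # first value of the current cluster.
    order = sorted(range(n), key=lambda i: points[i])
    labels = [0] * n
    centers = []
    for i in order:
        if not centers or points[i] - centers[-1] > merge_tol:
            centers.append(points[i])
        labels[i] = len(centers) - 1
    return labels, centers


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0, 21.0, 22.0, 23.0, 24.0, 25.0]
    labels, centers = nn_median_clustering(data, alpha=0.5)
    print(labels)   # cluster index for each observation, in input order
    print(centers)  # convergence values, one per cluster
```

In this toy sample the two groups are well separated and α = 0.5 gives k = 5, so every point's neighborhood is exactly its own group and each point reaches its group median in one step. With a smaller α the neighborhoods shrink and the sketch can produce more distinct convergence values, which is why the number of clusters depends on α.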


Index Terms

Computing methodologies > Machine learning > Learning paradigms > Unsupervised learning > Cluster analysis

Published In

Statistical Analysis and Data Mining, Volume 5, Issue 4, August 2012, 88 pages

ISSN: 1932-1864

Publisher: John Wiley & Sons, Inc., United States
