Nearest-neighbors medians clustering

Published: 01 August 2012

Abstract

We propose a nonparametric clustering algorithm based on local medians. Each observation is replaced by its local median (the median of its nearest neighbors), and the updated value moves toward the peaks and away from the valleys of the distribution. The process is repeated until each observation converges to a fixed point, and the sample is partitioned according to these convergence points. Given the proportion α of neighbors, the algorithm determines both the number of clusters and the partition of the observations. We also propose a fast version of the algorithm in which only a subset of the observations in the sample is processed. For the univariate case, we prove that each point converges to its closest fixed point and that a unique fixed point exists in a neighborhood of each mode. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2012
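The iteration described in the abstract can be illustrated with a minimal Python sketch for univariate data, the case covered by the convergence proof. It is not the authors' implementation, and several details are our own assumptions, flagged in the comments: the neighborhood size is k = ceil(α·n), neighbors are always searched in the original sample, ties in distance are broken by sample order, and observations whose limits lie within `merge_tol` of each other share a cluster. The names `local_median` and `nn_medians_clustering` are hypothetical, and neither the fast subset version nor the multivariate case is covered.

```python
import math
from statistics import median


def local_median(x, sample, k):
    """Median of the k points of `sample` nearest to x (hypothetical helper)."""
    # Stable sort: ties in distance are broken by position in `sample`.
    neighbors = sorted(sample, key=lambda s: abs(s - x))[:k]
    return median(neighbors)


def nn_medians_clustering(sample, alpha, tol=1e-9, max_iter=100, merge_tol=1e-6):
    """Univariate sketch of nearest-neighbors medians clustering.

    Returns (labels, centers): one label per observation and the
    convergence point (fixed point) that defines each cluster.
    """
    n = len(sample)
    # Assumption: the neighborhood size is k = ceil(alpha * n), at least 1.
    k = max(1, math.ceil(alpha * n))
    limits = []
    for x in sample:
        # Iterate x <- local median of x until it stops moving.
        # Assumption: neighbors are always taken from the original sample.
        for _ in range(max_iter):
            x_new = local_median(x, sample, k)
            if abs(x_new - x) <= tol:
                break
            x = x_new
        limits.append(x_new)

    # Assumption: observations whose limits coincide (within merge_tol)
    # belong to the same cluster.
    centers, labels = [], []
    for z in limits:
        for j, c in enumerate(centers):
            if abs(z - c) <= merge_tol:
                labels.append(j)
                break
        else:
            centers.append(z)
            labels.append(len(centers) - 1)
    return labels, centers


if __name__ == "__main__":
    data = [1, 2, 3, 10, 11, 12]
    print(nn_medians_clustering(data, alpha=0.5))  # ([0, 0, 0, 1, 1, 1], [2, 11])
    print(nn_medians_clustering(data, alpha=1.0))  # ([0, 0, 0, 0, 0, 0], [6.5])
```

With α = 0.5 the sketch recovers the two groups, with fixed points 2 and 11. With α = 1 every neighborhood is the whole sample, so all observations collapse to the single fixed point 6.5. In this sketch the proportion α, rather than a preset number of clusters, controls how many clusters emerge.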

Published In

Statistical Analysis and Data Mining, Volume 5, Issue 4
August 2012
88 pages

Publisher

John Wiley & Sons, Inc.
United States

Author Tags

1. cluster analysis
2. local median
3. nearest neighbors
4. number of clusters