PHA: A fast potential-based hierarchical agglomerative clustering method

Published: 01 May 2013
Abstract

    A novel potential-based hierarchical agglomerative (PHA) clustering method is proposed. We first construct a hypothetical potential field over all the data points and show that this field is closely related to a nonparametric estimate of the global probability density function of the data. We then propose a new similarity metric that combines the potential field, which captures global data-distribution information, with the distance matrix, which captures local data-distribution information. Finally, we derive an equivalent similarity metric based on an edge-weighted tree of all the data points, which leads to a fast agglomerative clustering algorithm with time complexity O(N^2). The proposed PHA method is evaluated against six other typical agglomerative clustering methods on four synthetic data sets and two real data sets. Experiments show that it runs much faster than the other methods and produces the most satisfactory results in most cases.
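
    As an illustration of the general approach described in the abstract, the following is a minimal Python sketch of a potential-based agglomerative procedure. The kernel width sigma, the Gaussian-kernel potential, the rule that links each point to its nearest neighbour of lower potential, and the use of squared distance as the edge weight are all assumptions made for this sketch; the abstract does not specify the paper's actual similarity metric or tree construction.

        import numpy as np

        def pha_sketch(X, sigma=1.0):
            """Hedged sketch of potential-based agglomerative clustering (not the paper's exact method)."""
            # Pairwise squared Euclidean distances: O(N^2) time and space.
            d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
            # Potential field as a sign-flipped Parzen-style density estimate:
            # dense regions get low (very negative) potential.
            potential = -np.exp(-d2 / (2.0 * sigma ** 2)).sum(axis=1)

            # Edge-weighted tree (assumed rule): every point except the global
            # potential minimum links to its nearest neighbour among points of
            # lower potential, giving N-1 edges in O(N^2) total work.
            order = np.argsort(potential)
            edges = []
            for rank, i in enumerate(order):
                if rank == 0:
                    continue                        # root: the global potential minimum
                lower = order[:rank]                # points with lower potential
                j = lower[np.argmin(d2[i, lower])]  # closest such point
                edges.append((float(d2[i, j]), int(i), int(j)))

            # Merging edges in ascending weight order gives an agglomerative
            # merge sequence; cutting the k-1 heaviest edges gives k clusters.
            return sorted(edges)

    For example, calling pha_sketch on an (N, d) array and removing the k-1 heaviest of the returned N-1 tree edges partitions the data into k clusters, while the full sorted edge list plays the role of the agglomerative merge order.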


      Published In

      cover image Pattern Recognition
      Pattern Recognition  Volume 46, Issue 5
      May, 2013
      296 pages

      Publisher

      Elsevier Science Inc.

      United States

      Author Tags

      1. Algorithm
      2. Clustering
      3. Pattern recognition
      4. Potential field
