DOI: 10.1145/2851613.2851661
research-article

ClusMAM: fast and effective unsupervised clustering of large complex datasets using metric access methods

Published: 04 April 2016

Abstract

Efficient and effective clustering is a core task of data mining, and it has become more important in the current big-data scenario, where scalability is an issue. In this paper we present ClusMAM, a new strategy for clustering large complex datasets through metric access methods. ClusMAM accelerates relational partitional clustering by taking advantage of the inherent node separations of metric access methods. Compared with other methods from the literature, ClusMAM is up to four orders of magnitude faster while maintaining clustering quality. Additionally, ClusMAM exploits the datasets to find compact and coherent clusters and suggests the number of clusters k found in the data. The method was evaluated on synthetic and real datasets, and its behavior was consistent with respect to both the number of distance calculations and the time required for clustering.
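
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general setting it describes: partitional clustering that touches the data solely through a distance function, with the number of distance calculations tracked as a cost measure. It is a plain k-medoids loop, not ClusMAM itself. The names `CountingMetric`, `assign`, `update_medoids`, and `k_medoids` are hypothetical, and the `initial_medoids` argument merely stands in for representatives that a metric access method's nodes could supply (an assumption, not something the abstract confirms).

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


class CountingMetric:
    """Wraps a distance function and counts how often it is called."""

    def __init__(self, dist: Callable[[Point, Point], float]) -> None:
        self.dist = dist
        self.calls = 0

    def __call__(self, a: Point, b: Point) -> float:
        self.calls += 1
        return self.dist(a, b)


def assign(points: Sequence[Point], medoids: Sequence[Point],
           metric: Callable[[Point, Point], float]) -> List[int]:
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: metric(p, medoids[j]))
            for p in points]


def update_medoids(points: Sequence[Point], labels: Sequence[int],
                   old: Sequence[Point],
                   metric: Callable[[Point, Point], float]) -> List[Point]:
    """Pick, per cluster, the member with the smallest total distance to the rest."""
    new = []
    for j, previous in enumerate(old):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new.append(previous)  # keep the old medoid for an empty cluster
            continue
        new.append(min(members, key=lambda c: sum(metric(c, q) for q in members)))
    return new


def k_medoids(points: Sequence[Point], initial_medoids: Sequence[Point],
              metric: Callable[[Point, Point], float],
              max_iter: int = 20) -> Tuple[List[Point], List[int]]:
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(initial_medoids)
    labels = assign(points, medoids, metric)
    for _ in range(max_iter):
        new_medoids = update_medoids(points, labels, medoids, metric)
        if new_medoids == medoids:
            break
        medoids = new_medoids
        labels = assign(points, medoids, metric)
    return medoids, labels


# Toy data: two well-separated groups in the plane (illustrative only).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
metric = CountingMetric(math.dist)
# Assumed seeds; in an index-based approach these would come from the index.
medoids, labels = k_medoids(points, [(0, 1), (10, 11)], metric)
print(medoids, labels, metric.calls)
```

Good seeds matter in this sketch because every extra iteration repeats the quadratic medoid update, which is where most distance calculations are spent; starting from well-separated representatives is one plausible way to cut that cost.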

      Published In

      SAC '16: Proceedings of the 31st Annual ACM Symposium on Applied Computing
      April 2016
      2360 pages
      ISBN:9781450337397
      DOI:10.1145/2851613
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 04 April 2016

      Author Tags

      1. complex datasets
      2. metric access methods
      3. multimedia indexing
      4. unsupervised clustering

      Qualifiers

      • Research-article

      Conference

      SAC 2016: Symposium on Applied Computing
      April 4 - 8, 2016
      Pisa, Italy

      Acceptance Rates

      SAC '16 Paper Acceptance Rate 252 of 1,047 submissions, 24%;
      Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
