DOI: 10.1145/3410566.3410601
Research article

Local connectivity in centroid clustering

Published: 25 August 2020

    Abstract

    Clustering is a fundamental task in unsupervised learning that aims to group a dataset into clusters of similar objects. There has been recent interest in embedding normative considerations around fairness within clustering formulations. In this paper, we propose 'local connectivity' as a crucial factor in assessing membership desert in centroid clustering. We use local connectivity to refer to the support that an object's local neighborhood offers for its membership in the cluster in question. We motivate the need to consider the local connectivity of objects in cluster assignment, and provide ways to quantify local connectivity in a given clustering. We then exploit concepts from density-based clustering to devise LOFKM, a clustering method that seeks to deepen local connectivity in clustering outputs while staying within the framework of centroid clustering. Through an empirical evaluation over real-world datasets, we show that LOFKM achieves notable improvements in local connectivity at a reasonable cost to clustering quality, illustrating the effectiveness of the method.
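The abstract does not define how local connectivity is measured. As a purely illustrative sketch (an assumption, not the paper's definition and not LOFKM), one simple way to quantify "the support an object's local neighborhood offers for its membership" is the fraction of the object's k nearest neighbours that carry the same cluster label. The function names `k_nearest` and `local_connectivity` and the parameter `k` are hypothetical, chosen only for this example.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def k_nearest(points: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i.

    Ties in distance are broken by the smaller index.
    """
    dists = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return [j for _, j in dists[:k]]


def local_connectivity(
    points: Sequence[Point], labels: Sequence[int], k: int
) -> List[float]:
    """HYPOTHETICAL per-object score: fraction of the k nearest neighbours that
    share the object's cluster label (1.0 = fully supported, 0.0 = unsupported).
    This is an illustrative assumption, not the measure defined in the paper."""
    if not 0 < k < len(points):
        raise ValueError("k must satisfy 0 < k < number of points")
    if len(labels) != len(points):
        raise ValueError("points and labels must have the same length")
    scores = []
    for i in range(len(points)):
        neighbours = k_nearest(points, i, k)
        same = sum(1 for j in neighbours if labels[j] == labels[i])
        scores.append(same / k)
    return scores


if __name__ == "__main__":
    # Two well-separated groups; the last object sits right next to group 1
    # but has (for illustration) been assigned to cluster 0.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (9, 9)]
    lbl = [0, 0, 0, 1, 1, 1, 0]
    scores = local_connectivity(pts, lbl, k=2)
    print(scores)  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0]
    print(sum(scores) / len(scores))  # average local connectivity of the clustering
```

The poorly supported object at (9, 9) scores 0.0 because both of its nearest neighbours belong to the other cluster. The neighbour search here is a brute-force O(n^2) scan kept for readability; on real datasets a spatial index would be the usual choice.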



      Published In

      ACM Other conferences
      IDEAS '20: Proceedings of the 24th Symposium on International Database Engineering & Applications
      August 2020
      252 pages
      ISBN:9781450375030
      DOI:10.1145/3410566
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. clustering
      2. local connectivity
      3. normative considerations


      Conference

      IDEAS 2020

      Acceptance Rates

      IDEAS '20 paper acceptance rate: 27 of 57 submissions (47%)
      Overall acceptance rate: 74 of 210 submissions (35%)


      Article Metrics

      • Total citations: 0
      • Total downloads: 27
      • Downloads (last 12 months): 2
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 27 Jul 2024.
