DOI: 10.1145/3410566.3410601
Research article

Local connectivity in centroid clustering

Published: 25 August 2020

Abstract

Clustering is a fundamental task in unsupervised learning that aims to group a dataset into clusters of similar objects. There has been recent interest in embedding normative considerations around fairness within clustering formulations. In this paper, we propose 'local connectivity' as a crucial factor in assessing membership desert in centroid clustering. We use local connectivity to refer to the support that the local neighborhood of an object offers for its membership in the cluster in question. We motivate the need to consider the local connectivity of objects in cluster assignment, and provide ways to quantify local connectivity in a given clustering. We then exploit concepts from density-based clustering to devise LOFKM, a clustering method that seeks to deepen local connectivity in clustering outputs while staying within the framework of centroid clustering. Through an empirical evaluation over real-world datasets, we show that LOFKM achieves notable improvements in local connectivity at reasonable costs to clustering quality, demonstrating the effectiveness of the method.
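
To make the idea of local connectivity concrete, the following is a minimal sketch under stated assumptions, not the paper's measure and not LOFKM. It assigns each point to its nearest centroid, as centroid clustering does, and then scores each point by the fraction of its k nearest neighbors that share its cluster. The function names, the neighbor-agreement score, the toy points, and the fixed centroids are all hypothetical choices made for illustration.

```python
import math

def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def neighbor_agreement(points, labels, k=2):
    """Fraction of each point's k nearest neighbors that share its cluster label.

    Illustrative proxy for local connectivity (an assumption, not the paper's definition).
    """
    scores = []
    for i, p in enumerate(points):
        # Rank all other points by distance to p and keep the k closest.
        neighbors = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        same = sum(1 for j in neighbors if labels[j] == labels[i])
        scores.append(same / len(neighbors))
    return scores

# Hypothetical toy data: a tight group around x = 0 and a chain of points
# stretching from x = 6 down to x = 2.8.
points = [(0.0, 0.0), (0.5, 0.0), (-0.5, 0.0),
          (6.0, 0.0), (5.0, 0.0), (4.0, 0.0), (3.2, 0.0), (2.8, 0.0)]
centroids = [(0.0, 0.0), (6.0, 0.0)]  # fixed by hand for the example

labels = [nearest_centroid(p, centroids) for p in points]
scores = neighbor_agreement(points, labels, k=2)

for p, label, score in zip(points, labels, scores):
    print(p, "cluster", label, "neighbor agreement", score)
```

In this toy setup the point at (2.8, 0) is marginally closer to the first centroid, yet both of its nearest neighbors belong to the second cluster, so its agreement score is 0. That is the kind of assignment a purely centroid-based rule produces and a neighborhood-aware criterion would question.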

    Published In

    IDEAS '20: Proceedings of the 24th Symposium on International Database Engineering & Applications
    August 2020
    252 pages
    ISBN:9781450375030
    DOI:10.1145/3410566

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. clustering
    2. local connectivity
    3. normative considerations

    Qualifiers

    • Research-article

    Conference

    IDEAS 2020

    Acceptance Rates

    IDEAS '20 Paper Acceptance Rate 27 of 57 submissions, 47%;
    Overall Acceptance Rate 74 of 210 submissions, 35%
