DOI: 10.1145/3292500.3330987
research-article
Open access

Clustering without Over-Representation

Published: 25 July 2019

Abstract

In this paper we consider clustering problems in which each point is endowed with a color. The goal is to cluster the points so as to minimize the classical clustering cost, subject to the additional constraint that no color is over-represented in any cluster. The problem is motivated by practical clustering settings: for example, when clustering news articles, where the color of an article is its source, it is preferable that no single news source dominates any cluster. For the most general version of the problem, we obtain an algorithm with provable performance guarantees; it finds a fractional solution using a linear program and then rounds that solution. For the special case where no color has an absolute majority in any cluster, we obtain a simpler combinatorial algorithm, also with provable guarantees. Experiments on real-world data show that our algorithms are effective at finding good clusterings without over-representation.
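The over-representation constraint can be made concrete with a small sketch. It assumes a cap, here called `alpha`, on the fraction of any single color within a cluster, and a k-center cost (the maximum distance from a point to its assigned center), since the author tags list k-center; the abstract itself only says "the classical clustering cost". The helpers `is_alpha_capped` and `k_center_cost` are hypothetical names for illustration. They only check whether a given clustering is feasible and measure its cost; they do not implement the paper's LP-rounding or combinatorial algorithms.

```python
from collections import Counter
from math import dist
from typing import Sequence

Point = tuple[float, float]


def is_alpha_capped(assignment: Sequence[int], colors: Sequence[str], alpha: float) -> bool:
    """Hypothetical helper: True if no color exceeds an alpha fraction of any cluster.

    assignment[i] is the cluster id of point i and colors[i] is its color.
    """
    clusters: dict[int, list[str]] = {}
    for cluster_id, color in zip(assignment, colors):
        clusters.setdefault(cluster_id, []).append(color)
    for members in clusters.values():
        # Size of the largest color class in this cluster.
        largest = Counter(members).most_common(1)[0][1]
        if largest > alpha * len(members):
            return False
    return True


def k_center_cost(points: Sequence[Point], centers: Sequence[Point], assignment: Sequence[int]) -> float:
    """Hypothetical helper: largest Euclidean distance from a point to its assigned center."""
    return max(dist(points[i], centers[c]) for i, c in enumerate(assignment))


# Two well-separated groups of four points; each group is half red, half blue.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
colors = ["red", "blue", "red", "blue", "red", "red", "blue", "blue"]
centers = [(0.5, 0.5), (10.5, 10.5)]
assignment = [0, 0, 0, 0, 1, 1, 1, 1]

# alpha = 1/2 corresponds to the "no absolute majority" special case in the abstract.
print(is_alpha_capped(assignment, colors, alpha=0.5))  # True
print(is_alpha_capped(assignment, colors, alpha=0.4))  # False: 2 of 4 exceeds 0.4 * 4
print(round(k_center_cost(points, centers, assignment), 4))  # 0.7071
```

Checking feasibility is the easy direction: the difficulty the abstract addresses is finding a low-cost clustering that passes this check, which is where the linear-programming relaxation with rounding (general case) and the combinatorial algorithm (alpha = 1/2 case) come in.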




Information

Published In

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2019
3305 pages
ISBN:9781450362016
DOI:10.1145/3292500
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. clustering
  2. fairness
  3. k-center
  4. representation constraint


Conference

KDD '19

Acceptance Rates

KDD '19 paper acceptance rate: 110 of 1,200 submissions (9%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)



Article Metrics

  • Downloads (last 12 months): 204
  • Downloads (last 6 weeks): 29
Reflects downloads up to 23 Jan 2025.

Cited By

  • (2024) New Algorithms for Distributed Fair k-Center Clustering: Almost Accurate as Sequential Algorithms. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 1938-1946. DOI: 10.5555/3635637.3663057. Online publication date: 6 May 2024.
  • (2024) DFMVC: Deep Fair Multi-view Clustering. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 8090-8099. DOI: 10.1145/3664647.3681099. Online publication date: 28 Oct 2024.
  • (2024) Fast and Accurate Fair k-Center Clustering in Doubling Metrics. Proceedings of the ACM Web Conference 2024, pp. 756-767. DOI: 10.1145/3589334.3645568. Online publication date: 13 May 2024.
  • (2024) From a Timeline Contact Graph to Close Contact Tracing and Infection Diffusion Intervention. IEEE Transactions on Knowledge and Data Engineering, 36(12), pp. 8328-8340. DOI: 10.1109/TKDE.2024.3423476. Online publication date: Dec 2024.
  • (2024) New Algorithms for Fair k-Center Problem with Outliers and Capacity Constraints. Theoretical Computer Science, article 114515. DOI: 10.1016/j.tcs.2024.114515. Online publication date: Mar 2024.
  • (2024) Connected k-Center and k-Diameter Clustering. Algorithmica, 86(11), pp. 3425-3464. DOI: 10.1007/s00453-024-01266-9. Online publication date: 2 Sep 2024.
  • (2024) Fair Densest Subgraph Across Multiple Graphs. Machine Learning and Knowledge Discovery in Databases. Research Track, pp. 3-19. DOI: 10.1007/978-3-031-70362-1_1. Online publication date: 22 Aug 2024.
  • (2024) Fair Minimum Representation Clustering. Integration of Constraint Programming, Artificial Intelligence, and Operations Research, pp. 20-37. DOI: 10.1007/978-3-031-60599-4_2. Online publication date: 25 May 2024.
  • (2023) Doubly constrained fair clustering. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 13267-13293. DOI: 10.5555/3666122.3666704. Online publication date: 10 Dec 2023.
  • (2023) Feature-based Individual Fairness in k-clustering. Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, pp. 2772-2774. DOI: 10.5555/3545946.3599073. Online publication date: 30 May 2023.