DOI: 10.1145/1390156.1390206
research-article

Statistical models for partial membership

Published: 05 July 2008

Abstract

We present a principled Bayesian framework for modeling partial memberships of data points to clusters. Unlike a standard mixture model, which assumes that each data point belongs to one and only one mixture component, or cluster, a partial membership model allows data points to have fractional membership in multiple clusters. Algorithms that assign data points partial memberships to clusters can be useful for tasks such as clustering genes based on microarray data (Gasch & Eisen, 2002). Our Bayesian Partial Membership Model (BPM) uses exponential family distributions to model each cluster, and a product of these distributions, with weighted parameters, to model each data point. Here the weights correspond to the degree to which the data point belongs to each cluster. All parameters in the BPM are continuous, so we can use Hybrid Monte Carlo to perform inference and learning. We discuss relationships between the BPM and Latent Dirichlet Allocation, Mixed Membership models, Exponential Family PCA, and fuzzy clustering. Lastly, we show some experimental results and discuss nonparametric extensions to our model.
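
As an illustration of the "product of these distributions, with weighted parameters" idea, the sketch below assumes diagonal Gaussian clusters and membership weights that are non-negative and sum to one. Under that assumption, raising each cluster density to its weight and multiplying gives another Gaussian whose natural parameters are the weighted average of the cluster natural parameters. The function name `partial_membership_gaussian` and its arguments are hypothetical, chosen for this example; the sketch covers only the per-data-point likelihood, not the priors or the Hybrid Monte Carlo inference mentioned in the abstract.

```python
import numpy as np

def partial_membership_gaussian(pi, means, precisions):
    """Combine K diagonal Gaussians into one via a weighted product.

    pi:         (K,) membership weights, non-negative, summing to 1
    means:      (K, D) cluster means
    precisions: (K, D) per-dimension cluster precisions (1 / variance)
    Returns the mean and precision, each of shape (D,), of the normalized
    product prod_k N(x | mean_k, 1 / precision_k) ** pi_k.
    """
    pi = np.asarray(pi, dtype=float)
    means = np.asarray(means, dtype=float)
    precisions = np.asarray(precisions, dtype=float)
    if np.any(pi < 0) or not np.isclose(pi.sum(), 1.0):
        raise ValueError("pi must be non-negative and sum to 1")
    # A Gaussian's natural parameters are (precision * mean, -precision / 2);
    # the weighted product averages them with weights pi.
    precision = pi @ precisions
    mean = (pi @ (precisions * means)) / precision
    return mean, precision

# Two clusters with equal unit precision: the combined mean is the
# convex combination of the cluster means.
means = np.array([[0.0, 0.0], [4.0, 2.0]])
precisions = np.ones((2, 2))
mean, precision = partial_membership_gaussian([0.25, 0.75], means, precisions)
print(mean, precision)  # mean is [3.0, 1.5], precision is [1.0, 1.0]
```

With a one-hot weight vector the function returns that cluster's own parameters, which matches the standard mixture case where a data point belongs to exactly one cluster.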

References

[1] Bezdek, J. (1981). Pattern recognition with fuzzy objective function algorithms. Kluwer.
[2] Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. JMLR.
[3] Buntine, W., & Jakulin, A. (2006). LNCS, vol. 3940, chapter Discrete Component Analysis. Springer.
[4] Collins, M., Dasgupta, S., & Schapire, R. (2002). A generalization of principal components analysis to the exponential family. NIPS.
[5] Erosheva, E., Fienberg, S., & Lafferty, J. (2004). Mixed membership models of scientific publications. PNAS.
[6] Gasch, A., & Eisen, M. (2002). Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. Genome Biol., 3.
[7] Griffiths, T., & Ghahramani, Z. (2005). Infinite latent feature models and the Indian buffet process (Technical Report). Gatsby Computational Neuroscience Unit.
[8] Heller, K., & Ghahramani, Z. (2007). A nonparametric Bayesian approach to modeling overlapping clusters. AISTATS.
[9] Hinton, G. (1999). Products of experts. ICANN.
[10] Jakulin, A. (2004). http://www.ailab.si/aleks/politics/.
[11] Kosko, B. (1992). Neural networks and fuzzy systems. Prentice Hall.
[12] MacKay, D. (2003). Information theory, inference, and learning algorithms. Cambridge University Press.
[13] Neal, R. (1993). Probabilistic inference using Markov chain Monte Carlo methods (Technical Report). University of Toronto.
[14] Teh, Y., Jordan, M., Beal, M., & Blei, D. (2006). Hierarchical Dirichlet processes. JASA, 101.
[15] Zadeh, L. (1965). Fuzzy sets. Info. and Control, 8.


Published In

ICML '08: Proceedings of the 25th international conference on Machine learning
July 2008
1310 pages
ISBN: 9781605582054
DOI: 10.1145/1390156
Sponsors

  • Pascal
  • University of Helsinki
  • Xerox
  • Federation of Finnish Learned Societies
  • Google Inc.
  • NSF
  • Machine Learning Journal/Springer
  • Microsoft Research
  • Intel
  • Yahoo!
  • Helsinki Institute for Information Technology
  • IBM

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

ICML '08
Sponsor:
  • Microsoft Research
  • Intel
  • IBM

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

Bibliometrics

  • Downloads (last 12 months): 27
  • Downloads (last 6 weeks): 2

Reflects downloads up to 01 Sep 2024.

Cited By

  • (2024) Mixtures of Probit Regression Models with Overlapping Clusters. Bayesian Analysis 19:3. DOI: 10.1214/23-BA1372. Online publication date: 1-Sep-2024
  • (2024) Functional Mixed Membership Models. Journal of Computational and Graphical Statistics (1-18). DOI: 10.1080/10618600.2024.2304633. Online publication date: 10-Jan-2024
  • (2022) Factor and hybrid components for model-based clustering. Advances in Data Analysis and Classification 16:2 (373-398). DOI: 10.1007/s11634-021-00483-2. Online publication date: 17-Jan-2022
  • (2021) Spatio-Temporal Mixed Membership Models for Criminal Activity. Journal of the Royal Statistical Society Series A: Statistics in Society 184:4 (1220-1244). DOI: 10.1111/rssa.12642. Online publication date: 27-Jan-2021
  • (2021) Chimeral Clustering. Journal of Classification 39:1 (171-190). DOI: 10.1007/s00357-021-09396-3. Online publication date: 2-Oct-2021
  • (2020) Model-Based Clustering. An Introduction to Clustering with R (215-289). DOI: 10.1007/978-981-13-0553-5_6. Online publication date: 28-Aug-2020
  • (2017) Partial Membership Latent Dirichlet Allocation for Soft Image Segmentation. IEEE Transactions on Image Processing 26:12 (5590-5602). DOI: 10.1109/TIP.2017.2736419. Online publication date: Dec-2017
  • (2017) Mixture model with multiple allocations for clustering spatially correlated observations in the analysis of ChIP‐Seq data. Biometrical Journal 59:6 (1301-1316). DOI: 10.1002/bimj.201600131. Online publication date: 30-Jun-2017
  • (2016) Partial membership latent Dirichlet allocation for image segmentation. 2016 23rd International Conference on Pattern Recognition (ICPR) (2368-2373). DOI: 10.1109/ICPR.2016.7899990. Online publication date: Dec-2016
  • (2015) Identifying synonymy between relational phrases using word embeddings. Journal of Biomedical Informatics 56:C (94-102). DOI: 10.1016/j.jbi.2015.05.010. Online publication date: 1-Aug-2015
