Abstract
For discrete co-occurrence data such as documents and words, computing optimal projections and clustering are two different but related tasks. The goal of projection is to find a low-dimensional latent space for words, while clustering aims at grouping documents based on their feature representations. Projection and clustering are generally studied independently, yet both capture the intrinsic structure of the data and should reinforce each other. In this paper we introduce a probabilistic clustering-projection (PCP) model for discrete data, in which both are represented in a unified framework. Clustering is performed in the projected space, and projection explicitly takes the clustering structure into account. Iterating the two operations turns out to be exactly the variational EM algorithm under Bayesian model inference, and is thus guaranteed to improve the data likelihood. The model is evaluated on two text data sets, with very encouraging results on both.
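The alternation the abstract describes can be illustrated with a deliberately simplified sketch: a projection step implemented here as Lee-Seung-style multiplicative NMF updates on the document-word count matrix, alternated with a k-means clustering step on the projected document representations. This is an assumed, non-Bayesian stand-in for intuition only, not the paper's variational EM inference for the PCP model; the function name `pcp_sketch` and all parameters are illustrative.

```python
import numpy as np

def pcp_sketch(X, n_topics=2, n_clusters=2, n_iter=50, seed=0):
    """Toy clustering-projection alternation (NOT the paper's PCP model):
    projection via NMF-style multiplicative updates X ~ H @ W,
    then k-means clustering of documents in the projected space H."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    H = rng.random((n_docs, n_topics)) + 0.1   # document-topic weights
    W = rng.random((n_topics, n_words)) + 0.1  # topic-word weights
    eps = 1e-9
    for _ in range(n_iter):
        # Projection step: multiplicative updates for the Frobenius objective
        W *= (H.T @ X) / (H.T @ H @ W + eps)
        H *= (X @ W.T) / (H @ W @ W.T + eps)
    # Clustering step: k-means on the rows of H (projected documents),
    # with farthest-point initialization of the cluster centers
    centers = np.empty((n_clusters, n_topics))
    centers[0] = H[rng.integers(n_docs)]
    for c in range(1, n_clusters):
        d = ((H[:, None] - centers[None, :c]) ** 2).sum(-1).min(1)
        centers[c] = H[d.argmax()]
    for _ in range(20):
        d = ((H[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for c in range(n_clusters):
            if (labels == c).any():
                centers[c] = H[labels == c].mean(0)
    return H, W, labels
```

On a count matrix with two clear word-usage blocks, documents sharing a block end up with similar rows of `H` and therefore fall in the same cluster, which is the mutual-reinforcement effect the abstract argues for. In the actual PCP model this coupling is probabilistic and the iteration is a variational EM algorithm rather than NMF plus k-means.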
Keywords
- Cluster Center
- Latent Dirichlet Allocation
- Discrete Data
- Nonnegative Matrix Factorization
- Document Cluster
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Yu, S., Yu, K., Tresp, V., Kriegel, HP. (2005). A Probabilistic Clustering-Projection Model for Discrete Data. In: Jorge, A.M., Torgo, L., Brazdil, P., Camacho, R., Gama, J. (eds) Knowledge Discovery in Databases: PKDD 2005. PKDD 2005. Lecture Notes in Computer Science(), vol 3721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564126_41
DOI: https://doi.org/10.1007/11564126_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29244-9
Online ISBN: 978-3-540-31665-7
eBook Packages: Computer Science (R0)