DOI: 10.1145/3132847.3133157
short-paper

A Way to Boost Semi-NMF for Document Clustering

Published: 06 November 2017

Abstract

Semi-Non-negative Matrix Factorization (Semi-NMF) is one of the most popular extensions of NMF. It extends the applicable range of NMF models to data with mixed signs and strengthens their relation to clustering. However, Semi-NMF has been found to perform somewhat worse than NMF in terms of clustering when applied to positive data such as text, which is our focus. Inspired by the recent success of neural word embedding models, e.g., word2vec, in learning high-quality real-valued vector representations of words, we propose to integrate a word embedding model into Semi-NMF. This allows Semi-NMF to capture more semantic relationships among words and thereby to infer document factors that are even better for clustering. The combination of Semi-NMF and word embedding noticeably improves the performance of NMF models, in terms of both clustering and embedding, as illustrated in our experiments.
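For readers unfamiliar with the baseline, the following is a minimal sketch of plain Semi-NMF, using the alternating updates of Ding, Li, and Jordan (2010). It is not the word-embedding-integrated model proposed in this paper, whose details the abstract does not give. The function name `semi_nmf`, its parameters, and the synthetic data are illustrative assumptions. The data matrix X (features by documents, mixed signs allowed) is approximated as F G^T, where F is unconstrained and G is non-negative, so each row of G can be read as soft cluster memberships for one document.

```python
import numpy as np

def _pos(A):
    # Element-wise positive part: (|A| + A) / 2
    return (np.abs(A) + A) / 2

def _neg(A):
    # Element-wise negative part, as a non-negative matrix: (|A| - A) / 2
    return (np.abs(A) - A) / 2

def semi_nmf(X, k, n_iter=100, eps=1e-9, seed=0):
    """Approximate X (p features x n documents) as F @ G.T with G >= 0.

    Hypothetical helper for illustration only; F (p x k) is unconstrained,
    G (n x k) is kept non-negative by a multiplicative update.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    G = rng.random((n, k)) + 0.1  # strictly positive initialization
    for _ in range(n_iter):
        # F update: least-squares solution for fixed G
        F = X @ G @ np.linalg.pinv(G.T @ G)
        XtF = X.T @ F  # n x k
        FtF = F.T @ F  # k x k
        # G update: multiplicative rule, preserves non-negativity
        num = _pos(XtF) + G @ _neg(FtF)
        den = _neg(XtF) + G @ _pos(FtF)
        G *= np.sqrt(num / (den + eps))
    # Refit F so that it is optimal for the final G
    F = X @ G @ np.linalg.pinv(G.T @ G)
    return F, G

# Usage on synthetic mixed-sign data (assumed, not from the paper)
rng = np.random.default_rng(42)
X_demo = rng.standard_normal((8, 30))
F_demo, G_demo = semi_nmf(X_demo, k=3)
labels = G_demo.argmax(axis=1)  # hard cluster assignment per document
```

Taking the arg-max over each row of G is one common way to turn the document factors into cluster labels; other assignment schemes are possible.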


Published In

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
November 2017
2604 pages
ISBN: 9781450349185
DOI: 10.1145/3132847

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. document clustering
  2. semi-nonnegative matrix factorization
  3. word embedding

Qualifiers

  • Short-paper

Conference

CIKM '17

Acceptance Rates

CIKM '17 Paper Acceptance Rate: 171 of 855 submissions, 20%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%

Article Metrics

  • Downloads (last 12 months): 8
  • Downloads (last 6 weeks): 0
Reflects downloads up to 03 Oct 2024

Cited By

  • (2022) A Survey of Community Detection in Complex Networks Using Nonnegative Matrix Factorization. IEEE Transactions on Computational Social Systems 9:2 (440-457). DOI: 10.1109/TCSS.2021.3114419. Online publication date: Apr-2022
  • (2021) Similarity preserving overlapping community detection in signed networks. Future Generation Computer Systems 116 (275-290). DOI: 10.1016/j.future.2020.10.034. Online publication date: Mar-2021
  • (2021) Wasserstein Embeddings for Nonnegative Matrix Factorization. Machine Learning, Optimization, and Data Science (309-321). DOI: 10.1007/978-3-030-64583-0_29. Online publication date: 8-Jan-2021
  • (2020) A Consensus Approach to Improve NMF Document Clustering. Advances in Intelligent Data Analysis XVIII (171-183). DOI: 10.1007/978-3-030-44584-3_14. Online publication date: 22-Apr-2020
  • (2019) A Consensus Clustering Algorithm for Multitask Multiview Learning. 2019 IEEE 14th International Conference on Intelligent Systems and Knowledge Engineering (ISKE) (1098-1106). DOI: 10.1109/ISKE47853.2019.9170399. Online publication date: Nov-2019
