A nonnegative matrix factorization framework for semi-supervised document clustering with dual constraints

Ma, Huifang; Zhao, Weizhong; Shi, Zhongzhi

doi:10.1007/s10115-012-0560-3


Regular Paper
Published: 13 October 2012

Volume 36, pages 629–651, (2013)

Knowledge and Information Systems

Huifang Ma¹,
Weizhong Zhao² &
Zhongzhi Shi³


Abstract

In this paper, we propose a new semi-supervised co-clustering algorithm for document clustering, Orthogonal Semi-Supervised Nonnegative Matrix Factorization (OSS-NMF). The approach incorporates two kinds of prior knowledge into the NMF co-clustering framework: domain knowledge about the data points (documents), expressed as pairwise constraints, and category knowledge about the features (words). Within this framework, clustering is formulated as the problem of finding a local minimizer of an objective function that accounts for this dual prior knowledge. We derive the update rules and design an iterative algorithm for the co-clustering process. On the theoretical side, we prove the correctness and convergence of the algorithm and thereby demonstrate its mathematical rigor. Our experimental evaluations show that the proposed document clustering model achieves marked performance improvements when these constraints are incorporated.
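The abstract describes the method only at a high level, and the actual OSS-NMF objective and update rules are given in the full paper, not here. As a rough illustration of the general idea (dual prior knowledge folded into an NMF co-clustering objective and solved by multiplicative updates), the NumPy sketch below factorizes a word-by-document matrix X ≈ F S Gᵀ. Everything in it is an assumption made for illustration: the hypothetical function names `constrained_nmtf` and `objective`, the graph-Laplacian must-link penalty, the overlap penalty for cannot-links, the masked word-prior term `alpha * sum_i w_i * ||F_i - F0_i||^2`, and the weights `alpha`, `beta`, `gamma`. It also omits the orthogonality constraints implied by the algorithm's name, so it should not be read as the authors' OSS-NMF.

```python
import numpy as np

# Hypothetical sketch: NOT the OSS-NMF update rules from the paper.
# Penalized tri-factorization X (m words x n docs) ~ F (m x k) @ S (k x l) @ G.T,
# with no orthogonality constraint.


def objective(X, F, S, G, F0, w, lap_must, cannot, alpha, beta, gamma):
    """Reconstruction error plus word-prior, must-link and cannot-link penalties."""
    recon = np.sum((X - F @ S @ G.T) ** 2)
    word_prior = alpha * np.sum(w * (F - F0) ** 2)   # only rows of labeled words count
    must_pen = beta * np.trace(G.T @ lap_must @ G)    # small when must-linked rows of G agree
    cannot_pen = gamma * np.trace(G.T @ cannot @ G)   # small when cannot-linked rows barely overlap
    return recon + word_prior + must_pen + cannot_pen


def constrained_nmtf(X, F0, word_mask, must, cannot, k, l,
                     alpha=1.0, beta=1.0, gamma=1.0, n_iter=200, seed=0, eps=1e-10):
    """Multiplicative updates: each factor is scaled by (negative / positive) gradient parts."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, k)) + eps
    S = rng.random((k, l)) + eps
    G = rng.random((n, l)) + eps
    w = word_mask.astype(float)[:, None]   # 0/1 weight per word, shaped for broadcasting
    deg_must = np.diag(must.sum(axis=1))   # degree matrix of the must-link graph
    lap_must = deg_must - must             # graph Laplacian (positive semidefinite)

    history = [objective(X, F, S, G, F0, w, lap_must, cannot, alpha, beta, gamma)]
    for _ in range(n_iter):
        GtG = G.T @ G
        # Word factor: reconstruction term plus pull toward the prior F0 for labeled words.
        F *= (X @ G @ S.T + alpha * w * F0) / (F @ (S @ GtG @ S.T) + alpha * w * F + eps)
        FtF = F.T @ F
        # Association matrix linking word clusters to document clusters.
        S *= (F.T @ X @ G) / (FtF @ S @ GtG + eps)
        # Document factor: must-links enter the numerator, cannot-links the denominator.
        G *= (X.T @ F @ S + beta * must @ G) / (
            G @ (S.T @ FtF @ S) + beta * deg_must @ G + gamma * cannot @ G + eps)
        history.append(objective(X, F, S, G, F0, w, lap_must, cannot, alpha, beta, gamma))
    return F, S, G, history


# Toy usage: 6 words x 6 documents with two obvious topics.
X = np.array([
    [3, 2, 3, 0, 0, 0],
    [2, 3, 2, 0, 1, 0],
    [3, 3, 2, 0, 0, 1],
    [0, 0, 1, 3, 2, 3],
    [0, 1, 0, 2, 3, 3],
    [1, 0, 0, 3, 3, 2],
], dtype=float)
F0 = np.zeros((6, 2))
F0[0, 0] = 1.0  # word 0 is known to belong to word cluster 0
F0[3, 1] = 1.0  # word 3 is known to belong to word cluster 1
word_mask = np.array([1, 0, 0, 1, 0, 0])
must = np.zeros((6, 6))
must[0, 1] = must[1, 0] = 1.0      # documents 0 and 1 must share a cluster
cannot = np.zeros((6, 6))
cannot[0, 3] = cannot[3, 0] = 1.0  # documents 0 and 3 must not share a cluster

F, S, G, history = constrained_nmtf(X, F0, word_mask, must, cannot, k=2, l=2)
print("document clusters:", G.argmax(axis=1))
print("word clusters:", F.argmax(axis=1))
print(f"objective: {history[0]:.3f} -> {history[-1]:.3f}")
```

Because every update multiplies a factor by a ratio of nonnegative terms, F, S and G stay nonnegative throughout. `history` records the objective so you can check that it decreases on your own data. Cluster labels are read off as the row-wise argmax of G (documents) and F (words).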



Acknowledgments

This work is supported by the National Natural Science Foundation of China (Nos. 61163039 and 61105052), the National Basic Research Priorities Programme (No. 2007CB311004), the Funding of enhancement of young teachers’ research of Northwest Normal University (No. NWNU-LKQN-10-1), and the Doctoral Start-up Funding of Xiangtan University (No. 10QDZ42).

Author information

Authors and Affiliations

College of Computer Science and Engineering, Northwest Normal University, Lanzhou, 730070, Gansu, China
Huifang Ma
College of Information Engineering, Xiangtan University, Xiangtan, 411105, China
Weizhong Zhao
The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
Zhongzhi Shi


Corresponding author

Correspondence to Huifang Ma.


About this article

Cite this article

Ma, H., Zhao, W. & Shi, Z. A nonnegative matrix factorization framework for semi-supervised document clustering with dual constraints. Knowl Inf Syst 36, 629–651 (2013). https://doi.org/10.1007/s10115-012-0560-3


Received: 05 January 2012
Revised: 21 May 2012
Accepted: 21 July 2012
Published: 13 October 2012
Issue Date: September 2013
DOI: https://doi.org/10.1007/s10115-012-0560-3

