In this paper, we propose a new semi-supervised co-clustering algorithm, Orthogonal Semi-Supervised Nonnegative Matrix Factorization (OSS-NMF), for document clustering. The approach incorporates two kinds of prior knowledge into the NMF co-clustering framework: domain knowledge of data points (documents), in the form of pairwise constraints, and category knowledge of features (words). Under this framework, clustering is formulated as the problem of finding a local minimizer of an objective function that takes this dual prior knowledge into account. We derive the update rules and design an iterative algorithm for the co-clustering process. Theoretically, we prove the correctness and convergence of the algorithm and demonstrate its mathematical rigor. Our experimental evaluations show that the proposed document clustering model achieves remarkable performance improvements when these constraints are used.
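
As a rough illustration of the kind of factorization an NMF co-clustering framework builds on, the sketch below runs a plain orthogonal nonnegative tri-factorization, X ≈ F S Gᵀ, with commonly used multiplicative updates. It is not the OSS-NMF algorithm: the pairwise document constraints and the word category knowledge are omitted, and the update rules shown are a generic unsupervised variant rather than the ones derived in the paper. The function name `tri_factorize`, its parameters, the smoothing constant `eps`, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np


def tri_factorize(X, k, l, n_iter=200, seed=0, eps=1e-9):
    """Sketch: approximate a nonnegative X (documents x words) as F @ S @ G.T.

    F (n x k) plays the role of document-cluster indicators,
    G (m x l) of word-cluster indicators, and S (k x l) links the two.
    Hypothetical helper; no prior-knowledge constraints are included.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k)) + eps
    S = rng.random((k, l)) + eps
    G = rng.random((m, l)) + eps

    for _ in range(n_iter):
        # Update document factor F; the denominator encourages F.T @ F ~ I.
        XGSt = X @ G @ S.T
        F *= np.sqrt(XGSt / (F @ (F.T @ XGSt) + eps))

        # Update word factor G; the denominator encourages G.T @ G ~ I.
        XtFS = X.T @ F @ S
        G *= np.sqrt(XtFS / (G @ (G.T @ XtFS) + eps))

        # Update the association matrix S.
        FtXG = F.T @ X @ G
        S *= np.sqrt(FtXG / (F.T @ F @ S @ G.T @ G + eps))

    return F, S, G


if __name__ == "__main__":
    # Toy document-word matrix with two visible blocks (made-up data).
    X = np.array([[5, 4, 0, 0],
                  [4, 5, 0, 0],
                  [0, 0, 3, 4],
                  [0, 0, 4, 3]], dtype=float)
    F, S, G = tri_factorize(X, k=2, l=2)
    print("document clusters:", F.argmax(axis=1))
    print("word clusters:", G.argmax(axis=1))
    print("reconstruction error:", np.linalg.norm(X - F @ S @ G.T))
```

Because every update multiplies a nonnegative factor by a nonnegative ratio, the factors stay nonnegative throughout. A semi-supervised variant would add terms for the prior knowledge to the objective, which changes these update rules.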

This work is supported by the National Natural Science Foundation of China (Nos. 61163039, 61105052), the National Basic Research Priorities Programme (No. 2007CB311004), the Funding of enhancement of young teachers’ research of Northwest Normal University (No. NWNU-LKQN-10-1), and the Doctoral Start-up Funding of Xiangtan University (No. 10QDZ42).
Ma, H., Zhao, W. & Shi, Z. A nonnegative matrix factorization framework for semi-supervised document clustering with dual constraints. Knowl Inf Syst 36, 629–651 (2013). https://doi.org/10.1007/s10115-012-0560-3