A news classification applied with new text representation based on the improved LDA

D Shao, C Li, C Huang, Y Xiang, Z Yu - Multimedia tools and applications, 2022 - Springer
Abstract
Recently, news classification has become an essential part of Natural Language Processing (NLP). The traditional Latent Dirichlet Allocation (LDA) model uses the generated “topic-document” matrix θ as a text-representation feature to train a classifier and has achieved good results. However, some text information is lost when only the “topic-document” matrix θ is used as the text feature. In addition, the number of Gibbs sampling iterations in the traditional LDA model must be set in advance, which affects the algorithm’s speed. In this paper, the traditional LDA model is improved in two phases. In the first phase, a method to determine the convergence of the parameter search process is proposed, and an adaptive iteration scheme is built on it. In the second phase, a new text representation (Cnew), obtained by multiplying the “topic-document” matrix θ by the “word-topic” matrix φ, is provided. In the evaluation, the proposed method is tested on a news corpus in the field of metallurgy and on the THU Chinese News (THUCNews) corpus provided by the Natural Language Processing Laboratory of Tsinghua University. The results show that the proposed method improves classification accuracy and reduces the number of Gibbs sampling iterations compared with the traditional LDA.
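The core of the second phase, forming the new representation Cnew from the two LDA matrices, can be sketched as below. This is a minimal illustration, not the paper's implementation: the dimensions and the random Dirichlet draws standing in for a trained model's θ and φ are assumptions for demonstration. Multiplying the document-topic distribution θ (D×K) by the topic-word distribution φ (K×V) yields a D×V matrix in which each row is a document's induced distribution over the vocabulary.

```python
import numpy as np

# Hypothetical sizes: D documents, K topics, V vocabulary words.
D, K, V = 4, 3, 6
rng = np.random.default_rng(0)

# Stand-ins for a trained LDA model's parameters:
# theta: "topic-document" matrix, each row a document's topic distribution.
theta = rng.dirichlet(np.ones(K), size=D)   # shape (D, K)
# phi: "word-topic" matrix, each row a topic's word distribution.
phi = rng.dirichlet(np.ones(V), size=K)     # shape (K, V)

# New text representation: Cnew = theta @ phi, shape (D, V).
# Each row mixes the topics' word distributions by the document's
# topic weights, so it remains a valid probability distribution.
C_new = theta @ phi

print(C_new.shape)                           # (4, 6)
print(np.allclose(C_new.sum(axis=1), 1.0))   # True: rows sum to 1
```

Because each row of Cnew is still a distribution over words, it can be fed directly to a classifier in place of (or alongside) the K-dimensional θ rows, carrying word-level information that θ alone discards.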