Semantic Based Text Similarity Computation

Liu, Yaqi; Li, Zhijiang

doi:10.1007/978-981-10-3530-2_43


Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 417)

Included in the following conference series:

China Academic Conference on Printing & Packaging and Media Technology


Abstract

Text similarity algorithms are widely used in many fields, such as copy detection, text classification, machine translation, intelligent question answering, and natural language processing. The vector space model, a commonly used approach, does not adequately account for semantic features, so the accuracy of its semantic similarity results can be improved. This paper proposes a text similarity computation method that combines HowNet with the vector space model. Similarity is computed at two levels. At the word level, word similarity based on HowNet prevents the loss of semantic information. At the text level, text similarity computed with the vector space model preserves the integrity of the information expressed in the texts. The paper designs a news text classification experiment based on the KNN algorithm, with data drawn from part of the Chinese news in the Sogou corpus. Experimental results show that the proposed method is more accurate than the traditional vector space model algorithm.
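The two-level idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give their formulas, so the sketch uses a soft cosine measure as one plausible way to feed word-level similarity into a vector space comparison. The names `TOY_WORD_SIM`, `word_sim`, `vsm_cosine`, and `soft_cosine` are hypothetical, and the toy lookup table merely stands in for a HowNet-based word similarity.

```python
import math
from collections import Counter

# Assumption: a tiny hand-made table standing in for HowNet word similarity.
# A real system would compute these scores from HowNet sememe descriptions.
TOY_WORD_SIM = {
    frozenset(("car", "automobile")): 0.9,
}

def word_sim(a, b):
    """Word-level similarity in [0, 1]; identical words score 1.0."""
    if a == b:
        return 1.0
    return TOY_WORD_SIM.get(frozenset((a, b)), 0.0)

def vsm_cosine(doc_a, doc_b):
    """Plain vector space model: cosine of raw term-frequency vectors."""
    ta, tb = Counter(doc_a), Counter(doc_b)
    dot = sum(ta[w] * tb[w] for w in ta.keys() & tb.keys())
    norm_a = math.sqrt(sum(v * v for v in ta.values()))
    norm_b = math.sqrt(sum(v * v for v in tb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def soft_cosine(doc_a, doc_b, sim=word_sim):
    """Text-level similarity that credits semantically related word pairs."""
    ta, tb = Counter(doc_a), Counter(doc_b)

    def bilinear(x, y):
        # Sum term-frequency products over all word pairs, weighted by word similarity.
        return sum(x[u] * y[v] * sim(u, v) for u in x for v in y)

    denom = math.sqrt(bilinear(ta, ta) * bilinear(tb, tb))
    return bilinear(ta, tb) / denom if denom else 0.0

doc_a = ["car", "fast"]
doc_b = ["automobile", "fast"]
plain = vsm_cosine(doc_a, doc_b)
soft = soft_cosine(doc_a, doc_b)
```

For these two toy documents the plain cosine only sees the shared word "fast", while the soft variant also credits the related pair "car" and "automobile", so it scores the texts as more similar. A KNN classifier such as the one used in the paper's experiment could use either score as its similarity measure.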



Author information

Authors and Affiliations

School of Printing and Packaging, Wuhan University, Wuhan, China
Yaqi Liu & Zhijiang Li

Corresponding author

Correspondence to Zhijiang Li.

Editor information

Editors and Affiliations

China Academy of Printing Technology, Beijing, China
Pengfei Zhao
China Academy of Printing Technology, Beijing, China
Yun Ouyang
China Academy of Printing Technology, Beijing, China
Min Xu
China Academy of Printing Technology, Beijing, China
Li Yang
China Academy of Printing Technology, Beijing, China
Yujie Ouyang


Cite this paper

Liu, Y., Li, Z. (2017). Semantic Based Text Similarity Computation. In: Zhao, P., Ouyang, Y., Xu, M., Yang, L., Ouyang, Y. (eds) Advanced Graphic Communications and Media Technologies. PPMT 2016. Lecture Notes in Electrical Engineering, vol 417. Springer, Singapore. https://doi.org/10.1007/978-981-10-3530-2_43


DOI: https://doi.org/10.1007/978-981-10-3530-2_43
Published: 22 March 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3529-6
Online ISBN: 978-981-10-3530-2
eBook Packages: Engineering, Engineering (R0)
