DOI: 10.1145/3393527.3393540
research-article

Multi-Domain Global Correlation Degree Branching Entropy Method for Microblog Text Word Segmentation

Published: 26 October 2020

Abstract

Word segmentation is a fundamental task in natural language processing, and improving its accuracy remains a key problem. With the popularity of microblogs, accurate word segmentation of microblog text has become a research hot spot. However, microblog texts often contain information from multiple related domains, and words that are ambiguous across domains reduce segmentation accuracy. Building on word vector models and branching entropy, this paper proposes a multi-domain global correlation degree branching entropy method for microblog text word segmentation. The method is applied to microblog text on the topic of house prices in Beijing. Its precision, recall, and F-measure are compared with those of the branching entropy model proposed by Zhang et al. [6], and the experimental results show that the proposed method outperforms it.
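
For readers unfamiliar with branching entropy, the following is a minimal sketch of the generic idea only: the entropy of the characters that follow a given character n-gram tends to be high at word boundaries and low inside words. It is not the multi-domain global correlation degree method proposed in the paper, whose details are not given in this abstract. The function name `right_branching_entropy`, the `max_len` parameter, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def right_branching_entropy(corpus, max_len=2):
    """Map each character n-gram (length 1..max_len) to the entropy, in bits,
    of the distribution of characters that follow it in the corpus."""
    # followers[prefix] counts how often each next character follows prefix.
    followers = defaultdict(Counter)
    for sentence in corpus:
        for n in range(1, max_len + 1):
            for i in range(len(sentence) - n):
                prefix = sentence[i:i + n]
                followers[prefix][sentence[i + n]] += 1

    entropy = {}
    for prefix, counts in followers.items():
        total = sum(counts.values())
        entropy[prefix] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


# Toy corpus (an assumption for illustration, not data from the paper).
h = right_branching_entropy(["abab", "abac"])
# "b" is always followed by "a", so its entropy is zero (likely word-internal);
# "ba" is followed by "b" or "c" equally often, so its entropy is one bit.
```

A segmenter built on this idea would typically place a boundary where the entropy rises or exceeds a threshold; how the paper combines this with word vectors and domain correlation is not specified here.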

References

[1]
Wei Yang and Longshu Li. 2014. Research on Film box office forecasting Model based on Weibo data. Electronic World, 21(Nov, 2014), 13--16.
[2]
Wenqing Zhao, Xiaoke Hou and Haihong Sha. 2014. Application of semantic rules to sentiment analysis of microblog hot topics. CAAI Transactions on Intelligent Systems, 9(2014), 121--125.
[3]
Dongxia Zhang. 2013. Analysis and Discovery of Network Hotspot Based On Microblog for College Students. Southeast Communication, 6, (2013), 87--89.
[4]
Nianwen Xue. 2003. Chinese Word Segmentation as Character Tagging. Computational Linguistics and Chinese Language Processing, 8, (23, 2003), 29--48.
[5]
Sujian Li, Qun Liu and Zhiyong Zhang. 2002. Method of Maximum Entropy Model for Language Processing. Computer Science, 7, (29, 2002), 108--110.
[6]
Libang Zhang, Yi Yuan and Jinfeng Yang. 2014. An Unsupervised Approach to Word Segmentation in Chinese EMRs. Intelligent Computer and Applications, 2, (Jan, 2014), 68--71.
[7]
Fuchun Peng, Fangfang Feng, and Andrew McCallum. 2004. Chinese Segmentation and New Word Detection Using Conditional Random Fields. In Proceedings of International Conference on Computational Linguistics, 2004, 562--568.
[8]
Tao Tang and Qiaoli Zhou. 2011. Term extraction based on the combination of Statistics and rules. Journal of Shenyang Aerospace University, 5, (28, 2011), 71--74.
[9]
Yang Zheng and Jianwen Mo. 2012. Chinese word Segmentation method based on Professional term extraction. Popular Science & Technology, 4, (14, 2012), 20--23.
[10]
Keda He, Zhengtao Zhu and Yu Cheng. 2016. Research on Text Categorization Based on Improved TF-IDF Algorithm. Journal of Guangdong University of Technology, 5, (33, 2016), 49--53.


      Published In

      ACM TURC '20: Proceedings of the ACM Turing Celebration Conference - China
      May 2020
      220 pages
      ISBN:9781450375344
      DOI:10.1145/3393527
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      In-Cooperation

      • Baidu Research

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Improved branching entropy model
      2. Multi-domain global correlation degree
      3. Multi-domain word segmentation

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ACM TURC'20
