DOI: 10.3115/1119250.1119263
Article

News-oriented automatic Chinese keyword indexing

Published: 11 July 2003

Abstract

In the information era, keywords are highly useful for information retrieval, text clustering, and related tasks. News is a domain that consistently attracts a great deal of attention. However, most news articles are published without keywords, and indexing them manually is costly. Targeting the characteristics of news articles and the resources available, this paper introduces a simple, score-based procedure for keyword indexing. During indexing, we apply relatively mature linguistic techniques and tools to filter out meaningless candidate items. Furthermore, the hierarchical relations among content words are exploited so that keywords are not restricted to terms that appear in the text. These methods substantially improved our system. Finally, experimental results are presented and analyzed, showing that the quality of the extracted keywords is satisfactory.
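The abstract names three ingredients (scoring candidates, filtering meaningless items with linguistic tools, and using word hierarchies to produce keywords absent from the text) but does not give the actual formulas. The following is only a minimal sketch of how such a pipeline might fit together, assuming the text has already been segmented and POS-tagged into (word, tag) pairs. The scoring rule (body frequency plus a title bonus), the tag set `CONTENT_TAGS`, the stop list `STOPWORDS`, the `HIERARCHY` table, and the "two children share a parent" rule are all hypothetical choices for illustration, not the authors' method.

```python
from collections import Counter

# HYPOTHETICAL content-word POS tags (noun, person, place, organization, verb).
CONTENT_TAGS = {"n", "nr", "ns", "nt", "v"}
# HYPOTHETICAL news-specific stop list ("reporter", "report", "state").
STOPWORDS = {"记者", "报道", "表示"}
# HYPOTHETICAL hierarchy: content word -> broader term ("football"/"basketball" -> "sports").
HIERARCHY = {"足球": "体育", "篮球": "体育"}


def index_keywords(title, body, top_k=5, title_weight=3.0, min_len=2):
    """Return up to top_k scored keywords plus any inferred broader terms.

    title, body: lists of (word, pos_tag) pairs from a segmenter/tagger.
    """

    def is_candidate(word, tag):
        # Filtering step: drop short items, stop words, and non-content tags.
        return len(word) >= min_len and word not in STOPWORDS and tag in CONTENT_TAGS

    scores = Counter()
    for word, tag in body:
        if is_candidate(word, tag):
            scores[word] += 1.0  # each body occurrence adds 1
    for word, tag in title:
        if is_candidate(word, tag):
            scores[word] += title_weight  # title occurrences weigh more

    # Rank by descending score; ties are broken by the word itself for determinism.
    ranked = [w for w, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]
    keywords = ranked[:top_k]

    # Hierarchy step: if at least two keywords share a broader term,
    # add that term even though it may never occur in the text.
    parents = Counter(HIERARCHY[w] for w in keywords if w in HIERARCHY)
    for parent, count in parents.items():
        if count >= 2 and parent not in keywords:
            keywords.append(parent)
    return keywords


if __name__ == "__main__":
    title = [("足球", "n"), ("比赛", "v")]
    body = [("足球", "n"), ("篮球", "n"), ("篮球", "n"),
            ("记者", "n"), ("的", "u"), ("比赛", "v")]
    print(index_keywords(title, body))
```

In this toy input, "记者" is removed by the stop list and "的" by the length and tag filters, while "体育" is returned as a keyword although it never appears in the article, which is the kind of behavior the abstract describes for hierarchy-based indexing.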


Published In

SIGHAN '03: Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
July 2003
193 pages

Publisher

Association for Computational Linguistics

United States


