A Method of Extracting Related Words Using Standardized Mutual Information

Sugimachi, Tomohiko; Ishino, Akira; Takeda, Masayuki; Matsuo, Fumihiro

doi:10.1007/978-3-540-39644-4_49

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2843)

Included in the following conference series:

International Conference on Discovery Science

Abstract

Techniques for automatically extracting related words are of great importance in many applications, such as query expansion and automatic thesaurus construction. In this paper, we propose a method of extracting related words based on statistical information about the co-occurrences of words in huge corpora. Mutual information is one such statistical measure and has been used mainly in natural language processing. A drawback, however, is that mutual information depends mainly on the frequencies of words. To overcome this difficulty, we propose a new measure, a normalized deviation of mutual information. We also reveal a correspondence between word ambiguity and related words using word relation graphs constructed with this measure.
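The abstract does not give the formula for the proposed measure, so the following is only a minimal sketch of the general idea, not the authors' definition. It computes pointwise mutual information (in the sense of Church and Hanks) from sentence-level co-occurrence counts, then standardizes each score as a z-score against other pairs whose words fall in the same frequency bands. The function names, the sentence-level co-occurrence window, and the log2 frequency bucketing are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def pmi_scores(sentences):
    """Pointwise mutual information for word pairs that share a sentence.

    `sentences` is a list of token lists. Co-occurrence is counted once
    per sentence (an assumption; other window sizes are possible).
    """
    n = len(sentences)
    word_df = Counter()   # number of sentences containing each word
    pair_df = Counter()   # number of sentences containing each pair
    for sent in sentences:
        words = sorted(set(sent))
        word_df.update(words)
        pair_df.update(combinations(words, 2))
    scores = {
        (x, y): math.log2(c * n / (word_df[x] * word_df[y]))
        for (x, y), c in pair_df.items()
    }
    return scores, word_df


def standardize_by_frequency(scores, word_df):
    """Z-score each PMI value within its frequency band.

    Hypothetical standardization: pairs are grouped by the integer log2
    of each word's frequency, and each score is expressed as a deviation
    from the group mean in units of the group standard deviation.
    """
    def band(pair):
        x, y = pair
        return tuple(sorted((int(math.log2(word_df[x])),
                             int(math.log2(word_df[y])))))

    groups = {}
    for pair, s in scores.items():
        groups.setdefault(band(pair), []).append(s)

    standardized = {}
    for pair, s in scores.items():
        vals = groups[band(pair)]
        sd = pstdev(vals)
        # A band with no spread carries no deviation information.
        standardized[pair] = (s - mean(vals)) / sd if sd > 0 else 0.0
    return standardized


if __name__ == "__main__":
    toy = [["bank", "money"], ["bank", "money"], ["bank", "river"], ["tea", "cup"]]
    raw, df = pmi_scores(toy)
    for pair, z in standardize_by_frequency(raw, df).items():
        print(pair, round(raw[pair], 3), round(z, 3))
```

Grouping by frequency band is one simple way to remove the frequency dependence that the abstract identifies as the drawback of raw mutual information; the paper's actual normalization may differ.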

Author information

Authors and Affiliations

Department of Informatics, Kyushu University, Hakozaki 6-10-1, Higashi-ku, Fukuoka, 812-8581, JAPAN
Tomohiko Sugimachi, Akira Ishino, Masayuki Takeda & Fumihiro Matsuo

Editor information

Editors and Affiliations

FG Knowledge Engineering, FB Informatik, Technical University Darmstadt, Hochschulstr. 10, 64289, Darmstadt
Gunter Grieser
Meme Media Laboratory, Hokkaido University, N13 W8, 0608628, Sapporo, Japan
Yuzuru Tanaka
Graduate School of Informatics, Kyoto University, Yoshida Honmachi, Sakyo-ku, 606-850, Kyoto, Japan
Akihiro Yamamoto

About this paper

Cite this paper

Sugimachi, T., Ishino, A., Takeda, M., Matsuo, F. (2003). A Method of Extracting Related Words Using Standardized Mutual Information. In: Grieser, G., Tanaka, Y., Yamamoto, A. (eds) Discovery Science. DS 2003. Lecture Notes in Computer Science, vol 2843. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39644-4_49

DOI: https://doi.org/10.1007/978-3-540-39644-4_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20293-6
Online ISBN: 978-3-540-39644-4
eBook Packages: Springer Book Archive
