Graph-Based Semi-supervised Clustering for Semantic Classification of Unknown Words

Fukumoto, Fumiyo; Suzuki, Yoshimi

doi:10.1007/978-3-642-37186-8_16

Part of the book series: Communications in Computer and Information Science (CCIS, volume 348)

Included in the following conference series:

International Joint Conference on Knowledge Discovery, Knowledge Engineering, and Knowledge Management

Abstract

This paper presents a method for classifying unknown verbs, including polysemous ones, into Levin-style semantic classes. We propose a semi-supervised clustering method built on a graph-based unsupervised clustering technique. The underlying algorithm detects the spin configuration that minimizes the energy of a spin glass. Comparing global and local minima of the energy function, called the Hamiltonian, allows the detection of nodes that belong to more than one cluster. We extended the algorithm to use a small amount of labeled data to aid unsupervised learning, and applied it to cluster verbs, including polysemous verbs. The distributional similarity between verbs, which is used to calculate the Hamiltonian, is computed from probability distributions over verb frames. In an experiment on 110 polysemous test verbs with 10% labeled data, the method achieved an F-score of 0.577.
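
The idea of minimizing a Potts-style Hamiltonian over verb similarities, with a few labeled verbs held fixed, can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the similarity measure (one minus the Jensen-Shannon divergence), the null model (the mean edge weight), the greedy single-spin update (in place of simulated annealing), the clamping of labeled nodes, and the toy frame distributions. The sketch also does not cover the detection of overlapping clusters by comparing local and global minima.

```python
import math
from itertools import combinations


def js_similarity(p, q):
    """Similarity of two frame distributions: 1 - Jensen-Shannon divergence (base 2)."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 1.0 - 0.5 * (kl(p, m) + kl(q, m))


def hamiltonian(weights, spins, gamma=1.0):
    """H = -sum_{i<j} (w_ij - gamma * mean_w) * delta(s_i, s_j).

    The mean edge weight is an assumed, simplified null model.
    """
    pairs = list(combinations(range(len(spins)), 2))
    mean_w = sum(weights[i][j] for i, j in pairs) / len(pairs)
    return -sum(weights[i][j] - gamma * mean_w
                for i, j in pairs if spins[i] == spins[j])


def greedy_minimize(weights, spins, labeled, n_states, gamma=1.0):
    """Lower the energy by single-spin updates; labeled nodes stay clamped."""
    spins = list(spins)
    improved = True
    while improved:
        improved = False
        for i in range(len(spins)):
            if i in labeled:
                continue  # labeled verbs keep their given class
            current = hamiltonian(weights, spins, gamma)
            for s in range(n_states):
                trial = spins[:i] + [s] + spins[i + 1:]
                energy = hamiltonian(weights, trial, gamma)
                if energy < current - 1e-12:
                    spins, current, improved = trial, energy, True
    return spins


# Hypothetical toy data: four verbs, three frame types each.
frames = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.3, 0.6],
]
n = len(frames)
weights = [[js_similarity(frames[i], frames[j]) for j in range(n)] for i in range(n)]

labeled = {0: 0, 2: 1}      # verb 0 is known to be in class 0, verb 2 in class 1
initial = [0, 0, 1, 0]      # unlabeled verbs start in class 0
final = greedy_minimize(weights, initial, labeled, n_states=2)
print(final, hamiltonian(weights, final))
```

In this toy setting the two unlabeled verbs end up in the class of the labeled verb whose frame distribution they most resemble, because grouping similar verbs lowers the energy while grouping dissimilar ones raises it. A greedy update can stop in a local minimum; an annealing schedule would be the more usual choice for larger graphs.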

Author information

Authors and Affiliations

Interdisciplinary Graduate School of Medicine and Engineering, University of Yamanashi, Japan
Fumiyo Fukumoto & Yoshimi Suzuki

Editor information

Editors and Affiliations

IST - Technical University of Lisbon, Av.Rovisco Pais, 1, 1049-001, Lisbon, Portugal
Ana Fred
Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands
Jan L. G. Dietz
Informatics Research Centre, Henley Business School, University of Reading, RG6 6UD, Reading, UK
Kecheng Liu
INSTICC and IPS, Estefanilha, Setúbal, Portugal
Joaquim Filipe

About this paper

Cite this paper

Fukumoto, F., Suzuki, Y. (2013). Graph-Based Semi-supervised Clustering for Semantic Classification of Unknown Words. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2011. Communications in Computer and Information Science, vol 348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37186-8_16

DOI: https://doi.org/10.1007/978-3-642-37186-8_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37185-1
Online ISBN: 978-3-642-37186-8
eBook Packages: Computer Science, Computer Science (R0)
