Abstract
Most previous corpus-based approaches to resolving word-sense ambiguity have collected lexical information from the context of the word to be disambiguated. However, such approaches suffer from data sparseness. To address this problem, this paper proposes a disambiguation method using co-occurring concept codes (CCCs). Using concept-code features together with concept-code generalization effectively alleviates the data sparseness problem and also reduces the number of features to a practical size without any loss in system performance. We demonstrate the effectiveness of the CCC features and the concept-code generalization through experimental evaluations. The proposed disambiguation method is applied to a Korean-to-Japanese MT system and evaluated with various machine-learning techniques. In a lexical sample evaluation, our CCC-based method achieved a precision of 82.00%, an 11.83% improvement over the baseline. It also achieved a precision of 83.51% in an experiment on real text, which suggests that the proposed method is useful for practical MT systems.
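The abstract does not specify how CCC features are extracted or generalized, so the following is only a minimal sketch under stated assumptions: a hypothetical toy thesaurus (`TOY_THESAURUS`) in which every word maps to one or more 4-digit hierarchical concept codes, and generalization means truncating a code to a shorter prefix. The names `generalize`, `ccc_features`, and `feature_space` are illustrative, not the authors' implementation, and the sketch stops at feature extraction rather than modeling any particular learner. It shows how replacing context words with concept codes, then truncating those codes, shrinks the feature space and lets an unseen context word share a feature with the training data.

```python
from collections import Counter

# HYPOTHETICAL toy thesaurus: word -> list of 4-digit concept codes.
# Assumption: codes are hierarchical, so a prefix names a coarser class
# ("0531" lies under "053", which lies under "05").
TOY_THESAURUS = {
    "river": ["0531"],
    "stream": ["0532"],
    "shore": ["0538"],
    "creek": ["0535"],          # deliberately absent from the training contexts below
    "money": ["4710"],
    "loan": ["4725"],
    "bank": ["0533", "4712"],   # an ambiguous context word contributes every code it has
}


def generalize(code: str, level: int) -> str:
    """Generalize a hierarchical concept code by keeping its first `level` digits."""
    if level < 1:
        raise ValueError("level must be at least 1")
    return code[:level]


def ccc_features(context: list[str], thesaurus: dict[str, list[str]], level: int = 4) -> Counter:
    """Count the (generalized) concept codes co-occurring with the target word.

    Context words missing from the thesaurus contribute no features.
    """
    feats: Counter = Counter()
    for word in context:
        for code in thesaurus.get(word, []):
            feats[generalize(code, level)] += 1
    return feats


def feature_space(contexts: list[list[str]], thesaurus: dict[str, list[str]], level: int) -> set[str]:
    """Collect every distinct feature observed across a set of training contexts."""
    space: set[str] = set()
    for context in contexts:
        space.update(ccc_features(context, thesaurus, level))
    return space


if __name__ == "__main__":
    training = [["river", "stream"], ["shore", "river"], ["money", "loan"]]
    for level in (4, 3, 2):
        print(level, sorted(feature_space(training, TOY_THESAURUS, level)))
    # An unseen word can still match a training feature once generalized.
    print(ccc_features(["creek"], TOY_THESAURUS, level=3))
```

With this toy data, the training feature space shrinks from five codes at level 4 to three classes at level 3 and two at level 2, and the unseen word "creek" maps to the class "053", which already occurs in training. Choosing the truncation level trades specificity against sparseness; the level used here is arbitrary.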
Cite this article
Chung, Y., Lee, JH. Practical Word-Sense Disambiguation Using Co-occurring Concept Codes. Mach Translat 19, 59–82 (2005). https://doi.org/10.1007/s10590-005-2559-y