DOI: 10.3115/1218955.1218994

Relieving the data acquisition bottleneck in word sense disambiguation

Published: 21 July 2004

Abstract

Supervised learning methods for WSD yield better performance than unsupervised methods. Yet the availability of clean training data for the former is still a severe challenge. In this paper, we present an unsupervised bootstrapping approach for WSD which exploits huge amounts of automatically generated noisy data for training within a supervised learning framework. The method is evaluated using the 29 nouns in the English Lexical Sample task of SENSEVAL 2. Our algorithm does as well as supervised algorithms on 31% of this test set, which is an improvement of 11% (absolute) over state-of-the-art bootstrapping WSD algorithms. We identify seven different factors that impact the performance of our system.

Published In

ACL '04: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
July 2004
729 pages

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%
