DOI: 10.3115/1073012.1073058
Article
Free access

Evaluating smoothing algorithms against plausibility judgements

Published: 06 July 2001

Abstract

Previous research has shown that the plausibility of an adjective-noun combination is correlated with its corpus co-occurrence frequency. In this paper, we estimate the co-occurrence frequencies of adjective-noun pairs that fail to occur in a 100-million-word corpus using smoothing techniques and compare them to human plausibility ratings. Both class-based smoothing and distance-weighted averaging yield frequency estimates that are significant predictors of rated plausibility, which provides independent evidence for the validity of these smoothing techniques.
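The abstract names distance-weighted averaging as one way to estimate frequencies for unseen adjective-noun pairs. The following is a minimal sketch of the general idea, not the paper's exact method: the count of an unseen pair is recreated as a similarity-weighted average of the counts that the most similar adjectives have with the same noun. Cosine similarity, the neighbour-set size `k`, averaging raw counts rather than conditional probabilities, and the toy counts are all illustrative assumptions.

```python
import math

# Adjective -> {noun: co-occurrence count}. Hypothetical toy data standing in for corpus counts.
Counts = dict[str, dict[str, int]]

counts: Counts = {
    "hungry": {"dog": 4, "child": 6, "cat": 3},
    "starving": {"dog": 2, "child": 5, "prisoner": 4},
    "famished": {"child": 3, "traveller": 2},
    "red": {"car": 8, "apple": 5},
}


def cosine(u: dict[str, int], v: dict[str, int]) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(c * v.get(k, 0) for k, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def dwa_estimate(counts: Counts, adj: str, noun: str, k: int = 2) -> float:
    """Estimate f(adj, noun) as the similarity-weighted average of f(adj', noun)
    over the k adjectives adj' most similar to adj (assumed weighting scheme)."""
    target = counts[adj]
    # Rank every other adjective by its similarity to the target and keep the top k.
    neighbours = sorted(
        ((cosine(target, counts[other]), other) for other in counts if other != adj),
        reverse=True,
    )[:k]
    norm = sum(w for w, _ in neighbours)  # total weight of the neighbour set
    if norm == 0:
        return 0.0  # no similar adjectives: nothing to average over
    return sum(w * counts[other].get(noun, 0) for w, other in neighbours) / norm


# "hungry prisoner" never occurs in the toy counts, but "starving prisoner" does,
# so the estimate is non-zero; "red prisoner" has no similar neighbours and stays 0.
print(dwa_estimate(counts, "hungry", "prisoner"))
print(dwa_estimate(counts, "red", "prisoner"))
```

With real data, estimates like these (possibly log-transformed) would then be compared with the human plausibility ratings, for example with `statistics.correlation` in Python 3.10+.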


Cited By

  • (2004) Detection of incorrect case assignments in paraphrase generation. Proceedings of the First international joint conference on Natural Language Processing, pp. 555-565. DOI: 10.1007/978-3-540-30211-7_59. Online publication date: 22-Mar-2004.
  • (2003) Evaluating and combining approaches to selectional preference acquisition. Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics, Volume 1, pp. 27-34. DOI: 10.3115/1067807.1067813. Online publication date: 12-Apr-2003.
  • (2002) Using the web to overcome data sparseness. Proceedings of the ACL-02 conference on Empirical methods in natural language processing, Volume 10, pp. 230-237. DOI: 10.3115/1118693.1118723. Online publication date: 6-Jul-2002.


Information

Published In

ACL '01: Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
July 2001
562 pages

Publisher

Association for Computational Linguistics

United States


Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%


Article Metrics

  • Downloads (last 12 months): 23
  • Downloads (last 6 weeks): 4
Reflects downloads up to 19 Feb 2025.
