This volume contains the papers accepted for presentation at the ACL2002/SIGLEX workshop on Unsupervised Lexical Acquisition, held on Friday, July 12th, 2002, during the 40th Annual Meeting of the Association for Computational Linguistics in Philadelphia, Pennsylvania.
Lexical resources form a cornerstone of all natural language understanding systems. It has long been recognized, however, that creating lexicons manually is a time-consuming, laborious, and expensive undertaking. Moreover, such lexicons can never be complete, given the ever-changing content of the lexicon, especially across different domains. Finally, development of broad-coverage NL systems in many languages may be hindered by the lack of broad-coverage machine-readable lexical resources in those languages.
We proposed this workshop as an opportunity for surveying the state of the art in the field, and for further stimulating discussion on the use of unsupervised, or minimally supervised, methods in the acquisition of lexical information. The papers in this volume attest to the broad appeal of these methods, as well as to the variety of lexical tasks addressed, such as acquiring semantic, syntactic, and collocational information; learning translation lexicons; processing out-of-vocabulary words; and thesaurus extraction. With the availability of online corpora, and the increased sophistication of machine-learning methods, self-augmenting lexicons for NL systems and other applications cannot be far off!
Identification of probable real words: an entropy-based approach
This paper proposes a method for identifying probable real words among out-of-vocabulary (OOV) words in text. The identification of real words is done based on entropy of probability of character trigrams as well as the morphological rules of English. ...
Learning a translation lexicon from monolingual corpora
This paper presents work on the task of constructing a word-level translation lexicon purely from unrelated monolingual corpora. We combine various clues such as cognates, similar context, preservation of word similarity, and word frequency. ...
Boosting automatic lexical acquisition with morphological information
In this paper we investigate the impact of morphological features on the task of automatically extending a dictionary. We approach the problem as a pattern classification task and compare the performance of several models in classifying nouns that are ...
Building a hyponymy lexicon with hierarchical structure
Many lexical semantic relations, such as the hyponymy relation, can be extracted from text as they occur in detectable syntactic constructions. This paper shows how a hypernym-hyponym based lexicon for Swedish can be created directly from a newspaper ...
Using co-composition for acquiring syntactic and semantic subcategorisation
Natural language parsing requires extensive lexicons containing subcategorisation information for specific sublanguages. This paper describes an unsupervised method for acquiring both syntactic and semantic subcategorisation restrictions from corpora. ...
Learning argument/adjunct distinction for Basque
This paper presents experiments performed on lexical knowledge acquisition in the form of verbal argumental information. The system obtains the data from raw corpora after the application of a partial parser and statistical filters. We used two ...
Semantically motivated subcategorization acquisition
Automatic acquisition of subcategorization lexicons from textual corpora has become increasingly popular. Although this work has met with some success, resulting lexicons indicate a need for greater accuracy. One significant source of error lies in the ...
Improvements in automatic thesaurus extraction
The use of semantic resources is common in modern NLP systems, but methods to extract lexical semantics have only recently begun to perform well enough for practical use. We evaluate existing and new similarity metrics for thesaurus extraction, and ...
Acquiring collocations for lexical choice between near-synonyms
We extend a lexical knowledge-base of near-synonym differences with knowledge about their collocational behaviour. This type of knowledge is useful in the process of lexical choice between near-synonyms. We acquire collocations for the near-synonyms of ...