
A corpus-based bootstrapping algorithm for semi-automated semantic lexicon construction

Published online by Cambridge University Press: 01 June 1999

ELLEN RILOFF
Affiliation:
Department of Computer Science, University of Utah, Salt Lake City, UT 84112, USA; e-mail: riloff@cs.utah.edu
JESSICA SHEPHERD
Affiliation:
Department of Computer Science, University of Utah, Salt Lake City, UT 84112, USA; e-mail: riloff@cs.utah.edu

Abstract

Many applications need a lexicon that represents semantic information, but acquiring lexical information is time-consuming. We present a corpus-based bootstrapping algorithm that helps users create domain-specific semantic lexicons quickly. Our algorithm uses a representative text corpus for the domain and a small set of ‘seed words’ that belong to a semantic class of interest. The algorithm hypothesizes new words that are also likely to belong to the semantic class because they occur in the same contexts as the seed words. The best hypotheses are dynamically added to the seed word list, and the process iterates in a bootstrapping fashion. When the bootstrapping process halts, a ranked list of hypothesized category words is presented to a user for review. We used this algorithm to generate a semantic lexicon for eleven semantic classes associated with the MUC-4 terrorism domain.
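The bootstrapping loop described in the abstract can be illustrated with a small Python sketch. This is a hypothetical, simplified illustration, not the authors' method: the abstract does not define "context" or how hypotheses are scored. The sketch assumes that a word's context is the `window` tokens on either side of it, and that a candidate's score is the fraction of its occurrences that appear next to a current category word. The function name `bootstrap`, its parameters, and the toy corpus are all invented for illustration.

```python
from collections import Counter


def bootstrap(sentences, seeds, iterations=3, per_iter=1, window=1):
    """Hypothetical sketch of seed-word bootstrapping.

    Assumptions (not taken from the paper): context is a fixed token
    window, and score = occurrences near a category word / total
    occurrences. Returns the hypothesized words in the order they were
    added, which serves as the ranked list shown to a user.
    """
    category = set(seeds)
    hypotheses = []
    for _ in range(iterations):
        in_context = Counter()  # occurrences next to a current category word
        total = Counter()       # all occurrences of each word
        for tokens in sentences:
            for i, word in enumerate(tokens):
                total[word] += 1
                neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                if any(n in category for n in neighbours):
                    in_context[word] += 1
        # Score only words that are not yet in the category.
        scores = {w: in_context[w] / total[w] for w in in_context if w not in category}
        # Highest score first; ties broken by context count, then alphabetically.
        best = sorted(scores, key=lambda w: (-scores[w], -in_context[w], w))[:per_iter]
        if not best:
            break  # no new hypotheses, so bootstrapping halts early
        category.update(best)
        hypotheses.extend(best)
    return hypotheses


# Toy, invented corpus: each sentence is a list of tokens.
TOY_CORPUS = [
    ["bomb", "dynamite"],
    ["dynamite", "rifle"],
    ["rifle", "grenade"],
    ["mayor", "priest"],
]

if __name__ == "__main__":
    print(bootstrap(TOY_CORPUS, ["bomb"]))
```

Starting from the single seed "bomb", each iteration adds the best-scoring neighbour. The weapon words are therefore pulled in one by one, while "mayor" and "priest", which never share a context with the category, are never hypothesized.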

Type
Research Article
Copyright
1999 Cambridge University Press


Footnotes

This research is supported in part by the National Science Foundation under grants IRI-9509820 and IRI-9704240.