Learning information extraction rules for semi-structured and free text
S Soderland - Machine Learning, 1999 - Springer
Abstract
A wealth of on-line text information can be made available to automatic processing by information extraction (IE) systems. Each IE application needs a separate set of rules tuned to the domain and writing style. WHISK helps to overcome this knowledge-engineering bottleneck by learning text extraction rules automatically.
WHISK is designed to handle text styles ranging from highly structured to free text, including text that is neither rigidly formatted nor composed of grammatical sentences. Such semi-structured text has largely been beyond the scope of previous systems. When used in conjunction with a syntactic analyzer and semantic tagging, WHISK can also handle extraction from free text such as news stories.
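To show what an extraction rule for semi-structured text can look like, here is a minimal Python sketch. It applies a hand-written rule that is loosely modeled on WHISK's regular-expression-style notation. In that notation, `*` skips text, parentheses mark the fields to extract, quoted tokens are literals, and bare names stand for semantic classes. The translation to Python `re`, the helper names `compile_rule` and `apply_rule`, the `SEMANTIC_CLASSES` table, and the sample ad are all hypothetical illustrations. None of them is Soderland's implementation, and the sketch only *applies* a rule; it does not learn one.

```python
import re

# Hypothetical semantic classes. A real system would get these from a
# tokenizer or domain-specific tagger; here they are plain regexes.
SEMANTIC_CLASSES = {
    "Digit": r"\d",
    "Number": r"\d+(?:,\d{3})*",
}


def compile_rule(pattern: str) -> re.Pattern:
    """Translate a whitespace-separated, WHISK-style pattern into a regex.

    '*' skips as little text as possible, '(' and ')' delimit a field to
    extract, 'quoted' tokens are case-insensitive literals (no spaces
    inside), and bare names are looked up in SEMANTIC_CLASSES.
    """
    parts = []
    after_term = False  # True when the previous element matched a term
    for tok in pattern.split():
        if tok == "*":
            parts.append(r".*?")  # lazy skip to the next term
            after_term = False
        elif tok == ")":
            parts.append(")")
        else:
            if after_term:
                parts.append(r"\s*")  # adjacent terms may be separated by spaces
                after_term = False
            if tok == "(":
                parts.append("(")
                continue
            if len(tok) >= 2 and tok[0] == tok[-1] == "'":
                # Literal term; letter lookarounds stop 'br' matching inside 'brick'.
                parts.append(r"(?<![a-z])" + re.escape(tok[1:-1]) + r"(?![a-z])")
            else:
                parts.append(SEMANTIC_CLASSES[tok])
            after_term = True
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def apply_rule(pattern: str, slots: tuple[str, ...], text: str) -> list[dict[str, str]]:
    """Apply the rule repeatedly, returning one filled template per match."""
    regex = compile_rule(pattern)
    return [dict(zip(slots, m.groups())) for m in regex.finditer(text)]


if __name__ == "__main__":
    # Invented, telegraphic rental ad: neither formatted fields nor full sentences.
    ad = "Fremont - 1 br apt, w/d, pkg incl $675. 3 BR hse, grt view $995."
    rule = "* ( Digit ) 'BR' * '$' ( Number )"
    for row in apply_rule(rule, ("Bedrooms", "Price"), ad):
        print(row)
```

On the invented ad, the rule yields two filled templates: `{'Bedrooms': '1', 'Price': '675'}` and `{'Bedrooms': '3', 'Price': '995'}`. The lazy `.*?` reflects the idea that `*` skips only as far as the nearest occurrence of the next term. This is what lets one pattern fire repeatedly on text that has no fixed layout.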