Lexical patterns, features and knowledge resources for coreference resolution in clinical notes

P Gooch, A Roudsari - Journal of biomedical informatics, 2012 - Elsevier
Generation of entity coreference chains provides a means to extract linked narrative events from clinical notes, but despite being a well-researched topic in natural language processing, general-purpose coreference tools perform poorly on clinical texts. This paper presents a knowledge-centric and pattern-based approach to resolving coreference across a wide variety of clinical records from two corpora (Ontology Development and Information Extraction (ODIE) and i2b2/VA), and describes a method for generating coreference chains using progressively pruned linked lists that reduces the search space and facilitates evaluation by a number of metrics. Independent evaluation results give an F-measure of 79.2% for the ODIE corpus and 87.5% for the i2b2/VA corpus. A baseline of blind coreference of mentions of the same class gives F-measures of 65.3% and 51.9%, respectively. For the ODIE corpus, recall is significantly improved over the baseline (p < 0.05), but overall there was no statistically significant improvement in F-measure (p > 0.05). For the i2b2/VA corpus, recall, precision, and F-measure are significantly improved over the baseline (p < 0.05). Overall, our approach offers performance at least as good as that of human annotators and greatly increased performance over general-purpose tools. The system uses a number of open-source components that are available to download.
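As a rough illustration of the baseline mentioned above, the following Python sketch links every mention of the same class in a document into a single chain, without looking at the mention text. This is one plausible reading of "blind coreference of mentions of the same class", not the authors' implementation; the function name `blind_same_class_chains`, the `(text, class)` input format, and the class labels are all assumptions made for this example.

```python
from collections import defaultdict


def blind_same_class_chains(mentions):
    """Group mentions into one chain per class (hypothetical baseline sketch).

    `mentions` is a list of (text, mention_class) pairs in document order.
    Classes with a single mention are dropped, since one mention alone
    does not form a coreference chain.
    """
    by_class = defaultdict(list)
    for text, mention_class in mentions:
        by_class[mention_class].append(text)
    return {cls: texts for cls, texts in by_class.items() if len(texts) > 1}


# Illustrative input; the mentions and class labels are invented for the example.
example_mentions = [
    ("the patient", "person"),
    ("chest pain", "problem"),
    ("he", "person"),
    ("aspirin", "treatment"),
    ("the pain", "problem"),
]
example_chains = blind_same_class_chains(example_mentions)
```

A baseline of this kind tends to over-link, because unrelated mentions of the same class end up in one chain, which is why a pattern-based and knowledge-based system is compared against it.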