research-article

Joint inference for end-to-end coreference resolution for clinical notes

Authors:

Prateek Jindal,

Carl A. Gunter

BCB '14: Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics

Pages 192-201

https://doi.org/10.1145/2649387.2649437

Published: 20 September 2014

Abstract

Recent US government initiatives have led to wide adoption of Electronic Health Records (EHRs), and more and more health care institutions are storing patients' data in electronic format. These EHRs contain valuable information that can be used in important applications such as Clinical Decision Support (CDS), which makes Information Extraction (IE) from EHRs a promising research area. This paper presents a robust method for end-to-end coreference resolution for clinical narratives. For our experiments, we used the datasets provided by the i2b2/VA team as part of the i2b2/VA 2011 shared task on coreference resolution. One part of these data was annotated according to the ODIE guidelines and another part according to the i2b2 guidelines. We designed a global inference strategy for end-to-end coreference resolution that jointly determines the mention types and the coreference relations between mentions. This technique avoids the error propagation that is common in pipeline systems. For pronominal resolution, we developed separate strategies for resolving different pronouns. We report the best results to date on both the ODIE and the i2b2 data, in both settings: (1) when gold mentions are given and (2) end-to-end coreference resolution. Because the ODIE and i2b2 data are annotated quite differently, achieving the best results on both demonstrates the robustness of our algorithm.
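The abstract contrasts joint inference over mention types and coreference links with pipeline systems, where an early mistake cannot be undone. The toy below is a minimal sketch of that contrast, not the authors' system. The type labels (loosely modeled on i2b2 concept types), the scores, the "coreferent mentions must share a type" constraint, and every function name are hypothetical. Brute-force enumeration stands in for whatever solver a real system would use; it is exponential in the number of mentions and only suits a few mentions.

```python
from itertools import combinations, product

# Hypothetical label set; every type_scores dict must contain all of these keys.
TYPES = ("problem", "test", "treatment")


def partitions(n):
    """Yield every clustering of n mentions exactly once, as restricted-growth cluster ids."""
    def rec(prefix, max_id):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for c in range(max_id + 2):  # join an existing cluster or open a new one
            yield from rec(prefix + [c], max(max_id, c))
    yield from rec([], -1)


def consistent(types, clusters):
    """Hypothetical constraint: mentions in the same cluster must share a type."""
    return all(types[i] == types[j]
               for i, j in combinations(range(len(types)), 2)
               if clusters[i] == clusters[j])


def score(types, clusters, type_scores, link_scores):
    """Sum of per-mention type scores plus link scores of all coreferent pairs."""
    s = sum(type_scores[i][t] for i, t in enumerate(types))
    for i, j in combinations(range(len(types)), 2):
        if clusters[i] == clusters[j]:
            s += link_scores.get((i, j), 0.0)
    return s


def best_clustering(types, type_scores, link_scores):
    """Highest-scoring clustering for a fixed type assignment (all-singletons is always valid)."""
    candidates = [c for c in partitions(len(types)) if consistent(types, c)]
    return max(candidates, key=lambda c: score(types, c, type_scores, link_scores))


def pipeline_inference(type_scores, link_scores):
    """Stage 1 commits to the argmax type per mention; stage 2 clusters under that decision."""
    types = tuple(max(scores, key=scores.get) for scores in type_scores)
    return types, best_clustering(types, type_scores, link_scores)


def joint_inference(type_scores, link_scores):
    """Search types and clusters together, so link evidence can revise a type decision."""
    best = None
    for types in product(TYPES, repeat=len(type_scores)):
        clusters = best_clustering(types, type_scores, link_scores)
        s = score(types, clusters, type_scores, link_scores)
        if best is None or s > best[0]:
            best = (s, types, clusters)
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical scores: mention 0 is clearly a problem; mention 1 leans slightly
    # toward "test", but a strong link score suggests it corefers with mention 0.
    type_scores = [
        {"problem": 0.9, "test": 0.1, "treatment": 0.0},
        {"problem": 0.45, "test": 0.55, "treatment": 0.0},
    ]
    link_scores = {(0, 1): 0.6}
    print("pipeline:", pipeline_inference(type_scores, link_scores))
    print("joint:   ", joint_inference(type_scores, link_scores))
```

With these made-up scores, the pipeline fixes mention 1 as a "test", so the type constraint forbids the link and the two mentions end up in separate clusters. The joint search instead finds that relabeling mention 1 as a "problem" and linking it gains more than it loses, which is the kind of error propagation the abstract says joint inference avoids.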


Published In


BCB '14: Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics

September 2014

851 pages

ISBN:9781450328944

DOI:10.1145/2649387

General Chairs: Pierre Baldi (University of California, Irvine), Wei Wang (University of California, Los Angeles)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGBio: ACM Special Interest Group on Bioinformatics

Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

U.S. Department of Health and Human Services

Conference

BCB '14: ACM-BCB '14, September 20-23, 2014, Newport Beach, California

Sponsor: SIGBio

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%


Article Metrics

Total Citations: 0
Total Downloads: 87
Downloads (last 12 months): 1
Downloads (last 6 weeks): 0

Reflects downloads up to 30 Aug 2024
