DOI: 10.1145/2649387.2649437

Joint inference for end-to-end coreference resolution for clinical notes

Published: 20 September 2014

Abstract

Recent US government initiatives have led to wide adoption of Electronic Health Records (EHRs), and a growing number of health care institutions now store patients' data electronically. EHRs contain valuable information that can support important applications such as Clinical Decision Support (CDS), which makes Information Extraction (IE) from EHRs a promising research area. This paper presents a robust method for end-to-end coreference resolution in clinical narratives. Our experiments use the datasets provided by the i2b2/VA team for the i2b2/VA 2011 shared task on coreference resolution; one part of this data was annotated according to the ODIE guidelines and the other according to the i2b2 guidelines. We design a global inference strategy for end-to-end coreference resolution that jointly determines the mention types and the coreference relations between mentions, which avoids the error propagation common in pipeline systems. For pronominal resolution, we develop separate strategies for different pronouns. We report the best results to date on both the ODIE and the i2b2 data, in both settings: (1) when gold mentions are given and (2) for end-to-end coreference resolution. Because the ODIE and i2b2 data are annotated quite differently, achieving the best results on both demonstrates the robustness of our algorithm.
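
The idea of deciding mention types and coreference links jointly, instead of in a pipeline, can be illustrated with a toy sketch. This is not the paper's formulation: the abstract and author tags indicate integer programming, while the sketch below simply enumerates all assignments. The type labels, the scores, the function name `joint_inference`, and the single constraint (a link may only join two real mentions of the same type) are all assumptions made for illustration, and transitivity of links is not enforced.

```python
from itertools import product

# Hypothetical label set; "none" means the candidate span is not a mention.
TYPES = ("problem", "treatment", "none")


def joint_inference(type_scores, pair_scores):
    """Exhaustively maximize type scores plus link scores under one constraint.

    type_scores: list of dicts mapping each label in TYPES to a score.
    pair_scores: dict mapping (i, j) index pairs to a coreference link score.
    Returns (best_total, best_types, best_links).
    """
    n = len(type_scores)
    pairs = sorted(pair_scores)
    best = (float("-inf"), None, None)
    for types in product(TYPES, repeat=n):
        total = sum(type_scores[i][t] for i, t in enumerate(types))
        links = []
        for i, j in pairs:
            # Assumed constraint: only real mentions of the same type may corefer.
            compatible = types[i] == types[j] and types[i] != "none"
            # For fixed types, each link is independent, so keep it if it helps.
            if compatible and pair_scores[(i, j)] > 0:
                total += pair_scores[(i, j)]
                links.append((i, j))
        if total > best[0]:
            best = (total, types, links)
    return best


# Made-up scores for three candidate spans, e.g. "chest pain", "the pain", "aspirin".
type_scores = [
    {"problem": 2.0, "treatment": 0.25, "none": 0.0},
    {"problem": 0.5, "treatment": 0.75, "none": 0.25},
    {"problem": 0.25, "treatment": 1.5, "none": 0.0},
]
pair_scores = {(0, 1): 1.0, (1, 2): -0.5, (0, 2): -1.0}

score, types, links = joint_inference(type_scores, pair_scores)
print(score, types, links)
```

With these made-up scores, a pipeline that fixes types first would label the second span "treatment" (its highest individual score) and could then no longer link it to the first span. The joint search instead accepts a slightly lower type score for that span in exchange for the link, which is the kind of error propagation the abstract says joint inference avoids. Exhaustive enumeration is exponential in the number of candidates, so a real system would need a proper solver.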

Published In

BCB '14: Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
September 2014
851 pages
ISBN: 9781450328944
DOI: 10.1145/2649387
General Chairs: Pierre Baldi, Wei Wang
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. coreference resolution
  2. health informatics
  3. integer programming
  4. joint inference
  5. natural language processing

Qualifiers

  • Research-article

Conference

BCB '14: ACM-BCB '14
September 20-23, 2014
Newport Beach, California

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 88
  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 1

Reflects downloads up to 09 Nov 2024
