Hierarchical Annotation for Building A Suite of Clinical Natural Language Processing Tasks: Progress Note Understanding

Yanjun Gao; Dmitriy Dligach; Timothy Miller; Samuel Tesch; Ryan Laffin; Matthew M. Churpek; Majid Afshar

Abstract

Applying natural language processing methods to electronic health record (EHR) data has attracted growing interest. Existing corpora and annotations focus on modeling textual features and relation prediction. However, there is a paucity of annotated corpora built to model clinical diagnostic thinking, a process involving text understanding, domain knowledge abstraction, and reasoning. In this work, we introduce a hierarchical annotation schema with three stages to address clinical text understanding, clinical reasoning, and summarization. We create an annotated corpus based on a large collection of publicly available daily progress notes, a type of EHR documentation that is time-sensitive, problem-oriented, and well-documented in the Subjective, Objective, Assessment and Plan (SOAP) format. We also define a new suite of tasks, Progress Note Understanding, with three tasks utilizing the three annotation stages. This new suite aims at training and evaluating future NLP models for clinical text understanding, clinical knowledge representation, inference, and summarization.

Anthology ID: 2022.lrec-1.587
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 5484–5493
URL: https://aclanthology.org/2022.lrec-1.587
Cite (ACL): Yanjun Gao, Dmitriy Dligach, Timothy Miller, Samuel Tesch, Ryan Laffin, Matthew M. Churpek, and Majid Afshar. 2022. Hierarchical Annotation for Building A Suite of Clinical Natural Language Processing Tasks: Progress Note Understanding. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5484–5493, Marseille, France. European Language Resources Association.
Cite (Informal): Hierarchical Annotation for Building A Suite of Clinical Natural Language Processing Tasks: Progress Note Understanding (Gao et al., LREC 2022)
PDF: https://aclanthology.org/2022.lrec-1.587.pdf
