Computer Science > Computation and Language
[Submitted on 28 Nov 2016 (this version), latest version 3 Mar 2017 (v2)]
Title: Developing a cardiovascular disease risk-factors annotated corpus of Chinese electronic medical records
Abstract:
Objective: The goal of this study was to build a corpus of cardiovascular disease (CVD) risk-factor annotations based on Chinese electronic medical records (CEMRs). This corpus is intended to support the development of a risk-factor information extraction system, which in turn can serve as a foundation for further study of the progression of risk factors and CVD.
Materials and Methods: We designed a light annotation task to capture CVD risk factors together with the indicators, temporal attributes, and assertions explicitly stated in the records. The task included: 1) preparing the data; 2) creating guidelines for capturing annotations, with the help of clinicians; and 3) proposing an annotation method that comprised drafting the guidelines, training the annotators, updating the guidelines, and constructing the corpus.
Results: The outcome of this study was a risk-factor-annotated corpus based on de-identified discharge summaries and progress notes from 600 patients. Built with the help of specialists, this corpus has an inter-annotator agreement (IAA) F1-measure of 0.968, indicating high reliability.
Discussion: Our annotations cover 12 CVD risk factors, such as hypertension and diabetes. The annotations can be applied as a tool for managing these chronic diseases and predicting CVD.
Conclusion: Guidelines for capturing CVD risk-factor annotations from CEMRs were proposed, and an annotated corpus was established. The resulting document-level annotations can be applied in future studies to monitor risk factors and CVD over the long term.
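The abstract reports an inter-annotator agreement (IAA) F1-measure but does not say how it was computed. As a minimal sketch, one common approach treats one annotator's output as the reference and the other's as the response, then takes the harmonic mean of precision and recall over exactly matching annotations. The function name `iaa_f1`, the tuple layout, and the toy attribute values below are assumptions made for illustration. They are not taken from the paper or its corpus.

```python
from typing import Hashable, Iterable, Set, Tuple

# Hypothetical annotation layout (an assumption, not the paper's schema):
# (document_id, risk_factor, indicator, temporal_attribute, assertion)
Annotation = Tuple[Hashable, ...]


def iaa_f1(annotator_a: Iterable[Annotation],
           annotator_b: Iterable[Annotation]) -> float:
    """Exact-match F1 agreement, with A as reference and B as response."""
    a: Set[Annotation] = set(annotator_a)
    b: Set[Annotation] = set(annotator_b)
    if not a and not b:
        # Assumption: two empty annotation sets count as full agreement.
        return 1.0
    matched = len(a & b)  # annotations both annotators produced identically
    precision = matched / len(b) if b else 0.0
    recall = matched / len(a) if a else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
annotator_a = [
    ("doc1", "Hypertension", "mention", "before", "present"),
    ("doc1", "Diabetes", "mention", "before", "present"),
    ("doc2", "Hypertension", "mention", "during", "present"),
]
annotator_b = [
    ("doc1", "Hypertension", "mention", "before", "present"),
    ("doc1", "Diabetes", "mention", "before", "present"),
    ("doc2", "Hypertension", "mention", "during", "absent"),  # disagreement
]

print(round(iaa_f1(annotator_a, annotator_b), 3))
```

With exact matching, the score is symmetric: swapping the two annotators swaps precision and recall but leaves F1 unchanged. A looser matching rule (for example, ignoring the temporal attribute) would yield a different value.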
Submission history
From: Jia Su
[v1] Mon, 28 Nov 2016 08:20:54 UTC (422 KB)
[v2] Fri, 3 Mar 2017 08:52:27 UTC (2,129 KB)