
Recognition of Patient-Related Named Entities in Noisy Tele-Health Texts

Published: 24 July 2015

Abstract

We explore methods for effectively extracting information from clinical narratives that are captured in a public health consulting phone service called HealthLink. Our research investigates the application of state-of-the-art natural language processing and machine learning to clinical narratives to extract information of interest. The currently available data consist of dialogues recorded by nurses while consulting patients by phone. Because the data are interviews transcribed by nurses during phone conversations, they include a significant volume and variety of noise. To extract patient-related information from the noisy data, we have to remove or correct at least two kinds of noise: explicit noise, which includes spelling errors, unfinished sentences, omission of sentence delimiters, and variants of terms, and implicit noise, which includes non-patient information and untrustworthy patient information. To filter explicit noise, we propose our own biomedical term detection/normalization method, which resolves misspellings, term variations, and arbitrary abbreviations of terms by nurses. To detect temporal terms, temperature, and other types of named entities (which show patients’ personal information such as age and sex), we propose a bootstrapping-based pattern learning process that detects a variety of arbitrary variations of named entities. To address implicit noise, we propose a dependency path-based filtering method. The result of our denoising is the extraction of normalized patient information, and we visualize the named entities by constructing a graph that shows the relations between named entities. The objective of this knowledge discovery task is to identify associations between biomedical terms and to clearly expose the trends of patients’ symptoms and concerns; the experimental results show that we achieve reasonable performance with our noise reduction methods.
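
The abstract does not describe how the term normalization step works internally. As a rough illustration of the general idea only, the sketch below maps a noisy token to a canonical term by first expanding a known abbreviation and then picking the closest lexicon entry under a restricted Damerau-Levenshtein (optimal string alignment) distance. The lexicon, the abbreviation table, the function names, and the length-scaled threshold are all assumptions made for this example, not the authors' method or data.

```python
from typing import Optional


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and transpositions of adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ev" <-> "ve".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# Hypothetical mini-lexicon and abbreviation table (illustration only).
LEXICON = ["fever", "headache", "nausea", "vomiting", "diarrhea"]
ABBREVIATIONS = {"temp": "temperature", "ha": "headache"}


def normalize_term(token: str, max_distance: int = 2) -> Optional[str]:
    """Return a canonical term for a noisy token, or None if nothing is close."""
    word = token.lower().strip(".,;:")
    if word in ABBREVIATIONS:
        return ABBREVIATIONS[word]
    if word in LEXICON:
        return word
    best = min(LEXICON, key=lambda term: osa_distance(word, term))
    # Scale the allowed distance with word length so that short,
    # unrelated words are not forced onto a lexicon entry.
    allowed = min(max_distance, len(word) // 3)
    return best if osa_distance(word, best) <= allowed else None
```

With this toy lexicon, `normalize_term("feevr")` resolves to `"fever"` through a single transposition, while an unrelated word such as `"phone"` is left unmapped. A real system would need a much larger terminology and context-aware disambiguation.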

Cited By

  • (2023) TechPat: Technical Phrase Extraction for Patent Mining. ACM Transactions on Knowledge Discovery from Data 17:9 (1-31). DOI: 10.1145/3596603. Online publication date: 15-Jun-2023
  • (2022) How artificial intelligence (AI) supports nursing education: profiling the roles, applications, and trends of AI in nursing education research (1993–2020). Interactive Learning Environments 32:1 (373-392). DOI: 10.1080/10494820.2022.2086579. Online publication date: 26-Jun-2022
  • (2016) Layered Multicast Resource Allocation with Limited Feedback Scheme in Single Frequency Networks. Wireless Personal Communications: An International Journal 87:4 (1131-1146). DOI: 10.1007/s11277-015-3044-4. Online publication date: 1-Apr-2016

    Published In

    ACM Transactions on Intelligent Systems and Technology, Volume 6, Issue 4
    Regular Papers and Special Section on Intelligent Healthcare Informatics
    August 2015
    419 pages
    ISSN:2157-6904
    EISSN:2157-6912
    DOI:10.1145/2801030
    Editor: Yu Zheng

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 July 2015
    Accepted: 01 July 2014
    Revised: 01 May 2014
    Received: 01 October 2013
    Published in TIST Volume 6, Issue 4

    Author Tags

    1. Tele-health mining
    2. biomedical text mining
    3. effective information retrieval
    4. named entity recognition

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • the iCORE division of Alberta Innovates Technology Futures
    • the Alberta Innovates Centre for Machine Learning (AICML)
