DOI: 10.1145/3357384.3357934
research-article

MedTruth: A Semi-supervised Approach to Discovering Knowledge Condition Information from Multi-Source Medical Data

Published: 03 November 2019

Abstract

A knowledge graph (KG) contains entities and the relations between them. Owing to their representational power, KGs have been successfully applied to many medical and healthcare tasks. However, in the medical domain, knowledge holds only under certain conditions. Such conditions are crucial for decision-making in various medical applications, yet they are missing from existing medical KGs. In this paper, we aim to discover medical knowledge conditions from text to enrich KGs. Electronic Medical Records (EMRs) are systematized collections of clinical data that contain detailed information about patients, so they are a good resource for discovering medical knowledge conditions. Unfortunately, the number of available EMRs is limited for reasons such as regulation. Meanwhile, a large amount of medical question answering (QA) data is available, which can greatly help with this task. However, the quality of medical QA data varies widely, which may degrade the quality of the discovered conditions. In light of these challenges, we propose a new truth discovery method, MedTruth, for medical knowledge condition discovery. MedTruth incorporates prior source quality information into the source reliability estimation procedure and also uses knowledge triple information when computing trustworthy information. We conduct a series of experiments on real-world medical datasets to demonstrate that the proposed method can discover meaningful and accurate conditions for medical knowledge by leveraging both EMR and QA data. Further, the proposed method is tested on synthetic datasets to validate its effectiveness under various scenarios.
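
To make the idea of truth discovery with prior source quality concrete, the sketch below shows a generic iterative weighted-voting scheme. It is not the MedTruth algorithm, whose details the abstract does not give. The function name `discover_conditions`, the `prior_strength` parameter, the smoothed-accuracy update, and all example data are assumptions made purely for illustration.

```python
from collections import defaultdict


def discover_conditions(claims, prior_reliability, n_iters=10, prior_strength=1.0):
    """Generic truth discovery by iterative weighted voting (illustrative only).

    claims: list of (source, triple, condition) tuples.
    prior_reliability: dict mapping every source to a prior reliability in [0, 1].
    prior_strength: hypothetical pseudo-count controlling how strongly the prior
        pulls each source's estimated reliability.
    """
    weights = dict(prior_reliability)  # start from the prior source quality
    truths = {}
    for _ in range(n_iters):
        # Step 1: for each knowledge triple, pick the condition with the
        # largest total weight among the sources that claim it.
        votes = defaultdict(lambda: defaultdict(float))
        for source, triple, condition in claims:
            votes[triple][condition] += weights[source]
        truths = {t: max(c, key=c.get) for t, c in votes.items()}

        # Step 2: re-estimate each source's reliability as its agreement rate
        # with the current truths, smoothed toward its prior.
        agreed = defaultdict(float)
        total = defaultdict(float)
        for source, triple, condition in claims:
            total[source] += 1.0
            if truths[triple] == condition:
                agreed[source] += 1.0
        weights = {
            s: (agreed[s] + prior_strength * prior_reliability[s])
            / (total[s] + prior_strength)
            for s in prior_reliability
        }
    return truths, weights


# Hypothetical data: one EMR source with a high prior, two QA sources with lower priors.
T1 = ("drug_A", "treats", "disease_X")
T2 = ("drug_B", "treats", "disease_Y")
example_claims = [
    ("emr", T1, "adults"),
    ("qa_a", T1, "adults"),
    ("qa_b", T1, "children"),
    ("emr", T2, "mild cases"),
    ("qa_b", T2, "severe cases"),
]
example_priors = {"emr": 0.9, "qa_a": 0.5, "qa_b": 0.5}
truths, weights = discover_conditions(example_claims, example_priors)
```

In this toy setup the higher prior on the EMR source settles the conflict on `T2`, where only two sources disagree. The reliability update then lowers the weight of the source that keeps disagreeing with the inferred conditions.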

      Published In
      CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
      November 2019
      3373 pages
      ISBN:9781450369763
      DOI:10.1145/3357384
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. knowledge discovery
      2. multi-source data
      3. truth discovery

      Qualifiers

      • Research-article

      Funding Sources

      • Shenzhen Fundamental Research Project
      • National Natural Science Foundation of China
      • Shenzhen Project
      • PCL Future Regional Network Facilities for Large-scale Experiments and Applications

      Conference

      CIKM '19
      Acceptance Rates

      CIKM '19 paper acceptance rate: 202 of 1,031 submissions (20%)
      Overall acceptance rate: 1,861 of 8,427 submissions (22%)

      Article Metrics

      • Downloads (last 12 months): 19
      • Downloads (last 6 weeks): 2
      Reflects downloads up to 02 Feb 2025

      Cited By

      • (2023) Interpretable multi-hop knowledge reasoning for gastrointestinal disease. Annals of Operations Research. DOI: 10.1007/s10479-023-05650-6. Online publication date: 30-Oct-2023
      • (2022) Academic integrity among medical students and postgraduate trainees in the teaching hospitals of South Punjab Pakistan. Health Information & Libraries Journal 39:4 (377-384). DOI: 10.1111/hir.12458. Online publication date: 14-Oct-2022
      • (2021) A Secure CDM-Based Data Analysis Platform (SCAP) in Multi-Centered Distributed Setting. Applied Sciences 11:19 (9072). DOI: 10.3390/app11199072. Online publication date: 29-Sep-2021
      • (2020) Knowledge-Based Biomedical Data Science. Annual Review of Biomedical Data Science 3:1 (23-41). DOI: 10.1146/annurev-biodatasci-010820-091627. Online publication date: 20-Jul-2020
      • (2020) Design and Implementation of Medical Process Visualization CDSS Oriented to NCCN Guidelines. Proceedings of the 2020 International Conference on Internet Computing for Science and Engineering (30-34). DOI: 10.1145/3424311.3424322. Online publication date: 14-Jan-2020
      • (2020) A Comparative Study of Sequence Tagging Methods for Domain Knowledge Entity Recognition in Biomedical Papers. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (397-400). DOI: 10.1145/3383583.3398602. Online publication date: 1-Aug-2020
      • (2020) Graph-Based Natural Language Processing for the Pharmaceutical Industry. Provenance in Data Science (75-110). DOI: 10.1007/978-3-030-67681-0_6. Online publication date: 28-Dec-2020
