A Comparison of Two Database Partitioning Approaches that Support Taxonomy-Based Query Answering

Authors:

Jero Mario Schäfer and Lena Wiese

iiWAS '20: Proceedings of the 22nd International Conference on Information Integration and Web-based Applications & Services

November 2020

Pages 426 - 435

https://doi.org/10.1145/3428757.3429108

Published: 27 January 2021

Abstract

In this paper, we address the identification of cohorts of similar patients in a database of electronic health records. We follow the conjecture that the retrieval of similar patients can be supported by an underlying distributed database design. Hence, we propose a fragmentation based on partitioning the health records and benchmark two implementation variants against the off-the-shelf data distribution approach provided by Apache Ignite. While our main use case in this paper is cohort identification, our approach also has advantages for taxonomy-based query answering in other (non-medical) domains.

Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
        Query optimization
      2. Parallel and distributed DBMSs
        Relational parallel and distributed DBMSs

Published In

iiWAS '20: Proceedings of the 22nd International Conference on Information Integration and Web-based Applications & Services

November 2020

492 pages

ISBN:9781450389228

DOI:10.1145/3428757

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

Johannes Kepler University, Linz, Austria

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

iiWAS '20

iiWAS '20: The 22nd International Conference on Information Integration and Web-based Applications & Services

November 30 - December 2, 2020

Chiang Mai, Thailand
