Research article, DOI: 10.1145/3428757.3429108

A Comparison of Two Database Partitioning Approaches that Support Taxonomy-Based Query Answering

Published: 27 January 2021

    Abstract

    This paper addresses the identification of cohorts of similar patients in a database of electronic health records. We follow the conjecture that retrieving similar patients can be supported by the underlying distributed database design. We therefore propose a fragmentation that partitions the health records, and we benchmark two implementation variants against the off-the-shelf data distribution approach provided by Apache Ignite. Although cohort identification is our main use case, the approach also benefits taxonomy-based query answering in other (non-medical) domains.
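To make the idea of a taxonomy-aligned fragmentation concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy taxonomy (`PARENT`), a hand-picked set of fragment heads, and records of the form `(patient_id, term)`; all of these names and values are hypothetical. Each record is assigned to the fragment of its nearest ancestor that is a fragment head, so a query for patients similar to a given term only has to read one fragment.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: child term -> parent term (None marks the root).
PARENT = {
    "disease": None,
    "respiratory": "disease",
    "asthma": "respiratory",
    "bronchitis": "respiratory",
    "cardiac": "disease",
    "arrhythmia": "cardiac",
}


def ancestors(term):
    """Return the path from `term` up to the root, starting with `term`."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def fragment_key(term, heads):
    """Return the nearest ancestor of `term` (or `term` itself) that is a fragment head."""
    for candidate in ancestors(term):
        if candidate in heads:
            return candidate
    raise KeyError(f"no fragment head covers term {term!r}")


def partition(records, heads):
    """Group (patient_id, term) records into one fragment per head."""
    fragments = defaultdict(list)
    for patient_id, term in records:
        fragments[fragment_key(term, heads)].append((patient_id, term))
    return dict(fragments)


def similar_patients(term, fragments, heads):
    """Return the patients stored in the same fragment as `term` (a single-fragment lookup)."""
    return sorted(pid for pid, _ in fragments.get(fragment_key(term, heads), []))


# Assumed example data: three patients, two fragments.
heads = {"respiratory", "cardiac"}
records = [(1, "asthma"), (2, "bronchitis"), (3, "arrhythmia")]
fragments = partition(records, heads)
print(similar_patients("asthma", fragments, heads))
```

In a distributed setting each fragment would live on its own node, so the lookup in `similar_patients` would touch one node instead of all of them. How the fragment heads are chosen and how fragments are mapped to nodes are left open here.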

    Published In

    iiWAS '20: Proceedings of the 22nd International Conference on Information Integration and Web-based Applications & Services
    November 2020
    492 pages

    In-Cooperation

    • Johannes Kepler University, Linz, Austria

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Distributed database system
    2. relational databases
    3. taxonomy-based query answering

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    iiWAS '20
