Benchmarking a distributed database design that supports patient cohort identification

Published: 25 August 2020

    Abstract

    In this article we present the implementation and benchmarking of a medical information system built on top of a distributed relational database system. We extended the database system with a clustering procedure, based on the similarity of disease terms, that induces a primary horizontal fragmentation of one data table and derived fragmentations of the secondary tables. This clustering-based fragmentation ensures data locality for similarity-based query answering, so that data need not be sent over the network unnecessarily. Our benchmark shows a significant efficiency gain when retrieving all relevant related answers.
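
The idea in the abstract can be illustrated with a minimal in-memory sketch. Everything in it is an assumption made for illustration: the similarity scores, the threshold `alpha`, the table and column names (`ill`, `patients`, `patient_id`, `disease`), and the greedy clustering itself, since the abstract does not say which clustering procedure or schema the authors use. The sketch clusters disease terms, fragments a primary table by cluster, derives the fragments of a secondary table from it, and then answers a similarity-based query from a single fragment.

```python
from collections import defaultdict

# Hypothetical similarity scores between disease terms (symmetric, in [0, 1]).
SIMILARITY = {
    frozenset(("asthma", "bronchitis")): 0.8,
    frozenset(("asthma", "influenza")): 0.6,
    frozenset(("bronchitis", "influenza")): 0.7,
    frozenset(("fracture", "sprain")): 0.75,
}


def sim(a, b):
    """Return the assumed similarity of two terms; unknown pairs count as 0."""
    if a == b:
        return 1.0
    return SIMILARITY.get(frozenset((a, b)), 0.0)


def cluster_terms(terms, alpha):
    """Greedy clustering (an illustrative choice): a term joins the first
    cluster whose head term is at least alpha-similar, else it starts one."""
    clusters = {}
    for term in terms:
        for head in clusters:
            if sim(head, term) >= alpha:
                clusters[head].append(term)
                break
        else:
            clusters[term] = [term]
    return clusters


def primary_fragments(rows, fragment_of):
    """Primary horizontal fragmentation: route each row by its disease term."""
    frags = defaultdict(list)
    for row in rows:
        frags[fragment_of[row["disease"]]].append(row)
    return dict(frags)


def derived_fragments(secondary_rows, primary, key="patient_id"):
    """Derived fragmentation: a secondary row follows the primary rows it joins with."""
    derived = {}
    for frag_id, rows in primary.items():
        keys = {row[key] for row in rows}
        derived[frag_id] = [s for s in secondary_rows if s[key] in keys]
    return derived


def related_patients(term, fragment_of, derived):
    """Names of patients with a disease similar to term, read from one fragment only."""
    return sorted(p["name"] for p in derived[fragment_of[term]])


# Toy data (hypothetical schema).
terms = ["asthma", "bronchitis", "influenza", "fracture", "sprain"]
ill = [
    {"patient_id": 1, "disease": "asthma"},
    {"patient_id": 2, "disease": "sprain"},
    {"patient_id": 3, "disease": "influenza"},
    {"patient_id": 4, "disease": "fracture"},
]
patients = [
    {"patient_id": 1, "name": "Ann"},
    {"patient_id": 2, "name": "Bo"},
    {"patient_id": 3, "name": "Cy"},
    {"patient_id": 4, "name": "Di"},
]

clusters = cluster_terms(terms, alpha=0.6)
fragment_of = {t: i for i, members in enumerate(clusters.values()) for t in members}
ill_frags = primary_fragments(ill, fragment_of)
patient_frags = derived_fragments(patients, ill_frags)

print(related_patients("bronchitis", fragment_of, patient_frags))
```

In this toy setup, a query for patients with diseases similar to `bronchitis` touches only the fragment holding the respiratory terms, and the matching patient rows are already co-located with it. In a distributed deployment each fragment would live on its own server, which is where the locality benefit described in the abstract would come from.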

    Cited By

    • (2023) Security and Privacy Challenges in Distributed Database Management Systems. 2023 Global Conference on Information Technologies and Communications (GCITC), pages 1-6. DOI: 10.1109/GCITC60406.2023.10426303. Online publication date: 1-Dec-2023
    • (2021) Load Balanced Semantic Aware Distributed RDF Graph. Proceedings of the 25th International Database Engineering & Applications Symposium, pages 127-133. DOI: 10.1145/3472163.3472167. Online publication date: 14-Jul-2021

    Published In

    IDEAS '20: Proceedings of the 24th Symposium on International Database Engineering & Applications
    August 2020
    252 pages
    ISBN: 9781450375030
    DOI: 10.1145/3410566
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. distributed database system
    2. relational databases
    3. similarity-based query answering

    Qualifiers

    • Research-article

    Funding Sources

    • Fraunhofer

    Conference

    IDEAS 2020

    Acceptance Rates

    IDEAS '20 Paper Acceptance Rate 27 of 57 submissions, 47%;
    Overall Acceptance Rate 74 of 210 submissions, 35%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 37
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 27 Jul 2024
