column

The BigDAWG Polystore System

Published: 12 August 2015

Abstract

This paper presents a new view of federated databases that addresses the growing need to manage information spanning multiple data models. This trend is fueled by the proliferation of storage engines and query languages built on the observation that 'no one size fits all'. To address this shift, we propose a polystore architecture designed to unify querying over multiple data models. We consider the challenges and opportunities associated with polystores. Open questions in this space revolve around query optimization and the assignment of objects to storage engines. We introduce our approach to these topics and discuss our prototype in the context of the Intel Science and Technology Center for Big Data.
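The abstract names two open problems for a polystore: routing queries to engines that expose different data models, and assigning objects to storage engines. The following is a minimal Python sketch of those two concerns under stated assumptions. It is not BigDAWG's design or API: the `Engine` and `Polystore` classes, the engine names, and the cost figures are all hypothetical, and "cost" is simply a caller-supplied estimate standing in for a real optimizer.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Engine:
    """Hypothetical wrapper around one storage engine."""
    name: str
    data_model: str                   # e.g. "relational", "array"
    execute: Callable[[str], Any]     # runs a query in the engine's native language


@dataclass
class Polystore:
    """Toy polystore: engines grouped by data model, plus an object-to-engine map."""
    engines: Dict[str, List[Engine]] = field(default_factory=dict)
    placement: Dict[str, str] = field(default_factory=dict)

    def register(self, engine: Engine) -> None:
        # Group engines by the data model they serve.
        self.engines.setdefault(engine.data_model, []).append(engine)

    def place(self, obj: str, data_model: str,
              cost: Callable[[Engine], float]) -> str:
        # Object assignment: choose the lowest-estimated-cost engine
        # among those that support the object's data model.
        candidates = self.engines.get(data_model, [])
        if not candidates:
            raise KeyError(f"no engine for data model {data_model!r}")
        best = min(candidates, key=cost)
        self.placement[obj] = best.name
        return best.name

    def query(self, obj: str, data_model: str, text: str) -> Any:
        # Routing: send the query to whichever engine currently holds the object.
        owner = self.placement.get(obj)
        for engine in self.engines.get(data_model, []):
            if engine.name == owner:
                return engine.execute(text)
        raise KeyError(f"{obj!r} is not placed on a {data_model} engine")


# Usage with hypothetical stand-in engines that just echo the query text.
store = Polystore()
store.register(Engine("rel_a", "relational", lambda q: f"rel_a ran {q}"))
store.register(Engine("rel_b", "relational", lambda q: f"rel_b ran {q}"))
store.register(Engine("arr_a", "array", lambda q: f"arr_a ran {q}"))

est_cost = {"rel_a": 3.0, "rel_b": 1.5, "arr_a": 2.0}  # made-up estimates
chosen = store.place("patients", "relational", lambda e: est_cost[e.name])
result = store.query("patients", "relational", "SELECT count(*) FROM patients")
```

Keeping placement separate from routing mirrors the two open questions the abstract raises. A real polystore would also have to split a query that spans several data models and move data between engines; this sketch only handles single-model queries.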



Published In

ACM SIGMOD Record, Volume 44, Issue 2
June 2015
56 pages
ISSN: 0163-5808
DOI: 10.1145/2814710

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States
