ESTOCADA: towards scalable polystore systems

Published: 01 August 2020

Abstract

Big data applications increasingly involve diverse datasets, conforming to different data models. Such datasets are routinely hosted in heterogeneous stores, each capable of handling one or a few data models, and each efficient for some, but not all, kinds of data processing. Systems capable of exploiting disparate data in this fashion are usually termed polystores. A current limitation of polystores is that applications are written taking into account which part of the data is stored in which store and how. This approach fails to take advantage of (i) possible redundancy, where the same data may be accessible (with different performance) from distinct data stores, and (ii) previous query results (in the style of materialized views), which may be available in the stores.
We propose to demonstrate ESTOCADA [4], a novel approach that can be used in a polystore setting to transparently enable each query to benefit from the best combination of stored data and available processing capabilities. The system leverages recent advances in the area of view-based query rewriting under constraints, which we use to describe the various data models and stored data.
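
As a toy illustration of the motivation above (and not of ESTOCADA's constraint-based rewriting algorithm, which the abstract does not detail), the sketch below picks the cheapest combination of stored fragments that covers the attributes a query needs. The fragment names, store kinds, attribute sets, and costs are all hypothetical, and the sketch ignores how fragments would actually be joined.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Fragment:
    """A stored fragment (hypothetical): some attributes held in one store."""
    name: str
    store: str
    attributes: frozenset
    cost: float  # assumed access cost, invented for this example


def cheapest_cover(needed, fragments):
    """Return the cheapest tuple of fragments whose attributes cover `needed`.

    Brute force over all fragment combinations, which is only reasonable
    for a handful of fragments. Returns None if nothing covers the query.
    """
    needed = frozenset(needed)
    best, best_cost = None, float("inf")
    for size in range(1, len(fragments) + 1):
        for combo in combinations(fragments, size):
            covered = frozenset().union(*(f.attributes for f in combo))
            cost = sum(f.cost for f in combo)
            if needed <= covered and cost < best_cost:
                best, best_cost = combo, cost
    return best


# Hypothetical catalog: the same data is reachable from several stores,
# and one fragment plays the role of a materialized earlier query result.
CATALOG = [
    Fragment("patients_rel", "relational", frozenset({"id", "name", "age"}), 5.0),
    Fragment("patients_doc", "document", frozenset({"id", "name", "age", "notes"}), 9.0),
    Fragment("notes_kv", "key-value", frozenset({"id", "notes"}), 2.0),
    Fragment("id_age", "key-value", frozenset({"id", "age"}), 1.0),
]

if __name__ == "__main__":
    plan = cheapest_cover({"id", "age", "notes"}, CATALOG)
    print([f.name for f in plan])
```

With these made-up costs, a query over `id`, `age`, and `notes` is answered from the two key-value fragments rather than from the single document-store fragment that holds everything. An application hard-coded against one store would miss that option, which is the kind of choice the abstract argues should be made transparently.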

References

[1] AsterixDB. https://asterixdb.apache.org/.
[2] GDELT. https://www.gdeltproject.org/data.html.
[3] D. Agrawal et al. RHEEM: enabling cross-platform data processing - may the big data be with you! PVLDB, 11(11):1414–1427, 2018.
[4] R. Alotaibi et al. Towards scalable hybrid stores: Constraint-based rewriting to the rescue. In SIGMOD, 2019.
[5] R. Bonaque et al. Mixed-instance querying: a lightweight integration architecture for data journalism. PVLDB, 9(13):1513–1516, 2016.
[6] F. Bugiotti et al. Flexible hybrid stores: Constraint-based rewriting to the rescue. In ICDE, 2016.
[7] A. Deutsch et al. MARS: A system for publishing XML from mixed and redundant storage. In Proc. of VLDB, pages 201–212, 2003.
[8] J. Duggan et al. The BigDAWG polystore system. In SIGMOD, 2015.
[9] A. Halevy. Answering queries using views: A survey. The VLDB Journal, 10(4):270–294, 2001.
[10] I. Ileana et al. Complete yet practical search for minimal query reformulations under constraints. In SIGMOD, 2014.
[11] A. Johnson et al. MIMIC-III. Available at: http://www.nature.com/articles/sdata201635, 2016.
[12] I. Manolescu et al. Answering XML queries on heterogeneous data sources. In Proc. of VLDB, pages 241–250, 2001.
[13] R. Taft et al. Genbase: A complex analytics genomics benchmark. In SIGMOD, 2014.

Published In

Proceedings of the VLDB Endowment, Volume 13, Issue 12
August 2020
1710 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 August 2020
Published in PVLDB Volume 13, Issue 12

Qualifiers

  • Research-article

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 24
  • Downloads (Last 6 weeks): 4

Reflects downloads up to 25 Dec 2024

Cited By

  • (2024) A Polystore Approach for Post-processing CAE Data Management. Proceedings of the 5th International Conference on Computer Information and Big Data Applications, pages 1208-1212. DOI: 10.1145/3671151.3671362. Online publication date: 26-Apr-2024
  • (2024) Generating Cross-model Analytics Workloads Using LLMs. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pages 4303-4307. DOI: 10.1145/3627673.3679932. Online publication date: 21-Oct-2024
  • (2024) SEREIA: document store exploration through keywords. Knowledge and Information Systems, 66:10, pages 6101-6132. DOI: 10.1007/s10115-024-02151-1. Online publication date: 1-Oct-2024
  • (2022) Ontology-based Data Federation. Proceedings of the 11th International Joint Conference on Knowledge Graphs, pages 10-19. DOI: 10.1145/3579051.3579070. Online publication date: 27-Oct-2022
  • (2021) HADAD: A Lightweight Approach for Optimizing Hybrid Complex Analytics Queries. Proceedings of the 2021 International Conference on Management of Data, pages 23-35. DOI: 10.1145/3448016.3457311. Online publication date: 9-Jun-2021
  • (2021) Breakthroughs on Cross-Cutting Data Management, Data Analytics, and Applied Data Science. Information Systems Frontiers, 23:1, pages 1-7. DOI: 10.1007/s10796-020-10091-8. Online publication date: 1-Feb-2021
