research-article

MISO: souping up big data query processing with a multistore system

Authors: Jagan Sankaranarayanan, Hakan Hacigumus, Junichi Tatemura, Neoklis Polyzotis, Michael J. Carey

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data

Pages 1591 - 1602

https://doi.org/10.1145/2588555.2588568

Published: 18 June 2014

Abstract

Multistore systems utilize multiple distinct data stores, such as Hadoop's HDFS and an RDBMS, for query processing by allowing a query to access data and computation in both stores. Current approaches to multistore query processing fail to achieve the full potential benefits of utilizing both systems because of the high cost of moving and loading data between the stores. Tuning the physical design of a multistore, i.e., deciding what data resides in which store, can reduce the amount of data movement during query processing, which is crucial for good multistore performance. In this work, we provide what we believe to be the first method to tune the physical design of a multistore system, focusing on the store in which each piece of data is placed. Our method, called MISO for MultIStore Online tuning, is adaptive, lightweight, and works in an online fashion, utilizing only the by-products of query processing, which we term opportunistic views. We show that MISO significantly improves the performance of ad-hoc big data query processing by leveraging the specific characteristics of the individual stores while incurring little additional overhead on the stores.
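
To make the placement decision described in the abstract concrete, the following is a minimal sketch of choosing which opportunistic views to move from HDFS into the RDBMS. It is not the paper's algorithm: the greedy benefit-per-gigabyte rule, the `View` fields, the two budget parameters, and the sample views are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """A hypothetical opportunistic view (a by-product of an earlier query)."""
    name: str
    size_gb: float   # assumed positive: space the view would take in the RDBMS
    benefit: float   # assumed estimate of query time saved if placed in the RDBMS


def place_views(views, rdbms_budget_gb, transfer_budget_gb):
    """Greedy illustration: move the most beneficial views per GB into the
    RDBMS until either the storage budget or the transfer budget runs out.
    Every view starts in (and otherwise stays in) HDFS."""
    placement = {v.name: "hdfs" for v in views}
    storage_left = rdbms_budget_gb
    transfer_left = transfer_budget_gb
    ranked = sorted(views, key=lambda v: v.benefit / v.size_gb, reverse=True)
    for v in ranked:
        if v.benefit <= 0:
            continue  # moving this view would not help any query
        if v.size_gb <= storage_left and v.size_gb <= transfer_left:
            placement[v.name] = "rdbms"
            storage_left -= v.size_gb
            transfer_left -= v.size_gb
    return placement


# Hypothetical views and budgets, chosen only to exercise the function.
views = [
    View("v_sessions", size_gb=40.0, benefit=120.0),
    View("v_clicks_by_user", size_gb=10.0, benefit=90.0),
    View("v_raw_join", size_gb=200.0, benefit=150.0),
]
placement = place_views(views, rdbms_budget_gb=60.0, transfer_budget_gb=50.0)
print(placement)
```

In this sketch the large view stays in HDFS because moving it would exceed both budgets, which mirrors the abstract's point that data movement between stores is the dominant cost to control.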

Index Terms

MISO: souping up big data query processing with a multistore system
1. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Data types and structures

Published In

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data

June 2014

1645 pages

ISBN:9781450323765

DOI:10.1145/2588555

General Chairs: Curtis Dyreson (Utah State University, USA), Feifei Li (University of Utah, USA)

Program Chair: M. Tamer Özsu (University of Waterloo, Canada)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

SIGMOD/PODS'14

Sponsor:

SIGMOD

SIGMOD/PODS'14: International Conference on Management of Data

June 22 - 27, 2014

Snowbird, Utah, USA

Acceptance Rates

SIGMOD '14 Paper Acceptance Rate 107 of 421 submissions, 25%;

Overall Acceptance Rate 785 of 4,003 submissions, 20%


