DOI: 10.1145/2588555.2588568
research-article

MISO: souping up big data query processing with a multistore system

Published: 18 June 2014

Abstract

Multistore systems utilize multiple distinct data stores, such as Hadoop's HDFS and an RDBMS, for query processing by allowing a query to access data and computation in both stores. Current approaches to multistore query processing fail to achieve the full potential benefits of utilizing both systems due to the high cost of data movement and loading between the stores. Tuning the physical design of a multistore, i.e., deciding what data resides in which store, can reduce the amount of data movement during query processing, which is crucial for good multistore performance. In this work, we provide what we believe to be the first method to tune the physical design of a multistore system, focusing on the store in which to place data. Our method, called MISO for MultIStore Online tuning, is adaptive, lightweight, and works in an online fashion utilizing only the by-products of query processing, which we term opportunistic views. We show that MISO significantly improves the performance of ad-hoc big data query processing by leveraging the specific characteristics of the individual stores while incurring little additional overhead on the stores.
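
To make the placement question in the abstract concrete, the following is a minimal illustrative sketch, not the MISO algorithm itself (the abstract does not specify it). It assumes each opportunistic view has a known size and an estimated benefit from residing in the RDBMS, and that the RDBMS has a fixed storage budget; the names `View`, `place_views`, `size_gb`, `benefit`, and `rdbms_budget_gb` are hypothetical. Views are ranked greedily by benefit per gigabyte, and whatever does not fit stays in HDFS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """A by-product of query processing (all fields are illustrative assumptions)."""
    name: str
    size_gb: float   # assumed size of the materialized view; must be > 0
    benefit: float   # assumed estimated saving if the view lives in the RDBMS


def place_views(views, rdbms_budget_gb):
    """Greedy placement: highest benefit per GB first, until the budget is spent.

    Returns (names placed in the RDBMS, names left in HDFS).
    """
    ranked = sorted(views, key=lambda v: v.benefit / v.size_gb, reverse=True)
    in_rdbms, in_hdfs, used_gb = [], [], 0.0
    for view in ranked:
        if used_gb + view.size_gb <= rdbms_budget_gb:
            in_rdbms.append(view.name)
            used_gb += view.size_gb
        else:
            in_hdfs.append(view.name)
    return in_rdbms, in_hdfs


# Hypothetical workload by-products, invented purely for illustration.
views = [
    View("v_sessions", size_gb=40.0, benefit=120.0),
    View("v_clicks", size_gb=10.0, benefit=50.0),
    View("v_raw_logs", size_gb=200.0, benefit=60.0),
]
in_rdbms, in_hdfs = place_views(views, rdbms_budget_gb=60.0)
print(in_rdbms, in_hdfs)
```

With these made-up numbers, the two small, high-benefit views fit within the RDBMS budget and the large, low-density view remains in HDFS. A real tuner would also have to account for the cost of moving data between the stores, which this sketch ignores.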


Published In

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
June 2014
1645 pages
ISBN: 9781450323765
DOI: 10.1145/2588555
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. analytics
  2. big data
  3. database
  4. hadoop
  5. multistore
  6. physical design

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'14

Acceptance Rates

SIGMOD '14 Paper Acceptance Rate 107 of 421 submissions, 25%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 22
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 10 Feb 2025

Cited By

  • (2024) Addressing Data Challenges to Drive the Transformation of Smart Cities. ACM Transactions on Intelligent Systems and Technology. DOI: 10.1145/3663482. Online publication date: 3-May-2024
  • (2023) Coral: federated query join order optimization based on deep reinforcement learning. World Wide Web, 26:5 (3093-3118). DOI: 10.1007/s11280-023-01156-0. Online publication date: 12-Jun-2023
  • (2022) Flatfish: A Reinforcement Learning Approach for Application-Aware Address Mapping. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 41:11 (4758-4770). DOI: 10.1109/TCAD.2022.3146204. Online publication date: Nov-2022
  • (2022) A Dual-Store Structure for Knowledge Graphs (Extended Abstract). 2022 IEEE 38th International Conference on Data Engineering (ICDE) (1523-1524). DOI: 10.1109/ICDE53745.2022.00138. Online publication date: May-2022
  • (2022) A survey on Automatic Query Optimization Approaches in Multi Store Systems for Big Data Analytics. 2022 2nd Asian Conference on Innovation in Technology (ASIANCON) (1-5). DOI: 10.1109/ASIANCON55314.2022.9909466. Online publication date: 26-Aug-2022
  • (2022) An Adaptive Elastic Multi-model Big Data Analysis and Information Extraction System. Data Science and Engineering, 7:4 (328-338). DOI: 10.1007/s41019-022-00196-2. Online publication date: 12-Oct-2022
  • (2022) HERMES: data placement and schema optimization for enterprise knowledge bases. The VLDB Journal, 32:3 (549-574). DOI: 10.1007/s00778-022-00756-y. Online publication date: 26-Jul-2022
  • (2021) A Dual-Store Structure for Knowledge Graphs. IEEE Transactions on Knowledge and Data Engineering (1-1). DOI: 10.1109/TKDE.2021.3093200. Online publication date: 2021
  • (2021) XData: A General-purpose Unified Processing System for Data Analysis and Machine Learning. 2021 IEEE 4th International Conference on Big Data and Artificial Intelligence (BDAI) (26-31). DOI: 10.1109/BDAI52447.2021.9515263. Online publication date: 2-Jul-2021
  • (2021) Multi-Temperate Logical Data Warehouse Design for Large-Scale Healthcare Data. Big Data Research (100255). DOI: 10.1016/j.bdr.2021.100255. Online publication date: Aug-2021
