Dynamic query re-planning using QOOP

Authors: Kshiteej Mahajan, Mosharaf Chowdhury, Shuchi Chawla

OSDI'18: Proceedings of the 13th USENIX conference on Operating Systems Design and Implementation

Pages 253 - 267

Published: 08 October 2018

Abstract

Modern data processing clusters are highly dynamic - both in terms of the number of concurrently running jobs and their resource usage. To improve job performance, recent works have focused on optimizing the cluster scheduler and the jobs' query planner with a focus on picking the right query execution plan (QEP) - represented as a directed acyclic graph - for a job in a resource-aware manner, and scheduling jobs in a QEP-aware manner. However, because existing solutions use a fixed QEP throughout the entire execution, the inability to adapt a QEP in reaction to resource changes often leads to large performance inefficiencies.

This paper argues for dynamic query re-planning, wherein we re-evaluate and re-plan a job's QEP during its execution. We show that designing for re-planning requires fundamental changes to the interfaces between key layers of data analytics stacks today, i.e., the query planner, the execution engine, and the cluster scheduler. Instead of pushing more complexity into the scheduler or the query planner, we argue for a redistribution of responsibilities between the three components to simplify their designs. Under this redesign, we analytically show that a greedy algorithm for re-planning and execution alongside a simple max-min fair scheduler can offer provably competitive behavior even under adversarial resource changes. We prototype our algorithms atop Apache Hive and Tez. Via extensive experiments, we show that our design can offer a median performance improvement of 1.47× compared to state-of-the-art alternatives.
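
For readers unfamiliar with the term, the following is a minimal sketch of textbook max-min fair sharing (the "water-filling" procedure) for a single divisible resource. It is not taken from the paper and is not QOOP's scheduler; the function name `max_min_fair`, the job labels, and the single-resource model are assumptions made purely for illustration.

```python
from typing import Dict


def max_min_fair(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Textbook water-filling max-min fair split of one resource.

    Illustrative assumption: each job has a fixed demand for a single
    divisible resource (for example, container slots).
    """
    allocation = {job: 0.0 for job in demands}
    remaining = float(capacity)
    # Visit jobs from the smallest demand to the largest.
    pending = sorted(demands, key=lambda job: demands[job])
    while pending:
        fair_share = remaining / len(pending)
        job = pending[0]
        if demands[job] <= fair_share:
            # The smallest pending demand fits within an equal share:
            # satisfy it fully and redistribute what is left over.
            allocation[job] = float(demands[job])
            remaining -= demands[job]
            pending.pop(0)
        else:
            # Every pending job wants more than an equal share,
            # so each one receives exactly that share.
            for other in pending:
                allocation[other] = fair_share
            break
    return allocation


if __name__ == "__main__":
    print(max_min_fair(10, {"a": 2, "b": 4, "c": 8}))
```

With a capacity of 10 and demands of 2, 4, and 8, the two smaller jobs are fully satisfied and the largest job receives the remaining 4 units. No job can be given more without taking from a job that already holds an equal or smaller allocation, which is the defining property of max-min fairness.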

Published In

OSDI'18: Proceedings of the 13th USENIX conference on Operating Systems Design and Implementation

October 2018, 815 pages

ISBN: 9781931971478

Program Chairs: Andrea Arpaci-Dusseau (University of Wisconsin-Madison), Geoff Voelker (University of California, San Diego)

Sponsors: NetApp, Google Inc., NSF, Microsoft, Facebook

In-Cooperation: SIGOPS: ACM Special Interest Group on Operating Systems

Publisher: USENIX Association, United States
