DOI: 10.5555/3291168.3291187

Dynamic query re-planning using QOOP

Published: 08 October 2018

Abstract

Modern data processing clusters are highly dynamic - both in terms of the number of concurrently running jobs and their resource usage. To improve job performance, recent works have focused on optimizing the cluster scheduler and the jobs' query planner with a focus on picking the right query execution plan (QEP) - represented as a directed acyclic graph - for a job in a resource-aware manner, and scheduling jobs in a QEP-aware manner. However, because existing solutions use a fixed QEP throughout the entire execution, the inability to adapt a QEP in reaction to resource changes often leads to large performance inefficiencies.
This paper argues for dynamic query re-planning, wherein we re-evaluate and re-plan a job's QEP during its execution. We show that designing for re-planning requires fundamental changes to the interfaces between key layers of data analytics stacks today, i.e., the query planner, the execution engine, and the cluster scheduler. Instead of pushing more complexity into the scheduler or the query planner, we argue for a redistribution of responsibilities between the three components to simplify their designs. Under this redesign, we analytically show that a greedy algorithm for re-planning and execution alongside a simple max-min fair scheduler can offer provably competitive behavior even under adversarial resource changes. We prototype our algorithms atop Apache Hive and Tez. Via extensive experiments, we show that our design can offer a median performance improvement of 1.47× compared to state-of-the-art alternatives.
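The abstract relies on a "simple max-min fair scheduler" without spelling out what max-min fairness computes. As a rough illustration only, the sketch below implements the textbook water-filling procedure for a single divisible resource. It is not the paper's scheduler: the function name `max_min_fair`, the job names, and the single-resource model are all assumptions made for this example.

```python
from typing import Dict


def max_min_fair(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Textbook water-filling max-min fair split of one divisible resource.

    Hypothetical helper for illustration; not taken from the paper.
    """
    allocation = {job: 0.0 for job in demands}
    remaining = float(capacity)
    # Visit jobs in increasing order of demand so small demands are met first.
    pending = sorted(demands, key=lambda job: demands[job])
    while pending:
        fair_share = remaining / len(pending)
        job = pending[0]
        if demands[job] <= fair_share:
            # The smallest pending demand fits within an equal share:
            # satisfy it fully and redistribute what is left.
            allocation[job] = demands[job]
            remaining -= demands[job]
            pending.pop(0)
        else:
            # Every pending job wants more than an equal share:
            # split the remaining capacity evenly and stop.
            for other in pending:
                allocation[other] = fair_share
            break
    return allocation


if __name__ == "__main__":
    demands = {"job_a": 2.0, "job_b": 4.0, "job_c": 8.0}
    print(max_min_fair(10.0, demands))  # allocation with 10 units available
    print(max_min_fair(6.0, demands))   # allocation after capacity drops to 6
```

With demands of 2, 4, and 8 units, a capacity of 10 yields allocations of 2, 4, and 4; when capacity drops to 6, every job receives 2. Shifts of this kind in a job's share during execution are the resource changes that the abstract argues a fixed QEP cannot adapt to.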

Published In

OSDI'18: Proceedings of the 13th USENIX conference on Operating Systems Design and Implementation
October 2018
815 pages
ISBN:9781931971478

Sponsors

  • NetApp
  • Google Inc.
  • NSF
  • Microsoft
  • Facebook

Publisher

USENIX Association

United States

Qualifiers

  • Article
