Spark Parameter Tuning via Trial-and-Error

Petridis, Panagiotis; Gounaris, Anastasios; Torres, Jordi

arXiv:1607.07348 (cs)

[Submitted on 25 Jul 2016]

Abstract: Spark has been established as an attractive platform for big data analysis, since it manages to hide most of the complexities related to parallelism, fault tolerance and cluster setting from developers. However, this comes at the expense of having over 150 configurable parameters, the impact of which cannot be exhaustively examined because the number of their combinations grows exponentially. The default values allow developers to deploy their applications quickly, but leave open the question of whether performance can be improved. In this work, we investigate the impact of the most important tunable Spark parameters on application performance and guide developers on how to change the default values. We conduct a series of experiments with known benchmarks on the MareNostrum petascale supercomputer to test the performance sensitivity. More importantly, we offer a trial-and-error methodology for tuning parameters in arbitrary applications based on evidence from a very small number of experimental runs. We test our methodology in three case studies, where we achieve speedups of more than 10 times.
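
The abstract does not spell out the steps of the methodology, so the following is only a generic sketch of what tuning by trial and error with few runs can look like, not the authors' procedure. It changes one parameter at a time and keeps a change only if the measured run time improves by a threshold. The function names, the `min_gain` threshold, and the synthetic cost model in `fake_run` are all assumptions made for illustration; the two configuration keys and their default values are real Spark settings, but the timings are invented.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, str]


def tune_by_trial_and_error(
    run: Callable[[Config], float],
    defaults: Config,
    candidates: List[Tuple[str, List[str]]],
    min_gain: float = 0.05,
) -> Tuple[Config, float, int]:
    """Greedy one-parameter-at-a-time search (illustrative, hypothetical).

    `run` executes the application with a configuration and returns its
    run time. Returns the best configuration found, its run time, and the
    number of runs spent.
    """
    best = dict(defaults)
    best_time = run(best)  # baseline run with the default values
    runs = 1
    for key, values in candidates:
        for value in values:
            if best.get(key) == value:
                continue  # already using this value, no run needed
            trial = {**best, key: value}
            elapsed = run(trial)
            runs += 1
            # Keep the change only if it is faster by at least min_gain.
            if elapsed < best_time * (1.0 - min_gain):
                best, best_time = trial, elapsed
    return best, best_time, runs


def fake_run(conf: Config) -> float:
    """Synthetic stand-in for a real benchmark run (made-up numbers)."""
    elapsed = 100.0
    if conf.get("spark.serializer") == "org.apache.spark.serializer.KryoSerializer":
        elapsed *= 0.8
    if conf.get("spark.shuffle.compress") == "false":
        elapsed *= 1.3
    return elapsed


DEFAULTS: Config = {
    "spark.serializer": "org.apache.spark.serializer.JavaSerializer",
    "spark.shuffle.compress": "true",
}
CANDIDATES = [
    ("spark.serializer", ["org.apache.spark.serializer.KryoSerializer"]),
    ("spark.shuffle.compress", ["false"]),
]

if __name__ == "__main__":
    best, best_time, runs = tune_by_trial_and_error(fake_run, DEFAULTS, CANDIDATES)
    print(best, best_time, runs)
```

In a real setting, `run` would submit the application (for example through `spark-submit` with `--conf key=value` pairs) and return the wall-clock time. With this greedy scheme, the number of runs grows with the number of candidate values tried rather than with the number of their combinations.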

Comments:	full version of paper accepted in the 2nd INNS Conference on Big Data 2016
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:1607.07348 [cs.DC]
	(or arXiv:1607.07348v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.1607.07348

Submission history

From: Anastasios Gounaris
[v1] Mon, 25 Jul 2016 16:28:14 UTC (547 KB)
