research-article

Analyzing dynamic task-based applications on hybrid platforms: an agile scripting approach

Authors:

Vinícius Garcia Pinto,

Arnaud Legrand,

Lucas Mello Schnorr,

Samuel Thibault,

Vincent Danjean

VPA '16: Proceedings of the 3rd International Workshop on Visual Performance Analysis

Pages 17-24

Published: 13 November 2016

Abstract

In this paper, we present visual analysis techniques to evaluate the performance of HPC task-based applications on hybrid architectures. Our approach composes modern data analysis tools (pjdump, R, ggplot2, plotly) into an agile and flexible scripting framework at low development cost. We validate our proposal by analyzing traces from the full-fledged implementation of the Cholesky decomposition available in the MORSE library, running on a hybrid (CPU/GPU) platform. The analysis compares two different workloads and three different task schedulers from the StarPU runtime system. Our analysis, based on composite views, makes it possible to identify allocation mistakes, priority problems in scheduling decisions, GPU task anomalies that cause poor performance, and critical path issues.
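As a rough illustration of the kind of aggregation such a scripting pipeline performs, the sketch below parses trace records and computes per-worker idle fractions and per-(worker, task) mean durations, two quantities that can help reveal allocation mistakes and GPU task anomalies. It is written in plain Python rather than the R/ggplot2/plotly stack named above, and it is not the authors' code. The comma-separated layout assumed for pjdump `State` records (record kind, container, type, start, end, duration, imbrication, value), the sample rows, and the function names `parse_states`, `idle_fraction`, and `mean_duration` are all hypothetical.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical, made-up excerpt in an ASSUMED pjdump-like CSV layout:
# State, container, type, start, end, duration, imbrication, value
SAMPLE = """\
State, CPU0, STATE, 0.0, 2.0, 2.0, 0, dpotrf
State, CPU0, STATE, 3.0, 4.0, 1.0, 0, dtrsm
State, CUDA0, STATE, 0.5, 1.0, 0.5, 0, dgemm
State, CUDA0, STATE, 1.0, 4.0, 3.0, 0, dgemm
"""


@dataclass
class State:
    worker: str
    start: float
    end: float
    task: str

    @property
    def duration(self) -> float:
        return self.end - self.start


def parse_states(text: str) -> list[State]:
    """Keep only State records; any other record kind is skipped."""
    states = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if row and row[0] == "State":
            states.append(State(worker=row[1], start=float(row[3]),
                                end=float(row[4]), task=row[7]))
    return states


def idle_fraction(states: list[State]) -> dict[str, float]:
    """Share of the makespan in which each worker runs no task.

    Assumes states on the same worker never overlap (no nesting).
    """
    makespan = max(s.end for s in states) - min(s.start for s in states)
    busy: dict[str, float] = defaultdict(float)
    for s in states:
        busy[s.worker] += s.duration
    return {w: 1.0 - b / makespan for w, b in busy.items()}


def mean_duration(states: list[State]) -> dict[tuple[str, str], float]:
    """Mean duration per (worker, task type), e.g. to spot slow GPU kernels."""
    totals: dict[tuple[str, str], list[float]] = defaultdict(lambda: [0.0, 0])
    for s in states:
        acc = totals[(s.worker, s.task)]
        acc[0] += s.duration
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


if __name__ == "__main__":
    states = parse_states(SAMPLE)
    print(idle_fraction(states))
    print(mean_duration(states))
```

With the made-up sample, the second `dgemm` on `CUDA0` takes six times longer than the first, which is the kind of outlier a per-task duration view is meant to expose; a real analysis would plot such values (for instance as a Gantt chart or per-resource distribution) rather than print them.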

Index Terms

General and reference → Cross-computing tools and techniques

Published In

VPA '16: Proceedings of the 3rd International Workshop on Visual Performance Analysis

November 2016, 34 pages

ISBN: 9781509052264

Conference Chairs:
Peer-Timo Bremer, Lawrence Livermore National Laboratory
Judit Gimenez, Barcelona Supercomputing Center
Joshua A. Levine, University of Arizona
Martin Schulz, Lawrence Livermore National Laboratory

Sponsors

SIGHPC: ACM Special Interest Group on High Performance Computing
IEEE-CS\DATC: IEEE Computer Society

In-Cooperation

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

IEEE Press

Conference

SC16

SC16: The International Conference for High Performance Computing, Networking, Storage and Analysis

November 13-18, 2016

Salt Lake City, Utah

Acceptance Rates

Overall Acceptance Rate 5 of 6 submissions, 83%
