DOI: 10.5555/3019115.3019118
research-article

Analyzing dynamic task-based applications on hybrid platforms: an agile scripting approach

Published: 13 November 2016 Publication History

Abstract

In this paper, we present visual analysis techniques to evaluate the performance of HPC task-based applications on hybrid architectures. Our approach composes modern data analysis tools (pjdump, R, ggplot2, plotly) into an agile and flexible scripting framework at low development cost. We validate our proposal by analyzing traces from the full-fledged implementation of the Cholesky decomposition available in the MORSE library, running on a hybrid (CPU/GPU) platform. The analysis compares two different workloads and three different task schedulers from the StarPU runtime system. Our analysis, based on composite views, makes it possible to identify allocation mistakes, priority problems in scheduling decisions, GPU task anomalies that degrade performance, and critical path issues.
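As a rough illustration of the scripting style the abstract describes, the sketch below computes the busy fraction of each resource from a trace dumped as CSV. It is written in Python rather than R and is not the authors' code. The column layout (Nature, Container, Type, Start, End, Duration, Depth, Value), the sample rows, the `Idle` state name, and the function `busy_fraction` are all assumptions made for this example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in an assumed pjdump-like CSV layout:
# Nature, Container, Type, Start, End, Duration, Depth, Value
SAMPLE = """\
State, CPU0, Worker State, 0.0, 4.0, 4.0, 0.0, dpotrf
State, CPU0, Worker State, 4.0, 6.0, 2.0, 0.0, Idle
State, CUDA0, Worker State, 0.0, 1.0, 1.0, 0.0, Idle
State, CUDA0, Worker State, 1.0, 6.0, 5.0, 0.0, dgemm
"""


def busy_fraction(text, idle_values=("Idle",)):
    """Return {container: busy time / makespan} for the state rows in text."""
    busy = defaultdict(float)
    makespan = 0.0
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        # Skip anything that is not a complete state record.
        if len(row) < 8 or row[0] != "State":
            continue
        container = row[1]
        start, end = float(row[3]), float(row[4])
        value = row[7].strip()
        makespan = max(makespan, end)
        # Register the container even if it only ever idles.
        busy[container] += 0.0 if value in idle_values else end - start
    if makespan == 0.0:
        return {}
    return {name: time / makespan for name, time in busy.items()}


result = busy_fraction(SAMPLE)
for name, fraction in sorted(result.items()):
    print(f"{name}: {fraction:.2f}")
```

A per-resource summary of this kind is the sort of table that could then be handed to a plotting layer. In the paper's toolchain that role is played by R with ggplot2 and plotly.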


    Published In

    cover image ACM Conferences
    VPA '16: Proceedings of the 3rd International Workshop on Visual Performance Analysis
    November 2016
    34 pages
    ISBN:9781509052264

    Publisher

    IEEE Press

    Conference

    SC16

    Acceptance Rates

    Overall Acceptance Rate 5 of 6 submissions, 83%
