DOI: 10.1145/2502323.2502329
research-article

ViperVM: a runtime system for parallel functional high-performance computing on heterogeneous architectures

Published: 23 September 2013

Abstract

The current trend in high-performance computing is to use heterogeneous architectures (i.e., multi-core processors combined with accelerators such as GPUs or Xeon Phi) because they offer very good performance-per-watt ratios. Programming these architectures is notoriously hard, so their use is still somewhat restricted to parallel programming experts. The situation is improving with frameworks that use high-level programming models to generate efficient computation kernels for these new accelerator architectures. However, an orthogonal issue is the efficient management of memory and kernel scheduling, especially on architectures containing multiple accelerators. Task-graph-based runtime systems have been a first step toward automating these tasks efficiently. However, they introduce new challenges of their own, such as task granularity adaptation, which cannot easily be automated.
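To make the task-graph approach concrete, here is a minimal Python sketch of how such a runtime might dispatch tasks to heterogeneous workers. It is an illustration under stated assumptions, not the scheduler of StarPU, ViperVM, or any other system: the `Task` record, its explicit dependency list, the per-architecture cost estimates, and the greedy earliest-finish-time rule are all hypothetical, and data transfers are ignored. Note that each task's size is fixed by whoever builds the graph, which is exactly the granularity issue mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    deps: list   # names of tasks whose results this task consumes
    cost: dict   # hypothetical estimated run time per worker kind, e.g. {"cpu": 4.0, "gpu": 1.0}


def schedule(tasks, workers):
    """Greedy list scheduling over a task graph (illustrative sketch).

    `workers` maps a worker name to its kind ("cpu" or "gpu"). A task becomes
    ready once all of its dependencies have finished; each ready task is placed
    on the worker where it would finish earliest. Data transfers are ignored.
    """
    finish = {}                          # task name -> finish time
    free_at = {w: 0.0 for w in workers}  # worker name -> time it becomes idle
    placement = {}                       # task name -> worker name
    remaining = list(tasks)
    while remaining:
        ready = [t for t in remaining if all(d in finish for d in t.deps)]
        if not ready:
            raise ValueError("task graph has a cycle or a missing dependency")
        for t in ready:
            # A task cannot start before all of its inputs are available.
            deps_done = max((finish[d] for d in t.deps), default=0.0)

            def finish_on(w):
                return max(free_at[w], deps_done) + t.cost[workers[w]]

            best = min(workers, key=finish_on)
            finish[t.name] = finish_on(best)
            free_at[best] = finish[t.name]
            placement[t.name] = best
            remaining.remove(t)
    return placement, finish


tasks = [
    Task("A", [], {"cpu": 4.0, "gpu": 1.0}),
    Task("B", [], {"cpu": 2.0, "gpu": 3.0}),
    Task("C", ["A", "B"], {"cpu": 2.0, "gpu": 1.0}),
]
workers = {"cpu0": "cpu", "gpu0": "gpu"}
placement, finish = schedule(tasks, workers)
print(placement)  # {'A': 'gpu0', 'B': 'cpu0', 'C': 'gpu0'}
print(finish)     # {'A': 1.0, 'B': 2.0, 'C': 3.0}
```

In this toy graph, A and C go to the GPU, where they are estimated to run faster, while B goes to the CPU; C waits until both of its inputs are ready.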
In this paper, we present a programming model and a preliminary implementation of a runtime system called ViperVM that takes advantage of parallel functional programming to extend task-graph-based runtime systems. The main idea is to replace dynamically created task graphs with pure functional programs that the runtime system evaluates in parallel. Programmers can associate kernels (written in OpenCL, CUDA, Fortran, etc.) with identifiers, which can then be used as pure functions in programs. During parallel evaluation, whenever the runtime system has to reduce one of these identifiers, it automatically schedules the corresponding kernel on an available accelerator. An extension of this mechanism consists of associating both a kernel and a functional expression with the same identifier and letting the runtime system decide whether to execute the kernel or to evaluate the expression. We show that this mechanism can be used to perform dynamic granularity adaptation.
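The identifier mechanism can be illustrated with a small Python sketch. Everything in it is hypothetical: the `Runtime` class, its `bind`/`call` methods, the `vsum` identifier, and the size threshold `max_kernel_len` used as the decision policy are illustrative names, not ViperVM's API. Python callables stand in for OpenCL/CUDA kernels, and evaluation here is sequential rather than parallel. The point is only to show how binding both a kernel and an expression to one identifier lets a runtime adapt granularity.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Binding:
    # What a (hypothetical) identifier may be reduced to.
    kernel: Optional[Callable] = None      # stands in for an OpenCL/CUDA kernel
    expression: Optional[Callable] = None  # pure definition in terms of other identifiers


class Runtime:
    def __init__(self, max_kernel_len):
        self.env = {}                         # identifier -> Binding
        self.max_kernel_len = max_kernel_len  # hypothetical decision threshold
        self.launches = []                    # log of (identifier, input size) kernel launches

    def bind(self, name, kernel=None, expression=None):
        if kernel is None and expression is None:
            raise ValueError(f"{name!r} needs a kernel or an expression")
        self.env[name] = Binding(kernel, expression)

    def call(self, name, xs):
        b = self.env[name]
        # Run the kernel if it is the only option or the input is small
        # enough; otherwise evaluate the functional expression instead.
        small = len(xs) <= max(self.max_kernel_len, 1)
        if b.kernel is not None and (b.expression is None or small):
            self.launches.append((name, len(xs)))
            return b.kernel(xs)
        return b.expression(self, xs)


def vsum_kernel(xs):
    # Placeholder for an accelerator kernel: a plain sequential sum.
    return sum(xs)


def vsum_expr(rt, xs):
    # Split the input and recombine; each half is again a call to "vsum",
    # so the runtime decides afresh at the smaller size.
    mid = len(xs) // 2
    return rt.call("vsum", xs[:mid]) + rt.call("vsum", xs[mid:])


rt = Runtime(max_kernel_len=4)
rt.bind("vsum", kernel=vsum_kernel, expression=vsum_expr)
total = rt.call("vsum", list(range(10)))
print(total, rt.launches)  # prints: 45 [('vsum', 2), ('vsum', 3), ('vsum', 2), ('vsum', 3)]
```

With `max_kernel_len=4`, the 10-element input is split by the expression until the pieces are small enough, giving four kernel launches; with a large threshold, the same call launches the kernel once on the whole input. A real runtime could instead base this choice on measured kernel performance and accelerator availability rather than on a fixed size threshold.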

Cited By

  • (2018) Exploiting high-performance heterogeneous hardware for Java programs using Graal. Proceedings of the 15th International Conference on Managed Languages & Runtimes, pp. 1-13. DOI: 10.1145/3237009.3237016. Online publication date: 12-Sep-2018
  • (2017) Heterogeneous Managed Runtime Systems. ACM SIGPLAN Notices 52(7), pp. 74-82. DOI: 10.1145/3140607.3050764. Online publication date: 8-Apr-2017
  • (2017) Heterogeneous Managed Runtime Systems. Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pp. 74-82. DOI: 10.1145/3050748.3050764. Online publication date: 8-Apr-2017


      Published In

      FHPC '13: Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
      September 2013
      104 pages
      ISBN: 9781450323819
      DOI: 10.1145/2502323

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. heterogeneous architectures
      2. high-performance computing
      3. parallel functional programming

      Qualifiers

      • Research-article

      Conference

      ICFP'13

      Acceptance Rates

      FHPC '13 Paper Acceptance Rate 8 of 14 submissions, 57%;
      Overall Acceptance Rate 18 of 25 submissions, 72%
