research-article

Financial software on GPUs: between Haskell and Fortran

Authors:

Fritz HengleinAuthors Info & Claims

FHPC '12: Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing

Pages 61 - 72

https://doi.org/10.1145/2364474.2364484

Published: 15 September 2012 Publication History

Get Access

Abstract

This paper presents a real-world pricing kernel for financial derivatives and evaluates the language and compiler tool chain that would allow expressive, hardware-neutral algorithm implementation and efficient execution on graphics-processing units (GPU). The language issues refer to preserving algorithmic invariants, e.g., inherent parallelism made explicit by map-reduce-scan functional combinators. Efficient execution is achieved by manually; applying a series of generally-applicable compiler transformations that allows the generated-OpenCL code to yield speedups as high as 70x and 540x on a commodity mobile and desktop GPU, respectively.

Apart from the concrete speed-ups attained, our contributions are twofold: First, from a language perspective;, we illustrate that even state-of-the-art auto-parallelization techniques are incapable of discovering all the requisite data parallelism when rendering the functional code in Fortran-style imperative array processing form. Second, from a performance perspective;, we study which compiler transformations are necessary to map the high-level functional code to hand-optimized OpenCL code for GPU execution. We discover a rich optimization space with nontrivial trade-offs and cost models. Memory reuse in map-reduce patterns, strength reduction, branch divergence optimization, and memory access coalescing, exhibit significant impact individually. When combined, they enable essentially full utilization of all GPU cores.

Functional programming has played a crucial double role in our case study: Capturing the naturally data-parallel structure of the pricing algorithm in a transparent, reusable and entirely hardware-independent fashion; and supporting the correctness of the subsequent compiler transformations to a hardware-oriented target language by a rich class of universally valid equational properties. Given the observed difficulty of automatically parallelizing imperative sequential code and the inherent labor of porting hardware-oriented and -optimized programs, our case study suggests that functional programming technology can facilitate high-level; expression of leading-edge performant portable; high-performance systems for massively parallel hardware architectures.

References

[1]

R. Allen and K. Kennedy. Optimizing Compilers for Modern Architectures. Morgan Kaufmann, 2002. ISBN 1-55860-286-0.

Abstract

References

Cited By

Index Terms

Recommendations

A coordinated tiling and batching framework for efficient GEMM on GPUs

Effective resource management for enhancing performance of 2D and 3D stencils on GPUs

Optimized two-level parallelization for GPU accelerators using the polyhedral model

Comments

Information

Published In

Sponsors

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

Login options

Full Access

View options

PDF

eReader

Share

Share this Publication link

Share on social media

Affiliations