Article

Low Precision Processing for High Order Stencil Computations

Authors: Gagandeep Singh, Dionysios Diamantopoulos, Christoph Hagleitner, Henk Corporaal

Embedded Computer Systems: Architectures, Modeling, and Simulation: 19th International Conference, SAMOS 2019, Samos, Greece, July 7–11, 2019, Proceedings

Pages 403–415

https://doi.org/10.1007/978-3-030-27562-4_29

Published: 07 July 2019

Abstract

Modern scientific workloads have demonstrated the inefficiency of using high-precision formats. Moving to a lower-bit format, or even to a different number system, can provide tremendous gains in performance and energy efficiency. In this article, we explore the applicability of different number formats and exhaustively search for the appropriate bit width for 3D complex stencil kernels, which are among the most widely used scientific kernels. Further, we demonstrate the achievable performance of these kernels on state-of-the-art hardware comprising a CPU and an FPGA, the latter being the only hardware that supports arbitrary fixed-point precision. Thus, this work fills the gap between current hardware capabilities and future systems for stencil-based scientific applications.
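To make the idea of an exhaustive bit-width search concrete, the following is a minimal sketch and not the authors' method. The abstract does not specify the kernels, coefficients, error metric, or tolerance, so everything here is an assumption chosen for illustration: a simple 7-point 3D stencil stands in for the paper's complex stencils, the coefficients 0.4 and 0.1 are arbitrary, fixed-point rounding is emulated in double precision, and the names `quantize`, `stencil_7pt`, `max_abs_error`, `smallest_width`, and `make_grid` are hypothetical.

```python
import random


def quantize(x, total_bits, frac_bits):
    """Round x to a signed fixed-point value, saturating at the format limits."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, round(x * scale))) / scale


def stencil_7pt(grid, n, q=lambda v: v):
    """One sweep of an assumed 7-point stencil over the interior of an n^3 grid.

    q is applied to inputs, coefficients, and the result to emulate a
    reduced-precision format; the default identity gives the double reference.
    """
    c0, c1 = q(0.4), q(0.1)  # illustrative coefficients, not from the paper
    out = {}
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                neighbours = (
                    q(grid[i - 1][j][k]) + q(grid[i + 1][j][k])
                    + q(grid[i][j - 1][k]) + q(grid[i][j + 1][k])
                    + q(grid[i][j][k - 1]) + q(grid[i][j][k + 1])
                )
                out[(i, j, k)] = q(c0 * q(grid[i][j][k]) + c1 * neighbours)
    return out


def max_abs_error(grid, n, total_bits, int_bits=1):
    """Largest deviation of the fixed-point sweep from the double reference."""
    frac_bits = total_bits - 1 - int_bits  # one sign bit, int_bits integer bits
    ref = stencil_7pt(grid, n)
    low = stencil_7pt(grid, n, lambda v: quantize(v, total_bits, frac_bits))
    return max(abs(ref[p] - low[p]) for p in ref)


def smallest_width(grid, n, tol, widths=range(8, 33)):
    """Try every candidate width in order; return the first one within tol."""
    for w in widths:
        if max_abs_error(grid, n, w) <= tol:
            return w
    return None


def make_grid(n, seed=0):
    """Random n^3 test field with values in [-1, 1]."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in range(n)]
             for _ in range(n)] for _ in range(n)]


if __name__ == "__main__":
    g = make_grid(6)
    print("smallest width within 1e-3:", smallest_width(g, 6, 1e-3))
```

A real study would also vary the split between integer and fractional bits and would evaluate accuracy on application data rather than a random field; the sketch fixes one integer bit only to keep the search one-dimensional.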



Published In

Embedded Computer Systems: Architectures, Modeling, and Simulation: 19th International Conference, SAMOS 2019, Samos, Greece, July 7–11, 2019, Proceedings

Jul 2019, 485 pages

ISBN: 978-3-030-27561-7

DOI: 10.1007/978-3-030-27562-4

Editors: Dionisios N. Pnevmatikatos (Technical University of Crete and ICS - FORTH, Chania, Greece), Maxime Pelcat (INSA Rennes, Rennes Cedex 7, France), Matthias Jung (Fraunhofer IESE, Kaiserslautern, Germany)

© Springer Nature Switzerland AG 2019.

Publisher: Springer-Verlag, Berlin, Heidelberg
