DOI: 10.1007/978-3-030-27562-4_29
Article

Low Precision Processing for High Order Stencil Computations

Published: 07 July 2019

Abstract

Modern scientific workloads have demonstrated the inefficiency of using high-precision formats. Moving to a lower-bit format, or even to a different number system, can provide tremendous gains in performance and energy efficiency. In this article, we explore the applicability of different number formats and exhaustively search for the appropriate bit width for 3D complex stencil kernels, which are among the most widely used scientific kernels. Further, we demonstrate the achievable performance of these kernels on state-of-the-art hardware that includes a CPU and an FPGA, the latter being the only hardware that supports arbitrary fixed-point precision. Thus, this work fills the gap between current hardware capabilities and future systems for stencil-based scientific applications.
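To make the idea of a bit-width search concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a simple 7-point averaging stencil (the paper targets more complex, higher-order kernels), emulates signed fixed-point by rounding and saturating double-precision values, and keeps intermediate sums in double precision. The function names `quantize`, `stencil7`, and `smallest_width`, and the error tolerance, are all hypothetical choices made for illustration.

```python
import random


def quantize(x, frac_bits, total_bits):
    """Round x to a signed fixed-point value (total_bits wide, frac_bits fractional), saturating."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))
    return q / scale


def stencil7(grid, n, q=lambda v: v):
    """One sweep of a 7-point 3D averaging stencil over the interior of an n^3 grid.

    q is applied to every input and to the result to emulate reduced precision;
    the sum itself is accumulated in double precision (a simplification).
    """
    out = [[[grid[i][j][k] for k in range(n)] for j in range(n)] for i in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                s = (q(grid[i][j][k])
                     + q(grid[i - 1][j][k]) + q(grid[i + 1][j][k])
                     + q(grid[i][j - 1][k]) + q(grid[i][j + 1][k])
                     + q(grid[i][j][k - 1]) + q(grid[i][j][k + 1]))
                out[i][j][k] = q(s / 7.0)
    return out


def max_abs_error(a, b, n):
    """Largest absolute element-wise difference between two n^3 grids."""
    return max(abs(a[i][j][k] - b[i][j][k])
               for i in range(n) for j in range(n) for k in range(n))


def smallest_width(grid, n, int_bits, tol, max_bits=32):
    """Exhaustively try total widths, narrowest first, and return the first one
    whose result stays within tol of the double-precision reference.

    int_bits counts the sign bit plus the integer bits; the rest are fractional.
    Returns None if no width up to max_bits meets the tolerance.
    """
    ref = stencil7(grid, n)
    for total_bits in range(int_bits + 1, max_bits + 1):
        frac_bits = total_bits - int_bits
        q = lambda v, f=frac_bits, t=total_bits: quantize(v, f, t)
        if max_abs_error(stencil7(grid, n, q), ref, n) <= tol:
            return total_bits
    return None


if __name__ == "__main__":
    random.seed(0)
    n = 6
    grid = [[[random.random() for _ in range(n)] for _ in range(n)] for _ in range(n)]
    print(smallest_width(grid, n, int_bits=2, tol=1e-3))
```

In this sketch the accuracy criterion is the maximum absolute error against a double-precision run. A real study would likely use an application-specific metric and would also model the precision of intermediate accumulations.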

Cited By

  • (2022) Toward accelerated stencil computation by adapting tensor core unit on GPU. Proceedings of the 36th ACM International Conference on Supercomputing, pp. 1–12. DOI: 10.1145/3524059.3532392. Online publication date: 28-Jun-2022

Published In

Embedded Computer Systems: Architectures, Modeling, and Simulation: 19th International Conference, SAMOS 2019, Samos, Greece, July 7–11, 2019, Proceedings
Jul 2019
485 pages
ISBN: 978-3-030-27561-7
DOI: 10.1007/978-3-030-27562-4

Publisher

Springer-Verlag, Berlin, Heidelberg
