Solving the Global Atmospheric Equations through Heterogeneous Reconfigurable Platforms

Published: 25 March 2015

Abstract

One of the most essential and challenging components in climate modeling is the atmospheric model. To solve multiphysical atmospheric equations, developers have to face extremely complex stencil kernels that are costly in terms of both computing and memory resources. This article aims to accelerate the solution of the global shallow water equations (SWEs), one of the most essential equation sets describing atmospheric dynamics. We first design a hybrid methodology that employs both the host CPU cores and the field-programmable gate array (FPGA) accelerators to work in parallel. Through a careful adjustment of the computational domains, we achieve balanced resource utilization and a further improvement in overall performance. By decomposing the resource-demanding SWE kernel, we manage to map the double-precision algorithm onto three FPGAs. Moreover, by using fixed-point and reduced-precision floating-point arithmetic, we manage to build a fully pipelined mixed-precision design on a single FPGA, which can perform 428 floating-point and 235 fixed-point operations per cycle. The mixed-precision design with four FPGAs running together achieves a 20-fold speedup over a fully optimized design on a CPU rack with two eight-core processors and is 8 times faster than the fully optimized Kepler GPU design. As for power efficiency, the mixed-precision design with four FPGAs is 10 times more power efficient than a Tianhe-1A supercomputer node.
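
To make the fixed-point idea in the abstract concrete, the following is a minimal sketch in Python, not the authors' FPGA design. It applies a three-point smoothing stencil once in double precision and once in signed fixed-point arithmetic, then reports the largest difference. The stencil weights, the sine-wave input, the function names, and the bit widths are all illustrative assumptions; the article's SWE kernel and its chosen precisions are not reproduced here.

```python
import math


def to_fixed(x, frac_bits):
    """Quantize x to a signed fixed-point integer with frac_bits fractional bits."""
    return round(x * (1 << frac_bits))


def fixed_mul(a, b, frac_bits):
    """Multiply two fixed-point integers; the shift drops the extra fractional bits."""
    # The raw product carries 2 * frac_bits fractional bits. The arithmetic
    # right shift rounds toward negative infinity, like a simple truncating
    # hardware multiplier.
    return (a * b) >> frac_bits


def smooth_double(u):
    """Reference three-point stencil in double precision (hypothetical weights)."""
    return [0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[i + 1]
            for i in range(1, len(u) - 1)]


def smooth_fixed(u, frac_bits):
    """Same stencil evaluated entirely in fixed-point, converted back at the end."""
    scale = 1 << frac_bits
    q = [to_fixed(x, frac_bits) for x in u]
    c_side = to_fixed(0.25, frac_bits)
    c_mid = to_fixed(0.5, frac_bits)
    out = []
    for i in range(1, len(q) - 1):
        acc = (fixed_mul(c_side, q[i - 1], frac_bits)
               + fixed_mul(c_mid, q[i], frac_bits)
               + fixed_mul(c_side, q[i + 1], frac_bits))
        out.append(acc / scale)
    return out


def max_abs_error(frac_bits, n=64):
    """Largest deviation of the fixed-point stencil from the double-precision one."""
    u = [math.sin(2.0 * math.pi * i / n) for i in range(n)]  # assumed test input
    ref = smooth_double(u)
    approx = smooth_fixed(u, frac_bits)
    return max(abs(a - b) for a, b in zip(ref, approx))


if __name__ == "__main__":
    for bits in (16, 24):
        print(bits, max_abs_error(bits))
```

In this sketch the error is bounded by roughly 3.5 units in the last fractional place, so widening the format from 16 to 24 fractional bits tightens the bound by a factor of 256. On an FPGA every extra bit costs logic resources, which is the kind of trade-off a mixed-precision design has to balance against the accuracy the application requires.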

Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 8, Issue 2
Special Section on FPL 2013
April 2015
129 pages
ISSN:1936-7406
EISSN:1936-7414
DOI:10.1145/2746532
Editor: Steve Wilton
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 March 2015
Accepted: 01 March 2014
Revised: 01 March 2014
Received: 01 December 2013
Published in TRETS Volume 8, Issue 2

Author Tags

  1. Atmospheric equations
  2. FPGAs
  3. high performance computing
  4. hybrid algorithm
  5. mixed-precision design

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • National Natural Science Foundation of China
  • HiPEAC NoE
  • Maxeler University Program
  • National High-Tech R&D (863) Program of China
  • UK EPSRC
  • Xilinx
  • European Union Seventh Framework Programme
