Abstract
Subsurface simulations are computationally intensive and require considerable time to complete. To gain computational speed in simulating subsurface scenarios, this paper investigates the use of graphics processing units (GPUs). Accelerators such as GPUs have a different architecture from conventional central processing units, which necessitates a distinct approach. Various techniques well suited to GPUs and to simulating subsurface phenomena are explored, with an emphasis on groundwater flow problems. The finite volume method with implicit time stepping for solving groundwater scenarios is discussed, and the associated challenges are highlighted. Krylov solvers used in large-scale systems, along with preconditioners to improve convergence, are described in detail. An appropriate solver-preconditioner pair that provides speedup for groundwater flow scenarios is identified. The role of matrix storage formats in a GPU environment is examined, and recommendations that could further improve performance are made. The paper concludes by presenting simulation results that test the ideas explored on different generations of NVIDIA GPUs and the speedup attained on them.
Acknowledgements
The author expresses his gratitude to his colleagues at Canadian Nuclear Laboratories (CNL) for their help and valuable input. The author would also like to thank the PFLOTRAN and PETSc team members at Los Alamos National Laboratory (LANL) and Argonne National Laboratory (ANL), respectively, for the timely help they extended in clarifying questions that arose while adapting the codes for GPUs.
Ethics declarations
Funding
This research was funded by Atomic Energy of Canada Limited, under the Federal Nuclear Science and Technology Safety and Security Program, project 51200.02.09.
Conflict of interest
The author declares that he has no conflict of interest.
Appendix A: analysis of the flow scenario
This section presents additional details of the flow simulation on the three Tesla cards, involving different Krylov solvers and preconditioners. The discussion below helps clarify the simulation behavior described earlier.
Figure 13 illustrates the role of the Tesla K80 accelerator in reducing runtimes. The default BiCGStab solver and point-block ILU preconditioner are used to demonstrate the gain in speed. Table 4 shows the runtimes on all three Tesla cards with the DIA storage format, comparing the compute time for the different solver and preconditioner pairs. A time greater than 0.70E+06 s indicates a runtime of over a week. It is evident from Table 4 that, for the flow problem, GMRES/AMG provides the best performance on a GPU.
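The DIA (diagonal) format referenced in Table 4 stores a matrix as a small set of dense diagonals, which maps naturally onto the banded matrices produced by structured-grid finite volume discretizations and onto the contiguous memory accesses GPUs favor. The following is a minimal CPU-side sketch of the idea, not the paper's implementation; the function name and the scipy-style padding convention are illustrative assumptions:

```python
import numpy as np

def dia_spmv(offsets, data, x):
    """y = A @ x for A stored in DIA (diagonal) format.

    offsets -- diagonal offsets (0 = main, +k above, -k below)
    data    -- shape (len(offsets), n); data[i, j] holds the entry of
               column j on diagonal offsets[i] (scipy-style zero padding)
    """
    n = x.size
    y = np.zeros(n)
    for off, diag in zip(offsets, data):
        if off >= 0:
            y[:n - off] += diag[off:] * x[off:]        # rows 0 .. n-off-1
        else:
            y[-off:] += diag[:n + off] * x[:n + off]   # rows -off .. n-1
    return y

# Tridiagonal (1-D Laplacian-style) test matrix in DIA form
offsets = [-1, 0, 1]
data = np.array([[-1., -1., -1.,  0.],   # sub-diagonal (padded at the end)
                 [ 2.,  2.,  2.,  2.],   # main diagonal
                 [ 0., -1., -1., -1.]])  # super-diagonal (padded at the start)
x = np.array([1., 2., 3., 4.])
print(dia_spmv(offsets, data, x))        # matches the dense product A @ x
```

On a GPU, each row of `y` would be handled by one thread, with the diagonals read in coalesced fashion, which is why DIA performs well for the banded systems considered here.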
The numbers of linear and nonlinear iterations for GMRES/AMG, BiCGStab/AMG, and BiCGStab/ILU are reported in Table 5. The steps are in the range of 38–45, increasing with the problem size. Identical trends in iteration behavior are observed on the remaining cards for the three pairs, and their performance is not reported here to avoid repetition. GMRES/ILU is the least promising pair, exhibiting large iteration counts; for example, the linear and nonlinear iteration counts for the smallest 32 × 32 × 32 problem size were 604, 275, and 213, respectively.
A number of factors determine the efficiency of a solver, including the distribution of the eigenvalues of the system and the oscillatory convergence behavior involved. An effective preconditioner can overcome such disadvantages, and GMRES in the present scenario provides a good example of this effect. Restarted GMRES converges very slowly, and at times even stagnates, on large systems when used with ILU, but this is not the case when the AMG preconditioner is used. A well-implemented preconditioner can play a crucial role in improving performance, provided that it adequately exploits the parallelism offered by GPUs and also manages the constraints present. Figure 14 shows the variation in iteration count as the grid size is increased. Table 4 and Fig. 14 show that even though the iteration counts are lower for BiCGStab/AMG than for GMRES/AMG, the work performed per iteration is less in the latter, as exhibited by the slightly lower runtimes of GMRES/AMG in the present example.
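The effect of a preconditioner on iteration counts can be illustrated with a much smaller example than the solver/preconditioner pairs studied here. The sketch below uses conjugate gradient with a simple Jacobi (diagonal) preconditioner, rather than the BiCGStab/GMRES and ILU/AMG combinations of the paper, and an ill-scaled test matrix chosen purely for illustration; all names are hypothetical:

```python
import numpy as np

def pcg(A, b, m_inv=None, tol=1e-8, maxit=2000):
    """Conjugate gradient with an optional Jacobi (diagonal)
    preconditioner; returns the solution and the iteration count."""
    x = np.zeros(b.size)
    r = b - A @ x
    z = r * m_inv if m_inv is not None else r
    p = z.copy()
    rz = r @ z
    for k in range(1, maxit + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = r * m_inv if m_inv is not None else r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxit

# Ill-scaled SPD matrix: a 1-D Laplacian plus a widely varying diagonal
n = 200
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A += np.diag(np.logspace(0, 4, n))       # spreads the eigenvalues widely

b = np.ones(n)
_, it_plain = pcg(A, b)                           # no preconditioner
_, it_jacobi = pcg(A, b, m_inv=1.0 / np.diag(A))  # Jacobi scaling
print(it_plain, it_jacobi)  # Jacobi needs far fewer iterations
```

The same trade-off noted above applies: a stronger preconditioner (such as AMG) costs more per iteration than this diagonal scaling, so the lowest iteration count does not automatically yield the lowest runtime.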
Cite this article
Das, R. GPUs in subsurface simulation: an investigation. Engineering with Computers 33, 919–934 (2017). https://doi.org/10.1007/s00366-017-0506-1