Abstract
The fast multipole method (FMM) is commonly used to reduce the time to solution of a wide variety of N-body type problems. To use the FMM, the elements that constitute the geometry of the problem are clustered into groups of a given size (D), and this size can strongly affect the time to solution. The optimal value of D, in the sense of minimizing the time cost, is not known beforehand and depends on several factors. Nevertheless, during the solver setup, it is possible to produce clusters of varying size D and to estimate the time to solution associated with each D. In this paper, we use octree structures to efficiently perform clustering and time cost estimation in order to find the optimal group size for FMM implementations on a heterogeneous architecture. Two different frameworks are analyzed: single-level FMM and fast Fourier transform FMM (FMM-FFT). We found that the sensitivity of the time cost to the parameter D depends on factors such as the problem size and the implementation of the FMM framework. Moreover, we observed that the time cost may be considerably reduced if a proper D is employed.
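The idea of sweeping over candidate group sizes during setup can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: the function names (`cluster`, `estimated_cost`, `best_group_size`), the `wavenumber` parameter, and in particular the operation-count model are assumptions made here for illustration, and a single level of cubic cells stands in for the octree.

```python
import math
import random
from collections import Counter
from itertools import product


def cluster(points, d):
    """Count elements per cubic group of edge length d (one octree-like level)."""
    return Counter(tuple(math.floor(c / d) for c in p) for p in points)


def estimated_cost(points, d, wavenumber=1.0):
    """Toy operation count for a single-level FMM with group size d.

    Assumed model, not taken from the paper: near-field work is the number
    of direct element pairs in adjacent groups; far-field work is an assumed
    sample count per group times the number of non-adjacent group pairs,
    plus aggregation and disaggregation of every element.
    """
    groups = cluster(points, d)
    near = 0
    adjacent_pairs = 0  # ordered pairs of non-empty adjacent groups, self included
    for (i, j, k), n in groups.items():
        for di, dj, dk in product((-1, 0, 1), repeat=3):
            m = groups.get((i + di, j + dj, k + dk), 0)
            near += n * m
            if m:
                adjacent_pairs += 1
    # Assumption: samples per group grow like (wavenumber * d)^2.
    samples = math.ceil((wavenumber * d) ** 2) + 1
    far_pairs = len(groups) ** 2 - adjacent_pairs
    far = samples * (far_pairs + 2 * len(points))
    return near + far


def best_group_size(points, candidates, wavenumber=1.0):
    """Return the candidate d with the lowest estimated cost, and all costs."""
    costs = {d: estimated_cost(points, d, wavenumber) for d in candidates}
    return min(costs, key=costs.get), costs


if __name__ == "__main__":
    random.seed(0)
    pts = [tuple(random.uniform(0.0, 10.0) for _ in range(3)) for _ in range(2000)]
    d_opt, all_costs = best_group_size(pts, [0.5, 1.0, 2.0, 4.0, 8.0])
    print("estimated best group size:", d_opt)
```

In practice the estimated cost would be replaced by time estimates for the target CPU and GPU kernels; the sketch only shows the structure of the search: cluster for each candidate D, estimate, and keep the minimum.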
Notes
4 cores at 3.6 GHz (Hyper-Threading enabled and Turbo Boost disabled).
1536 CUDA cores and 2 GB of GDDR5.
Note that \(k_\mathrm{l} \propto \sqrt{N_\mathrm{g}}\), so its drop is slower than the rise of \(N_\mathrm{g}\) (or of \(Q\propto N_\mathrm{g}^{1.5}\)).
Acknowledgments
This work has been supported by the “Ministerio de Economía y Competitividad” of Spain / FEDER under Projects TEC2014-54005-P and TEC2015-67387-C4-3-R; and by the “Gobierno del Principado de Asturias” / FEDER under Project FC-15-GRUPIN14-114.
Cite this article
López-Fernández, J.A., López-Portugués, M. & Ranilla, J. Improving the FMM performance using optimal group size on heterogeneous system architectures. J Supercomput 73, 291–301 (2017). https://doi.org/10.1007/s11227-016-1860-2
DOI: https://doi.org/10.1007/s11227-016-1860-2