Abstract
GPUs excel at highly parallel problems and can therefore dramatically increase computational performance. In electrodynamics and many other fields, the finite-difference time-domain (FDTD) method is widely used because of its simplicity, accuracy, and practicality. In this paper, we implement the FDTD method on Fermi architecture GPUs, NVIDIA's latest product, to better understand Fermi's new features, such as double-precision support and the improved memory hierarchy. We then compare two strategies: using shared memory, which is the traditional optimization method on GPUs, and using only the L1 cache. Next, we provide insights into the performance disparity between these two strategies. We demonstrate that parallel computations using only the L1 cache can achieve performance similar to, or even better than, the traditional shared-memory optimization when the dataset is not too large or the related data are seldom reused.
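As a rough illustration of the data-reuse pattern mentioned above, the following is a minimal sketch of one FDTD leapfrog step in one dimension with normalized fields. It is not the authors' three-dimensional GPU implementation; the function name `fdtd_1d_step`, the normalized units, and the Courant number of 0.5 are assumptions made only for this example.

```python
def fdtd_1d_step(ez, hy, courant=0.5):
    """Advance normalized 1D fields by one leapfrog time step, in place.

    ez has n samples on integer grid points; hy has n - 1 samples on the
    half-integer points between them (a staggered Yee-style grid).
    """
    n = len(ez)
    # H update: each hy[k] reads the two E values on either side of it.
    for k in range(n - 1):
        hy[k] += courant * (ez[k + 1] - ez[k])
    # E update: each interior ez[k] reads the two H values around it.
    # ez[0] and ez[n - 1] are never written, so they keep their initial
    # values (zero here, acting as perfectly conducting walls).
    for k in range(1, n - 1):
        ez[k] += courant * (hy[k] - hy[k - 1])
    return ez, hy


# A unit pulse in the middle of a five-point grid spreads to its neighbours.
ez, hy = fdtd_1d_step([0.0, 0.0, 1.0, 0.0, 0.0], [0.0] * 4)
print(ez)  # [0.0, 0.25, 0.5, 0.25, 0.0]
```

Every field value here is read by two neighbouring updates. In three dimensions each value is read by more neighbouring updates, and this repeated use is what either shared memory or the L1 cache can serve.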
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hou, K., Zhao, Y., Huang, J., Zhang, L. (2011). Performance Evaluation of the Three-Dimensional Finite-Difference Time-Domain(FDTD) Method on Fermi Architecture GPUs. In: Xiang, Y., Cuzzocrea, A., Hobbs, M., Zhou, W. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2011. Lecture Notes in Computer Science, vol 7016. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24650-0_40
DOI: https://doi.org/10.1007/978-3-642-24650-0_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24649-4
Online ISBN: 978-3-642-24650-0
eBook Packages: Computer Science, Computer Science (R0)