An efficient CELL library for lattice quantum chromodynamics

Published: 14 January 2011

Abstract

Quantum chromodynamics (QCD) is the theory of subnuclear physics; it models the strong nuclear force, which is responsible for the interactions of nuclear particles. Numerical QCD studies are performed through a discrete formalism called lattice quantum chromodynamics (LQCD). Typical simulations involve very large volumes of data and numerically sensitive quantities, hence the crucial need for high-performance computing systems. We propose a set of CELL-accelerated routines for basic LQCD calculations. Our framework is provided as a unified library and is particularly optimized for iterative use. Each routine is parallelized among the SPUs, and each SPU achieves its task by looping over small chunks of arrays from main memory. Our SPU implementation is vectorized with double-precision data, and the cooperation with the PPU shows a good overlap between data transfers and computations. Moreover, we keep the SPU context alive permanently and use mailboxes to synchronize between consecutive calls. We validate our library by using it to derive a CELL version of an existing LQCD package (tmLQCD). Experimental results on individual routines show a significant speedup compared to a standard processor, for instance 11 times faster than a 2.83 GHz Intel processor (without SSE). This ratio is around 9 (with a QS22 blade) when considering a more cooperative context such as solving a linear system of equations (usually referred to as Wilson-Dirac inversion). Our results clearly demonstrate that the CELL is a very promising platform for large-scale LQCD simulations.
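The execution model described in the abstract (persistent workers, chunked traversal of each worker's slice, mailbox synchronization between consecutive calls) can be illustrated with a minimal sketch. This is not the library's code: it uses Python threads and queues as stand-ins for SPUs and hardware mailboxes, and the names `Worker`, `axpy`, and `CHUNK` are hypothetical. It also does not model the DMA transfers, the overlap of transfers with computation, or the SIMD vectorization mentioned above.

```python
import queue
import threading

CHUNK = 4  # assumed chunk size; stands in for a small local-store buffer


class Worker:
    """Hypothetical stand-in for one SPU: a thread kept alive between calls."""

    def __init__(self):
        self.inbox = queue.Queue()   # "mailbox" from the controller (PPU role)
        self.outbox = queue.Queue()  # "mailbox" back to the controller
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            msg = self.inbox.get()   # block until the next command arrives
            if msg is None:          # shutdown request
                break
            alpha, x, y, lo, hi = msg
            # Walk the assigned slice [lo, hi) in small chunks.
            for start in range(lo, hi, CHUNK):
                stop = min(start + CHUNK, hi)
                local_x = x[start:stop]  # copy the chunk into a local buffer
                local_y = y[start:stop]
                y[start:stop] = [alpha * a + b
                                 for a, b in zip(local_x, local_y)]
            self.outbox.put("done")  # signal completion to the controller


def axpy(workers, alpha, x, y):
    """Compute y <- alpha * x + y in place, split across the workers."""
    n = len(x)
    step = (n + len(workers) - 1) // len(workers)
    active = []
    for i, w in enumerate(workers):
        lo, hi = i * step, min((i + 1) * step, n)
        if lo < hi:
            w.inbox.put((alpha, x, y, lo, hi))
            active.append(w)
    for w in active:
        w.outbox.get()               # wait for each worker's completion message
```

Because the workers are created once and reused, a second call to `axpy` pays only the cost of the mailbox messages, not of starting the workers again, which is the property the abstract highlights for iterative use.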


Cited By

  • (2020) Performance Analysis and Optimization of the Vector-Kronecker Product Multiplication. 2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 265-272. DOI: 10.1109/SBAC-PAD49847.2020.00044. Online publication date: Sep-2020
  • (2018) Harris corner detection on a NUMA manycore. Future Generation Computer Systems, vol. 88, pp. 442-452. DOI: 10.1016/j.future.2018.01.048. Online publication date: Nov-2018
  • (2011) Large Scale Kronecker Product on Supercomputers. Proceedings of the 2011 Second Workshop on Architecture and Multi-Core Applications (wamca 2011), pp. 1-4. DOI: 10.1109/WAMCA.2011.10. Online publication date: 26-Oct-2011

Published In

ACM SIGARCH Computer Architecture News  Volume 38, Issue 4
September 2010
96 pages
ISSN:0163-5964
DOI:10.1145/1926367

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CELL
  2. LQCD
  3. linear algebra
  4. parallelism

Qualifiers

  • Research-article
