DOI: 10.1109/SC.2014.11
research-article

Lattice QCD with domain decomposition on Intel® Xeon Phi co-processors

Published: 16 November 2014

Abstract

The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice Quantum Chromodynamics and implement such an alternative solver algorithm, based on domain decomposition, on Intel® Xeon Phi co-processor (KNC) clusters. We demonstrate close-to-linear on-chip scaling to all 60 cores of the KNC. With a mix of single- and half-precision the domain-decomposition method sustains 400-500 Gflop/s per chip. Compared to an optimized KNC implementation of a standard solver [1], our full multi-node domain-decomposition solver strong-scales to more nodes and reduces the time-to-solution by a factor of 5.
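
To illustrate the kind of domain-decomposition iteration the abstract refers to, here is a minimal sketch of a two-color multiplicative Schwarz sweep. It is not the authors' implementation: it assumes a toy 1D operator `A = tridiag(-1, 2 + mass, -1)` in place of the lattice Dirac operator, uses double precision rather than the single/half-precision mix mentioned above, and runs the sweep as a stand-alone iteration rather than as a preconditioner. All function names and parameters (`apply_A`, `solve_block`, `schwarz_sweep`, `mass`, `block_size`) are hypothetical.

```python
import math


def apply_A(x, mass):
    """Apply the toy operator A = tridiag(-1, 2 + mass, -1), zero boundaries."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append((2.0 + mass) * x[i] - left - right)
    return y


def solve_block(rhs, mass):
    """Solve A restricted to one block exactly (Thomas algorithm)."""
    n = len(rhs)
    diag = 2.0 + mass
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = -1.0 / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag + c[i - 1]  # diag - (-1) * c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def schwarz_sweep(x, b, mass, block_size):
    """One sweep: correct all even-numbered blocks, then all odd-numbered ones."""
    n = len(x)
    starts = list(range(0, n, block_size))
    for color in (0, 1):
        for k, s in enumerate(starts):
            if k % 2 != color:
                continue
            e = min(s + block_size, n)
            # Residual on this block, using the latest values of x everywhere.
            Ax = apply_A(x, mass)
            r_block = [b[i] - Ax[i] for i in range(s, e)]
            # Local solve touches only data inside the block.
            corr = solve_block(r_block, mass)
            for j, i in enumerate(range(s, e)):
                x[i] += corr[j]
    return x


def residual_norm(x, b, mass):
    """Euclidean norm of b - A x."""
    Ax = apply_A(x, mass)
    return math.sqrt(sum((bi - ai) ** 2 for bi, ai in zip(b, Ax)))


def schwarz_solve(b, mass=1.0, block_size=8, sweeps=20):
    """Run a fixed number of Schwarz sweeps starting from x = 0."""
    x = [0.0] * len(b)
    for _ in range(sweeps):
        schwarz_sweep(x, b, mass, block_size)
    return x


if __name__ == "__main__":
    b = [1.0] * 32
    x = schwarz_solve(b)
    print("final residual norm:", residual_norm(x, b, 1.0))
```

The point of the sketch is that each block solve reads and writes only its own sites, so neighboring data is needed only when the residual is refreshed between colors. That is the data-movement saving the abstract describes, shown here in a much simpler setting.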

References

[1] B. Joó, D. D. Kalamkar, K. Vaidyanathan, M. Smelyanskiy, K. Pamnany, V. W. Lee, P. Dubey, and W. Watson III, "Lattice QCD on Intel Xeon Phi Coprocessors," in Supercomputing, ser. Lecture Notes in Computer Science, J. M. Kunkel, T. Ludwig, and H. W. Meuer, Eds. Springer Berlin Heidelberg, 2013, vol. 7905, pp. 40-54. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-38750-0_4
[2] K. Bergman et al., "ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems," P. Kogge, Editor & Study Lead, 2008.
[3] K. Vaidyanathan, K. Pamnany, D. D. Kalamkar, A. Heinecke, M. Smelyanskiy, J. Park, D. Kim, A. Shet, B. Kaul, B. Joo, and P. Dubey, "Improving Communication Performance and Scalability of Native Applications on Intel Xeon Phi Coprocessor Clusters," in IPDPS 2014 (28th IEEE International Parallel & Distributed Processing Symposium), to be published.
[4] A. Nguyen, N. Satish, J. Chhugani, C. Kim, and P. Dubey, "3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs," in Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC '10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 1-13. [Online]. Available: http://dx.doi.org/10.1109/SC.2010.2
[5] K. Wilson, "Quarks and strings on a lattice," in New Phenomena in Subnuclear Physics, ser. The Subnuclear Series, A. Zichichi, Ed. Springer US, 1977, vol. 13, pp. 69-142. [Online]. Available: http://dx.doi.org/10.1007/978-1-4613-4208-3_6
[6] B. Sheikholeslami and R. Wohlert, "Improved Continuum Limit Lattice Action for QCD with Wilson Fermions," Nucl. Phys., vol. B259, p. 572, 1985.
[7] M. R. Hestenes and E. Stiefel, "Methods of conjugate gradients for solving linear systems," Journal of Research of the National Bureau of Standards, vol. 49, pp. 409-436, 1952.
[8] Y. Saad and M. Schultz, "GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems," SIAM Journal on Scientific and Statistical Computing, vol. 7, no. 3, pp. 856-869, 1986. [Online]. Available: http://epubs.siam.org/doi/abs/10.1137/0907058
[9] H. A. van der Vorst, "BI-CGSTAB: A Fast and Smoothly Converging Variant of BI-CG for the Solution of Nonsymmetric Linear Systems," SIAM J. Sci. Stat. Comput., vol. 13, no. 2, pp. 631-644, Mar. 1992. [Online]. Available: http://dx.doi.org/10.1137/0913035
[10] A. Frommer, A. Nobile, and P. Zingler, "Deflation and Flexible SAP-Preconditioning of GMRES in Lattice QCD Simulation," 2012. [Online]. Available: http://arxiv.org/abs/1204.5463
[11] H. A. Schwarz, "Über einen Grenzübergang durch alternierendes Verfahren," in Vierteljahrsschrift der Naturforschenden Gesellschaft in Zürich, 1870, vol. 15, pp. 272-286.
[12] M. Lüscher, "Solution of the Dirac equation in lattice QCD using a domain decomposition method," Comput. Phys. Commun., vol. 156, pp. 209-220, 2004.
[13] Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd ed. Society for Industrial and Applied Mathematics, 2003.
[14] M. Lüscher, "Computational strategies in lattice QCD," in Modern Perspectives in Lattice QCD: Quantum Field Theory and High Performance Computing, L. Lellouch, R. Sommer, B. Svetitsky, A. Vladikas, and L. F. Cugliandolo, Eds., vol. Les Houches 2009, Session XCIII, 2009, pp. 331-399.
[15] M. Lüscher, "Schwarz-preconditioned HMC algorithm for two-flavour lattice QCD," Comput. Phys. Commun., vol. 165, pp. 199-220, 2005.
[16] G. Bali, P. Bruns, S. Collins, M. Deka, B. Gläßle et al., "Nucleon mass and sigma term from lattice QCD with two light fermion flavors," Nucl. Phys., vol. B866, pp. 1-25, 2013.
[17] S. Dürr, Z. Fodor, C. Hölbling, R. Hoffmann, S. Katz et al., "Scaling study of dynamical smeared-link clover fermions," Phys. Rev., vol. D79, p. 014501, 2009.
[18] S. Duane, A. D. Kennedy, B. Pendleton, and D. Roweth, "Hybrid Monte Carlo," Phys. Lett., vol. B195, pp. 216-222, 1987.
[19] R. G. Edwards and B. Joo, "The Chroma software system for lattice QCD," Nucl. Phys. Proc. Suppl., vol. 140, p. 832, 2005.
[20] R. Babich, J. Brannick, R. Brower, M. Clark, T. Manteuffel et al., "Adaptive multigrid algorithm for the lattice Wilson-Dirac operator," Phys. Rev. Lett., vol. 105, p. 201602, 2010.
[21] J. Osborn, R. Babich, J. Brannick, R. Brower, M. Clark et al., "Multigrid solver for clover fermions," PoS, vol. LATTICE2010, p. 037, 2010.
[22] A. Frommer, K. Kahl, S. Krieg, B. Leder, and M. Rottmann, "Adaptive Aggregation Based Domain Decomposition Multigrid for the Lattice Wilson Dirac Operator," 2013. [Online]. Available: http://arxiv.org/abs/1303.1377
[23] A. Frommer, K. Kahl, S. Krieg, B. Leder, and M. Rottmann, "An Adaptive Aggregation Based Domain Decomposition Multilevel Method for the Lattice Wilson Dirac Operator: Multilevel Results," 2013. [Online]. Available: http://arxiv.org/abs/1307.6101
[24] M. Lüscher, "Local coherence and deflation of the low quark modes in lattice QCD," JHEP, vol. 0707, p. 081, 2007.
[25] R. Babich, M. A. Clark, B. Joó, G. Shi, R. C. Brower, and S. Gottlieb, "Scaling Lattice QCD Beyond 100 GPUs," in Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC '11. New York, NY, USA: ACM, 2011, pp. 70:1-70:11. [Online]. Available: http://doi.acm.org/10.1145/2063384.2063478
[26] Y. Osaki and K.-I. Ishikawa, "Domain Decomposition method on GPU cluster," PoS, vol. LATTICE2010, p. 036, 2010.

Information

Published In

SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2014
1054 pages
ISBN:9781479955008
  • General Chair: Trish Damkroger
  • Program Chair: Jack Dongarra

Publisher

IEEE Press

Author Tags

  1. Intel® Xeon Phi coprocessor
  2. domain decomposition
  3. lattice QCD

Qualifiers

  • Research-article

Conference

SC '14
Acceptance Rates

SC '14 Paper Acceptance Rate: 83 of 394 submissions, 21%
Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%

Cited By

  • (2017) Tessellating stencils. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-13. DOI: 10.1145/3126908.3126920. Online publication date: 12-Nov-2017
  • (2017) Efficient array slicing on the Intel Xeon Phi coprocessor. Proceedings of the 4th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, pp. 40-47. DOI: 10.1145/3091966.3091975. Online publication date: 18-Jun-2017
  • (2017) Asynchronous and synchronous models of executions on Intel Xeon Phi coprocessor systems for high performance of long wave radiation calculations in atmosphere models. Journal of Parallel and Distributed Computing, vol. 102:C, pp. 199-212. DOI: 10.1016/j.jpdc.2016.12.018. Online publication date: 1-Apr-2017
  • (2015) Full correlation matrix analysis of fMRI data on Intel® Xeon Phi™ coprocessors. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12. DOI: 10.1145/2807591.2807631. Online publication date: 15-Nov-2015
