DOI: 10.1145/2063384.2063396

Liszt: a domain specific language for building portable mesh-based PDE solvers

Published: 12 November 2011

Abstract

Heterogeneous computers with processors and accelerators are becoming widespread in scientific computing. However, it is difficult to program hybrid architectures and there is no commonly accepted programming model. Ideally, applications should be written in a way that is portable to many platforms, but providing this portability for general programs is a hard problem.
By restricting the class of programs considered, we can make this portability feasible. We present Liszt, a domain-specific language for constructing mesh-based PDE solvers. We introduce language statements for interacting with an unstructured mesh, and storing data at its elements. Program analysis of these statements enables our compiler to expose the parallelism, locality, and synchronization of Liszt programs. Using this analysis, we generate applications for multiple platforms: a cluster, an SMP, and a GPU. This approach allows Liszt applications to perform within 12% of hand-written C++, scale to large clusters, and experience order-of-magnitude speedups on GPUs.
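
To make the idea of mesh statements and element-level data more concrete, the following is a minimal sketch in plain Python. It is not Liszt syntax and not the authors' implementation. The tiny mesh, the `temperature` field, and the helper names `color_edges` and `accumulate_flux` are all hypothetical, and greedy edge coloring is used here only as one assumed way a compiler could schedule an edge loop once it knows which elements each iteration writes.

```python
# Illustrative sketch only: plain Python, not Liszt syntax.
from collections import defaultdict

# A tiny unstructured mesh: 4 vertices, 5 edges (two triangles sharing edge 1-2).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]

# A field: data stored at mesh elements (here, one value per vertex).
temperature = [1.0, 2.0, 4.0, 8.0]


def color_edges(edges):
    """Greedy coloring: edges that share a vertex receive different colors."""
    used = defaultdict(set)  # vertex -> colors already taken at that vertex
    colors = []
    for a, b in edges:
        c = 0
        while c in used[a] or c in used[b]:
            c += 1
        used[a].add(c)
        used[b].add(c)
        colors.append(c)
    return colors


def accumulate_flux(edges, temperature):
    """Edge loop that reads vertex data and writes to both endpoints."""
    flux = [0.0] * len(temperature)
    colors = color_edges(edges)
    for color in range(max(colors) + 1):
        # Edges of one color touch disjoint vertices, so the iterations of
        # this inner loop could run in parallel without write conflicts.
        for (a, b), c in zip(edges, colors):
            if c == color:
                d = temperature[b] - temperature[a]
                flux[a] += d
                flux[b] -= d
    return flux


if __name__ == "__main__":
    print(color_edges(edges))
    print(accumulate_flux(edges, temperature))
```

Because the loop body touches the mesh only through the edge's two endpoints, the set of written elements is known before the loop runs. That is the kind of information the abstract says the compiler extracts to reason about parallelism and synchronization.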

Published In

SC '11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
November 2011
866 pages
ISBN:9781450307710
DOI:10.1145/2063384

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. compiler analysis and program transformations
  2. programming and runtime environments for high performance and high throughput computing

Qualifiers

  • Research-article

Conference

SC '11

Acceptance Rates

SC '11 Paper Acceptance Rate 74 of 352 submissions, 21%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Article Metrics

  • Downloads (Last 12 months): 42
  • Downloads (Last 6 weeks): 9
Reflects downloads up to 12 Jan 2025

Cited By

  • (2024) A Mesh-based Simulation Framework using Automatic Code Generation. ACM Transactions on Graphics 43:6 (1-17). DOI: 10.1145/3687986. Online publication date: 19-Dec-2024
  • (2024) IMESH: A DSL for Mesh Processing. ACM Transactions on Graphics 43:5 (1-17). DOI: 10.1145/3662181. Online publication date: 25-Jun-2024
  • (2024) A shared compilation stack for distributed-memory parallelism in stencil DSLs. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (38-56). DOI: 10.1145/3620666.3651344. Online publication date: 27-Apr-2024
  • (2023) MOD2IR: High-Performance Code Generation for a Biophysically Detailed Neuronal Simulation DSL. Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction (203-215). DOI: 10.1145/3578360.3580268. Online publication date: 17-Feb-2023
  • (2022) MeshTaichi. ACM Transactions on Graphics 41:6 (1-17). DOI: 10.1145/3550454.3555430. Online publication date: 30-Nov-2022
  • (2022) Increasing ising machine capacity with multi-chip architectures. Proceedings of the 49th Annual International Symposium on Computer Architecture (508-521). DOI: 10.1145/3470496.3527414. Online publication date: 18-Jun-2022
  • (2022) Automated generation of High-Performance Computational Fluid Dynamics Codes. Journal of Computational Science 61 (101664). DOI: 10.1016/j.jocs.2022.101664. Online publication date: May-2022
  • (2021) NeuMIP. ACM Transactions on Graphics 40:4 (1-13). DOI: 10.1145/3450626.3459795. Online publication date: 19-Jul-2021
  • (2021) Real-time locally injective volumetric deformation. ACM Transactions on Graphics 40:4 (1-16). DOI: 10.1145/3450626.3459794. Online publication date: 19-Jul-2021
  • (2021) Capturing detailed deformations of moving human bodies. ACM Transactions on Graphics 40:4 (1-18). DOI: 10.1145/3450626.3459792. Online publication date: 19-Jul-2021