A structure-exploiting numbering algorithm for finite elements on extruded meshes, and its performance evaluation in Firedrake

GT Bercea, ATT McRae, DA Ham… - Geoscientific Model …, 2016 - gmd.copernicus.org
Geoscientific Model Development, 2016
Abstract
We present a generic algorithm for numbering and then efficiently iterating over the data values attached to an extruded mesh. An extruded mesh is formed by replicating an existing mesh, assumed to be unstructured, to form layers of prismatic cells. Applications of extruded meshes include, but are not limited to, the representation of three-dimensional high-aspect-ratio domains employed by geophysical finite element simulations. These meshes are structured in the extruded direction. The algorithm presented here exploits this structure to avoid the performance penalty traditionally associated with unstructured meshes. We evaluate the implementation of this algorithm in the Firedrake finite element system on a range of low-compute-intensity operations, which constitute worst cases for data layout performance exploration. The experiments show that having structure along the extruded direction enables the cost of the indirect data accesses to be amortized after 10–20 layers, as long as the underlying mesh is well ordered. We characterize the resulting spatial and temporal reuse in a representative set of both continuous-Galerkin and discontinuous-Galerkin discretizations. On meshes with realistic numbers of layers, the performance achieved is between 70 % and 90 % of a theoretical hardware-specific limit.
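The idea of amortizing indirect accesses over a column can be illustrated with a minimal sketch. This is not Firedrake's implementation; it only assumes what the abstract states (an unstructured base mesh, structure in the extruded direction) plus two simplifications of our own: one data value per vertex, and values in each vertical column numbered contiguously from bottom to top. All function and variable names (`column_numbering`, `cell_dofs`, `iterate_columns`, `bottom_map`, `offsets`) are hypothetical.

```python
def column_numbering(base_cells, num_layers):
    """Build the indirection map for the bottom cell of every column.

    base_cells: non-empty list of vertex tuples describing the unstructured
    base mesh. Assumption: each base vertex owns a column of num_layers + 1
    values, numbered contiguously from bottom to top.
    """
    dofs_per_column = num_layers + 1
    bottom_map = []
    for verts in base_cells:
        row = []
        for v in verts:
            start = v * dofs_per_column
            # Bottom and top vertex of the lowest prismatic cell.
            row.extend([start, start + 1])
        bottom_map.append(row)
    # Moving up one layer shifts every entry by a fixed stride (1 here).
    offsets = [1] * (2 * len(base_cells[0]))
    return bottom_map, offsets


def cell_dofs(bottom_map, offsets, cell, layer):
    """Direct addressing: bottom-cell entry plus layer times stride."""
    return [d + layer * o for d, o in zip(bottom_map[cell], offsets)]


def iterate_columns(bottom_map, offsets, num_layers):
    """Visit every extruded cell with one indirect lookup per column."""
    for cell, bottom in enumerate(bottom_map):  # indirect access (once)
        dofs = list(bottom)
        for layer in range(num_layers):         # direct, strided access
            yield cell, layer, tuple(dofs)
            dofs = [d + o for d, o in zip(dofs, offsets)]


if __name__ == "__main__":
    # Two triangles sharing an edge, extruded into three layers.
    bottom_map, offsets = column_numbering([(0, 1, 2), (1, 2, 3)], 3)
    for cell, layer, dofs in iterate_columns(bottom_map, offsets, 3):
        print(cell, layer, dofs)
```

In this sketch the map from the base mesh is read once per column, and every cell above the bottom one is reached by adding a constant stride. The more layers a column has, the smaller the share of indirect accesses, which is the amortization effect the abstract describes.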