Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
License: arXiv.org perpetual non-exclusive license
arXiv:2107.14758v3 [math.NA] 08 Jan 2024
\headers

Sparse vertex-star relaxation for high-order FEMP. D. Brubeck and P. E. Farrell

A scalable and robust vertex-star relaxation for high-order FEMthanks: Submitted to the editors August 31, 2021. \funding PDB was supported by the University of Oxford Mathematical Institute Graduate Scholarship. PEF was supported by EPSRC grants EP/V001493/1 and EP/R029423/1. We would also like to thank Lawrence Mitchell for his helpful advice for the implementation in Firedrake.

Pablo D. Brubeck Mathematical Institute, University of Oxford, Oxford UK () brubeckmarti@maths.ox.ac.uk    Patrick E. Farrell Mathematical Institute, University of Oxford, Oxford UK () patrick.farrell@maths.ox.ac.uk
Abstract

Pavarino proved that the additive Schwarz method with vertex patches and a low-order coarse space gives a p𝑝pitalic_p-robust solver for symmetric and coercive problems [42]. However, for very high polynomial degree it is not feasible to assemble or factorize the matrices for each patch. In this work we introduce a direct solver for separable patch problems that scales to very high polynomial degree on tensor product cells. The solver constructs a tensor product basis that diagonalizes the blocks in the stiffness matrix for the internal degrees of freedom of each individual cell. As a result, the non-zero structure of the cell matrices is that of the graph connecting internal degrees of freedom to their projection onto the facets. In the new basis, the patch problem is as sparse as a low-order finite difference discretization, while having a sparser Cholesky factorization. We can thus afford to assemble and factorize the matrices for the vertex-patch problems, even for very high polynomial degree. In the non-separable case, the method can be applied as a preconditioner by approximating the problem with a separable surrogate. We demonstrate the approach by solving the Poisson equation and a H(div)𝐻divH(\mathrm{div})italic_H ( roman_div )-conforming interior penalty discretization of linear elasticity in three dimensions at p=15𝑝15p=15italic_p = 15.

keywords:
preconditioning, high-order, tensor product, additive Schwarz, sparse Cholesky
{AMS}

65F08, 65N35, 65N55

{DOI}

10.1137/21M1444187

1 Introduction

For problems with smooth solutions, high-order finite element methods offer very good convergence properties, and in some cases they do not exhibit locking phenomena found in low-order methods. Moreover, there exist optimal matrix-free algorithms for operator evaluation with high arithmetic intensity, arising from data locality, that make efficient use of modern parallel hardware architectures. Unfortunately, the conditioning of the stiffness matrix is severely affected by the polynomial degree of the approximation. In order to obtain practical iterative solvers, we require good preconditioners.

Optimal solvers are often obtained from a multiplicative multigrid V-cycle where the smoother consists of a domain decomposition method, such as additive Schwarz with a particular space decomposition. The multigrid algorithm is then accelerated by a Krylov subspace method, such as preconditioned conjugate gradients (PCG). The choice of space decomposition in the relaxation is crucial for robustness with respect to the cell size hhitalic_h, the polynomial degree p𝑝pitalic_p, and parameters in the equation.

One of the cheapest relaxations, with 𝒪(pd)𝒪superscript𝑝𝑑\mathcal{O}(p^{d})caligraphic_O ( italic_p start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ) computational cost, is diagonal scaling, also known as point-Jacobi. The diagonally preconditioned Laplacian has a condition number of 𝒪(p2d2)𝒪superscript𝑝2𝑑2\mathcal{O}(p^{2d-2})caligraphic_O ( italic_p start_POSTSUPERSCRIPT 2 italic_d - 2 end_POSTSUPERSCRIPT ) [36]. This implies that the number of PCG iterations, and therefore the number of residual evaluations, is 𝒪(pd1)𝒪superscript𝑝𝑑1\mathcal{O}(p^{d-1})caligraphic_O ( italic_p start_POSTSUPERSCRIPT italic_d - 1 end_POSTSUPERSCRIPT ), incurring a total cost of 𝒪(p2d)𝒪superscript𝑝2𝑑\mathcal{O}(p^{2d})caligraphic_O ( italic_p start_POSTSUPERSCRIPT 2 italic_d end_POSTSUPERSCRIPT ). In order to minimize the time to solution, it is reasonable to consider more expensive relaxation methods that converge in fewer iterations. Ideally, we wish to balance the cost of applying the relaxation with that of updating the residual. On tensor product elements, the latter operation can be done quickly in 𝒪(pd+1)𝒪superscript𝑝𝑑1\mathcal{O}(p^{d+1})caligraphic_O ( italic_p start_POSTSUPERSCRIPT italic_d + 1 end_POSTSUPERSCRIPT ) operations via the sum-factorization. Sum-factorization breaks down the residual evaluation into products of one-dimensional operators and diagonal scalings [40].

In 1993, Pavarino proved that the additive Schwarz method with a vertex-centered space decomposition and an additive coarse space of lowest-order (p=1𝑝1p=1italic_p = 1) gives a robust solver with respect to hhitalic_h and p𝑝pitalic_p for symmetric and coercive problems [42]. This type of space decomposition is often referred to as generous overlap and is illustrated in Figure 1. We use the terminology of [19] and refer to the subdomains in this space decomposition as vertex-star patches, as they are constructed by taking all the degrees of freedom (DOFs) on the topological entities in the star of each vertex.

The most straightforward implementation of a vertex-star solver involves the assembly and direct factorization of the 𝒪(pd)×𝒪(pd)𝒪superscript𝑝𝑑𝒪superscript𝑝𝑑\mathcal{O}(p^{d})\times\mathcal{O}(p^{d})caligraphic_O ( italic_p start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ) × caligraphic_O ( italic_p start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ) patch matrices (which are dense for Lagrange shape functions). This becomes prohibitively expensive at very high polynomial degrees, with the Cholesky factorization of such a matrix requiring 𝒪(p3d)𝒪superscript𝑝3𝑑\mathcal{O}(p^{3d})caligraphic_O ( italic_p start_POSTSUPERSCRIPT 3 italic_d end_POSTSUPERSCRIPT ) operations. However, there exist bases for which the element matrices are sparse on affine cells, such as the hierarchical Lobatto shape functions [53]. In this basis, the stiffness matrix has a 5-point stencil in 2D, and a much larger 13-point stencil in 3D.

Efficient relaxation methods that are p𝑝pitalic_p-robust may arise from the discretization of an auxiliary problem for which fast inversion techniques are available. For a more general approach to auxiliary space techniques, we refer to the work of Xu [56]. In our context, the underlying PDE and/or the domain can be replaced by those of a problem which is solvable by the method of separation of variables. The fast diagonalization method (FDM) [34] is a 𝒪(pd+1)𝒪superscript𝑝𝑑1\mathcal{O}(p^{d+1})caligraphic_O ( italic_p start_POSTSUPERSCRIPT italic_d + 1 end_POSTSUPERSCRIPT ) direct factorization that breaks the problem down into a sequence of one-dimensional subproblems.

For the Poisson equation discretized on meshes with all cells Cartesian (all internal angles are right angles), the vertex-star problems can be solved directly with the FDM  [55]. Huismann et al. [25] introduced a remarkably fast solver with 𝒪(pd)𝒪superscript𝑝𝑑\mathcal{O}(p^{d})caligraphic_O ( italic_p start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ) scaling on such meshes. The linear system is statically condensed by elimination of the cell DOFs, and the reduced system on the interface is solved with p𝑝pitalic_p-multigrid and a restricted variant of the FDM onto the interface DOFs of a vertex-star. Since the statically condensed operator requires the exact inversion of the cell matrices, their approach has no obvious extension to the unstructured, non-Cartesian case.

The FDM can be applied as a relaxation by means of an auxiliary problem that is separable, but this requires a tensor product grid discretization of the patch, which is only possible when the cells are laid out in a tensor product grid [55]. When the cells are not Cartesian, the method of Witte et al. [55] approximates the whole patch as a single Cartesian domain and converges slowly even when the cells are slightly distorted. On general meshes, the patches may not have this structure, thus the FDM cannot be directly applied on such patches. An example of a vertex-star patch to which the FDM cannot be applied as a relaxation is shown in Figure 2.

Refer to caption
(a) Vertex-star patch
Refer to caption
(b) Cell-centered patch
Figure 1: Subdomains for the additive Schwarz method on a regular mesh (p=4𝑝4p=4italic_p = 4). In combination with a low-order coarse grid, the vertex-star patch gives a p𝑝pitalic_p-robust method for symmetric and coercive problems, while the cell-centered patch does not.

A popular alternative in the literature has been to use cell-centered patches with minimal overlap by including a few layers of DOFs of the neighboring cells [20, 48, 52]. This can be done in such a way that every patch remains structured. This kind of space decomposition is more amenable to fast implementation, but does not give a p𝑝pitalic_p-robust solver. If the number of layers is fixed, then the measure of the overlap region will decrease as p𝑝pitalic_p is increased. Pavarino also proved that when the overlap is not generous, the rate of convergence of the additive Schwarz method will depend inversely on the overlap size [43]. To overcome this, Fischer and Lottes [33] applied a hybrid p𝑝pitalic_p-multigrid/Schwarz method, in the context of a Poisson problem. They implemented several levels of p𝑝pitalic_p-multigrid to overcome the lack of p𝑝pitalic_p-robustness of the cell-centered patches with minimal overlap. The use of multiple levels increases the overlap at the coarser levels with a minor impact on the overall computational cost. Cell-centered patches without overlap have also been employed for non-symmetric problems [45, 46, 17].

Refer to caption
(a) Structured patch
Refer to caption
(b) Unstructured patch
Figure 2: The FDM may be applied as a relaxation only on vertex-star patches that are structured, i.e. where the cells are laid out in a tensor product grid.

Instead of replacing the vertex-star patches with cell-centered ones, the alternative FEM-SEM preconditioner [40] rediscretizes the problem on each vertex-star patch with p=1𝑝1p=1italic_p = 1 on a GLL grid, a mesh with vertices at the DOFs of the high-order space. The theory behind this guarantees the spectral equivalence between the differential operator discretized on the two spaces [13]. Since low-order methods are naturally sparse, this approach is not constrained to Cartesian cells and can deal properly with mixed first derivatives that the FDM cannot handle. A downside of this approach is that the Cholesky factors of the patch matrices are quite dense, limiting its scalability to very high polynomial degree. Computationally advantageous approaches involve incomplete factorizations of the patch FEM-SEM matrices [44], or the use of algebraic multigrid on the global FEM-SEM operator [9].

In this work we develop a solver for vertex-star patches that scales to very high polynomial degree. Our approach does not rely on a particular structure of the patch. In particular, it applies to the patches shown in Figure 2. The key idea is to use the FDM to numerically construct a basis of functions on an interval that diagonalizes the interior blocks of both the stiffness and mass matrices in one dimension. When the problem is assembled with respect to the associated tensor product basis, the resulting stiffness and mass matrices are sparse, in the Cartesian case. In particular, the total number of non-zeros is the same as that of a low-order finite difference discretization of the Laplacian. Moreover, fill-in in the Cholesky factorization is only introduced for the interface DOFs, resulting in very sparse Cholesky factors. The factorization requires 𝒪(p3(d1))𝒪superscript𝑝3𝑑1\mathcal{O}(p^{3(d-1)})caligraphic_O ( italic_p start_POSTSUPERSCRIPT 3 ( italic_d - 1 ) end_POSTSUPERSCRIPT ) operations, while forward and back-substitution steps have a cost of 𝒪(p2(d1))𝒪superscript𝑝2𝑑1\mathcal{O}(p^{2(d-1)})caligraphic_O ( italic_p start_POSTSUPERSCRIPT 2 ( italic_d - 1 ) end_POSTSUPERSCRIPT ) operations that is optimal for d{2,3}𝑑23d\in\{2,3\}italic_d ∈ { 2 , 3 }, in contrast with the 𝒪(p3d)𝒪superscript𝑝3𝑑\mathcal{O}(p^{3d})caligraphic_O ( italic_p start_POSTSUPERSCRIPT 3 italic_d end_POSTSUPERSCRIPT ) and 𝒪(p2d)𝒪superscript𝑝2𝑑\mathcal{O}(p^{2d})caligraphic_O ( italic_p start_POSTSUPERSCRIPT 2 italic_d end_POSTSUPERSCRIPT ) costs of the naïve approach. A disadvantage of the approach is that the memory required scales like 𝒪(p2(d1))𝒪superscript𝑝2𝑑1\mathcal{O}(p^{2(d-1)})caligraphic_O ( italic_p start_POSTSUPERSCRIPT 2 ( italic_d - 1 ) end_POSTSUPERSCRIPT ), instead of the optimal 𝒪(pd)𝒪superscript𝑝𝑑\mathcal{O}(p^{d})caligraphic_O ( italic_p start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ) required for storing the solution. In the non-Cartesian case, we approximate the form with one that is separable in the reference problem. Robustness with respect to hhitalic_h and p𝑝pitalic_p should follow from the spectral equivalence between the forms, and numerical experiments indicate that the approach is effective when the cells are moderately deformed.

We demonstrate the effectiveness of our approach by applying it to a H(div)×L2𝐻divsuperscript𝐿2H(\mathrm{div})\times L^{2}italic_H ( roman_div ) × italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT conforming discretization of a mixed formulation of incompressible linear elasticity. We present a sequence of problems of increasing complexity building up to this. In Section 2 we present the standard hp𝑝hpitalic_h italic_p-FEM formulation for the Poisson problem and construct a solver based on a sparse discretization of an auxiliary locally separable PDE that employs the numerically computed FDM basis. In Section 3 we consider the application of our solver to linear elasticity. In the primal formulation, although our approach can be applied to patch problems for the individual components of displacement, we explain why it cannot be applied to the coupled vector-valued problem, which is necessary for parameter-robustness in the incompressible regime. We therefore consider a mixed formulation instead. Developing a p𝑝pitalic_p-robust solver requires both a p𝑝pitalic_p-robust preconditioner and a p𝑝pitalic_p-robust discretization, and for the latter we choose a H(div)×L2𝐻divsuperscript𝐿2H(\mathrm{div})\times L^{2}italic_H ( roman_div ) × italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT conforming approach. We then extend the method to symmetric interior-penalty discontinuous Galerkin discretizations, required for the displacement block of the mixed problem. We apply our relaxation to the displacement block of the incompressible elasticity system in conjunction with block-preconditioned Krylov methods. We end with conclusions in Section 4.

2 Sparse Poisson solver

We will first describe a solver for the Poisson equation on Cartesian cells, which will subsequently be extended as a preconditioner for more general symmetric coercive problems on non-Cartesian cells.

2.1 Continuous Galerkin formulation

We start from the standard weak formulation of the Poisson equation. Consider a bounded domain ΩdΩsuperscript𝑑\Omega\subset\mathbb{R}^{d}roman_Ω ⊂ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT, d{1,2,3}𝑑123d\in\{1,2,3\}italic_d ∈ { 1 , 2 , 3 }, and let ΓDΩsubscriptΓ𝐷Ω\Gamma_{D}\subseteq\partial\Omegaroman_Γ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ⊆ ∂ roman_Ω be the part of the boundary where the Dirichlet boundary condition u|ΓD=u0evaluated-at𝑢subscriptΓ𝐷subscript𝑢0\left.u\right|_{\Gamma_{D}}=u_{0}italic_u | start_POSTSUBSCRIPT roman_Γ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT end_POSTSUBSCRIPT = italic_u start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT is prescribed. The problem is to find uu0𝑢subscript𝑢0u-u_{0}italic_u - italic_u start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT in VH01(Ω)={vH1(Ω),v|ΓD=0}𝑉subscriptsuperscript𝐻10Ωformulae-sequence𝑣superscript𝐻1Ωevaluated-at𝑣subscriptΓ𝐷0V\coloneqq H^{1}_{0}(\Omega)=\{v\in H^{1}(\Omega),\left.v\right|_{\Gamma_{D}}=0\}italic_V ≔ italic_H start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( roman_Ω ) = { italic_v ∈ italic_H start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT ( roman_Ω ) , italic_v | start_POSTSUBSCRIPT roman_Γ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT end_POSTSUBSCRIPT = 0 } such that

a(v,u)=L(v)vV,formulae-sequence𝑎𝑣𝑢𝐿𝑣for-all𝑣𝑉a(v,u)=L(v)\quad\forall\,v\in V,italic_a ( italic_v , italic_u ) = italic_L ( italic_v ) ∀ italic_v ∈ italic_V , (2.1)

where

a(v,u)Ωvud𝐱,L(v)Ωvfd𝐱.formulae-sequence𝑎𝑣𝑢subscriptΩ𝑣𝑢d𝐱𝐿𝑣subscriptΩ𝑣𝑓differential-d𝐱a(v,u)\coloneqq\int_{\Omega}\nabla v\cdot\nabla u\leavevmode\nobreak\ \mathrm{% d}\mathbf{x},\quad L(v)\coloneqq\int_{\Omega}vf\leavevmode\nobreak\ \mathrm{d}% \mathbf{x}.italic_a ( italic_v , italic_u ) ≔ ∫ start_POSTSUBSCRIPT roman_Ω end_POSTSUBSCRIPT ∇ italic_v ⋅ ∇ italic_u roman_d bold_x , italic_L ( italic_v ) ≔ ∫ start_POSTSUBSCRIPT roman_Ω end_POSTSUBSCRIPT italic_v italic_f roman_d bold_x . (2.2)

The standard FEM discretization employs a mesh 𝒯h={K}subscript𝒯𝐾\mathcal{T}_{h}=\{K\}caligraphic_T start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT = { italic_K } of ΩΩ\Omegaroman_Ω. In this work we consider quadrilateral and hexahedral cells, so that each cell K𝐾Kitalic_K can be mapped with a diffeomorphism FK:K^K:subscript𝐹𝐾^𝐾𝐾F_{K}:\hat{K}\to Kitalic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT : over^ start_ARG italic_K end_ARG → italic_K from the reference hypercube K^=^d^𝐾superscript^𝑑\hat{K}=\hat{\mathcal{I}}^{d}over^ start_ARG italic_K end_ARG = over^ start_ARG caligraphic_I end_ARG start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT, where ^=[1,1]^11\hat{\mathcal{I}}=[-1,1]over^ start_ARG caligraphic_I end_ARG = [ - 1 , 1 ] is the reference interval. The approximate solution uhVhsubscript𝑢subscript𝑉u_{h}\in V_{h}italic_u start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ∈ italic_V start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT is sought in the space of piecewise continuous tensor product polynomials on each cell, i.e. VhQp(Ω)Vsubscript𝑉subscriptQ𝑝Ω𝑉V_{h}\coloneqq\mathrm{Q}_{p}(\Omega)\subset Vitalic_V start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ≔ roman_Q start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ( roman_Ω ) ⊂ italic_V. We first define the space of shape functions on K^^𝐾\hat{K}over^ start_ARG italic_K end_ARG

Qp(K^)j=1dPp(^),Pp(^)span{x^j,0jp},formulae-sequencesubscriptQ𝑝^𝐾superscriptsubscripttensor-product𝑗1𝑑subscriptP𝑝^subscriptP𝑝^spansuperscript^𝑥𝑗0𝑗𝑝\mathrm{Q}_{p}(\hat{K})\coloneqq\bigotimes_{j=1}^{d}\mathrm{P}_{p}(\hat{% \mathcal{I}}),\quad\mathrm{P}_{p}(\hat{\mathcal{I}})\coloneqq\operatorname{% span}\left\{\hat{x}^{j},0\leq j\leq p\right\},roman_Q start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ( over^ start_ARG italic_K end_ARG ) ≔ ⨂ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT roman_P start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ( over^ start_ARG caligraphic_I end_ARG ) , roman_P start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ( over^ start_ARG caligraphic_I end_ARG ) ≔ roman_span { over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT , 0 ≤ italic_j ≤ italic_p } , (2.3)

and via composition with FK1superscriptsubscript𝐹𝐾1F_{K}^{-1}italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT, we define

Qp(Ω){vH01(Ω):K𝒯hv^Qp(K^) s.t. v|K=v^FK1}.subscriptQ𝑝Ωconditional-set𝑣subscriptsuperscript𝐻10Ωfor-all𝐾subscript𝒯^𝑣evaluated-atsubscriptQ𝑝^𝐾 s.t. 𝑣𝐾^𝑣superscriptsubscript𝐹𝐾1\mathrm{Q}_{p}(\Omega)\coloneqq\left\{v\in H^{1}_{0}(\Omega):\forall\,K\in% \mathcal{T}_{h}\ \exists\ \hat{v}\in\mathrm{Q}_{p}(\hat{K})\text{ s.t. }\left.% v\right|_{K}=\hat{v}\circ F_{K}^{-1}\right\}.roman_Q start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ( roman_Ω ) ≔ { italic_v ∈ italic_H start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( roman_Ω ) : ∀ italic_K ∈ caligraphic_T start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ∃ over^ start_ARG italic_v end_ARG ∈ roman_Q start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ( over^ start_ARG italic_K end_ARG ) s.t. italic_v | start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT = over^ start_ARG italic_v end_ARG ∘ italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT } . (2.4)

Once we fix a basis {ϕj}j=1nsuperscriptsubscriptsubscriptitalic-ϕ𝑗𝑗1𝑛\{\phi_{j}\}_{j=1}^{n}{ italic_ϕ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT for Vhsubscript𝑉V_{h}italic_V start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT, the approximate solution is expanded as uh=j=1nujϕjsubscript𝑢superscriptsubscript𝑗1𝑛subscript𝑢𝑗subscriptitalic-ϕ𝑗u_{h}=\sum_{j=1}^{n}u_{j}\phi_{j}italic_u start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_u start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_ϕ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT. The resulting n×n𝑛𝑛n\times nitalic_n × italic_n system of linear equations is

Au¯=f¯,𝐴¯𝑢¯𝑓A\underline{u}=\underline{f},italic_A under¯ start_ARG italic_u end_ARG = under¯ start_ARG italic_f end_ARG , (2.5)

where [A]ij=a(ϕi,ϕj)subscriptdelimited-[]𝐴𝑖𝑗𝑎subscriptitalic-ϕ𝑖subscriptitalic-ϕ𝑗[A]_{ij}=a(\phi_{i},\phi_{j})[ italic_A ] start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = italic_a ( italic_ϕ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_ϕ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) is the stiffness matrix, u¯=(u1,,un)¯𝑢superscriptsubscript𝑢1subscript𝑢𝑛top\underline{u}=(u_{1},\ldots,u_{n})^{\top}under¯ start_ARG italic_u end_ARG = ( italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_u start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT is the vector of coefficients, and f¯=(L(ϕ1),,L(ϕn))¯𝑓superscript𝐿subscriptitalic-ϕ1𝐿subscriptitalic-ϕ𝑛top\underline{f}=(L(\phi_{1}),\ldots,L(\phi_{n}))^{\top}under¯ start_ARG italic_f end_ARG = ( italic_L ( italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) , … , italic_L ( italic_ϕ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT is the load vector.

We recall the standard construction of the basis {ϕj}subscriptitalic-ϕ𝑗\{\phi_{j}\}{ italic_ϕ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT } [26]. The basis is defined in terms of shape functions {ϕ^j}subscript^italic-ϕ𝑗\{\hat{\phi}_{j}\}{ over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT } on K^^𝐾\hat{K}over^ start_ARG italic_K end_ARG. Given shape functions {ϕ^j1D}j=0psuperscriptsubscriptsubscriptsuperscript^italic-ϕ1D𝑗𝑗0𝑝\{\hat{\phi}^{\mathrm{1D}}_{j}\}_{j=0}^{p}{ over^ start_ARG italic_ϕ end_ARG start_POSTSUPERSCRIPT 1 roman_D end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_j = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT for Pp(^)subscriptP𝑝^\mathrm{P}_{p}(\hat{\mathcal{I}})roman_P start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ( over^ start_ARG caligraphic_I end_ARG ), a tensor product basis {ϕ^j}subscript^italic-ϕ𝑗\{\hat{\phi}_{j}\}{ over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT } for Qp(K^)subscriptQ𝑝^𝐾\mathrm{Q}_{p}(\hat{K})roman_Q start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ( over^ start_ARG italic_K end_ARG ) can be constructed as

ϕ^j(𝐱^)=k=1dϕ^jk1D(x^k),subscript^italic-ϕ𝑗^𝐱superscriptsubscriptproduct𝑘1𝑑subscriptsuperscript^italic-ϕ1Dsubscript𝑗𝑘subscript^𝑥𝑘\hat{\phi}_{j}(\hat{\mathbf{x}})=\prod_{k=1}^{d}\hat{\phi}^{\mathrm{1D}}_{j_{k% }}(\hat{x}_{k}),over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG bold_x end_ARG ) = ∏ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT over^ start_ARG italic_ϕ end_ARG start_POSTSUPERSCRIPT 1 roman_D end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) , (2.6)

where we have expanded j=(j1,,jd)[0,p]d𝑗subscript𝑗1subscript𝑗𝑑superscript0𝑝𝑑j=(j_{1},\ldots,j_{d})\in[0,p]^{d}italic_j = ( italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_j start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ) ∈ [ 0 , italic_p ] start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT as a multi-index.

The interval shape functions are decomposed into interface and interior modes. The interface modes have non-zero support on either endpoint of ^^\hat{\mathcal{I}}over^ start_ARG caligraphic_I end_ARG, while the interior modes vanish at the boundary of ^^\hat{\mathcal{I}}over^ start_ARG caligraphic_I end_ARG. In multiple dimensions, the shape functions decompose into interior, facet, edge, and vertex modes, depending on how many 1D interface functions are multiplied together. To generate a C0superscript𝐶0C^{0}italic_C start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT basis, we simply match the shape of individual interface modes. Hence, A𝐴Aitalic_A will be block sparse, since [A]ij=0subscriptdelimited-[]𝐴𝑖𝑗0[A]_{ij}=0[ italic_A ] start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = 0 when i𝑖iitalic_i and j𝑗jitalic_j correspond to interior modes supported on different cells.

For the interval shape functions, one standard choice is the set of Lagrange polynomials on the Gauß–Lobatto–Legendre (GLL) nodes {ξ^i}i=0p[1,1]superscriptsubscriptsubscript^𝜉𝑖𝑖0𝑝11\{\hat{\xi}_{i}\}_{i=0}^{p}\subset[-1,1]{ over^ start_ARG italic_ξ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ⊂ [ - 1 , 1 ]. These nodes are the roots of (1ξ^2)Pp(ξ^)1superscript^𝜉2subscriptsuperscript𝑃𝑝^𝜉(1-\hat{\xi}^{2})P^{\prime}_{p}(\hat{\xi})( 1 - over^ start_ARG italic_ξ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) italic_P start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ( over^ start_ARG italic_ξ end_ARG ), where Pk(ξ^)subscript𝑃𝑘^𝜉P_{k}(\hat{\xi})italic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( over^ start_ARG italic_ξ end_ARG ) is the Legendre polynomial of degree k𝑘kitalic_k. The Lagrange polynomials {j(x^)}subscript𝑗^𝑥\{\ell_{j}(\hat{x})\}{ roman_ℓ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG ) } satisfy j(ξ^i)=δijsubscript𝑗subscript^𝜉𝑖subscript𝛿𝑖𝑗\ell_{j}(\hat{\xi}_{i})=\delta_{ij}roman_ℓ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG italic_ξ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) = italic_δ start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT by construction,

j(x^)=k=0,kjpx^ξ^kξ^jξ^k,j=0,,p.formulae-sequencesubscript𝑗^𝑥superscriptsubscriptproductformulae-sequence𝑘0𝑘𝑗𝑝^𝑥subscript^𝜉𝑘subscript^𝜉𝑗subscript^𝜉𝑘𝑗0𝑝\ell_{j}(\hat{x})=\prod_{k=0,k\neq j}^{p}\frac{\hat{x}-\hat{\xi}_{k}}{\hat{\xi% }_{j}-\hat{\xi}_{k}},\quad j=0,\ldots,p.roman_ℓ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG ) = ∏ start_POSTSUBSCRIPT italic_k = 0 , italic_k ≠ italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT divide start_ARG over^ start_ARG italic_x end_ARG - over^ start_ARG italic_ξ end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG over^ start_ARG italic_ξ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - over^ start_ARG italic_ξ end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG , italic_j = 0 , … , italic_p . (2.7)

Another useful basis is formed by the hierarchical Lobatto shape functions {lj}subscript𝑙𝑗\{l_{j}\}{ italic_l start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT }, which are constructed by augmenting the so-called bubble functions (integrated Legendre polynomials) with linear Lagrange functions,

lj(x^)={(1x^)/2for j=0,(1+x^)/2for j=p,1x^Pj(z)dzfor j=1,,p1.subscript𝑙𝑗^𝑥cases1^𝑥2for 𝑗01^𝑥2for 𝑗𝑝superscriptsubscript1^𝑥subscript𝑃𝑗𝑧differential-d𝑧for 𝑗1𝑝1l_{j}(\hat{x})=\begin{cases}(1-\hat{x})/2&\mbox{for }j=0,\\ (1+\hat{x})/2&\mbox{for }j=p,\\ \int_{-1}^{\hat{x}}P_{j}(z)\leavevmode\nobreak\ \mathrm{d}z&\mbox{for }j=1,% \ldots,p-1.\end{cases}italic_l start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG ) = { start_ROW start_CELL ( 1 - over^ start_ARG italic_x end_ARG ) / 2 end_CELL start_CELL for italic_j = 0 , end_CELL end_ROW start_ROW start_CELL ( 1 + over^ start_ARG italic_x end_ARG ) / 2 end_CELL start_CELL for italic_j = italic_p , end_CELL end_ROW start_ROW start_CELL ∫ start_POSTSUBSCRIPT - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT over^ start_ARG italic_x end_ARG end_POSTSUPERSCRIPT italic_P start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_z ) roman_d italic_z end_CELL start_CELL for italic_j = 1 , … , italic_p - 1 . end_CELL end_ROW (2.8)

These two choices of shape functions are plotted in Figure 3.

The assembly of the stiffness matrix A𝐴Aitalic_A is described as follows. On each cell K𝒯h𝐾subscript𝒯K\in\mathcal{T}_{h}italic_K ∈ caligraphic_T start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT we define the cell stiffness matrix AK(p+1)d×(p+1)dsuperscript𝐴𝐾superscriptsuperscript𝑝1𝑑superscript𝑝1𝑑A^{K}\in\mathbb{R}^{(p+1)^{d}\times(p+1)^{d}}italic_A start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT ( italic_p + 1 ) start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT × ( italic_p + 1 ) start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT in terms of the basis functions {ϕjK}subscriptsuperscriptitalic-ϕ𝐾𝑗\{\phi^{K}_{j}\}{ italic_ϕ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT } that are supported on K𝐾Kitalic_K, which are obtained from the reference shape functions {ϕ^j}subscript^italic-ϕ𝑗\{\hat{\phi}_{j}\}{ over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT } via ϕjK=ϕ^jFK1subscriptsuperscriptitalic-ϕ𝐾𝑗subscript^italic-ϕ𝑗superscriptsubscript𝐹𝐾1\phi^{K}_{j}=\hat{\phi}_{j}\circ F_{K}^{-1}italic_ϕ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∘ italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT. Then, the cell stiffness matrices are

[AK]ij=KϕiKϕjKd𝐱.subscriptdelimited-[]superscript𝐴𝐾𝑖𝑗subscript𝐾subscriptsuperscriptitalic-ϕ𝐾𝑖subscriptsuperscriptitalic-ϕ𝐾𝑗d𝐱[A^{K}]_{ij}=\int_{K}\nabla\phi^{K}_{i}\cdot\nabla\phi^{K}_{j}\leavevmode% \nobreak\ \mathrm{d}\mathbf{x}.[ italic_A start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT ] start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = ∫ start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ∇ italic_ϕ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ⋅ ∇ italic_ϕ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT roman_d bold_x . (2.9)

The global stiffness matrix is then assembled via direct stiffness summation:

A=K𝒯hRKAKRK,𝐴subscript𝐾subscript𝒯superscriptsubscript𝑅𝐾topsuperscript𝐴𝐾subscript𝑅𝐾A=\sum_{K\in\mathcal{T}_{h}}R_{K}^{\top}A^{K}R_{K},italic_A = ∑ start_POSTSUBSCRIPT italic_K ∈ caligraphic_T start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_A start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT , (2.10)

where RK(p+1)d×nsubscript𝑅𝐾superscriptsuperscript𝑝1𝑑𝑛R_{K}\in\mathbb{R}^{(p+1)^{d}\times n}italic_R start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT ( italic_p + 1 ) start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT × italic_n end_POSTSUPERSCRIPT is the Boolean restriction matrix from the global DOFs to those local to cell K𝐾Kitalic_K.

Refer to caption
(a) GLL, j(x^)subscript𝑗^𝑥\ell_{j}(\hat{x})roman_ℓ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG )
Refer to caption
(b) GLL, B^^𝐵\hat{B}over^ start_ARG italic_B end_ARG
Refer to caption
(c) GLL, A^^𝐴\hat{A}over^ start_ARG italic_A end_ARG
Refer to caption
(d) Hierarchical, lj(x^)subscript𝑙𝑗^𝑥l_{j}(\hat{x})italic_l start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG )
Refer to caption
(e) Hierarchical, B^^𝐵\hat{B}over^ start_ARG italic_B end_ARG
Refer to caption
(f) Hierarchical, A^^𝐴\hat{A}over^ start_ARG italic_A end_ARG
Figure 3: Plots of the interval shape functions (p=4𝑝4p=4italic_p = 4) and non-zero structure of the mass and stiffness matrices on the reference interval (p=7𝑝7p=7italic_p = 7).

2.2 Tensor product structure on Cartesian cells

If d=1𝑑1d=1italic_d = 1 and FKsubscript𝐹𝐾F_{K}italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT is an affine mapping, then the cell stiffness matrices are

[AK]ij=1LK^ϕ^iϕ^jdx^=1LK[A^]ij.subscriptdelimited-[]superscript𝐴𝐾𝑖𝑗1superscript𝐿𝐾subscript^subscriptsuperscript^italic-ϕ𝑖subscriptsuperscript^italic-ϕ𝑗differential-d^𝑥1superscript𝐿𝐾subscriptdelimited-[]^𝐴𝑖𝑗[A^{K}]_{ij}=\frac{1}{L^{K}}\int_{\hat{\mathcal{I}}}\hat{\phi}^{\prime}_{i}% \hat{\phi}^{\prime}_{j}\leavevmode\nobreak\ \mathrm{d}\hat{x}=\frac{1}{L^{K}}[% \hat{A}]_{ij}.[ italic_A start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT ] start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_L start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT end_ARG ∫ start_POSTSUBSCRIPT over^ start_ARG caligraphic_I end_ARG end_POSTSUBSCRIPT over^ start_ARG italic_ϕ end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT over^ start_ARG italic_ϕ end_ARG start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT roman_d over^ start_ARG italic_x end_ARG = divide start_ARG 1 end_ARG start_ARG italic_L start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT end_ARG [ over^ start_ARG italic_A end_ARG ] start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT . (2.11)

Here A^(p+1)×(p+1)^𝐴superscript𝑝1𝑝1\hat{A}\in\mathbb{R}^{(p+1)\times(p+1)}over^ start_ARG italic_A end_ARG ∈ blackboard_R start_POSTSUPERSCRIPT ( italic_p + 1 ) × ( italic_p + 1 ) end_POSTSUPERSCRIPT is the stiffness matrix on the reference interval, and LK=|K|/|^|superscript𝐿𝐾𝐾^L^{K}=\absolutevalue{K}/|\hat{\mathcal{I}}|italic_L start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT = | start_ARG italic_K end_ARG | / | over^ start_ARG caligraphic_I end_ARG |, where |K|𝐾\absolutevalue{K}| start_ARG italic_K end_ARG | denotes the measure of the cell K𝐾Kitalic_K.

For d={2,3}𝑑23d=\{2,3\}italic_d = { 2 , 3 }, we will first consider the case where ΩΩ\Omegaroman_Ω can be tessellated with a mesh 𝒯hsubscript𝒯\mathcal{T}_{h}caligraphic_T start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT consisting of Cartesian cells, i.e. the cells are rectangular quadrilaterals or hexahedra (all internal angles are right angles). In this idealized setting, the cell stiffness matrices are separable in the reference coordinates

AK={μ1KB^A^+μ2KA^B^if d=2,μ1KB^B^A^+μ2KB^A^B^+μ3KA^B^B^if d=3,superscript𝐴𝐾casestensor-productsubscriptsuperscript𝜇𝐾1^𝐵^𝐴tensor-productsubscriptsuperscript𝜇𝐾2^𝐴^𝐵if 𝑑2tensor-productsubscriptsuperscript𝜇𝐾1^𝐵^𝐵^𝐴tensor-productsubscriptsuperscript𝜇𝐾2^𝐵^𝐴^𝐵tensor-productsubscriptsuperscript𝜇𝐾3^𝐴^𝐵^𝐵if 𝑑3A^{K}=\begin{cases}\mu^{K}_{1}\hat{B}\otimes\hat{A}+\mu^{K}_{2}\hat{A}\otimes% \hat{B}&\mbox{if\leavevmode\nobreak\ }d=2,\\ \mu^{K}_{1}\hat{B}\otimes\hat{B}\otimes\hat{A}+\mu^{K}_{2}\hat{B}\otimes\hat{A% }\otimes\hat{B}+\mu^{K}_{3}\hat{A}\otimes\hat{B}\otimes\hat{B}&\mbox{if% \leavevmode\nobreak\ }d=3,\end{cases}italic_A start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT = { start_ROW start_CELL italic_μ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT over^ start_ARG italic_B end_ARG ⊗ over^ start_ARG italic_A end_ARG + italic_μ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT over^ start_ARG italic_A end_ARG ⊗ over^ start_ARG italic_B end_ARG end_CELL start_CELL if italic_d = 2 , end_CELL end_ROW start_ROW start_CELL italic_μ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT over^ start_ARG italic_B end_ARG ⊗ over^ start_ARG italic_B end_ARG ⊗ over^ start_ARG italic_A end_ARG + italic_μ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT over^ start_ARG italic_B end_ARG ⊗ over^ start_ARG italic_A end_ARG ⊗ over^ start_ARG italic_B end_ARG + italic_μ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT over^ start_ARG italic_A end_ARG ⊗ over^ start_ARG italic_B end_ARG ⊗ over^ start_ARG italic_B end_ARG end_CELL start_CELL if italic_d = 3 , end_CELL end_ROW (2.12)

where

[B^]ij=^ϕ^iϕ^jdx^,subscriptdelimited-[]^𝐵𝑖𝑗subscript^subscript^italic-ϕ𝑖subscript^italic-ϕ𝑗differential-d^𝑥[\hat{B}]_{ij}=\int_{\hat{\mathcal{I}}}\hat{\phi}_{i}\leavevmode\nobreak\ \hat% {\phi}_{j}\leavevmode\nobreak\ \mathrm{d}\hat{x},[ over^ start_ARG italic_B end_ARG ] start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = ∫ start_POSTSUBSCRIPT over^ start_ARG caligraphic_I end_ARG end_POSTSUBSCRIPT over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT roman_d over^ start_ARG italic_x end_ARG , (2.13)

is the mass matrix on the reference interval, μjK=(LjK)2i=1dLiKsubscriptsuperscript𝜇𝐾𝑗superscriptsubscriptsuperscript𝐿𝐾𝑗2superscriptsubscriptproduct𝑖1𝑑subscriptsuperscript𝐿𝐾𝑖\mu^{K}_{j}=(L^{K}_{j})^{-2}\prod_{i=1}^{d}L^{K}_{i}italic_μ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = ( italic_L start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT ∏ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT italic_L start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, and LjKsubscriptsuperscript𝐿𝐾𝑗L^{K}_{j}italic_L start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT is the length of K𝐾Kitalic_K along the j𝑗jitalic_j-th axis divided by |^|^|\hat{\mathcal{I}}|| over^ start_ARG caligraphic_I end_ARG |. The symbol tensor-product\otimes denotes the Kronecker product, which for matrices Am×n,Br×sformulae-sequence𝐴superscript𝑚𝑛𝐵superscript𝑟𝑠A\in\mathbb{R}^{m\times n},B\in\mathbb{R}^{r\times s}italic_A ∈ blackboard_R start_POSTSUPERSCRIPT italic_m × italic_n end_POSTSUPERSCRIPT , italic_B ∈ blackboard_R start_POSTSUPERSCRIPT italic_r × italic_s end_POSTSUPERSCRIPT, is defined as the block matrix

AB=[a11Ba1nBam1BamnB]rm×sn.tensor-product𝐴𝐵matrixsubscript𝑎11𝐵subscript𝑎1𝑛𝐵subscript𝑎𝑚1𝐵subscript𝑎𝑚𝑛𝐵superscript𝑟𝑚𝑠𝑛A\otimes B=\begin{bmatrix}a_{11}B&\cdots&a_{1n}B\\ \vdots&\ddots&\vdots\\ a_{m1}B&\cdots&a_{mn}B\end{bmatrix}\in\mathbb{R}^{rm\times sn}.italic_A ⊗ italic_B = [ start_ARG start_ROW start_CELL italic_a start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT italic_B end_CELL start_CELL ⋯ end_CELL start_CELL italic_a start_POSTSUBSCRIPT 1 italic_n end_POSTSUBSCRIPT italic_B end_CELL end_ROW start_ROW start_CELL ⋮ end_CELL start_CELL ⋱ end_CELL start_CELL ⋮ end_CELL end_ROW start_ROW start_CELL italic_a start_POSTSUBSCRIPT italic_m 1 end_POSTSUBSCRIPT italic_B end_CELL start_CELL ⋯ end_CELL start_CELL italic_a start_POSTSUBSCRIPT italic_m italic_n end_POSTSUBSCRIPT italic_B end_CELL end_ROW end_ARG ] ∈ blackboard_R start_POSTSUPERSCRIPT italic_r italic_m × italic_s italic_n end_POSTSUPERSCRIPT . (2.14)

It follows that if A𝐴Aitalic_A and B𝐵Bitalic_B are sparse, then ABtensor-product𝐴𝐵A\otimes Bitalic_A ⊗ italic_B is also sparse.

For the GLL basis {j}subscript𝑗\{\ell_{j}\}{ roman_ℓ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT }, both A^^𝐴\hat{A}over^ start_ARG italic_A end_ARG and B^^𝐵\hat{B}over^ start_ARG italic_B end_ARG are dense, but these are sparse in the hierarchical basis {lj}subscript𝑙𝑗\{l_{j}\}{ italic_l start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT }. This is illustrated in Figure 3. On affine cells, the hierarchical basis yields a sparse stiffness matrix. The bubble functions {lj}j=1p1superscriptsubscriptsubscript𝑙𝑗𝑗1𝑝1\{l_{j}\}_{j=1}^{p-1}{ italic_l start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p - 1 end_POSTSUPERSCRIPT, satisfy lj(x^)=Pj(x^)subscriptsuperscript𝑙𝑗^𝑥subscript𝑃𝑗^𝑥l^{\prime}_{j}(\hat{x})=P_{j}(\hat{x})italic_l start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG ) = italic_P start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG ), and due to the orthogonality of the Legendre polynomials, the interior block of A^^𝐴\hat{A}over^ start_ARG italic_A end_ARG is diagonal. The only off-diagonal non-zeros in A^^𝐴\hat{A}over^ start_ARG italic_A end_ARG are due to the coupling between the interface modes l0subscript𝑙0l_{0}italic_l start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, lpsubscript𝑙𝑝l_{p}italic_l start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT. Nevertheless, in order for this sparsity to propagate to higher dimensions, we would additionally wish that B^^𝐵\hat{B}over^ start_ARG italic_B end_ARG is also as sparse as possible. This is not quite the case for {lj}subscript𝑙𝑗\{l_{j}\}{ italic_l start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT }, as B^^𝐵\hat{B}over^ start_ARG italic_B end_ARG has two interior blocks with tri-diagonal structure, in the even-odd decomposition. Therefore, on a typical row, A𝐴Aitalic_A will have the structure of the 5-point stencil for d=2𝑑2d=2italic_d = 2 and that of a 13-point stencil for d=3𝑑3d=3italic_d = 3.

2.3 The fast diagonalization method

Linear systems involving structured matrices such as that defined in (2.12) can be solved efficiently using the fast diagonalization method (FDM) [34]. This method reduces the computation into a sequence of eigenvalue problems on the interval in a similar fashion as the method of separation of variables. It requires a separable PDE and a tensor product basis; therefore it can only be applied on meshes or mesh patches with tensor product structure.

To illustrate the FDM, we consider solving a problem on the interior of a Cartesian cell, AIIKu¯IK=r¯IKsubscriptsuperscript𝐴𝐾𝐼𝐼subscriptsuperscript¯𝑢𝐾𝐼subscriptsuperscript¯𝑟𝐾𝐼A^{K}_{II}\underline{u}^{K}_{I}=\underline{r}^{K}_{I}italic_A start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT under¯ start_ARG italic_u end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT = under¯ start_ARG italic_r end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT, where AIIK=RIKARIKA^{K}_{II}=R^{K}_{I}AR^{K}_{I}{}^{\top}italic_A start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT = italic_R start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT italic_A italic_R start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT start_FLOATSUPERSCRIPT ⊤ end_FLOATSUPERSCRIPT and RIK(p1)d×nsubscriptsuperscript𝑅𝐾𝐼superscriptsuperscript𝑝1𝑑𝑛R^{K}_{I}\in\mathbb{R}^{(p-1)^{d}\times n}italic_R start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT ( italic_p - 1 ) start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT × italic_n end_POSTSUPERSCRIPT is the Boolean restriction matrix onto the interior DOFs of K𝐾Kitalic_K. We may first solve the generalized eigenvalue problem on the interior of the reference interval

A^IIS^II=B^IIS^IIΛ^II,subscript^𝐴𝐼𝐼subscript^𝑆𝐼𝐼subscript^𝐵𝐼𝐼subscript^𝑆𝐼𝐼subscript^Λ𝐼𝐼\hat{A}_{II}\hat{S}_{II}=\hat{B}_{II}\hat{S}_{II}\hat{\Lambda}_{II},over^ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT = over^ start_ARG italic_B end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT over^ start_ARG roman_Λ end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT , (2.15)

in conjunction with the normalization condition S^IIB^IIS^II=𝟙superscriptsubscript^𝑆𝐼𝐼topsubscript^𝐵𝐼𝐼subscript^𝑆𝐼𝐼1\hat{S}_{II}^{\top}\hat{B}_{II}\hat{S}_{II}={\mathds{1}}over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT over^ start_ARG italic_B end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT = blackboard_1. Here A^II,B^II(p1)×(p1)subscript^𝐴𝐼𝐼subscript^𝐵𝐼𝐼superscript𝑝1𝑝1\hat{A}_{II},\hat{B}_{II}\in\mathbb{R}^{(p-1)\times(p-1)}over^ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT , over^ start_ARG italic_B end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT ( italic_p - 1 ) × ( italic_p - 1 ) end_POSTSUPERSCRIPT are the interior blocks of A^^𝐴\hat{A}over^ start_ARG italic_A end_ARG and B^^𝐵\hat{B}over^ start_ARG italic_B end_ARG, respectively, Λ^II(p1)×(p1)subscript^Λ𝐼𝐼superscript𝑝1𝑝1\hat{\Lambda}_{II}\in\mathbb{R}^{(p-1)\times(p-1)}over^ start_ARG roman_Λ end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT ( italic_p - 1 ) × ( italic_p - 1 ) end_POSTSUPERSCRIPT is the diagonal matrix of eigenvalues, and S^II(p1)×(p1)subscript^𝑆𝐼𝐼superscript𝑝1𝑝1\hat{S}_{II}\in\mathbb{R}^{(p-1)\times(p-1)}over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT ( italic_p - 1 ) × ( italic_p - 1 ) end_POSTSUPERSCRIPT is the matrix of eigenvectors. The generalized eigenproblem (2.15) may be equivalently rewritten as

S^IIA^IIS^II=Λ^II,S^IIB^IIS^II=𝟙.formulae-sequencesuperscriptsubscript^𝑆𝐼𝐼topsubscript^𝐴𝐼𝐼subscript^𝑆𝐼𝐼subscript^Λ𝐼𝐼superscriptsubscript^𝑆𝐼𝐼topsubscript^𝐵𝐼𝐼subscript^𝑆𝐼𝐼1\hat{S}_{II}^{\top}\hat{A}_{II}\hat{S}_{II}=\hat{\Lambda}_{II},\quad\hat{S}_{% II}^{\top}\hat{B}_{II}\hat{S}_{II}={\mathds{1}}.over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT over^ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT = over^ start_ARG roman_Λ end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT , over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT over^ start_ARG italic_B end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT = blackboard_1 . (2.16)

The corresponding continuous problem is to find sj(x^)subscript𝑠𝑗^𝑥s_{j}(\hat{x})italic_s start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG ) and λjsubscript𝜆𝑗\lambda_{j}italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, j=1,,p1,𝑗1𝑝1j=1,\dots,p-1,italic_j = 1 , … , italic_p - 1 , such that

^si(x^)sj(x^)dx^=λjδij,^si(x^)sj(x^)dx^=δij,sj(±1)=0,formulae-sequencesubscript^subscriptsuperscript𝑠𝑖^𝑥subscriptsuperscript𝑠𝑗^𝑥differential-d^𝑥subscript𝜆𝑗subscript𝛿𝑖𝑗formulae-sequencesubscript^subscript𝑠𝑖^𝑥subscript𝑠𝑗^𝑥differential-d^𝑥subscript𝛿𝑖𝑗subscript𝑠𝑗plus-or-minus10\int_{\hat{\mathcal{I}}}s^{\prime}_{i}(\hat{x})s^{\prime}_{j}(\hat{x})% \leavevmode\nobreak\ \mathrm{d}\hat{x}=\lambda_{j}\delta_{ij},\quad\int_{\hat{% \mathcal{I}}}s_{i}(\hat{x})s_{j}(\hat{x})\leavevmode\nobreak\ \mathrm{d}\hat{x% }=\delta_{ij},\quad s_{j}(\pm 1)=0,∫ start_POSTSUBSCRIPT over^ start_ARG caligraphic_I end_ARG end_POSTSUBSCRIPT italic_s start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG ) italic_s start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG ) roman_d over^ start_ARG italic_x end_ARG = italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_δ start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT , ∫ start_POSTSUBSCRIPT over^ start_ARG caligraphic_I end_ARG end_POSTSUBSCRIPT italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG ) italic_s start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG ) roman_d over^ start_ARG italic_x end_ARG = italic_δ start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT , italic_s start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( ± 1 ) = 0 , (2.17)

with solution sj(x^)=sin(jπ(1+x^)/2)subscript𝑠𝑗^𝑥𝑗𝜋1^𝑥2s_{j}(\hat{x})=\sin(j\pi(1+\hat{x})/2)italic_s start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG ) = roman_sin ( start_ARG italic_j italic_π ( 1 + over^ start_ARG italic_x end_ARG ) / 2 end_ARG ), λj=j2π2/4subscript𝜆𝑗superscript𝑗2superscript𝜋24\lambda_{j}=j^{2}\pi^{2}/4italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_j start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_π start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / 4. Thus, when A^^𝐴\hat{A}over^ start_ARG italic_A end_ARG and B^^𝐵\hat{B}over^ start_ARG italic_B end_ARG are discretized with the GLL basis, S^IIsubscript^𝑆𝐼𝐼\hat{S}_{II}over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT approximates a discrete sine transform on the interior GLL nodes.

If AKsuperscript𝐴𝐾A^{K}italic_A start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT is given by (2.12), then its inverse has the following diagonal factorization

(AIIK)1={k=1dS^II}(ΛIIK)1{k=1dS^II},superscriptsubscriptsuperscript𝐴𝐾𝐼𝐼1superscriptsubscripttensor-product𝑘1𝑑subscript^𝑆𝐼𝐼superscriptsubscriptsuperscriptΛ𝐾𝐼𝐼1superscriptsubscripttensor-product𝑘1𝑑superscriptsubscript^𝑆𝐼𝐼top\left(A^{K}_{II}\right)^{-1}=\left\{\bigotimes_{k=1}^{d}\hat{S}_{II}\right\}% \left(\Lambda^{K}_{II}\right)^{-1}\left\{\bigotimes_{k=1}^{d}\hat{S}_{II}^{% \top}\right\},( italic_A start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT = { ⨂ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT } ( roman_Λ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT { ⨂ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT } , (2.18)

where

ΛIIK={μ1K𝟙Λ^II+μ2KΛ^II𝟙if d=2,μ1K𝟙𝟙Λ^II+μ2K𝟙Λ^II𝟙+μ3KΛ^II𝟙𝟙if d=3.subscriptsuperscriptΛ𝐾𝐼𝐼casestensor-productsubscriptsuperscript𝜇𝐾11subscript^Λ𝐼𝐼tensor-productsubscriptsuperscript𝜇𝐾2subscript^Λ𝐼𝐼1if 𝑑2tensor-productsubscriptsuperscript𝜇𝐾111subscript^Λ𝐼𝐼tensor-productsubscriptsuperscript𝜇𝐾21subscript^Λ𝐼𝐼1tensor-productsubscriptsuperscript𝜇𝐾3subscript^Λ𝐼𝐼11if 𝑑3\Lambda^{K}_{II}=\begin{cases}\mu^{K}_{1}{\mathds{1}}\otimes\hat{\Lambda}_{II}% +\mu^{K}_{2}\hat{\Lambda}_{II}\otimes{\mathds{1}}&\mbox{if\leavevmode\nobreak% \ }d=2,\\ \mu^{K}_{1}{\mathds{1}}\otimes{\mathds{1}}\otimes\hat{\Lambda}_{II}+\mu^{K}_{2% }{\mathds{1}}\otimes\hat{\Lambda}_{II}\otimes{\mathds{1}}+\mu^{K}_{3}\hat{% \Lambda}_{II}\otimes{\mathds{1}}\otimes{\mathds{1}}&\mbox{if\leavevmode% \nobreak\ }d=3.\\ \end{cases}roman_Λ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT = { start_ROW start_CELL italic_μ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT blackboard_1 ⊗ over^ start_ARG roman_Λ end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT + italic_μ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT over^ start_ARG roman_Λ end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT ⊗ blackboard_1 end_CELL start_CELL if italic_d = 2 , end_CELL end_ROW start_ROW start_CELL italic_μ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT blackboard_1 ⊗ blackboard_1 ⊗ over^ start_ARG roman_Λ end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT + italic_μ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT blackboard_1 ⊗ over^ start_ARG roman_Λ end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT ⊗ blackboard_1 + italic_μ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT over^ start_ARG roman_Λ end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT ⊗ blackboard_1 ⊗ blackboard_1 end_CELL start_CELL if italic_d = 3 . end_CELL end_ROW (2.19)

Therefore, the solution of a system AIIKu¯IK=r¯IKsubscriptsuperscript𝐴𝐾𝐼𝐼subscriptsuperscript¯𝑢𝐾𝐼subscriptsuperscript¯𝑟𝐾𝐼A^{K}_{II}\underline{u}^{K}_{I}=\underline{r}^{K}_{I}italic_A start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT under¯ start_ARG italic_u end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT = under¯ start_ARG italic_r end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT can be obtained with 𝒪(pd+1)𝒪superscript𝑝𝑑1\mathcal{O}(p^{d+1})caligraphic_O ( italic_p start_POSTSUPERSCRIPT italic_d + 1 end_POSTSUPERSCRIPT ) computational work.

The main limitation of this approach is that it does not generalize to terms that contain first derivatives, ruling out the possible extension to advection problems. Mixed first derivative terms are also very common in symmetric coercive problems, for instance, when the cells have non-orthotropic deformations, or for vector-valued operators that mix first derivatives of distinct vector components, such as (𝐮\nabla(\nabla\cdot\mathbf{u}∇ ( ∇ ⋅ bold_u).

2.4 Sparse FDM discretization

Our key idea is to construct a new finite element basis on the interval, inspired by the FDM, which yields a sparse stiffness matrix. To construct this new basis, we solve (2.16) with A^^𝐴\hat{A}over^ start_ARG italic_A end_ARG and B^^𝐵\hat{B}over^ start_ARG italic_B end_ARG in the GLL basis. Then, we interpolate the eigenvectors of the FDM with polynomials {s^j}j=1p1Pp(^)superscriptsubscriptsubscript^𝑠𝑗𝑗1𝑝1subscriptP𝑝^\{\hat{s}_{j}\}_{j=1}^{p-1}\subset\mathrm{P}_{p}(\hat{\mathcal{I}}){ over^ start_ARG italic_s end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p - 1 end_POSTSUPERSCRIPT ⊂ roman_P start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ( over^ start_ARG caligraphic_I end_ARG ) that satisfy s^j(±1)=0subscript^𝑠𝑗plus-or-minus10\hat{s}_{j}(\pm 1)=0over^ start_ARG italic_s end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( ± 1 ) = 0, s^j(ξ^i)=[S^II]ijsubscript^𝑠𝑗subscript^𝜉𝑖subscriptdelimited-[]subscript^𝑆𝐼𝐼𝑖𝑗\hat{s}_{j}(\hat{\xi}_{i})=[\hat{S}_{II}]_{ij}over^ start_ARG italic_s end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG italic_ξ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) = [ over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT ] start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT for i,j[1,p1]𝑖𝑗1𝑝1i,j\in[1,p-1]italic_i , italic_j ∈ [ 1 , italic_p - 1 ]. The unisolvent dual basis of our proposed FDM discretization of Pp(^)subscriptP𝑝^\mathrm{P}_{p}(\hat{\mathcal{I}})roman_P start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ( over^ start_ARG caligraphic_I end_ARG ) consists of point evaluation at the vertices and integral moments against the orthogonal interior shape functions s^jsubscript^𝑠𝑗\hat{s}_{j}over^ start_ARG italic_s end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, j=1,,p1𝑗1𝑝1j=1,\ldots,p-1italic_j = 1 , … , italic_p - 1. This construction ensures that A^IIsubscript^𝐴𝐼𝐼\hat{A}_{II}over^ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT and B^IIsubscript^𝐵𝐼𝐼\hat{B}_{II}over^ start_ARG italic_B end_ARG start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT become diagonal under this basis. As an additional consequence, the 1D interface shape functions are also orthogonal to the interior ones, but not to each other. Hence B^^𝐵\hat{B}over^ start_ARG italic_B end_ARG becomes as sparse as possible in this new FDM basis. Figure 4 shows the shape functions and the non-zero structure of A^^𝐴\hat{A}over^ start_ARG italic_A end_ARG and B^^𝐵\hat{B}over^ start_ARG italic_B end_ARG for the FDM basis.

Refer to caption
(a) FDM, s^j(x^)subscript^𝑠𝑗^𝑥\hat{s}_{j}(\hat{x})over^ start_ARG italic_s end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG )
Refer to caption
(b) FDM, B^^𝐵\hat{B}over^ start_ARG italic_B end_ARG
Refer to caption
(c) FDM, A^^𝐴\hat{A}over^ start_ARG italic_A end_ARG
Figure 4: Plots of the interval FDM shape functions (p=4𝑝4p=4italic_p = 4) and non-zero structure of the mass and stiffness matrices in the FDM basis on the reference interval (p=7𝑝7p=7italic_p = 7).

The stiffness matrix for a Cartesian cell AKsuperscript𝐴𝐾A^{K}italic_A start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT in this FDM basis is also given by (2.12), and has structured sparsity, since ABtensor-product𝐴𝐵A\otimes Bitalic_A ⊗ italic_B is sparse when A𝐴Aitalic_A and B𝐵Bitalic_B are sparse. Consequently, the global matrix A𝐴Aitalic_A is also sparse. We can thus apply sparse direct factorization methods or other preconditioners such as the additive Schwarz method.

For problems that are separable in Cartesian coordinates, we obtain a solver that is reminiscent of the FDM. The multiplication times the Kronecker product of the matrices of eigenvectors is incorporated in the definition of the FDM shape functions. If for some reason the problem needs to be solved with any other basis, one can transform the linear system back and forth to the FDM basis via interpolation and restriction at each application of the method. We also replace the diagonal matrix of the traditional FDM with a sparse matrix, which can be assembled even on 3D meshes with a very large number of cells and for very high degree. This approach removes the requirement of a global tensor product grid while exploiting the separability of the PDE and the local structure of Qp(Ω)subscriptQ𝑝Ω\mathrm{Q}_{p}(\Omega)roman_Q start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ( roman_Ω ), at the expense of replacing the diagonal factor by a sparse matrix.

2.5 Hybrid p𝑝pitalic_p-multigrid/Schwarz method

The solver of Pavarino is fully additive, across both the coarse grid and the vertex-star patches. In our work we consider a small variation of this, with the solver multiplicative between the two levels while remaining additive among the vertex-star patches. This improves the convergence at essentially no cost. The method can be interpreted as a hybrid multiplicative two-level V(1,1)V11\mathrm{V}(1,1)roman_V ( 1 , 1 )-cycle with the additive Schwarz method [18] with vertex-star patches as the fine grid relaxation and the lowest-order discretization on the same mesh as the coarse space. The sparse matrix for the coarse space may be assembled and factorized, or other preconditioners such as geometric or algebraic multigrid may be applied instead. The vertex-star patch Vjsubscript𝑉𝑗V_{j}italic_V start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT includes the degrees of freedom associated with vertex 𝐯jsubscript𝐯𝑗\mathbf{v}_{j}bold_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT of 𝒯hsubscript𝒯\mathcal{T}_{h}caligraphic_T start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT and all cells, facets, and edges adjacent to 𝐯jsubscript𝐯𝑗\mathbf{v}_{j}bold_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT (the topological entities in the star of the vertex, a standard concept in algebraic topology [38, §2]).

We may write the multigrid relaxation as

PASM1=j=1JR¯jAj1R¯j,superscriptsubscript𝑃ASM1superscriptsubscript𝑗1𝐽superscriptsubscript¯𝑅𝑗topsuperscriptsubscript𝐴𝑗1subscript¯𝑅𝑗P_{\mathrm{ASM}}^{-1}=\sum_{j=1}^{J}\bar{R}_{j}^{\top}A_{j}^{-1}\bar{R}_{j},italic_P start_POSTSUBSCRIPT roman_ASM end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT = ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_J end_POSTSUPERSCRIPT over¯ start_ARG italic_R end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_A start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT over¯ start_ARG italic_R end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , (2.20)

with R¯jsubscript¯𝑅𝑗\bar{R}_{j}over¯ start_ARG italic_R end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT the Boolean restriction matrix onto Vjsubscript𝑉𝑗V_{j}italic_V start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, and Aj=R¯jAR¯jsubscript𝐴𝑗subscript¯𝑅𝑗𝐴superscriptsubscript¯𝑅𝑗topA_{j}=\bar{R}_{j}A\bar{R}_{j}^{\top}italic_A start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = over¯ start_ARG italic_R end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_A over¯ start_ARG italic_R end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT are the sparse patch matrices for which we may explicitly compute a Cholesky decomposition. The relaxation is scaled by the damping coefficient

ω=2[(1+α)λmax+(1α)λmin]1,𝜔2superscriptdelimited-[]1𝛼subscript𝜆max1𝛼subscript𝜆min1\omega=2\left[(1+\alpha)\lambda_{\mathrm{max}}+(1-\alpha)\lambda_{\mathrm{min}% }\right]^{-1},italic_ω = 2 [ ( 1 + italic_α ) italic_λ start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT + ( 1 - italic_α ) italic_λ start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT , (2.21)

where λmin,λmaxsubscript𝜆subscript𝜆\lambda_{\mathrm{\min}},\lambda_{\mathrm{\max}}italic_λ start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT , italic_λ start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT are the extremal eigenvalues of PASM1Asuperscriptsubscript𝑃ASM1𝐴P_{\mathrm{ASM}}^{-1}Aitalic_P start_POSTSUBSCRIPT roman_ASM end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_A estimated via the CG-Lanczos procedure [30], and α=0.25𝛼0.25\alpha=0.25italic_α = 0.25 is chosen to tackle the high frequency error, also ensuring that the error iteration matrix 𝟙ωPASM1A1𝜔superscriptsubscript𝑃ASM1𝐴{\mathds{1}}-\omega P_{\mathrm{ASM}}^{-1}Ablackboard_1 - italic_ω italic_P start_POSTSUBSCRIPT roman_ASM end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_A is contractive.

To illustrate the direct solver on the Cartesian vertex-star patch shown in Figure 1a, we show in Figure 5 the non-zero structure for the patch matrix Ajsubscript𝐴𝑗A_{j}italic_A start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and its Cholesky factor. The sparsity pattern of the global matrix A𝐴Aitalic_A connects the interior DOFs to their projections onto the facets, hence a typical interior row of A𝐴Aitalic_A will have 2d+12𝑑12d+12 italic_d + 1 non-zeros. For the patch matrix Ajsubscript𝐴𝑗A_{j}italic_A start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, an interior row will only have d+1𝑑1d+1italic_d + 1 non-zeros, as the patch only includes one facet per dimension on each cell. Moreover, the total number of non-zeros of Ajsubscript𝐴𝑗A_{j}italic_A start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT is the same as that of a low-order finite difference or finite element discretization with the 5-point or 7-point stencil on the same grid.

Refer to caption
(a) Ajsubscript𝐴𝑗A_{j}italic_A start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, d=2𝑑2d=2italic_d = 2
Refer to caption
(b) chol(Aj)cholsubscript𝐴𝑗\mathrm{chol}(A_{j})roman_chol ( italic_A start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ), d=2𝑑2d=2italic_d = 2
Refer to caption
(c) Ajsubscript𝐴𝑗A_{j}italic_A start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, d=3𝑑3d=3italic_d = 3
Refer to caption
(d) chol(Aj)cholsubscript𝐴𝑗\mathrm{chol}(A_{j})roman_chol ( italic_A start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ), d=3𝑑3d=3italic_d = 3
Figure 5: Non-zero structure of the stiffness matrix in the FDM basis Aj=LjLjsubscript𝐴𝑗subscript𝐿𝑗superscriptsubscript𝐿𝑗topA_{j}=L_{j}L_{j}^{\top}italic_A start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_L start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT and its upper Cholesky factor Ljsuperscriptsubscript𝐿𝑗topL_{j}^{\top}italic_L start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT for the Poisson problem on a Cartesian vertex-star patch with p=4𝑝4p=4italic_p = 4. With the nested dissection reordering, the factor matrix has minimal fill-in, occurring only on the bottom-right block.

2.6 Computational complexity

Here we discuss the computational cost of the solution of the patch problem using the Cholesky factorization. Once the factorization has been computed, it may be applied in 𝒪(p2(d1))𝒪superscript𝑝2𝑑1\mathcal{O}(p^{2(d-1)})caligraphic_O ( italic_p start_POSTSUPERSCRIPT 2 ( italic_d - 1 ) end_POSTSUPERSCRIPT ) cost, which is optimal for d{2,3}𝑑23d\in\{2,3\}italic_d ∈ { 2 , 3 }. Unfortunately, the factorization phase is suboptimal, requiring 𝒪(p3(d1))𝒪superscript𝑝3𝑑1\mathcal{O}(p^{3(d-1)})caligraphic_O ( italic_p start_POSTSUPERSCRIPT 3 ( italic_d - 1 ) end_POSTSUPERSCRIPT ) operations to compute. The memory required to store the Cholesky factor is 𝒪(p2(d1))𝒪superscript𝑝2𝑑1\mathcal{O}(p^{2(d-1)})caligraphic_O ( italic_p start_POSTSUPERSCRIPT 2 ( italic_d - 1 ) end_POSTSUPERSCRIPT ), which for d=3𝑑3d=3italic_d = 3 is one factor of p𝑝pitalic_p higher than that required to store the solution.

Consider a stiffness matrix A𝐴Aitalic_A discretized with the FDM basis on any mesh with all cells Cartesian. The number of floating point operations (flops) needed to solve a linear system using a sparse Cholesky factorization A=LL𝐴𝐿superscript𝐿topA=LL^{\top}italic_A = italic_L italic_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT is roughly four times the number of non-zero entries in L𝐿Litalic_L [14]. To maximize the sparsity in L𝐿Litalic_L, it is crucial to reorder the DOFs, such that interior DOFs are followed by the interface DOFs. This ensures that the fill-in is introduced only on the bottom-left block. To analyze the cost of factorization, we first introduce the block LDL𝐿𝐷superscript𝐿topLDL^{\top}italic_L italic_D italic_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT decomposition

A=[AIIAIΓAΓIAΓΓ]=[𝟙0AΓIAII1𝟙][AII00SΓ][𝟙AII1AIΓ0𝟙],𝐴matrixsubscript𝐴𝐼𝐼subscript𝐴𝐼Γsubscript𝐴Γ𝐼subscript𝐴ΓΓmatrix10subscript𝐴Γ𝐼superscriptsubscript𝐴𝐼𝐼11matrixsubscript𝐴𝐼𝐼00subscript𝑆Γmatrix1superscriptsubscript𝐴𝐼𝐼1subscript𝐴𝐼Γ01A=\begin{bmatrix}A_{II}&A_{I\Gamma}\\ A_{\Gamma I}&A_{\Gamma\Gamma}\end{bmatrix}=\begin{bmatrix}{\mathds{1}}&0\\ A_{\Gamma I}A_{II}^{-1}&{\mathds{1}}\end{bmatrix}\begin{bmatrix}A_{II}&0\\ 0&S_{\Gamma}\end{bmatrix}\begin{bmatrix}{\mathds{1}}&A_{II}^{-1}A_{I\Gamma}\\ 0&{\mathds{1}}\end{bmatrix},italic_A = [ start_ARG start_ROW start_CELL italic_A start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT end_CELL start_CELL italic_A start_POSTSUBSCRIPT italic_I roman_Γ end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_A start_POSTSUBSCRIPT roman_Γ italic_I end_POSTSUBSCRIPT end_CELL start_CELL italic_A start_POSTSUBSCRIPT roman_Γ roman_Γ end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] = [ start_ARG start_ROW start_CELL blackboard_1 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL italic_A start_POSTSUBSCRIPT roman_Γ italic_I end_POSTSUBSCRIPT italic_A start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT end_CELL start_CELL blackboard_1 end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL italic_A start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL italic_S start_POSTSUBSCRIPT roman_Γ end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL blackboard_1 end_CELL start_CELL italic_A start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_A start_POSTSUBSCRIPT italic_I roman_Γ end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL blackboard_1 end_CELL end_ROW end_ARG ] , (2.22)

where SΓ=AΓΓAΓIAII1AIΓsubscript𝑆Γsubscript𝐴ΓΓsubscript𝐴Γ𝐼superscriptsubscript𝐴𝐼𝐼1subscript𝐴𝐼ΓS_{\Gamma}=A_{\Gamma\Gamma}-A_{\Gamma I}A_{II}^{-1}A_{I\Gamma}italic_S start_POSTSUBSCRIPT roman_Γ end_POSTSUBSCRIPT = italic_A start_POSTSUBSCRIPT roman_Γ roman_Γ end_POSTSUBSCRIPT - italic_A start_POSTSUBSCRIPT roman_Γ italic_I end_POSTSUBSCRIPT italic_A start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_A start_POSTSUBSCRIPT italic_I roman_Γ end_POSTSUBSCRIPT is the interface Schur complement. By construction, the top-left block AIIsubscript𝐴𝐼𝐼A_{II}italic_A start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT is diagonal with positive entries, with Cholesky factor AII1/2superscriptsubscript𝐴𝐼𝐼12A_{II}^{1/2}italic_A start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT. If we decompose AIIsubscript𝐴𝐼𝐼A_{II}italic_A start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT and SΓsubscript𝑆ΓS_{\Gamma}italic_S start_POSTSUBSCRIPT roman_Γ end_POSTSUBSCRIPT in the second matrix on the RHS of (2.22) into their Cholesky factors, and distribute each factor onto the other two matrices, we obtain the Cholesky decomposition of A𝐴Aitalic_A:

A=LL=[AII1/20AΓIAII1/2LΓ][AII1/2AII1/2AIΓ0LΓ],𝐴𝐿superscript𝐿topmatrixsuperscriptsubscript𝐴𝐼𝐼120subscript𝐴Γ𝐼superscriptsubscript𝐴𝐼𝐼12subscript𝐿Γmatrixsuperscriptsubscript𝐴𝐼𝐼12superscriptsubscript𝐴𝐼𝐼12subscript𝐴𝐼Γ0superscriptsubscript𝐿ΓtopA=LL^{\top}=\begin{bmatrix}A_{II}^{1/2}&0\\ A_{\Gamma I}A_{II}^{-1/2}&L_{\Gamma}\end{bmatrix}\begin{bmatrix}A_{II}^{1/2}&A% _{II}^{-1/2}A_{I\Gamma}\\ 0&L_{\Gamma}^{\top}\end{bmatrix},italic_A = italic_L italic_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT = [ start_ARG start_ROW start_CELL italic_A start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL italic_A start_POSTSUBSCRIPT roman_Γ italic_I end_POSTSUBSCRIPT italic_A start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT end_CELL start_CELL italic_L start_POSTSUBSCRIPT roman_Γ end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL italic_A start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT end_CELL start_CELL italic_A start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT italic_A start_POSTSUBSCRIPT italic_I roman_Γ end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL italic_L start_POSTSUBSCRIPT roman_Γ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL end_ROW end_ARG ] , (2.23)

where the Schur complement is factorized as SΓ=LΓLΓsubscript𝑆Γsubscript𝐿Γsuperscriptsubscript𝐿ΓtopS_{\Gamma}=L_{\Gamma}L_{\Gamma}^{\top}italic_S start_POSTSUBSCRIPT roman_Γ end_POSTSUBSCRIPT = italic_L start_POSTSUBSCRIPT roman_Γ end_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT roman_Γ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT. Since AIIsubscript𝐴𝐼𝐼A_{II}italic_A start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT is diagonal, the off-diagonal block AΓIAII1/2subscript𝐴Γ𝐼superscriptsubscript𝐴𝐼𝐼12A_{\Gamma I}A_{II}^{-1/2}italic_A start_POSTSUBSCRIPT roman_Γ italic_I end_POSTSUBSCRIPT italic_A start_POSTSUBSCRIPT italic_I italic_I end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT will preserve the non-zero structure of AΓIsubscript𝐴Γ𝐼A_{\Gamma I}italic_A start_POSTSUBSCRIPT roman_Γ italic_I end_POSTSUBSCRIPT, and similarly for its transpose. Thus, fill-in is only introduced on the interface block through LΓsubscript𝐿ΓL_{\Gamma}italic_L start_POSTSUBSCRIPT roman_Γ end_POSTSUBSCRIPT.

An ordering strategy that minimizes fill-in consists of applying nested dissection [21] on the adjacency graph that connects topological entities. Each node in this graph represents a cell, face, edge or vertex. The ordering of the entities is then used to permute the corresponding blocks in A𝐴Aitalic_A.

Assuming for the worst case that LΓsuperscriptsubscript𝐿ΓtopL_{\Gamma}^{\top}italic_L start_POSTSUBSCRIPT roman_Γ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT is dense, the memory required to store the Cholesky factor is 𝒪(p2(d1))𝒪superscript𝑝2𝑑1\mathcal{O}(p^{2(d-1)})caligraphic_O ( italic_p start_POSTSUPERSCRIPT 2 ( italic_d - 1 ) end_POSTSUPERSCRIPT ). This represents a significant increase from the traditional FDM, which is kept at the optimal 𝒪(pd)𝒪superscript𝑝𝑑\mathcal{O}(p^{d})caligraphic_O ( italic_p start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ). However, the DOF ordering does lead to some structured sparsity in LΓsuperscriptsubscript𝐿ΓtopL_{\Gamma}^{\top}italic_L start_POSTSUBSCRIPT roman_Γ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT, as can be seen in Figure 5. Nevertheless, we still observe dense 𝒪(pd1)×𝒪(pd1)𝒪superscript𝑝𝑑1𝒪superscript𝑝𝑑1\mathcal{O}(p^{d-1})\times\mathcal{O}(p^{d-1})caligraphic_O ( italic_p start_POSTSUPERSCRIPT italic_d - 1 end_POSTSUPERSCRIPT ) × caligraphic_O ( italic_p start_POSTSUPERSCRIPT italic_d - 1 end_POSTSUPERSCRIPT ) blocks. The fact that LΓsuperscriptsubscript𝐿ΓtopL_{\Gamma}^{\top}italic_L start_POSTSUBSCRIPT roman_Γ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT contains these dense blocks indicates that 𝒪(p3(d1))𝒪superscript𝑝3𝑑1\mathcal{O}(p^{3(d-1)})caligraphic_O ( italic_p start_POSTSUPERSCRIPT 3 ( italic_d - 1 ) end_POSTSUPERSCRIPT ) operations are required in the factorization phase. However, the forward and back-substitution steps have a computational cost of 𝒪(p2(d1))𝒪superscript𝑝2𝑑1\mathcal{O}(p^{2(d-1)})caligraphic_O ( italic_p start_POSTSUPERSCRIPT 2 ( italic_d - 1 ) end_POSTSUPERSCRIPT ) operations, which is optimal for d3𝑑3d\leq 3italic_d ≤ 3.

Compared to the FEM-SEM approach, our approach with the FDM basis has a sparser Cholesky factorization. In Figure 6 we present the ratio of the number of non-zeros in the Cholesky factors of our approach and the FEM-SEM preconditioner ordered with nested dissection for d{2,3}𝑑23d\in\{2,3\}italic_d ∈ { 2 , 3 }. The fact that the ratio is always below 1 confirms that our approach is sparser, with a substantial gain at higher degrees. Practical FEM-SEM solvers use AMG [9] or patchwise multigrid with ILU smoothers [44] to avoid the cost of the Cholesky factorizations of the patch matrices.

3333 7777 15151515 31313131 000.20.20.20.20.40.40.40.40.60.60.60.60.80.80.80.81111p𝑝pitalic_pRelative non-zeros
Figure 6: Relative number of non-zero entries in the Cholesky factors of the stiffness matrix with the FDM approach and the FEM-SEM preconditioner, for a Cartesian vertex-star patch of 2dsuperscript2𝑑2^{d}2 start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT cells. The FDM approach is sparser, with substantial gains at higher degrees.

2.7 Extension to non-Cartesian cells

For arbitrarily deformed cells, the local stiffness matrices AKsuperscript𝐴𝐾A^{K}italic_A start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT cannot be expressed in terms of tensor products of A^^𝐴\hat{A}over^ start_ARG italic_A end_ARG and B^^𝐵\hat{B}over^ start_ARG italic_B end_ARG as in (2.12), and AKsuperscript𝐴𝐾A^{K}italic_A start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT is not sparse in the FDM basis. The preconditioning techniques found in [16, 20, 55] introduce an auxiliary Cartesian domain to construct a separable problem for which the FDM is a direct solver. The method described by Fischer [20] constructs a preconditioner by replacing K𝐾Kitalic_K with its nearest rectangular approximation, whose dimensions are computed as the mean separation between the mapped GLL nodes from opposite facets of K𝐾Kitalic_K. Witte et al. [55] obtain the lengths from the average arclength of opposite sides of K𝐾Kitalic_K, but it is not clear how this extends to the 3D case. To the best of our knowledge, no theory underpins these choices.

Our approach to construct the separable surrogate is based on the theory of equivalent operator preconditioning [6]. We work with the bilinear form a(,)𝑎a(\cdot,\cdot)italic_a ( ⋅ , ⋅ ) in terms of the reference coordinates. We discard the mixed derivative terms that prevent separability, and we replace the coefficients with piecewise constants in the reference coordinates111Recall that piecewise constant coefficients in the physical coordinates will not yield piecewise constant coefficients in the reference coordinates.. We will prove that this choice yields a spectrally equivalent operator.

The bilinear form a(,)𝑎a(\cdot,\cdot)italic_a ( ⋅ , ⋅ ) can be expressed as a sum of cell contributions aK(,)subscript𝑎𝐾a_{K}(\cdot,\cdot)italic_a start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ( ⋅ , ⋅ ) where integration and differentiation are with respect to 𝐱^^𝐱\hat{\mathbf{x}}over^ start_ARG bold_x end_ARG. The measure d𝐱d𝐱\mathrm{d}\mathbf{x}roman_d bold_x is replaced by |det(DFK)|d𝐱^Dsubscript𝐹𝐾d^𝐱\absolutevalue{\det(\mathrm{D}F_{K})}\mathrm{d}\hat{\mathbf{x}}| start_ARG roman_det ( start_ARG roman_D italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT end_ARG ) end_ARG | roman_d over^ start_ARG bold_x end_ARG and the gradient is computed via the chain rule, since the arguments of the form become functions of 𝐱^^𝐱\hat{\mathbf{x}}over^ start_ARG bold_x end_ARG after being composed with FKsubscript𝐹𝐾F_{K}italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT. Hence,

a(v,u)=K𝒯haK(v,u)=K𝒯hK^^vFKG^K^uFKd𝐱^,𝑎𝑣𝑢subscript𝐾subscript𝒯subscript𝑎𝐾𝑣𝑢subscript𝐾subscript𝒯subscript^𝐾^𝑣subscript𝐹𝐾superscript^𝐺𝐾^𝑢subscript𝐹𝐾differential-d^𝐱a(v,u)=\sum_{K\in\mathcal{T}_{h}}a_{K}(v,u)=\sum_{K\in\mathcal{T}_{h}}\int_{% \hat{K}}\hat{\nabla}v\circ F_{K}\cdot\hat{G}^{K}\hat{\nabla}u\circ F_{K}% \leavevmode\nobreak\ \mathrm{d}\hat{\mathbf{x}},italic_a ( italic_v , italic_u ) = ∑ start_POSTSUBSCRIPT italic_K ∈ caligraphic_T start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ( italic_v , italic_u ) = ∑ start_POSTSUBSCRIPT italic_K ∈ caligraphic_T start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT over^ start_ARG italic_K end_ARG end_POSTSUBSCRIPT over^ start_ARG ∇ end_ARG italic_v ∘ italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ⋅ over^ start_ARG italic_G end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT over^ start_ARG ∇ end_ARG italic_u ∘ italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT roman_d over^ start_ARG bold_x end_ARG , (2.24)

where ^^\hat{\nabla}over^ start_ARG ∇ end_ARG is the gradient with respect to 𝐱^^𝐱\hat{\mathbf{x}}over^ start_ARG bold_x end_ARG, and G^K:K^d×d:superscript^𝐺𝐾^𝐾superscript𝑑𝑑\hat{G}^{K}:\hat{K}\to\mathbb{R}^{d\times d}over^ start_ARG italic_G end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT : over^ start_ARG italic_K end_ARG → blackboard_R start_POSTSUPERSCRIPT italic_d × italic_d end_POSTSUPERSCRIPT is the inverse metric of the coordinate transformation weighted by the Jacobian determinant,

G^K=|det(DFK)|DFK1DFK.superscript^𝐺𝐾Dsubscript𝐹𝐾Dsuperscriptsubscript𝐹𝐾1Dsuperscriptsubscript𝐹𝐾absenttop\hat{G}^{K}=\absolutevalue{\det(\mathrm{D}F_{K})}\mathrm{D}F_{K}^{-1}\mathrm{D% }F_{K}^{-\top}.over^ start_ARG italic_G end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT = | start_ARG roman_det ( start_ARG roman_D italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT end_ARG ) end_ARG | roman_D italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_D italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - ⊤ end_POSTSUPERSCRIPT . (2.25)

This tensor encapsulates all of the geometry-dependent information in the form; it is spatially dependent for generally-deformed elements, and constant in the case of affine transformations. For a separable geometry, G^Ksuperscript^𝐺𝐾\hat{G}^{K}over^ start_ARG italic_G end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT is diagonal, and thus for a Cartesian cell it is diagonal and constant. To construct an auxiliary problem that is separable by the FDM in the reference coordinates, we replace G^Ksuperscript^𝐺𝐾\hat{G}^{K}over^ start_ARG italic_G end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT in aK(,)subscript𝑎𝐾a_{K}(\cdot,\cdot)italic_a start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ( ⋅ , ⋅ ) with a constant diagonal approximation diag(μjK)diagsubscriptsuperscript𝜇𝐾𝑗\operatorname{diag}(\mu^{K}_{j})roman_diag ( italic_μ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ). Each μjKsubscriptsuperscript𝜇𝐾𝑗\mu^{K}_{j}italic_μ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT is given by the cell-wise average of the diagonal entry G^jjKsubscriptsuperscript^𝐺𝐾𝑗𝑗\hat{G}^{K}_{jj}over^ start_ARG italic_G end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT,

μjK1|K^|K^G^jjKd𝐱^,subscriptsuperscript𝜇𝐾𝑗1^𝐾subscript^𝐾subscriptsuperscript^𝐺𝐾𝑗𝑗differential-d^𝐱\mu^{K}_{j}\coloneqq\frac{1}{|\hat{K}|}\int_{\hat{K}}\hat{G}^{K}_{jj}% \leavevmode\nobreak\ \mathrm{d}\hat{\mathbf{x}},italic_μ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ≔ divide start_ARG 1 end_ARG start_ARG | over^ start_ARG italic_K end_ARG | end_ARG ∫ start_POSTSUBSCRIPT over^ start_ARG italic_K end_ARG end_POSTSUBSCRIPT over^ start_ARG italic_G end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT roman_d over^ start_ARG bold_x end_ARG , (2.26)

where summation over the index j𝑗jitalic_j is not implied. As the approximation is local to each cell, it is still possible to assemble a sparse stiffness matrix discretizing the auxiliary problem on meshes where cells are not structured in a tensor product grid.

We now establish the spectral equivalence between the original bilinear form and the auxiliary separable one.

Theorem 2.1.

Let μ^Kdiag(μjK)normal-≔subscriptnormal-^𝜇𝐾normal-diagsubscriptsuperscript𝜇𝐾𝑗\hat{\mu}_{K}\coloneqq\operatorname{diag}(\mu^{K}_{j})over^ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ≔ roman_diag ( italic_μ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) be the constant diagonal approximation of G^Ksuperscriptnormal-^𝐺𝐾\hat{G}^{K}over^ start_ARG italic_G end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT, and define the auxiliary bilinear form

a~(v,u)K𝒯ha~K(v,u)K𝒯hK^^vFKμ^K^uFKd𝐱^.~𝑎𝑣𝑢subscript𝐾subscript𝒯subscript~𝑎𝐾𝑣𝑢subscript𝐾subscript𝒯subscript^𝐾^𝑣subscript𝐹𝐾subscript^𝜇𝐾^𝑢subscript𝐹𝐾differential-d^𝐱\tilde{a}(v,u)\coloneqq\sum_{K\in\mathcal{T}_{h}}\tilde{a}_{K}(v,u)\coloneqq% \sum_{K\in\mathcal{T}_{h}}\int_{\hat{K}}\hat{\nabla}v\circ F_{K}\cdot\hat{\mu}% _{K}\hat{\nabla}u\circ F_{K}\leavevmode\nobreak\ \mathrm{d}\hat{\mathbf{x}}.over~ start_ARG italic_a end_ARG ( italic_v , italic_u ) ≔ ∑ start_POSTSUBSCRIPT italic_K ∈ caligraphic_T start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT end_POSTSUBSCRIPT over~ start_ARG italic_a end_ARG start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ( italic_v , italic_u ) ≔ ∑ start_POSTSUBSCRIPT italic_K ∈ caligraphic_T start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT over^ start_ARG italic_K end_ARG end_POSTSUBSCRIPT over^ start_ARG ∇ end_ARG italic_v ∘ italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ⋅ over^ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT over^ start_ARG ∇ end_ARG italic_u ∘ italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT roman_d over^ start_ARG bold_x end_ARG . (2.27)

Then, there exist p𝑝pitalic_p-independent constants c𝑐citalic_c, C>0𝐶0C>0italic_C > 0 that depend on 𝒯hsubscript𝒯\mathcal{T}_{h}caligraphic_T start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT through G^Ksuperscriptnormal-^𝐺𝐾\hat{G}^{K}over^ start_ARG italic_G end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT such that

ca(v,v)a~(v,v)CvV{0}.formulae-sequence𝑐𝑎𝑣𝑣~𝑎𝑣𝑣𝐶for-all𝑣𝑉0c\leq\frac{a(v,v)}{\tilde{a}(v,v)}\leq C\quad\forall\,v\in V\setminus\{0\}.italic_c ≤ divide start_ARG italic_a ( italic_v , italic_v ) end_ARG start_ARG over~ start_ARG italic_a end_ARG ( italic_v , italic_v ) end_ARG ≤ italic_C ∀ italic_v ∈ italic_V ∖ { 0 } . (2.28)

Proof 2.2.

Let cK,CKsubscript𝑐𝐾subscript𝐶𝐾c_{K},C_{K}italic_c start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT , italic_C start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT be lower and upper bounds for the spectrum of the diagonally scaled metric, so that σ(μ^K1/2G^Kμ^K1/2)[cK,CK]𝜎superscriptsubscriptnormal-^𝜇𝐾12superscriptnormal-^𝐺𝐾superscriptsubscriptnormal-^𝜇𝐾12subscript𝑐𝐾subscript𝐶𝐾\sigma(\hat{\mu}_{K}^{-1/2}\hat{G}^{K}\hat{\mu}_{K}^{-1/2})\in[c_{K},C_{K}]italic_σ ( over^ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT over^ start_ARG italic_G end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT over^ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT ) ∈ [ italic_c start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT , italic_C start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ] for all 𝐱^K^normal-^𝐱normal-^𝐾\hat{\mathbf{x}}\in\hat{K}over^ start_ARG bold_x end_ARG ∈ over^ start_ARG italic_K end_ARG. We claim that

cKaK(v,v)a~K(v,v)CKv{vV:v|K0}.formulae-sequencesubscript𝑐𝐾subscript𝑎𝐾𝑣𝑣subscript~𝑎𝐾𝑣𝑣subscript𝐶𝐾for-all𝑣conditional-set𝑣𝑉evaluated-at𝑣𝐾0c_{K}\leq\frac{a_{K}(v,v)}{\tilde{a}_{K}(v,v)}\leq C_{K}\quad\forall\,v\in\{v% \in V:\left.v\right|_{K}\neq 0\}.italic_c start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ≤ divide start_ARG italic_a start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ( italic_v , italic_v ) end_ARG start_ARG over~ start_ARG italic_a end_ARG start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ( italic_v , italic_v ) end_ARG ≤ italic_C start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ∀ italic_v ∈ { italic_v ∈ italic_V : italic_v | start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ≠ 0 } . (2.29)

This result is obtained by first rewriting aK(v,v)subscript𝑎𝐾𝑣𝑣a_{K}(v,v)italic_a start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ( italic_v , italic_v ) with μ^K1/2μ^K1/2G^Kμ^K1/2μ^K1/2superscriptsubscriptnormal-^𝜇𝐾12superscriptsubscriptnormal-^𝜇𝐾12superscriptnormal-^𝐺𝐾superscriptsubscriptnormal-^𝜇𝐾12superscriptsubscriptnormal-^𝜇𝐾12\hat{\mu}_{K}^{1/2}\hat{\mu}_{K}^{-1/2}\hat{G}^{K}\hat{\mu}_{K}^{-1/2}\hat{\mu% }_{K}^{1/2}over^ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT over^ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT over^ start_ARG italic_G end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT over^ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT over^ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT instead of G^Ksuperscriptnormal-^𝐺𝐾\hat{G}^{K}over^ start_ARG italic_G end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT, and then replacing μ^K1/2G^Kμ^K1/2subscriptsuperscriptnormal-^𝜇12𝐾superscriptnormal-^𝐺𝐾subscriptsuperscriptnormal-^𝜇12𝐾\hat{\mu}^{-1/2}_{K}\hat{G}^{K}\hat{\mu}^{-1/2}_{K}over^ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT over^ start_ARG italic_G end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT over^ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT with cK𝟙subscript𝑐𝐾1c_{K}{\mathds{1}}italic_c start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT blackboard_1 or CK𝟙subscript𝐶𝐾1C_{K}{\mathds{1}}italic_C start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT blackboard_1 to find the lower or upper bounds, respectively. It then follows that

cminK𝒯hcKa(v,v)a~(v,v)maxK𝒯hCKCvV{0}.formulae-sequence𝑐subscript𝐾subscript𝒯subscript𝑐𝐾𝑎𝑣𝑣~𝑎𝑣𝑣subscript𝐾subscript𝒯subscript𝐶𝐾𝐶for-all𝑣𝑉0c\coloneqq\min_{K\in\mathcal{T}_{h}}c_{K}\leq\frac{a(v,v)}{\tilde{a}(v,v)}\leq% \max_{K\in\mathcal{T}_{h}}C_{K}\eqqcolon C\quad\forall\,v\in V\setminus\{0\}.italic_c ≔ roman_min start_POSTSUBSCRIPT italic_K ∈ caligraphic_T start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ≤ divide start_ARG italic_a ( italic_v , italic_v ) end_ARG start_ARG over~ start_ARG italic_a end_ARG ( italic_v , italic_v ) end_ARG ≤ roman_max start_POSTSUBSCRIPT italic_K ∈ caligraphic_T start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_C start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ≕ italic_C ∀ italic_v ∈ italic_V ∖ { 0 } . (2.30)

Let A~~𝐴\tilde{A}over~ start_ARG italic_A end_ARG be the stiffness matrix associated with the auxiliary form a~(,)~𝑎\tilde{a}(\cdot,\cdot)over~ start_ARG italic_a end_ARG ( ⋅ , ⋅ ). By Theorem 2.1, the condition number κ(A~1A)𝜅superscript~𝐴1𝐴\kappa(\tilde{A}^{-1}A)italic_κ ( over~ start_ARG italic_A end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_A ) is bounded by C/c𝐶𝑐C/citalic_C / italic_c independently of p𝑝pitalic_p. Numerical experiments also indicate that κ(A~1A)𝜅superscript~𝐴1𝐴\kappa(\tilde{A}^{-1}A)italic_κ ( over~ start_ARG italic_A end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_A ) is independent of hhitalic_h under uniform refinements. Now consider a preconditioner P𝑃Pitalic_P where the auxiliary form a~(,)~𝑎\tilde{a}(\cdot,\cdot)over~ start_ARG italic_a end_ARG ( ⋅ , ⋅ ) is used additively in both the coarse solve and the vertex-star patches. In this case, Theorem 1 of [42] guarantees that κ(P1A~)𝜅superscript𝑃1~𝐴\kappa(P^{-1}\tilde{A})italic_κ ( italic_P start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT over~ start_ARG italic_A end_ARG ) is bounded from above independently of hhitalic_h and p𝑝pitalic_p. Hence we may conclude that κ(P1A)κ(P1A~)κ(A~1A)𝜅superscript𝑃1𝐴𝜅superscript𝑃1~𝐴𝜅superscript~𝐴1𝐴\kappa(P^{-1}A)\leq\kappa(P^{-1}\tilde{A})\kappa(\tilde{A}^{-1}A)italic_κ ( italic_P start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_A ) ≤ italic_κ ( italic_P start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT over~ start_ARG italic_A end_ARG ) italic_κ ( over~ start_ARG italic_A end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_A ) is bounded independently of hhitalic_h and p𝑝pitalic_p. In practice, we expect that using multiplicative coarse grid correction with the original form a(,)𝑎a(\cdot,\cdot)italic_a ( ⋅ , ⋅ ) can only improve the preconditioner.

To gain useful insight, we consider the case where d=2𝑑2d=2italic_d = 2 and FKsubscript𝐹𝐾F_{K}italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT is an affine transformation, that is when K𝐾Kitalic_K is a parallelogram. Without loss of generality, suppose that one of the sides of K𝐾Kitalic_K has length 2L12subscript𝐿12L_{1}2 italic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and is aligned with the first reference coordinate axis, and the other side of length 2L22subscript𝐿22L_{2}2 italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT is at an angle θ𝜃\thetaitalic_θ with respect to the same axis. The Jacobian of the coordinate transformation is

DFK=[L1L2cos(θ)0L2sin(θ)],Dsubscript𝐹𝐾matrixsubscript𝐿1subscript𝐿2𝜃0subscript𝐿2𝜃\mathrm{D}F_{K}=\begin{bmatrix}L_{1}&L_{2}\cos{\theta}\\ 0&L_{2}\sin{\theta}\end{bmatrix},roman_D italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT = [ start_ARG start_ROW start_CELL italic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL start_CELL italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT roman_cos ( start_ARG italic_θ end_ARG ) end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT roman_sin ( start_ARG italic_θ end_ARG ) end_CELL end_ROW end_ARG ] , (2.31)

to which corresponds the Jacobian-weighted inverse metric

G^K=1L1L2|sin(θ)|[L22L1L2cos(θ)L1L2cos(θ)L12].superscript^𝐺𝐾1subscript𝐿1subscript𝐿2𝜃matrixsuperscriptsubscript𝐿22subscript𝐿1subscript𝐿2𝜃subscript𝐿1subscript𝐿2𝜃superscriptsubscript𝐿12\hat{G}^{K}=\frac{1}{L_{1}L_{2}\absolutevalue{\sin{\theta}}}\begin{bmatrix}L_{% 2}^{2}&-L_{1}L_{2}\cos{\theta}\\ -L_{1}L_{2}\cos{\theta}&L_{1}^{2}\end{bmatrix}.over^ start_ARG italic_G end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT | start_ARG roman_sin ( start_ARG italic_θ end_ARG ) end_ARG | end_ARG [ start_ARG start_ROW start_CELL italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_CELL start_CELL - italic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT roman_cos ( start_ARG italic_θ end_ARG ) end_CELL end_ROW start_ROW start_CELL - italic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT roman_cos ( start_ARG italic_θ end_ARG ) end_CELL start_CELL italic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_CELL end_ROW end_ARG ] . (2.32)

Since G^Ksuperscript^𝐺𝐾\hat{G}^{K}over^ start_ARG italic_G end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT is constant, μ^Ksubscript^𝜇𝐾\hat{\mu}_{K}over^ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT is simply the diagonal part of G^Ksuperscript^𝐺𝐾\hat{G}^{K}over^ start_ARG italic_G end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT. The spectrum of the diagonally scaled metric will be independent of L1subscript𝐿1L_{1}italic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and L2subscript𝐿2L_{2}italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, but still depend on θ𝜃\thetaitalic_θ,

σ(μ^K1/2G^Kμ^K1/2)=[1|cos(θ)|,1+|cos(θ)|].𝜎superscriptsubscript^𝜇𝐾12superscript^𝐺𝐾superscriptsubscript^𝜇𝐾121𝜃1𝜃\sigma\left(\hat{\mu}_{K}^{-1/2}\hat{G}^{K}\hat{\mu}_{K}^{-1/2}\right)=\left[1% -\absolutevalue{\cos{\theta}},1+\absolutevalue{\cos{\theta}}\right].italic_σ ( over^ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT over^ start_ARG italic_G end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT over^ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT ) = [ 1 - | start_ARG roman_cos ( start_ARG italic_θ end_ARG ) end_ARG | , 1 + | start_ARG roman_cos ( start_ARG italic_θ end_ARG ) end_ARG | ] . (2.33)

This spectrum is desirable because it is centered at 1 and bounded above for all θ𝜃\thetaitalic_θ. If we follow the geometric approaches of [20, 55], we would have to choose a rectangle with side lengths 2L12subscript𝐿12L_{1}2 italic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and 2L22subscript𝐿22L_{2}2 italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT as the auxiliary domain for the Poisson problem. Then, the previous bounds (2.33) would become scaled by |sin(θ)|1superscript𝜃1\absolutevalue{\sin{\theta}}^{-1}| start_ARG roman_sin ( start_ARG italic_θ end_ARG ) end_ARG | start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT. In this case, the spectrum is unbounded from above in the limit θ0𝜃0\theta\to 0italic_θ → 0.

2.8 Numerical experiments

We provide an implementation of the PpsubscriptP𝑝\mathrm{P}_{p}roman_P start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT element with the FDM shape functions on the interval in the FIAT [28] package. The extension to quadrilaterals and hexahedra is achieved by taking tensor products of the one-dimensional element with FInAT [23]. Code for the sum-factorized evaluation of the residual is automatically generated by Firedrake [47, 24], implementing a Gauß–Lobatto quadrature rule with 3(p+1)/23𝑝123(p+1)/23 ( italic_p + 1 ) / 2 points along each direction. The sparse preconditioner discretizing the auxiliary form is implemented in firedrake.FDMPC using PETSc [8]. The Cholesky factorization of the patch matrices is computed using CHOLMOD [14]. Most of our computations were performed using an Intel Xeon CPU E5-4627 v2 @@@@ 3.30GHz with 32 cores and 67.6 GB of RAM storage.

The hybrid p𝑝pitalic_p-multigrid/Schwarz solver employing the FDM/sparse relaxation is illustrated in Figure 7. To achieve scalability with respect to the mesh parameter hhitalic_h, on the p𝑝pitalic_p-coarse problem we employ geometric multigrid with damped point-Jacobi relaxation and a Cholesky factorization on the coarsest level using MUMPS [2]. We test the effectiveness of this approach on a hierarchy of meshes obtained by l0𝑙0l\geq 0italic_l ≥ 0 uniform refinements of the base meshes shown in Figure 8.

Krylov solver: PCGHybrid p𝑝pitalic_p-multigrid/Schwarz V-cycleRelaxation: FDM/sparsep𝑝pitalic_p-coarse: geometric multigridRelaxation: point-Jacobihhitalic_h-coarse: Cholesky
Figure 7: Solver diagram for the Poisson problem.

We present results for the Poisson equation in Ω=(0,1)dΩsuperscript01𝑑\Omega=(0,1)^{d}roman_Ω = ( 0 , 1 ) start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT discretized on the three hierarchies of Cartesian, unstructured, and structured but deformed (Kershaw) [27] meshes. The coordinate field of the Kershaw mesh is in [Q3(Ω)]dC1(Ω)superscriptdelimited-[]subscriptQ3Ω𝑑superscript𝐶1Ω[\mathrm{Q}_{3}(\Omega)]^{d}\cap C^{1}(\Omega)[ roman_Q start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ( roman_Ω ) ] start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ∩ italic_C start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT ( roman_Ω ), with a cell aspect ratio of εy=εz=0.3subscript𝜀𝑦subscript𝜀𝑧0.3\varepsilon_{y}=\varepsilon_{z}=0.3italic_ε start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT = italic_ε start_POSTSUBSCRIPT italic_z end_POSTSUBSCRIPT = 0.3 near the corners of the domain. We impose homogeneous Dirichlet BCs on ΓD=ΩsubscriptΓ𝐷Ω\Gamma_{D}=\partial\Omegaroman_Γ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT = ∂ roman_Ω and a constant forcing f=1𝑓1f=1italic_f = 1. In Table 1 we present PCG iteration counts required to reduce the Euclidean norm of the residual by a factor of 108superscript10810^{8}10 start_POSTSUPERSCRIPT 8 end_POSTSUPERSCRIPT starting from a zero initial guess. In Table 2 we show the condition number κ(P1A)𝜅superscript𝑃1𝐴\kappa(P^{-1}A)italic_κ ( italic_P start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_A ) estimated by CG-Lanczos. The results show almost complete p𝑝pitalic_p- and hhitalic_h-robustness in the Cartesian case, and very slow growth of iteration counts in the unstructured case. Given the lack of shape regularity, the Kershaw mesh is significantly more challenging; even with exact patch solvers, we do not expect hhitalic_h- or p𝑝pitalic_p-robustness.

Refer to caption
(a) Cartesian
Refer to caption
(b) Unstructured
Refer to caption
(c) Kershaw, d=2𝑑2d=2italic_d = 2
Refer to caption
(d) Kershaw, d=3𝑑3d=3italic_d = 3
Figure 8: Base meshes for the Poisson problem. The Cartesian and unstructured base meshes used for d=3𝑑3d=3italic_d = 3 are the extrusion with six layers of the two-dimensional meshes shown here.
Table 1: PCG iteration counts for the hybrid p𝑝pitalic_p-multigrid/Schwarz solver with the FDM/sparse relaxation. The patch problems are solved exactly on the Cartesian mesh.
Cartesian Unstructured Kershaw
d𝑑ditalic_d pl𝑝𝑙p\setminus litalic_p ∖ italic_l 0 1 2 0 1 2 0 1 2
2 3 7 8 9 12 13 14 27 35 54
7 8 8 9 16 16 17 44 56 78
15 8 9 9 19 19 19 58 69 90
31 8 9 9 21 20 21 67 80 97
3 3 12 12 12 17 17 18 54 66 102
7 12 12 12 22 21 21 98 106 158
15 12 13 25 24 131 132
Table 2: Estimated condition numbers for the preconditioned operator κ(P1A)𝜅superscript𝑃1𝐴\kappa(P^{-1}A)italic_κ ( italic_P start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_A ) using the hybrid p𝑝pitalic_p-multigrid/Schwarz solver with the FDM/sparse relaxation.
Cartesian Unstructured Kershaw
d𝑑ditalic_d pl𝑝𝑙p\setminus litalic_p ∖ italic_l 0 1 2 0 1 2 0 1 2
2 3 1.44 1.49 1.50 2.14 2.37 2.81 9.34 15.6 34.5
7 1.48 1.48 1.50 3.23 3.27 3.79 19.6 30.3 57.6
15 1.51 1.51 1.52 4.06 3.78 4.13 30.5 45.8 69.0
31 1.54 1.52 1.52 4.45 4.06 4.36 40.4 57.1 73.3
3 3 2.87 2.49 2.45 4.16 4.21 4.55 34.8 46.1 117
7 2.79 2.70 2.67 5.88 5.54 5.47 100 110 266
15 2.83 2.79 7.12 6.44 165 151

To assess the computational performance of our approach, we solve the three-dimensional Poisson equation on a Cartesian mesh with 3×3×33333\times 3\times 33 × 3 × 3 cells with a single core of an Intel Core i7-10875H CPU @@@@ 2.30GHz. We plot in Figure 9 the runtimes, flop counts, and achieved arithmetic performance for the Cholesky factorization of the patch matrices, the solution of the patch problems using this factorization (per application of the relaxation), and the matrix-free evaluation of the residual (per Krylov iteration, excluding the application of the global to local map) as functions of p𝑝pitalic_p. The dotted lines are to indicate powers of 2p12𝑝12p-12 italic_p - 1, which is the number of DOFs along each side of typical vertex-star patch not intersecting the mesh boundary.

Despite the 𝒪(p3(d1))𝒪superscript𝑝3𝑑1\mathcal{O}(p^{3(d-1)})caligraphic_O ( italic_p start_POSTSUPERSCRIPT 3 ( italic_d - 1 ) end_POSTSUPERSCRIPT ) computational cost of the Cholesky factorization, these results show 𝒪(p2(d1))𝒪superscript𝑝2𝑑1\mathcal{O}(p^{2(d-1)})caligraphic_O ( italic_p start_POSTSUPERSCRIPT 2 ( italic_d - 1 ) end_POSTSUPERSCRIPT ) scaling for runtime up to p=15𝑝15p=15italic_p = 15. This speedup can be explained mainly by data locality. The sparse Cholesky factorization is obtained by recursively applying the block LDL𝐿𝐷superscript𝐿topLDL^{\top}italic_L italic_D italic_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT decomposition up to the point where the Schur complement is sufficiently dense. The computation of this Schur complement via dense matrix-matrix multiplication (BLAS-3 dgemm) dominates the computational cost. As p𝑝pitalic_p is increased, the utilization of arithmetic units increases in proportion to the dimension of the Schur complement, which explains the 𝒪(pd1)𝒪superscript𝑝𝑑1\mathcal{O}(p^{d-1})caligraphic_O ( italic_p start_POSTSUPERSCRIPT italic_d - 1 end_POSTSUPERSCRIPT ) scaling of the achieved arithmetic performance. As the arithmetic capabilities become saturated for p>15𝑝15p>15italic_p > 15, the 𝒪(p3(d1))𝒪superscript𝑝3𝑑1\mathcal{O}(p^{3(d-1)})caligraphic_O ( italic_p start_POSTSUPERSCRIPT 3 ( italic_d - 1 ) end_POSTSUPERSCRIPT ) scaling in the factorization runtime should become apparent.

Most of the time in the relaxation step is spent in accessing the factor matrix from memory, given the 𝒪(p2(d1))𝒪superscript𝑝2𝑑1\mathcal{O}(p^{2(d-1)})caligraphic_O ( italic_p start_POSTSUPERSCRIPT 2 ( italic_d - 1 ) end_POSTSUPERSCRIPT ) sub-optimal storage per patch. The relaxation is therefore limited by memory bandwidth and not arithmetically intense, which explains the poor arithmetic performance. This is in contrast to the sum-factorized residual evaluation, which has a 𝒪(pd)𝒪superscript𝑝𝑑\mathcal{O}(p^{d})caligraphic_O ( italic_p start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ) memory footprint and presents better arithmetic utilization [29]. Nevertheless, the results indicate that the runtime for the solution of the patch problems with the sparse Cholesky factorization remains very close to that of the matrix-free residual evaluation for moderate p𝑝pitalic_p, being slightly faster for p7𝑝7p\leq 7italic_p ≤ 7, mainly due to lower operation count.

3333 7777 15151515 31313131 105superscript10510^{-5}10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT104superscript10410^{-4}10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT103superscript10310^{-3}10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT102superscript10210^{-2}10 start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT101superscript10110^{-1}10 start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT100superscript10010^{0}10 start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT101superscript10110^{1}10 start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT102superscript10210^{2}10 start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT103superscript10310^{3}10 start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT4161p𝑝pitalic_pRuntime (s)
(a) Runtime
3333 7777 15151515 31313131 104superscript10410^{-4}10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT103superscript10310^{-3}10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT102superscript10210^{-2}10 start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT101superscript10110^{-1}10 start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT100superscript10010^{0}10 start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT101superscript10110^{1}10 start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT102superscript10210^{2}10 start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT103superscript10310^{3}10 start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT104superscript10410^{4}10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT4161p𝑝pitalic_pGflop
(b) Flop count
3333 7777 15151515 31313131 100superscript10010^{0}10 start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT101superscript10110^{1}10 start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT102superscript10210^{2}10 start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT21p𝑝pitalic_pGflop/s
(c) Performance
Figure 9: Runtimes, flop counts, and achieved arithmetic performance for the Cholesky factorization (symbolic and numeric), solution of the patch problems, and residual evaluation for a Cartesian mesh with 3×3×33333\times 3\times 33 × 3 × 3 cells on a single CPU core. We observe that the factorization runtime scales better than expected, close to the optimal 𝒪(pd+1)𝒪superscript𝑝𝑑1\mathcal{O}(p^{d+1})caligraphic_O ( italic_p start_POSTSUPERSCRIPT italic_d + 1 end_POSTSUPERSCRIPT ) complexity.

3 Application to linear elasticity problems

3.1 Primal formulation of linear elasticity

We now consider how these ideas may be applied in the more complex setting of a nonseparable, vector-valued PDE. The equations of linear elasticity describe the displacement 𝐮:Ωd:𝐮Ωsuperscript𝑑\mathbf{u}:\Omega\to\mathbb{R}^{d}bold_u : roman_Ω → blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT of a solid body with a reference configuration ΩdΩsuperscript𝑑\Omega\subset\mathbb{R}^{d}roman_Ω ⊂ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT. The primal formulation is to find 𝐮𝐮0V[H01(Ω)]d𝐮subscript𝐮0𝑉superscriptdelimited-[]subscriptsuperscript𝐻10Ω𝑑\mathbf{u}-\mathbf{u}_{0}\in V\coloneqq[H^{1}_{0}(\Omega)]^{d}bold_u - bold_u start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ∈ italic_V ≔ [ italic_H start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( roman_Ω ) ] start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT such that

a(𝐯,𝐮)=L(𝐯)𝐯V,formulae-sequence𝑎𝐯𝐮𝐿𝐯for-all𝐯𝑉a(\mathbf{v},\mathbf{u})=L(\mathbf{v})\quad\forall\,\mathbf{v}\in V,italic_a ( bold_v , bold_u ) = italic_L ( bold_v ) ∀ bold_v ∈ italic_V , (3.1)

where

a(𝐯,𝐮)=Ω2με(𝐯):ε(𝐮)+λ𝐯𝐮d𝐱,L(𝐯)=Ω𝐯𝐁d𝐱.:𝑎𝐯𝐮subscriptΩ2𝜇𝜀𝐯𝜀𝐮𝜆𝐯𝐮d𝐱𝐿𝐯subscriptΩ𝐯𝐁differential-d𝐱a(\mathbf{v},\mathbf{u})=\int_{\Omega}2\mu\varepsilon(\mathbf{v}):\varepsilon(% \mathbf{u})+\lambda\nabla\cdot\mathbf{v}\nabla\cdot\mathbf{u}\leavevmode% \nobreak\ \mathrm{d}\mathbf{x},\quad L(\mathbf{v})=\int_{\Omega}\mathbf{v}% \cdot\mathbf{B}\leavevmode\nobreak\ \mathrm{d}\mathbf{x}.italic_a ( bold_v , bold_u ) = ∫ start_POSTSUBSCRIPT roman_Ω end_POSTSUBSCRIPT 2 italic_μ italic_ε ( bold_v ) : italic_ε ( bold_u ) + italic_λ ∇ ⋅ bold_v ∇ ⋅ bold_u roman_d bold_x , italic_L ( bold_v ) = ∫ start_POSTSUBSCRIPT roman_Ω end_POSTSUBSCRIPT bold_v ⋅ bold_B roman_d bold_x . (3.2)

Here, we assume that the material is homogeneous and isotropic, and can thus be described by the Lamé parameters μ,λ>0𝜇𝜆0\mu,\lambda>0italic_μ , italic_λ > 0; ε(𝐮)=(𝐮+𝐮)/2𝜀𝐮𝐮superscript𝐮top2\varepsilon(\mathbf{u})=(\nabla\mathbf{u}+\nabla\mathbf{u}^{\top})/2italic_ε ( bold_u ) = ( ∇ bold_u + ∇ bold_u start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) / 2 is the linearized strain tensor; 𝐮0subscript𝐮0\mathbf{u}_{0}bold_u start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT is Dirichlet data prescribed on ΓDΩsubscriptΓ𝐷Ω\Gamma_{D}\subseteq\partial\Omegaroman_Γ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT ⊆ ∂ roman_Ω; and 𝐁[L2(Ω)]d𝐁superscriptdelimited-[]superscript𝐿2Ω𝑑\mathbf{B}\in[L^{2}(\Omega)]^{d}bold_B ∈ [ italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω ) ] start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT is a body force. The Poisson ratio ν=λ/(2μ+2λ)𝜈𝜆2𝜇2𝜆\nu=\lambda/(2\mu+2\lambda)italic_ν = italic_λ / ( 2 italic_μ + 2 italic_λ ) measures the compressibility of the material. In the incompressible limit λ𝜆\lambda\to\inftyitalic_λ → ∞ (i.e. ν1/2𝜈12\nu\to 1/2italic_ν → 1 / 2), the problem becomes ill-conditioned, as a(,)𝑎a(\cdot,\cdot)italic_a ( ⋅ , ⋅ ) becomes insensitive to divergence-free perturbations in the arguments.

Consider the partitioning of the stiffness matrix A𝐴Aitalic_A into blocks that act on each displacement component,

A=[A11A1dAd1Add].𝐴matrixsubscript𝐴11subscript𝐴1𝑑subscript𝐴𝑑1subscript𝐴𝑑𝑑A=\begin{bmatrix}A_{11}&\cdots&A_{1d}\\ \vdots&\ddots&\vdots\\ A_{d1}&\cdots&A_{dd}\end{bmatrix}.italic_A = [ start_ARG start_ROW start_CELL italic_A start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT end_CELL start_CELL ⋯ end_CELL start_CELL italic_A start_POSTSUBSCRIPT 1 italic_d end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL ⋮ end_CELL start_CELL ⋱ end_CELL start_CELL ⋮ end_CELL end_ROW start_ROW start_CELL italic_A start_POSTSUBSCRIPT italic_d 1 end_POSTSUBSCRIPT end_CELL start_CELL ⋯ end_CELL start_CELL italic_A start_POSTSUBSCRIPT italic_d italic_d end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] . (3.3)

The diagonal block Ajjsubscript𝐴𝑗𝑗A_{jj}italic_A start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT discretizes the bilinear form

Ωμvjuj+(μ+λ)vjxjujxjd𝐱,subscriptΩ𝜇subscript𝑣𝑗subscript𝑢𝑗𝜇𝜆subscript𝑣𝑗subscript𝑥𝑗subscript𝑢𝑗subscript𝑥𝑗d𝐱\int_{\Omega}\mu\nabla v_{j}\cdot\nabla u_{j}+(\mu+\lambda)\frac{\partial v_{j% }}{\partial x_{j}}\frac{\partial u_{j}}{\partial x_{j}}\leavevmode\nobreak\ % \mathrm{d}\mathbf{x},∫ start_POSTSUBSCRIPT roman_Ω end_POSTSUBSCRIPT italic_μ ∇ italic_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ⋅ ∇ italic_u start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + ( italic_μ + italic_λ ) divide start_ARG ∂ italic_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG divide start_ARG ∂ italic_u start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG roman_d bold_x , (3.4)

where summation is not implied, and ujsubscript𝑢𝑗u_{j}italic_u start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and vjsubscript𝑣𝑗v_{j}italic_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT are components of 𝐮𝐮\mathbf{u}bold_u and 𝐯𝐯\mathbf{v}bold_v, respectively. The off-diagonal blocks Aijsubscript𝐴𝑖𝑗A_{ij}italic_A start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT, ij𝑖𝑗i\neq jitalic_i ≠ italic_j, discretize

Ωμvixjujxi+λvixiujxjd𝐱.subscriptΩ𝜇subscript𝑣𝑖subscript𝑥𝑗subscript𝑢𝑗subscript𝑥𝑖𝜆subscript𝑣𝑖subscript𝑥𝑖subscript𝑢𝑗subscript𝑥𝑗d𝐱\int_{\Omega}\mu\frac{\partial v_{i}}{\partial x_{j}}\frac{\partial u_{j}}{% \partial x_{i}}+\lambda\frac{\partial v_{i}}{\partial x_{i}}\frac{\partial u_{% j}}{\partial x_{j}}\leavevmode\nobreak\ \mathrm{d}\mathbf{x}.∫ start_POSTSUBSCRIPT roman_Ω end_POSTSUBSCRIPT italic_μ divide start_ARG ∂ italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG divide start_ARG ∂ italic_u start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG + italic_λ divide start_ARG ∂ italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG divide start_ARG ∂ italic_u start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG roman_d bold_x . (3.5)

The diagonal blocks can be diagonalized by the FDM on the interior of a Cartesian cell when the reference axes are aligned with the physical coordinates. The same statement does not hold true for the off-diagonal blocks, as they couple together different displacement components. This is because they discretize products of different first derivatives on the different components and hence are not separable.

The separate displacement components (SDC) preconditioner [11, 22] is defined as the block diagonal matrix ASDC=diag(A11,,Add)subscript𝐴SDCdiagsubscript𝐴11subscript𝐴𝑑𝑑A_{\mathrm{SDC}}=\operatorname{diag}(A_{11},\ldots,A_{dd})italic_A start_POSTSUBSCRIPT roman_SDC end_POSTSUBSCRIPT = roman_diag ( italic_A start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT , … , italic_A start_POSTSUBSCRIPT italic_d italic_d end_POSTSUBSCRIPT ). In other words, this approach is also described as block-Jacobi in the displacement components. The SDC preconditioner discretized with the FDM basis is sparse for Cartesian cells aligned with the coordinate axes. On arbitrary cells, for each separate component, we obtain an auxiliary form that is separable in the reference coordinates by selecting constant diagonal coefficients μ^Ksubscript^𝜇𝐾\hat{\mu}_{K}over^ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT.

It is shown in [11] that for a homogeneous isotropic material with principal axes parallel to the axes of the reference coordinate system, the condition number of the preconditioned matrix will depend on the Poisson ratio:

κ(ASDC1A)d1γ1ν12ν,𝜅superscriptsubscript𝐴SDC1𝐴𝑑1𝛾1𝜈12𝜈\kappa(A_{\mathrm{SDC}}^{-1}A)\leq\frac{d-1}{\gamma}\frac{1-\nu}{1-2\nu},italic_κ ( italic_A start_POSTSUBSCRIPT roman_SDC end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_A ) ≤ divide start_ARG italic_d - 1 end_ARG start_ARG italic_γ end_ARG divide start_ARG 1 - italic_ν end_ARG start_ARG 1 - 2 italic_ν end_ARG , (3.6)

where γ𝛾\gammaitalic_γ is the constant appearing in Korn’s inequality,

𝐮H1(Ω)d2γΩ𝐮𝐮+ε(𝐮):ε(𝐮)d𝐱𝐮V.:superscriptsubscriptnorm𝐮superscript𝐻1superscriptΩ𝑑2𝛾subscriptΩ𝐮𝐮𝜀𝐮𝜀𝐮d𝐱for-all𝐮𝑉\norm{\mathbf{u}}_{H^{1}(\Omega)^{d}}^{2}\leq\gamma\int_{\Omega}\mathbf{u}% \cdot\mathbf{u}+\varepsilon(\mathbf{u}):\varepsilon(\mathbf{u})\leavevmode% \nobreak\ \mathrm{d}\mathbf{x}\quad\forall\,\mathbf{u}\in V.∥ start_ARG bold_u end_ARG ∥ start_POSTSUBSCRIPT italic_H start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT ( roman_Ω ) start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ italic_γ ∫ start_POSTSUBSCRIPT roman_Ω end_POSTSUBSCRIPT bold_u ⋅ bold_u + italic_ε ( bold_u ) : italic_ε ( bold_u ) roman_d bold_x ∀ bold_u ∈ italic_V . (3.7)

Thus, the convergence rate of the SDC preconditioner will deteriorate for ν𝜈\nuitalic_ν approaching 1/2121/21 / 2, the so-called nearly incompressible case.

We consider the reference configuration Ω=(0,1)dΩsuperscript01𝑑\Omega=(0,1)^{d}roman_Ω = ( 0 , 1 ) start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT discretized on a Cartesian mesh with 8 cells along each direction. We specify μ=1𝜇1\mu=1italic_μ = 1, a uniform downwards body force 𝐁=0.02𝐞2𝐁0.02subscript𝐞2\mathbf{B}=-0.02\mathbf{e}_{2}bold_B = - 0.02 bold_e start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, and homogeneous Dirichlet BCs on ΓD={𝐱Ω,x1=0}subscriptΓ𝐷formulae-sequence𝐱Ωsubscript𝑥10\Gamma_{D}=\{\mathbf{x}\in\partial\Omega,x_{1}=0\}roman_Γ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT = { bold_x ∈ ∂ roman_Ω , italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 0 }. In Table 3 we present the PCG iteration counts required to reduce the Euclidean norm of the residual by a factor of 108superscript10810^{8}10 start_POSTSUPERSCRIPT 8 end_POSTSUPERSCRIPT starting from a zero initial guess. As the preconditioner, we employ the hybrid p𝑝pitalic_p-multigrid/Schwarz method with vertex-star patches and the SDC/FDM/sparse relaxation and a coarse space with p=1𝑝1p=1italic_p = 1. As expected from (3.6), the results confirm that the approach is reasonably p𝑝pitalic_p-robust, but that robustness with respect to ν𝜈\nuitalic_ν cannot be achieved with SDC relaxation on vertex-star patches.

Table 3: PCG iteration counts for the primal formulation of the linear elasticity problem using the SDC/FDM/sparse relaxation.
d𝑑ditalic_d pλ𝑝𝜆p\setminus\lambdaitalic_p ∖ italic_λ 00 100superscript10010^{0}10 start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT 101superscript10110^{1}10 start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT 102superscript10210^{2}10 start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT 103superscript10310^{3}10 start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT
2 3 13 14 24 70 199
7 17 17 28 76 236
15 18 19 30 81 249
31 20 20 32 84 258
3 3 20 22 39 114 362
7 25 28 48 123 381
15 27 29 51 125 373

3.2 Mixed FEM formulations of linear elasticity

In order to avoid locking in nearly incompressible continua, or impose the incompressibility constraint, the standard approach is to introduce a pressure-like variable and discretize with a mixed FEM. This is expressed by the weak formulation: find (𝐮𝐮0,p)V×Q𝐮subscript𝐮0𝑝𝑉𝑄(\mathbf{u}-\mathbf{u}_{0},p)\in V\times Q( bold_u - bold_u start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_p ) ∈ italic_V × italic_Q such that

a(𝐯,𝐮)+b(p,𝐯)𝑎𝐯𝐮𝑏𝑝𝐯\displaystyle a(\mathbf{v},\mathbf{u})+b(p,\mathbf{v})italic_a ( bold_v , bold_u ) + italic_b ( italic_p , bold_v ) =L(𝐯)absent𝐿𝐯\displaystyle=L(\mathbf{v})= italic_L ( bold_v ) 𝐯V,for-all𝐯𝑉\displaystyle\quad\forall\,\mathbf{v}\in V,∀ bold_v ∈ italic_V , (3.8)
b(q,𝐮)c(q,p)𝑏𝑞𝐮𝑐𝑞𝑝\displaystyle b(q,\mathbf{u})-c(q,p)italic_b ( italic_q , bold_u ) - italic_c ( italic_q , italic_p ) =0absent0\displaystyle=0= 0 qQ,for-all𝑞𝑄\displaystyle\quad\forall\,q\in Q,∀ italic_q ∈ italic_Q , (3.9)

where

a(𝐯,𝐮)=Ω2με(𝐯):ε(𝐮)d𝐱,b(q,𝐮)=Ωqdiv(𝐮)d𝐱,c(q,p)=Ωλ1qpd𝐱,a(\mathbf{v},\mathbf{u})=\int_{\Omega}2\mu\varepsilon(\mathbf{v}):\varepsilon(% \mathbf{u})\leavevmode\nobreak\ \mathrm{d}\mathbf{x},\quad b(q,\mathbf{u})=% \int_{\Omega}q\leavevmode\nobreak\ \mathrm{div}(\mathbf{u})\leavevmode\nobreak% \ \mathrm{d}\mathbf{x},\quad c(q,p)=\int_{\Omega}\lambda^{-1}qp\leavevmode% \nobreak\ \mathrm{d}\mathbf{x},italic_a ( bold_v , bold_u ) = ∫ start_POSTSUBSCRIPT roman_Ω end_POSTSUBSCRIPT 2 italic_μ italic_ε ( bold_v ) : italic_ε ( bold_u ) roman_d bold_x , italic_b ( italic_q , bold_u ) = ∫ start_POSTSUBSCRIPT roman_Ω end_POSTSUBSCRIPT italic_q roman_div ( bold_u ) roman_d bold_x , italic_c ( italic_q , italic_p ) = ∫ start_POSTSUBSCRIPT roman_Ω end_POSTSUBSCRIPT italic_λ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_q italic_p roman_d bold_x , (3.10)

and Q=L02(Ω)𝑄subscriptsuperscript𝐿20ΩQ=L^{2}_{0}(\Omega)italic_Q = italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( roman_Ω ) for λ=𝜆\lambda=\inftyitalic_λ = ∞ and ΓD=ΩsubscriptΓ𝐷Ω\Gamma_{D}=\partial\Omegaroman_Γ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT = ∂ roman_Ω, or Q=L2(Ω)𝑄superscript𝐿2ΩQ=L^{2}(\Omega)italic_Q = italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω ) otherwise.

In order for this problem to have a unique solution, we require the well-known Brezzi conditions: the solution for 𝐮𝐮\mathbf{u}bold_u is unique if a(,)𝑎a(\cdot,\cdot)italic_a ( ⋅ , ⋅ ) is coercive on the kernel of b(,)𝑏b(\cdot,\cdot)italic_b ( ⋅ , ⋅ ), and the solution for p𝑝pitalic_p is unique if there exists a right inverse for b(,)𝑏b(\cdot,\cdot)italic_b ( ⋅ , ⋅ ). This is expressed in the so-called inf–sup condition or LBB condition [7, 12]: there exists β𝛽\betaitalic_β, which might depend on ΩΩ\Omegaroman_Ω, such that

0<βinfqQsup𝐯Vb(q,𝐯)a(𝐯,𝐯)1/2qQ.0𝛽subscriptinfimum𝑞𝑄subscriptsupremum𝐯𝑉𝑏𝑞𝐯𝑎superscript𝐯𝐯12subscriptnorm𝑞𝑄0<\beta\coloneqq\adjustlimits{\inf}_{q\in Q}{\sup}_{\mathbf{v}\in V}\frac{b(q,% \mathbf{v})}{a(\mathbf{v},\mathbf{v})^{1/2}\norm{q}_{Q}}.0 < italic_β ≔ SUBSCRIPTOP start_ARG roman_inf end_ARG start_ARG italic_q ∈ italic_Q end_ARG SUBSCRIPTOP start_ARG roman_sup end_ARG start_ARG bold_v ∈ italic_V end_ARG divide start_ARG italic_b ( italic_q , bold_v ) end_ARG start_ARG italic_a ( bold_v , bold_v ) start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT ∥ start_ARG italic_q end_ARG ∥ start_POSTSUBSCRIPT italic_Q end_POSTSUBSCRIPT end_ARG . (3.11)

After selecting suitable finite dimensional subspaces VhVsubscript𝑉𝑉V_{h}\subset Vitalic_V start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ⊂ italic_V, QhQsubscript𝑄𝑄Q_{h}\subset Qitalic_Q start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ⊂ italic_Q, we obtain a system of linear equations with the saddle point structure

[ABBC][𝐮¯p¯]=[𝐟¯g¯].matrix𝐴superscript𝐵top𝐵𝐶matrix¯𝐮¯𝑝matrix¯𝐟¯𝑔\begin{bmatrix}A&B^{\top}\\ B&-C\end{bmatrix}\begin{bmatrix}\underline{\mathbf{u}}\\ \underline{p}\end{bmatrix}=\begin{bmatrix}\underline{\mathbf{f}}\\ \underline{g}\end{bmatrix}.[ start_ARG start_ROW start_CELL italic_A end_CELL start_CELL italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_B end_CELL start_CELL - italic_C end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL under¯ start_ARG bold_u end_ARG end_CELL end_ROW start_ROW start_CELL under¯ start_ARG italic_p end_ARG end_CELL end_ROW end_ARG ] = [ start_ARG start_ROW start_CELL under¯ start_ARG bold_f end_ARG end_CELL end_ROW start_ROW start_CELL under¯ start_ARG italic_g end_ARG end_CELL end_ROW end_ARG ] . (3.12)

We require the analogous Brezzi conditions for the discrete problem: that a(,)𝑎a(\cdot,\cdot)italic_a ( ⋅ , ⋅ ) is coercive on the discrete kernel of b(,)𝑏b(\cdot,\cdot)italic_b ( ⋅ , ⋅ ), and that there exists a discrete inf-sup constant β~~𝛽\tilde{\beta}over~ start_ARG italic_β end_ARG independent of the mesh but possibly depending on p𝑝pitalic_p such that

0<β~infqhQhsup𝐯hVhb(qh,𝐯h)a(𝐯h,𝐯h)1/2qhQh.0~𝛽subscriptinfimumsubscript𝑞subscript𝑄subscriptsupremumsubscript𝐯subscript𝑉𝑏subscript𝑞subscript𝐯𝑎superscriptsubscript𝐯subscript𝐯12subscriptnormsubscript𝑞subscript𝑄0<\tilde{\beta}\coloneqq\adjustlimits{\inf}_{q_{h}\in Q_{h}}{\sup}_{\mathbf{v}% _{h}\in V_{h}}\frac{b(q_{h},\mathbf{v}_{h})}{a(\mathbf{v}_{h},\mathbf{v}_{h})^% {1/2}\norm{q_{h}}_{Q_{h}}}.0 < over~ start_ARG italic_β end_ARG ≔ SUBSCRIPTOP start_ARG roman_inf end_ARG start_ARG italic_q start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ∈ italic_Q start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT end_ARG SUBSCRIPTOP start_ARG roman_sup end_ARG start_ARG bold_v start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ∈ italic_V start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT end_ARG divide start_ARG italic_b ( italic_q start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT , bold_v start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ) end_ARG start_ARG italic_a ( bold_v start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT , bold_v start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT ∥ start_ARG italic_q start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT end_ARG ∥ start_POSTSUBSCRIPT italic_Q start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_ARG . (3.13)

The discretization Vh×Qhsubscript𝑉subscript𝑄V_{h}\times Q_{h}italic_V start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT × italic_Q start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT must be chosen carefully to satisfy these conditions; the discrete inf-sup condition will not be satisfied by arbitrary discretizations. If they are, we have the well-known error estimates

𝐮h𝐮VC1{inf𝐯hVh𝐯h𝐮V+infqhQhqhpQ},subscriptnormsubscript𝐮𝐮𝑉subscript𝐶1subscriptinfimumsubscript𝐯subscript𝑉subscriptnormsubscript𝐯𝐮𝑉subscriptinfimumsubscript𝑞subscript𝑄subscriptnormsubscript𝑞𝑝𝑄\displaystyle\norm{\mathbf{u}_{h}-\mathbf{u}}_{V}\leq C_{1}\left\{\inf_{% \mathbf{v}_{h}\in V_{h}}\norm{\mathbf{v}_{h}-\mathbf{u}}_{V}+\inf_{q_{h}\in Q_% {h}}\norm{q_{h}-p}_{Q}\right\},∥ start_ARG bold_u start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT - bold_u end_ARG ∥ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT ≤ italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT { roman_inf start_POSTSUBSCRIPT bold_v start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ∈ italic_V start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∥ start_ARG bold_v start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT - bold_u end_ARG ∥ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT + roman_inf start_POSTSUBSCRIPT italic_q start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ∈ italic_Q start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∥ start_ARG italic_q start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT - italic_p end_ARG ∥ start_POSTSUBSCRIPT italic_Q end_POSTSUBSCRIPT } , (3.14a)
phpQβ~1C2{inf𝐯hVh𝐯h𝐮V+infqhQhqhpQ},subscriptnormsubscript𝑝𝑝𝑄superscript~𝛽1subscript𝐶2subscriptinfimumsubscript𝐯subscript𝑉subscriptnormsubscript𝐯𝐮𝑉subscriptinfimumsubscript𝑞subscript𝑄subscriptnormsubscript𝑞𝑝𝑄\displaystyle\norm{p_{h}-p}_{Q}\leq\tilde{\beta}^{-1}C_{2}\left\{\inf_{\mathbf% {v}_{h}\in V_{h}}\norm{\mathbf{v}_{h}-\mathbf{u}}_{V}+\inf_{q_{h}\in Q_{h}}% \norm{q_{h}-p}_{Q}\right\},∥ start_ARG italic_p start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT - italic_p end_ARG ∥ start_POSTSUBSCRIPT italic_Q end_POSTSUBSCRIPT ≤ over~ start_ARG italic_β end_ARG start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT { roman_inf start_POSTSUBSCRIPT bold_v start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ∈ italic_V start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∥ start_ARG bold_v start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT - bold_u end_ARG ∥ start_POSTSUBSCRIPT italic_V end_POSTSUBSCRIPT + roman_inf start_POSTSUBSCRIPT italic_q start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ∈ italic_Q start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∥ start_ARG italic_q start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT - italic_p end_ARG ∥ start_POSTSUBSCRIPT italic_Q end_POSTSUBSCRIPT } , (3.14b)

where C1,C2>0subscript𝐶1subscript𝐶20C_{1},C_{2}>0italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT > 0 are generic constants independent of the mesh parameter hhitalic_h. For the use of high-order discretizations, it is desirable to choose an element pair where β~~𝛽\tilde{\beta}over~ start_ARG italic_β end_ARG does not decrease as the polynomial degree of the approximation is increased. Such a discretization is referred to as p𝑝pitalic_p-stable.

In fact, p𝑝pitalic_p-stability is important for solvers also. Approaches based on block-Gaußian elimination, such as the Uzawa algorithm [5] and block-preconditioned MINRES [41], require preconditioners for the negative pressure Schur complement S=C+BA1B𝑆𝐶𝐵superscript𝐴1superscript𝐵topS=C+BA^{-1}B^{\top}italic_S = italic_C + italic_B italic_A start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT. It is well known that for the Stokes system, the continuous analogue of S𝑆Sitalic_S, (2)1superscriptsuperscript21\nabla\cdot(-\nabla^{2})^{-1}\nabla∇ ⋅ ( - ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∇, is well approximated by the identity operator [51]. It follows that S𝑆Sitalic_S is spectrally equivalent to the pressure mass matrix, Mpsubscript𝑀𝑝M_{p}italic_M start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT,

β02q¯Sq¯q¯Mpq¯β12q¯dim(Qh){0}.formulae-sequencesuperscriptsubscript𝛽02superscript¯𝑞top𝑆¯𝑞superscript¯𝑞topsubscript𝑀𝑝¯𝑞superscriptsubscript𝛽12for-all¯𝑞superscriptdimensionsubscript𝑄0\beta_{0}^{2}\leq\frac{\underline{q}^{\top}S\underline{q}}{\underline{q}^{\top% }M_{p}\underline{q}}\leq\beta_{1}^{2}\quad\forall\,\underline{q}\in\mathbb{R}^% {\dim(Q_{h})}\setminus\{0\}.italic_β start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ divide start_ARG under¯ start_ARG italic_q end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_S under¯ start_ARG italic_q end_ARG end_ARG start_ARG under¯ start_ARG italic_q end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_M start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT under¯ start_ARG italic_q end_ARG end_ARG ≤ italic_β start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∀ under¯ start_ARG italic_q end_ARG ∈ blackboard_R start_POSTSUPERSCRIPT roman_dim ( italic_Q start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT ∖ { 0 } . (3.15)

The rate of convergence of block-preconditioned MINRES will be determined by the ratio β1/β0subscript𝛽1subscript𝛽0\beta_{1}/\beta_{0}italic_β start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT / italic_β start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT. For Stokes flows with pure Dirichlet BCs, β1=1subscript𝛽11\beta_{1}=1italic_β start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 1, and β1=dsubscript𝛽1𝑑\beta_{1}=\sqrt{d}italic_β start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = square-root start_ARG italic_d end_ARG otherwise. In general, we have β0=β~subscript𝛽0~𝛽\beta_{0}=\tilde{\beta}italic_β start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = over~ start_ARG italic_β end_ARG. Since A𝐴Aitalic_A is spectrally equivalent to the vector Laplacian, these results also hold for linear elasticity. We may expect solvers based on such techniques to degrade with p𝑝pitalic_p-refinement if the discretization is not p𝑝pitalic_p-stable.

If we choose to work with the [H1(Ω)]dsuperscriptdelimited-[]superscript𝐻1Ω𝑑[H^{1}(\Omega)]^{d}[ italic_H start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT ( roman_Ω ) ] start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT-conforming space Vh=[Qp(Ω)]dsubscript𝑉superscriptdelimited-[]subscriptQ𝑝Ω𝑑V_{h}=[\mathrm{Q}_{p}(\Omega)]^{d}italic_V start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT = [ roman_Q start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ( roman_Ω ) ] start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT, some standard inf–sup stable choices for Qhsubscript𝑄Q_{h}italic_Q start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT are Qp1(Ω)subscriptQ𝑝1Ω\mathrm{Q}_{p-1}(\Omega)roman_Q start_POSTSUBSCRIPT italic_p - 1 end_POSTSUBSCRIPT ( roman_Ω ), DQp2(Ω)subscriptDQ𝑝2Ω\mathrm{DQ}_{p-2}(\Omega)roman_DQ start_POSTSUBSCRIPT italic_p - 2 end_POSTSUBSCRIPT ( roman_Ω ) and DPp1(Ω)subscriptDP𝑝1Ω\mathrm{DP}_{p-1}(\Omega)roman_DP start_POSTSUBSCRIPT italic_p - 1 end_POSTSUBSCRIPT ( roman_Ω ). DQp2subscriptDQ𝑝2\mathrm{DQ}_{p-2}roman_DQ start_POSTSUBSCRIPT italic_p - 2 end_POSTSUBSCRIPT denotes discontinuous piecewise polynomials of degree at most p2𝑝2p-2italic_p - 2 in each direction, while DPp1subscriptDP𝑝1\mathrm{DP}_{p-1}roman_DP start_POSTSUBSCRIPT italic_p - 1 end_POSTSUBSCRIPT denotes discontinuous piecewise polynomials of total degree at most p1𝑝1p-1italic_p - 1. The choice Qh=Qp1subscript𝑄subscriptQ𝑝1Q_{h}=\mathrm{Q}_{p-1}italic_Q start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT = roman_Q start_POSTSUBSCRIPT italic_p - 1 end_POSTSUBSCRIPT gives rise to the high-order generalization of the Taylor–Hood mixed element [54]. Here, Mpsubscript𝑀𝑝M_{p}italic_M start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT will not be block diagonal, and hence more expensive preconditioning techniques will be required. Moreover, it is shown numerically in [1] that the Taylor–Hood element is not p𝑝pitalic_p-stable. The choice Qh=DQp2subscript𝑄subscriptDQ𝑝2Q_{h}=\mathrm{DQ}_{p-2}italic_Q start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT = roman_DQ start_POSTSUBSCRIPT italic_p - 2 end_POSTSUBSCRIPT exhibits an asymptotic decay of β~Cp(1d)/2~𝛽𝐶superscript𝑝1𝑑2\tilde{\beta}\leq Cp^{(1-d)/2}over~ start_ARG italic_β end_ARG ≤ italic_C italic_p start_POSTSUPERSCRIPT ( 1 - italic_d ) / 2 end_POSTSUPERSCRIPT as p𝑝p\to\inftyitalic_p → ∞ [10], and thus is not p𝑝pitalic_p-stable. In practice, it is observed that this is quite a pessimistic bound for moderate p𝑝pitalic_p [35]. The choice Qh=DPp1subscript𝑄subscriptDP𝑝1Q_{h}=\mathrm{DP}_{p-1}italic_Q start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT = roman_DP start_POSTSUBSCRIPT italic_p - 1 end_POSTSUBSCRIPT is p𝑝pitalic_p-stable, but numerical experiments reveal that the stability is severely affected by the cell aspect ratio, unlike the previous two choices [50]. Moreover this last space does not have tensor product shape functions, so its efficient implementation becomes challenging.

To construct p𝑝pitalic_p-stable discretizations that are also robust to cell aspect ratio, we turn to nonconforming schemes with VhH(div,Ω)subscript𝑉𝐻divΩV_{h}\subset H(\mathrm{div},\Omega)italic_V start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ⊂ italic_H ( roman_div , roman_Ω ) [15, 31]. In particular we consider the use of Raviart–Thomas elements [4] of degree p𝑝pitalic_p for Vhsubscript𝑉V_{h}italic_V start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT for the displacement, paired with Qh=DQp1subscript𝑄subscriptDQ𝑝1Q_{h}=\mathrm{DQ}_{p-1}italic_Q start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT = roman_DQ start_POSTSUBSCRIPT italic_p - 1 end_POSTSUBSCRIPT. This pair satisfies div(Vh)=Qhdivsubscript𝑉subscript𝑄\mathrm{div}(V_{h})=Q_{h}roman_div ( italic_V start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ) = italic_Q start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT, which enforces the incompressibility constraint (3.9) exactly in the numerical approximation for λ=𝜆\lambda=\inftyitalic_λ = ∞. The Raviart–Thomas elements are defined on the reference quadrilateral as

RTp(K^)=Pp(^)DPp1(^)DPp1(^)Pp(^).subscriptRT𝑝^𝐾direct-sumtensor-productsubscriptP𝑝^subscriptDP𝑝1^tensor-productsubscriptDP𝑝1^subscriptP𝑝^\mathrm{RT}_{p}(\hat{K})=\mathrm{P}_{p}(\hat{\mathcal{I}})\otimes\mathrm{DP}_{% p-1}(\hat{\mathcal{I}})\oplus\mathrm{DP}_{p-1}(\hat{\mathcal{I}})\otimes% \mathrm{P}_{p}(\hat{\mathcal{I}}).roman_RT start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ( over^ start_ARG italic_K end_ARG ) = roman_P start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ( over^ start_ARG caligraphic_I end_ARG ) ⊗ roman_DP start_POSTSUBSCRIPT italic_p - 1 end_POSTSUBSCRIPT ( over^ start_ARG caligraphic_I end_ARG ) ⊕ roman_DP start_POSTSUBSCRIPT italic_p - 1 end_POSTSUBSCRIPT ( over^ start_ARG caligraphic_I end_ARG ) ⊗ roman_P start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ( over^ start_ARG caligraphic_I end_ARG ) . (3.16)

The analogous element in three dimensions is referred to as the Nédélec face element [39]. The definition can be extended to curvilinear cells via the contravariant Piola transform: for a function 𝐮^:K^d:^𝐮^𝐾superscript𝑑\hat{\mathbf{u}}:\hat{K}\to\mathbb{R}^{d}over^ start_ARG bold_u end_ARG : over^ start_ARG italic_K end_ARG → blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT, we define 𝐮:Kd:𝐮𝐾superscript𝑑\mathbf{u}:K\to\mathbb{R}^{d}bold_u : italic_K → blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT as

𝐮=𝒫K(𝐮^)1|DFK|DFK(𝐮^FK1),𝐮subscript𝒫𝐾^𝐮1Dsubscript𝐹𝐾Dsubscript𝐹𝐾^𝐮superscriptsubscript𝐹𝐾1\mathbf{u}=\mathcal{P}_{K}(\hat{\mathbf{u}})\coloneqq\frac{1}{\absolutevalue{% \mathrm{D}F_{K}}}\mathrm{D}F_{K}\left(\hat{\mathbf{u}}\circ F_{K}^{-1}\right),bold_u = caligraphic_P start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ( over^ start_ARG bold_u end_ARG ) ≔ divide start_ARG 1 end_ARG start_ARG | start_ARG roman_D italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT end_ARG | end_ARG roman_D italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ( over^ start_ARG bold_u end_ARG ∘ italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) , (3.17)

and set

RTp(K)=𝒫K(RTp(K^)).subscriptRT𝑝𝐾subscript𝒫𝐾subscriptRT𝑝^𝐾\mathrm{RT}_{p}(K)=\mathcal{P}_{K}\left(\mathrm{RT}_{p}(\hat{K})\right).roman_RT start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ( italic_K ) = caligraphic_P start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ( roman_RT start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ( over^ start_ARG italic_K end_ARG ) ) . (3.18)

These elements have superb properties, but their nonconforming nature must be suitably addressed in the discretization. They only impose continuity of the normal components of 𝐮𝐮\mathbf{u}bold_u across cell facets, and we therefore weakly enforce the tangential continuity via the symmetric interior penalty (SIPG) method [3]. The use of SIPG for the displacement requires further extension of the FDM/sparse relaxation; in particular, we must consider the additional facet integrals arising in the method, and show that the stiffness remains sparse.

3.3 Extension to interior penalty DG methods

Interior penalty discontinuous Galerkin (IP-DG) methods relax the continuity requirement of the discretization space. For instance, instead of [H1(Ω)]dsuperscriptdelimited-[]superscript𝐻1Ω𝑑[H^{1}(\Omega)]^{d}[ italic_H start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT ( roman_Ω ) ] start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT, we consider a larger function space with weaker continuity, such as [L2(Ω)]dsuperscriptdelimited-[]superscript𝐿2Ω𝑑[L^{2}(\Omega)]^{d}[ italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω ) ] start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT or H(div,Ω)𝐻divΩH(\mathrm{div},\Omega)italic_H ( roman_div , roman_Ω ). As previously mentioned, in order to deal with the non-conformity, C0superscript𝐶0C^{0}italic_C start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT-continuity is weakly enforced via the introduction of a penalty term on the set of interior facets ΓIsubscriptΓ𝐼\Gamma_{I}roman_Γ start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT of the mesh 𝒯hsubscript𝒯\mathcal{T}_{h}caligraphic_T start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT that vanishes for C0superscript𝐶0C^{0}italic_C start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT-continuous functions. Similarly, the weak prescription of the Dirichlet BC 𝐮=𝐮0𝐮subscript𝐮0\mathbf{u}=\mathbf{u}_{0}bold_u = bold_u start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT on ΓDsubscriptΓ𝐷\Gamma_{D}roman_Γ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT is achieved by introducing a penalty term on ΓDsubscriptΓ𝐷\Gamma_{D}roman_Γ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT.

We consider the following SIPG formulation:

a(𝐯,𝐮)=K𝒯hK𝐯:v(𝐮)d𝐱+eΓIΓDeηhe1{G}𝐯:𝐮𝐯:{v(𝐮)}{G𝐯}:𝐮ds,\begin{split}&a(\mathbf{v},\mathbf{u})=\sum_{K\in\mathcal{T}_{h}}\int_{K}% \nabla\mathbf{v}:\mathcal{F}^{v}(\nabla\mathbf{u})\leavevmode\nobreak\ \mathrm% {d}\mathbf{x}\\ &+\sum_{e\in\Gamma_{I}\cup\Gamma_{D}}\int_{e}\eta h_{e}^{-1}\left\{G^{\top}% \right\}\left\llbracket\mathbf{v}\right\rrbracket:\left\llbracket\mathbf{u}% \right\rrbracket-\left\llbracket\mathbf{v}\right\rrbracket:\left\{\mathcal{F}^% {v}(\nabla\mathbf{u})\right\}-\left\{G^{\top}\nabla\mathbf{v}\right\}:\left% \llbracket\mathbf{u}\right\rrbracket\leavevmode\nobreak\ \mathrm{d}s,\end{split}start_ROW start_CELL end_CELL start_CELL italic_a ( bold_v , bold_u ) = ∑ start_POSTSUBSCRIPT italic_K ∈ caligraphic_T start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ∇ bold_v : caligraphic_F start_POSTSUPERSCRIPT italic_v end_POSTSUPERSCRIPT ( ∇ bold_u ) roman_d bold_x end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL + ∑ start_POSTSUBSCRIPT italic_e ∈ roman_Γ start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT ∪ roman_Γ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∫ start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT italic_η italic_h start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT { italic_G start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT } ⟦ bold_v ⟧ : ⟦ bold_u ⟧ - ⟦ bold_v ⟧ : { caligraphic_F start_POSTSUPERSCRIPT italic_v end_POSTSUPERSCRIPT ( ∇ bold_u ) } - { italic_G start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ∇ bold_v } : ⟦ bold_u ⟧ roman_d italic_s , end_CELL end_ROW (3.19)
L(𝐯)=Ω𝐯𝐁d𝐱+ΓDηhe1G(𝐯𝐧):(𝐮0𝐧)G𝐯:(𝐮0𝐧)ds.:𝐿𝐯subscriptΩ𝐯𝐁differential-d𝐱subscriptsubscriptΓ𝐷𝜂superscriptsubscript𝑒1superscript𝐺toptensor-product𝐯𝐧tensor-productsubscript𝐮0𝐧superscript𝐺top𝐯:tensor-productsubscript𝐮0𝐧d𝑠L(\mathbf{v})=\int_{\Omega}\mathbf{v}\cdot\mathbf{B}\leavevmode\nobreak\ % \mathrm{d}\mathbf{x}+\int_{\Gamma_{D}}\eta h_{e}^{-1}G^{\top}(\mathbf{v}% \otimes\mathbf{n}):(\mathbf{u}_{0}\otimes\mathbf{n})-G^{\top}\nabla\mathbf{v}:% (\mathbf{u}_{0}\otimes\mathbf{n})\leavevmode\nobreak\ \mathrm{d}s.italic_L ( bold_v ) = ∫ start_POSTSUBSCRIPT roman_Ω end_POSTSUBSCRIPT bold_v ⋅ bold_B roman_d bold_x + ∫ start_POSTSUBSCRIPT roman_Γ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_η italic_h start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_G start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( bold_v ⊗ bold_n ) : ( bold_u start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ⊗ bold_n ) - italic_G start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ∇ bold_v : ( bold_u start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ⊗ bold_n ) roman_d italic_s . (3.20)

Here v(𝐮)superscript𝑣𝐮\mathcal{F}^{v}(\nabla\mathbf{u})caligraphic_F start_POSTSUPERSCRIPT italic_v end_POSTSUPERSCRIPT ( ∇ bold_u ) is a linear viscous flux. For the vector Poisson equation, the viscous flux is given by v(𝐮)=𝐮superscript𝑣𝐮𝐮\mathcal{F}^{v}(\nabla\mathbf{u})=\nabla\mathbf{u}caligraphic_F start_POSTSUPERSCRIPT italic_v end_POSTSUPERSCRIPT ( ∇ bold_u ) = ∇ bold_u. For the primal formulation of linear elasticity, the viscous flux corresponds to the stress tensor v(𝐮)=μ(𝐮+𝐮)+λ𝐮𝟙superscript𝑣𝐮𝜇𝐮superscript𝐮top𝜆𝐮1\mathcal{F}^{v}(\nabla\mathbf{u})=\mu(\nabla\mathbf{u}+\nabla\mathbf{u}^{\top}% )+\lambda\nabla\cdot\mathbf{u}{\mathds{1}}caligraphic_F start_POSTSUPERSCRIPT italic_v end_POSTSUPERSCRIPT ( ∇ bold_u ) = italic_μ ( ∇ bold_u + ∇ bold_u start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) + italic_λ ∇ ⋅ bold_u blackboard_1. For the mixed formulation of linear elasticity, the (1,1)11(1,1)( 1 , 1 )-block of the system has viscous flux v(𝐮)=μ(𝐮+𝐮)superscript𝑣𝐮𝜇𝐮superscript𝐮top\mathcal{F}^{v}(\nabla\mathbf{u})=\mu(\nabla\mathbf{u}+\nabla\mathbf{u}^{\top})caligraphic_F start_POSTSUPERSCRIPT italic_v end_POSTSUPERSCRIPT ( ∇ bold_u ) = italic_μ ( ∇ bold_u + ∇ bold_u start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ).

From left to right, the terms in the surface integral in (3.19) are referred to as the penalty, consistency, and adjoint consistency terms. The quantity G𝐺Gitalic_G is known as the homogeneity tensor,

Gijkl=uk,l[v(𝐮)]ij,subscript𝐺𝑖𝑗𝑘𝑙subscript𝑢𝑘𝑙subscriptdelimited-[]superscript𝑣𝐮𝑖𝑗G_{ijkl}=\frac{\partial}{\partial u_{k,l}}[\mathcal{F}^{v}(\nabla\mathbf{u})]_% {ij},italic_G start_POSTSUBSCRIPT italic_i italic_j italic_k italic_l end_POSTSUBSCRIPT = divide start_ARG ∂ end_ARG start_ARG ∂ italic_u start_POSTSUBSCRIPT italic_k , italic_l end_POSTSUBSCRIPT end_ARG [ caligraphic_F start_POSTSUPERSCRIPT italic_v end_POSTSUPERSCRIPT ( ∇ bold_u ) ] start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT , (3.21)

for which we define the adjoint product with 𝐯𝐯\nabla\mathbf{v}∇ bold_v

[G𝐯]kl=Gijklvi,j.subscriptdelimited-[]superscript𝐺top𝐯𝑘𝑙subscript𝐺𝑖𝑗𝑘𝑙subscript𝑣𝑖𝑗[G^{\top}\nabla\mathbf{v}]_{kl}=G_{ijkl}v_{i,j}.[ italic_G start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ∇ bold_v ] start_POSTSUBSCRIPT italic_k italic_l end_POSTSUBSCRIPT = italic_G start_POSTSUBSCRIPT italic_i italic_j italic_k italic_l end_POSTSUBSCRIPT italic_v start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT . (3.22)

The average {}\left\{\cdot\right\}{ ⋅ } and jump delimited-⟦⟧\left\llbracket\cdot\right\rrbracket⟦ ⋅ ⟧ operators are defined for scalar, vector, and tensor arguments as follows. Let e𝑒eitalic_e be a facet of the mesh. For an interior facet, let Ksuperscript𝐾K^{-}italic_K start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT and K+superscript𝐾K^{+}italic_K start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT be the two mesh cells that share it, and let 𝐰superscript𝐰\mathbf{w}^{-}bold_w start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT and 𝐰+superscript𝐰\mathbf{w}^{+}bold_w start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT be the traces of a function 𝐰𝐰\mathbf{w}bold_w on e𝑒eitalic_e from Ksuperscript𝐾K^{-}italic_K start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT and K+superscript𝐾K^{+}italic_K start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT, respectively. On each facet we define

{𝐰}e={12(𝐰+𝐰+)eΓI,𝐰otherwise,𝐰e={𝐰𝐧+𝐰+𝐧+eΓI,𝐰𝐧otherwise.\left\{\mathbf{w}\right\}_{e}=\begin{cases}\frac{1}{2}(\mathbf{w}^{-}+\mathbf{% w}^{+})&e\in\Gamma_{I},\\ \mathbf{w}&\mbox{otherwise},\end{cases}\quad\left\llbracket\mathbf{w}\right% \rrbracket_{e}=\begin{cases}\mathbf{w}^{-}\otimes\mathbf{n}^{-}+\mathbf{w}^{+}% \otimes\mathbf{n}^{+}&e\in\Gamma_{I},\\ \mathbf{w}\otimes\mathbf{n}&\mbox{otherwise}.\end{cases}{ bold_w } start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT = { start_ROW start_CELL divide start_ARG 1 end_ARG start_ARG 2 end_ARG ( bold_w start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT + bold_w start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT ) end_CELL start_CELL italic_e ∈ roman_Γ start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT , end_CELL end_ROW start_ROW start_CELL bold_w end_CELL start_CELL otherwise , end_CELL end_ROW ⟦ bold_w ⟧ start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT = { start_ROW start_CELL bold_w start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ⊗ bold_n start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT + bold_w start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT ⊗ bold_n start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT end_CELL start_CELL italic_e ∈ roman_Γ start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT , end_CELL end_ROW start_ROW start_CELL bold_w ⊗ bold_n end_CELL start_CELL otherwise . end_CELL end_ROW (3.23)

In order to ensure coercivity of a(,)𝑎a(\cdot,\cdot)italic_a ( ⋅ , ⋅ ) as we do hhitalic_h or p𝑝pitalic_p refinement, the penalty term must be sufficiently large. The penalty coefficient ηhe1𝜂superscriptsubscript𝑒1\eta h_{e}^{-1}italic_η italic_h start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT must be chosen inversely proportional to the normal spacing of GLL nodes near the facet, i.e. η=𝒪(p(p+1))𝜂𝒪𝑝𝑝1\eta=\mathcal{O}(p(p+1))italic_η = caligraphic_O ( italic_p ( italic_p + 1 ) ) [37]. For the reciprocal length scale in the direction normal to facet e𝑒eitalic_e we use

he1|e|{|K|1}e.superscriptsubscript𝑒1𝑒subscriptsuperscript𝐾1𝑒h_{e}^{-1}\coloneqq\absolutevalue{e}\left\{\absolutevalue{K}^{-1}\right\}_{e}.italic_h start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ≔ | start_ARG italic_e end_ARG | { | start_ARG italic_K end_ARG | start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT . (3.24)

The stiffness matrix that corresponds to a(,)𝑎a(\cdot,\cdot)italic_a ( ⋅ , ⋅ ) in the SIPG formulation is obtained via direct stiffness summation over the cells and facets:

A=K𝒯hRKAKRK+eΓIΓDReAeRe,𝐴subscript𝐾subscript𝒯superscriptsubscript𝑅𝐾topsuperscript𝐴𝐾subscript𝑅𝐾subscript𝑒subscriptΓ𝐼subscriptΓ𝐷superscriptsubscript𝑅𝑒topsuperscript𝐴𝑒subscript𝑅𝑒A=\sum_{K\in\mathcal{T}_{h}}R_{K}^{\top}A^{K}R_{K}+\sum_{e\in\Gamma_{I}\cup% \Gamma_{D}}R_{e}^{\top}A^{e}R_{e},italic_A = ∑ start_POSTSUBSCRIPT italic_K ∈ caligraphic_T start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_A start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_e ∈ roman_Γ start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT ∪ roman_Γ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_A start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT , (3.25)

where AKsuperscript𝐴𝐾A^{K}italic_A start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT is the cell matrix discretizing the volume integral in K𝐾Kitalic_K, RKsubscript𝑅𝐾R_{K}italic_R start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT is the Boolean restriction onto the DOFs of K𝐾Kitalic_K, Aesuperscript𝐴𝑒A^{e}italic_A start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT is the facet matrix discretizing the surface integral on e𝑒eitalic_e, and Resubscript𝑅𝑒R_{e}italic_R start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT is Boolean restriction onto the DOFs of the cells sharing facet e𝑒eitalic_e.

To illustrate the extension of our approach to the SIPG discretization, we consider again the scalar Poisson equation. The discrete problem is to find uhVh=DQp(Ω)L2(Ω)subscript𝑢subscript𝑉subscriptDQ𝑝Ωsuperscript𝐿2Ωu_{h}\in V_{h}=\mathrm{DQ}_{p}(\Omega)\subset L^{2}(\Omega)italic_u start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ∈ italic_V start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT = roman_DQ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ( roman_Ω ) ⊂ italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω ). On Cartesian cells, both AKsuperscript𝐴𝐾A^{K}italic_A start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT and Aesuperscript𝐴𝑒A^{e}italic_A start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT have a tensor product structure of the form (2.12), with matrices of operators on the interval that can be sparsified by the FDM. To illustrate this, suppose that, for eΓI𝑒subscriptΓ𝐼e\in\Gamma_{I}italic_e ∈ roman_Γ start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT, Resubscript𝑅𝑒R_{e}italic_R start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT reorders the DOFs such that the cells Ksuperscript𝐾K^{-}italic_K start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT and K+superscript𝐾K^{+}italic_K start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT share e𝑒eitalic_e along the d𝑑ditalic_d-th reference coordinate axis, while leaving the other axes consistently oriented on both cells. The facet matrices are

Ae={EeB^if d=2,EeB^B^if d=3,superscript𝐴𝑒casestensor-productsuperscript𝐸𝑒^𝐵if 𝑑2tensor-productsuperscript𝐸𝑒^𝐵^𝐵if 𝑑3A^{e}=\begin{cases}E^{e}\otimes\hat{B}&\mbox{if\leavevmode\nobreak\ }d=2,\\ E^{e}\otimes\hat{B}\otimes\hat{B}&\mbox{if\leavevmode\nobreak\ }d=3,\end{cases}italic_A start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT = { start_ROW start_CELL italic_E start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT ⊗ over^ start_ARG italic_B end_ARG end_CELL start_CELL if italic_d = 2 , end_CELL end_ROW start_ROW start_CELL italic_E start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT ⊗ over^ start_ARG italic_B end_ARG ⊗ over^ start_ARG italic_B end_ARG end_CELL start_CELL if italic_d = 3 , end_CELL end_ROW (3.26)

where the interval facet matrix Eesuperscript𝐸𝑒E^{e}italic_E start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT is defined in terms of the coefficients μjKsubscriptsuperscript𝜇𝐾𝑗\mu^{K}_{j}italic_μ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT appearing in (2.12), the 1D shape functions {ϕ^j}subscript^italic-ϕ𝑗\{\hat{\phi}_{j}\}{ over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT }, and their normal derivatives nϕ^j𝑛subscript^italic-ϕ𝑗\frac{\partial}{\partial n}\hat{\phi}_{j}divide start_ARG ∂ end_ARG start_ARG ∂ italic_n end_ARG over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT on ^^\partial\hat{\mathcal{I}}∂ over^ start_ARG caligraphic_I end_ARG (the usual derivative with a sign). When eΓD𝑒subscriptΓ𝐷e\in\Gamma_{D}italic_e ∈ roman_Γ start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT, Ee(p+1)×(p+1)superscript𝐸𝑒superscript𝑝1𝑝1E^{e}\in\mathbb{R}^{(p+1)\times(p+1)}italic_E start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT ( italic_p + 1 ) × ( italic_p + 1 ) end_POSTSUPERSCRIPT is given by

[Ee]ij=μe(ηϕ^i(x^e)ϕ^j(x^e)ϕ^i(x^e)nϕ^j(x^e)nϕ^i(x^e)ϕ^j(x^e)).subscriptdelimited-[]superscript𝐸𝑒𝑖𝑗superscript𝜇𝑒𝜂subscript^italic-ϕ𝑖superscript^𝑥𝑒subscript^italic-ϕ𝑗superscript^𝑥𝑒subscript^italic-ϕ𝑖superscript^𝑥𝑒𝑛subscript^italic-ϕ𝑗superscript^𝑥𝑒𝑛subscript^italic-ϕ𝑖superscript^𝑥𝑒subscript^italic-ϕ𝑗superscript^𝑥𝑒[E^{e}]_{ij}=\mu^{e}\left(\eta\hat{\phi}_{i}(\hat{x}^{e})\hat{\phi}_{j}(\hat{x% }^{e})-\hat{\phi}_{i}(\hat{x}^{e})\tfrac{\partial}{\partial n}\hat{\phi}_{j}(% \hat{x}^{e})-\tfrac{\partial}{\partial n}\hat{\phi}_{i}(\hat{x}^{e})\hat{\phi}% _{j}(\hat{x}^{e})\right).[ italic_E start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT ] start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = italic_μ start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT ( italic_η over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT ) over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT ) - over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT ) divide start_ARG ∂ end_ARG start_ARG ∂ italic_n end_ARG over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT ) - divide start_ARG ∂ end_ARG start_ARG ∂ italic_n end_ARG over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT ) over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT ) ) . (3.27)

Here μe=μlKsuperscript𝜇𝑒subscriptsuperscript𝜇𝐾𝑙\mu^{e}=\mu^{K}_{l}italic_μ start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT = italic_μ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT, where x^lsubscript^𝑥𝑙\hat{x}_{l}over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT is the reference coordinate normal to e𝑒eitalic_e, and x^e^superscript^𝑥𝑒^\hat{x}^{e}\in\partial\hat{\mathcal{I}}over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT ∈ ∂ over^ start_ARG caligraphic_I end_ARG describes the facet e𝑒eitalic_e as the image of the plane x^l=x^esubscript^𝑥𝑙superscript^𝑥𝑒\hat{x}_{l}=\hat{x}^{e}over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT = over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT under FKsubscript𝐹𝐾F_{K}italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT. When eΓI𝑒subscriptΓ𝐼e\in\Gamma_{I}italic_e ∈ roman_Γ start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT, Eesuperscript𝐸𝑒E^{e}italic_E start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT is a 2×2222\times 22 × 2 block matrix with blocks Erse(p+1)×(p+1)subscriptsuperscript𝐸𝑒𝑟𝑠superscript𝑝1𝑝1E^{e}_{rs}\in\mathbb{R}^{(p+1)\times(p+1)}italic_E start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_r italic_s end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT ( italic_p + 1 ) × ( italic_p + 1 ) end_POSTSUPERSCRIPT, r,s{0,1}𝑟𝑠01r,s\in\{0,1\}italic_r , italic_s ∈ { 0 , 1 }, given by

[Erse]ij=(1)rs2(η(μ0e+μ1e)ϕ^i(x^re)ϕ^j(x^se)μseϕ^i(x^re)nϕ^j(x^se)μrenϕ^i(x^re)ϕ^j(x^se)).subscriptdelimited-[]subscriptsuperscript𝐸𝑒𝑟𝑠𝑖𝑗superscript1𝑟𝑠2𝜂subscriptsuperscript𝜇𝑒0subscriptsuperscript𝜇𝑒1subscript^italic-ϕ𝑖subscriptsuperscript^𝑥𝑒𝑟subscript^italic-ϕ𝑗subscriptsuperscript^𝑥𝑒𝑠subscriptsuperscript𝜇𝑒𝑠subscript^italic-ϕ𝑖subscriptsuperscript^𝑥𝑒𝑟𝑛subscript^italic-ϕ𝑗subscriptsuperscript^𝑥𝑒𝑠subscriptsuperscript𝜇𝑒𝑟𝑛subscript^italic-ϕ𝑖subscriptsuperscript^𝑥𝑒𝑟subscript^italic-ϕ𝑗subscriptsuperscript^𝑥𝑒𝑠[E^{e}_{rs}]_{ij}=\tfrac{(-1)^{r-s}}{2}\left(\eta(\mu^{e}_{0}+\mu^{e}_{1})\hat% {\phi}_{i}(\hat{x}^{e}_{r})\hat{\phi}_{j}(\hat{x}^{e}_{s})-\mu^{e}_{s}\hat{% \phi}_{i}(\hat{x}^{e}_{r})\tfrac{\partial}{\partial n}\hat{\phi}_{j}(\hat{x}^{% e}_{s})-\mu^{e}_{r}\tfrac{\partial}{\partial n}\hat{\phi}_{i}(\hat{x}^{e}_{r})% \hat{\phi}_{j}(\hat{x}^{e}_{s})\right).[ italic_E start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_r italic_s end_POSTSUBSCRIPT ] start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = divide start_ARG ( - 1 ) start_POSTSUPERSCRIPT italic_r - italic_s end_POSTSUPERSCRIPT end_ARG start_ARG 2 end_ARG ( italic_η ( italic_μ start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_μ start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ) over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ) - italic_μ start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ) divide start_ARG ∂ end_ARG start_ARG ∂ italic_n end_ARG over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ) - italic_μ start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT divide start_ARG ∂ end_ARG start_ARG ∂ italic_n end_ARG over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ) over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ) ) . (3.28)

Here μ0e=μlK,μ1e=μmK+formulae-sequencesubscriptsuperscript𝜇𝑒0subscriptsuperscript𝜇superscript𝐾𝑙subscriptsuperscript𝜇𝑒1subscriptsuperscript𝜇superscript𝐾𝑚\mu^{e}_{0}=\mu^{K^{-}}_{l},\mu^{e}_{1}=\mu^{K^{+}}_{m}italic_μ start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = italic_μ start_POSTSUPERSCRIPT italic_K start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT , italic_μ start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = italic_μ start_POSTSUPERSCRIPT italic_K start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT, where x^lsubscript^𝑥𝑙\hat{x}_{l}over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT and x^msubscript^𝑥𝑚\hat{x}_{m}over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT are the reference directions normal to e𝑒eitalic_e on Ksuperscript𝐾K^{-}italic_K start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT and K+superscript𝐾K^{+}italic_K start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT, respectively. Similarly, the facet e𝑒eitalic_e is the image of x^l=x^0esubscript^𝑥𝑙subscriptsuperscript^𝑥𝑒0\hat{x}_{l}=\hat{x}^{e}_{0}over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT = over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT under FKsubscript𝐹superscript𝐾F_{K^{-}}italic_F start_POSTSUBSCRIPT italic_K start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT end_POSTSUBSCRIPT and that of x^m=x^1esubscript^𝑥𝑚subscriptsuperscript^𝑥𝑒1\hat{x}_{m}=\hat{x}^{e}_{1}over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT = over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT under FK+subscript𝐹superscript𝐾F_{K^{+}}italic_F start_POSTSUBSCRIPT italic_K start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT end_POSTSUBSCRIPT.

Some implementations of DQpsubscriptDQ𝑝\mathrm{DQ}_{p}roman_DQ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT do not feature an interior-interface decomposition and use the Gauß–Legendre (GL) nodal shape functions. The GL nodes do not include the endpoints, thus all shape functions have non-zero support at the facets, causing Eesuperscript𝐸𝑒E^{e}italic_E start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT to be dense. The matrices Eesuperscript𝐸𝑒E^{e}italic_E start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT are sparse for a basis with an interior-interface decomposition, such as the GLL Lagrange polynomials {j}subscript𝑗\{\ell_{j}\}{ roman_ℓ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT }, the hierarchical Lobatto polynomials {lj}subscript𝑙𝑗\{l_{j}\}{ italic_l start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT }, and the FDM polynomials {s^j}subscript^𝑠𝑗\{\hat{s}_{j}\}{ over^ start_ARG italic_s end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT }. Since ϕ^j(±1)subscript^italic-ϕ𝑗plus-or-minus1\hat{\phi}_{j}(\pm 1)over^ start_ARG italic_ϕ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( ± 1 ) is non-zero for a single j{0,p}𝑗0𝑝j\in\{0,p\}italic_j ∈ { 0 , italic_p }, each term in (3.28) and (3.27) corresponds to a non-zero entry, a non-zero row, and a non-zero column of Eesuperscript𝐸𝑒E^{e}italic_E start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT, respectively, as seen in Figure 10.

Refer to caption
Figure 10: Non-zero structure for the interior facet matrix Eesuperscript𝐸𝑒E^{e}italic_E start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT on the interval (p=4𝑝4p=4italic_p = 4).

Instead of diagonalizing the SIPG patch matrix as in [55], our FDM-based approach produces a sparse matrix with diagonal interior blocks on (possibly) unstructured vertex-star patches. Figure 11 shows the sparsity pattern of the matrix for the SIPG formulation of the Poisson equation on a Cartesian vertex-star patch in the FDM basis, along with its Cholesky factor. Here the matrix size is increased from (2p1)dsuperscript2𝑝1𝑑(2p-1)^{d}( 2 italic_p - 1 ) start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT DOFs in the CG case to (2p+2)dsuperscript2𝑝2𝑑(2p+2)^{d}( 2 italic_p + 2 ) start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT. At low polynomial degrees, the interface DOFs form a large fraction of the total number, but the proportion decreases as p𝑝pitalic_p increases. The computational complexity analysis of Section 2.6 carries over to the SIPG case.

Refer to caption
(a) Ajsubscript𝐴𝑗A_{j}italic_A start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, d=2𝑑2d=2italic_d = 2
Refer to caption
(b) chol(Aj)cholsubscript𝐴𝑗\mathrm{chol}(A_{j})roman_chol ( italic_A start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ), d=2𝑑2d=2italic_d = 2
Refer to caption
(c) Ajsubscript𝐴𝑗A_{j}italic_A start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, d=3𝑑3d=3italic_d = 3
Refer to caption
(d) chol(Aj)cholsubscript𝐴𝑗\mathrm{chol}(A_{j})roman_chol ( italic_A start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ), d=3𝑑3d=3italic_d = 3
Figure 11: Non-zero structure of the SIPG stiffness matrix in the FDM basis Aj=LjLjsubscript𝐴𝑗subscript𝐿𝑗superscriptsubscript𝐿𝑗topA_{j}=L_{j}L_{j}^{\top}italic_A start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_L start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT and its upper Cholesky factor Ljsuperscriptsubscript𝐿𝑗topL_{j}^{\top}italic_L start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT for the Poisson problem on a Cartesian vertex-star patch with p=4𝑝4p=4italic_p = 4. Since the space is discontinuous, the number of DOFs in a patch is increased. The number of non-zeros in an interior row is 3d+13𝑑13d+13 italic_d + 1, since the interior DOFs are connected to each of the 2d2𝑑2d2 italic_d facets of their corresponding cell, plus d𝑑ditalic_d more facets from the adjacent cells.

For a general viscous flux, we construct an auxiliary separable form by expressing the cell integrals in terms of the reference coordinates

aK(𝐯,𝐮)=KGijklKvixjukxld𝐱=K^G^ijklKx^j(viFK)x^l(ukFK)d𝐱^,subscript𝑎𝐾𝐯𝐮subscript𝐾superscriptsubscript𝐺𝑖𝑗𝑘𝑙𝐾subscript𝑣𝑖subscript𝑥𝑗subscript𝑢𝑘subscript𝑥𝑙differential-d𝐱subscript^𝐾superscriptsubscript^𝐺𝑖𝑗𝑘𝑙𝐾subscript^𝑥𝑗subscript𝑣𝑖subscript𝐹𝐾subscript^𝑥𝑙subscript𝑢𝑘subscript𝐹𝐾differential-d^𝐱a_{K}(\mathbf{v},\mathbf{u})=\int_{K}G_{ijkl}^{K}\frac{\partial v_{i}}{% \partial x_{j}}\frac{\partial u_{k}}{\partial x_{l}}\leavevmode\nobreak\ % \mathrm{d}\mathbf{x}=\int_{\hat{K}}\hat{G}_{ijkl}^{K}\frac{\partial}{\partial% \hat{x}_{j}}\left(v_{i}\circ F_{K}\right)\frac{\partial}{\partial\hat{x}_{l}}% \left(u_{k}\circ F_{K}\right)\leavevmode\nobreak\ \mathrm{d}\hat{\mathbf{x}},italic_a start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ( bold_v , bold_u ) = ∫ start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT italic_G start_POSTSUBSCRIPT italic_i italic_j italic_k italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT divide start_ARG ∂ italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG divide start_ARG ∂ italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_ARG roman_d bold_x = ∫ start_POSTSUBSCRIPT over^ start_ARG italic_K end_ARG end_POSTSUBSCRIPT over^ start_ARG italic_G end_ARG start_POSTSUBSCRIPT italic_i italic_j italic_k italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT divide start_ARG ∂ end_ARG start_ARG ∂ over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG ( italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∘ italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ) divide start_ARG ∂ end_ARG start_ARG ∂ over^ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_ARG ( italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∘ italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ) roman_d over^ start_ARG bold_x end_ARG , (3.29)

where G^Ksuperscript^𝐺𝐾\hat{G}^{K}over^ start_ARG italic_G end_ARG start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT is the homogeneity tensor in the reference coordinates,

G^ijklK=|det(DFK)|[DFK1]jm[DFK1]lnGimknK.superscriptsubscript^𝐺𝑖𝑗𝑘𝑙𝐾Dsubscript𝐹𝐾subscriptdelimited-[]Dsuperscriptsubscript𝐹𝐾1𝑗𝑚subscriptdelimited-[]Dsuperscriptsubscript𝐹𝐾1𝑙𝑛superscriptsubscript𝐺𝑖𝑚𝑘𝑛𝐾\hat{G}_{ijkl}^{K}=\absolutevalue{\det(\mathrm{D}F_{K})}[\mathrm{D}F_{K}^{-1}]% _{jm}[\mathrm{D}F_{K}^{-1}]_{ln}G_{imkn}^{K}.over^ start_ARG italic_G end_ARG start_POSTSUBSCRIPT italic_i italic_j italic_k italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT = | start_ARG roman_det ( start_ARG roman_D italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT end_ARG ) end_ARG | [ roman_D italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ] start_POSTSUBSCRIPT italic_j italic_m end_POSTSUBSCRIPT [ roman_D italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ] start_POSTSUBSCRIPT italic_l italic_n end_POSTSUBSCRIPT italic_G start_POSTSUBSCRIPT italic_i italic_m italic_k italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT . (3.30)

The auxiliary form a~(,)~𝑎\tilde{a}(\cdot,\cdot)over~ start_ARG italic_a end_ARG ( ⋅ , ⋅ ) is constructed by approximating G^ijklKsuperscriptsubscript^𝐺𝑖𝑗𝑘𝑙𝐾\hat{G}_{ijkl}^{K}over^ start_ARG italic_G end_ARG start_POSTSUBSCRIPT italic_i italic_j italic_k italic_l end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT with a piecewise constant tensor that discards the entries where ik𝑖𝑘i\neq kitalic_i ≠ italic_k or jl𝑗𝑙j\neq litalic_j ≠ italic_l. The corresponding cell stiffness matrices become sparse in the FDM basis, and have a similar form as (2.12), except that the coefficients μjKsubscriptsuperscript𝜇𝐾𝑗\mu^{K}_{j}italic_μ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT are diagonal matrices that multiply each term through an additional Kronecker product. Hence, we expect that preconditioners based on the auxiliary form to be limited by the coupling between vector components, by the mesh geometry, and by how GKsuperscript𝐺𝐾G^{K}italic_G start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT varies within K𝐾Kitalic_K.

This approach carries over to non-Cartesian cells for DG discretizations of the Poisson equation in the same way as the CG case. Unfortunately, the extension of our approach to vector-valued problems in H(div)𝐻divH(\mathrm{div})italic_H ( roman_div ) on non-Cartesian cells does not yield a good preconditioner. In this setting, we construct a block diagonal preconditioner separating the components of the DOFs, which are in the reference coordinates. For the vector Poisson problem on non-Cartesian cells, the Piola transform introduces off-diagonal contributions from the volume and surface terms, which does not occur on non-Piola-mapped elements, such as [Qp]dsuperscriptdelimited-[]subscriptQ𝑝𝑑[\mathrm{Q}_{p}]^{d}[ roman_Q start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT and [DQp]dsuperscriptdelimited-[]subscriptDQ𝑝𝑑[\mathrm{DQ}_{p}]^{d}[ roman_DQ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT. The excluded terms are required for the surface integral terms to vanish for arguments with C0superscript𝐶0C^{0}italic_C start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT continuity, and without them the preconditioner might become indefinite on non-Cartesian cells.

3.4 Results for mixed formulations of linear elasticity

We consider the same problem as in Table 3, with both a conforming [Qp]d×DQp2superscriptdelimited-[]subscriptQ𝑝𝑑subscriptDQ𝑝2[\mathrm{Q}_{p}]^{d}\times\mathrm{DQ}_{p-2}[ roman_Q start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT × roman_DQ start_POSTSUBSCRIPT italic_p - 2 end_POSTSUBSCRIPT discretization and a non-conforming RTp×DQp1subscriptRT𝑝subscriptDQ𝑝1\mathrm{RT}_{p}\times\mathrm{DQ}_{p-1}roman_RT start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT × roman_DQ start_POSTSUBSCRIPT italic_p - 1 end_POSTSUBSCRIPT discretization. For the H(div)𝐻divH(\mathrm{div})italic_H ( roman_div )-conforming discretization, the normal components of the Dirichlet BCs are enforced strongly, while the tangential components of the BCs are weakly enforced with SIPG. Enforcing the normal conditions strongly is crucial for achieving a divergence-free solution in the Stokes limit λ=𝜆\lambda=\inftyitalic_λ = ∞. For the penalty coefficient, we use η=p(p+1)𝜂𝑝𝑝1\eta=p(p+1)italic_η = italic_p ( italic_p + 1 ). We restrict our experiment to Cartesian cells, so that the FDM/sparse relaxation is applicable to the H(div)𝐻divH(\mathrm{div})italic_H ( roman_div )-conforming discretization.

We iteratively solve the discrete system (3.12)italic-(3.12italic-)\eqref{eq:saddle}italic_( italic_) via MINRES with a symmetric positive definite block diagonal preconditioner,

𝒫diag=[P100P2].subscript𝒫diagmatrixsubscript𝑃100subscript𝑃2\mathcal{P}_{\mathrm{diag}}=\begin{bmatrix}P_{1}&0\\ 0&P_{2}\end{bmatrix}.caligraphic_P start_POSTSUBSCRIPT roman_diag end_POSTSUBSCRIPT = [ start_ARG start_ROW start_CELL italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL italic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] . (3.31)

Here P1subscript𝑃1P_{1}italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT is a preconditioner for the displacement block A𝐴Aitalic_A, and P2subscript𝑃2P_{2}italic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT is a preconditioner for the scaled pressure mass matrix (μ1+λ1)Mpsuperscript𝜇1superscript𝜆1subscript𝑀𝑝(\mu^{-1}+\lambda^{-1})M_{p}( italic_μ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT + italic_λ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) italic_M start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT. For P1subscript𝑃1P_{1}italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT we employ the hybrid p𝑝pitalic_p-multigrid/Schwarz method with the SDC/FDM/sparse relaxation and [Q1]dsuperscriptdelimited-[]subscriptQ1𝑑[\mathrm{Q}_{1}]^{d}[ roman_Q start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT as the coarse space. In our tests, we discretize the pressure space with the GL basis, and employ point-Jacobi on the pressure mass matrix, i.e. P2=(μ1+λ1)diag(Mp)subscript𝑃2superscript𝜇1superscript𝜆1diagsubscript𝑀𝑝P_{2}=(\mu^{-1}+\lambda^{-1})\operatorname{diag}(M_{p})italic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = ( italic_μ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT + italic_λ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) roman_diag ( italic_M start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ). When 𝒯hsubscript𝒯\mathcal{T}_{h}caligraphic_T start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT consists of Cartesian cells, Mp=diag(Mp)subscript𝑀𝑝diagsubscript𝑀𝑝M_{p}=\operatorname{diag}(M_{p})italic_M start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT = roman_diag ( italic_M start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ) in the GL basis. The solver is illustrated in Figure 12.

Krylov solver: MINRESBlock-diagonal preconditioner 𝒫diagsubscript𝒫diag\mathcal{P}_{\mathrm{diag}}caligraphic_P start_POSTSUBSCRIPT roman_diag end_POSTSUBSCRIPT(1,1)-block: Hybrid p𝑝pitalic_p-multigrid/Schwarz V-cycleRelaxation: SDC/FDM/sparseCoarse grid: Cholesky(2,2)-block: Mass matrix preconditionerRelaxation: point-Jacobi
Figure 12: Solver diagram for the mixed linear elasticity problem.

In Table 4 we present MINRES iteration counts for the same configuration considered in Table 3 in Section 3.1, using the [Qp]d×DQp2superscriptdelimited-[]subscriptQ𝑝𝑑subscriptDQ𝑝2[\mathrm{Q}_{p}]^{d}\times\mathrm{DQ}_{p-2}[ roman_Q start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT × roman_DQ start_POSTSUBSCRIPT italic_p - 2 end_POSTSUBSCRIPT and RTp×DQp1subscriptRT𝑝subscriptDQ𝑝1\mathrm{RT}_{p}\times\mathrm{DQ}_{p-1}roman_RT start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT × roman_DQ start_POSTSUBSCRIPT italic_p - 1 end_POSTSUBSCRIPT elements, respectively. Both discretizations yield robust iteration counts with respect to λ𝜆\lambdaitalic_λ; the iterations grow with the former discretization much more quickly than the latter, especially in 3D.

Table 4: MINRES iteration counts for the mixed linear elasticity problem, using the solver in Figure 12.
[Qp]d×DQp2superscriptdelimited-[]subscriptQ𝑝𝑑subscriptDQ𝑝2[\mathrm{Q}_{p}]^{d}\times\mathrm{DQ}_{p-2}[ roman_Q start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT × roman_DQ start_POSTSUBSCRIPT italic_p - 2 end_POSTSUBSCRIPT RTp×DQp1subscriptRT𝑝subscriptDQ𝑝1\mathrm{RT}_{p}\times\mathrm{DQ}_{p-1}roman_RT start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT × roman_DQ start_POSTSUBSCRIPT italic_p - 1 end_POSTSUBSCRIPT
d𝑑ditalic_d pλ𝑝𝜆p\setminus\lambdaitalic_p ∖ italic_λ 100superscript10010^{0}10 start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT 101superscript10110^{1}10 start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT 102superscript10210^{2}10 start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT 103superscript10310^{3}10 start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT \infty 100superscript10010^{0}10 start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT 101superscript10110^{1}10 start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT 102superscript10210^{2}10 start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT 103superscript10310^{3}10 start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT \infty
2 3 28 40 43 43 43 25 36 39 40 40
7 31 45 50 51 51 28 40 43 45 45
15 34 50 57 57 57 30 43 48 48 48
31 36 53 64 65 65 31 45 51 51 51
3 3 44 67 75 76 76 34 50 55 56 56
7 50 83 96 97 98 39 58 63 65 65
15 53 88 111 118 119 41 63 70 70 70

The solver configuration shown in Figure 12 is optimized for memory usage, employing a block diagonal preconditioner so that the short-term recurrences of MINRES may be exploited. If one is willing to trade memory for time, one may consider an alternative configuration shown in Figure 13 employing right-preconditioned GMRES [49] with a block upper triangular preconditioner,

𝒫upper=[P1B0P2],subscript𝒫uppermatrixsubscript𝑃1superscript𝐵top0subscript𝑃2\mathcal{P}_{\mathrm{upper}}=\begin{bmatrix}P_{1}&B^{\top}\\ 0&-P_{2}\end{bmatrix},caligraphic_P start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT = [ start_ARG start_ROW start_CELL italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_CELL start_CELL italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL - italic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] , (3.32)

which requires a single application each of P11superscriptsubscript𝑃11P_{1}^{-1}italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT, P21superscriptsubscript𝑃21P_{2}^{-1}italic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT, and Bsuperscript𝐵topB^{\top}italic_B start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT per GMRES iteration. The GMRES iteration counts are presented in Table 5.

Krylov solver: GMRES(30)Block upper triangular preconditioner 𝒫uppersubscript𝒫upper\mathcal{P}_{\mathrm{upper}}caligraphic_P start_POSTSUBSCRIPT roman_upper end_POSTSUBSCRIPT(1,1)-block: Hybrid p𝑝pitalic_p-multigrid/Schwarz V-cycleRelaxation: SDC/FDM/sparseCoarse grid: Cholesky(2,2)-block: Mass matrix preconditionerRelaxation: point-Jacobi
Figure 13: Solver diagram for the mixed linear elasticity problem that trades memory for iteration counts.
Table 5: GMRES iteration counts for the mixed linear elasticity problem, using the solver in Figure 13.
[Qp]d×DQp2superscriptdelimited-[]subscriptQ𝑝𝑑subscriptDQ𝑝2[\mathrm{Q}_{p}]^{d}\times\mathrm{DQ}_{p-2}[ roman_Q start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT × roman_DQ start_POSTSUBSCRIPT italic_p - 2 end_POSTSUBSCRIPT RTp×DQp1subscriptRT𝑝subscriptDQ𝑝1\mathrm{RT}_{p}\times\mathrm{DQ}_{p-1}roman_RT start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT × roman_DQ start_POSTSUBSCRIPT italic_p - 1 end_POSTSUBSCRIPT
d𝑑ditalic_d pλ𝑝𝜆p\setminus\lambdaitalic_p ∖ italic_λ 100superscript10010^{0}10 start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT 101superscript10110^{1}10 start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT 102superscript10210^{2}10 start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT 103superscript10310^{3}10 start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT \infty 100superscript10010^{0}10 start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT 101superscript10110^{1}10 start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT 102superscript10210^{2}10 start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT 103superscript10310^{3}10 start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT \infty
2 3 17 23 25 26 26 14 20 22 22 22
7 18 25 27 28 28 16 22 24 24 24
15 20 27 33 34 34 17 23 26 26 26
31 21 30 38 38 39 18 24 28 28 28
3 3 24 32 33 34 34 17 23 25 25 25
7 27 35 38 38 38 21 26 29 29 29
15 28 38 44 46 46 22 28 31 31 32

In Table 6 we study the performance of our solver on an unstructured mesh. We consider the [Qp]d×DQp2superscriptdelimited-[]subscriptQ𝑝𝑑subscriptDQ𝑝2[\mathrm{Q}_{p}]^{d}\times\mathrm{DQ}_{p-2}[ roman_Q start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT × roman_DQ start_POSTSUBSCRIPT italic_p - 2 end_POSTSUBSCRIPT discretization of incompressible linear elasticity (λ=𝜆\lambda=\inftyitalic_λ = ∞). We prescribe μ=1𝜇1\mu=1italic_μ = 1, a uniform downwards body force 𝐁=0.02𝐞2𝐁0.02subscript𝐞2\mathbf{B}=-0.02\mathbf{e}_{2}bold_B = - 0.02 bold_e start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, and homogeneous Dirichlet BCs on the displacement on the holes of the domain. The three-dimensional mesh is obtained via extrusion by 16 layers of the two-dimensional mesh. The iteration counts follow the same pattern as before for this element: they are not p𝑝pitalic_p-robust as expected, but they remain modest even at very high degrees.

Table 6: GMRES iteration counts for the mixed formulation of the incompressible linear elasticity problem on the unstructured mesh shown here, using the solver in Figure 13.
d𝑑ditalic_d p𝑝pitalic_p # DOFs Iter.
2 3 17 466 26
7 104 250 29
15 499 002 35
31 2 173 242 44
3 3 588 927 39
7 7 876 575 47
11 31 236 927 51

[Uncaptioned image]

4 Conclusion

We have introduced a fast relaxation method required for unstructured vertex-centered patch problems arising in Pavarino’s approach, extending its practicality to much higher polynomial degrees. Our method relies on a spectrally equivalent form, constructed such that it is separable in the reference coordinates. We show promising results for the Poisson equation and mixed formulations of linear elasticity. A downside of the approach is its narrow applicability; it will not be effective on more general problems, especially for those where the dominant terms include mixed derivatives and mixed vector components. In addition, our method relies on having a good quality mesh, with its performance depending on the minimal angle; however, mesh generators with guarantees on the minimal angle are available in two dimensions [32]. So far, we have only considered constant-coefficient problems, but the theory of [6] suggests that our approach would remain effective for spatially varying coefficients. Work in progress shows that the memory bandwidth limitations and the suboptimal complexity of the sparse Cholesky factorization can be overcome by iterative patch solvers, such as incomplete factorizations and algebraic multigrid.

References

  • [1] M. Ainsworth, P. Coggins, and B. Senior, Mixed hp𝑝hpitalic_h italic_p-finite element methods for incompressible flow, Chapman and Hall CRC Research Notes in Mathematics, (2000), pp. 1–20.
  • [2] P. R. Amestoy, I. S. Duff, J.-Y. L’Excellent, and J. Koster, A fully asynchronous multifrontal solver using distributed dynamic scheduling, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 15–41.
  • [3] D. N. Arnold, An interior penalty finite element method with discontinuous elements, SIAM J. Numer. Anal., 19 (1982), pp. 742–760.
  • [4] D. N. Arnold, D. Boffi, and R. S. Falk, Quadrilateral H𝐻Hitalic_H(div) finite elements, SIAM J. Numer. Anal., 42 (2005), pp. 2429–2451.
  • [5] K. J. Arrow, L. Hurwicz, and H. Uzawa, Studies in Non-linear Programming, Stanford University Press, 1958.
  • [6] O. Axelsson and J. Karátson, Equivalent operator preconditioning for elliptic problems, Numer. Algorithms, 50 (2009), pp. 297–380.
  • [7] I. Babuška, The finite element method with Lagrangian multipliers, Numer. Math., 20 (1973), pp. 179–192.
  • [8] S. Balay, S. Abhyankar, M. F. Adams, J. Brown, P. Brune, K. Buschelman, L. Dalcin, V. Eijkhout, W. D. Gropp, D. Karpeyev, D. Kaushik, M. G. Knepley, D. A. May, L. C. McInnes, R. T. Mills, T. Munson, K. Rupp, P. Sanan, B. F. Smith, S. Zampini, H. Zhang, and H. Zhang, PETSc users manual, Tech. Report ANL-95/11 - Revision 3.11, Argonne National Laboratory, 2019.
  • [9] P. D. Bello-Maldonado and P. F. Fischer, Scalable low-order finite element preconditioners for high-order spectral element Poisson solvers, SIAM J. Sci. Comput., 41 (2019), pp. S2–S18.
  • [10] C. Bernardi, Y. Maday, and B. Métivet, Spectral approximation of the periodic-nonperiodic Navier-Stokes equations, Numer. Math., 51 (1987), pp. 655–700.
  • [11] R. Blaheta, Displacement decomposition—incomplete factorization preconditioning techniques for linear elasticity problems, Numer. Lin. Algebra Appl., 1 (1994), pp. 107–128.
  • [12] F. Brezzi, On the existence, uniqueness and approximation of saddle-point problems arising from Lagrangian multipliers, ESAIM Math. Model. Num., 8 (1974), pp. 129–151.
  • [13] C. Canuto, Stabilization of spectral methods by finite element bubble functions, Comput. Methods Appl. Mech. Engrg., 116 (1994), pp. 13–26.
  • [14] Y. Chen, T. A. Davis, W. W. Hager, and S. Rajamanickam, Algorithm 887: CHOLMOD, Supernodal Sparse Cholesky Factorization and Update/Downdate, ACM Trans. Math. Softw., 35 (2008).
  • [15] B. Cockburn, G. Kanschat, and D. Schötzau, A note on discontinuous Galerkin divergence-free solutions of the Navier–Stokes equations, J. Sci. Comput., 31 (2007), pp. 61–73.
  • [16] W. Couzy and M. O. Deville, A fast Schur complement method for the spectral element discretization of the incompressible Navier-Stokes equations, J. Comp. Phys., 116 (1995), pp. 135 – 142.
  • [17] L. T. Diosady and S. M. Murman, Scalable tensor-product preconditioners for high-order finite-element methods: Scalar equations, J. Comp. Phys., 394 (2019), pp. 759–776.
  • [18] M. Dryja and O. Widlund, An additive variant of the Schwarz alternating method for the case of many subregions, Technical Report 339, Ultracomputer Note 131, Department of Computer Science, Courant Institute, 1987.
  • [19] P. E. Farrell, M. G. Knepley, L. Mitchell, and F. Wechsung, PCPATCH: Software for the Topological Construction of Multigrid Relaxation Methods, ACM Trans. Math. Softw., 47 (2021).
  • [20] P. F. Fischer, H. M. Tufo, and N. I. Miller, An overlapping Schwarz method for spectral element simulation of three-dimensional incompressible flows, in Parallel Solution of Partial Differential Equations, Springer, 2000, pp. 159–180.
  • [21] A. George, Nested dissection of a regular finite element mesh, SIAM J. Numer. Anal., 10 (1973), pp. 345–363.
  • [22] I. Gustafsson and G. Lindskog, On parallel solution of linear elasticity problems: Part I: theory, Numer. Lin. Algebra Appl., 5 (1998), pp. 123–139.
  • [23] M. Homolya, R. C. Kirby, and D. A. Ham, Exposing and exploiting structure: optimal code generation for high-order finite element methods, arXiv preprint arXiv:1711.02473, (2017).
  • [24] M. Homolya, L. Mitchell, F. Luporini, and D. A. Ham, TSFC: a structure-preserving form compiler, SIAM J. Sci. Comput., 40 (2018), pp. C401–C428.
  • [25] I. Huismann, J. Stiller, and J. Fröhlich, Scaling to the stars–a linearly scaling elliptic solver for p𝑝pitalic_p-multigrid, J. Comp. Phys., 398 (2019), p. 108868.
  • [26] G. Karniadakis and S. Sherwin, Spectral/hp𝑝hpitalic_h italic_p element methods for computational fluid dynamics, Oxford University Press, 2013.
  • [27] D. S. Kershaw, Differencing of the diffusion equation in Lagrangian hydrodynamic codes, J. Comp. Phys., 39 (1981), pp. 375–395.
  • [28] R. C. Kirby, Algorithm 839: FIAT, a new paradigm for computing finite element basis functions, ACM Trans. Math. Soft., 30 (2004), pp. 502–516.
  • [29] M. Kronbichler and W. A. Wall, A Performance Comparison of Continuous and Discontinuous Galerkin Methods with Fast Multigrid Solvers, SIAM J. Sci. Comput., 40 (2018), pp. A3423–A3448.
  • [30] C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, (1950).
  • [31] P. L. Lederer and J. Schöberl, Polynomial robust stability analysis for H(div)𝐻normal-divH(\mathrm{div})italic_H ( roman_div )-conforming finite elements for the Stokes equations, IMA J. Numer. Anal., 38 (2018), pp. 1832–1860.
  • [32] X. Liang and Y. Zhang, Hexagon-based all-quadrilateral mesh generation with guaranteed angle bounds, Comput. Methods Appl. Mech. Engrg., 200 (2011), pp. 2005–2020.
  • [33] J. W. Lottes and P. F. Fischer, Hybrid multigrid/Schwarz algorithms for the spectral element method, J. Sci. Comput., 24 (2005), pp. 45–78.
  • [34] R. E. Lynch, J. R. Rice, and D. H. Thomas, Direct solution of partial difference equations by tensor product methods, Numer. Math., 6 (1964), pp. 185–199.
  • [35] Y. Maday, D. Meiron, A. T. Patera, and E. M. Rønquist, Analysis of iterative methods for the steady and unsteady Stokes problem: Application to spectral element discretizations, SIAM J. Sci. Comput., 14 (1993), pp. 310–337.
  • [36] J.-F. Maitre and O. Pourquier, Condition number and diagonal preconditioning: comparison of the p𝑝pitalic_p-version and the spectral element methods, Numer. Math., 74 (1996), pp. 69–84.
  • [37] J. Manzanero, A. M. Rueda-Ramírez, G. Rubio, and E. Ferrer, The Bassi Rebay 1 scheme is a special case of the symmetric interior penalty formulation for discontinuous Galerkin discretisations with Gauss–Lobatto points, J. Comp. Phys., 363 (2018), pp. 1–10.
  • [38] J. R. Munkres, Elements of Algebraic Topology, CRC Press, 1984.
  • [39] J.-C. Nédélec, Mixed finite elements in 3superscript3\mathbb{R}^{3}blackboard_R start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT, Numer. Math., 35 (1980), pp. 315–341.
  • [40] S. A. Orszag, Spectral methods for problems in complex geometries, J. Comp. Phys., 37 (1980), pp. 70 – 92.
  • [41] C. C. Paige and M. A. Saunders, Solution of sparse indefinite systems of linear equations, SIAM J. Numer. Anal., 12 (1975), pp. 617–629.
  • [42] L. F. Pavarino, Additive Schwarz methods for the p𝑝pitalic_p-version finite element method, Numer. Math., 66 (1993), pp. 493–515.
  • [43] L. F. Pavarino, E. Zampieri, R. Pasquetti, and F. Rapetti, Overlapping Schwarz methods for Fekete and Gauss–Lobatto spectral elements, 29 (2007), pp. 1073–1092.
  • [44] W. Pazner, Efficient low-order refined preconditioners for high-order matrix-free continuous and discontinuous Galerkin methods, SIAM J. Sci. Comput., 42 (2020), pp. A3055–A3083.
  • [45] W. Pazner and P.-O. Persson, Approximate tensor-product preconditioners for very high order discontinuous Galerkin methods, J. Comp. Phys., 354 (2018), pp. 344–369.
  • [46] W. Pazner and P.-O. Persson, Interior penalty tensor-product preconditioners for high-order discontinuous Galerkin discretizations, in 2018 AIAA Aerospace Sciences Meeting, 2018, p. 1093.
  • [47] F. Rathgeber, D. A. Ham, L. Mitchell, M. Lange, F. Luporini, A. T. T. McRae, G.-T. Bercea, G. R. Markall, and P. H. J. Kelly, Firedrake: automating the finite element method by composing abstractions, ACM Trans. Math. Soft., 43 (2016), pp. 24:1–24:27.
  • [48] J.-F. Remacle, R. Gandham, and T. Warburton, GPU accelerated spectral finite elements on all-hex meshes, J. Comp. Phys., 324 (2016), pp. 246–257.
  • [49] Y. Saad and M. H. Schultz, GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 7 (1986), pp. 856–869.
  • [50] D. Schötzau and C. Schwab, Mixed hp𝑝hpitalic_h italic_p-FEM on anisotropic meshes, Math. Models Methods Appl. Sci., 8 (1998), pp. 787–820.
  • [51] D. Silvester and A. Wathen, Fast iterative solution of stabilised Stokes systems Part II: using general block preconditioners, SIAM J. Numer. Anal., 31 (1994), pp. 1352–1367.
  • [52] J. Stiller, Robust multigrid for Cartesian interior penalty DG formulations of the Poisson equation in 3D, in Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2016, M. L. Bittencourt, N. A. Dumont, and J. S. Hesthaven, eds., Cham, 2017, Springer International Publishing, pp. 189–201.
  • [53] B. Szabó and I. Babuška, Finite element analysis, John Wiley & Sons, 1991.
  • [54] C. Taylor and P. Hood, A numerical solution of the Navier-Stokes equations using the finite element technique, Comput. Fluids, 1 (1973), pp. 73–100.
  • [55] J. Witte, D. Arndt, and G. Kanschat, Fast tensor product Schwarz smoothers for high-order discontinuous Galerkin methods, J. Comput. Methods Appl. Math., (2020).
  • [56] J. Xu, The auxiliary space method and optimal multigrid preconditioning techniques for unstructured grids, Computing, 56 (1996), pp. 215–235.