Analysis of an Adaptive Safeguarded Newton-Anderson Algorithm of Depth One with Applications to Fluid Problems

Matt Dallas¹ (corresponding author), Sara Pollock², Leo G. Rebholz³
MD and SP were supported in part by the US NSF Project DMS 2011519 (PI: Pollock). LR was supported by US NSF Project DMS 2011490 (PI: Rebholz).
Keywords: Anderson acceleration, Newton’s method, safeguarding, singular points, bifurcation.
2020 Mathematics Subject Classification: 65J15
¹Department of Mathematics, University of Dallas, USA (mdallas@udallas.edu)
²Department of Mathematics, University of Florida, USA (s.pollock@ufl.edu)
³Department of Mathematics, Clemson University, USA (rebholz@clemson.edu)

(August 21, 2024)

Abstract

The purpose of this paper is to develop a practical strategy to accelerate Newton’s method in the vicinity of singular points. We present an adaptive safeguarding scheme with a tunable parameter, which we call adaptive $\gamma$ -safeguarding, that one can use in tandem with Anderson acceleration to improve the performance of Newton’s method when solving problems at or near singular points. The key features of adaptive $\gamma$ -safeguarding are that it converges locally for singular problems, and it can detect nonsingular problems automatically, in which case the Newton-Anderson iterates are scaled towards a standard Newton step. The result is a flexible algorithm that performs well for singular and nonsingular problems, and can recover convergence from both standard Newton and Newton-Anderson with the right parameter choice. This leads to faster local convergence compared to both Newton’s method, and Newton-Anderson without safeguarding, with effectively no additional computational cost. We demonstrate three strategies one can use when implementing Newton-Anderson and $\gamma$ -safeguarded Newton-Anderson to solve parameter-dependent problems near singular points. For our benchmark problems, we take two parameter-dependent incompressible flow systems: flow in a channel and Rayleigh-Bénard convection.

1 Introduction

Nonlinear systems of equations of the form $f(x)=0$ , with $f:\mathbb{R}^{n}\to\mathbb{R}^{n}$ , arise frequently in applications. Many times the solutions depend on parameters (see [6, 7, 33, 39, 40, 41, 53] and references therein) that can have a significant effect on the solution. Of particular interest are bifurcation points, which are characterized by the breakdown of local uniqueness of a solution for a particular parameter, and correspond to a qualitative change in the solution’s behavior [6, 32, 39]. Sets of solutions of similar qualitative behavior are called branches [39]. Studying these branches provides a more complete understanding of the solutions, and has applications in a wide variety of fields such as echocardiography [41], economics [56], physics [40], and engineering [57]. A necessary condition for bifurcation comes from the Implicit Function Theorem, which says that the Jacobian at a solution $x^{*}$ , $f^{\prime}(x^{*})$ , is necessarily singular if $x^{*}$ is a bifurcation point [32, p. 8]. Bifurcation points are thus examples of singular points, i.e., points $x$ for which $f^{\prime}(x)$ is singular. Techniques for computing solution branches such as continuation [53] and deflation [7, 17] require many solves of $f(x)=0$ as the parameter is varied, and these problems become singular or nearly singular near bifurcation points. A popular method for solving nonlinear equations is Newton’s method defined in Algorithm 1.

Algorithm 1 Newton

1: Choose $x_{0}\in\mathbb{R}^{n}$.
2: for $k=0,1,2,\ldots$ do
3:     $w_{k+1}\leftarrow-f^{\prime}(x_{k})^{-1}f(x_{k})$
4:     $x_{k+1}\leftarrow x_{k}+w_{k+1}$
5: end for

When $f^{\prime}(x)$ is Lipschitz continuous and $f^{\prime}(x^{*})$ is nonsingular, Newton’s method exhibits local quadratic convergence in a ball centered at $x^{*}$ . This is essentially the celebrated Newton-Kantorovich theorem [35]. If we remove the nonsingular assumption and let $f^{\prime}(x^{*})$ be singular, then the convergence behavior changes dramatically. Rather than local quadratic convergence from any $x_{0}$ in a sufficiently small ball around $x^{*}$ , we see local linear convergence in a starlike domain of convergence around $x^{*}$ [11, 12, 22, 48, 49]. Since bifurcation points are necessarily singular points, this means that continuation or deflation algorithms using Newton’s method may converge slowly or fail to converge at or near a bifurcation point. This challenge has motivated the study of modifications [11, 13, 18, 19, 23, 22, 27, 31] or alternatives [3, 4, 5, 16, 30, 50] to Newton’s method that can improve convergence behavior at singular points. Among the modifications, Richardson extrapolation and overrelaxation are popular and perform well. They can achieve superlinear and arbitrarily fast linear convergence respectively under certain conditions [23] at the cost of additional function evaluations and some knowledge of the order of the singularity [22, 23]. The order may be inferred from monitoring the singular values of the Jacobian as the solve progresses. A popular alternative to Newton’s method for singular problems is the Levenberg-Marquardt method [3, 4, 5, 16, 30]. Under standard assumptions and the local error bound, or local Lipschitzian error bound, the Levenberg-Marquardt method can achieve local quadratic convergence [30]. The local error bound is known to be much weaker than the standard nonsingularity assumption. Indeed, it can hold even for singular problems [4]. However, without the local error bound or nonsingularity, Levenberg-Marquardt is not guaranteed such success. 
In the absence of the local error bound, one may insist that the function $f$ is 2-regular [18, 19, 28, 29], in which case Levenberg-Marquardt converges locally linearly in a starlike domain much like Newton’s method [28].
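The contrast between the nonsingular and singular regimes can be seen on a scalar toy problem. The following sketch is illustrative only: the functions $x^{2}$ and $x^{2}-1$, the helper name `newton`, and the starting points are assumptions chosen for this example and are not among the benchmark problems of this paper.

```python
def newton(f, df, x0, steps):
    """Plain Newton iteration for a scalar equation; returns all iterates."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / df(x))
    return xs


# Singular root (assumed example): f(x) = x^2 has f'(0) = 0 at x* = 0.
sing = newton(lambda x: x * x, lambda x: 2 * x, 1.0, 10)
# Nonsingular root (assumed example): f(x) = x^2 - 1 has f'(1) = 2 at x* = 1.
nons = newton(lambda x: x * x - 1, lambda x: 2 * x, 2.0, 6)

sing_err = [abs(x) for x in sing]        # errors |x_k - 0|
nons_err = [abs(x - 1.0) for x in nons]  # errors |x_k - 1|

for k, (es, en) in enumerate(zip(sing_err, nons_err)):
    print(f"k={k}  singular error={es:.3e}  nonsingular error={en:.3e}")
```

For $x^{2}$ the Newton update reduces to $x_{k+1}=x_{k}/2$, so the error is only halved at each step, whereas for $x^{2}-1$ the error is roughly squared at each step.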

The method of interest in this paper, Anderson acceleration, has a long track record of accelerating linearly converging fixed-point methods, and has been applied in many different fields [1, 34, 37, 38, 44, 51, 55]. Further, when applied as a modification to Newton’s method at singular points, in contrast to Richardson extrapolation or overrelaxation discussed in [23], this success requires no knowledge of the order of the root or additional function evaluations. It was also shown recently that under a condition equivalent to 2-regularity (discussed further in Section 2), Anderson accelerates Newton’s method when applied to singular problems [10], and therefore outperforms Levenberg-Marquardt in the absence of the local error bound. This was demonstrated numerically in [10]. Also known as Anderson extrapolation or Anderson mixing [54], Anderson acceleration was first introduced in 1965 by D.G. Anderson in [2] to improve the convergence of fixed-point iterations applied to integral equations. The algorithm combines the previous $m+1$ iterates and update steps into a new iterate at each step of the solve. The number $m$ is commonly known as the algorithmic depth. The combination of the $m+1$ iterates often involves solving a least-squares problem, but since $m$ is typically small, the computational cost of this step is in general orders of magnitude less than that of a single linear solve. There are problems for which taking $m$ much larger can be beneficial [42, 54], and later in this paper we will see numerically that increasing $m$ can improve or recover convergence near bifurcation points. With greater depths, the least-squares problem may suffer from ill-conditioning if proper care is not taken in the implementation [43]. In [10], the authors developed a convergence and acceleration theory for Anderson accelerated Newton’s method with depth $m=1$ , defined in Algorithm 2, applied to singular problems. 
This is a special case of depth $m\geq 1$ given in Algorithm 3.

Algorithm 2 Newton-Anderson(1)

1: Choose $x_{0}\in\mathbb{R}^{n}$. Set $w_{1}=-f^{\prime}(x_{0})^{-1}f(x_{0})$, and $x_{1}=x_{0}+w_{1}$.
2: for $k=1,2,\ldots$ do
3:     $w_{k+1}\leftarrow-f^{\prime}(x_{k})^{-1}f(x_{k})$
4:     $\gamma_{k+1}\leftarrow(w_{k+1}-w_{k})^{T}w_{k+1}/\|w_{k+1}-w_{k}\|_{2}^{2}$
5:     $x_{k+1}\leftarrow x_{k}+w_{k+1}-\gamma_{k+1}(x_{k}-x_{k-1}+w_{k+1}-w_{k})$
6: end for
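A minimal sketch of Algorithm 2 next to plain Newton is given below, assuming NumPy is available. The test function `f`, its Jacobian `jac`, the residual-based stopping test, and all function names are assumptions introduced for illustration; the toy problem has a root at the origin where the Jacobian is singular with a one-dimensional null space.

```python
import numpy as np


def f(x):
    # Toy problem (assumed): f(0) = 0 and f'(0) = [[1, 0], [0, 0]] is singular.
    return np.array([x[0] + x[1] ** 2, x[1] ** 2])


def jac(x):
    return np.array([[1.0, 2.0 * x[1]], [0.0, 2.0 * x[1]]])


def newton_step(x):
    return np.linalg.solve(jac(x), -f(x))


def newton(x0, tol=1e-12, max_iter=100):
    """Plain Newton; returns the iterate and the number of linear solves."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        if np.linalg.norm(f(x)) < tol:
            return x, k
        x = x + newton_step(x)
    return x, max_iter


def newton_anderson1(x0, tol=1e-12, max_iter=100):
    """Depth-one Newton-Anderson; returns the iterate and the solve count."""
    x_old = np.asarray(x0, dtype=float)
    w_old = newton_step(x_old)
    x = x_old + w_old
    for k in range(1, max_iter):
        # Stop before solving with a (nearly) singular Jacobian at the root.
        if np.linalg.norm(f(x)) < tol:
            return x, k
        w = newton_step(x)
        d = w - w_old
        gamma = (d @ w) / (d @ d)
        x_new = x + w - gamma * (x - x_old + d)
        x_old, w_old, x = x, w, x_new
    return x, max_iter


x_newton, it_newton = newton([1.0, 1.0])
x_na, it_na = newton_anderson1([1.0, 1.0])
print("Newton solves:", it_newton, " NA(1) solves:", it_na)
```

On this deliberately simple problem the Newton map is linear along the null space, so the depth-one extrapolation is unusually effective; the gap between the two solve counts should not be read as representative of harder problems.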

Algorithm 3 Newton-Anderson(m)

1: Choose $x_{0}\in\mathbb{R}^{n}$ and $m\geq 0$. Set $w_{1}=-f^{\prime}(x_{0})^{-1}f(x_{0})$, and $x_{1}=x_{0}+w_{1}$.
2: for $k=1,2,\ldots$ do
3:     $m_{k}\leftarrow\min\{k,m\}$
4:     $w_{k+1}\leftarrow-f^{\prime}(x_{k})^{-1}f(x_{k})$
5:     $F_{k}=\big((w_{k+1}-w_{k})\cdots(w_{k-m+2}-w_{k-m+1})\big)$
6:     $E_{k}=\big((x_{k}-x_{k-1})\cdots(x_{k-m+1}-x_{k-m})\big)$
7:     $\gamma_{k+1}\leftarrow\text{argmin}_{\gamma\in\mathbb{R}^{m}}\|w_{k+1}-F_{k}\gamma\|_{2}^{2}$
8:     $x_{k+1}\leftarrow x_{k}+w_{k+1}-(E_{k}+F_{k})\gamma_{k+1}$
9: end for
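One possible realization of Algorithm 3 is sketched below, assuming NumPy and using `numpy.linalg.lstsq` for the least-squares problem. The function name `newton_anderson`, the residual-based stopping test, the use of $m_{k}$ columns in $F_{k}$ and $E_{k}$, and the toy problem are assumptions made for illustration. The sketch stores the full history for readability; a practical code would keep only the last $m+1$ iterates and update steps.

```python
import numpy as np


def newton_anderson(f, jac, x0, m, tol=1e-12, max_iter=100):
    """Newton-Anderson with depth m; returns the iterate and the solve count."""
    xs = [np.asarray(x0, dtype=float)]             # x_0, x_1, ...
    ws = [np.linalg.solve(jac(xs[0]), -f(xs[0]))]  # w_1, w_2, ...
    xs.append(xs[0] + ws[0])
    for k in range(1, max_iter):
        x = xs[-1]
        if np.linalg.norm(f(x)) < tol:
            return x, k
        w = np.linalg.solve(jac(x), -f(x))
        ws.append(w)
        if m == 0:
            xs.append(x + w)  # depth zero is plain Newton
            continue
        mk = min(k, m)
        # Columns are differences of consecutive steps/iterates, newest first.
        F = np.column_stack([ws[-1 - j] - ws[-2 - j] for j in range(mk)])
        E = np.column_stack([xs[-1 - j] - xs[-2 - j] for j in range(mk)])
        gamma, *_ = np.linalg.lstsq(F, w, rcond=None)
        xs.append(x + w - (E + F) @ gamma)
    return xs[-1], max_iter


def f(x):
    # Toy problem (assumed): the Jacobian is singular at the root x* = 0.
    return np.array([x[0] + x[1] ** 2, x[1] ** 2])


def jac(x):
    return np.array([[1.0, 2.0 * x[1]], [0.0, 2.0 * x[1]]])


x_m0, it_m0 = newton_anderson(f, jac, [1.0, 1.0], m=0)
x_m2, it_m2 = newton_anderson(f, jac, [1.0, 1.0], m=2)
print("m=0 solves:", it_m0, " m=2 solves:", it_m2)
```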

A major challenge when proving convergence of Newton-like methods near singular points is ensuring that the iterates remain well-defined. The authors in [10] introduced a novel safeguarding scheme called $\gamma$ -safeguarding, defined in Algorithm 4 in Section 2, to deal with this problem. The result was a convergence proof for $\gamma$ -safeguarded Newton-Anderson, and it was observed numerically to perform better or no worse compared to standard Newton-Anderson, particularly when applied to nonsingular problems.

The purpose of this paper is to extend these ideas of [10] by developing an adaptive version of $\gamma$ -safeguarding that automatically detects nonsingular problems, and to demonstrate the effectiveness of Newton-Anderson and adaptive $\gamma$ -safeguarded Newton-Anderson at solving parameter-dependent PDEs near bifurcation points in fluid problems. This new adaptive scheme is proven to be locally convergent under the same conditions as the non-adaptive scheme, and the automatic detection of nonsingular problems enables local quadratic convergence when applied to nonsingular problems. Such a property is desirable when solving nonlinear problems near bifurcation points, because even if the problem itself is not singular, convergence can still be affected if it is close to a singular problem [14]. One would like to enjoy the benefits of Newton-Anderson in the preasymptotic regime such as a larger domain of convergence [45], but not lose quadratic convergence in the asymptotic regime if the problem is nonsingular. It is not always known a priori if a problem is singular or nonsingular, and Anderson acceleration can reduce the order of convergence when applied to superlinearly converging iterations such as Newton’s method applied to a nonsingular problem [47]. This problem is solved with adaptive $\gamma$ -safeguarded Newton-Anderson with effectively no additional computational cost. We also show numerically that increasing the algorithmic depth of the Newton-Anderson algorithm can recover convergence when Newton fails near bifurcation points, but only for specific choices of $m$ .

The algorithms of interest in this paper are Newton, defined in Algorithm 1, Newton-Anderson with algorithmic depth $1$ (NA( $1$ )) defined in Algorithm 2, Newton-Anderson with algorithmic depth $m$ (NA( $m$ )) defined in Algorithm 3, $\gamma$ -safeguarded Newton-Anderson ( $\gamma$ NA( $r$ )), defined in Algorithm 5, and adaptive $\gamma$ -safeguarded Newton-Anderson ( $\operatorname{\gamma\text{NAA}(\hat{r})}$ ), defined in Algorithm 7. We implement $\gamma$ NA( $r$ ) and $\operatorname{\gamma\text{NAA}(\hat{r})}$ by replacing line 5 in Algorithm 2 with, respectively, $\gamma$ -safeguarding (Algorithm 4) and adaptive $\gamma$ -safeguarding (Algorithm 6). The norm in the algorithms is the Euclidean norm. There should be no confusion between NA( $m$ ) and $\gamma$ NA( $r$ ) or $\operatorname{\gamma\text{NAA}(\hat{r})}$ since $\gamma$ -safeguarding is currently only developed for depth $m=1$ . The rest of the paper is organized as follows. In Section 2, we review the original $\gamma$ -safeguarding algorithm and its role in the convergence theory developed in [10]. In Section 3, we introduce the new adaptive $\gamma$ -safeguarding algorithm and prove that $\operatorname{\gamma\text{NAA}(\hat{r})}$ can recover local quadratic convergence when applied to nonsingular problems in Corollary 3.1. We conclude in Section 4 by applying NA and $\operatorname{\gamma\text{NAA}(\hat{r})}$ to two parameter-dependent incompressible flow systems, and discussing various strategies one can use when employing $\gamma$ -safeguarding.

2 The $\gamma$ -safeguarding algorithm

For the theory discussed in Section 2 and Section 3, unless stated otherwise, we take $f:\mathbb{R}^{n}\to\mathbb{R}^{n}$ to be a $C^{3}$ function, $f(x^{*})=0$, $N=\text{null}\left(f^{\prime}(x^{*})\right)$, $R=\text{range}\left(f^{\prime}(x^{*})\right)$, $\dim N=1$, and $P_{N}$ and $P_{R}$ to be the orthogonal projections onto $N$ and $R$ respectively. The assumption that $R\perp N$ is common in the singular Newton literature [11, 21], and no generality is lost in finite dimensions. Indeed, Newton’s method is essentially invariant under nonsingular affine transformations of the domain and nonsingular linear transformations of the range [15]. Thus to determine the convergence behavior of Newton’s method applied to a general $C^{3}$ function $F:\mathbb{R}^{n}\to\mathbb{R}^{n}$, it suffices to study that of $f(x)=U^{T}F(x)V$, where $U$ and $V$ come from the singular value decomposition $F^{\prime}(x^{*})=U\Sigma V^{T}$. Since $f^{\prime}(x^{*})=U^{T}F^{\prime}(x^{*})V=\Sigma$, it follows that $\text{null}\left(f^{\prime}(x^{*})\right)\perp\text{range}\left(f^{\prime}(x^{*})\right)$. Lastly, let $\|\cdot\|$ denote the Euclidean 2-norm, $B_{\rho}(x)$ denote a ball of radius $\rho$ centered at $x$, $e_{k}=x_{k}-x^{*}$, $w_{k+1}=-f^{\prime}(x_{k})^{-1}f(x_{k})$, and

\displaystyle\theta_{k+1}=\frac{\|w_{k+1}-\gamma_{k+1}(w_{k+1}-w_{k})\|}{\|w_{k+1}\|},

(1)

where $\gamma_{k+1}$ is computed via Algorithm 2. The term $\theta_{k+1}$ is known as the optimization gain, and is key to determining when Anderson acceleration is successful both in the singular and nonsingular cases [10, 42].
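The optimization gain in (1) can be evaluated from two consecutive Newton update steps. The sketch below assumes NumPy; the function name `anderson_gain` and the sample vectors are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import numpy as np


def anderson_gain(w_new, w_old):
    """Return (gamma, theta) for one depth-one Anderson step."""
    d = w_new - w_old
    gamma = (d @ w_new) / (d @ d)  # formula for gamma_{k+1} in Algorithm 2
    theta = np.linalg.norm(w_new - gamma * d) / np.linalg.norm(w_new)
    return gamma, theta


# Orthogonal update steps: only a modest gain is possible.
g1, t1 = anderson_gain(np.array([0.0, 1.0]), np.array([1.0, 0.0]))
# Parallel update steps: the minimization drives the numerator of (1) to zero.
g2, t2 = anderson_gain(np.array([0.0, 1.0]), np.array([0.0, 2.0]))
print(g1, t1, g2, t2)
```

Since $\gamma=0$ is admissible in the minimization, $\theta_{k+1}\leq 1$ always holds, and smaller values indicate a more successful Anderson step.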

Algorithm 4 $\gamma$-safeguarding

1: Given $x_{k}$, $x_{k-1}$, $w_{k+1}$, $w_{k}$, $\gamma_{k+1}$, and $r\in(0,1)$. Set $\lambda=1$.
2: $\beta_{k+1}\leftarrow r\|w_{k+1}\|/\|w_{k}\|$
3: if $\gamma_{k+1}=0$ or $\gamma_{k+1}\geq 1$ then
4:     $\lambda\leftarrow 0$
5: else if $|\gamma_{k+1}|/|1-\gamma_{k+1}|>\beta_{k+1}$ then
6:     $\lambda\leftarrow\dfrac{\beta_{k+1}}{\gamma_{k+1}\left(\beta_{k+1}+\operatorname{sign}(\gamma_{k+1})\right)}$
7: end if
8: $x_{k+1}\leftarrow x_{k}+w_{k+1}-\lambda\gamma_{k+1}(x_{k}-x_{k-1}+w_{k+1}-w_{k})$

Algorithm 5 $\gamma$-Safeguarded Newton-Anderson $\left(\gamma\text{NA}(r)\right)$

1: Choose $x_{0}\in\mathbb{R}^{n}$ and $r\in(0,1)$. Set $w_{1}=-f^{\prime}(x_{0})^{-1}f(x_{0})$, and $x_{1}=x_{0}+w_{1}$.
2: for $k=1,2,\ldots$ do
3:     $w_{k+1}\leftarrow-f^{\prime}(x_{k})^{-1}f(x_{k})$
4:     $\gamma_{k+1}\leftarrow(w_{k+1}-w_{k})^{T}w_{k+1}/\|w_{k+1}-w_{k}\|_{2}^{2}$
5:     $\beta_{k+1}\leftarrow r\|w_{k+1}\|/\|w_{k}\|$
6:     $\lambda\leftarrow 1$
7:     if $\gamma_{k+1}=0$ or $\gamma_{k+1}\geq 1$ then
8:         $\lambda\leftarrow 0$
9:     else if $|\gamma_{k+1}|/|1-\gamma_{k+1}|>\beta_{k+1}$ then
10:         $\lambda\leftarrow\dfrac{\beta_{k+1}}{\gamma_{k+1}\left(\beta_{k+1}+\operatorname{sign}(\gamma_{k+1})\right)}$
11:     end if
12:     $x_{k+1}\leftarrow x_{k}+w_{k+1}-\lambda\gamma_{k+1}(x_{k}-x_{k-1}+w_{k+1}-w_{k})$
13: end for
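The branching in Algorithm 4 amounts to a few scalar operations. The sketch below is one way to compute the scaling $\lambda$; the function name `gamma_safeguard` and the convention of passing the ratio $\|w_{k+1}\|/\|w_{k}\|$ as a single argument `eta` are assumptions made for illustration.

```python
import math


def gamma_safeguard(gamma, eta, r):
    """Return the scaling lambda of Algorithm 4.

    gamma: Anderson coefficient gamma_{k+1}
    eta:   ratio ||w_{k+1}|| / ||w_k||
    r:     user-chosen safeguarding parameter in (0, 1)
    """
    beta = r * eta
    if gamma == 0 or gamma >= 1:
        return 0.0  # fall back to a pure Newton step
    if abs(gamma) / abs(1 - gamma) > beta:
        # Scale gamma so that |lam*gamma| / |1 - lam*gamma| equals beta.
        return beta / (gamma * (beta + math.copysign(1.0, gamma)))
    return 1.0      # gamma already satisfies the safeguard


lam_a = gamma_safeguard(0.5, 0.5, 0.5)   # positive gamma, scaled down
lam_b = gamma_safeguard(-0.1, 0.5, 0.5)  # small gamma, accepted unchanged
lam_c = gamma_safeguard(2.0, 0.5, 0.5)   # gamma >= 1, Anderson step rejected
lam_d = gamma_safeguard(-0.5, 0.5, 0.5)  # negative gamma, scaled down
print(lam_a, lam_b, lam_c, lam_d)
```

With `r = 0.5` and `eta = 0.5` we have $\beta_{k+1}=0.25$, so in the first call the scaled coefficient is $\lambda\gamma_{k+1}=\beta_{k+1}/(1+\beta_{k+1})=0.2$.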

To implement $\gamma$NA$(r)$, one replaces line 5 in Algorithm 2 with Algorithm 4, giving Algorithm 5. The resulting steps, $x_{k+1}=x_{k}+w_{k+1}-\lambda\gamma_{k+1}(x_{k}-x_{k-1}+w_{k+1}-w_{k})$, can be viewed as NA steps scaled by $\lambda$ towards a Newton step based on a user-chosen parameter, $r$, which is set at the start of the solve. This parameter determines how strongly the NA steps are scaled towards a Newton step, i.e., how close a $\gamma$NA$(r)$ step is to a Newton step. When $r\approx 0$, the $\gamma$NA$(r)$ iterates will be heavily scaled towards the latest Newton step, and when $r\approx 1$, the $\gamma$NA$(r)$ iterates will behave more like standard NA. The idea behind $\gamma$-safeguarding is to take advantage of the particular convergence behavior of Newton’s method near singular points, which we now describe. Since $\dim N=1$, $N$ is spanned by some nonzero $\varphi\in\mathbb{R}^{n}$. If the linear operator $\hat{D}(\cdot):=P_{N}f^{\prime\prime}(x^{*})\left(\varphi,P_{N}(\cdot)\right)$ is nonsingular as a map from $N$ to $N$, then there exist $\hat{\rho}>0$ and $\hat{\sigma}>0$ such that $f^{\prime}(x)$ is nonsingular for all $x\in\hat{W}:=B_{\hat{\rho}}(x^{*})\cap\{x:\|P_{R}(x-x^{*})\|<\hat{\sigma}\|P_{N}(x-x^{*})\|\}$ and Newton’s method converges linearly to $x^{*}$ from any $x_{0}\in\hat{W}$ [11]. The assumption that $\hat{D}$ is nonsingular as a linear map on $N$ is equivalent to the assumption that $f$ is 2-regular at $x^{*}$ in the direction $\varphi$ [19]. A stronger result due to Griewank [22, Theorem 6.1] says that when $\hat{D}$ is nonsingular, Newton’s method converges from every $x_{0}$ in a starlike region with density 1 with respect to $x^{*}$, and the iterates lead into $\hat{W}$ provided $\|x_{0}-x^{*}\|$ is sufficiently small. Thus for our purposes in this paper studying local convergence of NA, it suffices to study the behavior of iterates in $\hat{W}$.
Though the focus of this work is the $\dim N=1$ case, we note that Griewank’s Theorem [22, Theorem 6.1] holds for $\dim N>1$ , and numerical experiments from [10] on small-scale problems indicate that NA and $\operatorname{\gamma\text{NAA}(\hat{r})}$ are effective when $\dim N>1$ . While the current theoretical results for $\operatorname{\gamma\text{NAA}(\hat{r})}$ require $\dim N=1$ , extensions of these results for $\dim N>1$ will be studied in future work.

The main challenge in accelerating Newton’s method is ensuring the iterates remain in $\hat{W}$, which requires $\|P_{R}(x-x^{*})\|/\|P_{N}(x-x^{*})\|$ to remain bounded. In other words, the iterates cannot be accelerated “too much” along the null space. Evidently, Anderson may accelerate Newton significantly, especially in $\hat{W}$ when $\dim N=1$. This is demonstrated by the following proposition. We will use the notation $(x_{k}+w_{k+1})^{\alpha}:=x_{k}+w_{k+1}-\gamma_{k+1}(x_{k}-x_{k-1}+w_{k+1}-w_{k})$. Similarly, $P_{N}e_{k}^{\alpha}=P_{N}e_{k}-\gamma_{k+1}P_{N}(e_{k}-e_{k-1})$, $(T_{k}P_{R}e_{k})^{\alpha}:=T_{k}P_{R}e_{k}-\gamma_{k+1}(T_{k}P_{R}e_{k}-T_{k-1}P_{R}e_{k-1})$, and $w_{k+1}^{\alpha}=w_{k+1}-\gamma_{k+1}(w_{k+1}-w_{k})$. Here $T_{k}$ denotes a linear map defined in the proof of Proposition 2.1. We also note that $\hat{D}$ is nonsingular if and only if $\hat{D}(x)(\cdot):=P_{N}f^{\prime\prime}(x^{*})(P_{N}(x-x^{*}),P_{N}(\cdot))$ is nonsingular for all $x$ with $P_{N}(x-x^{*})\neq 0$.

Proposition 2.1.

Let $f\in C^{3}$ , $\dim N=1$ , $\hat{D}$ nonsingular, and $x_{k},x_{k-1}\in\hat{W}$ so that $x_{k+1}=(x_{k}+w_{k+1})^{\alpha}$ is well-defined. If $P_{N}w_{k+1}\neq P_{N}w_{k}$ and $|1-\gamma_{k+1}|\,\|P_{N}e_{k}\|\neq|\gamma_{k+1}|\,\|P_{N}e_{k-1}\|\neq 0$ , then for sufficiently small $\hat{\sigma}>0$ and $\hat{\rho}>0$ there is a constant $C=C(\hat{\sigma},\hat{\rho})$ such that

\displaystyle\|P_{N}e_{k+1}\|\leq C\max\{|1-\hat{\gamma}|,|\hat{\gamma}|\}\max\{\|e_{k}\|^{2},\|e_{k-1}\|^{2}\},

(2)

where $\hat{\gamma}:=(P_{N}w_{k+1})^{T}(P_{N}w_{k+1}-P_{N}w_{k})/\|P_{N}w_{k+1}-P_{N}w_{k}\|^{2}$.

Proof.

First note that $w_{k+1}^{\alpha}=P_{N}w_{k+1}^{\alpha}+P_{R}w_{k+1}^{\alpha}$. Since $\gamma_{k+1}:=\text{argmin}_{\gamma\in\mathbb{R}}\|w_{k+1}-\gamma(w_{k+1}-w_{k})\|$ by Algorithm 2, and $R\perp N$, we have that for any $\gamma\in\mathbb{R}$,

\displaystyle\|w_{k+1}^{\alpha}\|^{2}\leq\|P_{N}w_{k+1}-\gamma(P_{N}w_{k+1}-P_{N}w_{k})\|^{2}+\|P_{R}w_{k+1}-\gamma(P_{R}w_{k+1}-P_{R}w_{k})\|^{2}.

(3)

Taking $\gamma=\hat{\gamma}$ gives $\|P_{N}w_{k+1}-\hat{\gamma}(P_{N}w_{k+1}-P_{N}w_{k})\|^{2}=0$ . By Proposition 3.1 in [10], we can write

\displaystyle w_{k+1}^{\alpha}=-(1/2)P_{N}e_{k}^{\alpha}+\left((T_{k}-I)P_{R}e_{k}\right)^{\alpha}+q_{k-1}^{k},

(4)

where $T_{k}(\cdot):=(1/2)\hat{D}(x_{k})^{-1}f^{\prime\prime}(x_{k})(e_{k},\cdot)$ is a linear map whose range lies in $N$ [10], and $\|q_{k-1}^{k}\|\leq c\max\{|1-\gamma_{k+1}|,|\gamma_{k+1}|\}\max\{\|e_{k}\|^{2},\|e_{k-1}\|^{2}\}$ for a constant $c$ determined by $f$. Hence $\|w_{k+1}^{\alpha}\|\leq\|P_{R}w_{k+1}-\hat{\gamma}P_{R}(w_{k+1}-w_{k})\|\leq c\max\{|1-\hat{\gamma}|,|\hat{\gamma}|\}\max\{\|e_{k}\|^{2},\|e_{k-1}\|^{2}\}$. We also have from Proposition 3.1 in [10] that $P_{N}e_{k+1}=(1/2)P_{N}e_{k}^{\alpha}+(T_{k}P_{R}e_{k})^{\alpha}+q_{k-1}^{k}$. Let $\mu_{k+1}^{e}=\|(T_{k}P_{R}e_{k})^{\alpha}+q_{k-1}^{k}\|/\|(1/2)P_{N}e_{k}^{\alpha}\|$. This expansion of $P_{N}e_{k+1}$ combined with Equation (4) gives

\displaystyle\|P_{N}e_{k+1}\|\leq\left(\frac{1+\mu_{k+1}^{e}}{1-\mu_{k+1}^{e}}\right)\|P_{N}w_{k+1}^{\alpha}\|\leq C_{k}\max\{|1-\hat{\gamma}|,|\hat{\gamma}|\}\max\{\|e_{k}\|^{2},\|e_{k-1}\|^{2}\},

(5)

where we define $C_{k}:=c(1+\mu_{k+1}^{e})(1-\mu_{k+1}^{e})^{-1}$. One can show that $\mu_{k+1}^{e}\to 0$ as $\hat{\sigma}$ and $\hat{\rho}$ tend to zero [10]. Thus for sufficiently small $\hat{\sigma}$ and $\hat{\rho}$ we have $C_{k}\leq C$. This completes the proof. ∎

Note that $\hat{\gamma}$ can be very large when $P_{N}w_{k+1}\approx P_{N}w_{k}$ , which may correspond to the terminal phase of the solve. So this bound is meaningful in the asymptotic regime when there is still a significant decrease in the residual at each step. Such a bound is good for a single step, but this dramatic acceleration of $P_{N}e_{k+1}$ could place $x_{k+1}$ outside the domain of invertibility $\hat{W}$ , i.e., $x_{k+1}$ may stray too far from $N$ . This is where $\gamma$ -safeguarding is useful. It ensures that the $\gamma$ NA $(r)$ iterates remain within $\hat{W}$ by taking advantage of the way Newton steps are attracted to $N$ and scaling NA steps towards Newton steps when the conditions of Algorithm 4 are met. This is also the key to the convergence proof of $\gamma$ NA $(r)$ given in [10]. However, given the results of [47], if the problem is nonsingular one should use Newton’s method without Anderson, but it is not always obvious a priori if the problem at hand is singular or nonsingular. In the next section, we develop an adaptive version of $\gamma$ NA $(r)$ that enjoys guaranteed local convergence when applied to singular problems, but can also detect nonsingular problems automatically, at no additional computational cost, and “turn off” NA in response. This leads to local quadratic convergence if the problem is nonsingular.

3 Adaptive $\gamma$ -safeguarding

It was observed in [10] that $\gamma$ NA $(r)$ performed competitively with standard NA(1). For example, $\gamma$ NA(0.5) could outperform NA(1) when applied to certain nonsingular problems. This is not surprising given the results of [47], and it had previously been observed numerically in [45] that NA does not necessarily improve convergence when applied to nonsingular problems. Thus, by setting the $\gamma$ -safeguarding parameter $r=0.5$ , thereby scaling the iterates closer to Newton steps, we see less of the effect of a full NA step on the order of convergence.

Even if it is known that the problem is nonsingular, one may still wish to use NA to take advantage of the larger domain of convergence [45]. In such a scenario one could set $r$ close to zero, but this means the effect of NA is never completely eliminated which could lead to a smaller order of convergence relative to Newton for nonsingular problems. This, and the fact that often it is not known a priori if the problem is singular or nonsingular, motivates the development of an adaptive form of $\gamma$ NA( $r$ ) that can automatically detect nonsingular problems, and scale an NA step accordingly, without sacrificing local convergence and acceleration for singular problems. We will denote this adaptive choice of $r$ from Algorithm 4 by $r_{k+1}$ , with $k$ denoting the iteration count. There are three criteria that the choice of $r_{k+1}$ should satisfy, which we record in Criteria 3.1.

Criteria 3.1.

An adaptive $\gamma$ -safeguarding tolerance $r_{k+1}$ should satisfy the following.

1. $r_{k+1}\ll 1$ if $\|P_{N}e_{k}\|/\|P_{N}e_{k-1}\|\ll 1$;
2. $r_{k+1}\approx 1$ if $\|P_{N}e_{k}\|/\|P_{N}e_{k-1}\|\approx 1$; and
3. $\lim_{k\to\infty}r_{k+1}=0$ if $f^{\prime}(x^{*})$ is nonsingular.

The first criterion says that if $\|P_{N}e_{k}\|/\|P_{N}e_{k-1}\|$ is very small, then we want to scale the NA step generated from $x_{k}$ and $x_{k-1}$ heavily towards $x_{k}+w_{k+1}$. In the singular case, this will (locally) keep $x_{k+1}$ within the domain of invertibility. Alternatively, if the problem is nonsingular, but close to a singular problem, then scaling NA towards a Newton step is also preferred when $\|P_{N}e_{k}\|/\|P_{N}e_{k-1}\|$ is small since then we do not slow Newton’s fast local quadratic convergence. The second criterion says that if $\|P_{N}e_{k}\|/\|P_{N}e_{k-1}\|$ is close to one, then the error is not decreasing significantly, and we want to allow NA to act on $x_{k}$ and $x_{k-1}$ without significant scaling of $\gamma_{k+1}$ from safeguarding. Lastly, the third criterion is important because if $f^{\prime}(x^{*})$ is nonsingular, Newton’s method will converge quadratically in a neighborhood of $x^{*}$. We therefore want to “turn off” NA near $x^{*}$, and insisting that $r_{k+1}\to 0$ asymptotically achieves this.

Observing the first two criteria, one may note that we essentially want $r_{k+1}$ to behave like $\|P_{N}e_{k}\|/\|P_{N}e_{k-1}\|$ within the domain of convergence $\hat{W}$. Of course, we cannot compute $\|P_{N}e_{k}\|/\|P_{N}e_{k-1}\|$, but Equation (4) with $\gamma_{k+1}=0$ says that in $\hat{W}$, $w_{k+1}\approx-(1/2)P_{N}e_{k}$, so that $\|w_{k+1}\|/\|w_{k}\|\approx\|P_{N}e_{k}\|/\|P_{N}e_{k-1}\|$. So if we take $r_{k+1}=\|w_{k+1}\|/\|w_{k}\|$, we can expect the first two criteria to be enforced locally. For the third criterion, if $f^{\prime}(x^{*})$ is nonsingular, then locally we will have $\|w_{k+1}\|/\|w_{k}\|\to 0$. Thus $r_{k+1}=\|w_{k+1}\|/\|w_{k}\|$ satisfies the three criteria within the domain of convergence. With this choice of $r_{k+1}$, we have adaptive $\gamma$-safeguarding and $\operatorname{\gamma\text{NAA}(\hat{r})}$.

Algorithm 6 Adaptive $\gamma$-safeguarding

1: Given $x_{k}$, $x_{k-1}$, $w_{k+1}$, $w_{k}$, $\gamma_{k+1}$, and $\hat{r}\in(0,1)$, set $\eta_{k+1}=\|w_{k+1}\|/\|w_{k}\|$, $r_{k+1}=\min\{\eta_{k+1},\hat{r}\}$, and $\lambda^{a}=1$.
2: $\beta_{k+1}\leftarrow r_{k+1}\eta_{k+1}$
3: if $\gamma_{k+1}=0$ or $\gamma_{k+1}\geq 1$ then
4:     $\lambda^{a}\leftarrow 0$
5: else if $|\gamma_{k+1}|/|1-\gamma_{k+1}|>\beta_{k+1}$ then
6:     $\lambda^{a}\leftarrow\dfrac{\beta_{k+1}}{\gamma_{k+1}\left(\beta_{k+1}+\operatorname{sign}(\gamma_{k+1})\right)}$
7: end if
8: $x_{k+1}\leftarrow x_{k}+w_{k+1}-\lambda^{a}\gamma_{k+1}(x_{k}-x_{k-1}+w_{k+1}-w_{k})$

Algorithm 7 Adaptive $\gamma$-Safeguarded Newton-Anderson $\left(\operatorname{\gamma\text{NAA}(\hat{r})}\right)$

1: Choose $x_{0}\in\mathbb{R}^{n}$ and $\hat{r}\in(0,1)$. Set $w_{1}=-f^{\prime}(x_{0})^{-1}f(x_{0})$ and $x_{1}=x_{0}+w_{1}$.
2: for $k=1,2,\ldots$ do
3:     $w_{k+1}\leftarrow-f^{\prime}(x_{k})^{-1}f(x_{k})$
4:     $\gamma_{k+1}\leftarrow(w_{k+1}-w_{k})^{T}w_{k+1}/\|w_{k+1}-w_{k}\|_{2}^{2}$
5:     $\eta_{k+1}\leftarrow\|w_{k+1}\|/\|w_{k}\|$
6:     $r_{k+1}\leftarrow\min\{\eta_{k+1},\hat{r}\}$
7:     $\beta_{k+1}\leftarrow r_{k+1}\eta_{k+1}$
8:     $\lambda^{a}\leftarrow 1$
9:     if $\gamma_{k+1}=0$ or $\gamma_{k+1}\geq 1$ then
10:         $\lambda^{a}\leftarrow 0$
11:     else if $|\gamma_{k+1}|/|1-\gamma_{k+1}|>\beta_{k+1}$ then
12:         $\lambda^{a}\leftarrow\dfrac{\beta_{k+1}}{\gamma_{k+1}\left(\beta_{k+1}+\operatorname{sign}(\gamma_{k+1})\right)}$
13:     end if
14:     $x_{k+1}\leftarrow x_{k}+w_{k+1}-\lambda^{a}\gamma_{k+1}(x_{k}-x_{k-1}+w_{k+1}-w_{k})$
15: end for
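A compact sketch of Algorithm 7 is given below, assuming NumPy. The function name `gamma_naa`, the residual-based stopping test, the recorded history `r_hist`, and the nonsingular toy problem are assumptions introduced for illustration. The recorded values of $r_{k+1}$ let one observe the adaptive tolerance shrinking as the iteration converges to a nonsingular root.

```python
import math

import numpy as np


def gamma_naa(f, jac, x0, r_hat, tol=1e-12, max_iter=50):
    """Adaptive gamma-safeguarded Newton-Anderson, following Algorithm 7."""
    x_old = np.asarray(x0, dtype=float)
    w_old = np.linalg.solve(jac(x_old), -f(x_old))
    x = x_old + w_old
    r_hist = []  # adaptive tolerances r_{k+1}, recorded for inspection
    for k in range(1, max_iter):
        if np.linalg.norm(f(x)) < tol:
            return x, k, r_hist
        w = np.linalg.solve(jac(x), -f(x))
        d = w - w_old
        gamma = (d @ w) / (d @ d)
        eta = np.linalg.norm(w) / np.linalg.norm(w_old)
        r = min(eta, r_hat)
        beta = r * eta
        lam = 1.0
        if gamma == 0 or gamma >= 1:
            lam = 0.0
        elif abs(gamma) / abs(1 - gamma) > beta:
            lam = beta / (gamma * (beta + math.copysign(1.0, gamma)))
        x_new = x + w - lam * gamma * (x - x_old + d)
        r_hist.append(r)
        x_old, w_old, x = x, w, x_new
    return x, max_iter, r_hist


def f(x):
    # Nonsingular toy problem (assumed): root at 0 with f'(0) = I.
    return np.array([x[0], x[1] + x[1] ** 2])


def jac(x):
    return np.array([[1.0, 0.0], [0.0, 1.0 + 2.0 * x[1]]])


x, iters, r_hist = gamma_naa(f, jac, [1.0, 1.0], r_hat=0.9)
print("solves:", iters)
print("adaptive tolerances:", ["%.2e" % r for r in r_hist])
```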

Adaptive $\gamma$-safeguarding differs from Algorithm 4 only in line 2. Writing $\eta_{k+1}=\|w_{k+1}\|/\|w_{k}\|$, we have $\beta_{k+1}=r\eta_{k+1}$ in Algorithm 4, whereas $\beta_{k+1}=r_{k+1}\eta_{k+1}$ in Algorithm 6 with $r_{k+1}=\min\{\eta_{k+1},\hat{r}\}$. This one change can have a significant impact on convergence as demonstrated in Section 4, and can enable locally quadratic convergence when applied to nonsingular problems (see Corollary 3.1). Similar to $\gamma$NA$(r)$, one implements $\operatorname{\gamma\text{NAA}(\hat{r})}$ by replacing line 5 in Algorithm 2 with Algorithm 6 and setting $\hat{r}$ at the start of the solve. The result is Algorithm 7. The choice of $\hat{r}$ here sets the weakest safeguarding the user wants to impose. Thus we are always safeguarding at least as strictly as standard $\gamma$-safeguarding with $r=\hat{r}$. Stated concisely, we have $r_{k+1}\leq\hat{r}$. Local convergence of $\operatorname{\gamma\text{NAA}(\hat{r})}$ then follows from Theorem 6.1 in [10]. To state this precisely, let $\lambda_{k+1}$ be the value of $\lambda^{a}$ computed by Algorithm 6 at step $k$, $\theta_{k+1}^{\lambda}=\|w_{k+1}-\lambda_{k+1}\gamma_{k+1}(w_{k+1}-w_{k})\|/\|w_{k+1}\|$, $x_{k+1}^{NA}:=x_{k}+w_{k+1}-\lambda_{k+1}\gamma_{k+1}(x_{k}-x_{k-1}+w_{k+1}-w_{k})$, and $\sigma_{k}:=\|P_{R}e_{k}\|/\|P_{N}e_{k}\|$. Then we have the following theorem.

Theorem 3.1.

Let $\dim N=1$, and let $\hat{D}$ be invertible as a map on $N$. Let $W_{k}:=B_{\|e_{k}\|}(x^{*})\cap\{x:\|P_{R}(x-x^{*})\|<\sigma_{k}\|P_{N}(x-x^{*})\|\}$. If $x_{0}$ is chosen so that $\sigma_{0}<\hat{\sigma}$ and $\|e_{0}\|<\hat{\rho}$, for sufficiently small $\hat{\sigma}$ and $\hat{\rho}$, $x_{1}=x_{0}+w_{1}$, and $x_{k+1}=x_{k+1}^{NA}$ for $k\geq 1$, then $W_{k+1}\subset W_{0}$ for all $k\geq 0$ and $x_{k}\to x^{*}$. That is, $\{x_{k}\}$ remains well-defined and converges to $x^{*}$. Furthermore, there exist constants $C>0$ and $\kappa\in(1/2,1)$ such that

\displaystyle\|P_{R}e_{k+1}\|\leq C\max\{|1-\lambda_{k+1}\gamma_{k+1}|\,\|e_{k}\|^{2},|\lambda_{k+1}\gamma_{k+1}|\,\|e_{k-1}\|^{2}\}

(6)

\displaystyle\|P_{N}e_{k+1}\|\leq\kappa\theta_{k+1}^{\lambda}\|P_{N}e_{k}\|

(7)

for all $k\geq 1$ .

Under the assumptions of Theorem 3.1, Griewank’s Theorem [22, Theorem 6.1] says that Newton’s method almost surely leads into the domain of convergence $\hat{W}$ provided $x_{0}$ is sufficiently close to $x^{*}$ . This effectively means that if the sequence $x_{k}$ generated by $\operatorname{\gamma\text{NAA}(\hat{r})}$ approaches $x^{*}$ we will have almost sure convergence eventually, and this convergence will be faster than Newton. The precise improvement is determined by the asymptotic behavior of $\theta_{k+1}^{\lambda}$ . Globalization techniques such as linesearch methods may be used to bring the $\operatorname{\gamma\text{NAA}(\hat{r})}$ iterates closer to $x^{*}$ . In particular, $\operatorname{\gamma\text{NAA}(\hat{r})}$ with an Armijo linesearch was shown to be effective in [10].

Our choice of $r_{k+1}$ in Algorithm 6 is partially motivated by the third criterion in Criteria 3.1. That is, in the case of a nonsingular problem, we prefer to use Newton asymptotically rather than NA. Hence we want $r_{k+1}$ to tend to zero as our solver converges, thereby scaling the $\operatorname{\gamma\text{NAA}(\hat{r})}$ iterates heavily towards pure Newton steps in the asymptotic regime and enjoying quadratic convergence locally. The remainder of this section is dedicated to quantifying how close a $\operatorname{\gamma\text{NAA}(\hat{r})}$ iterate is to a standard Newton iterate in the asymptotic regime when $f^{\prime}(x^{*})$ is nonsingular. The main result is Theorem 3.2 below which bounds $\|x_{k+1}^{\operatorname{NA}}-x_{k+1}^{\operatorname{Newt}}\|$ locally, where $x_{k+1}^{\operatorname{Newt}}:=x_{k}+w_{k+1}$, and $x_{k+1}^{\operatorname{NA}}:=x_{k+1}^{\operatorname{Newt}}-\lambda_{k+1}\gamma_{k+1}(x_{k+1}^{\operatorname{Newt}}-x_{k}^{\operatorname{Newt}})$. This notation is introduced to emphasize the Newton and Newton-Anderson iterates in the comparison. We will also define $e_{k}^{\operatorname{Newt}}:=x_{k}^{\operatorname{Newt}}-x^{*}=x_{k-1}+w_{k}-x^{*}$. The following lemma will be used in the proof of Theorem 3.2. Lemma 3.1 bounds $|\lambda_{k+1}\gamma_{k+1}|$, the scaled $\gamma_{k+1}$ returned by $\operatorname{\gamma\text{NAA}(\hat{r})}$ at iteration $k$, in terms of $\eta_{k+1}$ and $r_{k+1}=\min\{\eta_{k+1},\hat{r}\}$. The proof consists of walking through the cases in Algorithm 6, and is therefore left to the interested reader.

Lemma 3.1.

Let $\eta_{k+1}=\|w_{k+1}\|/\|w_{k}\|$ and $\hat{r}\in(0,1)$ . Define $r_{k+1}:=\min\{\eta_{k+1},\hat{r}\}$ and $\beta_{k+1}:=r_{k+1}\eta_{k+1}$ as in Algorithm 6. Let $\lambda_{k+1}$ be the value computed by Algorithm 6 at iteration $k$ . If $\eta_{k+1}<1$ , then $\lambda_{k+1}\gamma_{k+1}$ returned by Algorithm 6 satisfies

\displaystyle|\lambda_{k+1}\gamma_{k+1}|\leq\frac{\beta_{k+1}}{1-\beta_{k+1}}

(8)

when $\lambda_{k+1}=1$ , and

\displaystyle|\lambda_{k+1}\gamma_{k+1}|=\frac{\beta_{k+1}}{1+\operatorname{sign}(\gamma_{k+1})\beta_{k+1}}

(9)

when $\lambda_{k+1}<1$ .
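The quantities in Lemma 3.1 are cheap to evaluate from two consecutive update norms. The following is a minimal sketch, not the paper's Algorithm 6: it only evaluates the right-hand sides of (8) and (9). The function name `lemma_3_1_bounds` and its argument names are hypothetical, and `gamma_sign` stands in for $\operatorname{sign}(\gamma_{k+1})$ .

```python
def lemma_3_1_bounds(w_new_norm, w_old_norm, r_hat, gamma_sign=1):
    """Evaluate eta, r, beta and the right-hand sides of (8) and (9).

    w_new_norm, w_old_norm: ||w_{k+1}|| and ||w_k|| (hypothetical argument names).
    r_hat: the user parameter, assumed to lie in (0, 1).
    gamma_sign: +1 or -1, standing in for sign(gamma_{k+1}).
    """
    if not 0.0 < r_hat < 1.0:
        raise ValueError("this sketch assumes r_hat in (0, 1)")
    eta = w_new_norm / w_old_norm
    if not eta < 1.0:
        raise ValueError("Lemma 3.1 assumes eta_{k+1} < 1")
    r = min(eta, r_hat)           # adaptive parameter r_{k+1}
    beta = r * eta                # beta_{k+1} = r_{k+1} * eta_{k+1}
    rhs_8 = beta / (1.0 - beta)                 # bound when lambda_{k+1} = 1
    rhs_9 = beta / (1.0 + gamma_sign * beta)    # value when lambda_{k+1} < 1
    return eta, r, beta, rhs_8, rhs_9


if __name__ == "__main__":
    # As eta shrinks, both right-hand sides shrink like eta^2.
    for eta in (0.5, 0.1, 0.01):
        print(eta, lemma_3_1_bounds(eta, 1.0, r_hat=0.9))
```

Because $\beta_{k+1}=\eta_{k+1}^{2}$ once $\eta_{k+1}<\hat{r}$ , both right-hand sides decay quadratically in $\eta_{k+1}$ , which is the mechanism behind the scaling towards Newton described above.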

With Lemma 3.1, we can bound $\|x_{k+1}^{\operatorname{NA}}-x_{k+1}^{\operatorname{Newt}}\|$ in terms of $\|e_{k}\|$ , $\|e_{k-1}\|$ , and $\eta_{k+1}$ .

Theorem 3.2.

If $f^{\prime}(x^{*})$ is nonsingular, then there exists a $\rho>0$ and a constant $C$ depending only on $f$ such that for $x_{k}$ and $x_{k-1}$ in $B_{\rho}(x^{*})$ and $\eta_{k+1}<1$ ,

\displaystyle\|x_{k+1}^{\operatorname{NA}}-x_{k+1}^{\operatorname{Newt}}\|\leq C\left(\frac{\beta_{k+1}}{1-\beta_{k+1}}\right)\max\{\|e_{k}\|^{2},\|e_{k-1}\|^{2}\}

(10)

when $\lambda_{k+1}=1$ , and

\displaystyle\|x_{k+1}^{\operatorname{NA}}-x_{k+1}^{\operatorname{Newt}}\|\leq C\left(\frac{\beta_{k+1}}{1+\operatorname{sign}(\gamma_{k+1})\beta_{k+1}}\right)\max\{\|e_{k}\|^{2},\|e_{k-1}\|^{2}\}

(11)

when $\lambda_{k+1}<1$ .

Proof.

Using our notation from the discussion preceding Lemma 3.1, an iterate generated by $\operatorname{\gamma\text{NAA}(\hat{r})}$ takes the form $x_{k+1}^{\operatorname{NA}}=x_{k+1}^{\operatorname{Newt}}-\lambda_{k+1}\gamma_{k+1}\left(x_{k+1}^{\operatorname{Newt}}-x_{k}^{\operatorname{Newt}}\right)$ . Therefore $\|x_{k+1}^{\operatorname{NA}}-x_{k+1}^{\operatorname{Newt}}\|=|\lambda_{k+1}\gamma_{k+1}|\,\|x_{k+1}^{\operatorname{Newt}}-x_{k}^{\operatorname{Newt}}\|$ . Since $f^{\prime}(x^{*})$ is nonsingular, we can take $\rho$ sufficiently small to ensure that $\|e_{j+1}^{\operatorname{Newt}}\|\leq C\|e_{j}\|^{2}$ , where $C$ is a constant determined by $f$ , when $\|e_{j}\|<\rho$ for $j=k,k-1$ . Hence, upon adding and subtracting $x^{*}$ we obtain $\|x_{k+1}^{\operatorname{Newt}}-x_{k}^{\operatorname{Newt}}\|=\|e_{k+1}^{\operatorname{Newt}}-e_{k}^{\operatorname{Newt}}\|\leq 2C\max\{\|e_{k}\|^{2},\|e_{k-1}\|^{2}\}$ . To complete the proof, we relabel $2C$ as $C$ and apply Lemma 3.1 to bound $|\lambda_{k+1}\gamma_{k+1}|$ for the cases $\lambda_{k+1}=1$ and $\lambda_{k+1}<1$ . ∎

We conclude this section with Corollary 3.1, which proves that $\operatorname{\gamma\text{NAA}(\hat{r})}$ can recover local quadratic convergence from NA when applied to nonsingular problems.

Corollary 3.1.

If $f^{\prime}(x^{*})$ is nonsingular, $\|e_{k}\|<\|e_{k-1}\|$ , and $\eta_{k+1}<\hat{r}$ , then there exists a $\rho>0$ and constants $C_{1}$ and $C_{2}$ depending only on $f$ such that

\displaystyle\|e_{k+1}\|\leq\left(\frac{C_{1}}{1-\hat{r}^{2}}+C_{2}\right)\|e_{k}\|^{2}

(12)

for $x_{k},x_{k-1}\in B_{\rho}(x^{*})$ .

Proof.

Adding and subtracting $e_{k+1}^{\operatorname{Newt}}$ to $e_{k+1}$ gives $\|e_{k+1}\|\leq\|x_{k+1}^{\operatorname{NA}}-x_{k+1}^{\operatorname{Newt}}\|+\|e_{k+1}^{\operatorname{Newt}}\|.$ By Theorem 3.2, $\|x_{k+1}^{\operatorname{NA}}-x_{k+1}^{\operatorname{Newt}}\|\leq C\beta_{k+1}(1-\beta_{k+1})^{-1}\max\left\{\|e_{k}\|^{2},\|e_{k-1}\|^{2}\right\}$ , and for $x_{k}\in B_{\rho}(x^{*})$ , $\|e_{k+1}^{\operatorname{Newt}}\|\leq C_{2}\|e_{k}\|^{2}$ . Since $\eta_{k+1}<\hat{r}$ , we have $\beta_{k+1}=\eta_{k+1}^{2}$ . Moreover, when $f^{\prime}(x^{*})$ is nonsingular, Taylor expansion shows that

\displaystyle\eta_{k+1}\leq C_{1}\frac{\|e_{k}\|}{\|e_{k-1}\|}

(13)

for $x_{k},x_{k-1}\in B_{\rho}(x^{*})$ . Thus $\beta_{k+1}\max\left\{\|e_{k}\|^{2},\|e_{k-1}\|^{2}\right\}\leq C_{1}\|e_{k}\|^{2}$ , and therefore

\displaystyle\|e_{k+1}\|\leq\left(\frac{C_{1}}{1-\hat{r}^{2}}+C_{2}\right)\|e_{k}\|^{2}.

(14)

∎
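The passage from (13) to (14) compresses several inequalities. A hedged expansion under the hypotheses of Corollary 3.1 is given below. The intermediate constant $\tilde{C}$ is our own label, not the paper's; the proof absorbs it into $C_{1}$ .

```latex
\begin{aligned}
\beta_{k+1} &= \eta_{k+1}^{2}
  \le C_{1}^{2}\,\frac{\|e_{k}\|^{2}}{\|e_{k-1}\|^{2}}
  && \text{by (13), using } \eta_{k+1}<\hat{r}, \\
\max\{\|e_{k}\|^{2},\|e_{k-1}\|^{2}\} &= \|e_{k-1}\|^{2}
  && \text{since } \|e_{k}\|<\|e_{k-1}\|, \\
\beta_{k+1}\max\{\|e_{k}\|^{2},\|e_{k-1}\|^{2}\} &\le C_{1}^{2}\,\|e_{k}\|^{2}, \\
\frac{1}{1-\beta_{k+1}} &< \frac{1}{1-\hat{r}^{2}}
  && \text{since } \beta_{k+1}=\eta_{k+1}^{2}<\hat{r}^{2}<1, \\
\|x_{k+1}^{\operatorname{NA}}-x_{k+1}^{\operatorname{Newt}}\|
  &\le \frac{\tilde{C}}{1-\hat{r}^{2}}\,\|e_{k}\|^{2},
  && \tilde{C}:=C\,C_{1}^{2}.
\end{aligned}
```

The last line uses (10) directly when $\lambda_{k+1}=1$ ; when $\lambda_{k+1}<1$ it follows from (11) together with $\beta_{k+1}/(1+\operatorname{sign}(\gamma_{k+1})\beta_{k+1})\leq\beta_{k+1}/(1-\beta_{k+1})$ . Relabeling $\tilde{C}$ as $C_{1}$ and adding $\|e_{k+1}^{\operatorname{Newt}}\|\leq C_{2}\|e_{k}\|^{2}$ recovers (14).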

4 Numerics

In this section, we demonstrate the effectiveness of NA and $\operatorname{\gamma\text{NAA}(\hat{r})}$ near bifurcation points by applying these algorithms to the following parameter-dependent PDEs. All computations are performed on an M1 MacBook with GNU Octave 8.2.0.

4.1 Test Problems

Navier-Stokes Flow in a Channel

{\small\left\{\begin{split}-\mu\Delta\operatorname{\mathbf{u}}+\operatorname{\mathbf{u}}\cdot\nabla\operatorname{\mathbf{u}}+\nabla p&=\mathbf{0}\\ \nabla\cdot\operatorname{\mathbf{u}}&=0\end{split}\qquad\qquad\begin{split}\mathbf{u}&=\mathbf{u}_{\text{in}},\hskip 4.5pt\Gamma_{\text{in}},\\ \mathbf{u}&=0,\hskip 4.5pt\Gamma_{\text{wall}},\\ -p\mathbf{n}+(\mu\nabla\mathbf{u})\mathbf{n}&=0,\hskip 4.5pt\Gamma_{\text{out}}.\end{split}\right.}

(15)

Rayleigh-Bénard Convection

{\small\left\{\begin{split}-\mu\Delta\operatorname{\mathbf{u}}+\operatorname{\mathbf{u}}\cdot\nabla\operatorname{\mathbf{u}}+\nabla p-\text{Ri}\,T\mathbf{e}_{y}&=\mathbf{0}\\ \nabla\cdot\operatorname{\mathbf{u}}&=0\\ -\kappa\Delta T+\operatorname{\mathbf{u}}\cdot\nabla T&=0\\ \end{split}\qquad\qquad\begin{split}T&=1,\hskip 2.25pt\Gamma_{1}:=\{1\}\times(0,1),\\ T&=0,\hskip 2.25pt\Gamma_{2}:=\{0\}\times(0,1),\\ \nabla T\cdot\operatorname{\mathbf{n}}&=0,\hskip 2.25pt\Gamma_{3}:=(0,1)\times\{0,1\},\\ \operatorname{\mathbf{u}}&=\mathbf{0},\hskip 2.25pt\partial\Omega=\Gamma_{1}\cup\Gamma_{2}\cup\Gamma_{3}.\end{split}\right.}

(16)

In both models, $\mathbf{u}$ denotes the fluid velocity, $p$ the pressure, and $\mathbf{n}$ the outward normal. In Model (15), $\mu$ denotes the viscosity parameter. In Model (16), $T$ denotes the temperature of the fluid, and $\operatorname{Ri}$ denotes the Richardson number, which is the parameter of interest for Model (16). For Model (16), we set $\mu=\kappa=10^{-2}$ . These parameter values are chosen so as to replicate the results seen in [20] as $\operatorname{Ri}$ ranges from $3.0$ to $3.5$ . For the flow in a channel, Model (15), we use $\mathcal{P}_{2}-\mathcal{P}_{1}$ Taylor-Hood elements [8, p. 164]. The channel, shown in Figure 1, is arranged such that the leftmost boundary lies at $x=0$ , the rightmost boundary lies at $x=50$ , and the boundary components are given by $\Gamma_{\text{in}}=\{0\}\times[2.5,5]$ , $\Gamma_{\text{out}}=\{50\}\times[0,7.5]$ , and $\Gamma_{\text{wall}}=[0,10]\times(\{2.5\}\cup\{5\})\cup\{10\}\times([0,2.5]\cup[5,7.5])\cup[10,40]\times(\{0\}\cup\{7.5\})$ . For the Rayleigh-Bénard model, we use $\mathcal{P}_{2}-\mathcal{P}_{1}^{\text{disc}}$ Scott-Vogelius elements [24], where $\mathcal{P}_{1}^{\text{disc}}$ denotes piecewise linear discontinuous elements. The $\mathcal{P}_{2}-\mathcal{P}_{1}^{\text{disc}}$ elements are stable on Alfeld-split, also known as barycenter-split, triangulations [46, p. 77]. The meshes used for each model are shown below.

Figure 1: Meshes used for benchmark problems. Top: mesh used for Rayleigh-Bénard model. Bottom: mesh used for flow in a channel.

For Model (15), it is known [39] that there exists a critical viscosity $\mu^{*}\in(0.9,1)$ at which a bifurcation occurs. For $\mu>\mu^{*}$ , the stable velocity solution is symmetric about the center horizontal ( $y=3.75$ ) as seen in the top plot of Figure 2. For $\mu<\mu^{*}$ , there is still a symmetric solution, but it is unstable. Stability is inherited by two asymmetric solutions seen in the bottom two plots of Figure 2.

The parameter region of interest for Model (16) is $\operatorname{Ri}\in[3,3.5]$ . In this range, the flow appears to be in transition from a single eddy in the center of the domain to two eddies as seen in Figure 3 below.

For Model (15), we take the zero vector as our initial iterate. For Model (16), we initialize our iterate by applying a Picard step and then applying $\operatorname{\gamma\text{NAA}(\hat{r})}$ . The $H^{1}$ seminorm is used in our implementations [8]. Before we discuss the numerical results, we recall that

• Newt is Algorithm 1.
• NA is Algorithm 2.
• NA( $m$ ) is Algorithm 3.
• $\gamma$ NA $(r)$ , Algorithm 5, is Algorithm 2 with line 5 replaced by Algorithm 4.
• $\operatorname{\gamma\text{NAA}(\hat{r})}$ , Algorithm 7, is Algorithm 2 with line 5 replaced by Algorithm 6.

For $\gamma$ NA $(r)$ and $\operatorname{\gamma\text{NAA}(\hat{r})}$ , we set $r$ and $\hat{r}$ respectively to a fixed quantity for all iterations.

4.2 General Discussion of Results

The following experiments demonstrate three strategies for solving nonlinear problems near bifurcations using NA and $\operatorname{\gamma\text{NAA}(\hat{r})}$ . The first two are asymptotic safeguarding and preasymptotic safeguarding. With asymptotic safeguarding, we run NA until the residual is smaller than some user-chosen threshold, and we use $\operatorname{\gamma\text{NAA}(\hat{r})}$ for all subsequent iterates. Hence the solve will behave like NA until the last few iterations when $\operatorname{\gamma\text{NAA}(\hat{r})}$ is applied. For the experiments in Section 4.3, we activated $\operatorname{\gamma\text{NAA}(\hat{r})}$ when $\|w_{k+1}\|<10^{-1}$ . We chose $10^{-1}$ since we want to activate $\gamma$ -safeguarding as early as possible. This allows for earlier detection of a nonsingular problem, and thus faster convergence. We found that activating $\gamma$ -safeguarding for $\|w_{k+1}\|<\tau<10^{-1}$ will not necessarily break convergence, but if the problem is nonsingular, this will not be detected as early. Activating $\operatorname{\gamma\text{NAA}(\hat{r})}$ when $\|w_{k+1}\|<1$ can break convergence, though this seems to be problem-dependent. Convergence for Model (15) was virtually unaffected with threshold $\|w_{k+1}\|<1$ , but for Model (16) with $\operatorname{Ri}=3.5$ , setting the activation threshold to $1$ caused $\gamma\text{NAA}(0.9)$ to diverge when it had converged with threshold $0.1$ . With activation threshold $0.1$ , asymptotic safeguarding is shown to be effective close to the bifurcation point, where Newton’s method can fail to converge. Preasymptotic safeguarding, on the other hand, applies $\operatorname{\gamma\text{NAA}(\hat{r})}$ at each step of the solve. Interestingly, we observe that $\operatorname{\gamma\text{NAA}(\hat{r})}$ applied preasymptotically can outperform NA when applied to Model (16), and even recover convergence when both Newton and $\operatorname{NA}$ diverge (see Figure 11). 
A theoretical explanation for this requires a better understanding of these methods in the preasymptotic regime. In particular, a better understanding of the descent properties of Anderson acceleration would be of great value. It is known [52] that for singular problems in the preasymptotic regime, where $\|f(x)\|$ is not small, the Newton update step $s_{k}=-f^{\prime}(x_{k})^{-1}f(x_{k})$ , or $w_{k+1}$ in our notation, can be large and nearly orthogonal to the gradient of $\|f\|_{2}^{2}$ . With $\operatorname{NA}$ , our update step takes the form $s_{k}^{\operatorname{NA}}=w_{k+1}-\gamma_{k+1}(x_{k+1}^{\operatorname{Newt}}-x_{k}^{\operatorname{Newt}})$ . It is clear that $s_{k}^{\operatorname{NA}}$ is a descent direction for sufficiently small $\gamma_{k+1}$ , since in this case it is nearly $w_{k+1}$ . It is possible that for certain values of $\gamma_{k+1}$ , $s_{k}^{\operatorname{NA}}$ is a stronger descent direction than $s_{k}$ .
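One way to probe the descent question numerically is to measure the cosine of the angle between a candidate step and the gradient of $\|f\|_{2}^{2}$ , which equals $2f^{\prime}(x)^{T}f(x)$ . The sketch below does this for a small two-dimensional system of our own choosing whose Jacobian is singular at the root $(0,0)$ . The system and every function name are hypothetical illustrations and are not taken from the paper's experiments.

```python
import math


def f(x):
    # Hypothetical toy system; its Jacobian is singular at the root (0, 0).
    return [x[0] ** 2 - x[1], x[0] * x[1]]


def jacobian(x):
    return [[2.0 * x[0], -1.0], [x[1], x[0]]]


def solve_2x2(A, b):
    # Cramer's rule; adequate for a 2x2 illustration only.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


def newton_step(x):
    # Newton update w = -J(x)^{-1} f(x).
    fx = f(x)
    return solve_2x2(jacobian(x), [-fx[0], -fx[1]])


def grad_sq_norm(x):
    # Gradient of ||f||_2^2, namely 2 J^T f.
    fx, J = f(x), jacobian(x)
    return [2.0 * (J[0][i] * fx[0] + J[1][i] * fx[1]) for i in range(2)]


def descent_cosine(x, s):
    # Negative value: s is a descent direction for ||f||_2^2 at x.
    # Value near zero: s is nearly orthogonal to the gradient.
    g = grad_sq_norm(x)
    return (g[0] * s[0] + g[1] * s[1]) / (math.hypot(*g) * math.hypot(*s))


if __name__ == "__main__":
    x = [0.1, 0.2]
    print(descent_cosine(x, newton_step(x)))
```

For the exact Newton step the inner product with the gradient is always $-2\|f(x)\|_{2}^{2}$ , so the cosine is negative; how close it is to zero depends on the sizes of the step and the gradient. The same `descent_cosine` function can be applied to any other candidate step, such as an NA update.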

Another possible explanation as to why $\operatorname{\gamma\text{NAA}(\hat{r})}$ can outperform $\operatorname{NA}$ in some cases is its resemblance to restarted Anderson acceleration methods. Restarted versions of Anderson acceleration are often applied in various forms for depth $m>1$ , and have been shown to be effective in practice [9, 25, 26, 36]. In the special case of Newton-Anderson with depth $m=1$ , every odd iterate is simply a Newton step, rather than a combination of the previous two Newton steps. Hence the algorithm is “restarted” every other step. Explicitly, we have for $k\geq 1$ ,

\displaystyle x_{2k-1}=x_{2k-1}^{\operatorname{Newt}},

\displaystyle x_{2k}=x_{2k}^{\operatorname{Newt}}-\gamma_{2k}(x_{2k}^{\operatorname{Newt}}-x_{2k-1}^{\operatorname{Newt}}).

In other words, $\gamma_{2k-1}=0$ for all $k\geq 1$ . With $\operatorname{\gamma\text{NAA}(\hat{r})}$ , $\gamma_{k+1}$ is not necessarily set to zero, but it is scaled towards zero, significantly so depending on $\beta_{k+1}$ . In this way, one may think of $\operatorname{\gamma\text{NAA}(\hat{r})}$ as a quasi-restarted Anderson scheme when the depth $m=1$ . At the moment, $\gamma$ -safeguarding is not developed for $m>1$ , but this interpretation of $\operatorname{\gamma\text{NAA}(\hat{r})}$ as a quasi-restarted method could lead to such a development. Presently, these are only heuristics, but they provide interesting questions for future projects.
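The restarted recurrence above can be sketched for a scalar problem as follows. This is an illustration under stated assumptions, not the paper's implementation: the test function $f(x)=x^{2}(1+x)$ , which has a singular (double) root at $0$ , is our own choice, and the coefficient $\gamma=w_{k+1}/(w_{k+1}-w_{k})$ is the scalar form of the standard depth-one Anderson coefficient $\langle w_{k+1},w_{k+1}-w_{k}\rangle/\|w_{k+1}-w_{k}\|^{2}$ , which we assume matches Algorithm 2.

```python
def f(x):
    # Hypothetical test function with a double (singular) root at x = 0.
    return x * x * (1.0 + x)


def fprime(x):
    return 2.0 * x + 3.0 * x * x


def restarted_na_depth_one(x0, tol=1e-12, max_iter=50):
    """Alternate Newton steps (odd k) with depth-one Anderson steps (even k)."""
    xs = [x0]
    x, w_old, x_newt_old = x0, None, None
    for k in range(1, max_iter + 1):
        if f(x) == 0.0:
            break                       # exact root reached
        w = -f(x) / fprime(x)           # Newton update w_k computed at x_{k-1}
        x_newt = x + w                  # x_k^Newt
        x_next = x_newt                 # odd k: gamma_k = 0, a pure Newton step
        if k % 2 == 0 and w != w_old:
            # Assumed depth-one Anderson coefficient (scalar form).
            gamma = w / (w - w_old)
            x_next = x_newt - gamma * (x_newt - x_newt_old)
        xs.append(x_next)
        x, w_old, x_newt_old = x_next, w, x_newt
        if abs(w) < tol:
            break
    return xs


if __name__ == "__main__":
    for k, xk in enumerate(restarted_na_depth_one(1.0)):
        print(k, xk)
```

In the scalar case the even steps reduce to a secant step on the Newton update $w$ , so this toy run mainly shows the alternating structure; it says nothing about the PDE problems considered in Section 4.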

The third technique we demonstrate to solve these problems near bifurcation points is increasing the depth $m$ . Evidently, the right choice of $m$ can significantly improve convergence, reducing the number of iterations by about half, and can enlarge the domain of convergence with respect to the parameter. We found, however, that such performance was very sensitive to the choice of $m$ . So while the results suggest this could be developed into a viable strategy, more work is required to achieve this.

In all experiments we take $\hat{r}\in(0,1)$ since this is the range in which local convergence of $\operatorname{\gamma\text{NAA}(\hat{r})}$ is guaranteed by Theorem 3.1. In practice, neither Algorithm 5 nor Algorithm 7 breaks down if one sets $r$ or $\hat{r}$ respectively to zero, one, or a value greater than one. Setting $r$ or $\hat{r}$ to zero reduces the iteration to Newton, and choosing one leads to a more NA-like iteration in the preasymptotic regime. A systematic study of $\operatorname{\gamma\text{NAA}(\hat{r})}$ with $\hat{r}\geq 1$ has not been performed, but experiments performed thus far show no significant advantage over $\hat{r}\in(0,1)$ .

The best choice of $\hat{r}$ remains an open question. Numerical experiments suggest the best choice depends on the initial guess $x_{0}$ . For example, when applied to the channel flow problem preasymptotically (see Section 4.4), setting $\hat{r}=0.9$ results in faster convergence than Newton if $x_{0}$ is the zero vector. If we perturb this $x_{0}$ (discussed in Section 4.4.1), then the choice of $\hat{r}=0.9$ leads to $\operatorname{\gamma\text{NAA}(\hat{r})}$ converging slower than Newton, while $\hat{r}=0.5$ outperforms NA and Newton. This phenomenon, that the best choice of $\hat{r}$ in $\operatorname{\gamma\text{NAA}(\hat{r})}$ varies with $x_{0}$ , is seen with other choices of $x_{0}$ as well. Elucidating this dependence is the subject of ongoing work.

4.3 Asymptotic Safeguarding

Under the assumptions of Theorem 3.1 or Theorem 3.2, $\operatorname{\gamma\text{NAA}(\hat{r})}$ is guaranteed to converge locally. This motivates the strategy of this subsection. As discussed above, asymptotic safeguarding applies $\operatorname{\gamma\text{NAA}(\hat{r})}$ only once the residual is smaller than a set threshold. This allows one to take full advantage of NA in the preasymptotic regime, and ensures fast quadratic convergence for nonsingular problems. In practice, this means fast local convergence provided NA reaches the domain of convergence. We activate $\operatorname{\gamma\text{NAA}(\hat{r})}$ when $\|w_{k+1}\|<10^{-1}$ in the experiments below. Until then, we run NA.
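The activation rule can be sketched as a simple latch on the update norm. The function name `safeguarding_schedule` and the string labels are hypothetical, and the latching behavior (once activated, safeguarding stays on) is our reading of the phrase "for all subsequent iterates" in Section 4.2, not something verified against the authors' code.

```python
def safeguarding_schedule(w_norms, threshold=1e-1):
    """Label each iteration as plain NA or adaptive gamma-safeguarded NA.

    w_norms: sequence of update norms ||w_{k+1}|| (hypothetical input).
    threshold: activation threshold; 1e-1 is the value used in Section 4.3.
    """
    active = False
    labels = []
    for w_norm in w_norms:
        if w_norm < threshold:
            active = True          # assumed latch: never switch back to NA
        labels.append("gammaNAA" if active else "NA")
    return labels


if __name__ == "__main__":
    print(safeguarding_schedule([5.0, 0.8, 0.09, 0.2, 1e-3]))
```

With this reading, a later increase of $\|w_{k+1}\|$ above the threshold does not return the solver to plain NA; a non-latching variant would simply test the threshold at every step.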

4.3.1 Results for Channel Flow Model

The results when applied to Model (15) are shown below in Figures 4, 5, and 6. The takeaway is that when applied asymptotically, convergence of $\operatorname{\gamma\text{NAA}(\hat{r})}$ is not as sensitive to the choice of $\hat{r}$ as it is when applied in the preasymptotic regime (see Section 4.4), and Algorithm 6 is working as intended by detecting that the problem is nonsingular. This is seen in the plot on the right of Figures 4, 5, and 6. Recall that $r_{k+1}$ is the adaptive parameter in Algorithm 6 that determines how close a $\operatorname{\gamma\text{NAA}(\hat{r})}$ step is to a Newton step. When $r_{k+1}\approx 0$ , and the criterion in Algorithm 6 is met, the $\operatorname{\gamma\text{NAA}(\hat{r})}$ step $x_{k+1}$ will be close to a Newton step. When $r_{k+1}\approx 1$ , this scaling will be much less severe, and the $\operatorname{\gamma\text{NAA}(\hat{r})}$ step $x_{k+1}$ will be close to an NA step. From our discussion in Section 3, we want $r_{k+1}\to 0$ when the problem is nonsingular in order to enjoy local quadratic convergence. This is not guaranteed with NA [47]. Observing the $r_{k+1}$ plots in Figures 4, 5, and 6, one notes that $r_{k+1}\to 0$ as the solver converges. Since $r_{k+1}=\min\{\eta_{k+1},\hat{r}\}$ , this is equivalent to $\eta_{k+1}\to 0$ as the solve converges. Thus by Theorem 3.2, the $\operatorname{\gamma\text{NAA}(\hat{r})}$ iterates converge to Newton iterates asymptotically. This is precisely what $\operatorname{\gamma\text{NAA}(\hat{r})}$ was designed to do: detect nonsingular problems, and respond by scaling the iterates towards Newton asymptotically. Since $\operatorname{\gamma\text{NAA}(\hat{r})}$ is only activated when $\|w_{k+1}\|<10^{-1}$ in these examples of asymptotic safeguarding, $r_{k+1}$ is only computed, and plotted, for the last two iterations. In the next section on preasymptotic safeguarding, a more interesting $r_{k+1}$ history is seen.
Our methods, including $\operatorname{NA}$ , failed to converge for $\mu<0.92$ . If one wishes to solve a problem at a particular parameter, and a direct solve fails as we see here for $\mu<0.92$ , one could still employ $\operatorname{NA}$ or $\operatorname{\gamma\text{NAA}(\hat{r})}$ to solve the problem directly for a parameter value close to the desired one to obtain an initial guess for continuation. The benefit here is that the continuation would be required in a smaller parameter range, thus reducing the total number of solves required. We will see in Section 4.5 that increasing $m$ can lead to convergence for a wider range of parameters.

4.3.2 Results for Rayleigh-Bénard Model

The results of asymptotic safeguarding with activation threshold $0.1$ applied to Model (16) are similar to those of Model (15) seen in the previous section. In this case, however, NA diverged for $\operatorname{Ri}=3.0$ and $\operatorname{Ri}=3.2$ . For $\operatorname{Ri}=3.0$ , Newton’s method converges, but all methods diverged for $\operatorname{Ri}=3.2$ . In Section 4.4, we are able to recover convergence for $\operatorname{Ri}=3.2$ with preasymptotic safeguarding. For $\operatorname{Ri}=3.1$ , $\operatorname{Ri}=3.3$ , $\operatorname{Ri}=3.4$ , and $\operatorname{Ri}=3.5$ , Newton diverged while NA and $\operatorname{\gamma\text{NAA}(\hat{r})}$ converged. As with Model (15), we see $r_{k+1}\to 0$ as the solve converges. The results for $\operatorname{Ri}=3.4$ , shown below, are representative of the others for which NA converged.

4.4 Preasymptotic Safeguarding

In this section, we demonstrate the preasymptotic safeguarding strategy, where $\operatorname{\gamma\text{NAA}(\hat{r})}$ is activated starting at iteration $k=2$ , the first iteration where $\operatorname{NA}$ can be applied. Compared to asymptotic safeguarding, preasymptotic safeguarding is more sensitive to the choice of $\hat{r}$ , but, with the right choice of $\hat{r}$ , it can recover convergence when both Newton and $\operatorname{NA}$ fail.

4.4.1 Results for Channel Flow Model

Here we present the results from applying the preasymptotic safeguarding strategy to the channel flow Model (15). Evidently, Newton’s method can still converge quickly with the right initial guess when $\mu=0.96$ . Figure 8 demonstrates that in this case, $\operatorname{NA}$ and $\operatorname{\gamma\text{NAA}(\hat{r})}$ also perform well. The rightmost plot in Figure 8 demonstrates that $\operatorname{\gamma\text{NAA}(\hat{r})}$ is working as intended. That is, $r_{k+1}\to 0$ , and therefore $\operatorname{\gamma\text{NAA}(\hat{r})}$ is detecting that Newton is converging quickly, and responds by scaling its update steps towards a pure Newton iteration. With $\mu=0.94$ , we are closer to the bifurcation point, and observing the left plot in Figure 9, we see that Newton takes many more iterations to converge. However, after a long preasymptotic phase, Newton does eventually converge quickly, which suggests that the problem is not truly singular for $\mu=0.94$ . This is again detected by $\operatorname{\gamma\text{NAA}(\hat{r})}$ . Even though the $\operatorname{\gamma\text{NAA}(\hat{r})}$ algorithms take a few more iterations to converge than NA with $\mu=0.94$ , the terminal order of convergence of $\operatorname{\gamma\text{NAA}(\hat{r})}$ is greater than that of NA. Approximating the order by $q_{k+1}=\log(\|w_{k+1}\|)/\log(\|w_{k}\|)$ at each step $k$ , and letting $q_{\text{term}}$ denote the terminal order, we found that $q_{\text{term}}=1.537$ for NA, $q_{\text{term}}=3.386$ for $\gamma\text{NAA}(0.1)$ , and $q_{\text{term}}=2.091$ for $\gamma\text{NAA}(0.9)$ . In the rightmost plot in Figure 9, we take as our initial iterate the zero vector, but with the fifth entry set to 50. From this perturbed initial guess, Newton’s method is seen to perform better than NA. Further, $\gamma\text{NAA}(0.1)$ and $\gamma\text{NAA}(0.5)$ outperform both Newton and NA, with $\gamma\text{NAA}(0.5)$ converging in about half as many iterations as NA.
This demonstrates the flexibility offered by $\operatorname{\gamma\text{NAA}(\hat{r})}$ . That is, whether Newton or NA is the best choice for a particular problem and initial guess, $\operatorname{\gamma\text{NAA}(\hat{r})}$ is more agnostic to these choices, and can perform well in either case. There is still, however, sensitivity to $\hat{r}$ . We found that $\operatorname{\gamma\text{NAA}(\hat{r})}$ failed to converge for $\hat{r}=0.3$ , $0.5$ , $0.6$ , $0.7$ , and $0.8$ . $\operatorname{\gamma\text{NAA}(\hat{r})}$ converged with $\hat{r}=0.4$ , but only after 74 iterations. We observed that $\gamma$ NA(0.5), the non-adaptive version of $\gamma$ -safeguarding, managed to converge. This variation is likely due to the complex behavior of Newton and NA in the preasymptotic regime, leading to sensitivity to the choice of $\hat{r}$ . With preasymptotic safeguarding, our methods failed for $\mu<0.94$ , hence continuation may still be required in some cases, but NA and $\operatorname{\gamma\text{NAA}(\hat{r})}$ can be used to efficiently solve the problem closer to the bifurcation point compared to Newton, thereby reducing the overall computational cost. The point is that applying $\gamma$ NA $(r)$ and $\operatorname{\gamma\text{NAA}(\hat{r})}$ in the preasymptotic regime can still lead to faster convergence than Newton, but this convergence is again sensitive to the choice of $\hat{r}$ , and further work is required to understand this sensitivity.
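The order estimate $q_{k+1}=\log(\|w_{k+1}\|)/\log(\|w_{k}\|)$ used in this subsection is straightforward to compute from a stored history of update norms. The following is a minimal sketch; the function names are hypothetical, and taking the last available estimate as $q_{\text{term}}$ is our assumption about how the terminal order is read off.

```python
import math


def order_estimates(w_norms):
    """Return q_{k+1} = log(||w_{k+1}||) / log(||w_k||) for consecutive norms.

    Assumes every norm is positive and different from 1, so that the
    logarithms are defined and the denominator is nonzero.
    """
    return [math.log(b) / math.log(a) for a, b in zip(w_norms, w_norms[1:])]


def terminal_order(w_norms):
    # Assumption: the terminal order is the last available estimate.
    return order_estimates(w_norms)[-1]


if __name__ == "__main__":
    # Synthetic quadratically convergent history, not data from the paper.
    history = [1e-1, 1e-2, 1e-4, 1e-8]
    print(order_estimates(history))
    print(terminal_order(history))
```

The estimate is only meaningful once $\|w_{k}\|<1$ , since the ratio of logarithms changes sign or blows up as a norm crosses $1$ .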

4.4.2 Results for Rayleigh-Bénard Model

With preasymptotic safeguarding applied to Model (16), we again see more varied behavior since we have $\gamma$ -safeguarding activated from the beginning of the solve. The results are shown in Figures 10, 11, and 12. The $r_{k+1}$ plots are only shown for those values of $\hat{r}$ for which $\operatorname{\gamma\text{NAA}(\hat{r})}$ converged. We found that for $\operatorname{Ri}=3.0$ and $\operatorname{Ri}=3.1$ , $\operatorname{\gamma\text{NAA}(\hat{r})}$ failed to converge with $\hat{r}=0.1$ , $0.5$ , or $0.9$ . Convergence is recovered with $\hat{r}=0.4$ and $\hat{r}=0.6$ respectively. We also ran $\operatorname{\gamma\text{NAA}(\hat{r})}$ with these $\hat{r}$ values for $\operatorname{Ri}=3.2$ , $3.3$ , $3.4$ , and $3.5$ . The results were similar for $\operatorname{Ri}=3.3$ , $3.4$ , and $3.5$ . Hence we only show the results for $\operatorname{Ri}=3.4$ in Figure 12. The theme demonstrated in Figures 10, 11, and 12 is that when preasymptotic safeguarding is employed, it is possible to converge faster than standard NA. Moreover, as seen in Figure 11, $\operatorname{\gamma\text{NAA}(\hat{r})}$ can converge when both Newton and NA diverge. One may note that in the right-most plot in Figure 10, prior to the asymptotic regime where $r_{k+1}\to 0$ , we observe $r_{k+1}<\hat{r}=0.6$ only twice. Using the restarted Anderson interpretation discussed in Section 4.2, we could say that there are two quasi-restarts prior to the asymptotic regime. Evidently, these two quasi-restarts are essential, since we found that non-adaptive $\gamma$ -safeguarding with $\hat{r}=0.6$ , Algorithm 4, diverges. Similar behavior of $r_{k+1}$ is seen in Figures 11 and 12. It remains unclear precisely how the choice of $\hat{r}$ affects convergence, e.g., in Figure 10, why does $\gamma\text{NAA}(0.6)$ converge, but $\operatorname{\gamma\text{NAA}(\hat{r})}$ diverges for $\hat{r}=0.1$ , $0.4$ , $0.5$ , and $0.9$ ? 
As previously discussed, a better understanding of these methods in the preasymptotic regime would help answer questions like these, and this will be the focus of future studies.

4.5 Increasing Anderson Depth

The strategy employed in this section is to increase the depth of NA beyond the $m=1$ used in previous sections.

4.5.1 Results for Channel Flow Model

The plots in Figure 13 demonstrate the effectiveness of increasing the algorithmic depth $m$ to solve Model (15) near the bifurcation point. We also experimented with applying $\operatorname{\gamma\text{NAA}(\hat{r})}$ asymptotically. When $\|w_{k+1}\|<1$ , we set $m=1$ and activated $\operatorname{\gamma\text{NAA}(\hat{r})}$ with $\hat{r}=0.9$ . These results are seen as the dashed lines in the left-most plot in Figure 13. The philosophy is similar to that of asymptotic safeguarding. We use NA(3) to reach the asymptotic regime, and then allow adaptive $\gamma$ -safeguarding to detect if the problem is nonsingular. We set $m=1$ because, for the present, $\gamma$ -safeguarding is only designed for $m=1$ . The left-most plot in Figure 13 shows how, from the same initial iterate, we are able to solve Model (15) for a wider range of $\mu$ values, including in the regime where Newton, $\operatorname{NA}(1)$ , and $\operatorname{\gamma\text{NAA}(\hat{r})}$ failed to converge.

The right-most plot in Figure 13 focuses on the results of applying $\operatorname{NA}(3)$ to Model (15) with $\mu=0.92$ . The point here is that even in the regime where $\operatorname{NA}(1)$ converges, $\operatorname{NA}(3)$ converges in about half as many iterations. Thus increasing the algorithmic depth can lead to faster convergence. The improved convergence seen with increasing the depth $m$ suggests that extending $\gamma$ -safeguarding to greater depths could usefully generalize the strategies presented in previous sections. However, for this strategy to be effective in general, further study is needed on the proper choice of depth $m$ . In the chosen parameter regime, $m=3$ was the only value of $m\in\{1,2,\ldots,10\}$ observed to improve convergence when NA(1) failed.

4.5.2 Results for Rayleigh-Bénard Model

The results of increasing $m$ to solve Model (16) are shown below in Figure 14. As with Model (15), we experimented with reducing to $m=1$ and activating $\gamma$ -safeguarding asymptotically. The dashed lines in Figure 14 are the results of these experiments. For this problem, activation occurred when $\|w_{k+1}\|<10^{-1}$ since we found that activating $\operatorname{\gamma\text{NAA}(\hat{r})}$ with $\hat{r}=0.9$ when $\|w_{k+1}\|<1$ broke convergence like it did in Section 4.3.2. We once again observe that the right choice of $m>1$ can improve convergence significantly. This can be seen in Figure 14 for $\operatorname{Ri}=3.3$ . Moreover, Figure 14 demonstrates that increasing $m$ can recover convergence when $\operatorname{NA}(1)$ fails to converge for $\operatorname{Ri}=3.0$ and $\operatorname{Ri}=3.2$ . However, for $\operatorname{Ri}=$ 3.1, 3.4, and 3.5, there was no significant improvement gained from increasing $m$ beyond 1. The results for $\operatorname{Ri}=3.4$ shown below in Figure 14 are representative of the results for $\operatorname{Ri}=3.1$ and $\operatorname{Ri}=3.5$ .

The most significant difference between $\operatorname{NA}(m)$ without asymptotic $\operatorname{\gamma\text{NAA}(\hat{r})}$ , and $\operatorname{NA}(m)$ with asymptotic $\operatorname{\gamma\text{NAA}(\hat{r})}$ , is seen with $m=10$ . It also appears that $m=10$ is more sensitive to the activation threshold than smaller choices of $m$ . This is seen in Figure 14 for $\operatorname{Ri}=3.3$ and $\operatorname{Ri}=3.4$ . These results are promising, and motivate further investigation to fully understand how the choice of $m$ affects convergence near singularities, and in particular, near bifurcation points.

5 Conclusion

We have presented a modification of Anderson accelerated Newton’s method for solving nonlinear equations near bifurcation points. We proved that, locally, this modified scheme can detect nonsingular problems and scale the iterates towards a pure Newton step, which leads to faster local convergence compared to standard NA. We numerically demonstrated two strategies one can employ when using our modified NA scheme to solve nonlinear problems near bifurcation points, with our test problems being two Navier-Stokes type parameter-dependent PDEs. Asymptotic safeguarding was shown to recover local quadratic convergence when the problem is nonsingular, and it shows virtually no sensitivity to the choice of parameter $\hat{r}$ . It can, however, be sensitive to the choice of activation threshold. Preasymptotic safeguarding is shown to significantly improve convergence, and can recover convergence when both Newton and $\operatorname{NA}$ fail. There is strong sensitivity to the choice of $\hat{r}$ though, and future work will clarify this dependence. We also demonstrated that increasing the Anderson depth $m$ can improve convergence, and increase the domain of convergence with respect to the problem parameter. Future projects will further study how the choice of $m$ impacts convergence, and work towards developing $\gamma$ -safeguarding for greater algorithmic depths.

6 Acknowledgements

MD and SP are supported in part by the National Science Foundation under Grant No. DMS-2011519. LR is supported in part by the National Science Foundation under Grant No. DMS-2011490. This material is based upon work supported by the National Science Foundation under Grant No. DMS-1929284 while the authors were in residence at the Institute for Computational and Experimental Research in Mathematics in Providence, RI, during the Acceleration and Extrapolation Methods (MD, SP and LR), and the Numerical PDEs: Analysis, Algorithms and Data Challenges (SP and LR), programs.

References

[1] H. An, X. Jia, and H.F. Walker. Anderson acceleration and application to the three-temperature energy equations. J. Comput. Phys., 347:1–19, 2017.
[2] D.G. Anderson. Iterative procedures for nonlinear integral equations. J. Assoc. Comput. Mach., 12(4):547–560, 1965.
[3] R. Behling and A. Fischer. A unified local convergence analysis of inexact constrained Levenberg-Marquardt methods. Optim. Lett., 6:927–940, 2012.
[4] R. Behling, A. Fischer, M. Herrich, A. Iusem, and Y. Ye. A Levenberg-Marquardt method with approximate projections. Comput. Optim. Appl., 59:5–26, 2014.
[5] S. Bellavia and B. Morini. Strong local convergence properties of adaptive regularized methods for nonlinear least squares. IMA J. Numer. Anal., 35, 2015.
[6] R.G. Bettiol and P. Piccione. Instability and bifurcation. Notices Am. Math. Soc., 67(11):1679–1691, 2020.
[7] N. Boullé, V. Dallas, and P.E. Farrell. Bifurcation analysis of two-dimensional Rayleigh-Bénard convection using deflation. Phys. Rev. E, 105, 2022.
[8] D. Braess. Finite Elements: Theory, Fast Solvers, and Applications in Solid Mechanics. Cambridge University Press, 2007.
[9] M. Chupin, M. Dupuy, G. Legendre, and E. Séré. Convergence analysis of adaptive DIIS algorithms with application to electronic ground state calculations. ESAIM Math. Model. Numer. Anal., 55(6):2785–2825, 2021.
[10] M. Dallas and S. Pollock. Newton-Anderson at singular points. Int. J. Numer. Anal. and Mod., 20(5):667–692, 2023.
[11] D.W. Decker, H.B. Keller, and C.T. Kelley. Convergence rates for Newton’s method at singular points. SIAM J. Numer. Anal., 20(2):296–314, 1983.
[12] D.W. Decker and C.T. Kelley. Newton’s method at singular points I. SIAM J. Numer. Anal., 17(1):66–70, 1980.
[13] D.W. Decker and C.T. Kelley. Convergence acceleration for Newton’s method at singular points. SIAM J. Numer. Anal., 19(1):219–229, 1982.
[14] D.W. Decker and C.T. Kelley. Expanded convergence domains for Newton’s method at nearly singular roots. SIAM J. Sci. Stat. Comput., 6(4), 1985.
[15] P. Deuflhard. Newton Methods for Nonlinear Problems. Springer Series in Computational Mathematics. Springer, 2005.
[16] J. Fan and J. Zeng. A Levenberg–Marquardt algorithm with correction for singular system of nonlinear equations. Appl. Math. Comput., 219(17):9438–9446, 2013.
[17] P.E. Farrell, A. Birkisson, and S.W. Funke. Deflation techniques for finding distinct solutions of nonlinear partial differential equations. SIAM J. Sci. Comput., 37, 2015.
[18] A. Fischer, A.F. Izmailov, and M.V. Solodov. Accelerating convergence of the globalized Newton method to critical solutions of nonlinear equations. Comput. Optim. Appl., 78:273–286, 2021.
[19] A. Fischer, A.F. Izmailov, and M.V. Solodov. Unit stepsize for the Newton method close to critical solutions. Math. Program., 187:697–721, 2021.
[20] J.K. Galvin, A. Linke, L.G. Rebholz, and N. Wilson. Stabilizing poor mass conservation in incompressible flow problems with large irrotational forcing and application to thermal convection. Comput. Methods Appl. Mech. Engrg., pages 166–176, 2012.
[21] A.O. Griewank. Analysis and Modification of Newton’s Method at Singularities. PhD thesis, 1980.
[22] A.O. Griewank. Starlike domains of convergence for Newton’s method at singularities. Numer. Math., 35:95–111, 1980.
[23] A.O. Griewank. On solving nonlinear equations with simple singularities or nearly singular solutions. SIAM Review, 27(4):537–563, 1985.
[24] J. Guzḿan and L.R. Scott. The Scott-Vogelius finite elements revisited. Math. Comp., 88(316):515–529, 2019.
[25] H. He, S. Zhao, Y. Xi, J.C. Ho, and Y. Saad. Solve minimax optimization by Anderson acceleration. International Conference on Learning Representations, 2022.
[26] N.C. Henderson and R. Varadhan. Damped Anderson acceleration with restarts and monotonicity control for accelerating EM and EM-like algorithms. Journal of computational and graphical statistics, 28(4):834–846, 2019.
[27] J.L. Hueso, E.M., and J.R. Torregrosa. Modified Newton’s method for systems of nonlinear equations with singular Jacobian. J. Comput. Appl. Math., 224(1):77–83, 2008.
[28] A.F. Izmailov, A.S. Kurennoy, and M.V. Solodov. Critical solutions of nonlinear equations: local attraction for Newton-type methods. Math. Progam., 167:355–379, 2018.
[29] A.F. Izmailov, A.S. Kurennoy, and M.V. Solodov. Critical solutions of nonlinear equations: stability issues. Math. Progam., 168:475–507, 2018.
[30] C. Kanzow, N. Yamashita, and M. Fukushima. Levenberg–Marquardt methods with strong local convergence properties for solving nonlinear equations with convex constraints. J. Comput. Appl. Math, 172:375–397, 2004.
[31] C.T. Kelley and R. Suresh. A new acceleration method for Newton’s method at singular points. SIAM J. Numer. Anal, 20(5):1001–1009, 1983.
[32] H. Kielhöfer. Bifurcation Theory: An Introduction with Applications to Partial Differential Equations. Springer Science+Business Media, 2012.
[33] C. Kuehn. PDE Dynamics: An Introduction. SIAM, 2019.
[34] P.A. Lott, H.F. Walker, C.S. Woodward, and U.M. Yang. An accelerated Picard method for nonlinear systems related to variably saturated flow. Adv. Water Resour., 38:92–101, 2012.
[35] J.M. Ortega. The Newton-Kantorovich theorem. Amer. Math. Monthly, 75(6):658–660, 1968.
[36] M.L. Pasini. Convergence analysis of Anderson-type acceleration of Richardson’s iteration. Numer. Linear Algebra Appl., 26, 2019.
[37] M.L. Pasini, J. Yin, V. Reshniak, and M.K. Stoyanov. Anderson acceleration for distributed training of deep learning models. In SoutheastCon 2022, pages 289–295, 2022.
[38] Y. Peng, B. Deng, J. Zhang, F. Geng, W. Qin, and L. Liu. Anderson acceleration for geometry optimization and physics simulation. ACM Trans. Graph., 37(4), 2018.
[39] F. Pichi. Reduced order models for parametric bifurcation problems in nonlinear PDEs. PhD thesis, 2020.
[40] F. Pichi, A. Quaini, and G. Rozza. A reduced order modeling technique to study bifurcating phenomena: Application to the Gross–Pitaevskii equation. SIAM J. Sci. Comput., 42(5), 2020.
[41] F. Pichi, M. Strazzullo, F. Ballarin, and G. Rozza. Driving bifurcating parametrized nonlinear pdes by optimal control strategies: application to Navier–Stokes equations with model order reduction. ESAIM: M2AN, 56(4):1361–1400, 2022.
[42] S. Pollock and L.G. Rebholz. Anderson acceleration for contractive and noncontractive operators. IMA J. Numer. Anal., 41(4):2841–2872, 2021.
[43] S. Pollock and L.G. Rebholz. Filtering for Anderson acceleration. SIAM J. Sci. Comput., 45(4):A1571–A1590, 2023.
[44] S. Pollock, L.G. Rebholz, and M. Xiao. Anderson-accelerated convergence of Picard iterations for incompressible Navier-Stokes equations. SIAM J. Numer. Anal., 57(2):615–637, 2019.
[45] S. Pollock and H. Schwartz. Benchmarking results for the Newton–Anderson method. Results Appl. Math., 8:100095, 2020.
[46] J. Qin. On the convergence of some low order mixed finite elements for incompressible fluids. PhD thesis, 1994.
[47] L.G. Rebholz and M. Xiao. The effect of Anderson acceleration on superlinear and sublinear convergence. J. Sci. Comput., 96(2), 2023.
[48] G.W. Reddien. On Newton’s method for singular problems. SIAM J. Numer. Anal., 15(5):993–996, 1978.
[49] G.W. Reddien. Newton’s method and high order singularities. Comput. Math. Appl., 5(2):79–86, 1979.
[50] R.B. Schnabel and P.D. Frank. Tensor methods for nonlinear equations. SIAM J. Numer. Anal., 21(5):815–843, 1984.
[51] R.B. Thompson, K.O. Rasmussen, and T. Lookman. Improved convergence in block copolymer self-consistent field theory by Anderson mixing. J. Chem. Phys., 120(1):31–34, 2004.
[52] R.S. Tuminaro, H.F. Walker, and J. N. Shadid. On backtracking failure in Newton-GMRES methods with demonstration for the Navier-Stokes equations. J. Comput. Phys., 180:549–558, 2002.
[53] H. Uecker. Continuation and bifurcation in nonlinear pdes – algorithms, applications, and experiments. Jahresbericht der Deutschen Mathematiker-Vereinigung, 124:43–80, 2021.
[54] H.F. Walker and P. Ni. Anderson acceleration for fixed-point iterations. SIAM J. Numer. Anal., 49(4):1715–1735, 2011.
[55] D. Wang, Y. He, and H. De Sterck. On the asymptotic linear convergence speed of Anderson acceleration applied to ADMM. J. Sci. Comput., 88(2):38, 2021.
[56] Q. Yuan, Y. Sun, and J. Ren. How interest rate influences a business cycle model. Discrete Contin. Dyn. Syst., 13(11):3231–3251, 2020.
[57] H. Zhou and Y. Qian. Double Hopf bifurcation analysis for coupled van der Pol–Rayleigh system with time delay. J. Vib. Eng. Technol., 12:6075–6087, 2024.

$$\|P_{R}e_{k+1}\| \leq C\max\{|1-\lambda_{k+1}\gamma_{k+1}|\,\|e_{k}\|^{2},\;|\lambda_{k+1}\gamma_{k+1}|\,\|e_{k-1}\|^{2}\} \tag{6}$$
$$\|P_{N}e_{k+1}\| \leq \kappa\,\theta_{k+1}^{\lambda}\,\|P_{N}e_{k}\| \tag{7}$$
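
As a reading aid, the two extreme values of the product $\lambda_{k+1}\gamma_{k+1}$ can be substituted directly into (6). This is plain algebra on the bound as stated, with no further assumptions on the constants:

$$\lambda_{k+1}\gamma_{k+1}=0:\quad \|P_{R}e_{k+1}\|\leq C\,\|e_{k}\|^{2},\qquad\qquad \lambda_{k+1}\gamma_{k+1}=1:\quad \|P_{R}e_{k+1}\|\leq C\,\|e_{k-1}\|^{2}.$$

The first case has the form of the quadratic bound one expects from a Newton step for this component. In the second case the bound is controlled by the older error $e_{k-1}$ instead.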
