Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Analysis of an Adaptive Safeguarded Newton-Anderson Algorithm of Depth One with Applications to Fluid Problems

Matt Dallas 111Corresponding author: Matt Dallas. MD and SP were supported in part by the US NSF Project DMS 2011519 (PI: Pollock). LR was supported by US NSF Project DMS 2011490 (PI: Rebholz).
Keywords: Anderson acceleration, Newton’s Method, safeguarding, singular points, bifurcation.
2020 Mathematics Subject Classification: 65J15
,1  Sara Pollock2  Leo G. Rebholz3
1Department of Mathematics, University of Dallas, USA (mdallas@udallas.edu)
2Department of Mathematics, University of Florida, USA (s.pollock@ufl.edu)
3Department of Mathematics, Clemson University, USA (rebholz@clemson.edu)
(August 21, 2024)
Abstract

The purpose of this paper is to develop a practical strategy to accelerate Newton’s method in the vicinity of singular points. We present an adaptive safeguarding scheme with a tunable parameter, which we call adaptive γ𝛾\gammaitalic_γ-safeguarding, that one can use in tandem with Anderson acceleration to improve the performance of Newton’s method when solving problems at or near singular points. The key features of adaptive γ𝛾\gammaitalic_γ-safeguarding are that it converges locally for singular problems, and it can detect nonsingular problems automatically, in which case the Newton-Anderson iterates are scaled towards a standard Newton step. The result is a flexible algorithm that performs well for singular and nonsingular problems, and can recover convergence from both standard Newton and Newton-Anderson with the right parameter choice. This leads to faster local convergence compared to both Newton’s method, and Newton-Anderson without safeguarding, with effectively no additional computational cost. We demonstrate three strategies one can use when implementing Newton-Anderson and γ𝛾\gammaitalic_γ-safeguarded Newton-Anderson to solve parameter-dependent problems near singular points. For our benchmark problems, we take two parameter-dependent incompressible flow systems: flow in a channel and Rayleigh-Bénard convection.

1 Introduction

Nonlinear systems of equations of the form f(x)=0𝑓𝑥0f(x)=0italic_f ( italic_x ) = 0, with f:nn:𝑓superscript𝑛superscript𝑛f:\mathbb{R}^{n}\to\mathbb{R}^{n}italic_f : blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT → blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT, arise frequently in applications. Many times the solutions depend on parameters (see [6, 7, 33, 39, 40, 41, 53] and references therein) that can have a significant effect on the solution. Of particular interest are bifurcation points, which are characterized by the breakdown of local uniqueness of a solution for a particular parameter, and correspond to a qualitative change in the solution’s behavior [6, 32, 39]. Sets of solutions of similar qualitative behavior are called branches [39]. Studying these branches provides a more complete understanding of the solutions, and has applications in a wide variety of fields such as echocardiography [41], economics [56], physics [40], and engineering [57]. A necessary condition for bifurcation comes from the Implicit Function Theorem, which says that the Jacobian at a solution xsuperscript𝑥x^{*}italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, f(x)superscript𝑓superscript𝑥f^{\prime}(x^{*})italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ), is necessarily singular if xsuperscript𝑥x^{*}italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT is a bifurcation point [32, p. 8]. Bifurcation points are thus examples of singular points, i.e., points x𝑥xitalic_x for which f(x)superscript𝑓𝑥f^{\prime}(x)italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x ) is singular. Techniques for computing solution branches such as continuation [53] and deflation [7, 17] require many solves of f(x)=0𝑓𝑥0f(x)=0italic_f ( italic_x ) = 0 as the parameter is varied, and these problems become singular or nearly singular near bifurcation points. A popular method for solving nonlinear equations is Newton’s method defined in Algorithm 1.

1:  Choose x0nsubscript𝑥0superscript𝑛x_{0}\in\mathbb{R}^{n}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT.
2:  for k=1,2,… do
3:     wk+1f(xk)1f(xk)subscript𝑤𝑘1superscript𝑓superscriptsubscript𝑥𝑘1𝑓subscript𝑥𝑘w_{k+1}\leftarrow-f^{\prime}(x_{k})^{-1}f(x_{k})italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← - italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_f ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT )
4:     xk+1xk+wk+1subscript𝑥𝑘1subscript𝑥𝑘subscript𝑤𝑘1x_{k+1}\leftarrow x_{k}+w_{k+1}italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT
5:  end for
Algorithm 1 Newton

When f(x)superscript𝑓𝑥f^{\prime}(x)italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x ) is Lipschitz continuous and f(x)superscript𝑓superscript𝑥f^{\prime}(x^{*})italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) is nonsingular, Newton’s method exhibits local quadratic convergence in a ball centered at xsuperscript𝑥x^{*}italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. This is essentially the celebrated Newton-Kantorovich theorem [35]. If we remove the nonsingular assumption and let f(x)superscript𝑓superscript𝑥f^{\prime}(x^{*})italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) be singular, then the convergence behavior changes dramatically. Rather than local quadratic convergence from any x0subscript𝑥0x_{0}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT in a sufficiently small ball around xsuperscript𝑥x^{*}italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, we see local linear convergence in a starlike domain of convergence around xsuperscript𝑥x^{*}italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT [11, 12, 22, 48, 49]. Since bifurcation points are necessarily singular points, this means that continuation or deflation algorithms using Newton’s method may converge slowly or fail to converge at or near a bifurcation point. This challenge has motivated the study of modifications [11, 13, 18, 19, 23, 22, 27, 31] or alternatives [3, 4, 5, 16, 30, 50] to Newton’s method that can improve convergence behavior at singular points. Among the modifications, Richardson extrapolation and overrelaxation are popular and perform well. They can achieve superlinear and arbitrarily fast linear convergence respectively under certain conditions [23] at the cost of additional function evaluations and some knowledge of the order of the singularity [22, 23]. The order may be inferred from monitoring the singular values of the Jacobian as the solve progresses. A popular alternative to Newton’s method for singular problems is the Levenberg-Marquardt method [3, 4, 5, 16, 30]. Under standard assumptions and the local error bound, or local Lipschitzian error bound, the Levenberg-Marquardt method can achieve local quadratic convergence [30]. The local error bound is known to be much weaker than the standard nonsingularity assumption. Indeed, it can hold even for singular problems [4]. However, without the local error bound or nonsingularity, Levenberg-Marquardt is not guaranteed such success. In the absence of the local error bound, one may insist that the function f𝑓fitalic_f is 2-regular [18, 19, 28, 29], in which case Levenberg-Marquardt converges locally linearly in a starlike domain much like Newton’s method [28].

The method of interest in this paper, Anderson acceleration, has a long track record of accelerating linearly converging fixed-point methods, and has been applied in many different fields [1, 34, 37, 38, 44, 51, 55]. Further, when applied as a modification to Newton’s method at singular points, in contrast to Richardson extrapolation or overrelaxation discussed in [23], this success requires no knowledge of the order of the root or additional function evaluations. It was also shown recently that under a condition equivalent to 2-regularity (discussed further in Section 2), Anderson accelerates Newton’s method when applied to singular problems [10], and therefore outperforms Levenberg-Marquardt in the absence of the local error bound. This was demonstrated numerically in [10]. Also known as Anderson extrapolation or Anderson mixing [54], Anderson acceleration was first introduced in 1965 by D.G. Anderson in [2] to improve the convergence of fixed-point iterations applied to integral equations. The algorithm combines the previous m+1𝑚1m+1italic_m + 1 iterates and update steps into a new iterate at each step of the solve. The number m𝑚mitalic_m is commonly known as the algorithmic depth. The combination of the m+1𝑚1m+1italic_m + 1 iterates often involves solving a least-squares problem, but since m𝑚mitalic_m is typically small, the computational cost of this step is in general orders of magnitude less than that of a single linear solve. There are problems for which taking m𝑚mitalic_m much larger can be beneficial [42, 54], and later in this paper we will see numerically that increasing m𝑚mitalic_m can improve or recover convergence near bifurcation points. With greater depths, the least-squares problem may suffer from ill-conditioning if proper care is not taken in the implementation [43]. In [10], the authors developed a convergence and acceleration theory for Anderson accelerated Newton’s method with depth m=1𝑚1m=1italic_m = 1, defined in Algorithm 2, applied to singular problems. This is a special case of depth m1𝑚1m\geq 1italic_m ≥ 1 given in Algorithm 3.

1:  Choose x0nsubscript𝑥0superscript𝑛x_{0}\in\mathbb{R}^{n}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT. Set w1=f(x0)1f(x0)subscript𝑤1superscript𝑓superscriptsubscript𝑥01𝑓subscript𝑥0w_{1}=-f^{\prime}(x_{0})^{-1}f(x_{0})italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = - italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_f ( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ), and x1=x0+w1subscript𝑥1subscript𝑥0subscript𝑤1x_{1}=x_{0}+w_{1}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT.
2:  for k=1,2,… do
3:     wk+1f(xk)1f(xk)subscript𝑤𝑘1superscript𝑓superscriptsubscript𝑥𝑘1𝑓subscript𝑥𝑘w_{k+1}\leftarrow-f^{\prime}(x_{k})^{-1}f(x_{k})italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← - italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_f ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT )
4:     γk+1(wk+1wk)Twk+1/wk+1wk22subscript𝛾𝑘1superscriptsubscript𝑤𝑘1subscript𝑤𝑘𝑇subscript𝑤𝑘1superscriptsubscriptnormsubscript𝑤𝑘1subscript𝑤𝑘22\gamma_{k+1}\leftarrow(w_{k+1}-w_{k})^{T}w_{k+1}/\|w_{k+1}-w_{k}\|_{2}^{2}italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← ( italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT / ∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT
5:     xk+1xk+wk+1γk+1(xkxk1+wk+1wk)subscript𝑥𝑘1subscript𝑥𝑘subscript𝑤𝑘1subscript𝛾𝑘1subscript𝑥𝑘subscript𝑥𝑘1subscript𝑤𝑘1subscript𝑤𝑘x_{k+1}\leftarrow x_{k}+w_{k+1}-\gamma_{k+1}(x_{k}-x_{k-1}+w_{k+1}-w_{k})italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT )
6:  end for
Algorithm 2 Newton-Anderson(1)
1:  Choose x0nsubscript𝑥0superscript𝑛x_{0}\in\mathbb{R}^{n}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT and m0𝑚0m\geq 0italic_m ≥ 0. Set w1=f(x0)1f(x0)subscript𝑤1superscript𝑓superscriptsubscript𝑥01𝑓subscript𝑥0w_{1}=-f^{\prime}(x_{0})^{-1}f(x_{0})italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = - italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_f ( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ), and x1=x0+w1subscript𝑥1subscript𝑥0subscript𝑤1x_{1}=x_{0}+w_{1}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT.
2:  for k=1,2,… do
3:     mkmin{k,m}subscript𝑚𝑘𝑘𝑚m_{k}\leftarrow\min\{k,m\}italic_m start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ← roman_min { italic_k , italic_m }
4:     wk+1f(xk)1f(xk)subscript𝑤𝑘1superscript𝑓superscriptsubscript𝑥𝑘1𝑓subscript𝑥𝑘w_{k+1}\leftarrow-f^{\prime}(x_{k})^{-1}f(x_{k})italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← - italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_f ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT )
5:     Fk=((wk+1wk)(wkm+2wkm+1))subscript𝐹𝑘subscript𝑤𝑘1subscript𝑤𝑘subscript𝑤𝑘𝑚2subscript𝑤𝑘𝑚1F_{k}=\big{(}(w_{k+1}-w_{k})\cdots(w_{k-m+2}-w_{k-m+1})\big{)}italic_F start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ( ( italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ⋯ ( italic_w start_POSTSUBSCRIPT italic_k - italic_m + 2 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k - italic_m + 1 end_POSTSUBSCRIPT ) )
6:     Ek=((xkxk1)(xkm+1xkm))subscript𝐸𝑘subscript𝑥𝑘subscript𝑥𝑘1subscript𝑥𝑘𝑚1subscript𝑥𝑘𝑚E_{k}=\big{(}(x_{k}-x_{k-1})\cdots(x_{k-m+1}-x_{k-m})\big{)}italic_E start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ( ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ) ⋯ ( italic_x start_POSTSUBSCRIPT italic_k - italic_m + 1 end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_k - italic_m end_POSTSUBSCRIPT ) )
7:     γk+1argminγmwk+1Fkγ22subscript𝛾𝑘1subscriptargmin𝛾superscript𝑚superscriptsubscriptnormsubscript𝑤𝑘1subscript𝐹𝑘𝛾22\gamma_{k+1}\leftarrow\text{argmin}_{\gamma\in\mathbb{R}^{m}}\|w_{k+1}-F_{k}% \gamma\|_{2}^{2}italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← argmin start_POSTSUBSCRIPT italic_γ ∈ blackboard_R start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_F start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_γ ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT
8:     xk+1xk+wk+1(Ek+Fk)γk+1subscript𝑥𝑘1subscript𝑥𝑘subscript𝑤𝑘1subscript𝐸𝑘subscript𝐹𝑘subscript𝛾𝑘1x_{k+1}\leftarrow x_{k}+w_{k+1}-(E_{k}+F_{k})\gamma_{k+1}italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - ( italic_E start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_F start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT
9:  end for
Algorithm 3 Newton-Anderson(m)

A major challenge when proving convergence of Newton-like methods near singular points is ensuring that the iterates remain well-defined. The authors in [10] introduced a novel safeguarding scheme called γ𝛾\gammaitalic_γ-safeguarding, defined in Algorithm 4 in Section 2, to deal with this problem. The result was a convergence proof for γ𝛾\gammaitalic_γ-safeguarded Newton-Anderson, and it was observed numerically to perform better or no worse compared to standard Newton-Anderson, particularly when applied to nonsingular problems.

The purpose of this paper is to extend these ideas of [10] by developing an adaptive version of γ𝛾\gammaitalic_γ-safeguarding that automatically detects nonsingular problems, and to demonstrate the effectiveness of Newton-Anderson and adaptive γ𝛾\gammaitalic_γ-safeguarded Newton-Anderson at solving parameter-dependent PDEs near bifurcation points in fluid problems. This new adaptive scheme is proven to be locally convergent under the same conditions as the non-adaptive scheme, and the automatic detection of nonsingular problems enables local quadratic convergence when applied to nonsingular problems. Such a property is desirable when solving nonlinear problems near bifurcation points, because even if the problem itself is not singular, convergence can still be affected if it is close to a singular problem [14]. One would like to enjoy the benefits of Newton-Anderson in the preasymptotic regime such as a larger domain of convergence [45], but not lose quadratic convergence in the asymptotic regime if the problem is nonsingular. It is not always known a priori if a problem is singular or nonsingular, and Anderson acceleration can reduce the order of convergence when applied to superlinearly converging iterations such as Newton’s method applied to a nonsingular problem [47]. This problem is solved with adaptive γ𝛾\gammaitalic_γ-safeguarded Newton-Anderson with effectively no additional computational cost. We also show numerically that increasing the algorithmic depth of the Newton-Anderson algorithm can recover convergence when Newton fails near bifurcation points, but only for specific choices of m𝑚mitalic_m.

The algorithms of interest in this paper are Newton, defined in Algorithm 1, Newton-Anderson with algorithmic depth 1111 (NA(1111)) defined in Algorithm 2, Newton-Anderson with algorithmic depth m𝑚mitalic_m (NA(m𝑚mitalic_m)) defined in Algorithm 3, γ𝛾\gammaitalic_γ-safeguarded Newton-Anderson (γ𝛾\gammaitalic_γNA(r𝑟ritalic_r)), defined in Algorithm 5, and adaptive γ𝛾\gammaitalic_γ-safeguarded Newton-Anderson (γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG )), defined in Algorithm 7. We implement γ𝛾\gammaitalic_γNA(r𝑟ritalic_r) and γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) by replacing line 5 in Algorithm 2 with, respectively, γ𝛾\gammaitalic_γ-safeguarding (Algorithm 4) and adaptive γ𝛾\gammaitalic_γ-safeguarding (Algorithm 6). The norm in the algorithms is the Euclidean norm. There should be no confusion between NA(m𝑚mitalic_m) and γ𝛾\gammaitalic_γNA(r𝑟ritalic_r) or γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) since γ𝛾\gammaitalic_γ-safeguarding is currently only developed for depth m=1𝑚1m=1italic_m = 1. The rest of the paper is organized as follows. In Section 2, we review the original γ𝛾\gammaitalic_γ-safeguarding algorithm and its role in the convergence theory developed in [10]. In Section 3, we introduce the new adaptive γ𝛾\gammaitalic_γ-safeguarding algorithm and prove that γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) can recover local quadratic convergence when applied to nonsingular problems in Corollary 3.1. We conclude in Section 4 by applying NA and γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) to two parameter-dependent incompressible flow systems, and discussing various strategies one can use when employing γ𝛾\gammaitalic_γ-safeguarding.

2 The γ𝛾\gammaitalic_γ-safeguarding algorithm

For the theory discussed in Section 2 and Section 3, unless stated otherwise, we take f:nn:𝑓superscript𝑛superscript𝑛f:\mathbb{R}^{n}\to\mathbb{R}^{n}italic_f : blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT → blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT to be a C3superscript𝐶3C^{3}italic_C start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT function, f(x)=0𝑓superscript𝑥0f(x^{*})=0italic_f ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) = 0, N=null(f(x))𝑁nullsuperscript𝑓superscript𝑥N=\text{null}\left(f^{\prime}(x^{*})\right)italic_N = null ( italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ), R=range(f(x))𝑅rangesuperscript𝑓superscript𝑥R=\text{range}\left(f^{\prime}(x^{*})\right)italic_R = range ( italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ), dimN=1dimension𝑁1\dim N=1roman_dim italic_N = 1, and PNsubscript𝑃𝑁P_{N}italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT and PRsubscript𝑃𝑅P_{R}italic_P start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT to be the orthogonal projections onto N𝑁Nitalic_N and R𝑅Ritalic_R respectively. The assumption that RNperpendicular-to𝑅𝑁R\perp Nitalic_R ⟂ italic_N is common in the singular Newton literature [11, 21], and no generality is lost in finite dimensions. Indeed, Newton’s method is essentially invariant under nonsingular affine transformations of the domain and nonsingular linear transformations of the range [15]. Thus to determine the convergence behavior of Newton’s method applied to a general C3superscript𝐶3C^{3}italic_C start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT function F:nn:𝐹superscript𝑛superscript𝑛F:\mathbb{R}^{n}\to\mathbb{R}^{n}italic_F : blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT → blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT, it suffices to study that of f(x)=UTF(x)V𝑓𝑥superscript𝑈𝑇𝐹𝑥𝑉f(x)=U^{T}F(x)Vitalic_f ( italic_x ) = italic_U start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_F ( italic_x ) italic_V, where U𝑈Uitalic_U and V𝑉Vitalic_V come from the single value decomposition F(x)=UΣVTsuperscript𝐹superscript𝑥𝑈Σsuperscript𝑉𝑇F^{\prime}(x^{*})=U\Sigma V^{T}italic_F start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) = italic_U roman_Σ italic_V start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT. Since f(x)=UTF(x)V=Σsuperscript𝑓superscript𝑥superscript𝑈𝑇superscript𝐹superscript𝑥𝑉Σf^{\prime}(x^{*})=U^{T}F^{\prime}(x^{*})V=\Sigmaitalic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) = italic_U start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_F start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) italic_V = roman_Σ, it follows that null(f(x))range(f(x))perpendicular-tonullsuperscript𝑓superscript𝑥rangesuperscript𝑓superscript𝑥\text{null}\left(f^{\prime}(x^{*})\right)\perp\text{range}\left(f^{\prime}(x^{% *})\right)null ( italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ) ⟂ range ( italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ). Lastly, let \|\cdot\|∥ ⋅ ∥ denote the Euclidean 2-norm, Bρ(x)subscript𝐵𝜌𝑥B_{\rho}(x)italic_B start_POSTSUBSCRIPT italic_ρ end_POSTSUBSCRIPT ( italic_x ) denote a ball of radius ρ𝜌\rhoitalic_ρ centered at x𝑥xitalic_x, ek=xkxsubscript𝑒𝑘subscript𝑥𝑘superscript𝑥e_{k}=x_{k}-x^{*}italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, wk+1=f(xk)1f(xk)subscript𝑤𝑘1superscript𝑓superscriptsubscript𝑥𝑘1𝑓subscript𝑥𝑘w_{k+1}=-f^{\prime}(x_{k})^{-1}f(x_{k})italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = - italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_f ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ), and

θk+1=wk+1γk+1(wk+1wk)wk+1,subscript𝜃𝑘1normsubscript𝑤𝑘1subscript𝛾𝑘1subscript𝑤𝑘1subscript𝑤𝑘normsubscript𝑤𝑘1\displaystyle\theta_{k+1}=\frac{\|w_{k+1}-\gamma_{k+1}(w_{k+1}-w_{k})\|}{\|w_{% k+1}\|},italic_θ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = divide start_ARG ∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ( italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ∥ end_ARG start_ARG ∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ end_ARG , (1)

where γk+1subscript𝛾𝑘1\gamma_{k+1}italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT is computed via Algorithm 2. The term θk+1subscript𝜃𝑘1\theta_{k+1}italic_θ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT is known as the optimization gain, and is key to determining when Anderson acceleration is successful both in the singular and nonsingular cases [10, 42].

Algorithm 4 γ𝛾\gammaitalic_γ-safeguarding
1:  Given xksubscript𝑥𝑘x_{k}italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, xk1subscript𝑥𝑘1x_{k-1}italic_x start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT, wk+1subscript𝑤𝑘1w_{k+1}italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT, wksubscript𝑤𝑘w_{k}italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, γk+1subscript𝛾𝑘1\gamma_{k+1}italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT, and r(0,1)𝑟01r\in(0,1)italic_r ∈ ( 0 , 1 ). Set λ=1𝜆1\lambda=1italic_λ = 1.
2:  βk+1rwk+1/wksubscript𝛽𝑘1𝑟normsubscript𝑤𝑘1normsubscript𝑤𝑘\beta_{k+1}\leftarrow r\|w_{k+1}\|/\|w_{k}\|italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← italic_r ∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ / ∥ italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥
3:  if γk+1=0subscript𝛾𝑘10\gamma_{k+1}=0italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = 0 or γk+11subscript𝛾𝑘11\gamma_{k+1}\geq 1italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ≥ 1 then
4:     λ0𝜆0\lambda\leftarrow 0italic_λ ← 0
5:  else if |γk+1|/|1γk+1|>βk+1subscript𝛾𝑘11subscript𝛾𝑘1subscript𝛽𝑘1|\gamma_{k+1}|/|1-\gamma_{k+1}|>\beta_{k+1}| italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT | / | 1 - italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT | > italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT then
6:     λβk+1γk+1(βk+1+sign(γk+1))𝜆subscript𝛽𝑘1subscript𝛾𝑘1subscript𝛽𝑘1signsubscript𝛾𝑘1\lambda\leftarrow\dfrac{\beta_{k+1}}{\gamma_{k+1}\left(\beta_{k+1}+% \operatorname{sign}(\gamma_{k+1})\right)}italic_λ ← divide start_ARG italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT end_ARG start_ARG italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ( italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT + roman_sign ( italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) ) end_ARG
7:  end if
8:  xk+1xk+wk+1λγk+1(xkxk1+wk+1wk)subscript𝑥𝑘1subscript𝑥𝑘subscript𝑤𝑘1𝜆subscript𝛾𝑘1subscript𝑥𝑘subscript𝑥𝑘1subscript𝑤𝑘1subscript𝑤𝑘x_{k+1}\leftarrow x_{k}+w_{k+1}-\lambda\gamma_{k+1}(x_{k}-x_{k-1}+w_{k+1}-w_{k})italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_λ italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT )
1:  Choose x0nsubscript𝑥0superscript𝑛x_{0}\in\mathbb{R}^{n}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT and r(0,1)𝑟01r\in(0,1)italic_r ∈ ( 0 , 1 ). Set w1=f(x0)1f(x0)subscript𝑤1superscript𝑓superscriptsubscript𝑥01𝑓subscript𝑥0w_{1}=-f^{\prime}(x_{0})^{-1}f(x_{0})italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = - italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_f ( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ), and x1=x0+w1subscript𝑥1subscript𝑥0subscript𝑤1x_{1}=x_{0}+w_{1}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT
2:  for k=1,2,… do
3:     wk+1f(xk)1f(xk)subscript𝑤𝑘1superscript𝑓superscriptsubscript𝑥𝑘1𝑓subscript𝑥𝑘w_{k+1}\leftarrow-f^{\prime}(x_{k})^{-1}f(x_{k})italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← - italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_f ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT )
4:     γk+1(wk+1wk)Twk+1/wk+1wk22subscript𝛾𝑘1superscriptsubscript𝑤𝑘1subscript𝑤𝑘𝑇subscript𝑤𝑘1superscriptsubscriptnormsubscript𝑤𝑘1subscript𝑤𝑘22\gamma_{k+1}\leftarrow(w_{k+1}-w_{k})^{T}w_{k+1}/\|w_{k+1}-w_{k}\|_{2}^{2}italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← ( italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT / ∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT
5:     βk+1rwk+1/wksubscript𝛽𝑘1𝑟normsubscript𝑤𝑘1normsubscript𝑤𝑘\beta_{k+1}\leftarrow r\|w_{k+1}\|/\|w_{k}\|italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← italic_r ∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ / ∥ italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥
6:     λ1𝜆1\lambda\leftarrow 1italic_λ ← 1
7:     if γk+1=0subscript𝛾𝑘10\gamma_{k+1}=0italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = 0 or γk+11subscript𝛾𝑘11\gamma_{k+1}\geq 1italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ≥ 1 then
8:        λ0𝜆0\lambda\leftarrow 0italic_λ ← 0
9:     else if |γk+1|/|1γk+1|>βk+1subscript𝛾𝑘11subscript𝛾𝑘1subscript𝛽𝑘1|\gamma_{k+1}|/|1-\gamma_{k+1}|>\beta_{k+1}| italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT | / | 1 - italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT | > italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT then
10:        λβk+1γk+1(βk+1+sign(γk+1))𝜆subscript𝛽𝑘1subscript𝛾𝑘1subscript𝛽𝑘1signsubscript𝛾𝑘1\lambda\leftarrow\dfrac{\beta_{k+1}}{\gamma_{k+1}\left(\beta_{k+1}+% \operatorname{sign}(\gamma_{k+1})\right)}italic_λ ← divide start_ARG italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT end_ARG start_ARG italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ( italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT + roman_sign ( italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) ) end_ARG
11:     end if
12:     xk+1xk+wk+1λγk+1(xkxk1+wk+1wk)subscript𝑥𝑘1subscript𝑥𝑘subscript𝑤𝑘1𝜆subscript𝛾𝑘1subscript𝑥𝑘subscript𝑥𝑘1subscript𝑤𝑘1subscript𝑤𝑘x_{k+1}\leftarrow x_{k}+w_{k+1}-\lambda\gamma_{k+1}(x_{k}-x_{k-1}+w_{k+1}-w_{k})italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_λ italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT )
13:  end for
Algorithm 5 γ𝛾\gammaitalic_γ-Safeguarded Newton-Anderson (γNA(r))𝛾NA𝑟\left(\gamma\text{NA}(r)\right)( italic_γ NA ( italic_r ) )

To implement γ𝛾\gammaitalic_γNA(r)𝑟(r)( italic_r ), one replaces line 5 in Algorithm 2 with Algorithm 4 giving Algorithm 5. The resulting steps, xk+1=xk+wk+1λγk+1(xkxk1+wk+1wk)subscript𝑥𝑘1subscript𝑥𝑘subscript𝑤𝑘1𝜆subscript𝛾𝑘1subscript𝑥𝑘subscript𝑥𝑘1subscript𝑤𝑘1subscript𝑤𝑘x_{k+1}=x_{k}+w_{k+1}-\lambda\gamma_{k+1}(x_{k}-x_{k-1}+w_{k+1}-w_{k})italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_λ italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ), can be viewed as NA steps scaled by λ𝜆\lambdaitalic_λ towards a Newton step based on a user-chosen parameter, r𝑟ritalic_r, which is set at the start of the solve. This parameter determines how strongly the NA steps are scaled towards a Newton step, i.e., how close a γ𝛾\gammaitalic_γNA(r)𝑟(r)( italic_r ) step is to a Newton step. When r0𝑟0r\approx 0italic_r ≈ 0, the γ𝛾\gammaitalic_γNA(r)𝑟(r)( italic_r ) iterates will be heavily scaled towards the latest Newton step, and when r1𝑟1r\approx 1italic_r ≈ 1, the γ𝛾\gammaitalic_γNA(r)𝑟(r)( italic_r ) iterates will behave more like standard NA. The idea behind γ𝛾\gammaitalic_γ-safeguarding is to take advantage of the particular convergence behavior of Newton’s method near singular points, which we now describe. Since dimN=1dimension𝑁1\dim N=1roman_dim italic_N = 1, N𝑁Nitalic_N is spanned by some nonzero φn𝜑superscript𝑛\varphi\in\mathbb{R}^{n}italic_φ ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT. If the linear operator D^():=PNf′′(x)(φ,PN())assign^𝐷subscript𝑃𝑁superscript𝑓′′superscript𝑥𝜑subscript𝑃𝑁\hat{D}(\cdot):=P_{N}f^{\prime\prime}(x^{*})\left(\varphi,P_{N}(\cdot)\right)over^ start_ARG italic_D end_ARG ( ⋅ ) := italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_f start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ( italic_φ , italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ( ⋅ ) ) is nonsingular as a map from N𝑁Nitalic_N to N𝑁Nitalic_N, then there exists ρ^>0^𝜌0\hat{\rho}>0over^ start_ARG italic_ρ end_ARG > 0 and σ^>0^𝜎0\hat{\sigma}>0over^ start_ARG italic_σ end_ARG > 0 such that f(x)superscript𝑓𝑥f^{\prime}(x)italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x ) is nonsingular for all xW^:=Bρ^(x){x:PR(xx)<σ^PN(xx)}𝑥^𝑊assignsubscript𝐵^𝜌superscript𝑥conditional-set𝑥normsubscript𝑃𝑅𝑥superscript𝑥^𝜎normsubscript𝑃𝑁𝑥superscript𝑥x\in\hat{W}:=B_{\hat{\rho}}(x^{*})\cap\{x:\|P_{R}(x-x^{*})\|<\hat{\sigma}\|P_{% N}(x-x^{*})\|\}italic_x ∈ over^ start_ARG italic_W end_ARG := italic_B start_POSTSUBSCRIPT over^ start_ARG italic_ρ end_ARG end_POSTSUBSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ∩ { italic_x : ∥ italic_P start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ( italic_x - italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ∥ < over^ start_ARG italic_σ end_ARG ∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ( italic_x - italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ∥ } and Newton’s method converges linearly to xsuperscript𝑥x^{*}italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT from any x0W^subscript𝑥0^𝑊x_{0}\in\hat{W}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ∈ over^ start_ARG italic_W end_ARG [11]. The assumption that D^^𝐷\hat{D}over^ start_ARG italic_D end_ARG is nonsingular as a linear map on N𝑁Nitalic_N is equivalent to the assumption that f𝑓fitalic_f is 2-regular at xsuperscript𝑥x^{*}italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT in the direction φ𝜑\varphiitalic_φ [19]. A stronger result due to Griewank [22, Theorem 6.1] says that when D^^𝐷\hat{D}over^ start_ARG italic_D end_ARG is nonsingular, Newton’s method converges from every x0subscript𝑥0x_{0}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT in a starlike region with density 1 with respect to xsuperscript𝑥x^{*}italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, and the iterates lead into W^^𝑊\hat{W}over^ start_ARG italic_W end_ARG provided x0xnormsubscript𝑥0superscript𝑥\|x_{0}-x^{*}\|∥ italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT - italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ is sufficiently small. Thus for our purposes in this paper studying local convergence of NA, it suffices to study the behavior of iterates in W^^𝑊\hat{W}over^ start_ARG italic_W end_ARG. Though the focus of this work is the dimN=1dimension𝑁1\dim N=1roman_dim italic_N = 1 case, we note that Griewank’s Theorem [22, Theorem 6.1] holds for dimN>1dimension𝑁1\dim N>1roman_dim italic_N > 1, and numerical experiments from [10] on small-scale problems indicate that NA and γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) are effective when dimN>1dimension𝑁1\dim N>1roman_dim italic_N > 1. While the current theoretical results for γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) require dimN=1dimension𝑁1\dim N=1roman_dim italic_N = 1, extensions of these results for dimN>1dimension𝑁1\dim N>1roman_dim italic_N > 1 will be studied in future work.

The main challenge in accelerating Newton’s method is ensuring the iterates remain in W^^𝑊\hat{W}over^ start_ARG italic_W end_ARG, which requires PR(xx)/PN(xx)normsubscript𝑃𝑅𝑥superscript𝑥normsubscript𝑃𝑁𝑥superscript𝑥\|P_{R}(x-x^{*})\|/\|P_{N}(x-x^{*})\|∥ italic_P start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ( italic_x - italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ∥ / ∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ( italic_x - italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ∥ to remain bounded. In other words, the iterates can’t be accelerated “too much” along the null space. Evidently, Anderson may accelerate Newton significantly, especially in W^^𝑊\hat{W}over^ start_ARG italic_W end_ARG when dimN=1dimension𝑁1\dim N=1roman_dim italic_N = 1. This is demonstrated by the following proposition. We will use the notation (xk+wk+1)α:=xk+wk+1γk+1(xkxk1+wk+1wk)assignsuperscriptsubscript𝑥𝑘subscript𝑤𝑘1𝛼subscript𝑥𝑘subscript𝑤𝑘1subscript𝛾𝑘1subscript𝑥𝑘subscript𝑥𝑘1subscript𝑤𝑘1subscript𝑤𝑘(x_{k}+w_{k+1})^{\alpha}:=x_{k}+w_{k+1}-\gamma_{k+1}(x_{k}-x_{k-1}+w_{k+1}-w_{% k})( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT := italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ). Similarly, PNekα=PNekγk+1PN(ekek1)subscript𝑃𝑁superscriptsubscript𝑒𝑘𝛼subscript𝑃𝑁subscript𝑒𝑘subscript𝛾𝑘1subscript𝑃𝑁subscript𝑒𝑘subscript𝑒𝑘1P_{N}e_{k}^{\alpha}=P_{N}e_{k}-\gamma_{k+1}P_{N}(e_{k}-e_{k-1})italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT = italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ( italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ), (TkPRek)α:=TkPRekγk+1(TkPRekTk1PRek1)assignsuperscriptsubscript𝑇𝑘subscript𝑃𝑅subscript𝑒𝑘𝛼subscript𝑇𝑘subscript𝑃𝑅subscript𝑒𝑘subscript𝛾𝑘1subscript𝑇𝑘subscript𝑃𝑅subscript𝑒𝑘subscript𝑇𝑘1subscript𝑃𝑅subscript𝑒𝑘1(T_{k}P_{R}e_{k})^{\alpha}:=T_{k}P_{R}e_{k}-\gamma_{k+1}(T_{k}P_{R}e_{k}-T_{k-% 1}P_{R}e_{k-1})( italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_P start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT := italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_P start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ( italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_P start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_T start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT italic_P start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ), and wk+1α=wk+1γk+1(wk+1wk)superscriptsubscript𝑤𝑘1𝛼subscript𝑤𝑘1subscript𝛾𝑘1subscript𝑤𝑘1subscript𝑤𝑘w_{k+1}^{\alpha}=w_{k+1}-\gamma_{k+1}(w_{k+1}-w_{k})italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT = italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ( italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ). Here Tksubscript𝑇𝑘T_{k}italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT denotes a linear map defined in the proof of Proposition 2.1. We also note that D^^𝐷\hat{D}over^ start_ARG italic_D end_ARG is nonsingular if and only if D^(x)():=PNf′′(x)(PN(xx),PN())assign^𝐷𝑥subscript𝑃𝑁superscript𝑓′′superscript𝑥subscript𝑃𝑁𝑥superscript𝑥subscript𝑃𝑁\hat{D}(x)(\cdot):=P_{N}f^{\prime\prime}(x^{*})(P_{N}(x-x^{*}),P_{N}(\cdot))over^ start_ARG italic_D end_ARG ( italic_x ) ( ⋅ ) := italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_f start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ( italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ( italic_x - italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) , italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ( ⋅ ) ) is nonsingular for all x𝑥xitalic_x with PN(xx)0subscript𝑃𝑁𝑥superscript𝑥0P_{N}(x-x^{*})\neq 0italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ( italic_x - italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ≠ 0.

Proposition 2.1.

Let fC3𝑓superscript𝐶3f\in C^{3}italic_f ∈ italic_C start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT, dimN=1dimension𝑁1\dim N=1roman_dim italic_N = 1, D^^𝐷\hat{D}over^ start_ARG italic_D end_ARG nonsingular, and xk,xk1W^subscript𝑥𝑘subscript𝑥𝑘1^𝑊x_{k},x_{k-1}\in\hat{W}italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∈ over^ start_ARG italic_W end_ARG so that xk+1=(xk+wk+1)αsubscript𝑥𝑘1superscriptsubscript𝑥𝑘subscript𝑤𝑘1𝛼x_{k+1}=(x_{k}+w_{k+1})^{\alpha}italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT is well-defined. If PNwk+1PNwksubscript𝑃𝑁subscript𝑤𝑘1subscript𝑃𝑁subscript𝑤𝑘P_{N}w_{k+1}\neq P_{N}w_{k}italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ≠ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and |1γk+1|PNek|γk+1|PNek101subscript𝛾𝑘1normsubscript𝑃𝑁subscript𝑒𝑘subscript𝛾𝑘1normsubscript𝑃𝑁subscript𝑒𝑘10|1-\gamma_{k+1}|\,\|P_{N}e_{k}\|\neq|\gamma_{k+1}|\,\|P_{N}e_{k-1}\|\neq 0| 1 - italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT | ∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ ≠ | italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT | ∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥ ≠ 0, then for sufficiently small σ^>0^𝜎0\hat{\sigma}>0over^ start_ARG italic_σ end_ARG > 0 and ρ^>0^𝜌0\hat{\rho}>0over^ start_ARG italic_ρ end_ARG > 0 there is a constant C=C(σ^,ρ^)𝐶𝐶^𝜎^𝜌C=C(\hat{\sigma},\hat{\rho})italic_C = italic_C ( over^ start_ARG italic_σ end_ARG , over^ start_ARG italic_ρ end_ARG ) such that

PNek+1Cmax{|1γ^|,|γ^|}max{ek2,ek12},normsubscript𝑃𝑁subscript𝑒𝑘1𝐶1^𝛾^𝛾superscriptnormsubscript𝑒𝑘2superscriptnormsubscript𝑒𝑘12\displaystyle\|P_{N}e_{k+1}\|\leq C\max\{|1-\hat{\gamma}|,|\hat{\gamma}|\}\max% \{\|e_{k}\|^{2},\|e_{k-1}\|^{2}\},∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ ≤ italic_C roman_max { | 1 - over^ start_ARG italic_γ end_ARG | , | over^ start_ARG italic_γ end_ARG | } roman_max { ∥ italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , ∥ italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT } , (2)

where γ^:=(PNwk+1)T(PNwk+1PNwk)/PNwk+1PNwk2assign^𝛾superscriptsubscript𝑃𝑁subscript𝑤𝑘1𝑇subscript𝑃𝑁subscript𝑤𝑘1subscript𝑃𝑁subscript𝑤𝑘superscriptnormsubscript𝑃𝑁subscript𝑤𝑘1subscript𝑃𝑁subscript𝑤𝑘2\hat{\gamma}:=(P_{N}w_{k+1})^{T}(P_{N}w_{k+1}-P_{N}w_{k})/\|P_{N}w_{k+1}-P_{N}% w_{k}\|^{2}over^ start_ARG italic_γ end_ARG := ( italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ( italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) / ∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT.

Proof.

First note that wk+1α=PNwk+1α+PRwk+1αsuperscriptsubscript𝑤𝑘1𝛼subscript𝑃𝑁superscriptsubscript𝑤𝑘1𝛼subscript𝑃𝑅superscriptsubscript𝑤𝑘1𝛼w_{k+1}^{\alpha}=P_{N}w_{k+1}^{\alpha}+P_{R}w_{k+1}^{\alpha}italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT = italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT + italic_P start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT. Since γk+1:=argminγwk+1γ(wk+1wk)assignsubscript𝛾𝑘1subscriptargmin𝛾normsubscript𝑤𝑘1𝛾subscript𝑤𝑘1subscript𝑤𝑘\gamma_{k+1}:=\text{argmin}_{\gamma\in\mathbb{R}}\|w_{k+1}-\gamma(w_{k+1}-w_{k% })\|italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT := argmin start_POSTSUBSCRIPT italic_γ ∈ blackboard_R end_POSTSUBSCRIPT ∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_γ ( italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ∥ by Algorithm 2, and RNperpendicular-to𝑅𝑁R\perp Nitalic_R ⟂ italic_N, we have that for any γ𝛾\gamma\in\mathbb{R}italic_γ ∈ blackboard_R,

wk+1α2PNwk+1γ(PNwk+1PNwk)2+PRwk+1γ(PRwk+1PRwk)2.superscriptnormsuperscriptsubscript𝑤𝑘1𝛼2superscriptnormsubscript𝑃𝑁subscript𝑤𝑘1𝛾subscript𝑃𝑁subscript𝑤𝑘1subscript𝑃𝑁subscript𝑤𝑘2superscriptnormsubscript𝑃𝑅subscript𝑤𝑘1𝛾subscript𝑃𝑅subscript𝑤𝑘1subscript𝑃𝑅subscript𝑤𝑘2\displaystyle\|w_{k+1}^{\alpha}\|^{2}\leq\|P_{N}w_{k+1}-\gamma(P_{N}w_{k+1}-P_% {N}w_{k})\|^{2}+\|P_{R}w_{k+1}-\gamma(P_{R}w_{k+1}-P_{R}w_{k})\|^{2}.∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ ∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_γ ( italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ∥ italic_P start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_γ ( italic_P start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_P start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (3)

Taking γ=γ^𝛾^𝛾\gamma=\hat{\gamma}italic_γ = over^ start_ARG italic_γ end_ARG gives PNwk+1γ^(PNwk+1PNwk)2=0superscriptnormsubscript𝑃𝑁subscript𝑤𝑘1^𝛾subscript𝑃𝑁subscript𝑤𝑘1subscript𝑃𝑁subscript𝑤𝑘20\|P_{N}w_{k+1}-\hat{\gamma}(P_{N}w_{k+1}-P_{N}w_{k})\|^{2}=0∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - over^ start_ARG italic_γ end_ARG ( italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = 0. By Proposition 3.1 in [10], we can write

wk+1α=(1/2)PNekα+((TkI)PRek)α+qk1k,superscriptsubscript𝑤𝑘1𝛼12subscript𝑃𝑁superscriptsubscript𝑒𝑘𝛼superscriptsubscript𝑇𝑘𝐼subscript𝑃𝑅subscript𝑒𝑘𝛼superscriptsubscript𝑞𝑘1𝑘\displaystyle w_{k+1}^{\alpha}=-(1/2)P_{N}e_{k}^{\alpha}+\left((T_{k}-I)P_{R}e% _{k}\right)^{\alpha}+q_{k-1}^{k},italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT = - ( 1 / 2 ) italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT + ( ( italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_I ) italic_P start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT + italic_q start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT , (4)

where Tk():=(1/2)D^(xk)1f′′(xk)(ek,)assignsubscript𝑇𝑘12^𝐷superscriptsubscript𝑥𝑘1superscript𝑓′′subscript𝑥𝑘subscript𝑒𝑘T_{k}(\cdot):=(1/2)\hat{D}(x_{k})^{-1}f^{\prime\prime}(x_{k})(e_{k},\cdot)italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( ⋅ ) := ( 1 / 2 ) over^ start_ARG italic_D end_ARG ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_f start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ( italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , ⋅ ) is a linear map whose range lies in N𝑁Nitalic_N [10], and qk1kcmax{|1γk+1|,|γk+1|}max{ek2,ek12}normsuperscriptsubscript𝑞𝑘1𝑘𝑐1subscript𝛾𝑘1subscript𝛾𝑘1superscriptnormsubscript𝑒𝑘2superscriptnormsubscript𝑒𝑘12\|q_{k-1}^{k}\|\leq c\max\{|1-\gamma_{k+1}|,|\gamma_{k+1}|\}\max\{\|e_{k}\|^{2% },\|e_{k-1}\|^{2}\}∥ italic_q start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ∥ ≤ italic_c roman_max { | 1 - italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT | , | italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT | } roman_max { ∥ italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , ∥ italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT } for a constant c𝑐citalic_c determined by f𝑓fitalic_f. Hence wk+1αPRwk+1γ^PR(wk+1wk)cmax{|1γ^|,|γ^|}max{ek2,ek12}.normsuperscriptsubscript𝑤𝑘1𝛼normsubscript𝑃𝑅subscript𝑤𝑘1^𝛾subscript𝑃𝑅subscript𝑤𝑘1subscript𝑤𝑘𝑐1^𝛾^𝛾superscriptnormsubscript𝑒𝑘2superscriptnormsubscript𝑒𝑘12\|w_{k+1}^{\alpha}\|\leq\|P_{R}w_{k+1}-\hat{\gamma}P_{R}(w_{k+1}-w_{k})\|\leq c% \max\{|1-\hat{\gamma}|,|\hat{\gamma}|\}\max\{\|e_{k}\|^{2},\|e_{k-1}\|^{2}\}.∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT ∥ ≤ ∥ italic_P start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - over^ start_ARG italic_γ end_ARG italic_P start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ( italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ∥ ≤ italic_c roman_max { | 1 - over^ start_ARG italic_γ end_ARG | , | over^ start_ARG italic_γ end_ARG | } roman_max { ∥ italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , ∥ italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT } . We also have from Proposition 3.1 in [10] that PNek+1=(1/2)PNekα+(TkPRek)α+qk1ksubscript𝑃𝑁subscript𝑒𝑘112subscript𝑃𝑁superscriptsubscript𝑒𝑘𝛼superscriptsubscript𝑇𝑘subscript𝑃𝑅subscript𝑒𝑘𝛼superscriptsubscript𝑞𝑘1𝑘P_{N}e_{k+1}=(1/2)P_{N}e_{k}^{\alpha}+(T_{k}P_{R}e_{k})^{\alpha}+q_{k-1}^{k}italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = ( 1 / 2 ) italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT + ( italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_P start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT + italic_q start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT. Let μk+1e=(TkPRek)α+qk1k/(1/2)PNekαsuperscriptsubscript𝜇𝑘1𝑒normsuperscriptsubscript𝑇𝑘subscript𝑃𝑅subscript𝑒𝑘𝛼superscriptsubscript𝑞𝑘1𝑘norm12subscript𝑃𝑁superscriptsubscript𝑒𝑘𝛼\mu_{k+1}^{e}=\|(T_{k}P_{R}e_{k})^{\alpha}+q_{k-1}^{k}\|/\|(1/2)P_{N}e_{k}^{% \alpha}\|italic_μ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT = ∥ ( italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_P start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT + italic_q start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ∥ / ∥ ( 1 / 2 ) italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT ∥. This expansion of PNek+1subscript𝑃𝑁subscript𝑒𝑘1P_{N}e_{k+1}italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT combined with Equation (4) gives

PNek+1(1+μk+1e1μk+1e)PNwk+1αCkmax{|1γ^|,|γ^|}max{ek2,ek12},normsubscript𝑃𝑁subscript𝑒𝑘11superscriptsubscript𝜇subscript𝑘1𝑒1superscriptsubscript𝜇𝑘1𝑒normsubscript𝑃𝑁superscriptsubscript𝑤𝑘1𝛼subscript𝐶𝑘1^𝛾^𝛾superscriptnormsubscript𝑒𝑘2superscriptnormsubscript𝑒𝑘12\displaystyle\|P_{N}e_{k+1}\|\leq\left(\frac{1+\mu_{k_{+}1}^{e}}{1-\mu_{k+1}^{% e}}\right)\|P_{N}w_{k+1}^{\alpha}\|\leq C_{k}\max\{|1-\hat{\gamma}|,|\hat{% \gamma}|\}\max\{\|e_{k}\|^{2},\|e_{k-1}\|^{2}\},∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ ≤ ( divide start_ARG 1 + italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT + end_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT end_ARG start_ARG 1 - italic_μ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT end_ARG ) ∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT ∥ ≤ italic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT roman_max { | 1 - over^ start_ARG italic_γ end_ARG | , | over^ start_ARG italic_γ end_ARG | } roman_max { ∥ italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , ∥ italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT } , (5)

where we define Ck:=c(1+μk+1e)(1μk+1e)1assignsubscript𝐶𝑘𝑐1superscriptsubscript𝜇subscript𝑘1𝑒superscript1superscriptsubscript𝜇𝑘1𝑒1C_{k}:=c(1+\mu_{k_{+}1}^{e})(1-\mu_{k+1}^{e})^{-1}italic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT := italic_c ( 1 + italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUBSCRIPT + end_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT ) ( 1 - italic_μ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT. One can show that μk+1e0superscriptsubscript𝜇𝑘1𝑒0\mu_{k+1}^{e}\to 0italic_μ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT → 0 as σ^^𝜎\hat{\sigma}over^ start_ARG italic_σ end_ARG and ρ^^𝜌\hat{\rho}over^ start_ARG italic_ρ end_ARG tend to zero [10]. Thus for sufficiently small σ^^𝜎\hat{\sigma}over^ start_ARG italic_σ end_ARG and ρ^^𝜌\hat{\rho}over^ start_ARG italic_ρ end_ARG we have CkCsubscript𝐶𝑘𝐶C_{k}\leq Citalic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ≤ italic_C. This completes the proof. ∎

Note that γ^^𝛾\hat{\gamma}over^ start_ARG italic_γ end_ARG can be very large when PNwk+1PNwksubscript𝑃𝑁subscript𝑤𝑘1subscript𝑃𝑁subscript𝑤𝑘P_{N}w_{k+1}\approx P_{N}w_{k}italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ≈ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, which may correspond to the terminal phase of the solve. So this bound is meaningful in the asymptotic regime when there is still a significant decrease in the residual at each step. Such a bound is good for a single step, but this dramatic acceleration of PNek+1subscript𝑃𝑁subscript𝑒𝑘1P_{N}e_{k+1}italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT could place xk+1subscript𝑥𝑘1x_{k+1}italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT outside the domain of invertibility W^^𝑊\hat{W}over^ start_ARG italic_W end_ARG, i.e., xk+1subscript𝑥𝑘1x_{k+1}italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT may stray too far from N𝑁Nitalic_N. This is where γ𝛾\gammaitalic_γ-safeguarding is useful. It ensures that the γ𝛾\gammaitalic_γNA(r)𝑟(r)( italic_r ) iterates remain within W^^𝑊\hat{W}over^ start_ARG italic_W end_ARG by taking advantage of the way Newton steps are attracted to N𝑁Nitalic_N and scaling NA steps towards Newton steps when the conditions of Algorithm 4 are met. This is also the key to the convergence proof of γ𝛾\gammaitalic_γNA(r)𝑟(r)( italic_r ) given in [10]. However, given the results of [47], if the problem is nonsingular one should use Newton’s method without Anderson, but it is not always obvious a priori if the problem at hand is singular or nonsingular. In the next section, we develop an adaptive version of γ𝛾\gammaitalic_γNA(r)𝑟(r)( italic_r ) that enjoys guaranteed local convergence when applied to singular problems, but can also detect nonsingular problems automatically, at no additional computational cost, and “turn off” NA in response. This leads to local quadratic convergence if the problem is nonsingular.

3 Adaptive γ𝛾\gammaitalic_γ-safeguarding

It was observed in [10] that γ𝛾\gammaitalic_γNA(r)𝑟(r)( italic_r ) performed competitively with standard NA(1). For example, γ𝛾\gammaitalic_γNA(0.5) could outperform NA(1) when applied to certain nonsingular problems. This is not surprising given the results of [47], and it had previously been observed numerically in [45] that NA does not necessarily improve convergence when applied to nonsingular problems. Thus, by setting the γ𝛾\gammaitalic_γ-safeguarding parameter r=0.5𝑟0.5r=0.5italic_r = 0.5, thereby scaling the iterates closer to Newton steps, we see less of the effect of a full NA step on the order of convergence.

Even if it is known that the problem is nonsingular, one may still wish to use NA to take advantage of the larger domain of convergence [45]. In such a scenario one could set r𝑟ritalic_r close to zero, but this means the effect of NA is never completely eliminated which could lead to a smaller order of convergence relative to Newton for nonsingular problems. This, and the fact that often it is not known a priori if the problem is singular or nonsingular, motivates the development of an adaptive form of γ𝛾\gammaitalic_γNA(r𝑟ritalic_r) that can automatically detect nonsingular problems, and scale an NA step accordingly, without sacrificing local convergence and acceleration for singular problems. We will denote this adaptive choice of r𝑟ritalic_r from Algorithm 4 by rk+1subscript𝑟𝑘1r_{k+1}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT, with k𝑘kitalic_k denoting the iteration count. There are three criteria that the choice of rk+1subscript𝑟𝑘1r_{k+1}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT should satisfy, which we record in Criteria 3.1.

Criteria 3.1.

An adaptive γ𝛾\gammaitalic_γ-safeguarding tolerance rk+1subscript𝑟𝑘1r_{k+1}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT should satisfy the following.

  1. 0.

    rk+11much-less-thansubscript𝑟𝑘11r_{k+1}{\ll}1italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ≪ 1 if PNek/PNek11much-less-thannormsubscript𝑃𝑁subscript𝑒𝑘normsubscript𝑃𝑁subscript𝑒𝑘11\|P_{N}e_{k}\|/\|P_{N}e_{k-1}\|{\ll}1∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ / ∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥ ≪ 1;

  2. 0.

    rk+11subscript𝑟𝑘11r_{k+1}\approx 1italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ≈ 1 if PNek/PNek11normsubscript𝑃𝑁subscript𝑒𝑘normsubscript𝑃𝑁subscript𝑒𝑘11\|P_{N}e_{k}\|/\|P_{N}e_{k-1}\|\approx 1∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ / ∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥ ≈ 1; and

  3. 0.

    limkrk+1=0subscript𝑘subscript𝑟𝑘10\lim_{k\to\infty}r_{k+1}=0roman_lim start_POSTSUBSCRIPT italic_k → ∞ end_POSTSUBSCRIPT italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = 0 if f(x)superscript𝑓superscript𝑥f^{\prime}(x^{*})italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) is nonsingular.

Criterion 3.1.3.1 says that if PNek/PNek1normsubscript𝑃𝑁subscript𝑒𝑘normsubscript𝑃𝑁subscript𝑒𝑘1\|P_{N}e_{k}\|/\|P_{N}e_{k-1}\|∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ / ∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥ is very small, then we want to scale the NA step generated from xksubscript𝑥𝑘x_{k}italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and xk1subscript𝑥𝑘1x_{k-1}italic_x start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT heavily towards xk+wk+1subscript𝑥𝑘subscript𝑤𝑘1x_{k}+w_{k+1}italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT. In the singular case, this will (locally) keep xk+1subscript𝑥𝑘1x_{k+1}italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT within the domain of invertibility. Alternatively, if the problem is nonsingular, but close to a singular problem, then scaling NA towards a Newton step is also preferred when PNek/PNek1normsubscript𝑃𝑁subscript𝑒𝑘normsubscript𝑃𝑁subscript𝑒𝑘1\|P_{N}e_{k}\|/\|P_{N}e_{k-1}\|∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ / ∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥ is small since then we do not slow Newton’s fast local quadratic convergence. Criterion 3.1.3.1 says that if PNek/PNek1normsubscript𝑃𝑁subscript𝑒𝑘normsubscript𝑃𝑁subscript𝑒𝑘1\|P_{N}e_{k}\|/\|P_{N}e_{k-1}\|∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ / ∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥ is close to one, then the error is not decreasing significantly, and we want to allow NA to act on xksubscript𝑥𝑘x_{k}italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and xk1subscript𝑥𝑘1x_{k-1}italic_x start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT without significant scaling of γk+1subscript𝛾𝑘1\gamma_{k+1}italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT from safeguarding. Lastly, Criterion 3.1.3.1 is important because if f(x)superscript𝑓superscript𝑥f^{\prime}(x^{*})italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) is nonsingular, Newton’s method will converge quadratically in a neighborhood of xsuperscript𝑥x^{*}italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. We therefore want to “turn off” NA near xsuperscript𝑥x^{*}italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, and insisting that rk+10subscript𝑟𝑘10r_{k+1}\to 0italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT → 0 asymptotically achieves this.

Observing Criteria 3.1.3.1-3.1, one may note that we essentially want rk+1subscript𝑟𝑘1r_{k+1}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT to behave like PNek/PNek1normsubscript𝑃𝑁subscript𝑒𝑘normsubscript𝑃𝑁subscript𝑒𝑘1\|P_{N}e_{k}\|/\|P_{N}e_{k-1}\|∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ / ∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥ within the domain of convergence W^^𝑊\hat{W}over^ start_ARG italic_W end_ARG. Of course, we can not compute PNek/PNek1normsubscript𝑃𝑁subscript𝑒𝑘normsubscript𝑃𝑁subscript𝑒𝑘1\|P_{N}e_{k}\|/\|P_{N}e_{k-1}\|∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ / ∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥, but Equation (4) says that in W^^𝑊\hat{W}over^ start_ARG italic_W end_ARG, wk+1PNeksubscript𝑤𝑘1subscript𝑃𝑁subscript𝑒𝑘w_{k+1}\approx P_{N}e_{k}italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ≈ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. So if we take rk+1=wk+1/wksubscript𝑟𝑘1normsubscript𝑤𝑘1normsubscript𝑤𝑘r_{k+1}=\|w_{k+1}\|/\|w_{k}\|italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = ∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ / ∥ italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥, we can expect Criteria 3.1.3.1-3.1 to be enforced locally. For Criterion 3.1.3.1, if f(x)superscript𝑓superscript𝑥f^{\prime}(x^{*})italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) is nonsingular, then locally we will have wk+1/wk0normsubscript𝑤𝑘1normsubscript𝑤𝑘0\|w_{k+1}\|/\|w_{k}\|\to 0∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ / ∥ italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ → 0. Thus rk+1=wk+1/wksubscript𝑟𝑘1normsubscript𝑤𝑘1normsubscript𝑤𝑘r_{k+1}=\|w_{k+1}\|/\|w_{k}\|italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = ∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ / ∥ italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ satisfies the three criteria within the domain of convergence. With this choice of rk+1subscript𝑟𝑘1r_{k+1}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT, we have adaptive γ𝛾\gammaitalic_γ-safeguarding and γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ).

Algorithm 6 Adaptive γ𝛾\gammaitalic_γ-safeguarding
1:  Given xksubscript𝑥𝑘x_{k}italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, xk1subscript𝑥𝑘1x_{k-1}italic_x start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT, wk+1subscript𝑤𝑘1w_{k+1}italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT, wksubscript𝑤𝑘w_{k}italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, γk+1subscript𝛾𝑘1\gamma_{k+1}italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT, and r^(0,1)^𝑟01\hat{r}\in(0,1)over^ start_ARG italic_r end_ARG ∈ ( 0 , 1 ), set ηk+1=wk+1/wksubscript𝜂𝑘1normsubscript𝑤𝑘1normsubscript𝑤𝑘\eta_{k+1}=\|w_{k+1}\|/\|w_{k}\|italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = ∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ / ∥ italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥, rk+1=min{ηk+1,r^}subscript𝑟𝑘1subscript𝜂𝑘1^𝑟r_{k+1}=\min\{\eta_{k+1},\hat{r}\}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = roman_min { italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT , over^ start_ARG italic_r end_ARG }, and λa=1superscript𝜆𝑎1\lambda^{a}=1italic_λ start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT = 1.
2:  βk+1rk+1ηk+1subscript𝛽𝑘1subscript𝑟𝑘1subscript𝜂𝑘1\beta_{k+1}\leftarrow r_{k+1}\eta_{k+1}italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT
3:  if γk+1=0subscript𝛾𝑘10\gamma_{k+1}=0italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = 0 or γk+11subscript𝛾𝑘11\gamma_{k+1}\geq 1italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ≥ 1 then
4:     λa0superscript𝜆𝑎0\lambda^{a}\leftarrow 0italic_λ start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT ← 0
5:  else if |γk+1|/|1γk+1|>βk+1subscript𝛾𝑘11subscript𝛾𝑘1subscript𝛽𝑘1|\gamma_{k+1}|/|1-\gamma_{k+1}|>\beta_{k+1}| italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT | / | 1 - italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT | > italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT then
6:     λaβk+1γk+1(βk+1+sign(γk+1))superscript𝜆𝑎subscript𝛽𝑘1subscript𝛾𝑘1subscript𝛽𝑘1signsubscript𝛾𝑘1\lambda^{a}\leftarrow\dfrac{\beta_{k+1}}{\gamma_{k+1}\left(\beta_{k+1}+% \operatorname{sign}(\gamma_{k+1})\right)}italic_λ start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT ← divide start_ARG italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT end_ARG start_ARG italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ( italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT + roman_sign ( italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) ) end_ARG
7:  end if
8:  xk+1xk+wk+1λaγk+1(xkxk1+wk+1wk)subscript𝑥𝑘1subscript𝑥𝑘subscript𝑤𝑘1superscript𝜆𝑎subscript𝛾𝑘1subscript𝑥𝑘subscript𝑥𝑘1subscript𝑤𝑘1subscript𝑤𝑘x_{k+1}\leftarrow x_{k}+w_{k+1}-\lambda^{a}\gamma_{k+1}(x_{k}-x_{k-1}+w_{k+1}-% w_{k})italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_λ start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT )
1:  Choose x0nsubscript𝑥0superscript𝑛x_{0}\in\mathbb{R}^{n}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT and r^(0,1)^𝑟01\hat{r}\in(0,1)over^ start_ARG italic_r end_ARG ∈ ( 0 , 1 ). Set w1=f(x0)1f(x0)subscript𝑤1superscript𝑓superscriptsubscript𝑥01𝑓subscript𝑥0w_{1}=-f^{\prime}(x_{0})^{-1}f(x_{0})italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = - italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_f ( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) and x1=x0+w1subscript𝑥1subscript𝑥0subscript𝑤1x_{1}=x_{0}+w_{1}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT
2:  for k=1,2,… do
3:     wk+1f(xk)1f(xk)subscript𝑤𝑘1superscript𝑓superscriptsubscript𝑥𝑘1𝑓subscript𝑥𝑘w_{k+1}\leftarrow-f^{\prime}(x_{k})^{-1}f(x_{k})italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← - italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_f ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT )
4:     γk+1(wk+1wk)Twk+1/wk+1wk22subscript𝛾𝑘1superscriptsubscript𝑤𝑘1subscript𝑤𝑘𝑇subscript𝑤𝑘1superscriptsubscriptnormsubscript𝑤𝑘1subscript𝑤𝑘22\gamma_{k+1}\leftarrow(w_{k+1}-w_{k})^{T}w_{k+1}/\|w_{k+1}-w_{k}\|_{2}^{2}italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← ( italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT / ∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT
5:     ηk+1wk+1/wksubscript𝜂𝑘1normsubscript𝑤𝑘1normsubscript𝑤𝑘\eta_{k+1}\leftarrow\|w_{k+1}\|/\|w_{k}\|italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← ∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ / ∥ italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥
6:     rk+1min{ηk+1,r^}subscript𝑟𝑘1subscript𝜂𝑘1^𝑟r_{k+1}\leftarrow\min\{\eta_{k+1},\hat{r}\}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← roman_min { italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT , over^ start_ARG italic_r end_ARG }
7:     βk+1rk+1ηk+1subscript𝛽𝑘1subscript𝑟𝑘1subscript𝜂𝑘1\beta_{k+1}\leftarrow r_{k+1}\eta_{k+1}italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT
8:     λa1superscript𝜆𝑎1\lambda^{a}\leftarrow 1italic_λ start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT ← 1
9:     if γk+1=0subscript𝛾𝑘10\gamma_{k+1}=0italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = 0 or γk+11subscript𝛾𝑘11\gamma_{k+1}\geq 1italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ≥ 1 then
10:        λa0superscript𝜆𝑎0\lambda^{a}\leftarrow 0italic_λ start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT ← 0
11:     else if |γk+1|/|1γk+1|>βk+1subscript𝛾𝑘11subscript𝛾𝑘1subscript𝛽𝑘1|\gamma_{k+1}|/|1-\gamma_{k+1}|>\beta_{k+1}| italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT | / | 1 - italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT | > italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT then
12:        λaβk+1γk+1(βk+1+sign(γk+1))superscript𝜆𝑎subscript𝛽𝑘1subscript𝛾𝑘1subscript𝛽𝑘1signsubscript𝛾𝑘1\lambda^{a}\leftarrow\dfrac{\beta_{k+1}}{\gamma_{k+1}\left(\beta_{k+1}+% \operatorname{sign}(\gamma_{k+1})\right)}italic_λ start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT ← divide start_ARG italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT end_ARG start_ARG italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ( italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT + roman_sign ( italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) ) end_ARG
13:     end if
14:     xk+1xk+wk+1λaγk+1(xkxk1+wk+1wk)subscript𝑥𝑘1subscript𝑥𝑘subscript𝑤𝑘1superscript𝜆𝑎subscript𝛾𝑘1subscript𝑥𝑘subscript𝑥𝑘1subscript𝑤𝑘1subscript𝑤𝑘x_{k+1}\leftarrow x_{k}+w_{k+1}-\lambda^{a}\gamma_{k+1}(x_{k}-x_{k-1}+w_{k+1}-% w_{k})italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ← italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_λ start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT )
15:  end for
Algorithm 7 Adaptive γ𝛾\gammaitalic_γ-Safeguarded Newton-Anderson (γNAA(r^)))\left(\operatorname{\gamma\text{NAA}(\hat{r})})\right)( start_OPFUNCTION italic_γ NAA ( over^ start_ARG roman_r end_ARG ) end_OPFUNCTION ) )

Adaptive γ𝛾\gammaitalic_γ-safeguarding differs from Algorithm 4 only in line 2. In Algorithm 4, βk+1=rηk+1subscript𝛽𝑘1𝑟subscript𝜂𝑘1\beta_{k+1}=r\eta_{k+1}italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = italic_r italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT whereas βk+1=rk+1ηk+1subscript𝛽𝑘1subscript𝑟𝑘1subscript𝜂𝑘1\beta_{k+1}=r_{k+1}\eta_{k+1}italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT in Algorithm 6 with rk+1=min{ηk+1,r^}subscript𝑟𝑘1subscript𝜂𝑘1^𝑟r_{k+1}=\min\{\eta_{k+1},\hat{r}\}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = roman_min { italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT , over^ start_ARG italic_r end_ARG }. This one change can have a significant impact on convergence as demonstrated in Section 4, and can enable locally quadratic convergence when applied to nonsingular problems (see Corollary 3.1). Similar to γ𝛾\gammaitalic_γNA(r)𝑟(r)( italic_r ), one implements γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) by replacing line 5 in Algorithm 2 with Algorithm 6 and setting r^^𝑟\hat{r}over^ start_ARG italic_r end_ARG at the start of the solve. The result is Algorithm 7. The choice of r^^𝑟\hat{r}over^ start_ARG italic_r end_ARG here sets the weakest safeguarding the user wants to impose. Thus we are always safeguarding at least as strictly as standard γ𝛾\gammaitalic_γ-safeguarding with r=r^𝑟^𝑟r=\hat{r}italic_r = over^ start_ARG italic_r end_ARG. Stated concisely, we have rk+1r^subscript𝑟𝑘1^𝑟r_{k+1}\leq\hat{r}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ≤ over^ start_ARG italic_r end_ARG. Local convergence of γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) then follows from Theorem 6.16.16.16.1 in [10]. To state this precisely, let λk+1subscript𝜆𝑘1\lambda_{k+1}italic_λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT be the value of λasuperscript𝜆𝑎\lambda^{a}italic_λ start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT computed by Algorithm 6 at step k𝑘kitalic_k, θk+1λ=wk+1λk+1γk+1(wk+1wk)/wk+1superscriptsubscript𝜃𝑘1𝜆normsubscript𝑤𝑘1subscript𝜆𝑘1subscript𝛾𝑘1subscript𝑤𝑘1subscript𝑤𝑘normsubscript𝑤𝑘1\theta_{k+1}^{\lambda}=\|w_{k+1}-\lambda_{k+1}\gamma_{k+1}(w_{k+1}-w_{k})\|/\|% w_{k+1}\|italic_θ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT = ∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ( italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ∥ / ∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥, xk+1NA:=xk+wk+1λk+1γk+1(xkxk1+wk+1wk)assignsuperscriptsubscript𝑥𝑘1𝑁𝐴subscript𝑥𝑘subscript𝑤𝑘1subscript𝜆𝑘1subscript𝛾𝑘1subscript𝑥𝑘subscript𝑥𝑘1subscript𝑤𝑘1subscript𝑤𝑘x_{k+1}^{NA}:=x_{k}+w_{k+1}-\lambda_{k+1}\gamma_{k+1}(x_{k}-x_{k-1}+w_{k+1}-w_% {k})italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N italic_A end_POSTSUPERSCRIPT := italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ), and σk:=PRek/PNekassignsubscript𝜎𝑘normsubscript𝑃𝑅subscript𝑒𝑘normsubscript𝑃𝑁subscript𝑒𝑘\sigma_{k}:=\|P_{R}e_{k}\|/\|P_{N}e_{k}\|italic_σ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT := ∥ italic_P start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ / ∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥. Then we have the following theorem.

Theorem 3.1.

Let dimN=1dimension𝑁1\dim N=1roman_dim italic_N = 1, and let D^^𝐷\hat{D}over^ start_ARG italic_D end_ARG be invertible as a map on N𝑁Nitalic_N. Let Wk:=Bek(x){x:PR(xx)<σkPN(xx)}assignsubscript𝑊𝑘subscript𝐵normsubscript𝑒𝑘superscript𝑥conditional-set𝑥normsubscript𝑃𝑅𝑥superscript𝑥subscript𝜎𝑘normsubscript𝑃𝑁𝑥superscript𝑥W_{k}:=B_{\|e_{k}\|}(x^{*})\cap\{x:\|P_{R}(x-x^{*})\|<\sigma_{k}\|P_{N}(x-x^{*% })\|\}italic_W start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT := italic_B start_POSTSUBSCRIPT ∥ italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ end_POSTSUBSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ∩ { italic_x : ∥ italic_P start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ( italic_x - italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ∥ < italic_σ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ( italic_x - italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ∥ }. If x0subscript𝑥0x_{0}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT is chosen so that σ0<σ^subscript𝜎0^𝜎\sigma_{0}<\hat{\sigma}italic_σ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT < over^ start_ARG italic_σ end_ARG and e0<ρ^normsubscript𝑒0^𝜌\|e_{0}\|<\hat{\rho}∥ italic_e start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ∥ < over^ start_ARG italic_ρ end_ARG, for sufficiently small σ^^𝜎\hat{\sigma}over^ start_ARG italic_σ end_ARG and ρ^^𝜌\hat{\rho}over^ start_ARG italic_ρ end_ARG, x1=x0+w1subscript𝑥1subscript𝑥0subscript𝑤1x_{1}=x_{0}+w_{1}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, and xk+1=xk+1NAsubscript𝑥𝑘1superscriptsubscript𝑥𝑘1𝑁𝐴x_{k+1}=x_{k+1}^{NA}italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N italic_A end_POSTSUPERSCRIPT for k1𝑘1k\geq 1italic_k ≥ 1, then Wk+1W0subscript𝑊𝑘1subscript𝑊0W_{k+1}\subset W_{0}italic_W start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ⊂ italic_W start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT for all k0𝑘0k\geq 0italic_k ≥ 0 and xkxsubscript𝑥𝑘superscript𝑥x_{k}\to x^{*}italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT → italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. That is, {xk}subscript𝑥𝑘\{x_{k}\}{ italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } remains well-defined and converges to xsuperscript𝑥x^{*}italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. Furthermore, there exist constants C>0𝐶0C>0italic_C > 0 and κ(1/2,1)𝜅121\kappa\in(1/2,1)italic_κ ∈ ( 1 / 2 , 1 ) such that

PRek+1normsubscript𝑃𝑅subscript𝑒𝑘1\displaystyle\|P_{R}e_{k+1}\|∥ italic_P start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ Cmax{|1λk+1γk+1|ek2,|λk+1γk+1|ek12}absent𝐶1subscript𝜆𝑘1subscript𝛾𝑘1superscriptnormsubscript𝑒𝑘2subscript𝜆𝑘1subscript𝛾𝑘1superscriptnormsubscript𝑒𝑘12\displaystyle\leq C\max\{|1-\lambda_{k+1}\gamma_{k+1}|\,\|e_{k}\|^{2},|\lambda% _{k+1}\gamma_{k+1}|\,\|e_{k-1}\|^{2}\}≤ italic_C roman_max { | 1 - italic_λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT | ∥ italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , | italic_λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT | ∥ italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT } (6)
PNek+1normsubscript𝑃𝑁subscript𝑒𝑘1\displaystyle\|P_{N}e_{k+1}\|∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ κθk+1λPNekabsent𝜅superscriptsubscript𝜃𝑘1𝜆normsubscript𝑃𝑁subscript𝑒𝑘\displaystyle\leq\kappa\theta_{k+1}^{\lambda}\|P_{N}e_{k}\|≤ italic_κ italic_θ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT ∥ italic_P start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ (7)

for all k1𝑘1k\geq 1italic_k ≥ 1.

Under the assumptions of Theorem 3.1, Griewank’s Theorem [22, Theorem 6.1] says that Newton’s method almost surely leads into the domain of convergence W^^𝑊\hat{W}over^ start_ARG italic_W end_ARG provided x0subscript𝑥0x_{0}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT is sufficiently close to xsuperscript𝑥x^{*}italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. This effectively means that if the sequence xksubscript𝑥𝑘x_{k}italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT generated by γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) approaches xsuperscript𝑥x^{*}italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT we will have almost sure convergence eventually, and this convergence will be faster than Newton. The precise improvement is determined by the asymptotic behavior of θk+1λsuperscriptsubscript𝜃𝑘1𝜆\theta_{k+1}^{\lambda}italic_θ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT. Globalization techniques such as linesearch methods may be used to bring the γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) iterates closer to xsuperscript𝑥x^{*}italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. In particular, γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) with an Armijo linesearch was shown to be effective in [10].

Our choice of rk+1subscript𝑟𝑘1r_{k+1}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT in Algorithm 6 is partially motivated by Criterion 3.1.3.1. That is, in the case of a nonsingular problem, we prefer to use Newton asymptotically rather than NA. Hence we want rk+1subscript𝑟𝑘1r_{k+1}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT to tend to zero as our solver converges, thereby scaling the γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) iterates heavily towards pure Newton steps in the asymptotic regime and enjoying quadratic convergence locally. The remainder of this section is dedicated to quantifying how close a γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) iterate is to a standard Newton iterate in the asymptotic regime when f(x)superscript𝑓superscript𝑥f^{\prime}(x^{*})italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) is nonsingular. The main result is Theorem 3.2 below which bounds xk+1NAxk+1Newtnormsuperscriptsubscript𝑥𝑘1NAsuperscriptsubscript𝑥𝑘1Newt\|x_{k+1}^{\operatorname{NA}}-x_{k+1}^{\operatorname{Newt}}\|∥ italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_NA end_POSTSUPERSCRIPT - italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT ∥ locally, where xk+1Newt:=xk+wk+1assignsuperscriptsubscript𝑥𝑘1Newtsubscript𝑥𝑘subscript𝑤𝑘1x_{k+1}^{\operatorname{Newt}}:=x_{k}+w_{k+1}italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT := italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT, and xk+1NA:=xk+1Newtλk+1γk+1(xk+1NewtxkNewt)assignsuperscriptsubscript𝑥𝑘1NAsuperscriptsubscript𝑥𝑘1Newtsubscript𝜆𝑘1subscript𝛾𝑘1superscriptsubscript𝑥𝑘1Newtsuperscriptsubscript𝑥𝑘Newtx_{k+1}^{\operatorname{NA}}:=x_{k+1}^{\operatorname{Newt}}-\lambda_{k+1}\gamma% _{k+1}(x_{k+1}^{\operatorname{Newt}}-x_{k}^{\operatorname{Newt}})italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_NA end_POSTSUPERSCRIPT := italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT - italic_λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT - italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT ). This notation is introduced to emphasize the Newton and Newton-Anderson iterates in the comparison. We will also define ekNewt:=xkNewtx=xk1+wkxassignsuperscriptsubscript𝑒𝑘Newtsuperscriptsubscript𝑥𝑘Newtsuperscript𝑥subscript𝑥𝑘1subscript𝑤𝑘superscript𝑥e_{k}^{\operatorname{Newt}}:=x_{k}^{\operatorname{Newt}}-x^{*}=x_{k-1}+w_{k}-x% ^{*}italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT := italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT - italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_x start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. The following lemma will be used in the proof of Theorem 3.2. Lemma 3.1 bounds |λk+1γk+1|subscript𝜆𝑘1subscript𝛾𝑘1|\lambda_{k+1}\gamma_{k+1}|| italic_λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT |, the scaled γk+1subscript𝛾𝑘1\gamma_{k+1}italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT returned by γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) at iteration k𝑘kitalic_k, in terms of ηk+1subscript𝜂𝑘1\eta_{k+1}italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT and rk+1=min{ηk+1,r^}subscript𝑟𝑘1subscript𝜂𝑘1^𝑟r_{k+1}=\min\{\eta_{k+1},\hat{r}\}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = roman_min { italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT , over^ start_ARG italic_r end_ARG }. The proof consists of walking through the cases in Algorithm 6, and is therefore left to the interested reader.

Lemma 3.1.

Let ηk+1=wk+1/wksubscript𝜂𝑘1normsubscript𝑤𝑘1normsubscript𝑤𝑘\eta_{k+1}=\|w_{k+1}\|/\|w_{k}\|italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = ∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ / ∥ italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ and r^(0,1)^𝑟01\hat{r}\in(0,1)over^ start_ARG italic_r end_ARG ∈ ( 0 , 1 ). Define rk+1:=min{ηk+1,r^}assignsubscript𝑟𝑘1subscript𝜂𝑘1^𝑟r_{k+1}:=\min\{\eta_{k+1},\hat{r}\}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT := roman_min { italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT , over^ start_ARG italic_r end_ARG } and βk+1:=rk+1ηk+1assignsubscript𝛽𝑘1subscript𝑟𝑘1subscript𝜂𝑘1\beta_{k+1}:=r_{k+1}\eta_{k+1}italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT := italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT as in Algorithm 6. Let λk+1subscript𝜆𝑘1\lambda_{k+1}italic_λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT be the value computed by Algorithm 6 at iteration k𝑘kitalic_k. If ηk+1<1subscript𝜂𝑘11\eta_{k+1}<1italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT < 1, then λk+1γk+1subscript𝜆𝑘1subscript𝛾𝑘1\lambda_{k+1}\gamma_{k+1}italic_λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT returned by Algorithm 6 satisfies

|λk+1γk+1|βk+11βk+1.subscript𝜆𝑘1subscript𝛾𝑘1subscript𝛽𝑘11subscript𝛽𝑘1\displaystyle|\lambda_{k+1}\gamma_{k+1}|\leq\frac{\beta_{k+1}}{1-\beta_{k+1}}.| italic_λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT | ≤ divide start_ARG italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT end_ARG start_ARG 1 - italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT end_ARG . (8)

when λk+1=1subscript𝜆𝑘11\lambda_{k+1}=1italic_λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = 1, and

|λk+1γk+1|=βk+11+sign(γk+1)βk+1.subscript𝜆𝑘1subscript𝛾𝑘1subscript𝛽𝑘11signsubscript𝛾𝑘1subscript𝛽𝑘1\displaystyle|\lambda_{k+1}\gamma_{k+1}|=\frac{\beta_{k+1}}{1+\operatorname{% sign}(\gamma_{k+1})\beta_{k+1}}.| italic_λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT | = divide start_ARG italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT end_ARG start_ARG 1 + roman_sign ( italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT end_ARG . (9)

when λk+1<1subscript𝜆𝑘11\lambda_{k+1}<1italic_λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT < 1.

With Lemma 3.1, we can bound xk+1NAxk+1Newtnormsuperscriptsubscript𝑥𝑘1NAsuperscriptsubscript𝑥𝑘1Newt\|x_{k+1}^{\operatorname{NA}}-x_{k+1}^{\operatorname{Newt}}\|∥ italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_NA end_POSTSUPERSCRIPT - italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT ∥ in terms of eknormsubscript𝑒𝑘\|e_{k}\|∥ italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥, ek1normsubscript𝑒𝑘1\|e_{k-1}\|∥ italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥, and ηk+1subscript𝜂𝑘1\eta_{k+1}italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT.

Theorem 3.2.

If f(x)superscript𝑓superscript𝑥f^{\prime}(x^{*})italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) is nonsingular, then there exists a ρ>0𝜌0\rho>0italic_ρ > 0 and a constant C𝐶Citalic_C depending only on f𝑓fitalic_f such that for xksubscript𝑥𝑘x_{k}italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and xk1subscript𝑥𝑘1x_{k-1}italic_x start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT in Bρ(x)subscript𝐵𝜌superscript𝑥B_{\rho}(x^{*})italic_B start_POSTSUBSCRIPT italic_ρ end_POSTSUBSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) and ηk+1<1subscript𝜂𝑘11\eta_{k+1}<1italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT < 1,

xk+1NAxk+1NewtC(βk+11βk+1)max{ek2,ek12}normsuperscriptsubscript𝑥𝑘1NAsuperscriptsubscript𝑥𝑘1Newt𝐶subscript𝛽𝑘11subscript𝛽𝑘1superscriptnormsubscript𝑒𝑘2superscriptnormsubscript𝑒𝑘12\displaystyle\|x_{k+1}^{\operatorname{NA}}-x_{k+1}^{\operatorname{Newt}}\|\leq C% \left(\frac{\beta_{k+1}}{1-\beta_{k+1}}\right)\max\{\|e_{k}\|^{2},\|e_{k-1}\|^% {2}\}∥ italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_NA end_POSTSUPERSCRIPT - italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT ∥ ≤ italic_C ( divide start_ARG italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT end_ARG start_ARG 1 - italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT end_ARG ) roman_max { ∥ italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , ∥ italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT } (10)

when λk+1=1subscript𝜆𝑘11\lambda_{k+1}=1italic_λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = 1, and

xk+1NAxk+1NewtC(βk+11+sign(γk+1)βk+1)max{ek2,ek12}normsuperscriptsubscript𝑥𝑘1NAsuperscriptsubscript𝑥𝑘1Newt𝐶subscript𝛽𝑘11signsubscript𝛾𝑘1subscript𝛽𝑘1superscriptnormsubscript𝑒𝑘2superscriptnormsubscript𝑒𝑘12\displaystyle\|x_{k+1}^{\operatorname{NA}}-x_{k+1}^{\operatorname{Newt}}\|\leq C% \left(\frac{\beta_{k+1}}{1+\operatorname{sign}(\gamma_{k+1})\beta_{k+1}}\right% )\max\{\|e_{k}\|^{2},\|e_{k-1}\|^{2}\}∥ italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_NA end_POSTSUPERSCRIPT - italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT ∥ ≤ italic_C ( divide start_ARG italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT end_ARG start_ARG 1 + roman_sign ( italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT end_ARG ) roman_max { ∥ italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , ∥ italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT } (11)

when λk+1<1subscript𝜆𝑘11\lambda_{k+1}<1italic_λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT < 1.

Proof.

Using our notation from the discussion preceding Lemma 3.1, an iterate generated by γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) takes the form xk+1NA=xk+1Newtλk+1γk+1(xk+1NewtxkNewt)superscriptsubscript𝑥𝑘1NAsuperscriptsubscript𝑥𝑘1Newtsubscript𝜆𝑘1subscript𝛾𝑘1superscriptsubscript𝑥𝑘1Newtsuperscriptsubscript𝑥𝑘Newtx_{k+1}^{\operatorname{NA}}=x_{k+1}^{\operatorname{Newt}}-\lambda_{k+1}\gamma_% {k+1}\left(x_{k+1}^{\operatorname{Newt}}-x_{k}^{\operatorname{Newt}}\right)italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_NA end_POSTSUPERSCRIPT = italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT - italic_λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT - italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT ). Therefore xk+1NAxk+1Newt=|λk+1γk+1|xk+1NewtxkNewtnormsuperscriptsubscript𝑥𝑘1NAsuperscriptsubscript𝑥𝑘1Newtsubscript𝜆𝑘1subscript𝛾𝑘1normsuperscriptsubscript𝑥𝑘1Newtsuperscriptsubscript𝑥𝑘Newt\|x_{k+1}^{\operatorname{NA}}-x_{k+1}^{\operatorname{Newt}}\|=|\lambda_{k+1}% \gamma_{k+1}|\,\|x_{k+1}^{\operatorname{Newt}}-x_{k}^{\operatorname{Newt}}\|∥ italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_NA end_POSTSUPERSCRIPT - italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT ∥ = | italic_λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT | ∥ italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT - italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT ∥. Since f(x)superscript𝑓superscript𝑥f^{\prime}(x^{*})italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) is nonsingular, we can take ρ𝜌\rhoitalic_ρ sufficiently small to ensure that ej+1NewtCej2normsuperscriptsubscript𝑒𝑗1Newt𝐶superscriptnormsubscript𝑒𝑗2\|e_{j+1}^{\operatorname{Newt}}\|\leq C\|e_{j}\|^{2}∥ italic_e start_POSTSUBSCRIPT italic_j + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT ∥ ≤ italic_C ∥ italic_e start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT, where C𝐶Citalic_C is a constant determined by f𝑓fitalic_f, when ej<ρnormsubscript𝑒𝑗𝜌\|e_{j}\|<\rho∥ italic_e start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∥ < italic_ρ for j=k,k1𝑗𝑘𝑘1j=k,k-1italic_j = italic_k , italic_k - 1. Hence, upon adding and subtracting xsuperscript𝑥x^{*}italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT we obtain xk+1NewtxkNewt=ek+1NewtekNewt2Cmax{ek2,ek12}normsuperscriptsubscript𝑥𝑘1Newtsuperscriptsubscript𝑥𝑘Newtnormsuperscriptsubscript𝑒𝑘1Newtsuperscriptsubscript𝑒𝑘Newt2𝐶superscriptnormsubscript𝑒𝑘2superscriptnormsubscript𝑒𝑘12\|x_{k+1}^{\operatorname{Newt}}-x_{k}^{\operatorname{Newt}}\|=\|e_{k+1}^{% \operatorname{Newt}}-e_{k}^{\operatorname{Newt}}\|\leq 2C\max\{\|e_{k}\|^{2},% \|e_{k-1}\|^{2}\}∥ italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT - italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT ∥ = ∥ italic_e start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT - italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT ∥ ≤ 2 italic_C roman_max { ∥ italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , ∥ italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT }. To complete the proof, we write 2C=C2𝐶𝐶2C=C2 italic_C = italic_C and apply Lemma 3.1 to bound |λk+1γk+1|subscript𝜆𝑘1subscript𝛾𝑘1|\lambda_{k+1}\gamma_{k+1}|| italic_λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT | for the cases λk+1=1subscript𝜆𝑘11\lambda_{k+1}=1italic_λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = 1 and λk+1<1subscript𝜆𝑘11\lambda_{k+1}<1italic_λ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT < 1. ∎

We conclude this section with Corollary 3.1, which proves that γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) can recover local quadratic convergence from NA when applied to nonsingular problems.

Corollary 3.1.

If f(x)superscript𝑓superscript𝑥f^{\prime}(x^{*})italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) is nonsingular, ek<ek1normsubscript𝑒𝑘normsubscript𝑒𝑘1\|e_{k}\|<\|e_{k-1}\|∥ italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ < ∥ italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥, and ηk+1<r^subscript𝜂𝑘1^𝑟\eta_{k+1}<\hat{r}italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT < over^ start_ARG italic_r end_ARG, then there exists a ρ>0𝜌0\rho>0italic_ρ > 0 and constants C1subscript𝐶1C_{1}italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and C2subscript𝐶2C_{2}italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT depending only on f𝑓fitalic_f such that

ek+1(C11r^2+C2)ek2normsubscript𝑒𝑘1subscript𝐶11superscript^𝑟2subscript𝐶2superscriptnormsubscript𝑒𝑘2\displaystyle\|e_{k+1}\|\leq\left(\frac{C_{1}}{1-\hat{r}^{2}}+C_{2}\right)\|e_% {k}\|^{2}∥ italic_e start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ ≤ ( divide start_ARG italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG start_ARG 1 - over^ start_ARG italic_r end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ∥ italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT (12)

for xk,xk1Bρ(x)subscript𝑥𝑘subscript𝑥𝑘1subscript𝐵𝜌superscript𝑥x_{k},x_{k-1}\in B_{\rho}(x^{*})italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∈ italic_B start_POSTSUBSCRIPT italic_ρ end_POSTSUBSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ).

Proof.

Adding and subtracting ek+1Newtsuperscriptsubscript𝑒𝑘1Newte_{k+1}^{\operatorname{Newt}}italic_e start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT to ek+1subscript𝑒𝑘1e_{k+1}italic_e start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT gives ek+1xk+1NAxk+1Newt+ek+1Newt.normsubscript𝑒𝑘1normsuperscriptsubscript𝑥𝑘1NAsuperscriptsubscript𝑥𝑘1Newtnormsuperscriptsubscript𝑒𝑘1Newt\|e_{k+1}\|\leq\|x_{k+1}^{\operatorname{NA}}-x_{k+1}^{\operatorname{Newt}}\|+% \|e_{k+1}^{\operatorname{Newt}}\|.∥ italic_e start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ ≤ ∥ italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_NA end_POSTSUPERSCRIPT - italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT ∥ + ∥ italic_e start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT ∥ . By Theorem 3.2, xk+1NAxk+1NewtCβk+1(1βk+1)1max{ek2,ek12}normsuperscriptsubscript𝑥𝑘1NAsuperscriptsubscript𝑥𝑘1Newt𝐶subscript𝛽𝑘1superscript1subscript𝛽𝑘11superscriptnormsubscript𝑒𝑘2superscriptnormsubscript𝑒𝑘12\|x_{k+1}^{\operatorname{NA}}-x_{k+1}^{\operatorname{Newt}}\|\leq C\beta_{k+1}% (1-\beta_{k+1})^{-1}\max\left\{\|e_{k}\|^{2},\|e_{k-1}\|^{2}\right\}∥ italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_NA end_POSTSUPERSCRIPT - italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT ∥ ≤ italic_C italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ( 1 - italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_max { ∥ italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , ∥ italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT }, and for xkBρ(x)subscript𝑥𝑘subscript𝐵𝜌superscript𝑥x_{k}\in B_{\rho}(x^{*})italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∈ italic_B start_POSTSUBSCRIPT italic_ρ end_POSTSUBSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ), ek+1NewtC2ek2normsuperscriptsubscript𝑒𝑘1Newtsubscript𝐶2superscriptnormsubscript𝑒𝑘2\|e_{k+1}^{\operatorname{Newt}}\|\leq C_{2}\|e_{k}\|^{2}∥ italic_e start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT ∥ ≤ italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ∥ italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. Since ηk+1<r^subscript𝜂𝑘1^𝑟\eta_{k+1}<\hat{r}italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT < over^ start_ARG italic_r end_ARG, we have βk+1=ηk+12subscript𝛽𝑘1superscriptsubscript𝜂𝑘12\beta_{k+1}=\eta_{k+1}^{2}italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. Moreover, when f(x)superscript𝑓superscript𝑥f^{\prime}(x^{*})italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) is nonsingular, Taylor expansion shows that

ηk+1C1ekek1subscript𝜂𝑘1subscript𝐶1normsubscript𝑒𝑘normsubscript𝑒𝑘1\displaystyle\eta_{k+1}\leq C_{1}\frac{\|e_{k}\|}{\|e_{k-1}\|}italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ≤ italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT divide start_ARG ∥ italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ end_ARG start_ARG ∥ italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥ end_ARG (13)

for xk,xk1Bρ(x)subscript𝑥𝑘subscript𝑥𝑘1subscript𝐵𝜌superscript𝑥x_{k},x_{k-1}\in B_{\rho}(x^{*})italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∈ italic_B start_POSTSUBSCRIPT italic_ρ end_POSTSUBSCRIPT ( italic_x start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ). Thus βk+1max{ek2,ek12}C1ek2subscript𝛽𝑘1superscriptnormsubscript𝑒𝑘2superscriptnormsubscript𝑒𝑘12subscript𝐶1superscriptnormsubscript𝑒𝑘2\beta_{k+1}\max\left\{\|e_{k}\|^{2},\|e_{k-1}\|^{2}\right\}\leq C_{1}\|e_{k}\|% ^{2}italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT roman_max { ∥ italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , ∥ italic_e start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT } ≤ italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∥ italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT, and therefore

ek+1(C11r^2+C2)ek2.normsubscript𝑒𝑘1subscript𝐶11superscript^𝑟2subscript𝐶2superscriptnormsubscript𝑒𝑘2\displaystyle\|e_{k+1}\|\leq\left(\frac{C_{1}}{1-\hat{r}^{2}}+C_{2}\right)\|e_% {k}\|^{2}.∥ italic_e start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ ≤ ( divide start_ARG italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG start_ARG 1 - over^ start_ARG italic_r end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ∥ italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (14)

4 Numerics

In this section, we demonstrate the effectiveness of NA and γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) near bifurcation points by applying these algorithms to the following parameter-dependent PDEs. All computations are performed on an M1 MacBook with GNU Octave 8.2.0.

4.1 Test Problems

  1. 1.

    Navier-Stokes Flow in a Channel

    {μΔ𝐮+𝐮𝐮+p=𝟎𝐮=𝟎𝐮=𝐮in,Γin,𝐮=0,Γwall,p𝐧+(μ𝐮)𝐧=0,Γout.{\small\left\{\begin{split}-\mu\Delta\operatorname{\mathbf{u}}+\operatorname{% \mathbf{u}}\cdot\nabla\operatorname{\mathbf{u}}+\nabla p&=\mathbf{0}\\ \nabla\cdot\operatorname{\mathbf{u}}&=\mathbf{0}\end{split}\qquad\qquad\begin{% split}\mathbf{u}&=\mathbf{u}_{\text{in}},\hskip 4.5pt\Gamma_{\text{in}},\\ \mathbf{u}&=0,\hskip 4.5pt\Gamma_{\text{wall}},\\ -p\mathbf{n}+(\mu\nabla\mathbf{u})\mathbf{n}&=0,\hskip 4.5pt\Gamma_{\text{out}% }.\end{split}\right.}{ start_ROW start_CELL - italic_μ roman_Δ bold_u + bold_u ⋅ ∇ bold_u + ∇ italic_p end_CELL start_CELL = bold_0 end_CELL end_ROW start_ROW start_CELL ∇ ⋅ bold_u end_CELL start_CELL = bold_0 end_CELL end_ROW start_ROW start_CELL bold_u end_CELL start_CELL = bold_u start_POSTSUBSCRIPT in end_POSTSUBSCRIPT , roman_Γ start_POSTSUBSCRIPT in end_POSTSUBSCRIPT , end_CELL end_ROW start_ROW start_CELL bold_u end_CELL start_CELL = 0 , roman_Γ start_POSTSUBSCRIPT wall end_POSTSUBSCRIPT , end_CELL end_ROW start_ROW start_CELL - italic_p bold_n + ( italic_μ ∇ bold_u ) bold_n end_CELL start_CELL = 0 , roman_Γ start_POSTSUBSCRIPT out end_POSTSUBSCRIPT . end_CELL end_ROW (15)
  2. 2.

    Rayleigh-Bénard Convection

    {μΔ𝐮+𝐮𝐮+pRiT𝐞y=𝟎𝐮=0κΔT+𝐮T=0T=1,Γ1:={1}×(0,1),T=0,Γ2:={0}×(0,1),T𝐧=0,Γ3:=(0,1)×{0,1},𝐮=𝟎,Ω=Γ1Γ2Γ3.{\small\left\{\begin{split}-\mu\Delta\operatorname{\mathbf{u}}+\operatorname{% \mathbf{u}}\cdot\nabla\operatorname{\mathbf{u}}+\nabla p-\text{Ri}\,T\mathbf{e% }_{y}&=\mathbf{0}\\ \nabla\cdot\operatorname{\mathbf{u}}&=0\\ -\kappa\Delta T+\operatorname{\mathbf{u}}\cdot\nabla T&=0\\ \end{split}\qquad\qquad\begin{split}T&=1,\hskip 2.25pt\Gamma_{1}:=\{1\}\times(% 0,1),\\ T&=0,\hskip 2.25pt\Gamma_{2}:=\{0\}\times(0,1),\\ \nabla T\cdot\operatorname{\mathbf{n}}&=0,\hskip 2.25pt\Gamma_{3}:=(0,1)\times% \{0,1\},\\ \operatorname{\mathbf{u}}&=\mathbf{0},\hskip 2.25pt\partial\Omega=\Gamma_{1}% \cup\Gamma_{2}\cup\Gamma_{3}.\end{split}\right.}{ start_ROW start_CELL - italic_μ roman_Δ bold_u + bold_u ⋅ ∇ bold_u + ∇ italic_p - Ri italic_T bold_e start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT end_CELL start_CELL = bold_0 end_CELL end_ROW start_ROW start_CELL ∇ ⋅ bold_u end_CELL start_CELL = 0 end_CELL end_ROW start_ROW start_CELL - italic_κ roman_Δ italic_T + bold_u ⋅ ∇ italic_T end_CELL start_CELL = 0 end_CELL end_ROW start_ROW start_CELL italic_T end_CELL start_CELL = 1 , roman_Γ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT := { 1 } × ( 0 , 1 ) , end_CELL end_ROW start_ROW start_CELL italic_T end_CELL start_CELL = 0 , roman_Γ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT := { 0 } × ( 0 , 1 ) , end_CELL end_ROW start_ROW start_CELL ∇ italic_T ⋅ bold_n end_CELL start_CELL = 0 , roman_Γ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT := ( 0 , 1 ) × { 0 , 1 } , end_CELL end_ROW start_ROW start_CELL bold_u end_CELL start_CELL = bold_0 , ∂ roman_Ω = roman_Γ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∪ roman_Γ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ∪ roman_Γ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT . end_CELL end_ROW (16)

In both models, 𝐮𝐮\mathbf{u}bold_u denotes the fluid velocity, p𝑝pitalic_p the pressure, and 𝐧𝐧\mathbf{n}bold_n the outward normal. In Model (15), μ𝜇\muitalic_μ denotes the viscosity parameter. In Model (16), T𝑇Titalic_T denotes the temperature of the fluid, and RiRi\operatorname{Ri}roman_Ri denotes the Richardson number, which is the parameter of interest for Model (16). We set μ=κ=102𝜇𝜅superscript102\mu=\kappa=10^{-2}italic_μ = italic_κ = 10 start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT. These parameters values are chosen so as to replicate the results seen in [20] as RiRi\operatorname{Ri}roman_Ri ranges from 3.03.03.03.0 to 3.53.53.53.5. For the flow in a channel, Model (15), we use 𝒫2𝒫1subscript𝒫2subscript𝒫1\mathcal{P}_{2}-\mathcal{P}_{1}caligraphic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - caligraphic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT Taylor-Hood elements [8, p. 164]. The channel, shown in Figure 1, is arranged such that the left most boundary lies at x=0𝑥0x=0italic_x = 0, the right most boundary lies at x=50𝑥50x=50italic_x = 50, and the boundary components are given by Γin={0}×[2.5,5]subscriptΓin02.55\Gamma_{\text{in}}=\{0\}\times[2.5,5]roman_Γ start_POSTSUBSCRIPT in end_POSTSUBSCRIPT = { 0 } × [ 2.5 , 5 ], Γout={50}×[0,7.5]subscriptΓout5007.5\Gamma_{\text{out}}=\{50\}\times[0,7.5]roman_Γ start_POSTSUBSCRIPT out end_POSTSUBSCRIPT = { 50 } × [ 0 , 7.5 ], and Γwall=[0,10]×({2.5}{5}){10}×([0,2.5][5,7.5])[10,40]×({0}{7.5})subscriptΓwall0102.551002.557.5104007.5\Gamma_{\text{wall}}=[0,10]\times(\{2.5\}\cup\{5\})\cup\{10\}\times([0,2.5]% \cup[5,7.5])\cup[10,40]\times(\{0\}\cup\{7.5\})roman_Γ start_POSTSUBSCRIPT wall end_POSTSUBSCRIPT = [ 0 , 10 ] × ( { 2.5 } ∪ { 5 } ) ∪ { 10 } × ( [ 0 , 2.5 ] ∪ [ 5 , 7.5 ] ) ∪ [ 10 , 40 ] × ( { 0 } ∪ { 7.5 } ). For the Rayleigh-Béndard Model, we use 𝒫2𝒫1discsubscript𝒫2superscriptsubscript𝒫1disc\mathcal{P}_{2}-\mathcal{P}_{1}^{\text{disc}}caligraphic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - caligraphic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT disc end_POSTSUPERSCRIPT Scott-Vogelius elements [24], where 𝒫1discsuperscriptsubscript𝒫1disc\mathcal{P}_{1}^{\text{disc}}caligraphic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT disc end_POSTSUPERSCRIPT denotes piece-wise linear discontinuous elements. The 𝒫2𝒫1discsubscript𝒫2superscriptsubscript𝒫1disc\mathcal{P}_{2}-\mathcal{P}_{1}^{\text{disc}}caligraphic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - caligraphic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT disc end_POSTSUPERSCRIPT elements are stable on Alfeld-split, also known as barycenter-split, triangulations [46, p. 77]. The meshes used for each model are shown below.

Refer to caption
Refer to caption
Figure 1: Meshes used for benchmark problems. Top: Mesh used for Rayleigh-Bénard model. Bottom: mesh used for flow in a channel.

For Model (15), it is known [39] that there exists a critical viscosity μ(0.9,1)superscript𝜇0.91\mu^{*}\in(0.9,1)italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∈ ( 0.9 , 1 ) at which a bifurcation occurs. For μ>μ𝜇superscript𝜇\mu>\mu^{*}italic_μ > italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, the stable velocity solution is symmetric about the center horizontal (y=3.75𝑦3.75y=3.75italic_y = 3.75) as seen in the top plot of Figure 2. For μ<μ𝜇superscript𝜇\mu<\mu^{*}italic_μ < italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, there is still a symmetric solution, but it is unstable. Stability is inherited by two asymmetric solutions seen in the bottom two plots of Figure 2.

Refer to caption
Refer to caption
Refer to caption
Figure 2: Solutions to channel flow problem (15) for different μ𝜇\muitalic_μ. Top: Representative symmetric solution for μ=1.0𝜇1.0\mu=1.0italic_μ = 1.0. Middle: Representative asymmetric solution with positive vertical velocity upon exiting the narrow channel for μ=0.9𝜇0.9\mu=0.9italic_μ = 0.9. Bottom: Representative asymmetric solution with negative vertical velocity upon exiting the narrow channel for μ=0.9𝜇0.9\mu=0.9italic_μ = 0.9.

The parameter region of interest for Model (16) is Ri[3,3.5]Ri33.5\operatorname{Ri}\in[3,3.5]roman_Ri ∈ [ 3 , 3.5 ]. In this range, the flow appears to be in transition from a single eddy in the center of the domain to two eddies as seen in Figure 3 below.

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 3: Velocity streamlines for velocity u𝑢uitalic_u from Model (16) showing transition from one eddy to two eddies. Top Left: Ri = 3.0. Top Right: Ri = 3.1. Bottom Left: Ri = 3.2. Bottom Right: Ri = 3.4.

For Model (15), we take the zero vector as our initial iterate. For Model (16), we initialize our iterate by applying a Picard step and then applying γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ). The H1superscript𝐻1H^{1}italic_H start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT seminorm is used in our implementations [8]. Before we discuss the numerical results, we recall that

  • Newt is Algorithm 1.

  • NA is Algorithm 2.

  • NA(m𝑚mitalic_m) is Algorithm 3.

  • γ𝛾\gammaitalic_γNA(r)𝑟(r)( italic_r ), Algorithm 5, is Algorithm 2 with line 5 replaced by Algorithm 4.

  • γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ), Algorithm 7, is Algorithm 2 with line 5 replaced by Algorithm 6.

For γ𝛾\gammaitalic_γNA(r)𝑟(r)( italic_r ) and γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ), we set r𝑟ritalic_r and r^^𝑟\hat{r}over^ start_ARG italic_r end_ARG respectively to a fixed quantity for all iterations.

4.2 General Discussion of Results

The following experiments demonstrate three strategies for solving nonlinear problems near bifurcations using NA and γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ). The first two are asymptotic safeguarding and preasymptotic safeguarding. With asymptotic safeguarding, we run NA until the residual is smaller than some user-chosen threshold, and we use γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) for all subsequent iterates. Hence the solve will behave like NA until the last few iterations when γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) is applied. For the experiments in Section 4.3, we activated γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) when wk+1<101normsubscript𝑤𝑘1superscript101\|w_{k+1}\|<10^{-1}∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ < 10 start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT. We chose 101superscript10110^{-1}10 start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT since we want to activate γ𝛾\gammaitalic_γ-safeguarding as early as possible. This allows for earlier detection of a nonsingular problem, and thus faster convergence. We found that activating γ𝛾\gammaitalic_γ-safeguarding for wk+1<τ<101normsubscript𝑤𝑘1𝜏superscript101\|w_{k+1}\|<\tau<10^{-1}∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ < italic_τ < 10 start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT will not necessarily break convergence, but if the problem is nonsingular, this will not be detected as early. Activating γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) when wk+1<1normsubscript𝑤𝑘11\|w_{k+1}\|<1∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ < 1 can break convergence, though this seems to be problem-dependent. Convergence for Model (15) was virtually unaffected with threshold wk+1<1normsubscript𝑤𝑘11\|w_{k+1}\|<1∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ < 1, but for Model (16) with Ri=3.5Ri3.5\operatorname{Ri}=3.5roman_Ri = 3.5, setting the activation threshold to 1111 caused γNAA(0.9)𝛾NAA0.9\gamma\text{NAA}(0.9)italic_γ NAA ( 0.9 ) to diverge when it had converged with threshold 0.10.10.10.1. With activation threshold 0.10.10.10.1, asymptotic safeguarding is shown to be effective close to the bifurcation point, where Newton’s method can fail to converge. Preasymptotic safeguarding, on the other hand, applies γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) at each step of the solve. Interestingly, we observe that γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) applied preasymptotically can outperform NA when applied to Model (16), and even recover convergence when both Newton and NANA\operatorname{NA}roman_NA diverge (see Figure 11). A theoretical explanation for this requires a better understanding of these methods in the preasymptotic regime. In particular, a better understanding of the descent properties of Anderson acceleration would be of great value. It is known [52] that for singular problems in the preasymptotic regime, where f(x)norm𝑓𝑥\|f(x)\|∥ italic_f ( italic_x ) ∥ is not small, the Newton update step sk=f(xk)1f(xk)subscript𝑠𝑘superscript𝑓superscriptsubscript𝑥𝑘1𝑓subscript𝑥𝑘s_{k}=-f^{\prime}(x_{k})^{-1}f(x_{k})italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = - italic_f start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_f ( italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ), or wk+1subscript𝑤𝑘1w_{k+1}italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT in our notation, can be large and nearly orthogonal to the gradient of f22superscriptsubscriptnorm𝑓22\|f\|_{2}^{2}∥ italic_f ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. With NANA\operatorname{NA}roman_NA, our update step takes the form skNA=wk+1γk+1(xk+1NewtxkNewt)superscriptsubscript𝑠𝑘NAsubscript𝑤𝑘1subscript𝛾𝑘1superscriptsubscript𝑥𝑘1Newtsuperscriptsubscript𝑥𝑘Newts_{k}^{\operatorname{NA}}=w_{k+1}-\gamma_{k+1}(x_{k+1}^{\operatorname{Newt}}-x% _{k}^{\operatorname{Newt}})italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_NA end_POSTSUPERSCRIPT = italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT - italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT ). It is clear that skNAsuperscriptsubscript𝑠𝑘NAs_{k}^{\operatorname{NA}}italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_NA end_POSTSUPERSCRIPT is a descent direction for sufficiently small γk+1subscript𝛾𝑘1\gamma_{k+1}italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT, since in this case it is nearly wk+1subscript𝑤𝑘1w_{k+1}italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT. It is possible that for certain values of γk+1subscript𝛾𝑘1\gamma_{k+1}italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT, skNAsuperscriptsubscript𝑠𝑘NAs_{k}^{\operatorname{NA}}italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_NA end_POSTSUPERSCRIPT is a stronger descent direction than sksubscript𝑠𝑘s_{k}italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT.

Another possible explanation as to why γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) can outperform NANA\operatorname{NA}roman_NA in some cases is its resemblance to restarted Anderson acceleration methods. Restarted versions of Anderson acceleration are often applied in various forms for depth m>1𝑚1m>1italic_m > 1, and have been shown to effective in practice [9, 25, 26, 36]. In the special case of Newton-Anderson with depth m=1𝑚1m=1italic_m = 1, every odd iterate is simply a Newton step, rather than a combination of the previous two Newton steps. Hence the algorithm is “restarted” every other step. Explicitly, we have for k1𝑘1k\geq 1italic_k ≥ 1,

x2k1subscript𝑥2𝑘1\displaystyle x_{2k-1}italic_x start_POSTSUBSCRIPT 2 italic_k - 1 end_POSTSUBSCRIPT =x2k1Newtabsentsuperscriptsubscript𝑥2𝑘1Newt\displaystyle=x_{2k-1}^{\operatorname{Newt}}= italic_x start_POSTSUBSCRIPT 2 italic_k - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT
x2ksubscript𝑥2𝑘\displaystyle x_{2k}italic_x start_POSTSUBSCRIPT 2 italic_k end_POSTSUBSCRIPT =x2kNewtγ2k(x2kNewtx2k1Newt)absentsuperscriptsubscript𝑥2𝑘Newtsubscript𝛾2𝑘superscriptsubscript𝑥2𝑘Newtsuperscriptsubscript𝑥2𝑘1Newt\displaystyle=x_{2k}^{\operatorname{Newt}}-\gamma_{2k}(x_{2k}^{\operatorname{% Newt}}-x_{2k-1}^{\operatorname{Newt}})= italic_x start_POSTSUBSCRIPT 2 italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT - italic_γ start_POSTSUBSCRIPT 2 italic_k end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT 2 italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT - italic_x start_POSTSUBSCRIPT 2 italic_k - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_Newt end_POSTSUPERSCRIPT )

In other words, γ2k1=0subscript𝛾2𝑘10\gamma_{2k-1}=0italic_γ start_POSTSUBSCRIPT 2 italic_k - 1 end_POSTSUBSCRIPT = 0 for all k1𝑘1k\geq 1italic_k ≥ 1. With γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ), γk+1subscript𝛾𝑘1\gamma_{k+1}italic_γ start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT is not necessarily set to zero, but it is scaled towards zero, significantly so depending on βk+1subscript𝛽𝑘1\beta_{k+1}italic_β start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT. In this way, one may think of γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) as a quasi-restarted Anderson scheme when the depth m=1𝑚1m=1italic_m = 1. At the moment, γ𝛾\gammaitalic_γ-safeguarding is not developed for m>1𝑚1m>1italic_m > 1, but this interpretation of γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) as a quasi-restarted method could lead to such a development. Presently, these are only heuristics, but they provide interesting questions for future projects.

The third technique we demonstrate to solve these problems near bifurcation points is increasing the depth m𝑚mitalic_m. Evidently, the right choice of m𝑚mitalic_m can significantly improve convergence by reducing the number of iterations to convergence by half, and increase the domain of convergence with respect to the parameter. We found, however, that such performance was very sensitive to the choice of m𝑚mitalic_m. So while the results suggest this could be developed into a viable strategy, more work is required to achieve this.

In all experiments we take r^(0,1)^𝑟01\hat{r}\in(0,1)over^ start_ARG italic_r end_ARG ∈ ( 0 , 1 ) since this is the range in which local convergence of γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) is guaranteed by Theorem 3.1. In practice, neither Algorithm 5 nor Algorithm 7 breaks down if one sets r𝑟ritalic_r or r^^𝑟\hat{r}over^ start_ARG italic_r end_ARG respectively to zero, one, or a value greater than one. Setting r𝑟ritalic_r or r^^𝑟\hat{r}over^ start_ARG italic_r end_ARG to zero reduces the iteration to Newton, and choosing one leads to a more NA like iteration in the preasymptotic regime. A systematic study of γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) with r^1^𝑟1\hat{r}\geq 1over^ start_ARG italic_r end_ARG ≥ 1 has not been performed, but experiments performed thus far show no significant advantage over r^(0,1)^𝑟01\hat{r}\in(0,1)over^ start_ARG italic_r end_ARG ∈ ( 0 , 1 ).

What the best choice of r^^𝑟\hat{r}over^ start_ARG italic_r end_ARG is remains an open question. Numerical experiments suggest the best choice depends on the initial guess x0subscript𝑥0x_{0}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT. For example, when applied to the channel flow problem preasymptotically (see Section 4.4), setting r^=0.9^𝑟0.9\hat{r}=0.9over^ start_ARG italic_r end_ARG = 0.9 results in faster convergnece than Newton if x0subscript𝑥0x_{0}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT is the zero vector. If we perturb this x0subscript𝑥0x_{0}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT (discussed in Section 4.4.1), then the choice of r^=0.9^𝑟0.9\hat{r}=0.9over^ start_ARG italic_r end_ARG = 0.9 leads to γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) converging slower than Newton, while r^=0.5^𝑟0.5\hat{r}=0.5over^ start_ARG italic_r end_ARG = 0.5 outperforms NA and Newton. This phenomena, that the best choice of r^^𝑟\hat{r}over^ start_ARG italic_r end_ARG in γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) varies with x0subscript𝑥0x_{0}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, is seen with other choices of x0subscript𝑥0x_{0}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT as well. Elucidating this dependence is the subject of ongoing work.

4.3 Asymptotic Safeguarding

Under the assumptions of Theorem 3.1 or Theorem 3.2, γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) is guaranteed to converge locally. This motivates the strategy of this subsection. As discussed above, asymptotic safeguarding is when we only apply γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) once the residual is smaller than a set threshold. This allows one to take full advantage of NA in the preasymptotic regime, and ensures fast quadratic convergence for nonsingular problems. In practice, this means fast local convergence provided NA reaches the domain of convergence. We activate γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) when wk+1<101normsubscript𝑤𝑘1superscript101\|w_{k+1}\|<10^{-1}∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ < 10 start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT in the experiments below. When wk+1>101normsubscript𝑤𝑘1superscript101\|w_{k+1}\|>10^{-1}∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ > 10 start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT, we run NA.

4.3.1 Results for Channel Flow Model

The results when applied to Model (15) are shown below in Figures 4, 5, and 6. The takeaway is that when applied asymptotically, convergence of γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) is not as sensitive to the choice of r^^𝑟\hat{r}over^ start_ARG italic_r end_ARG as it is when applied in the preasymptotic regime (see Section 4.4), and Algorithm 6 is working as intended by detecting that the problem is nonsingular. This is seen in the plot on the right of Figures 4, 5, and 6. Recall that rk+1subscript𝑟𝑘1r_{k+1}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT is the adaptive parameter in Algorithm 6 that determines how close a γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) step is to a Newton step. When rk+10subscript𝑟𝑘10r_{k+1}\approx 0italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ≈ 0, and the criteria in Algorithm 6 is met, the γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) step xk+1subscript𝑥𝑘1x_{k+1}italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT will be close to a Newton step. When rk+11subscript𝑟𝑘11r_{k+1}\approx 1italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ≈ 1, this scaling will be much less severe, and the γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) step xk+1subscript𝑥𝑘1x_{k+1}italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT will be close to an NA step. From our discussion in Section 3, we want rk+10subscript𝑟𝑘10r_{k+1}\to 0italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT → 0 when the problem is nonsingular in order to enjoy local quadratic convergence. This is not guaranteed with NA [47]. Observing the rk+1subscript𝑟𝑘1r_{k+1}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT plots in Figures 4, 5, and 6, one notes that rk+10subscript𝑟𝑘10r_{k+1}\to 0italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT → 0 as the solver converges. Since rk+1=min{ηk+1,r^}subscript𝑟𝑘1subscript𝜂𝑘1^𝑟r_{k+1}=\min\{\eta_{k+1},\hat{r}\}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = roman_min { italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT , over^ start_ARG italic_r end_ARG }, this is equivalent to ηk+10subscript𝜂𝑘10\eta_{k+1}\to 0italic_η start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT → 0 as the solve converges. Thus by Theorem 3.2, the γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) iterates converge to Newton iterates asymptotically. This is precisely what γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) was designed to do: detect nonsingular problems, and respond by scaling the iterates towards Newton asymptotically. Since γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) is only activated when wk+1<101normsubscript𝑤𝑘1superscript101\|w_{k+1}\|<10^{-1}∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ < 10 start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT in these examples of asymptotic safeguarding, rk+1subscript𝑟𝑘1r_{k+1}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT is only computed, and plotted, for the last two iterations. In the next section on preasymptotic safeguarding, a more interesting rk+1subscript𝑟𝑘1r_{k+1}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT history is seen. Our methods, including NANA\operatorname{NA}roman_NA, failed to converge for μ<0.92𝜇0.92\mu<0.92italic_μ < 0.92. If one wishes to solve a problem at a particular parameter, and a direct solve fails like we see here for μ<0.92𝜇0.92\mu<0.92italic_μ < 0.92, one could still employ NANA\operatorname{NA}roman_NA or γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) to solve the problem directly for a parameter value close to the desired one to obtain an initial guess for continuation. The benefit here is that the continuation would be required in a smaller parameter range, thus reducing the total number of solves required. We will see in Section 4.5 that increasing m𝑚mitalic_m can lead to convergence for a wider range of parameters.

Refer to caption
Refer to caption
Figure 4: Comparison of γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) applied asymptotically with Newton and NA applied to Model (15) with μ=0.96𝜇0.96\mu=0.96italic_μ = 0.96.
Refer to caption
Refer to caption
Figure 5: Comparison of γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) applied asymptotically with Newton and NA applied to Model (15) with μ=0.94𝜇0.94\mu=0.94italic_μ = 0.94.
Refer to caption
Refer to caption
Figure 6: Comparison of γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) applied asymptotically with Newton and NA applied to (15) with μ=0.92𝜇0.92\mu=0.92italic_μ = 0.92.

4.3.2 Results for Rayleigh-Bénard Model

The results of asymptotic safeguarding with activation threshold 0.10.10.10.1 applied to Model (16) are similar to those of Model (15) seen in the previous section. In this case, however, NA diverged for Ri=3.0Ri3.0\operatorname{Ri}=3.0roman_Ri = 3.0 and Ri=3.2Ri3.2\operatorname{Ri}=3.2roman_Ri = 3.2. For Ri=3.0Ri3.0\operatorname{Ri}=3.0roman_Ri = 3.0, Newton’s method converges, but all methods diverged for Ri=3.2Ri3.2\operatorname{Ri}=3.2roman_Ri = 3.2. In Section 4.4, we are able to recover convergence for Ri=3.2Ri3.2\operatorname{Ri}=3.2roman_Ri = 3.2 with preasymptotic safeguarding. For Ri=3.1Ri3.1\operatorname{Ri}=3.1roman_Ri = 3.1, Ri=3.3Ri3.3\operatorname{Ri}=3.3roman_Ri = 3.3, Ri=3.4Ri3.4\operatorname{Ri}=3.4roman_Ri = 3.4, and Ri=3.5Ri3.5\operatorname{Ri}=3.5roman_Ri = 3.5, Newton diverged while NA and γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) converged. Like with Model (15), we see rk+10subscript𝑟𝑘10r_{k+1}\to 0italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT → 0 as the solve converges. The results for Ri=3.4Ri3.4\operatorname{Ri}=3.4roman_Ri = 3.4, shown below, are representative of the others for which NA converged.

Refer to caption
Refer to caption
Figure 7: Comparison of γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) applied asymptotically with Newton and NA applied to Model (16) with Ri=3.4Ri3.4\operatorname{Ri}=3.4roman_Ri = 3.4.

4.4 Preasymptotic Safeguarding

In this section, we demonstrate the preasymptotic safeguarding strategy, where γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) is activated starting at iteration k=2𝑘2k=2italic_k = 2, the first iteration where NANA\operatorname{NA}roman_NA can be applied. Compared to asymptotic safeguarding, preasymptotic safeguarding is more sensitive to the choice of r^^𝑟\hat{r}over^ start_ARG italic_r end_ARG, but, with the right choice of r^^𝑟\hat{r}over^ start_ARG italic_r end_ARG, it can recover convergence when both Newton and NANA\operatorname{NA}roman_NA fail.

4.4.1 Results for Channel Flow Model

Here we present the results from applying the preasymptotic safeguarding strategy to the channel flow Model (15). Evidently, Newton’s method can still converge quickly with the right initial guess when μ=0.96𝜇0.96\mu=0.96italic_μ = 0.96. Figure 8 demonstrates that in this case, NANA\operatorname{NA}roman_NA and γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) also perform well. The right-most plot in Figure 8 demonstrates that the γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) is working as intended. That is, rk+10subscript𝑟𝑘10r_{k+1}\to 0italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT → 0, and therefore γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) is detecting that Newton is converging quickly, and responds by scaling its update steps towards a pure Newton iteration. With μ=0.94𝜇0.94\mu=0.94italic_μ = 0.94, we are closer to the bifurcation point, and observing the left plot in Figure 9, we see that Newton takes many more iterations to converge. However, after a long preasymptotic phase, Newton does eventually converge quickly, which suggests that the problem is not truly singular for μ=0.94𝜇0.94\mu=0.94italic_μ = 0.94. This is again detected by γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ). Even though the γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) algorithms take a few more iterations to converge than NA with μ=0.94𝜇0.94\mu=0.94italic_μ = 0.94, the terminal order of convergence of γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) is greater than that of NA. Approximating the rate by qk+1=log(wk+1)/log(wk)subscript𝑞𝑘1normsubscript𝑤𝑘1normsubscript𝑤𝑘q_{k+1}=\log(\|w_{k+1}\|)/\log(\|w_{k}\|)italic_q start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = roman_log ( ∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ ) / roman_log ( ∥ italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ ) at each step k𝑘kitalic_k, and letting qtermsubscript𝑞termq_{\text{term}}italic_q start_POSTSUBSCRIPT term end_POSTSUBSCRIPT denote the terminal order, we found that qterm=1.537subscript𝑞term1.537q_{\text{term}}=1.537italic_q start_POSTSUBSCRIPT term end_POSTSUBSCRIPT = 1.537 for NA, qterm=3.386subscript𝑞term3.386q_{\text{term}}=3.386italic_q start_POSTSUBSCRIPT term end_POSTSUBSCRIPT = 3.386 for γNAA(0.1)𝛾NAA0.1\gamma\text{NAA}(0.1)italic_γ NAA ( 0.1 ), and qterm=2.091subscript𝑞term2.091q_{\text{term}}=2.091italic_q start_POSTSUBSCRIPT term end_POSTSUBSCRIPT = 2.091 for γNAA(0.9)𝛾NAA0.9\gamma\text{NAA}(0.9)italic_γ NAA ( 0.9 ). In the right most plot in Figure 9, we take as our initial iterate the zero vector, but with the fifth entry set to 50. From this perturbed initial guess, Newton’s method is seen to perform better than NA. Further, γNAA(0.1)𝛾NAA0.1\gamma\text{NAA}(0.1)italic_γ NAA ( 0.1 ) and γNAA(0.5)𝛾NAA0.5\gamma\text{NAA}(0.5)italic_γ NAA ( 0.5 ) outperform both Newton and NA, with γNAA(0.5)𝛾NAA0.5\gamma\text{NAA}(0.5)italic_γ NAA ( 0.5 ) converging in about half as many iterations as NA. This demonstrates the flexibility offered by γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ). That is, whether Newton or NA is the best choice for a particular problem and initial guess, γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) is more agnostic to these choices, and can perform well in either case. There is still, however, sensitivity to r^^𝑟\hat{r}over^ start_ARG italic_r end_ARG. We found that γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) failed to converge for r^=0.3^𝑟0.3\hat{r}=0.3over^ start_ARG italic_r end_ARG = 0.3, 0.50.50.50.5, 0.60.60.60.6, 0.70.70.70.7, and 0.80.80.80.8. γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) converged with r^=0.4^𝑟0.4\hat{r}=0.4over^ start_ARG italic_r end_ARG = 0.4, but only after 74 iterations. We observed that γ𝛾\gammaitalic_γNA(0.5), the non-adaptive version of γ𝛾\gammaitalic_γ-safeguarding, managed to converge. The reason for this variation is likely due to the complex behavior of Newton and NA in the preasymptotic regime, leading to sensitivity to the choice of r^^𝑟\hat{r}over^ start_ARG italic_r end_ARG. With preasymptotic safeguarding, our methods failed for μ<0.94𝜇0.94\mu<0.94italic_μ < 0.94, hence continuation may still be required in some cases, but NA and γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) can be used to efficiently solve the problem closer to the bifurcation point compared to Newton, thereby reducing the overall computational cost. The point is that applying γ𝛾\gammaitalic_γNA(r)𝑟(r)( italic_r ) and γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) in the preasymptotic regime can still lead to faster convergence than Newton, but this convergence is again sensitive to the choice of r^^𝑟\hat{r}over^ start_ARG italic_r end_ARG, and further work is required to understand this sensitivity.

Refer to caption
Refer to caption
Figure 8: Results of preasymptotic safeguarding applied to Model (15) for μ=0.96𝜇0.96\mu=0.96italic_μ = 0.96. Left: Convergence history for Newton, NA, and γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) for μ=0.96𝜇0.96\mu=0.96italic_μ = 0.96. Right: The value of rk+1subscript𝑟𝑘1r_{k+1}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT at each iteration for the adaptive methods.
Refer to caption
Refer to caption
Figure 9: Results of preasymptotic safeguarding applied to Model (15) for μ=0.94𝜇0.94\mu=0.94italic_μ = 0.94. Left: Convergence history for Newton, NA, and γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) for μ=0.94𝜇0.94\mu=0.94italic_μ = 0.94 with zero vector initial guess. Right: Convergence history for Newton, NA, and γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) for μ=0.94𝜇0.94\mu=0.94italic_μ = 0.94 with perturbed initial guess.

4.4.2 Results for Rayleigh-Bénard Model

With preasymptotic safeguarding applied to Model (16), we again see more varied behavior since we have γ𝛾\gammaitalic_γ-safeguarding activated from the beginning of the solve. The results are shown in Figures 10, 11, and 12. The rk+1subscript𝑟𝑘1r_{k+1}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT plots are only shown for those values of r^^𝑟\hat{r}over^ start_ARG italic_r end_ARG for which γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) converged. We found that for Ri=3.0Ri3.0\operatorname{Ri}=3.0roman_Ri = 3.0 and Ri=3.1Ri3.1\operatorname{Ri}=3.1roman_Ri = 3.1, γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) failed to converge with r^=0.1^𝑟0.1\hat{r}=0.1over^ start_ARG italic_r end_ARG = 0.1, 0.50.50.50.5, or 0.90.90.90.9. Convergence is recovered with r^=0.4^𝑟0.4\hat{r}=0.4over^ start_ARG italic_r end_ARG = 0.4 and r^=0.6^𝑟0.6\hat{r}=0.6over^ start_ARG italic_r end_ARG = 0.6 respectively. We also ran γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) with these r^^𝑟\hat{r}over^ start_ARG italic_r end_ARG values for Ri=3.2Ri3.2\operatorname{Ri}=3.2roman_Ri = 3.2, 3.33.33.33.3, 3.43.43.43.4, and 3.53.53.53.5. The results were similar for Ri=3.3Ri3.3\operatorname{Ri}=3.3roman_Ri = 3.3, 3.43.43.43.4, and 3.53.53.53.5. Hence we only show the results for Ri=3.4Ri3.4\operatorname{Ri}=3.4roman_Ri = 3.4 in Figure 12. The theme demonstrated in Figures 10, 11, and 12 is that when preasymptotic safeguarding is employed, it is possible to converge faster than standard NA. Moreover, as seen in Figure 11, γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) can converge when both Newton and NA diverge. One may note that in the right-most plot in Figure 10, prior to the asymptotic regime where rk+10subscript𝑟𝑘10r_{k+1}\to 0italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT → 0, we observe rk+1<r^=0.6subscript𝑟𝑘1^𝑟0.6r_{k+1}<\hat{r}=0.6italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT < over^ start_ARG italic_r end_ARG = 0.6 only twice. Using the restarted Anderson interpretation discussed in Section 4.2, we could say that there are two quasi-restarts prior to the asymptotic regime. Evidently, these two quasi-restarts are essential, since we found that non-adaptive γ𝛾\gammaitalic_γ-safeguarding with r^=0.6^𝑟0.6\hat{r}=0.6over^ start_ARG italic_r end_ARG = 0.6, Algorithm 4, diverges. Similar behavior of rk+1subscript𝑟𝑘1r_{k+1}italic_r start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT is seen in Figures 11 and 12. It remains unclear precisely how the choice of r^^𝑟\hat{r}over^ start_ARG italic_r end_ARG affects convergence, e.g., in Figure 10, why does γNAA(0.6)𝛾NAA0.6\gamma\text{NAA}(0.6)italic_γ NAA ( 0.6 ) converge, but γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) diverges for r^=0.1^𝑟0.1\hat{r}=0.1over^ start_ARG italic_r end_ARG = 0.1, 0.40.40.40.4, 0.50.50.50.5, and 0.90.90.90.9? As previously discussed, a better understanding of these methods in the preasymptotic regime would help answer questions like these, and this will be the focus of future studies.

Refer to caption
Refer to caption
Figure 10: Comparison of γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) applied preasymptotically with Newton and NA applied to Model (16) with Ri=3.1Ri3.1\operatorname{Ri}=3.1roman_Ri = 3.1.
Refer to caption
Refer to caption
Figure 11: Comparison of γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) applied preasymptotically with Newton and NA applied to Model (16) with Ri=3.2Ri3.2\operatorname{Ri}=3.2roman_Ri = 3.2.
Refer to caption
Refer to caption
Figure 12: Comparison of γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) applied preasymptotically with Newton and NA applied to Model (16) with Ri=3.4Ri3.4\operatorname{Ri}=3.4roman_Ri = 3.4.

4.5 Increasing Anderson Depth

The strategy employed in this section is to increase the depth of the NA from m=1𝑚1m=1italic_m = 1 used in previous sections.

4.5.1 Results for Channel Flow Model

The plots in Figure 13 demonstrate the effectiveness of increasing the algorithmic depth m𝑚mitalic_m to solve Model (15) near the bifurcation point. We also experimented with applying γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) asymptotically. When wk+1<1normsubscript𝑤𝑘11\|w_{k+1}\|<1∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ < 1, we set m=1𝑚1m=1italic_m = 1 and activated γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) with r^=0.9^𝑟0.9\hat{r}=0.9over^ start_ARG italic_r end_ARG = 0.9. These results are seen as the dashed lines in the left-most plot in Figure 13. The philosophy is similar to that of preasymptotic safeguarding. We use NA(3) to reach the asymptotic regime, and then allow adaptive γ𝛾\gammaitalic_γ-safeguarding to detect if the problem is nonsingular. We set m=1𝑚1m=1italic_m = 1 because, for the present, γ𝛾\gammaitalic_γ-safeguarding is only designed for m=1𝑚1m=1italic_m = 1. The left-most plot in Figure 13 shows how, from the same initial iterate, we are able to solve Model (15) for a wider range of μ𝜇\muitalic_μ values, including in the regime where Newton, NA(1)NA1\operatorname{NA}(1)roman_NA ( 1 ), and γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) failed to converge.

The right-most plot in Figure 13 focuses on the results of applying NA(3)NA3\operatorname{NA}(3)roman_NA ( 3 ) to Model (15) with μ=0.92𝜇0.92\mu=0.92italic_μ = 0.92. The point here is that even in the regime where NA(1)NA1\operatorname{NA}(1)roman_NA ( 1 ) converges, NA(3)NA3\operatorname{NA}(3)roman_NA ( 3 ) converges in about half as many iterations. Thus increasing the algorithmic depth can lead to faster convergence. The improved convergence seen with increasing the depth m𝑚mitalic_m suggests that a generalization of γ𝛾\gammaitalic_γ-safeguarding for greater depths could be useful as a generalization of the strategies presented in previous sections. However, for this strategy to be effective in general, further study is needed on the proper choice of depth m𝑚mitalic_m. In the chosen parameter regime, m=3𝑚3m=3italic_m = 3 was the only value of m{1,2,,10}𝑚1210m\in\{1,2,...,10\}italic_m ∈ { 1 , 2 , … , 10 } observed to improve convergence when NA(1) failed.

Refer to caption
Refer to caption
Figure 13: Results from increasing Anderson depth m𝑚mitalic_m. Left: Convergence history of NA(3) for various choices of μ𝜇\muitalic_μ near μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. Solid lines are NA(3)3(3)( 3 ), and dashed lines are NA(3)3(3)( 3 ) with asymptotic γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ). Right: Convergence history for NA(1) and NA(3) with μ=0.92𝜇0.92\mu=0.92italic_μ = 0.92.

4.5.2 Results for Rayleigh-Bénard Model

The results of increasing m𝑚mitalic_m to solve Model (16) are shown below in Figure 14. As with Model (15), we experimented with reducing m=1𝑚1m=1italic_m = 1 and activating γ𝛾\gammaitalic_γ-safeguarding asymptotically. The dashed lines in Figure 14 are the results of these experiments. For this problem, activation occurred when wk+1<101normsubscript𝑤𝑘1superscript101\|w_{k+1}\|<10^{-1}∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ < 10 start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT since we found that activating γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) with r^=0.9^𝑟0.9\hat{r}=0.9over^ start_ARG italic_r end_ARG = 0.9 when wk+1<1normsubscript𝑤𝑘11\|w_{k+1}\|<1∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ < 1 broke convergence like it did in Section 4.3.2. We once again observe that the right choice of m>1𝑚1m>1italic_m > 1 can improve convergence significantly. This can be seen in Figure 14 for Ri=3.3Ri3.3\operatorname{Ri}=3.3roman_Ri = 3.3. Moreover, Figure 14 demonstrates that increasing m𝑚mitalic_m can recover convergence when NA(1)NA1\operatorname{NA}(1)roman_NA ( 1 ) fails to converge for Ri=3.0Ri3.0\operatorname{Ri}=3.0roman_Ri = 3.0 and Ri=3.2Ri3.2\operatorname{Ri}=3.2roman_Ri = 3.2. However, for Ri=Riabsent\operatorname{Ri}=roman_Ri = 3.1, 3.4, and 3.5, there was no significant improvement gained from increasing m>1𝑚1m>1italic_m > 1. The results for Ri=3.4Ri3.4\operatorname{Ri}=3.4roman_Ri = 3.4 shown below in Figure 14 are representative of the results for Ri=3.1Ri3.1\operatorname{Ri}=3.1roman_Ri = 3.1 and Ri=3.5Ri3.5\operatorname{Ri}=3.5roman_Ri = 3.5.

The most significant difference between NA(m)NA𝑚\operatorname{NA}(m)roman_NA ( italic_m ) without asymptotic γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ), and NA(m)NA𝑚\operatorname{NA}(m)roman_NA ( italic_m ) with asymptotic γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ), is seen with m=10𝑚10m=10italic_m = 10. It also appears that m=10𝑚10m=10italic_m = 10 is more sensitive to the activation threshold than smaller choices of m𝑚mitalic_m. This is seen in Figure 14 for Ri=3.3Ri3.3\operatorname{Ri}=3.3roman_Ri = 3.3 and Ri=3.4Ri3.4\operatorname{Ri}=3.4roman_Ri = 3.4. These results are promising, and motivate further investigation to fully understand how the choice of m𝑚mitalic_m affects convergence near singularities, and in particular, near bifurcation points.

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 14: Convergence history of NA(m)NA𝑚\operatorname{NA}(m)roman_NA ( italic_m ) for m=𝑚absentm=italic_m = 1,2,3,4,5, and 10 for Ri=Riabsent\operatorname{Ri}=roman_Ri =3.0, 3.2, 3.3, and 3.4. Dashed lines denote NA(m)NA𝑚\operatorname{NA}(m)roman_NA ( italic_m ) with asymptotic γNAA(r^)𝛾NAA^r\operatorname{\gamma\text{NAA}(\hat{r})}italic_γ NAA ( over^ start_ARG roman_r end_ARG ) with r^=0.9^𝑟0.9\hat{r}=0.9over^ start_ARG italic_r end_ARG = 0.9 activated when wk+1<101normsubscript𝑤𝑘1superscript101\|w_{k+1}\|<10^{-1}∥ italic_w start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∥ < 10 start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT.

5 Conclusion

We have presented a modification of Anderson accelerated Newton’s method for solving nonlinear equations near bifurcation points. We proved that, locally, this modified scheme can detect nonsingular problems and scale the iterates towards a pure Newton step, which leads to faster local convergence compared to standard NA. We numerically demonstrated two strategies one can employ when using our modified NA scheme to solve nonlinear problems near bifurcation points, with our test problems being two Navier-Stokes type parameter-dependent PDEs. Asymptotic safeguarding was shown to recover local quadratic convergence when the problem is nonsingular, and it shows virtually no sensitivity to the choice of parameter r^^𝑟\hat{r}over^ start_ARG italic_r end_ARG. It can, however, be sensitive to the choice of activation threshold. Preasymptotic safeguarding is shown to significantly improve convergence, and can recover convergence when both Newton and NANA\operatorname{NA}roman_NA fail. There is strong sensitivity to the choice of r^^𝑟\hat{r}over^ start_ARG italic_r end_ARG though, and future work will clarify this dependence. We also demonstrated that increasing the Anderson depth m𝑚mitalic_m can improve convergence, and increase the domain of convergence with respect to the problem parameter. Future projects will further study how the choice of m𝑚mitalic_m impacts convergence, and work towards developing γ𝛾\gammaitalic_γ-safeguarding for greater algorithmic depths.

6 Acknowledgements

MD and SP are supported in part by the National Science Foundation under Grant No. DMS-2011519. LR is supported in part by the National Science Foundation under Grant No. DMS-2011490. This material is based upon work supported by the National Science Foundation under Grant No. DMS-1929284 while the authors were in residence at the Institute for Computational and Experimental Research in Mathematics in Providence, RI, during the Acceleration and Extrapolation Methods (MD, SP and LR), and the Numerical PDEs: Analysis, Algorithms and Data Challenges (SP and LR), programs.

References

  • [1] H. An, X. Jia, and H.F. Walker. Anderson acceleration and application to the three-temperature energy equations. J. Comput. Phys., 347:1–19, 2017.
  • [2] D.G. Anderson. Iterative procedures for nonlinear integral equations. J. Assoc. Comput. Mach., 12(4):547–560, 1965.
  • [3] R. Behling and A. Fischer. A unified local convergence analysis of inexact constrained Levenberg-Marquardt methods. Optim. Lett., 6:927–940, 2012.
  • [4] R. Behling, A. Fischer, M. Herrich, A. Iusem, and Y. Ye. A Levenberg-Marquardt method with approximate projections. Comput. Optim. Appl., 59:5–26, 2014.
  • [5] S. Bellavia and B. Morini. Strong local convergence properties of adaptive regularized methods for nonlinear least squares. IMA J. Numer. Anal., 35, 2015.
  • [6] R.G. Bettiol and P. Piccione. Instability and bifurcation. Notices Am. Math. Soc., 67(11):1679–1691, 2020.
  • [7] N. Boullé, V. Dallas, and P.E. Farrell. Bifurcation analysis of two-dimensional Rayleigh-Bénard convection using deflation. Phys. Rev. E, 105, 2022.
  • [8] D. Braess. Finite Elements: Theory, Fast Solvers, and Applications in Solid Mechanics. Cambridge University Press, 2007.
  • [9] M. Chupin, M. Dupuy, G. Legendre, and E. Séré. Convergence analysis of adaptive DIIS algorithms with application to electronic ground state calculations. ESAIM Math. Model. Numer. Anal., 55(6):2785–2825, 2021.
  • [10] M. Dallas and S. Pollock. Newton-Anderson at singular points. Int. J. Numer. Anal. and Mod., 20(5):667–692, 2023.
  • [11] D.W. Decker, H.B. Keller, and C.T. Kelley. Convergence rates for Newton’s method at singular points. SIAM J. Numer. Anal., 20(2):296–314, 1983.
  • [12] D.W. Decker and C.T. Kelley. Newton’s method at singular points I. SIAM J. Numer. Anal., 17(1):66–70, 1980.
  • [13] D.W. Decker and C.T. Kelley. Convergence acceleration for Newton’s method at singular points. SIAM J. Numer. Anal., 19(1):219–229, 1982.
  • [14] D.W. Decker and C.T. Kelley. Expanded convergence domains for Newton’s method at nearly singular roots. SIAM J. Sci. Stat. Comput., 6(4), 1985.
  • [15] P. Deuflhard. Newton Methods for Nonlinear Problems. Springer Series in Computational Mathematics. Springer, 2005.
  • [16] J. Fan and J. Zeng. A Levenberg–Marquardt algorithm with correction for singular system of nonlinear equations. Appl. Math. Comput., 219(17):9438–9446, 2013.
  • [17] P.E. Farrell, A. Birkisson, and S.W. Funke. Deflation techniques for finding distinct solutions of nonlinear partial differential equations. SIAM J. Sci. Comput., 37, 2015.
  • [18] A. Fischer, A.F. Izmailov, and M.V. Solodov. Accelerating convergence of the globalized Newton method to critical solutions of nonlinear equations. Comput. Optim. Appl., 78:273–286, 2021.
  • [19] A. Fischer, A.F. Izmailov, and M.V. Solodov. Unit stepsize for the Newton method close to critical solutions. Math. Program., 187:697–721, 2021.
  • [20] J.K. Galvin, A. Linke, L.G. Rebholz, and N. Wilson. Stabilizing poor mass conservation in incompressible flow problems with large irrotational forcing and application to thermal convection. Comput. Methods Appl. Mech. Engrg., pages 166–176, 2012.
  • [21] A.O. Griewank. Analysis and Modification of Newton’s Method at Singularities. PhD thesis, 1980.
  • [22] A.O. Griewank. Starlike domains of convergence for Newton’s method at singularities. Numer. Math., 35:95–111, 1980.
  • [23] A.O. Griewank. On solving nonlinear equations with simple singularities or nearly singular solutions. SIAM Review, 27(4):537–563, 1985.
  • [24] J. Guzḿan and L.R. Scott. The Scott-Vogelius finite elements revisited. Math. Comp., 88(316):515–529, 2019.
  • [25] H. He, S. Zhao, Y. Xi, J.C. Ho, and Y. Saad. Solve minimax optimization by Anderson acceleration. International Conference on Learning Representations, 2022.
  • [26] N.C. Henderson and R. Varadhan. Damped Anderson acceleration with restarts and monotonicity control for accelerating EM and EM-like algorithms. Journal of computational and graphical statistics, 28(4):834–846, 2019.
  • [27] J.L. Hueso, E.M., and J.R. Torregrosa. Modified Newton’s method for systems of nonlinear equations with singular Jacobian. J. Comput. Appl. Math., 224(1):77–83, 2008.
  • [28] A.F. Izmailov, A.S. Kurennoy, and M.V. Solodov. Critical solutions of nonlinear equations: local attraction for Newton-type methods. Math. Progam., 167:355–379, 2018.
  • [29] A.F. Izmailov, A.S. Kurennoy, and M.V. Solodov. Critical solutions of nonlinear equations: stability issues. Math. Progam., 168:475–507, 2018.
  • [30] C. Kanzow, N. Yamashita, and M. Fukushima. Levenberg–Marquardt methods with strong local convergence properties for solving nonlinear equations with convex constraints. J. Comput. Appl. Math, 172:375–397, 2004.
  • [31] C.T. Kelley and R. Suresh. A new acceleration method for Newton’s method at singular points. SIAM J. Numer. Anal, 20(5):1001–1009, 1983.
  • [32] H. Kielhöfer. Bifurcation Theory: An Introduction with Applications to Partial Differential Equations. Springer Science+Business Media, 2012.
  • [33] C. Kuehn. PDE Dynamics: An Introduction. SIAM, 2019.
  • [34] P.A. Lott, H.F. Walker, C.S. Woodward, and U.M. Yang. An accelerated Picard method for nonlinear systems related to variably saturated flow. Adv. Water Resour., 38:92–101, 2012.
  • [35] J.M. Ortega. The Newton-Kantorovich theorem. Amer. Math. Monthly, 75(6):658–660, 1968.
  • [36] M.L. Pasini. Convergence analysis of Anderson-type acceleration of Richardson’s iteration. Numer. Linear Algebra Appl., 26, 2019.
  • [37] M.L. Pasini, J. Yin, V. Reshniak, and M.K. Stoyanov. Anderson acceleration for distributed training of deep learning models. In SoutheastCon 2022, pages 289–295, 2022.
  • [38] Y. Peng, B. Deng, J. Zhang, F. Geng, W. Qin, and L. Liu. Anderson acceleration for geometry optimization and physics simulation. ACM Trans. Graph., 37(4), 2018.
  • [39] F. Pichi. Reduced order models for parametric bifurcation problems in nonlinear PDEs. PhD thesis, 2020.
  • [40] F. Pichi, A. Quaini, and G. Rozza. A reduced order modeling technique to study bifurcating phenomena: Application to the Gross–Pitaevskii equation. SIAM J. Sci. Comput., 42(5), 2020.
  • [41] F. Pichi, M. Strazzullo, F. Ballarin, and G. Rozza. Driving bifurcating parametrized nonlinear pdes by optimal control strategies: application to Navier–Stokes equations with model order reduction. ESAIM: M2AN, 56(4):1361–1400, 2022.
  • [42] S. Pollock and L.G. Rebholz. Anderson acceleration for contractive and noncontractive operators. IMA J. Numer. Anal., 41(4):2841–2872, 2021.
  • [43] S. Pollock and L.G. Rebholz. Filtering for Anderson acceleration. SIAM J. Sci. Comput., 45(4):A1571–A1590, 2023.
  • [44] S. Pollock, L.G. Rebholz, and M. Xiao. Anderson-accelerated convergence of Picard iterations for incompressible Navier-Stokes equations. SIAM J. Numer. Anal., 57(2):615–637, 2019.
  • [45] S. Pollock and H. Schwartz. Benchmarking results for the Newton–Anderson method. Results Appl. Math., 8:100095, 2020.
  • [46] J. Qin. On the convergence of some low order mixed finite elements for incompressible fluids. PhD thesis, 1994.
  • [47] L.G. Rebholz and M. Xiao. The effect of Anderson acceleration on superlinear and sublinear convergence. J. Sci. Comput., 96(2), 2023.
  • [48] G.W. Reddien. On Newton’s method for singular problems. SIAM J. Numer. Anal., 15(5):993–996, 1978.
  • [49] G.W. Reddien. Newton’s method and high order singularities. Comput. Math. Appl., 5(2):79–86, 1979.
  • [50] R.B. Schnabel and P.D. Frank. Tensor methods for nonlinear equations. SIAM J. Numer. Anal., 21(5):815–843, 1984.
  • [51] R.B. Thompson, K.O. Rasmussen, and T. Lookman. Improved convergence in block copolymer self-consistent field theory by Anderson mixing. J. Chem. Phys., 120(1):31–34, 2004.
  • [52] R.S. Tuminaro, H.F. Walker, and J. N. Shadid. On backtracking failure in Newton-GMRES methods with demonstration for the Navier-Stokes equations. J. Comput. Phys., 180:549–558, 2002.
  • [53] H. Uecker. Continuation and bifurcation in nonlinear pdes – algorithms, applications, and experiments. Jahresbericht der Deutschen Mathematiker-Vereinigung, 124:43–80, 2021.
  • [54] H.F. Walker and P. Ni. Anderson acceleration for fixed-point iterations. SIAM J. Numer. Anal., 49(4):1715–1735, 2011.
  • [55] D. Wang, Y. He, and H. De Sterck. On the asymptotic linear convergence speed of Anderson acceleration applied to ADMM. J. Sci. Comput., 88(2):38, 2021.
  • [56] Q. Yuan, Y. Sun, and J. Ren. How interest rate influences a business cycle model. Discrete Contin. Dyn. Syst., 13(11):3231–3251, 2020.
  • [57] H. Zhou and Y. Qian. Double Hopf bifurcation analysis for coupled van der Pol–Rayleigh system with time delay. J. Vib. Eng. Technol., 12:6075–6087, 2024.