Faster Quantum-inspired Algorithms for Solving Linear Systems

Published: 07 July 2022, ACM Transactions on Quantum Computing, Volume 3, Issue 4

Abstract

We establish an improved classical algorithm for solving linear systems in a model analogous to the QRAM that is used by quantum linear solvers. Precisely, for the linear system \(A{\bf x}= {\bf b}\), we show that there is a classical algorithm that outputs a data structure for \({\bf x}\) allowing sampling and querying to the entries, where \({\bf x}\) is such that \(\Vert {\bf x}- A^{+}{\bf b}\Vert \le \epsilon \Vert A^{+}{\bf b}\Vert\). This output can be viewed as a classical analogue of the output of quantum linear solvers. The complexity of our algorithm is \(\widetilde{O}(\kappa _F^6 \kappa ^2/\epsilon ^2)\), where \(\kappa _F = \Vert A\Vert _F\Vert A^{+}\Vert\) and \(\kappa = \Vert A\Vert \Vert A^{+}\Vert\). This improves on the previous best algorithm [Gilyén, Song, and Tang, arXiv:2009.07268], which has complexity \(\widetilde{O}(\kappa _F^6 \kappa ^6/\epsilon ^4)\). Our algorithm is based on the randomized Kaczmarz method, a particular case of stochastic gradient descent. We also find that when A is row sparse, this method already returns an approximate solution \({\bf x}\) in time \(\widetilde{O}(\kappa _F^2)\), while the best known quantum algorithm returns \(| {\bf x} \rangle\) in time \(\widetilde{O}(\kappa _F)\) when A is stored in the QRAM data structure. As a result, assuming access to QRAM, if A is row sparse, then the speedup offered by current quantum algorithms is quadratic.
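For intuition, the classical randomized Kaczmarz iteration on which our algorithm is built can be sketched in a few lines. The following is a minimal NumPy sketch for a dense, consistent system; the sample-and-query data structures and the sample-based estimation of inner products used by the actual algorithm are omitted, and the function name and signature are ours:

    import numpy as np

    def randomized_kaczmarz(A, b, T, seed=0):
        """Randomized Kaczmarz (Strohmer and Vershynin [46]): sample row i
        with probability ||A_{i*}||^2 / ||A||_F^2 and project the iterate
        onto the hyperplane <A_{i*}, x> = b_i."""
        rng = np.random.default_rng(seed)
        m, n = A.shape
        row_norms_sq = np.einsum('ij,ij->i', A, A)   # ||A_{i*}||^2
        probs = row_norms_sq / row_norms_sq.sum()    # normalize by ||A||_F^2
        x = np.zeros(n)
        for _ in range(T):
            i = rng.choice(m, p=probs)
            x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
        return x

For a consistent system, \(T = O(\kappa _F^2\log (1/\epsilon))\) iterations suffice to reach relative error \(\epsilon\) in expectation [46], which matches the \(\widetilde{O}(\kappa _F^2)\) iteration count quoted above.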
Appendices

A Estimation of \(\phi\)

Using the notation (14), we can decompose the updating rule (13) of \({\bf y}\) as follows:
\begin{equation} {\bf y}_{k+1} = {\bf y}_k + {\bf z}_k + {\bf z}_k^{\prime }, \tag{26} \end{equation}
where
\[\begin{eqnarray*} {\bf z}_k = \frac{\tilde{b}_{r_k} - \langle \tilde{A}_{r_k*} | A^T| {\bf y}_{k} \rangle }{\Vert A_{r_k*}\Vert } {\bf e}_{r_k}, \quad {\bf z}_k^{\prime } = \frac{\mu _k}{\Vert A_{r_k*}\Vert } {\bf e}_{r_k}. \end{eqnarray*}\]
Denote
\begin{equation} Z = \Vert A{\bf x}_* - {\bf b}\Vert = \min _{{\bf x}} \Vert A{\bf x}- {\bf b}\Vert , \quad \Lambda = {\rm diag} (\Vert A_{i*}\Vert ^2: i \in [m]). \tag{27} \end{equation}
In the following, for any two vectors \({\bf a},{\bf b}\), we define \(\langle {\bf a}|{\bf b}\rangle _{\Lambda } = \langle {\bf a}|\Lambda |{\bf b}\rangle\). To bound \(\phi\), it suffices to bound \(\Vert {\bf y}_T\Vert _\Lambda ^2\). From Equation (26), it is plausible that \(\Vert {\bf y}_{k+1}\Vert _\Lambda ^2 = \Theta (\Vert {\bf y}_{k}\Vert _\Lambda ^2 + \Vert {\bf z}_{k}\Vert _\Lambda ^2 + \Vert {\bf z}_{k}^{\prime }\Vert _\Lambda ^2)\). In the following, we shall prove that in fact this holds up to a constant factor on average. We shall choose the initial vector as 0 for simplicity, i.e., \({\bf y}_0=0\).
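Concretely, the iterate \({\bf x}_k = A^T{\bf y}_k\) is represented through \({\bf y}_k\), and each step changes a single coordinate of \({\bf y}_k\). As a minimal sketch of the noiseless part of update (26) (i.e., with \(\mu _k = 0\), so that \({\bf z}_k^{\prime } = 0\); the function name is ours):

    import numpy as np

    def dual_kaczmarz_step(A, b, y, i):
        """Noiseless form of update (26): only coordinate i of y changes, by
        z_k = (b_i - <A_{i*}, A^T y>) / ||A_{i*}||^2. Applying A^T to the new
        y reproduces the usual Kaczmarz step on x = A^T y."""
        row_norm_sq = A[i] @ A[i]                  # ||A_{i*}||^2
        x = A.T @ y                                # current iterate x_k
        y = y.copy()
        y[i] += (b[i] - A[i] @ x) / row_norm_sq    # the coordinate update z_k
        return y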
First, we consider the case \(Z = 0\), that is, \({\bf b}= A{\bf x}_*\). We first fix \(k, r_k\) and take the expectation over the random variable D; then, fixing k, we take the expectation over \(r_k\); finally, we sum over k.
From now on, we assume that \(d = \frac{4\Vert A\Vert _F^2 T}{\epsilon ^{2} \min _{j \in [n]} \Vert A_{*j}\Vert ^2}\) and \(T=O(\kappa _F^2\log (1/\epsilon))\). By Lemma 9,
\[\begin{eqnarray*} {\mathbb {E}}_D[\Vert {\bf z}_k^{\prime }\Vert _\Lambda ^2] &\le & \frac{\Vert A\Vert _F^2 \Vert {\bf x}_k\Vert ^2}{d\min _{j \in [n]} \Vert A_{*j}\Vert ^2} = \frac{\epsilon ^2\Vert {\bf x}_k\Vert ^2}{4T} \le \frac{\epsilon ^2(\Vert {\bf x}_k-{\bf x}_*\Vert ^2+\Vert {\bf x}_*\Vert ^2)}{2T}, \\ {\mathbb {E}}_D[\langle {\bf z}_k|{\bf z}_k^{\prime }\rangle _\Lambda ] &=& \left(\tilde{b}_{r_k} - \langle \tilde{A}_{r_k*}|{\bf x}_k\rangle \right) {\mathbb {E}}_D[\mu _k] = 0, \\ {\mathbb {E}}_D[\langle {\bf y}_k|{\bf z}_k^{\prime }\rangle _\Lambda ] &=& \Vert A_{r_k*}\Vert \langle {\bf y}_k|{\bf e}_{r_k}\rangle {\mathbb {E}}_D[\mu _k] = 0. \end{eqnarray*}\]
For the norm of \({\bf z}_k\), we have
\[\begin{eqnarray*} {\mathbb {E}}[\Vert {\bf z}_k \Vert _\Lambda ^2] &=& \sum _{r_k = 1}^m \frac{{\Vert A_{r_k*}\Vert }^2}{\Vert A\Vert _F^2} \frac{(\tilde{b}_{r_k} - \langle \tilde{A}_{r_k*} | A^T| {\bf y}_{k} \rangle)^2 }{\Vert A_{r_k*}\Vert ^2} \Vert A_{r_k*}\Vert ^2 \\ &=& \frac{1}{\Vert A\Vert _F^2} \sum _{r_k = 1}^m (b_{r_k} - \langle A_{r_k*}| {\bf x}_{k} \rangle)^2 \\ &\le & 2\frac{\Vert {\bf b}\Vert ^2 + \Vert A\Vert _F^2 \Vert {\bf x}_k\Vert ^2}{\Vert A\Vert _F^2} \le 2\frac{\Vert {\bf b}\Vert ^2 }{\Vert A\Vert _F^2} +4 \Vert {\bf x}_*\Vert ^2 + 4\Vert {\bf x}_*-{\bf x}_k\Vert ^2. \end{eqnarray*}\]
As for the inner product between \({\bf y}_k\) and \({\bf z}_k\), we have the following estimate:
\[\begin{eqnarray*} {\mathbb {E}}[\langle {\bf y}_k|{\bf z}_k \rangle _\Lambda ] &=& \sum _{r_k = 1}^m \frac{{\Vert A_{r_k*}\Vert }^2}{\Vert A\Vert _F^2} \frac{(\tilde{b}_{r_k} - \langle \tilde{A}_{r_k*} | A^T| {\bf y}_{k} \rangle) }{\Vert A_{r_k*}\Vert } \langle {\bf y}_k|{\bf e}_{r_k}\rangle \Vert A_{r_k*}\Vert ^2 \\ &=& \sum _{r_k = 1}^m \frac{{\Vert A_{r_k*}\Vert }^2}{\Vert A\Vert _F^2} {(b_{r_k} - \langle A_{r_k*} | A^T| {\bf y}_{k} \rangle) } \langle {\bf y}_k|{\bf e}_{r_k}\rangle \\ &=& \frac{\langle {\bf b}| {\bf y}_k\rangle _\Lambda - \Vert {\bf x}_k\Vert _\Lambda ^2}{\Vert A\Vert _F^2} = \frac{\langle {\bf x}_*|A^T| {\bf y}_k\rangle _\Lambda - \Vert {\bf x}_k\Vert _\Lambda ^2}{\Vert A\Vert _F^2} \\ &=& \frac{\langle {\bf x}_*| {\bf x}_k\rangle _\Lambda - \Vert {\bf x}_k\Vert _\Lambda ^2}{\Vert A\Vert _F^2} \le \Vert {\bf x}_* \Vert \Vert {\bf x}_k\Vert + \Vert {\bf x}_k\Vert ^2\\ &\le & 3\Vert {\bf x}_* \Vert ^2 +\Vert {\bf x}_* \Vert \Vert {\bf x}_k-{\bf x}_*\Vert +2\Vert {\bf x}_k-{\bf x}_*\Vert ^2 \\ &\le & \frac{7}{2}\Vert {\bf x}_* \Vert ^2 +\frac{5}{2}\Vert {\bf x}_k-{\bf x}_*\Vert ^2. \end{eqnarray*}\]
In the above, we used the fact that for any two vectors \({\bf a},{\bf b}\), we have \(|\langle {\bf a}|{\bf b}\rangle _\Lambda | \le \Vert \Lambda \Vert \Vert {\bf a}\Vert \Vert {\bf b}\Vert\), together with \(\Vert \Lambda \Vert \le \Vert A\Vert _F^2\).
Hence, we have
\[\begin{eqnarray*} {\mathbb {E}}\left[\Vert {\bf y}_{k+1}\Vert _\Lambda ^2\right] &=& {\mathbb {E}}\left[\Vert {\bf y}_{k}\Vert _\Lambda ^2\right] + {\mathbb {E}}\left[\Vert {\bf z}_{k}\Vert _\Lambda ^2\right] + {\mathbb {E}}\left[\Vert {\bf z}_{k}^{\prime }\Vert _\Lambda ^2\right] +2{\mathbb {E}}[\langle {\bf y}_k |{\bf z}_k\rangle _\Lambda ] \\ &\le & {\mathbb {E}}\left[\Vert {\bf y}_k \Vert _\Lambda ^2\right] + 2\frac{\Vert {\bf b}\Vert ^2 }{\Vert A\Vert _F^2} + \left(9+ \frac{\epsilon ^2}{2T}\right){\mathbb {E}}[\Vert {\bf x}_k-{\bf x}_*\Vert ^2] + \left(11+\frac{\epsilon ^2}{2T}\right) \Vert {\bf x}_*\Vert ^2 \\ &\le & {\mathbb {E}}\left[\Vert {\bf y}_k \Vert _\Lambda ^2\right] + 2\frac{\Vert {\bf b}\Vert ^2 }{\Vert A\Vert _F^2} + \left(20 + \frac{\epsilon ^2}{T}\right) \Vert {\bf x}_*\Vert ^2, \end{eqnarray*}\]
where we use that \({\mathbb {E}}[\Vert {\bf x}_k-{\bf x}_*\Vert ^2] \le \Vert {\bf x}_*\Vert ^2\) by Lemma 2. Therefore,
\[\begin{eqnarray*} {\mathbb {E}}\left[\Vert {\bf y}_T \Vert _\Lambda ^2\right] \le T\left(2\frac{\Vert {\bf b}\Vert ^2 }{\Vert A\Vert _F^2} + \left(20 + \frac{\epsilon ^2}{T}\right) \Vert {\bf x}_*\Vert ^2\right). \end{eqnarray*}\]
This means that, with high probability,
\begin{equation*} \phi = T \frac{\Vert {\bf y}_T\Vert _{\Lambda }^2}{\Vert {\bf x}_T\Vert ^2} \le T^2\left(2\frac{\Vert {\bf b}\Vert ^2 }{\Vert A\Vert _F^2\Vert {\bf x}_T\Vert ^2} + \left(20 + \frac{\epsilon ^2}{T}\right) \frac{\Vert {\bf x}_*\Vert ^2}{\Vert {\bf x}_T\Vert ^2}\right) =O(T^2). \end{equation*}
When \(Z\ne 0\), we can write \({\bf b}= A{\bf x}_* + {\bf c}\) for some vector \({\bf c}\) of norm Z that lies outside the column space of A. This can happen only when A does not have full row rank. Since \({\bf c}\) is independent of A, it cannot be bounded in terms of A. In this case, the only estimate that changes is \({\mathbb {E}}[\langle {\bf y}_k|{\bf z}_k \rangle _\Lambda ]\), which is now bounded by
\[\begin{eqnarray*} {\mathbb {E}}[\langle {\bf y}_k|{\bf z}_k \rangle _\Lambda ] &\le & \frac{\langle {\bf c}|{\mathbb {E}}[{\bf y}_k]\rangle _\Lambda }{\Vert A\Vert _F^2}+\frac{7}{2}\Vert {\bf x}_* \Vert ^2 +\frac{5}{2}\Vert {\bf x}_k-{\bf x}_*\Vert ^2 \\ &\le & \frac{\Vert A\Vert ^2}{\Vert A\Vert _F^2}\Vert {\bf c}\Vert \Vert {\mathbb {E}}[{\bf y}_k]\Vert + \frac{7}{2}\Vert {\bf x}_* \Vert ^2 +\frac{5}{2}\Vert {\bf x}_k-{\bf x}_*\Vert ^2 \\ &=& \frac{\kappa ^2Z}{\kappa _F^2} \Vert {\mathbb {E}}[{\bf y}_k]\Vert + \frac{7}{2}\Vert {\bf x}_* \Vert ^2 +\frac{5}{2}\Vert {\bf x}_k-{\bf x}_*\Vert ^2. \end{eqnarray*}\]
Putting everything together, we obtain
\[\begin{eqnarray*} {\mathbb {E}}[\Vert {\bf y}_{k+1}\Vert _\Lambda ^2] \le {\mathbb {E}}[\Vert {\bf y}_k\Vert _\Lambda ^2] + 2\frac{\Vert {\bf b}\Vert ^2 }{\Vert A\Vert _F^2} + \left(20 + \frac{\epsilon ^2}{T}\right) \Vert {\bf x}_*\Vert ^2 + \frac{2\kappa ^2Z}{\kappa _F^2} \Vert {\mathbb {E}}[{\bf y}_k]\Vert . \end{eqnarray*}\]
From Equation (26), we know that
\begin{equation*} {\mathbb {E}}[{\bf y}_{k+1}] =\left(I - \frac{AA^T}{\Vert A\Vert _F^2}\right){\mathbb {E}}[{\bf y}_k] + \frac{{\bf b}}{\Vert A\Vert _F^2}. \end{equation*}
This means
\begin{equation*} \Vert {\mathbb {E}}[{\bf y}_k]\Vert = \left\Vert \sum _{i=0}^{k-1} \left(I - \frac{AA^T}{\Vert A\Vert _F^2}\right)^i \frac{{\bf b}}{\Vert A\Vert _F^2}\right\Vert \le \sum _{i=0}^{k-1} \left(1 - \kappa _F^{-2}\right)^i \frac{\Vert {\bf b}\Vert }{\Vert A\Vert _F^2} \le \frac{\kappa _F^2 \Vert {\bf b}\Vert }{\Vert A\Vert _F^2}. \end{equation*}
Therefore,
\[\begin{eqnarray*} {\mathbb {E}}\left[\Vert {\bf y}_T\Vert _\Lambda ^2\right] &\le & T\left(\frac{2\Vert {\bf b}\Vert ^2 }{\Vert A\Vert _F^2} + \left(20 + \frac{\epsilon ^2}{T}\right) \Vert {\bf x}_*\Vert ^2 + \frac{2\kappa ^2 \Vert {\bf b}\Vert Z}{\Vert A\Vert _F^2} \right) \\ &=& T\left(\frac{2\Vert A{\bf x}_*\Vert ^2 }{\Vert A\Vert _F^2} + \left(20 + \frac{\epsilon ^2}{T}\right) \Vert {\bf x}_*\Vert ^2 + \frac{2\kappa ^2 \Vert {\bf b}\Vert Z + 2Z^2}{\Vert A\Vert _F^2}\right). \end{eqnarray*}\]
Finally, by Markov’s inequality, with high probability, we have
\begin{equation*} \phi = T \frac{\Vert {\bf y}_T\Vert _{\Lambda }^2}{\Vert {\bf x}_T\Vert ^2} = O\left(T^2+ T^2 \frac{\kappa ^2 \Vert {\bf b}\Vert Z + Z^2}{\Vert A\Vert _F^2\Vert {\bf x}_*\Vert ^2} \right), \end{equation*}
where we used the fact that \(\Vert {\bf x}_T-{\bf x}_*\Vert \le \epsilon \Vert {\bf x}_*\Vert\) and \(\Vert A{\bf x}_*\Vert ^2 / \Vert A\Vert _F^2 \le \Vert {\bf x}_*\Vert ^2\). Note that \(\Vert A\Vert _F\Vert {\bf x}_*\Vert = \Vert A\Vert _F\Vert A^{+}{\bf b}\Vert \ge \Vert A\Vert _F \Vert {\bf b}\Vert /\Vert A\Vert \ge Z\), so the second term is bounded by \(O(T^2 \kappa ^4 Z/(\kappa _F^2\Vert {\bf b}\Vert))\). In the worst case, \(\phi = O(T^2\kappa ^2)\), because \(\kappa ^2 Z\le \kappa _F^2\Vert {\bf b}\Vert\).

B Estimation of the Convergence Rate

For simplicity, we assume that \(A{\bf x}_* = {\bf b}\); by the analysis of Reference [35], the convergence rate does not change much when the linear system is inconsistent. By the updating formula (18), we have
\[\begin{eqnarray*} {\bf x}_{k+1} - {\bf x}_* &=& {\bf x}_k - {\bf x}_* + \frac{1}{2} \sum _{i\in {\mathcal {E}}_k} (\tilde{b}_{i} - \langle \tilde{A}_{i*} |{\bf x}_{k}\rangle) | \tilde{A}_{i*} \rangle + \frac{1}{2} \sum _{i\in {\mathcal {E}}_k} \langle \tilde{A}_{i*} | I - D_i |{\bf x}_{k}\rangle | \tilde{A}_{i*} \rangle \\ &=& \left(I - \frac{1}{2} \sum _{i\in {\mathcal {E}}_k} | \tilde{A}_{i*} \rangle \langle \tilde{A}_{i*}|\right) ({\bf x}_k - {\bf x}_*) + \frac{1}{2} \sum _{i\in {\mathcal {E}}_k} \langle \tilde{A}_{i*} | I - D_i |{\bf x}_{k}\rangle | \tilde{A}_{i*} \rangle . \end{eqnarray*}\]
Below, we bound \({\mathbb {E}}[\Vert {\bf x}_{k+1} - {\bf x}_*\Vert ^2]\). We follow the notation of Equation (14), but, to avoid confusion, we here write
\begin{equation*} \mu _{ik} := \langle \tilde{A}_{i*} | I - D_i |{\bf x}_{k} \rangle . \end{equation*}
In the following, we use Lemma 9 and the fact that \(|{\mathcal {E}}_k| = \Vert A\Vert _F^2/\Vert A\Vert ^2\).
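For concreteness, one noiseless iteration of update (18) (i.e., with all \(\mu _{ik} = 0\)) can be sketched as follows, assuming each index of \({\mathcal {E}}_k\) is drawn independently with probability \(\Vert A_{i*}\Vert ^2/\Vert A\Vert _F^2\), as in Appendix A; the function name is ours:

    import numpy as np

    def averaged_kaczmarz_step(A, b, x, rng):
        """Noiseless form of update (18): sample a block E_k of
        q = ||A||_F^2 / ||A||^2 rows and average the q row projections,
        damped by the factor 1/2."""
        m, n = A.shape
        row_norms_sq = np.einsum('ij,ij->i', A, A)            # ||A_{i*}||^2
        fro_sq = row_norms_sq.sum()                           # ||A||_F^2
        q = max(1, round(fro_sq / np.linalg.norm(A, 2)**2))   # |E_k|
        block = rng.choice(m, size=q, p=row_norms_sq / fro_sq)
        step = np.zeros(n)
        for i in block:
            step += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
        return x + 0.5 * step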
First, we have
\[\begin{eqnarray*} {\mathbb {E}}_{D}\left[\left\Vert \sum _{i\in {\mathcal {E}}_k} \langle \tilde{A}_{i*} | I - D_i |{\bf x}_{k} \rangle | \tilde{A}_{i*} \rangle \right\Vert ^2\right] &=& \sum _{i,j\in {\mathcal {E}}_k} \langle \tilde{A}_{i*}|\tilde{A}_{j*}\rangle {\mathbb {E}}_{D_i,D_j}[ \mu _{ik}\mu _{jk} ] \\ &=& \sum _{i\in {\mathcal {E}}_k} {\mathbb {E}}_{D_i} [ \mu _{ik}^2 ] +\sum _{i\ne j\in {\mathcal {E}}_k} \langle \tilde{A}_{i*}|\tilde{A}_{j*}\rangle {\mathbb {E}}_{D_i}\left[ \mu _{ik}\right] {\mathbb {E}}_{D_j}[\mu _{jk}] \\ &\le & \frac{\Vert A\Vert _F^4\Vert {\bf x}_k\Vert ^2}{d\Vert A\Vert ^2\min _{l\in [n]}\Vert A_{*l}\Vert ^2}. \end{eqnarray*}\]
By Lemma 9, we have
\[\begin{eqnarray*} && {\mathbb {E}}_D\left[\left\langle \left(I - \frac{1}{2} \sum _{i\in {\mathcal {E}}_k} | \tilde{A}_{i*} \rangle \langle \tilde{A}_{i*}|\right) ({\bf x}_k - {\bf x}_*) \Bigg | \sum _{i\in {\mathcal {E}}_k} \mu _{ik} | \tilde{A}_{i*} \rangle \right\rangle \right] \\ &=& \left\langle \left(I - \frac{1}{2}\sum _{i\in {\mathcal {E}}_k} | \tilde{A}_{i*} \rangle \langle \tilde{A}_{i*}|\right) ({\bf x}_k - {\bf x}_*) \Bigg | \sum _{i\in {\mathcal {E}}_k} {\mathbb {E}}_D[\mu _{ik}] | \tilde{A}_{i*} \rangle \right\rangle = 0. \end{eqnarray*}\]
Hence, taking the expectation over D, we have
\[\begin{eqnarray*} {\mathbb {E}}_D[\Vert {\bf x}_{k+1} - {\bf x}_*\Vert ^2] \le \left\Vert \left(I - \frac{1}{2} \sum _{i\in {\mathcal {E}}_k} | \tilde{A}_{i*} \rangle \langle \tilde{A}_{i*}|\right) ({\bf x}_k - {\bf x}_*) \right\Vert ^2 + \frac{\Vert A\Vert _F^4\Vert {\bf x}_k\Vert ^2}{4d\Vert A\Vert ^2\min _{l\in [n]}\Vert A_{*l}\Vert ^2}. \end{eqnarray*}\]
For the first term, we now compute its expectation over the random variable \({\mathcal {E}}_k\):
\[\begin{eqnarray*} && {\mathbb {E}}\left[\left\Vert \left(I -\frac{1}{2} \sum _{i\in {\mathcal {E}}_k} | \tilde{A}_{i*} \rangle \langle \tilde{A}_{i*}|\right) ({\bf x}_k - {\bf x}_*) \right\Vert ^2\right] \\ &=& \langle {\bf x}_k - {\bf x}_* | {\mathbb {E}}\left[ \left(I -\frac{1}{2} \sum _{i\in {\mathcal {E}}_k} | \tilde{A}_{i*} \rangle \langle \tilde{A}_{i*}|\right)^2 \right] | {\bf x}_k - {\bf x}_*\rangle . \end{eqnarray*}\]
It can be shown that
\[\begin{eqnarray*} && {\mathbb {E}}\left[ \left(I -\frac{1}{2} \sum _{i\in {\mathcal {E}}_k} | \tilde{A}_{i*} \rangle \langle \tilde{A}_{i*}|\right)^2 \right] \\ &=& {\mathbb {E}}\left[I - \frac{3}{4}\sum _{i\in {\mathcal {E}}_k} | \tilde{A}_{i*} \rangle \langle \tilde{A}_{i*}| + \frac{1}{4} \sum _{i,j\in {\mathcal {E}}_k, i\ne j} | \tilde{A}_{i*} \rangle \langle \tilde{A}_{i*}|\tilde{A}_{j*}\rangle \langle \tilde{A}_{j*}|\right] \\ &=& I - \frac{3q}{4}\frac{A^T A}{\Vert A\Vert _F^2} + \frac{1}{4} (q^2-q) \left(\frac{A^T A}{\Vert A\Vert _F^2} \right)^2 \\ &\preceq & \left(1-\frac{3q}{4} \frac{\sigma _{\min }(A^T A)}{\Vert A\Vert _F^2}+ \frac{1}{4} (q^2-q) \left(\frac{\sigma _{\min }(A^T A)}{\Vert A\Vert _F^2} \right)^2 \right) I \\ &\preceq & \left(1-\frac{1}{2\kappa ^2} \right) I. \end{eqnarray*}\]
In the last step, we used \(q=\Vert A\Vert _F^2/\Vert A\Vert ^2\), so the second term equals \(3/(4\kappa ^2)\) and the third term is at most \(1/(4\kappa ^4) \le 1/(4\kappa ^2)\).
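For completeness, the middle equality above uses that each index of \({\mathcal {E}}_k\) is drawn independently with probability \(\Vert A_{i*}\Vert ^2/\Vert A\Vert _F^2\), so that
\begin{equation*} {\mathbb {E}}\left[\sum _{i\in {\mathcal {E}}_k} | \tilde{A}_{i*} \rangle \langle \tilde{A}_{i*}|\right] = q \sum _{i=1}^m \frac{\Vert A_{i*}\Vert ^2}{\Vert A\Vert _F^2} \frac{| A_{i*} \rangle \langle A_{i*}|}{\Vert A_{i*}\Vert ^2} = q \frac{A^T A}{\Vert A\Vert _F^2}, \end{equation*}
and the \(q^2 - q\) ordered pairs \(i \ne j\) contribute the squared term in the same way.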
Therefore,
\[\begin{eqnarray*} {\mathbb {E}}[\Vert {\bf x}_{k+1} - {\bf x}_*\Vert ^2] \le \left(1-\frac{1}{2\kappa ^2}\right) {\mathbb {E}}[\Vert {\bf x}_{k} - {\bf x}_*\Vert ^2] + \frac{\Vert A\Vert _F^4{\mathbb {E}}[\Vert {\bf x}_k\Vert ^2]}{4d\Vert A\Vert ^2\min _{l\in [n]}\Vert A_{*l}\Vert ^2}. \end{eqnarray*}\]
Now, set \(T = O(\kappa ^2 \log (2/\epsilon ^2))\), and
\begin{equation} d = \frac{\Vert A\Vert _F^4 T}{\epsilon ^2\Vert A\Vert ^2\min _{l\in [n]}\Vert A_{*l}\Vert ^2}. \tag{28} \end{equation}
Then,
\[\begin{eqnarray*} {\mathbb {E}}[\Vert {\bf x}_{k+1} - {\bf x}_*\Vert ^2] &\le & \left(1-\frac{1}{2\kappa ^2}\right) {\mathbb {E}}[\Vert {\bf x}_{k} - {\bf x}_*\Vert ^2] + \frac{\epsilon ^2}{4T} {\mathbb {E}}[\Vert {\bf x}_k\Vert ^2] \\ &\le & \left(1-\frac{1}{2\kappa ^2}+ \frac{\epsilon ^2}{2T}\right) {\mathbb {E}}[\Vert {\bf x}_{k} - {\bf x}_*\Vert ^2] + \frac{\epsilon ^2}{2T} \Vert {\bf x}_*\Vert ^2. \end{eqnarray*}\]
Finally, we obtain
\[\begin{eqnarray*} {\mathbb {E}}[\Vert {\bf x}_T - {\bf x}_*\Vert ^2] \lesssim \left(1-\frac{1}{2\kappa ^2} + \frac{\epsilon ^2}{2T}\right)^T \Vert {\bf x}_{0} - {\bf x}_*\Vert ^2 + \frac{\epsilon ^2}{2} \Vert {\bf x}_*\Vert ^2 \le \epsilon ^2 \Vert {\bf x}_*\Vert ^2. \end{eqnarray*}\]
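This rate is easy to check numerically. A small self-contained experiment with noiseless updates (the dimensions, seed, and iteration count below are ours):

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 200, 50
    A = rng.standard_normal((m, n))
    x_star = rng.standard_normal(n)
    b = A @ x_star                                 # consistent system, Z = 0

    row_norms_sq = np.einsum('ij,ij->i', A, A)
    fro_sq = row_norms_sq.sum()
    kappa = np.linalg.cond(A, 2)
    q = max(1, round(fro_sq / np.linalg.norm(A, 2)**2))

    x = np.zeros(n)
    for k in range(3000):
        block = rng.choice(m, size=q, p=row_norms_sq / fro_sq)
        step = np.zeros(n)
        for i in block:
            step += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
        x += 0.5 * step

    # The squared error should decay roughly like (1 - 1/(2*kappa**2))**k.
    print(kappa, np.linalg.norm(x - x_star) / np.linalg.norm(x_star))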

C Estimation of \(\phi\) for Kaczmarz Method with Averaging

The calculation here is similar to that in Appendix A. For simplicity, denote
\begin{equation*} {\bf y}_{k+1} = {\bf y}_k + \frac{1}{2} {\bf w}_k + \frac{1}{2} {\bf w}_k^{\prime }, \end{equation*}
where
\[\begin{eqnarray*} {\bf w}_k := \sum _{i \in {\mathcal {E}}_k} \frac{b_{i} - \langle A_{i*}| {\bf x}_k \rangle }{\Vert A_{i*}\Vert ^2} {\bf e}_{i}, \quad {\bf w}_k ^{\prime } := \sum _{i\in {\mathcal {E}}_k} \frac{\mu _{ik}}{\Vert A_{i*}\Vert } {\bf e}_{i}. \end{eqnarray*}\]
In this section, d is given in formula (28). From a similar estimation in Appendix A, we have \({\mathbb {E}}_D[\langle {\bf w}_k|{\bf w}_k^{\prime }\rangle _\Lambda ] ={\mathbb {E}}_D[\langle {\bf y}_k|{\bf w}_k^{\prime }\rangle _\Lambda ] = 0\) and
\[\begin{eqnarray*} {\mathbb {E}}_D[\Vert {\bf w}_k^{\prime }\Vert _\Lambda ^2] \le \frac{\Vert A\Vert _F^2}{\Vert A\Vert ^2}\frac{\Vert A\Vert _F^2 \Vert {\bf x}_k\Vert ^2}{d\min _{j \in [n]} \Vert A_{*j}\Vert ^2} = \frac{\epsilon ^2\Vert {\bf x}_k\Vert ^2}{4T} \le \frac{\epsilon ^2(\Vert {\bf x}_k-{\bf x}_*\Vert ^2+\Vert {\bf x}_*\Vert ^2)}{2T}. \end{eqnarray*}\]
As for \(\Vert {\bf w}_k\Vert _\Lambda ^2\), we still have
\[\begin{eqnarray*} {\mathbb {E}}[\Vert {\bf w}_k\Vert _\Lambda ^2] &=&{\mathbb {E}}\left[ \sum _{i,j \in {\mathcal {E}}_k} \frac{b_{i} - \langle A_{i*}| {\bf x}_k \rangle }{\Vert A_{i*}\Vert ^2} \frac{b_{j} - \langle A_{j*}| {\bf x}_k \rangle }{\Vert A_{j*}\Vert ^2} \langle {\bf e}_{i}|{\bf e}_{j} \rangle \Vert A_{i*}\Vert ^2 \right] \\ &=& \frac{\Vert A\Vert _F^2}{\Vert A\Vert ^2} {\mathbb {E}}\left[\frac{(b_{i} - \langle A_{i*}| {\bf x}_k \rangle)^2 }{\Vert A_{i*}\Vert ^2}\right] \\ &=& \frac{\Vert {\bf b}- A{\bf x}_k\Vert ^2}{\Vert A\Vert ^2} \\ &\le & \frac{2\Vert {\bf b}\Vert ^2}{\Vert A\Vert ^2} + 2\Vert {\bf x}_k\Vert ^2. \end{eqnarray*}\]
By the estimation of \({\mathbb {E}}[\langle {\bf y}_k|{\bf z}_k\rangle _{\Lambda }]\) in Appendix A and noting that \(\Vert \Lambda \Vert \le \Vert A\Vert ^2\), we also have
\[\begin{eqnarray*} {\mathbb {E}}[\langle {\bf y}_k|{\bf w}_k\rangle _{\Lambda }] = \frac{\Vert A\Vert _F^2}{\Vert A\Vert ^2} {\mathbb {E}}\left[\left\langle {\bf y}_k\Big |\frac{b_{i} - \langle A_{i*}| {\bf x}_k \rangle }{\Vert A_{i*}\Vert ^2} {\bf e}_{i}\right\rangle \right] \le \frac{7}{2}\Vert {\bf x}_* \Vert ^2 +\frac{5}{2}\Vert {\bf x}_k-{\bf x}_*\Vert ^2. \end{eqnarray*}\]
None of the estimates above change, and the constant 1/2 in the decomposition of \({\bf y}_{k+1}\) does not affect the upper bound. Hence, when \(Z=0\), we have
\[\begin{eqnarray*} {\mathbb {E}}[\Vert {\bf y}_T \Vert _\Lambda ^2] \le T\left(2\frac{\Vert {\bf b}\Vert ^2 }{\Vert A\Vert _F^2} + \left(20 + \frac{\epsilon ^2}{T}\right) \Vert {\bf x}_*\Vert ^2\right), \end{eqnarray*}\]
where \(T=\widetilde{O}(\kappa ^2)\). Therefore,
\begin{equation*} \phi = T\frac{\kappa _F^2}{\kappa ^2} \frac{\Vert {\bf y}_T\Vert _{\Lambda }^2}{\Vert {\bf x}_T\Vert ^2} \le T^2\frac{\kappa _F^2}{\kappa ^2}\left(2\frac{\Vert {\bf b}\Vert ^2 }{\Vert A\Vert _F^2\Vert {\bf x}_T\Vert ^2} + \left(20 + \frac{\epsilon ^2}{T}\right) \frac{\Vert {\bf x}_*\Vert ^2}{\Vert {\bf x}_T\Vert ^2}\right) =O(\kappa _F^2\kappa ^2). \end{equation*}
When \(Z\ne 0\), we similarly have \(\phi = O(\kappa _F^2\kappa ^4)\).

References

[1]
Alexandr Andoni, Robert Krauthgamer, and Yosef Pogrow. 2018. On solving linear systems in sublinear time. In 10th Innovations in Theoretical Computer Science Conference (ITCS’19) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 124), Avrim Blum (Ed.). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 3:1–3:19.
[2]
Juan Miguel Arrazola, Alain Delgado, Bhaskar Roy Bardhan, and Seth Lloyd. 2020. Quantum-inspired algorithms in practice. Quantum 4 (2020), 307. arXiv:1905.10415.
[3]
Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd. 2017. Quantum machine learning. Nature 549, 7671 (2017), 195–202.
[4]
Jonathan Briskman and Deanna Needell. 2015. Block Kaczmarz method with inequalities. J. Math. Imag. Vis. 52, 3 (2015), 385–396.
[5]
Andrzej Cegielski. 2021. Bibliography on the Kaczmarz method. Retrieved from http://staff.uz.zgora.pl/acegiels/Publications-Kaczmarz-method.pdf.
[6]
Yair Censor and Tommy Elfving. 1982. New methods for linear inequalities. Lin. Algeb. Applic. 42 (1982), 199–211.
[7]
Yair Censor and Stavros A. Zenios. 1997. Parallel Optimization: Theory, Algorithms, and Applications. Oxford University Press.
[8]
Shantanav Chakraborty, András Gilyén, and Stacey Jeffery. 2019. The power of block-encoded matrix powers: Improved regression techniques via faster Hamiltonian simulation. In 46th International Colloquium on Automata, Languages, and Programming (ICALP’19). 33:1–33:14. arXiv:1804.01973.
[9]
Nadiia Chepurko, Kenneth L. Clarkson, Lior Horesh, and David P. Woodruff. 2020. Quantum-inspired algorithms from randomized numerical linear algebra. arXiv:2011.04125.
[10]
Nai-Hui Chia, András Gilyén, Tongyang Li, Han-Hsuan Lin, Ewin Tang, and Chunhao Wang. 2020. Sampling-based sublinear low-rank matrix arithmetic framework for dequantizing quantum machine learning. In 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC’20). 387–400. arXiv:1910.06151.
[11]
Nai-Hui Chia, András Gilyén, Han-Hsuan Lin, Seth Lloyd, Ewin Tang, and Chunhao Wang. 2020. Quantum-inspired algorithms for solving low-rank linear equation systems with logarithmic dependence on the dimension. In 31st International Symposium on Algorithms and Computation (ISAAC’20).
[12]
Nai-Hui Chia, Tongyang Li, Han-Hsuan Lin, and Chunhao Wang. 2020. Quantum-inspired classical algorithms for singular value transformation. In 45th International Symposium on Mathematical Foundations of Computer Science (MFCS’20). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 23:1–23:15. arXiv:1901.03254.
[13]
Andrew M. Childs, Robin Kothari, and Rolando D. Somma. 2017. Quantum algorithm for systems of linear equations with exponentially improved dependence on precision. SIAM J. Comput. 46, 6 (2017), 1920–1950.
[14]
Jesus A. De Loera, Jamie Haddock, and Deanna Needell. 2017. A sampling Kaczmarz-Motzkin algorithm for linear feasibility. SIAM J. Sci. Comput. 39, 5 (2017), S66–S87.
[15]
Petros Drineas, Ravi Kannan, and Michael W. Mahoney. 2006. Fast Monte Carlo algorithms for matrices II: Computing a low-rank approximation to a matrix. SIAM J. Comput. 36, 1 (2006), 158–183.
[16]
András Gilyén, Zhao Song, and Ewin Tang. 2020. An improved quantum-inspired algorithm for linear regression. arXiv:2009.07268.
[17]
András Gilyén, Yuan Su, Guang Hao Low, and Nathan Wiebe. 2019. Quantum singular value transformation and beyond: Exponential improvements for quantum matrix arithmetics. In 51st Annual ACM SIGACT Symposium on Theory of Computing. 193–204.
[18]
Gene H. Golub and Charles F. van Loan. 2013. Matrix Computations (4th ed.). The Johns Hopkins University Press, Baltimore.
[19]
Richard Gordon, Robert Bender, and Gabor T. Herman. 1970. Algebraic reconstruction techniques (ART) for three-dimensional electron microscopy and X-ray photography. J. Theoret. Biol. 29, 3 (1970), 471–481.
[20]
Robert M. Gower and Peter Richtárik. 2015. Randomized iterative methods for linear systems. SIAM J. Matrix Anal. Applic. 36, 4 (2015), 1660–1690.
[21]
Neha Gupta and Aaron Sidford. 2018. Exploiting numerical sparsity for efficient learning: Faster eigenvector computation and regression. In 32nd International Conference on Neural Information Processing Systems. 5274–5283.
[22]
Aram W. Harrow, Avinatan Hassidim, and Seth Lloyd. 2009. Quantum algorithm for linear systems of equations. Phys. Rev. Lett. 103, 15 (2009), 150502. arXiv:0811.3171.
[23]
Gabor T. Herman, Arnold Lent, and Peter H. Lutz. 1978. Relaxation methods for image reconstruction. Commun. ACM 21, 2 (1978), 152–158.
[24]
Dhawal Jethwani, François Le Gall, and Sanjay K. Singh. 2020. Quantum-inspired classical algorithms for singular value transformation. In 45th International Symposium on Mathematical Foundations of Computer Science (MFCS’20). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 53:1–53:14. arXiv:1910.05699.
[25]
S. Kaczmarz. 1937. Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Int. Acad. Polon. Sci. Lett. A (1937), 355–357.
[26]
Iordanis Kerenidis, Jonas Landman, Alessandro Luongo, and Anupam Prakash. 2019. Q-means: A quantum algorithm for unsupervised machine learning. In Conference on Advances in Neural Information Processing Systems. 4134–4144.
[27]
Iordanis Kerenidis and Anupam Prakash. 2017. Quantum recommendation systems. In 8th Innovations in Theoretical Computer Science Conference (ITCS’17). 49:1–49:21. arXiv:1603.08675.
[28]
Iordanis Kerenidis and Anupam Prakash. 2020. A quantum interior point method for LPs and SDPs. ACM Trans. Quant. Comput. 1, 1 (2020), 1–32.
[29]
Dennis Leventhal and Adrian S. Lewis. 2010. Randomized methods for linear constraints: Convergence rates and conditioning. Math. Oper. Res. 35, 3 (2010), 641–654.
[30]
Tongyang Li, Shouvanik Chakrabarti, and Xiaodi Wu. 2019. Sublinear quantum algorithms for training linear and kernel-based classifiers. In International Conference on Machine Learning. PMLR, 3815–3824.
[31]
Lin Lin and Yu Tong. 2020. Optimal polynomial based quantum eigenstate filtering with application to solving quantum linear systems. Quantum 4 (2020), 361.
[32]
Seth Lloyd, Masoud Mohseni, and Patrick Rebentrost. 2014. Quantum principal component analysis. Nat. Phys. 10, 9 (2014), 631–633. arXiv:1307.0401.
[33]
Nicolas Loizou and Peter Richtárik. 2020. Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods. Computat. Optim. Applic. 77, 3 (2020), 653–710.
[34]
Alessandro Luongo and Changpeng Shao. 2020. Quantum algorithms for spectral sums. arXiv:2011.06475.
[35]
Jacob D. Moorman, Thomas K. Tu, Denali Molitor, and Deanna Needell. 2020. Randomized Kaczmarz with averaging. arXiv:2002.04126.
[36]
Frank Natterer. 2001. The Mathematics of Computerized Tomography. SIAM.
[37]
Ion Necoara. 2019. Faster randomized block Kaczmarz algorithms. SIAM J. Matrix Anal. Applic. 40, 4 (2019), 1425–1452. arXiv:1902.09946.
[38]
Deanna Needell. 2010. Randomized Kaczmarz solver for noisy linear systems. BIT Numer. Math. 50, 2 (2010), 395–403.
[39]
Arkadi Nemirovski, Anatoli Juditsky, Guanghui Lan, and Alexander Shapiro. 2009. Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19, 4 (2009), 1574–1609.
[40]
Yu Nesterov. 2012. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22, 2 (2012), 341–362.
[41]
Davide Orsucci and Vedran Dunjko. 2021. On solving classes of positive-definite quantum linear systems with quadratically improved runtime in the condition number. arXiv:2101.11868.
[42]
Patrick Rebentrost, Masoud Mohseni, and Seth Lloyd. 2014. Quantum support vector machine for big data classification. Phys. Rev. Lett. 113, 13 (2014), 130503. arXiv:1307.0471.
[43]
Peter Richtárik and Martin Takáč. 2020. Stochastic reformulations of linear systems: Algorithms and convergence theory. SIAM J. Matrix Anal. Applic. 41, 2 (2020), 487–524.
[44]
Yousef Saad. 2003. Iterative Methods for Sparse Linear Systems. SIAM.
[45]
Changpeng Shao and Hua Xiang. 2020. Row and column iteration methods to solve linear systems on a quantum computer. Phys. Rev. A 101, 2 (2020), 022322. arXiv:1905.11686.
[46]
Thomas Strohmer and Roman Vershynin. 2009. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Applic. 15, 2 (2009), 262. arXiv:math/0702226.
[47]
Ewin Tang. 2018. Quantum-inspired classical algorithms for principal component analysis and supervised clustering. arXiv:1811.00414.
[48]
Ewin Tang. 2019. A quantum-inspired classical algorithm for recommendation systems. In 51st Annual ACM SIGACT Symposium on Theory of Computing (STOC’19). 217–228. arXiv:1807.04271.
