Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Randomized matrix-free trace and log-determinant estimators

  • Published:
Numerische Mathematik Aims and scope Submit manuscript

Abstract

We present randomized algorithms for estimating the trace and determinant of Hermitian positive semi-definite matrices. The algorithms are based on subspace iteration, and access the matrix only through matrix vector products. We analyse the error due to randomization, for starting guesses whose elements are Gaussian or Rademacher random variables. The analysis is cleanly separated into a structural (deterministic) part followed by a probabilistic part. Our absolute bounds for the expectation and concentration of the estimators are non-asymptotic and informative even for matrices of low dimension. For the trace estimators, we also present asymptotic bounds on the number of samples (columns of the starting guess) required to achieve a user-specified relative error. Numerical experiments illustrate the performance of the estimators and the tightness of the bounds on low-dimensional matrices, and on a challenging application in uncertainty quantification arising from Bayesian optimal experimental design.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13

Similar content being viewed by others

Notes

  1. The square matrix \(\mathbf {I}\) denotes the identity, with ones on the diagonal and zeros everywhere else.

  2. The superscript \(\dagger \) denotes the Moore–Penrose inverse.

  3. independent and identically distributed.

References

  1. Akçelik, V., Biros, G., Draganescu, A., Ghattas, O., Hill, J., Van Bloemen Waanders, B.: Dynamic data-driven inversion for terascale simulations: real-time identification of airborne contaminants. In: Supercomputing, 2005. Proceedings of the ACM/IEEE SC 2005 Conference, pp. 43–43 (2005)

  2. Alexanderian, A., Gloor, P.J., Ghattas, O.: On Bayesian \(A\)-and \(D\)-optimal experimental designs in infinite dimensions. Bayesian Anal. 11(3), 671–695 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  3. Alexanderian, A., Petra, N., Stadler, G., Ghattas, O.: A-optimal design of experiments for infinite-dimensional Bayesian linear inverse problems with regularized \(\ell _0\)-sparsification. SIAM J. Sci. Comput. 36(5), A2122–A2148 (2014)

    Article  MATH  Google Scholar 

  4. Anitescu, M., Chen, J., Wang, L.: A matrix-free approach for solving the parametric Gaussian process maximum likelihood problem. SIAM J. Sci. Comput. 34(1), A240–A262 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  5. Atkinson, A.C., Donev, A.N.: Optimum Experimental Designs. Oxford University Press, Oxford (1992)

    MATH  Google Scholar 

  6. Avron, H., Toledo, S.: Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix. J. ACM 58(2), Art. 8, 17 (2011)

  7. Bai, Z., Fahey, M., Golub, G.: Some large-scale matrix computation problems. J. Comput. Appl. Math. 74(1), 71–89 (1996)

    Article  MathSciNet  MATH  Google Scholar 

  8. Bai, Z., Golub, G.H.: Bounds for the trace of the inverse and the determinant of symmetric positive definite matrices. Ann. Numer. Math. 4(1-4), 29–38 (1997). The heritage of P.L. Chebyshev: a Festschrift in honor of the 70th birthday of T.J. Rivlin

  9. Barry, R.P., Pace, R.K.: Monte Carlo estimates of the log determinant of large sparse matrices. Linear Algebra Appl. 289(1-3), 41–54 (1999). Linear algebra and statistics (Istanbul, 1997)

  10. Boutsidis, C., Drineas, P., Kambadur, P., Zouzias, A.: A randomized algorithm for approximating the log determinant of a symmetric positive definite matrix. arXiv:1503.00374 (2015)

  11. Chaloner, K., Verdinelli, I.: Bayesian experimental design: a review. Stat. Sci. 10(3), 273–304 (1995)

    Article  MathSciNet  MATH  Google Scholar 

  12. Chen, J., Anitescu, M., Saad, Y.: Computing \(f({A})b\) via least squares polynomial approximations. SIAM J. Sci. Comput. 33(1), 195–222 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  13. Flath, P.H., Wilcox, L.C., Akçelik, V., Hill, J., van Bloemen Waanders, B., Ghattas, O.: Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations. SIAM J. Sci. Comput. 33(1), 407–432 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  14. Gittens, A., Mahoney, M.W.: Revisiting the Nystrom method for improved large-scale machine learning. arXiv:1303.1849 (2013)

  15. Golub, G.H., Von Matt, U.: Generalized cross-validation for large-scale problems. J. Comput. Graph. Stat. 6(1), 1–34 (1997)

    MathSciNet  MATH  Google Scholar 

  16. Gu, M.: Subspace iteration randomization and singular value problems. SIAM J. Sci. Comput. 37(3), A1139–A1173 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  17. Haber, E., Horesh, L., Tenorio, L.: Numerical methods for experimental design of large-scale linear ill-posed inverse problems. Inverse Probl. 24(5), 055012–055017 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  18. Haber, E., Magnant, Z., Lucero, C., Tenorio, L.: Numerical methods for \(A\)-optimal designs with a sparsity constraint for ill-posed inverse problems. Comput. Optim. Appl. 52, 293–314 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  19. Halko, N., Martinsson, P.G., Tropp, J.A.: Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53(2), 217–288 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  20. Han, I., Malioutov, D., Shin, J.: Large-scale log-determinant computation through stochastic Chebyshev expansions. arXiv:1503.06394 (2015)

  21. Higham, N.J.: Accuracy and Stability of Numerical Algorithms, 2nd edn. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2002)

    Book  MATH  Google Scholar 

  22. Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press, Cambridge (1991)

    Book  MATH  Google Scholar 

  23. Horn, R.A., Johnson, C.R.: Matrix Analysis, 2nd edn. Cambridge University Press, Cambridge (2013)

    MATH  Google Scholar 

  24. Hutchinson, M.F.: A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Commun. Stat. Simul. Comput. 18(3), 1059–1076 (1989)

    Article  MathSciNet  MATH  Google Scholar 

  25. Ledoux, M.: On Talagrand’s deviation inequalities for product measures. ESAIM Probab. Statist. 1, 63–87 (1995/1997)

  26. Liberty, E., Woolfe, F., Martinsson, P.G., Rokhlin, V., Tygert, M.: Randomized algorithms for the low-rank approximation of matrices. Proc. Natl. Acad. Sci. USA 104(51), 20167–20172 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  27. Lin, L.: Randomized estimation of spectral densities of large matrices made accurate. Numer. Math. 1–31 (2016). doi:10.1007/s00211-016-0837-7

  28. Mahoney, M.W.: Randomized Algorithms for Matrices and Data. Now Publishers Inc, Hanover (2011)

    MATH  Google Scholar 

  29. Martinsson, P.G., Rokhlin, V., Tygert, M.: A randomized algorithm for the decomposition of matrices. Appl. Comput. Harmon. Anal. 30(1), 47–68 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  30. Mitzenmacher, M., Upfal, E.: Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, Cambridge (2005)

    Book  MATH  Google Scholar 

  31. Ouellette, D.V.: Schur complements and statistics. Linear Algebra Appl. 36, 187–295 (1981)

    Article  MathSciNet  MATH  Google Scholar 

  32. Pace, R.K., LeSage, J.P.: Chebyshev approximation of log-determinants of spatial weight matrices. Comput. Stat. Data Anal. 45(2), 179–196 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  33. Parlett, B.N.: The Symmetric Eigenvalue Problem. Prentice Hall Inc, Englewood Cliffs (1980)

    MATH  Google Scholar 

  34. Petra, N., Stadler, G.: Model variational inverse problems governed by partial differential equations. Tech. Rep. 11-05, The Institute for Computational Engineering and Sciences, The University of Texas at Austin (2011)

  35. Roosta-Khorasani, F., Ascher, U.: Improved bounds on sample size for implicit matrix trace estimators. Found. Comput. Math. 15(5), 1187–1212 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  36. Rudelson, M., Vershynin, R.: Smallest singular value of a random rectangular matrix. Commun. Pure Appl. Math. 62(12), 1707–1739 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  37. Saad, Y.: Numerical methods for large eigenvalue problems. In: Classics in Applied Mathematics, vol. 66. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (2011). Revised edition of the 1992 original

  38. Saibaba, A.K., Kitanidis, P.K.: Fast computation of uncertainty quantification measures in the geostatistical approach to solve inverse problems. Adv. Water Resour. 82, 124–138 (2015)

    Article  Google Scholar 

  39. Sorensen, D.C., Embree, M.: A DEIM induced CUR factorization. SIAM J. Sci. Comput. 38(3), A1454–A1482 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  40. Stirling, D.S.G.: Mathematical Analysis and Proof, 2nd edn. Horwood Publishing Limited, Chichester (2009)

    Book  MATH  Google Scholar 

  41. Tang, J.M., Saad, Y.: A probing method for computing the diagonal of a matrix inverse. Numer. Linear Algebra Appl. 19(3), 485–501 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  42. Tropp, J.A.: Improved analysis of the subsampled randomized Hadamard transform. Adv. Adapt. Data Anal. 3(1–2), 115–126 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  43. Tropp, J.A.: An introduction to matrix concentration inequalities. Found. Trends Mach. Learn. 8(1–2), 1–230 (2015)

    Article  MATH  Google Scholar 

  44. Uciński, D.: Optimal Measurement Methods for Distributed Parameter System Identification. CRC Press, Boca Raton (2005)

    MATH  Google Scholar 

  45. Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices. In: Eldar, Y.C., Kutyniok, G. (eds.) Compressed Sensing, pp. 210–268. Cambridge University Press, Cambridge (2012)

  46. Wahba, G.: Practical approximate solutions to linear operator equations when the data are noisy. SIAM J. Numer. Anal. 14(4), 651–667 (1977)

    Article  MathSciNet  MATH  Google Scholar 

  47. Wahba, G.: Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 59. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (1990)

    Google Scholar 

  48. Zhang, Y., Leithead, W.E., Leith, D.J., Walshe, L.: Log-det approximation based on uniformly distributed seeds and its application to Gaussian process regression. J. Comput. Appl. Math. 220(1–2), 198–214 (2008)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Arvind K. Saibaba.

Additional information

The third author acknowledges the support from the XDATA Program of the Defense Advanced Research Projects Agency (DARPA), administered through Air Force Research Laboratory contract FA8750-12-C-0323 FA8750-12-C-0323.

Appendices

Appendix 1: Gaussian random matrices

In this section, we state a lemma on the pseudo-inverse of a rectangular Gaussian random matrix, and use this result to prove both parts of Lemma 4.

1.1 Pseudo-inverse of a Gaussian random matrix

We state a result on the large deviation bound of the pseudo-inverse of a Gaussian random matrix [19, Proposition 10.4].

Lemma 6

Let \(\mathbf {G}\in \mathbb {R}^{k\times (k+p)}\) be a random Gaussian matrix and let \(p\ge 2\). For all \(t \ge 1\),

$$\begin{aligned} \mathbb {P}\left[ \, \Vert \mathbf {G}^\dagger \Vert _2 \ge \frac{e\sqrt{k+p}}{p+1}\cdot t \right] \le t^{-(p+1)}. \end{aligned}$$
(23)

1.2 Proof of Lemma 4

Proof

From [45, Corollary 5.35] we have

$$\begin{aligned} \mathbb {P}\left[ \, \Vert \mathbf {G}_2\Vert _2 > \sqrt{n-k} + \sqrt{k+p} + t \right] \le \exp (-t^2/2). \end{aligned}$$

Recall from (3) \(\mu = \sqrt{n-k} + \sqrt{k+p}\). From the law of the unconscious statistician [16, Proposition S4.2],

$$\begin{aligned} \mathbb {E}\left[ \, \Vert \mathbf {G}_2\Vert _2^2 \right] =&\> \int _0^\infty 2t \mathbb {P}\left[ \, \Vert \mathbf {G}_2\Vert _2> t \right] dt\\ \le&\> \int _0^{\mu } 2t dt + \int _{\mu }^\infty 2t \mathbb {P}\left[ \, \Vert \mathbf {G}_2\Vert _2> t \right] dt \\ \le&\> \mu ^2 + \int _{0}^\infty 2(u+\mu ) \exp (-u^2/2)du = \> \mu ^2 + 2\left( 1+ \mu \sqrt{\frac{\pi }{2}}\right) . \end{aligned}$$

This concludes the proof for (16).

Next consider (17). Using Lemma 6, we have for \(t > 0\)

$$\begin{aligned} \mathbb {P}\left[ \, \Vert \mathbf {G}_1^\dagger \Vert _2 \ge t \right] \le \> D t^{-(p+1)} \qquad D \equiv \> \frac{1}{\sqrt{2\pi (p+1)}}\left( \frac{e\sqrt{k+p}}{p+1}\right) . \end{aligned}$$
(24)

As before, we have

$$\begin{aligned} \mathbb {E}\left[ \, \Vert \mathbf {G}_1^{\dagger }\Vert _2^2 \right]&= \int _0^\infty 2t \mathbb {P}\left[ \, \Vert \mathbf {G}_1^\dagger \Vert _2> t \right] dt\\&\le \int _0^{\beta } 2t dt + \int _{\beta }^\infty 2t \mathbb {P}\left[ \, \Vert \mathbf {G}_1^\dagger \Vert _2 > t \right] dt\\ & \le \beta ^2 + \int _{\beta }^\infty 2t D t^{-(p+1)} dt = \beta ^2 + 2D\frac{\beta ^{1-p}}{p-1}. \end{aligned}$$

Minimizing w.r.t. \(\beta \), we get \(\beta = (D)^{1/(p+1)}\). Substitute this value for \(\beta \) and simplify.

\(\square \)

Appendix 2: Rademacher random matrices

In this section, we state the matrix Chernoff inequalities [43] and other useful concentration inequalities and use these results to prove Theorem 10.

1.1 Useful concentration inequalities

The proof of Theorem 10 relies on the matrix concentration inequalities developed in [43]. We will need the following result [43, Theorem 5.1.1] in what follows.

Theorem 11

(Matrix Chernoff) Let \(\{\mathbf {X}_k\}\) be finite sequence of independent, random, \(d\times d\) Hermitian matrices. Assume that \(0 \le \lambda _{\min }(\mathbf {X}_k)\) and \(\lambda _{\max }(\mathbf {X}_k) \le L\) for each index k. Let us define

$$\begin{aligned} \mu _{\min } \equiv \lambda _{\min }\left( \sum _k\mathbb {E}\left[ \, \mathbf {X}_k \right] \right) \qquad \mu _{\max } \equiv \lambda _{\max }\left( \sum _k\mathbb {E}\left[ \, \mathbf {X}_k \right] \right) , \end{aligned}$$

and let \(g(x) \equiv e^x(1+x)^{-(1+x)}.\) Then for any \(\epsilon > 0\)

$$\begin{aligned} \mathbb {P}\left[ \, \lambda _{\max }\left( \sum _k \mathbf {X}_k \right) \ge (1+\epsilon )\mu _{\max } \right] \le d g(\epsilon )^{\mu _{\max }/L}, \end{aligned}$$

and for any \(0 \le \epsilon <1 \)

$$\begin{aligned} \mathbb {P}\left[ \, \lambda _{\min }\left( \sum _k \mathbf {X}_k \right) {\le } (1-\epsilon )\mu _{\min } \right] \le d g(-\epsilon )^{\mu _{\min }/L}. \end{aligned}$$

The following result was first proved by Ledoux [25] but we reproduce the statement from [42, Proposition 2.1].

Lemma 7

Suppose \(f:\mathbb {R}^n \rightarrow \mathbb {R}\) is a convex function that satisfies the following Lipschitz bound

$$\begin{aligned} |f(\mathbf {x})-f(\mathbf {y})| \le L \Vert \mathbf {x}-\mathbf {y}\Vert _2 \qquad \text {for all }\>\mathbf {x},\mathbf {y}\in \mathbb {R}^n. \end{aligned}$$

Let \(\mathbf {z}\in \mathbb {R}^n\) be a random vector with entries drawn from an i.i.d. Rademacher distribution. Then, for all \(t \ge 0\),

$$\begin{aligned} \mathbb {P}\left[ \, f(\mathbf {z}) \ge \mathbb {E}\left[ \, f(\mathbf {z}) \right] + Lt \right] \le e^{-t^2/8}. \end{aligned}$$

Lemma 8

Let \(\mathbf {V}\) be a \(n\times r\) matrix with orthonormal columns and let \(n\ge r\). Let \(\mathbf {z}\) be an \(n\times 1\) vector with entries drawn from an i.i.d. Rademacher distribution. Then, for \(0< \delta < 1\),

$$\begin{aligned} \mathbb {P}\left[ \, \Vert \mathbf {V}^*\mathbf {z}\Vert _2 \ge \sqrt{r} + \sqrt{8\log \left( \frac{1}{\delta }\right) } \right] \le \delta . \end{aligned}$$

Proof

Our proof follows the strategy in [42, Lemma 3.3]. Define the function \(f(\mathbf {x}) = \Vert \mathbf {V}^*\mathbf {x}\Vert _2\). We observe that f satisfies the assumptions of Lemma 7, with Lipschitz constant \(L=1\); the latter follows from

$$\begin{aligned} |\Vert \mathbf {V}^*\mathbf {x}\Vert _2-\Vert \mathbf {V}^*\mathbf {y}\Vert _2| \le \Vert \mathbf {V}^*(\mathbf {x}-\mathbf {y})\Vert _2 \le \Vert \mathbf {x}-\mathbf {y}\Vert _2. \end{aligned}$$

Furthermore, using Hölder’s inequality

$$\begin{aligned} \mathbb {E}\left[ \, f(\mathbf {z}) \right] \le [\mathbb {E}\left[ \, f(\mathbf {z})^2 \right] ]^{1/2} = \Vert \mathbf {V}\Vert _F = \sqrt{r}. \end{aligned}$$

Using Lemma 7 with \(t_\delta = \sqrt{8\log \left( 1/\delta \right) }\) we have

$$\begin{aligned} \mathbb {P}\left[ \, f(\mathbf {z}) \ge \sqrt{r} + t_\delta \right] \le \mathbb {P}\left[ \, f(\mathbf {z}) \ge \mathbb {E}\left[ \, f(\mathbf {z}) \right] + t_\delta \right] \le e^{-t_\delta ^2/8} = \delta . \end{aligned}$$

\(\square \)

Lemma 9

Let \(X_i\) for \(i=1,\ldots ,n\) be a sequence of i.i.d. random variables. If for each \(i=1,\ldots ,n\), \(\mathbb {P}\left[ \, X_i \ge a \right] \le \xi \) holds, where \(\xi \in (0,1]\), then

$$\begin{aligned} \mathbb {P}\left[ \, \max _{i=1,\cdots ,n} X_i \ge a \right] \le n\xi . \end{aligned}$$

Proof

Since \( \mathbb {P}\left[ \, X_i \ge a \right] \le \xi \) then \(\mathbb {P}\left[ \, X_i < a \right] \ge 1 - \xi \). We can bound

$$\begin{aligned} \mathbb {P}\left[ \, \max _{i=1,\cdots ,n} X_i \ge a \right] =&\left( 1-\mathbb {P}\left[ \, \max _{i=1,\cdots ,n} X_i< a \right] \right) \\ =&\left( 1- \prod _{i=1}^n\mathbb {P}\left[ \, X_i <a \right] \right) \le 1 - (1-\xi )^n. \end{aligned}$$

The proof follows from Bernoulli’s inequality [40, Theorem 5.1] which states \((1-\xi )^n \ge 1 - n\xi \) for \(\xi \in [0,1]\) and \(n\ge 1\). \(\square \)

1.2 Proof of Theorem 10

Proof

Recall that \(\varvec{\Omega }_1 = \mathbf {U}^*_1 \varvec{\Omega }\) and \(\varvec{\Omega }_2 = \mathbf {U}_2^*\varvec{\Omega }\) where \(\varvec{\Omega }\) is random matrix with entries chosen from an i.i.d. Rademacher distribution. The proof proceeds in three steps.

1. Bound for \(\Vert \varvec{\Omega }_2\Vert _2^2\) The proof uses the matrix Chernoff concentration inequality. Let \(\varvec{\omega }_i \in \mathbb {R}^{n\times 1}\) be the i-th column of \(\varvec{\Omega }\). Note \( \varvec{\Omega }_2\varvec{\Omega }_2^* \in \mathbb {C}^{(n-k)\times (n-k)}\) and

$$\begin{aligned} \mathbb {E}\left[ \, \varvec{\Omega }_2\varvec{\Omega }_2^* \right] = \sum _{i=1}^\ell \mathbf {U}_2^*\mathbb {E}\left[ \, \varvec{\omega }_i\varvec{\omega }_i^{*} \right] \mathbf {U}_2 = \ell \mathbf {I}_{n-k}. \end{aligned}$$

Furthermore, define \(\mu _\text {min} ( \varvec{\Omega }_2\varvec{\Omega }_2^*) \equiv \lambda _\text {min} (\mathbb {E}\left[ \, \varvec{\Omega }_2\varvec{\Omega }_2^* \right] )\) and \(\mu _\text {max} (\varvec{\Omega }_2\varvec{\Omega }_2^*) \equiv \lambda _\text {max} (\mathbb {E}\left[ \, \varvec{\Omega }_2\varvec{\Omega }_2^* \right] )\). Clearly \(\mu _\text {min} = \mu _\text {max} = \ell \). Note that here we have expressed \( \varvec{\Omega }_2\varvec{\Omega }_2^*\) as a finite sum of \(\ell \) rank-1 matrices, each with a single nonzero eigenvalue \(\varvec{\omega }_i^* \mathbf {U}_2\mathbf {U}_2^*\varvec{\omega }_i \). We want to obtain a probabilistic bound for the maximum eigenvalue i.e., \(L_2 = \max _{i=1,\cdots ,\ell } \Vert \mathbf {U}_2^*\varvec{\omega }_i\Vert _2^2\). Using Lemma 8 we can write with probability at most \(e^{-t^2/8}\)

$$\begin{aligned} \left( \sqrt{n-k} + t\right) ^2 \le \Vert \mathbf {U}_2^*\varvec{\omega }_i\Vert _2^2 = \varvec{\omega }_i^* \mathbf {U}_2\mathbf {U}_2^*\varvec{\omega }_i . \end{aligned}$$

Since \(\Vert \mathbf {U}_2^*\varvec{\omega }_i\Vert _2^2\) are i.i.d., applying Lemma 9 gives

$$\begin{aligned} \mathbb {P}\left[ \, \max _{i=1,\cdots ,\ell } \Vert \mathbf {U}_2^*\varvec{\omega }_i\Vert _2 \ge \sqrt{n-k} + t \right] \le \ell e^{-t^2/8}. \end{aligned}$$

Take \(t = \sqrt{ 8\log (4\ell /\delta )}\) to obtain

$$\begin{aligned} \mathbb {P}\left[ \, L_2 \ge C_u^2 \right] \le \delta /4, \qquad C_u \equiv \sqrt{n-k} + \sqrt{ 8\log \left( \frac{4\ell }{\delta }\right) }. \end{aligned}$$
(25)

The matrix \(\varvec{\Omega }_2\) satisfies the conditions of the matrix Chernoff theorem 11; for \(\eta \ge 0\) we have

$$\begin{aligned} \mathbb {P}\left[ \, \lambda _\text {max}(\varvec{\Omega }_2\varvec{\Omega }_2^*) \ge (1+\eta ) \ell \right] \le (n-k) g(\eta )^\frac{\ell }{L_2}, \end{aligned}$$

where the function \(g(\eta )\) is defined in Theorem 11. For \(\eta > 1\) the Chernoff bounds can be simplified [30, Section 4.3] since \(g(\eta ) \le e^{-\eta /3}\), to obtain

$$\begin{aligned} \mathbb {P}\left[ \, \lambda _\text {max}(\varvec{\Omega }_2\varvec{\Omega }_2^*) \ge (1+\eta ) \ell \right] \le (n-k) \exp \left( -\frac{\eta \ell }{3L_2}\right) . \end{aligned}$$

Choose the parameter

$$\begin{aligned} \eta _\delta = C_{\ell ,\delta }C_u^2 = 3\ell ^{-1}C_u^2\log \left( \frac{4(n-k)}{\delta }\right) , \end{aligned}$$

so that

$$\begin{aligned} \mathbb {P}\left[ \, \Vert \varvec{\Omega }_2\Vert _2^2 \ge (1+\eta _\delta )\ell \right] \le&\> (n-k)\exp \left( -\frac{C_u^2}{L_2}\log \frac{4(n-k)}{\delta }\right) \\ =&\> (n-k)\left( \frac{\delta }{4(n-k)}\right) ^{C_u^2/L_2}. \end{aligned}$$

Finally, we want to find a lower bound for \(\Vert \varvec{\Omega }_2\Vert _2^2\). Define the events

$$\begin{aligned} A = \>\left\{ \varvec{\Omega }_2 \mid L_2 < C_u^2 \right\} , \qquad B = \>\left\{ \varvec{\Omega }_2 \mid \Vert \varvec{\Omega }_2\Vert _2^2 \ge (1+\eta _\delta )\ell \right\} . \end{aligned}$$

Note that \(\mathbb {P}\left[ \, A^c \right] \le \delta /4\) and under event A we have \(C_u^2 >L_2\) so that

$$\begin{aligned} \mathbb {P}\left[ \, B\mid A \right] \le (n-k)\left( \frac{\delta }{4(n-k)}\right) ^{C_u^2/L_2} \le \delta /4. \end{aligned}$$

Using the law of total probability

$$\begin{aligned} \mathbb {P}\left[ \, B \right] =&\> \mathbb {P}\left[ \, B\mid A \right] \mathbb {P}\left[ \, A \right] + \mathbb {P}\left[ \, B\mid A^c \right] \mathbb {P}\left[ \, A^c \right] \\ \le&\> \mathbb {P}\left[ \, B\mid A \right] + \mathbb {P}\left[ \, A^c \right] , \end{aligned}$$

we can obtain a bound for \(\mathbb {P}\left[ \, B \right] \) as

$$\begin{aligned} \mathbb {P}\left[ \, \Vert \varvec{\Omega }_2\Vert _2^2 \ge \ell \left( 1 + C_u^2 C_{\ell ,\delta }\right) \right] \le \delta /2. \end{aligned}$$

2. Bound for \(\Vert \varvec{\Omega }_1^\dagger \Vert _2^2\) The steps are similar and we again use the matrix Chernoff concentration inequality. Consider \(\varvec{\Omega }_1\varvec{\Omega }_1^*\in \mathbb {C}^{k\times k}\), and as before, write this matrix as the sum of rank-1 matrices to obtain

$$\begin{aligned} \mathbb {E}\left[ \, \varvec{\Omega }_1\varvec{\Omega }_1^* \right] = \sum _{i=1}^\ell \mathbf {U}_1^*\mathbb {E}\left[ \, \varvec{\omega }_i\varvec{\omega }_i^{*} \right] \mathbf {U}_1 = \ell \mathbf {I}_{k}, \end{aligned}$$

and \(\mu _\text {min} (\varvec{\Omega }_1\varvec{\Omega }_1^*) = \ell \). Each summand in the above decomposition of \(\varvec{\Omega }_1\varvec{\Omega }_1^*\) has one nonzero eigenvalue \(\varvec{\omega }_i^* \mathbf {U}_1\mathbf {U}_1^*\varvec{\omega }_i \). Following the same strategy as in Step 1, we define \(L_1 \equiv \max _{i=1,\ldots ,\ell }\Vert \mathbf {U}_1^*\varvec{\omega }_i\Vert _2^2 \) and apply Lemma 8 to obtain

$$\begin{aligned} \mathbb {P}\left[ \, \max _{i=1,\cdots ,\ell } \Vert \mathbf {U}_1^*\varvec{\omega }_i\Vert _2 \ge \sqrt{k} + t \right] \le \ell e^{-t^2/8} \le ne^{-t^2/8}. \end{aligned}$$

Take \(t = \sqrt{ 8\log (4n/\delta )}\) to obtain

$$\begin{aligned} \mathbb {P}\left[ \, L_1 \ge C_l^2 \right] \le \delta /4, \qquad C_l \equiv \sqrt{k} + \sqrt{ 8\log \left( \frac{4n}{\delta }\right) }. \end{aligned}$$
(26)

A straightforward application of the Chernoff bound in Theorem 11 gives us

$$\begin{aligned} {\mathbb {P}\left[ \, \lambda _\text {min}(\varvec{\Omega }_1\varvec{\Omega }_1^*) \le (1-\rho ) \ell \right] \le k g(-\rho )^\frac{\ell }{L_1}. } \end{aligned}$$

Next, observe that \(-\log g(-\rho )\) has the Taylor series expansion in the region \(0 {<} \rho {<} 1\)

$$\begin{aligned} -\log g(-\rho ) = {\rho } + (1-\rho )\log (1-\rho ) = \frac{\rho ^2}{2} + \frac{\rho ^3}{6} + \frac{\rho ^4}{12} + \cdots \end{aligned}$$

so that \(-\log g(-\rho ) \ge \rho ^2/2\) for \(0< \rho < 1\) or \(g(-\rho ) \le e^{-\rho ^2/2}\). This gives us

$$\begin{aligned} \mathbb {P}\left[ \, \Vert \varvec{\Omega }_1^\dagger \Vert _2^2 \ge \frac{1}{ (1-\rho ) \ell } \right] \> \le \> k \exp \left( -\frac{\rho ^2\ell }{2L_1}\right) , \end{aligned}$$
(27)

where we have used \(\lambda _\text {min}(\varvec{\Omega }_1\varvec{\Omega }_1^*) = 1/\Vert \varvec{\Omega }_1^\dagger \Vert _2^2\) assuming \(\mathsf {rank}(\varvec{\Omega }_1) = k\).

With the number of samples as defined in Theorem 3

$$\begin{aligned} \ell \ge 2\rho ^{-2}C_l^2 \log \left( \frac{4k}{\delta }\right) , \end{aligned}$$

the Chernoff bound (27) becomes

$$\begin{aligned} \mathbb {P}\left[ \, \Vert \varvec{\Omega }_1^\dagger \Vert _2^2 \ge \frac{1}{ (1-\rho ) \ell } \right] \> \le k\left( \frac{\delta }{4k}\right) ^{C_l^2/L_1}. \end{aligned}$$

Define the events

$$\begin{aligned} C = \left\{ \varvec{\Omega }_1 \mid \Vert \varvec{\Omega }_1^\dagger \Vert _2^2 \ge \frac{1}{ (1-\rho ) \ell }\right\} , \qquad D = \{ \varvec{\Omega }_1 \mid L_1 < C_\ell ^2\}. \end{aligned}$$

Note that \(\mathbb {P}\left[ \, D^c \right] \le \delta /4\) from (26). Then since the exponent is strictly greater than 1, we have

$$\begin{aligned} \mathbb {P}\left[ \, C \mid D \right] \> \le k\left( \frac{\delta }{4k}\right) ^{C_l^2/L_1} \le \delta /4. \end{aligned}$$

Using the conditioning argument as before gives \(\mathbb {P}\left[ \, C \right] \le \delta /2\).

3. Combining bounds Define the event

$$\begin{aligned} E = \left\{ \varvec{\Omega }\mid \Vert \varvec{\Omega }_1^\dagger \Vert _2^2 \ge \frac{1}{(1-\rho )\ell }\right\} , \qquad F = \left\{ \varvec{\Omega }\mid \Vert \varvec{\Omega }_2\Vert _2^2 \ge (1 + C_{\ell ,\delta }C_u^2)\ell \right\} , \end{aligned}$$

where \(C_{\ell ,\delta } \) is defined in Step 1, \(\mathbb {P}\left[ \, E \right] \le \delta /2\) and from Step 2, \(\mathbb {P}\left[ \, F \right] \le \delta /2\). It can be verified that

$$\begin{aligned} \left\{ \varvec{\Omega }\mid \Vert \varvec{\Omega }_2\Vert _2^2 \Vert \varvec{\Omega }_1^\dagger \Vert _2^2\ge \frac{1}{1-\rho }(1 + C_{\ell ,\delta }C_u^2) \right\} \subseteq E \cup F, \end{aligned}$$

and therefore, we can use the union bound

$$\begin{aligned} \mathbb {P}\left[ \, \Vert \varvec{\Omega }_2\Vert _2^2 \Vert \varvec{\Omega }_1^\dagger \Vert _2^2 \ge \frac{1}{1-\rho }(1 + C_{\ell ,\delta }C_u^2) \right] \le \mathbb {P}\left[ \, E \right] + \mathbb {P}\left[ \, F \right] \le \delta . \end{aligned}$$

Plugging in the value of \(C_{\ell ,\delta }\) and \(C_u^2\) from Step 1 gives the desired result. \(\square \)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Saibaba, A.K., Alexanderian, A. & Ipsen, I.C.F. Randomized matrix-free trace and log-determinant estimators. Numer. Math. 137, 353–395 (2017). https://doi.org/10.1007/s00211-017-0880-z

Download citation

  • Received:

  • Revised:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00211-017-0880-z

Mathematics Subject Classification