Abstract
We present randomized algorithms for estimating the trace and determinant of Hermitian positive semi-definite matrices. The algorithms are based on subspace iteration, and access the matrix only through matrix vector products. We analyse the error due to randomization, for starting guesses whose elements are Gaussian or Rademacher random variables. The analysis is cleanly separated into a structural (deterministic) part followed by a probabilistic part. Our absolute bounds for the expectation and concentration of the estimators are non-asymptotic and informative even for matrices of low dimension. For the trace estimators, we also present asymptotic bounds on the number of samples (columns of the starting guess) required to achieve a user-specified relative error. Numerical experiments illustrate the performance of the estimators and the tightness of the bounds on low-dimensional matrices, and on a challenging application in uncertainty quantification arising from Bayesian optimal experimental design.
Notes
The square matrix \(\mathbf {I}\) denotes the identity, with ones on the diagonal and zeros everywhere else.
The superscript \(\dagger \) denotes the Moore–Penrose inverse.
independent and identically distributed.
References
Akçelik, V., Biros, G., Draganescu, A., Ghattas, O., Hill, J., Van Bloemen Waanders, B.: Dynamic data-driven inversion for terascale simulations: real-time identification of airborne contaminants. In: Supercomputing, 2005. Proceedings of the ACM/IEEE SC 2005 Conference, pp. 43–43 (2005)
Alexanderian, A., Gloor, P.J., Ghattas, O.: On Bayesian \(A\)- and \(D\)-optimal experimental designs in infinite dimensions. Bayesian Anal. 11(3), 671–695 (2016)
Alexanderian, A., Petra, N., Stadler, G., Ghattas, O.: A-optimal design of experiments for infinite-dimensional Bayesian linear inverse problems with regularized \(\ell _0\)-sparsification. SIAM J. Sci. Comput. 36(5), A2122–A2148 (2014)
Anitescu, M., Chen, J., Wang, L.: A matrix-free approach for solving the parametric Gaussian process maximum likelihood problem. SIAM J. Sci. Comput. 34(1), A240–A262 (2012)
Atkinson, A.C., Donev, A.N.: Optimum Experimental Designs. Oxford University Press, Oxford (1992)
Avron, H., Toledo, S.: Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix. J. ACM 58(2), Art. 8, 17 (2011)
Bai, Z., Fahey, M., Golub, G.: Some large-scale matrix computation problems. J. Comput. Appl. Math. 74(1), 71–89 (1996)
Bai, Z., Golub, G.H.: Bounds for the trace of the inverse and the determinant of symmetric positive definite matrices. Ann. Numer. Math. 4(1–4), 29–38 (1997). The heritage of P.L. Chebyshev: a Festschrift in honor of the 70th birthday of T.J. Rivlin
Barry, R.P., Pace, R.K.: Monte Carlo estimates of the log determinant of large sparse matrices. Linear Algebra Appl. 289(1–3), 41–54 (1999). Linear algebra and statistics (Istanbul, 1997)
Boutsidis, C., Drineas, P., Kambadur, P., Zouzias, A.: A randomized algorithm for approximating the log determinant of a symmetric positive definite matrix. arXiv:1503.00374 (2015)
Chaloner, K., Verdinelli, I.: Bayesian experimental design: a review. Stat. Sci. 10(3), 273–304 (1995)
Chen, J., Anitescu, M., Saad, Y.: Computing \(f({A})b\) via least squares polynomial approximations. SIAM J. Sci. Comput. 33(1), 195–222 (2011)
Flath, P.H., Wilcox, L.C., Akçelik, V., Hill, J., van Bloemen Waanders, B., Ghattas, O.: Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations. SIAM J. Sci. Comput. 33(1), 407–432 (2011)
Gittens, A., Mahoney, M.W.: Revisiting the Nystrom method for improved large-scale machine learning. arXiv:1303.1849 (2013)
Golub, G.H., Von Matt, U.: Generalized cross-validation for large-scale problems. J. Comput. Graph. Stat. 6(1), 1–34 (1997)
Gu, M.: Subspace iteration randomization and singular value problems. SIAM J. Sci. Comput. 37(3), A1139–A1173 (2015)
Haber, E., Horesh, L., Tenorio, L.: Numerical methods for experimental design of large-scale linear ill-posed inverse problems. Inverse Probl. 24(5), 055012–055017 (2008)
Haber, E., Magnant, Z., Lucero, C., Tenorio, L.: Numerical methods for \(A\)-optimal designs with a sparsity constraint for ill-posed inverse problems. Comput. Optim. Appl. 52, 293–314 (2012)
Halko, N., Martinsson, P.G., Tropp, J.A.: Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53(2), 217–288 (2011)
Han, I., Malioutov, D., Shin, J.: Large-scale log-determinant computation through stochastic Chebyshev expansions. arXiv:1503.06394 (2015)
Higham, N.J.: Accuracy and Stability of Numerical Algorithms, 2nd edn. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2002)
Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press, Cambridge (1991)
Horn, R.A., Johnson, C.R.: Matrix Analysis, 2nd edn. Cambridge University Press, Cambridge (2013)
Hutchinson, M.F.: A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Commun. Stat. Simul. Comput. 18(3), 1059–1076 (1989)
Ledoux, M.: On Talagrand’s deviation inequalities for product measures. ESAIM Probab. Statist. 1, 63–87 (1995/1997)
Liberty, E., Woolfe, F., Martinsson, P.G., Rokhlin, V., Tygert, M.: Randomized algorithms for the low-rank approximation of matrices. Proc. Natl. Acad. Sci. USA 104(51), 20167–20172 (2007)
Lin, L.: Randomized estimation of spectral densities of large matrices made accurate. Numer. Math. 1–31 (2016). doi:10.1007/s00211-016-0837-7
Mahoney, M.W.: Randomized Algorithms for Matrices and Data. Now Publishers Inc, Hanover (2011)
Martinsson, P.G., Rokhlin, V., Tygert, M.: A randomized algorithm for the decomposition of matrices. Appl. Comput. Harmon. Anal. 30(1), 47–68 (2011)
Mitzenmacher, M., Upfal, E.: Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, Cambridge (2005)
Ouellette, D.V.: Schur complements and statistics. Linear Algebra Appl. 36, 187–295 (1981)
Pace, R.K., LeSage, J.P.: Chebyshev approximation of log-determinants of spatial weight matrices. Comput. Stat. Data Anal. 45(2), 179–196 (2004)
Parlett, B.N.: The Symmetric Eigenvalue Problem. Prentice Hall Inc, Englewood Cliffs (1980)
Petra, N., Stadler, G.: Model variational inverse problems governed by partial differential equations. Tech. Rep. 11-05, The Institute for Computational Engineering and Sciences, The University of Texas at Austin (2011)
Roosta-Khorasani, F., Ascher, U.: Improved bounds on sample size for implicit matrix trace estimators. Found. Comput. Math. 15(5), 1187–1212 (2015)
Rudelson, M., Vershynin, R.: Smallest singular value of a random rectangular matrix. Commun. Pure Appl. Math. 62(12), 1707–1739 (2009)
Saad, Y.: Numerical Methods for Large Eigenvalue Problems. Classics in Applied Mathematics, vol. 66. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2011). Revised edition of the 1992 original
Saibaba, A.K., Kitanidis, P.K.: Fast computation of uncertainty quantification measures in the geostatistical approach to solve inverse problems. Adv. Water Resour. 82, 124–138 (2015)
Sorensen, D.C., Embree, M.: A DEIM induced CUR factorization. SIAM J. Sci. Comput. 38(3), A1454–A1482 (2016)
Stirling, D.S.G.: Mathematical Analysis and Proof, 2nd edn. Horwood Publishing Limited, Chichester (2009)
Tang, J.M., Saad, Y.: A probing method for computing the diagonal of a matrix inverse. Numer. Linear Algebra Appl. 19(3), 485–501 (2012)
Tropp, J.A.: Improved analysis of the subsampled randomized Hadamard transform. Adv. Adapt. Data Anal. 3(1–2), 115–126 (2011)
Tropp, J.A.: An introduction to matrix concentration inequalities. Found. Trends Mach. Learn. 8(1–2), 1–230 (2015)
Uciński, D.: Optimal Measurement Methods for Distributed Parameter System Identification. CRC Press, Boca Raton (2005)
Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices. In: Eldar, Y.C., Kutyniok, G. (eds.) Compressed Sensing, pp. 210–268. Cambridge University Press, Cambridge (2012)
Wahba, G.: Practical approximate solutions to linear operator equations when the data are noisy. SIAM J. Numer. Anal. 14(4), 651–667 (1977)
Wahba, G.: Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 59. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (1990)
Zhang, Y., Leithead, W.E., Leith, D.J., Walshe, L.: Log-det approximation based on uniformly distributed seeds and its application to Gaussian process regression. J. Comput. Appl. Math. 220(1–2), 198–214 (2008)
Additional information
The third author acknowledges the support from the XDATA Program of the Defense Advanced Research Projects Agency (DARPA), administered through Air Force Research Laboratory contract FA8750-12-C-0323.
Appendices
Appendix 1: Gaussian random matrices
In this section, we state a lemma on the pseudo-inverse of a rectangular Gaussian random matrix, and use this result to prove both parts of Lemma 4.
1.1 Pseudo-inverse of a Gaussian random matrix
We state a large-deviation bound for the norm of the pseudo-inverse of a Gaussian random matrix [19, Proposition 10.4].
Lemma 6
Let \(\mathbf {G}\in \mathbb {R}^{k\times (k+p)}\) be a random Gaussian matrix and let \(p\ge 2\). For all \(t \ge 1\),
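the bound reads as follows (restated here from [19, Proposition 10.4]; constants as given there):
\[
\mathbb {P}\left[ \, \Vert \mathbf {G}^\dagger \Vert _2 \ge \frac{e\sqrt{k+p}}{p+1}\, t \right] \le t^{-(p+1)} .
\]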
1.2 Proof of Lemma 4
Proof
From [45, Corollary 5.35] we have
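the following singular value bounds (restated here in the general notation of [45], for an \(N\times M\) matrix \(\mathbf {A}\) with i.i.d. standard Gaussian entries; \(N\) and \(M\) are generic dimensions used only for this restatement): for every \(t \ge 0\), with probability at least \(1-2e^{-t^2/2}\),
\[
\sqrt{N} - \sqrt{M} - t \;\le \; s_\text {min}(\mathbf {A}) \;\le \; s_\text {max}(\mathbf {A}) \;\le \; \sqrt{N} + \sqrt{M} + t
\]
(the upper bound corresponds to the quantity \(\mu \) recalled next).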
Recall from (3) that \(\mu = \sqrt{n-k} + \sqrt{k+p}\). From the law of the unconscious statistician [16, Proposition S4.2],
This concludes the proof of (16).
Next consider (17). Using Lemma 6, we have for \(t > 0\)
As before, we have
Minimizing with respect to \(\beta \) gives \(\beta = D^{1/(p+1)}\). Substituting this value for \(\beta \) and simplifying completes the argument.
\(\square \)
Appendix 2: Rademacher random matrices
In this section, we state the matrix Chernoff inequalities [43] and other useful concentration inequalities and use these results to prove Theorem 10.
1.1 Useful concentration inequalities
The proof of Theorem 10 relies on the matrix concentration inequalities developed in [43]. We will need the following result [43, Theorem 5.1.1] in what follows.
Theorem 11
(Matrix Chernoff) Let \(\{\mathbf {X}_k\}\) be a finite sequence of independent, random, \(d\times d\) Hermitian matrices. Assume that \(0 \le \lambda _{\min }(\mathbf {X}_k)\) and \(\lambda _{\max }(\mathbf {X}_k) \le L\) for each index k. Let us define
and let \(g(x) \equiv e^x(1+x)^{-(1+x)}.\) Then for any \(\epsilon > 0\)
and for any \(0 \le \epsilon <1 \)
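the corresponding tail bounds read as follows (restated here from [43, Theorem 5.1.1], with \(\mathbf {Y} \equiv \sum _k \mathbf {X}_k\), \(\mu _\text {max} \equiv \lambda _\text {max} (\mathbb {E}\left[ \, \mathbf {Y} \right] )\), and \(\mu _\text {min} \equiv \lambda _\text {min} (\mathbb {E}\left[ \, \mathbf {Y} \right] )\) as in the cited source):
\[
\mathbb {P}\left[ \, \lambda _\text {max}(\mathbf {Y}) \ge (1+\epsilon )\,\mu _\text {max} \right] \le d \left( g(\epsilon )\right) ^{\mu _\text {max}/L} \quad \text {for } \epsilon \ge 0,
\]
and
\[
\mathbb {P}\left[ \, \lambda _\text {min}(\mathbf {Y}) \le (1-\epsilon )\,\mu _\text {min} \right] \le d \left( g(-\epsilon )\right) ^{\mu _\text {min}/L} \quad \text {for } 0 \le \epsilon < 1.
\]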
The following result was first proved by Ledoux [25]; we reproduce the statement from [42, Proposition 2.1].
Lemma 7
Suppose \(f:\mathbb {R}^n \rightarrow \mathbb {R}\) is a convex function that satisfies the following Lipschitz bound
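for some constant \(L > 0\) (stated here in the standard form):
\[
|f(\mathbf {x}) - f(\mathbf {y})| \le L\, \Vert \mathbf {x} - \mathbf {y}\Vert _2 \quad \text {for all } \mathbf {x}, \mathbf {y}\in \mathbb {R}^n .
\]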
Let \(\mathbf {z}\in \mathbb {R}^n\) be a random vector with entries drawn from an i.i.d. Rademacher distribution. Then, for all \(t \ge 0\),
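the concentration inequality reads (as stated in [42, Proposition 2.1]; restated here)
\[
\mathbb {P}\left[ \, f(\mathbf {z}) \ge \mathbb {E}\left[ \, f(\mathbf {z}) \right] + L\, t \right] \le e^{-t^2/8} .
\]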
Lemma 8
Let \(\mathbf {V}\) be an \(n\times r\) matrix with orthonormal columns and let \(n\ge r\). Let \(\mathbf {z}\) be an \(n\times 1\) vector with entries drawn from an i.i.d. Rademacher distribution. Then, for \(0< \delta < 1\),
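one natural form of the conclusion (a restatement based on the proof below) is
\[
\mathbb {P}\left[ \, \Vert \mathbf {V}^*\mathbf {z}\Vert _2 \ge \sqrt{r} + \sqrt{8\log \left( 1/\delta \right) } \right] \le \delta .
\]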
Proof
Our proof follows the strategy in [42, Lemma 3.3]. Define the function \(f(\mathbf {x}) = \Vert \mathbf {V}^*\mathbf {x}\Vert _2\). We observe that f satisfies the assumptions of Lemma 7, with Lipschitz constant \(L=1\); the latter follows from
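the chain of inequalities (a sketch of this step)
\[
|f(\mathbf {x}) - f(\mathbf {y})| = \bigl |\, \Vert \mathbf {V}^*\mathbf {x}\Vert _2 - \Vert \mathbf {V}^*\mathbf {y}\Vert _2 \,\bigr | \le \Vert \mathbf {V}^*(\mathbf {x}-\mathbf {y})\Vert _2 \le \Vert \mathbf {V}^*\Vert _2\, \Vert \mathbf {x}-\mathbf {y}\Vert _2 = \Vert \mathbf {x}-\mathbf {y}\Vert _2 ,
\]
since \(\mathbf {V}\) has orthonormal columns, so that \(\Vert \mathbf {V}^*\Vert _2 = 1\).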
Furthermore, using Hölder’s inequality
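together with \(\mathbb {E}\left[ \, \mathbf {z}\mathbf {z}^* \right] = \mathbf {I}\) for a Rademacher vector (a sketch of this step):
\[
\mathbb {E}\left[ \, f(\mathbf {z}) \right] \le \left( \mathbb {E}\left[ \, \Vert \mathbf {V}^*\mathbf {z}\Vert _2^2 \right] \right) ^{1/2} = \left( \mathsf {trace}\left( \mathbf {V}^*\, \mathbb {E}\left[ \, \mathbf {z}\mathbf {z}^* \right] \mathbf {V}\right) \right) ^{1/2} = \sqrt{r} .
\]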
Using Lemma 7 with \(t_\delta = \sqrt{8\log \left( 1/\delta \right) }\) we have
\(\square \)
Lemma 9
Let \(X_i\) for \(i=1,\ldots ,n\) be a sequence of i.i.d. random variables. If for each \(i=1,\ldots ,n\), \(\mathbb {P}\left[ \, X_i \ge a \right] \le \xi \) holds, where \(\xi \in (0,1]\), then
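(restating the conclusion that the proof below establishes)
\[
\mathbb {P}\left[ \, \max _{1\le i\le n} X_i \ge a \right] \le n\, \xi .
\]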
Proof
Since \(\mathbb {P}\left[ \, X_i \ge a \right] \le \xi \), it follows that \(\mathbb {P}\left[ \, X_i < a \right] \ge 1 - \xi \). We can bound
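the probability that all the \(X_i\) fall below \(a\); a sketch of this step, using the independence of the \(X_i\), is
\[
\mathbb {P}\left[ \, \max _{1\le i\le n} X_i < a \right] = \prod _{i=1}^{n} \mathbb {P}\left[ \, X_i < a \right] \ge (1-\xi )^n .
\]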
The proof follows from Bernoulli’s inequality [40, Theorem 5.1] which states \((1-\xi )^n \ge 1 - n\xi \) for \(\xi \in [0,1]\) and \(n\ge 1\). \(\square \)
1.2 Proof of Theorem 10
Proof
Recall that \(\varvec{\Omega }_1 = \mathbf {U}^*_1 \varvec{\Omega }\) and \(\varvec{\Omega }_2 = \mathbf {U}_2^*\varvec{\Omega }\), where \(\varvec{\Omega }\) is a random matrix with entries chosen from an i.i.d. Rademacher distribution. The proof proceeds in three steps.
1. Bound for \(\Vert \varvec{\Omega }_2\Vert _2^2\). The proof uses the matrix Chernoff concentration inequality. Let \(\varvec{\omega }_i \in \mathbb {R}^{n\times 1}\) be the i-th column of \(\varvec{\Omega }\). Note \( \varvec{\Omega }_2\varvec{\Omega }_2^* \in \mathbb {C}^{(n-k)\times (n-k)}\) and
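(writing \(\varvec{\Omega } = \begin{bmatrix} \varvec{\omega }_1&\cdots&\varvec{\omega }_\ell \end{bmatrix}\); a sketch of the decomposition)
\[
\varvec{\Omega }_2\varvec{\Omega }_2^* = \mathbf {U}_2^*\varvec{\Omega }\varvec{\Omega }^*\mathbf {U}_2 = \sum _{i=1}^{\ell } (\mathbf {U}_2^*\varvec{\omega }_i)(\mathbf {U}_2^*\varvec{\omega }_i)^* .
\]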
Furthermore, define \(\mu _\text {min} ( \varvec{\Omega }_2\varvec{\Omega }_2^*) \equiv \lambda _\text {min} (\mathbb {E}\left[ \, \varvec{\Omega }_2\varvec{\Omega }_2^* \right] )\) and \(\mu _\text {max} (\varvec{\Omega }_2\varvec{\Omega }_2^*) \equiv \lambda _\text {max} (\mathbb {E}\left[ \, \varvec{\Omega }_2\varvec{\Omega }_2^* \right] )\). Clearly \(\mu _\text {min} = \mu _\text {max} = \ell \). Note that here we have expressed \( \varvec{\Omega }_2\varvec{\Omega }_2^*\) as a finite sum of \(\ell \) rank-1 matrices, each with a single nonzero eigenvalue \(\varvec{\omega }_i^* \mathbf {U}_2\mathbf {U}_2^*\varvec{\omega }_i \). We want a probabilistic bound for the largest of these eigenvalues, i.e., \(L_2 = \max _{i=1,\ldots ,\ell } \Vert \mathbf {U}_2^*\varvec{\omega }_i\Vert _2^2\). Using Lemma 8, for each i, with probability at most \(e^{-t^2/8}\) we have
Since \(\Vert \mathbf {U}_2^*\varvec{\omega }_i\Vert _2^2\) are i.i.d., applying Lemma 9 gives
Take \(t = \sqrt{ 8\log (4\ell /\delta )}\) to obtain
The matrix \(\varvec{\Omega }_2\) satisfies the conditions of the matrix Chernoff bound (Theorem 11); for \(\eta \ge 0\) we have
where the function \(g(\eta )\) is defined in Theorem 11. For \(\eta > 1\) the Chernoff bounds can be simplified [30, Section 4.3] since \(g(\eta ) \le e^{-\eta /3}\), to obtain
Choose the parameter
so that
Finally, we want to find a lower bound for \(\Vert \varvec{\Omega }_2\Vert _2^2\). Define the events
Note that \(\mathbb {P}\left[ \, A^c \right] \le \delta /4\), and under event A we have \(C_u^2 >L_2\), so that
Using the law of total probability
we can obtain a bound for \(\mathbb {P}\left[ \, B \right] \) as
2. Bound for \(\Vert \varvec{\Omega }_1^\dagger \Vert _2^2\). The steps are similar, and we again use the matrix Chernoff concentration inequality. Consider \(\varvec{\Omega }_1\varvec{\Omega }_1^*\in \mathbb {C}^{k\times k}\), and as before, write this matrix as the sum of rank-1 matrices to obtain
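(a sketch, analogous to Step 1, using \(\mathbb {E}\left[ \, \varvec{\omega }_i\varvec{\omega }_i^* \right] = \mathbf {I}\))
\[
\varvec{\Omega }_1\varvec{\Omega }_1^* = \sum _{i=1}^{\ell } (\mathbf {U}_1^*\varvec{\omega }_i)(\mathbf {U}_1^*\varvec{\omega }_i)^*, \qquad \mathbb {E}\left[ \, \varvec{\Omega }_1\varvec{\Omega }_1^* \right] = \ell \, \mathbf {U}_1^*\mathbf {U}_1 = \ell \, \mathbf {I},
\]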
and \(\mu _\text {min} (\varvec{\Omega }_1\varvec{\Omega }_1^*) = \ell \). Each summand in the above decomposition of \(\varvec{\Omega }_1\varvec{\Omega }_1^*\) has one nonzero eigenvalue \(\varvec{\omega }_i^* \mathbf {U}_1\mathbf {U}_1^*\varvec{\omega }_i \). Following the same strategy as in Step 1, we define \(L_1 \equiv \max _{i=1,\ldots ,\ell }\Vert \mathbf {U}_1^*\varvec{\omega }_i\Vert _2^2 \) and apply Lemma 8 to obtain
Take \(t = \sqrt{ 8\log (4n/\delta )}\) to obtain
A straightforward application of the Chernoff bound in Theorem 11 gives us
Next, observe that \(-\log g(-\rho )\) has the Taylor series expansion in the region \(0< \rho < 1\)
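(a sketch, from the definition of \(g\)):
\[
-\log g(-\rho ) = \rho + (1-\rho )\log (1-\rho ) = \sum _{j=2}^{\infty } \frac{\rho ^j}{j(j-1)} = \frac{\rho ^2}{2} + \frac{\rho ^3}{6} + \frac{\rho ^4}{12} + \cdots ,
\]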
so that \(-\log g(-\rho ) \ge \rho ^2/2\) for \(0< \rho < 1\), or equivalently \(g(-\rho ) \le e^{-\rho ^2/2}\). This gives us
where we have used \(\lambda _\text {min}(\varvec{\Omega }_1\varvec{\Omega }_1^*) = 1/\Vert \varvec{\Omega }_1^\dagger \Vert _2^2\) assuming \(\mathsf {rank}(\varvec{\Omega }_1) = k\).
With the number of samples as defined in Theorem 3
the Chernoff bound (27) becomes
Define the events
Note that \(\mathbb {P}\left[ \, D^c \right] \le \delta /4\) from (26). Then since the exponent is strictly greater than 1, we have
Using the conditioning argument as before gives \(\mathbb {P}\left[ \, C \right] \le \delta /2\).
3. Combining bounds. Define the event
where \(C_{\ell ,\delta } \) is defined in Step 1. From Step 1, \(\mathbb {P}\left[ \, E \right] \le \delta /2\), and from Step 2, \(\mathbb {P}\left[ \, F \right] \le \delta /2\). It can be verified that
and therefore, we can use the union bound
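(a restatement of this step)
\[
\mathbb {P}\left[ \, E \cup F \right] \le \mathbb {P}\left[ \, E \right] + \mathbb {P}\left[ \, F \right] \le \frac{\delta }{2} + \frac{\delta }{2} = \delta .
\]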
Plugging in the values of \(C_{\ell ,\delta }\) and \(C_u^2\) from Step 1 gives the desired result. \(\square \)
Cite this article
Saibaba, A.K., Alexanderian, A. & Ipsen, I.C.F. Randomized matrix-free trace and log-determinant estimators. Numer. Math. 137, 353–395 (2017). https://doi.org/10.1007/s00211-017-0880-z