Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
footnotetext: *: Equal contribution. Correspondence to hoppe@mathc.rwth-aachen.de.

Non-Asymptotic Uncertainty Quantification in High-Dimensional Learning

Frederik Hoppe
RWTH Aachen University
hoppe@mathc.rwth-aachen.de
&Claudio Mayrink Verdun
Harvard University
claudioverdun@seas.harvard.edu
\ANDHannah Laus
TU Munich &\&& MCML
hannah.laus@tum.de &Felix Krahmer
TU Munich &\&& MCML
felix.krahmer@tum.de &Holger Rauhut
LMU Munich &\&& MCML
rauhut@math.lmu.de
Abstract

Uncertainty quantification (UQ) is a crucial but challenging task in many high-dimensional regression or learning problems to increase the confidence of a given predictor. We develop a new data-driven approach for UQ in regression that applies both to classical regression approaches such as the LASSO as well as to neural networks. One of the most notable UQ techniques is the debiased LASSO, which modifies the LASSO to allow for the construction of asymptotic confidence intervals by decomposing the estimation error into a Gaussian and an asymptotically vanishing bias component. However, in real-world problems with finite-dimensional data, the bias term is often too significant to be neglected, resulting in overly narrow confidence intervals. Our work rigorously addresses this issue and derives a data-driven adjustment that corrects the confidence intervals for a large class of predictors by estimating the means and variances of the bias terms from training data, exploiting high-dimensional concentration phenomena. This gives rise to non-asymptotic confidence intervals, which can help avoid overestimating uncertainty in critical applications such as MRI diagnosis. Importantly, our analysis extends beyond sparse regression to data-driven predictors like neural networks, enhancing the reliability of model-based deep learning. Our findings bridge the gap between established theory and the practical applicability of such debiased methods.

1 Introduction

The past few years have witnessed remarkable advances in high-dimensional statistical models, inverse problems, and learning methods for solving them. In particular, we have seen a surge of new methodologies and algorithms that have revolutionized our ability to extract insights from complex, high-dimensional data [Wainwright, 2019, Giraud, 2021, Wright and Ma, 2022]. Also, the theoretical underpinnings of the techniques in these fields have achieved tremendous success. However, the development of rigorous methods for quantifying uncertainty associated with their estimates, such as constructing confidence intervals for a given solution, has lagged behind, with much of the underlying theory remaining elusive.

In high-dimensional statistics, for example, even for classical regularized estimators such as the LASSO [Chen and Donoho, 1995, Tibshirani, 1996, Hastie et al., 2009], it was shown that a closed-form characterization of the probability distribution of the estimator in simple terms is not possible, e.g., [Knight and Fu, 2000, Theorem 5.1]. This, in turn, implies that it is very challenging to establish rigorous confidence intervals that would quantify the uncertainty of such estimated parameters. To overcome this, a series of papers [Zhang and Zhang, 2014, Javanmard and Montanari, 2014, van de Geer et al., 2014] proposed and analyzed the debiased LASSO, also known as the desparsified LASSO, a procedure to fix the bias introduced by the 1subscript1\ell_{1}roman_ℓ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT penalty in the LASSO; see [Javanmard and Montanari, 2014, Corollary 11] and [Zhang and Huang, 2008] for a discussion on the bias induced by the 1subscript1\ell_{1}roman_ℓ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT regularizer. The debiased estimator derived in the aforementioned works has established a principled framework for obtaining sharp confidence intervals for the LASSO, initiating a statistical inference approach with UQ guarantees for high-dimensional regression problems where the number of predictors significantly exceeds the number of observations. Recently, this estimator was also extended in several directions beyond 1subscript1\ell_{1}roman_ℓ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT-minimization which include, for example, deep unrolled algorithms [Hoppe et al., 2023b, Bellec and Tan, 2024] and it has been applied to fields like magnetic resonance image both with high-dimensional regression techniques as well as learning ones [Hoppe et al., 2023a, Hoppe et al., 2023b]; see the paragraph related works below.

Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Refer to caption
(e)
Figure 1: Illustration of the confidence interval correction. Figs. 1(a), 1(b), 1(c) show the construction of CIs with standard debiased techniques (w/o data adjustment) and with our proposed method (w/ Gaussian adjustment - Thm. 3 - in Fig. 1(b) and data adjustment - Thm. 2 - in Fig. 1(c)), respectively. The red points represent the entries that are not captured by the CIs. Additionally, Fig. 1(d) shows box plots of coverage over all components, and Fig. 1(e) shows them on the support. In the last two plots, the left box refers to the asymptotic and the right to the non-asymptotic CI based on Gaussian adjustment of 500500500500 feature vectors. We solve a sparse regression problem y=Ax+ε𝑦𝐴𝑥𝜀y=Ax+\varepsilonitalic_y = italic_A italic_x + italic_ε via the LASSO, where A4000×10000𝐴superscript400010000A\in\mathbb{C}^{4000\times 10000}italic_A ∈ blackboard_C start_POSTSUPERSCRIPT 4000 × 10000 end_POSTSUPERSCRIPT, xN𝑥superscript𝑁x\in\mathbb{C}^{N}italic_x ∈ blackboard_C start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT is 200-sparse, and the noise level is 10%absentpercent10\approx 10\%≈ 10 %. The averaged coverage over 250250250250 vectors with significance level α=0.05𝛼0.05\alpha=0.05italic_α = 0.05 of the asymptotic confidence intervals is hW(0.05)=0.9353superscript𝑊0.050.9353h^{W}(0.05)=0.9353italic_h start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT ( 0.05 ) = 0.9353 and on the support hSW(0.05)=0.8941subscriptsuperscript𝑊𝑆0.050.8941h^{W}_{S}(0.05)=0.8941italic_h start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ( 0.05 ) = 0.8941. Confidence intervals built with our proposed method yield for Gaussian adjustment hG(0.05)=0.9684superscript𝐺0.050.9684h^{G}(0.05)=0.9684italic_h start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( 0.05 ) = 0.9684 and on the support hSG(0.05)=0.9421superscriptsubscript𝑆𝐺0.050.9421h_{S}^{G}(0.05)=0.9421italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( 0.05 ) = 0.9421, and for data-driven adjustment h(0.05)=hS(0.05)=10.05subscript𝑆0.051h(0.05)=h_{S}(0.05)=1italic_h ( 0.05 ) = italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ( 0.05 ) = 1. For more details, cf. Section 5.1 and Appendix C.

The idea of the debiased LASSO is that its estimation error, i.e., the difference between the debiased estimator and the ground truth, can be decomposed into a Gaussian and a remainder/bias component. It has been shown in certain cases that the subscript\ell_{\infty}roman_ℓ start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT norm of the remainder component vanishes with high probability, assuming an asymptotic setting, i.e., when the dimensions of the problem grow. In this case, the estimator is proven to be approximately Gaussian from which the confidence intervals are derived. However, in practice, one needs to be in a very high-dimensional regime with enough data for these assumptions to kick in. In many applications with a finite set of observations, the remainder term does not vanish; it can rather be substantially large, and the confidence intervals constructed solely based on the Gaussian component fail to account for the entire estimation error. Consequently, the derived confidence intervals are narrower, resulting in an overestimation of certainty. This issue is particularly problematic in applications where it is crucial to estimate the magnitude of a vector coefficient with a high degree of confidence, such as in medical imaging applications.

Moreover, according to the standard theory of debiased estimators, the estimation of how small the remainer term is depends on how well one can quantify the 2subscript2\ell_{2}roman_ℓ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and 1subscript1\ell_{1}roman_ℓ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT bounds for the corresponding biased estimator, e.g., the LASSO [van de Geer et al., 2014, Javanmard and Montanari, 2018]. Although sharp oracle inequalities exist for such classical regression estimators, cf. related works, the same cannot be said about when unrolled algorithms are employed. For the latter, generalization bounds are usually not sharp or do not exist.

In this paper, we tackle the challenge of constructing valid confidence intervals around debiased estimators used in high-dimensional regression. The key difficulty lies in accounting for the remainder term in the estimation error decomposition, which hinders the development of finite-sample confidence intervals. We propose a novel non-asymptotic theory that explicitly characterizes the remainder term, enabling us to construct reliable confidence intervals in the finite-sample regime. Furthermore, we extend our framework to quantify uncertainty for model-based neural networks used for solving inverse problems, which paves the way to a rigorous theory of data-driven UQ for modern deep learning techniques. We state an informal version of our main result, discussed in detail in Section 3.

Theorem 1 (Informal Version).

Let x(1),,x(l)Nsuperscript𝑥1superscript𝑥𝑙superscript𝑁x^{(1)},\ldots,x^{(l)}\in\mathbb{C}^{N}italic_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT , … , italic_x start_POSTSUPERSCRIPT ( italic_l ) end_POSTSUPERSCRIPT ∈ blackboard_C start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT be i.i.d. data. Let b(i)=Ax(i)+ε(i)superscript𝑏𝑖𝐴superscript𝑥𝑖superscript𝜀𝑖b^{(i)}=Ax^{(i)}+\varepsilon^{(i)}italic_b start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT = italic_A italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT + italic_ε start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT be a high-dimensional regression model with noise ε(i)𝒞𝒩(0,σ2IN×N)similar-tosuperscript𝜀𝑖𝒞𝒩0superscript𝜎2subscript𝐼𝑁𝑁\varepsilon^{(i)}\sim\mathcal{CN}(0,\sigma^{2}I_{N\times N})italic_ε start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ∼ caligraphic_C caligraphic_N ( 0 , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I start_POSTSUBSCRIPT italic_N × italic_N end_POSTSUBSCRIPT ). With the data, derive, for a significance level α𝛼\alphaitalic_α, a confidence radius rj(α)subscript𝑟𝑗𝛼r_{j}(\alpha)italic_r start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) for a new sample’s component xj(l+1)subscriptsuperscript𝑥𝑙1𝑗x^{(l+1)}_{j}italic_x start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT. Let (x^u)j(l+1)subscriptsuperscriptsuperscript^𝑥𝑢𝑙1𝑗(\hat{x}^{u})^{(l+1)}_{j}( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_u end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT be the debiased estimator based on a (learned) high-dimensional regression estimator x^j(i)subscriptsuperscript^𝑥𝑖𝑗\hat{x}^{(i)}_{j}over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT. Then, it holds that

(|(x^u)j(l+1)xj(l+1)|rj(α))1α.subscriptsuperscriptsuperscript^𝑥𝑢𝑙1𝑗subscriptsuperscript𝑥𝑙1𝑗subscript𝑟𝑗𝛼1𝛼\mathbb{P}\left(\left|(\hat{x}^{u})^{(l+1)}_{j}-x^{(l+1)}_{j}\right|\leq r_{j}% (\alpha)\right)\geq 1-\alpha.blackboard_P ( | ( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_u end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - italic_x start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | ≤ italic_r start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) ) ≥ 1 - italic_α .

Theorem 1 has far-reaching implications that transcend the classical regularized high-dimensional regression setting. For example, it enables the establishment of rigorous confidence intervals for learning algorithms such as unrolled networks [Monga et al., 2021]. To our knowledge, obtaining rigorous UQ results for neural networks without relying on non-scalable Monte Carlo methods remains a challenging problem [Gawlikowski et al., 2023]. To address this and quantify uncertainty, our approach combines model-based prior knowledge with data-driven statistical techniques. The model-based component harnesses the Gaussian distribution of the noise to quantify the uncertainty arising from the noisy data itself. We note that the Gaussian assumption for the noise is not a limitation, and extensions to non-Gaussian distributions are also possible, as clarified by [van de Geer et al., 2014]. We make a Gaussian noise assumption here for the sake of clarity. Complementing this, the data-driven component is imperative for quantifying the uncertainty inherent in the estimator’s performance. Moreover, our approach does not require any assumptions regarding the convergence or quality properties of the estimator. This flexibility enables the debiased method to apply to a wide range of estimators.

Contributions. The key contributions in this work are threefold 111 The code for our findings is available on GitHub : https://github.com/frederikhoppe/UQ_high_dim_learning

  1. 1.

    We solve the problem illustrated in Fig. 1 by developing a non-asymptotic theory for constructing confidence intervals around the debiased LASSO estimator. Unlike existing approaches that rely on asymptotic arguments and ignore the remainder term, our finite-sample analysis explicitly accounts for the remainder, clarifying an important theoretical gap and providing rigorous guarantees without appealing to asymptotic regimes.

  2. 2.

    We establish a general framework that extends the debiasing techniques to model-based deep learning approaches for high-dimensional regression. Our results enable the principled measurement of uncertainty for estimators learned by neural networks, a capability crucial for reliable decision-making in safety-critical applications. We test our approach with state-of-the-art unrolled networks such as the It-Net [Genzel et al., 2022a].

  3. 3.

    For real-world medical imaging tasks, we demonstrate that the remainder term in the debiased LASSO estimation error can be accurately modeled as a Gaussian distribution. Leveraging this finding, we derive Gaussian adjusted CIs that provide sharper uncertainty estimates than previous methods, enhancing the practical utility of debiased estimators in high-stakes medical domains.

2 Background and Problem Formulation

In numerous real-world applications, we encounter high-dimensional regression problems where the number of features far exceeds the number of observations. This scenario, known as high-dimensional regression, arises when we aim to estimate N𝑁Nitalic_N features, described by x0Nsuperscript𝑥0superscript𝑁x^{0}\in\mathbb{C}^{N}italic_x start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT ∈ blackboard_C start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT from only a few m𝑚mitalic_m target measurements bm𝑏superscript𝑚b\in\mathbb{C}^{m}italic_b ∈ blackboard_C start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT, where mNmuch-less-than𝑚𝑁m\ll Nitalic_m ≪ italic_N. Mathematically, this can be expressed as a linear model b=Ax0+ε𝑏𝐴superscript𝑥0𝜀b=Ax^{0}+\varepsilonitalic_b = italic_A italic_x start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT + italic_ε, where Am×N𝐴superscript𝑚𝑁A\in\mathbb{C}^{m\times N}italic_A ∈ blackboard_C start_POSTSUPERSCRIPT italic_m × italic_N end_POSTSUPERSCRIPT is the measurement matrix and ε𝒞𝒩(0,σ2IN×N)similar-to𝜀𝒞𝒩0superscript𝜎2subscript𝐼𝑁𝑁\varepsilon\sim\mathcal{CN}(0,\sigma^{2}I_{N\times N})italic_ε ∼ caligraphic_C caligraphic_N ( 0 , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I start_POSTSUBSCRIPT italic_N × italic_N end_POSTSUBSCRIPT ) is additive Gaussian noise with variance σ2superscript𝜎2\sigma^{2}italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. In the presence of sparsity, where the feature vector x0superscript𝑥0x^{0}italic_x start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT has only s𝑠sitalic_s non-zero entries (sNmuch-less-than𝑠𝑁s\ll Nitalic_s ≪ italic_N), a popular approach is to solve the LASSO, which gives an estimator x^^𝑥\hat{x}over^ start_ARG italic_x end_ARG obtained by solving the following 1subscript1\ell_{1}roman_ℓ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT-regularized optimization problem:

minxN12mAxb22+λx1.subscript𝑥superscript𝑁12𝑚superscriptsubscriptnorm𝐴𝑥𝑏22𝜆subscriptnorm𝑥1\min\limits_{x\in\mathbb{C}^{N}}\frac{1}{2m}\|Ax-b\|_{2}^{2}+\lambda\|x\|_{1}.roman_min start_POSTSUBSCRIPT italic_x ∈ blackboard_C start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG 2 italic_m end_ARG ∥ italic_A italic_x - italic_b ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_λ ∥ italic_x ∥ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT . (1)

However, the LASSO estimator is known to exhibit a systematic bias, and its distribution is intractable, posing challenges for uncertainty quantification [Knight and Fu, 2000]. To address this limitation, debiasing techniques have been developed in recent years [Zhang and Zhang, 2014, Javanmard and Montanari, 2014, van de Geer et al., 2014]. The debiased LASSO estimator, x^usuperscript^𝑥𝑢\hat{x}^{u}over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_u end_POSTSUPERSCRIPT, is defined as:

x^u=x^+1mMA(Ax^b),superscript^𝑥𝑢^𝑥1𝑚𝑀superscript𝐴𝐴^𝑥𝑏\hat{x}^{u}=\hat{x}+\frac{1}{m}MA^{*}(A\hat{x}-b),over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_u end_POSTSUPERSCRIPT = over^ start_ARG italic_x end_ARG + divide start_ARG 1 end_ARG start_ARG italic_m end_ARG italic_M italic_A start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( italic_A over^ start_ARG italic_x end_ARG - italic_b ) , (2)

where M𝑀Mitalic_M is a correction matrix that could be chosen such that maxi,j{1,,N}|(MΣ^IN×N)ij|subscript𝑖𝑗1𝑁subscript𝑀^Σsubscript𝐼𝑁𝑁𝑖𝑗\max_{i,j\in\{1,\ldots,N\}}|(M\hat{\Sigma}-I_{N\times N})_{ij}|roman_max start_POSTSUBSCRIPT italic_i , italic_j ∈ { 1 , … , italic_N } end_POSTSUBSCRIPT | ( italic_M over^ start_ARG roman_Σ end_ARG - italic_I start_POSTSUBSCRIPT italic_N × italic_N end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT | is small. Here, Σ^=AAm^Σsuperscript𝐴𝐴𝑚\hat{\Sigma}=\frac{A^{*}A}{m}over^ start_ARG roman_Σ end_ARG = divide start_ARG italic_A start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT italic_A end_ARG start_ARG italic_m end_ARG. We refer to [Javanmard and Montanari, 2018] for a more detailed description of how to choose M𝑀Mitalic_M. Remarkably, the estimation error

x^ux0=MAε/m=:W+(MΣ^IN×N)(x0x^)=:R,superscript^𝑥𝑢superscript𝑥0subscript𝑀superscript𝐴𝜀𝑚:absent𝑊subscript𝑀^Σsubscript𝐼𝑁𝑁superscript𝑥0^𝑥:absent𝑅\hat{x}^{u}-x^{0}=\underbrace{MA^{*}\varepsilon/m}_{=:W}+\underbrace{(M\hat{% \Sigma}-I_{N\times N})(x^{0}-\hat{x})}_{=:R},over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_u end_POSTSUPERSCRIPT - italic_x start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT = under⏟ start_ARG italic_M italic_A start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT italic_ε / italic_m end_ARG start_POSTSUBSCRIPT = : italic_W end_POSTSUBSCRIPT + under⏟ start_ARG ( italic_M over^ start_ARG roman_Σ end_ARG - italic_I start_POSTSUBSCRIPT italic_N × italic_N end_POSTSUBSCRIPT ) ( italic_x start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT - over^ start_ARG italic_x end_ARG ) end_ARG start_POSTSUBSCRIPT = : italic_R end_POSTSUBSCRIPT , (3)

can be decomposed into a Gaussian component W𝒞𝒩(0,σ2mΣ^)similar-to𝑊𝒞𝒩0superscript𝜎2𝑚^ΣW\sim\mathcal{CN}(0,\frac{\sigma^{2}}{m}\hat{\Sigma})italic_W ∼ caligraphic_C caligraphic_N ( 0 , divide start_ARG italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_m end_ARG over^ start_ARG roman_Σ end_ARG ) and a remainder term R𝑅Ritalic_R that vanishes asymptotically with high probability [Javanmard and Montanari, 2018, Theorem 3.8], assuming a Gaussian measurement matrix A𝐴Aitalic_A. Such a result was extended to matrices associated to a bounded orthonormal system like a subsampled Fourier matrix, allowing for extending the debiased LASSO to MRI [Hoppe et al., 2022]. The decomposition (3) and the asymptotic behavior of R𝑅Ritalic_R enable the construction of asymptotically valid CIs for the debiased LASSO estimate, providing principled UQ for high-dimensional sparse regression problems.

However, in real-world applications involving finite data regimes, the remainder term can be significant, rendering the asymptotic confidence intervals imprecise or even misleading, as illustrated in Fig. 1. This issue is particularly pronounced in high-stakes domains like medical imaging, where reliable UQ is crucial for accurate diagnosis and treatment planning. Second, the debiasing techniques have thus far been restricted to estimators whose error is well quantifiable, leaving the challenge of how they would behave for deep learning architectures open. In such cases, the behavior of the remainder term is largely unknown, precluding the direct application of existing debiasing methods and hindering the deployment of these methods in risk-sensitive applications.

A prominent example for solving the LASSO problem in (1) with an unrolled algorithm is the ISTA [Daubechies et al., 2004, Beck and Teboulle, 2009]:

xk+1=𝒮λ((IN×N1μATA)xk+1μATb),k0.formulae-sequencesuperscript𝑥𝑘1subscript𝒮𝜆subscript𝐼𝑁𝑁1𝜇superscript𝐴𝑇𝐴superscript𝑥𝑘1𝜇superscript𝐴𝑇𝑏𝑘0x^{k+1}=\mathcal{S}_{\lambda}\left((I_{N\times N}-\frac{1}{\mu}A^{T}A)x^{k}+% \frac{1}{\mu}A^{T}b\right),\qquad k\geq 0.italic_x start_POSTSUPERSCRIPT italic_k + 1 end_POSTSUPERSCRIPT = caligraphic_S start_POSTSUBSCRIPT italic_λ end_POSTSUBSCRIPT ( ( italic_I start_POSTSUBSCRIPT italic_N × italic_N end_POSTSUBSCRIPT - divide start_ARG 1 end_ARG start_ARG italic_μ end_ARG italic_A start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_A ) italic_x start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT + divide start_ARG 1 end_ARG start_ARG italic_μ end_ARG italic_A start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_b ) , italic_k ≥ 0 .

Here, μ>0𝜇0\mu>0italic_μ > 0 is a step-size parameter, and 𝒮λ(x)subscript𝒮𝜆𝑥\mathcal{S}_{\lambda}(x)caligraphic_S start_POSTSUBSCRIPT italic_λ end_POSTSUBSCRIPT ( italic_x ) is the soft-thresholding operator. The work [Gregor and LeCun, 2010] interpreted each ISTA iteration as a layer of a recurrent neural network (RNN). The Learned ISTA (LISTA) approach learns the parameters W1k,W2k,λksuperscriptsubscript𝑊1𝑘superscriptsubscript𝑊2𝑘superscript𝜆𝑘W_{1}^{k},W_{2}^{k},\lambda^{k}italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT , italic_W start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT , italic_λ start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT instead of using the fixed ISTA updates:

xk+1=𝒮λk(W2kxk+W1kb).superscript𝑥𝑘1subscript𝒮superscript𝜆𝑘superscriptsubscript𝑊2𝑘superscript𝑥𝑘superscriptsubscript𝑊1𝑘𝑏x^{k+1}=\mathcal{S}_{\lambda^{k}}(W_{2}^{k}x^{k}+W_{1}^{k}b).italic_x start_POSTSUPERSCRIPT italic_k + 1 end_POSTSUPERSCRIPT = caligraphic_S start_POSTSUBSCRIPT italic_λ start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_W start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT + italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT italic_b ) .

In this formulation, LISTA unrolls K𝐾Kitalic_K iterations into K𝐾Kitalic_K layers, with learnable parameters (Wk,λk)superscript𝑊𝑘superscript𝜆𝑘(W^{k},\lambda^{k})( italic_W start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT , italic_λ start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ) per layer. The parameters are learned by minimizing the reconstruction error minλ,W1li=1lxik(λ,W,b(i),x(i))x(i)22subscript𝜆𝑊1𝑙superscriptsubscript𝑖1𝑙superscriptsubscriptdelimited-∥∥superscriptsubscript𝑥𝑖𝑘𝜆𝑊superscript𝑏𝑖superscript𝑥𝑖superscript𝑥𝑖22\min_{\lambda,W}\frac{1}{l}\sum_{i=1}^{l}\lVert x_{i}^{k}(\lambda,W,b^{(i)},x^% {(i)})-x^{(i)}\rVert_{2}^{2}roman_min start_POSTSUBSCRIPT italic_λ , italic_W end_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG italic_l end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT ∥ italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ( italic_λ , italic_W , italic_b start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT , italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ) - italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT on training data (x(i),b(i))superscript𝑥𝑖superscript𝑏𝑖(x^{(i)},b^{(i)})( italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT , italic_b start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ). Unrolled neural networks like LISTA have shown promise as model-based deep learning solutions for inverse problems, leveraging domain knowledge for improved performance. Such iterative end-to-end network schemes provide state-of-the-art reconstructions for inverse problems [Genzel et al., 2022a]. Recently, the work [Hoppe et al., 2023b] proposes a framework based on the debiasing step to derive confidence intervals specifically for the unrolled LISTA estimator. However, similar to the previously mentioned debiased LASSO literature, it only handles the asymptotic setting.

Related Works. High-dimensional regression. High-dimensional regression and sparse recovery is now a well-established theory, see [Foucart and Rauhut, 2013, Wainwright, 2019, Wright and Ma, 2022]. In this context, several extensions of the LASSO have been proposed such as the elastic net [Zou and Hastie, 2005], the group LASSO [Yuan and Lin, 2005], the LASSO with a nuclear norm penalization [Koltchinskii et al., 2011], the Sorted L-One Penalized Estimation (SLOPE) [Bogdan et al., 2015] which adapts the 1subscript1\ell_{1}roman_ℓ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT-norm to control the false discovery rate. In addition to convex penalty functions, concave penalties have been explored to address some limitations of the LASSO, e.g., the Smoothly Clipped Absolute Deviation (SCAD) penalty [Fan and Li, 2001] and the Minimax Concave Penalty (MCP) [Zhang, 2010]. Non-convex variants of the LASSO for psubscript𝑝\ell_{p}roman_ℓ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT-norm (p<1𝑝1p<1italic_p < 1) minimization were also studied [Zheng et al., 2017, Rakotomamonjy et al., 2022] as well as noise-bling variants such as the square-root LASSO [Belloni et al., 2011, Mayrink Verdun et al., 2024]. Scalable and fast algorithms for solving the LASSO and its variants include semi-smooth Newton methods [Li et al., 2018] and IRLS [Kümmerle et al., 2021].

LASSO theory. Several works have established oracle inequalities for the LASSO [Bunea et al., 2007, Koltchinskii, 2009, Ye and Zhang, 2010, Raskutti et al., 2011, Dalalyan et al., 2017]. Another key theoretical result is the consistency of the LASSO in terms of variable selection. [Zhao and Yu, 2006] and [Wainwright, 2009] established the consistency of the LASSO while [Foucart et al., 2023] analyzed the sparsity behavior of the LASSO when the design matrices satisfy the Restricted Isometry Property.

Debiased estimators. After the first papers about the debiased LASSO [Zhang and Zhang, 2014, van de Geer et al., 2014, Javanmard and Montanari, 2014], some works have focused on improving its finite-sample performance and computational efficiency [Javanmard and Montanari, 2018, Li, 2020]. The size of the confidence intervals derived for the debiased LASSO has been proven to be sharp in the minimax sense [Cai and Guo, 2017]. Debiased estimators have been extended in several directions, e.g., [Li, 2020, Hoppe et al., 2022, Guo et al., 2022, Bellec and Zhang, 2022]. Recently, [Bellec and Zhang, 2023] established asymptotic normality results for a debiased estimator of convex regularizers beyond the 1subscript1\ell_{1}roman_ℓ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT-norm. In the context of MR images, [Hoppe et al., 2024] explored a debiased estimator for inverse problems with a total variation regularizer. Debiased estimators have also been recently extended to unrolled estimators – see discussion in the next paragraph – in [Bellec and Tan, 2024, Hoppe et al., 2023b].

Algorithm unrolling and model-based deep learning for inverse problems. The idea of unfolding the iterative steps of classical algorithms into a deep neural network architecture dates back to [Gregor and LeCun, 2010], which proposed the Learned ISTA (LISTA) to fast approximate the solution of sparse coding problems. Several works have extended and improved upon the original LISTA framework [Ito et al., 2019, Wu et al., 2019, Liu and Chen, 2019, Chen et al., 2021, Aberdam et al., 2021, Zheng et al., 2022]. [Adler and Öktem, 2018] proposed the Learned Primal-Dual algorithm, unrolling the primal-dual hybrid gradient method for tomographic reconstruction. [Schlemper et al., 2017] proposed the Deep Cascade of Convolutional Neural Networks (DC-CNN) for dynamic MRI reconstruction. [Cherkaoui et al., 2020] unfolded proximal gradient descent solvers to learn their parameters for 1D TV regularized problems. [Monga et al., 2021] introduced a general framework for algorithm unrolling. [Aggarwal et al., 2018] developed MoDL, a model-based deep learning approach for MRI reconstruction that unrolls the ADMM algorithm. [Liu et al., 2019] proposed a proximal alternating direction network (PADNet) to unroll nonconvex optimization. See also the surveys for more information about unrolling and also the connection with physics-inspired methods [Zhang et al., 2023, Arridge et al., 2019]. [Genzel et al., 2022a, Genzel et al., 2022b] developed the It-Net, an unrolled proximal gradient descent scheme where the proximal operator is replaced by a U-Net. This scheme won the AAPM Challenge 2021 [Sidky and Pan, 2022] whose goal was to identify the state-of-the-art in solving the CT inverse problem with data-driven techniques. A generalization of the previous paradigm is the learning to optimize framework that develops an optimization method by training, i.e., learning from its performance on sample problem [Li and Malik, 2016, Chen et al., 2022].

Uncertainty Quantification. There have been a few attempts to quantify uncertainty on a pixel level for unrolled networks, e.g., [Ekmekci and Cetin, 2022]. However, such approaches are based on Bayesian networks and MC dropout [Gal and Ghahramani, 2016], which requires significant inference time paired with a loss of reconstruction performance since the dropout for UQ is a strong regularizer in the neural network. Unlike prior work, our contribution focuses on a scalable data-driven method that is easily implementable in the data reconstruction pipeline.

3 Data-Driven Confidence Intervals

We now introduce our data-driven approach to correct the CIs. Instead of deriving asymptotic CIs from the decomposition x^ux0=W+Rsuperscript^𝑥𝑢superscript𝑥0𝑊𝑅\hat{x}^{u}-x^{0}=W+Rover^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_u end_POSTSUPERSCRIPT - italic_x start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT = italic_W + italic_R, by assuming that R𝑅Ritalic_R asymptotically vanishes, we utilize data (b(i),x(i))i=1lsuperscriptsubscriptsuperscript𝑏𝑖superscript𝑥𝑖𝑖1𝑙\left(b^{(i)},x^{(i)}\right)_{i=1}^{l}( italic_b start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT , italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT along with concentration techniques to estimate the size of the bias component R𝑅Ritalic_R. We continue to leverage the prior knowledge of the Gaussian component W𝑊Witalic_W while extending the CIs’ applicability to a broad class of estimators, including neural networks. Our method is summarized in Algorithm 1, where the data is used to estimate the radii of the CIs, and in Algorithm 2, which constructs the estimator around which the CIs are centered. The following main result proves the validity of our method.

Theorem 2.

Let x(1),,x(l)Nsuperscript𝑥1superscript𝑥𝑙superscript𝑁x^{(1)},\ldots,x^{(l)}\in\mathbb{C}^{N}italic_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT , … , italic_x start_POSTSUPERSCRIPT ( italic_l ) end_POSTSUPERSCRIPT ∈ blackboard_C start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT be i.i.d. complex random variables representing ground truth data drawn from an unknown distribution \mathbb{Q}blackboard_Q. Suppose, that ε(i)𝒞𝒩(0,σ2Im×m)similar-tosuperscript𝜀𝑖𝒞𝒩0superscript𝜎2subscript𝐼𝑚𝑚\varepsilon^{(i)}\sim\mathcal{CN}(0,\sigma^{2}I_{m\times m})italic_ε start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ∼ caligraphic_C caligraphic_N ( 0 , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I start_POSTSUBSCRIPT italic_m × italic_m end_POSTSUBSCRIPT ) is noise in the high-dimensional models b(i)=Ax(i)+ε(i)superscript𝑏𝑖𝐴superscript𝑥𝑖superscript𝜀𝑖b^{(i)}=Ax^{(i)}+\varepsilon^{(i)}italic_b start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT = italic_A italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT + italic_ε start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT, where Am×N𝐴superscript𝑚𝑁A\in\mathbb{C}^{m\times N}italic_A ∈ blackboard_C start_POSTSUPERSCRIPT italic_m × italic_N end_POSTSUPERSCRIPT, and independent of the x(i)superscript𝑥𝑖x^{(i)}italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT’s. Let X^:mN:^𝑋superscript𝑚superscript𝑁\hat{X}:\mathbb{C}^{m}\to\mathbb{C}^{N}over^ start_ARG italic_X end_ARG : blackboard_C start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT → blackboard_C start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT be a (learned) estimation function that maps the data b(i)superscript𝑏𝑖b^{(i)}italic_b start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT to x^(i)superscript^𝑥𝑖\hat{x}^{(i)}over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT, which is an estimate for x(i)superscript𝑥𝑖x^{(i)}italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT. Set |Rj(i)|=|ejT(MΣ^IN×N)(x^(i)x(i))|subscriptsuperscript𝑅𝑖𝑗superscriptsubscript𝑒𝑗𝑇𝑀^Σsubscript𝐼𝑁𝑁superscript^𝑥𝑖superscript𝑥𝑖|R^{(i)}_{j}|=|e_{j}^{T}(M\hat{\Sigma}-I_{N\times N})(\hat{x}^{(i)}-x^{(i)})|| italic_R start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | = | italic_e start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ( italic_M over^ start_ARG roman_Σ end_ARG - italic_I start_POSTSUBSCRIPT italic_N × italic_N end_POSTSUBSCRIPT ) ( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT - italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ) | for fixed A𝐴Aitalic_A and M𝑀Mitalic_M. For j=1,,N𝑗1𝑁j=1,\ldots,Nitalic_j = 1 , … , italic_N, we denote the true but unknown mean with μj=𝔼[|Rj(1)|]subscript𝜇𝑗𝔼delimited-[]subscriptsuperscript𝑅1𝑗\mu_{j}=\mathbb{E}[|R^{(1)}_{j}|]italic_μ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = blackboard_E [ | italic_R start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | ] and the unknown variance with (σR2)j:=𝔼[(|Rj(1)|μj)2]assignsubscriptsuperscriptsubscript𝜎𝑅2𝑗𝔼delimited-[]superscriptsubscriptsuperscript𝑅1𝑗subscript𝜇𝑗2(\sigma_{R}^{2})_{j}:=\mathbb{E}[(|R^{(1)}_{j}|-\mu_{j})^{2}]( italic_σ start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT := blackboard_E [ ( | italic_R start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | - italic_μ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ], respectively. Let S^j=1li=1l|Rj(i)|subscript^𝑆𝑗1𝑙superscriptsubscript𝑖1𝑙subscriptsuperscript𝑅𝑖𝑗\hat{S}_{j}=\frac{1}{l}\sum\limits_{i=1}^{l}|\ R^{(i)}_{j}|over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_l end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT | italic_R start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | be the unbiased sample mean estimator and (σ^R2)j=1l1i=1l(|Rj(i)|S^j)2subscriptsuperscriptsubscript^𝜎𝑅2𝑗1𝑙1superscriptsubscript𝑖1𝑙superscriptsubscriptsuperscript𝑅𝑖𝑗subscript^𝑆𝑗2(\hat{\sigma}_{R}^{2})_{j}=\frac{1}{l-1}\sum\limits_{i=1}^{l}(|R^{(i)}_{j}|-% \hat{S}_{j})^{2}( over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_l - 1 end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT ( | italic_R start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | - over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT the unbiased variance estimator. Let α(0,1)𝛼01\alpha\in(0,1)italic_α ∈ ( 0 , 1 ) and γ(0,11lα)𝛾011𝑙𝛼\gamma\in\left(0,1-\frac{1}{l\alpha}\right)italic_γ ∈ ( 0 , 1 - divide start_ARG 1 end_ARG start_ARG italic_l italic_α end_ARG ). Furthermore, set the confidence regions for the sample x(l+1)similar-tosuperscript𝑥𝑙1x^{(l+1)}\sim\mathbb{Q}italic_x start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT ∼ blackboard_Q in the model b(l+1)=Ax(l+1)+ε(l+1)superscript𝑏𝑙1𝐴superscript𝑥𝑙1superscript𝜀𝑙1b^{(l+1)}=Ax^{(l+1)}+\varepsilon^{(l+1)}italic_b start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT = italic_A italic_x start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT + italic_ε start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT as Cj(α)={z:|(x^u)j(l+1)z|rj(α)}subscript𝐶𝑗𝛼conditional-set𝑧subscriptsuperscriptsuperscript^𝑥𝑢𝑙1𝑗𝑧subscript𝑟𝑗𝛼C_{j}(\alpha)=\{z\in\mathbb{C}:|(\hat{x}^{u})^{(l+1)}_{j}-z|\leq r_{j}(\alpha)\}italic_C start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) = { italic_z ∈ blackboard_C : | ( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_u end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - italic_z | ≤ italic_r start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) } with radius

rj(α)=σ(MΣ^M)jj1/2mlog(1γjα)+cl(α)(σ^R)j+S^j,cl(α):=l21l2(1γj)αl.formulae-sequencesubscript𝑟𝑗𝛼𝜎superscriptsubscript𝑀^Σsuperscript𝑀𝑗𝑗12𝑚1subscript𝛾𝑗𝛼subscript𝑐𝑙𝛼subscriptsubscript^𝜎𝑅𝑗subscript^𝑆𝑗assignsubscript𝑐𝑙𝛼superscript𝑙21superscript𝑙21subscript𝛾𝑗𝛼𝑙r_{j}(\alpha)=\frac{\sigma(M\hat{\Sigma}M^{*})_{jj}^{1/2}}{\sqrt{m}}\sqrt{\log% \left(\frac{1}{\gamma_{j}\alpha}\right)}+c_{l}\left(\alpha\right)\cdot(\hat{% \sigma}_{R})_{j}+\hat{S}_{j},\qquad c_{l}(\alpha):=\sqrt{\frac{l^{2}-1}{l^{2}(% 1-\gamma_{j})\alpha-l}}.italic_r start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) = divide start_ARG italic_σ ( italic_M over^ start_ARG roman_Σ end_ARG italic_M start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT end_ARG start_ARG square-root start_ARG italic_m end_ARG end_ARG square-root start_ARG roman_log ( divide start_ARG 1 end_ARG start_ARG italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_α end_ARG ) end_ARG + italic_c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_α ) ⋅ ( over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_α ) := square-root start_ARG divide start_ARG italic_l start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 1 end_ARG start_ARG italic_l start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( 1 - italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) italic_α - italic_l end_ARG end_ARG . (4)

Then, it holds that

(xj(l+1)Cj(α))1α.subscriptsuperscript𝑥𝑙1𝑗subscript𝐶𝑗𝛼1𝛼\mathbb{P}\left(x^{(l+1)}_{j}\in C_{j}(\alpha)\right)\geq 1-\alpha.blackboard_P ( italic_x start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ italic_C start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) ) ≥ 1 - italic_α . (5)

Theorem 2 achieves conservative confidence intervals that are proven to be valid, i.e., are proven to contain the true parameter with a probability of 1α1𝛼1-\alpha1 - italic_α. Its main advantage is that there are no assumptions on the distribution \mathbb{Q}blackboard_Q (except that σR2superscriptsubscript𝜎𝑅2\sigma_{R}^{2}italic_σ start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT exists), making it widely applicable. Hence, Theorem 2 includes the worst-case distribution showing a way to quantify uncertainty even in such an ill-posed setting. Especially in medical imaging, such certainty guarantees are crucial for accurate diagnosis. The proof exploits the Gaussianity of the component W𝑊Witalic_W as well as an empirical version of Chebyshev’s inequality, which is tight when there is no information on the underlying distribution. The detailed proof can be found in Appendix B. For a thorough discussion on Theorem 2 including practical simplifications, we refer to Appendix A.

More certainty comes with the price of larger confidence intervals. If there is additional information on the distribution of R𝑅Ritalic_R, like the ability to be approximated by a Gaussian distribution, then the confidence intervals become tighter. This case, which includes relevant settings such as MRI, is discussed in Section 4.

Algorithm 1 Estimation of Confidence Radius
1:Input: Estimation function X^^𝑋\hat{X}over^ start_ARG italic_X end_ARG, dictionary matrix A𝐴Aitalic_A, correction matrix M𝑀Mitalic_M, data points (b(i),x(i))i=1lsuperscriptsubscriptsuperscript𝑏𝑖superscript𝑥𝑖𝑖1𝑙\left(b^{(i)},x^{(i)}\right)_{i=1}^{l}( italic_b start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT , italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT, significance level α𝛼\alphaitalic_α
2:for i=1,,l𝑖1𝑙i=1,\ldots,litalic_i = 1 , … , italic_l do
3:     Compute x^(i),R(i)Nsuperscript^𝑥𝑖superscript𝑅𝑖superscript𝑁\hat{x}^{(i)},R^{(i)}\in\mathbb{C}^{N}over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT , italic_R start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ∈ blackboard_C start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT via x^(i)=X^(b(i))superscript^𝑥𝑖^𝑋superscript𝑏𝑖\hat{x}^{(i)}=\hat{X}(b^{(i)})over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT = over^ start_ARG italic_X end_ARG ( italic_b start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ) and Rj(i)=ejT(MΣ^IN×N)(x^(i)x(i))subscriptsuperscript𝑅𝑖𝑗superscriptsubscript𝑒𝑗𝑇𝑀^Σsubscript𝐼𝑁𝑁superscript^𝑥𝑖superscript𝑥𝑖R^{(i)}_{j}=e_{j}^{T}(M\hat{\Sigma}-I_{N\times N})(\hat{x}^{(i)}-x^{(i)})italic_R start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_e start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ( italic_M over^ start_ARG roman_Σ end_ARG - italic_I start_POSTSUBSCRIPT italic_N × italic_N end_POSTSUBSCRIPT ) ( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT - italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ).
4:end for
5:for j=1,,N𝑗1𝑁j=1,\ldots,Nitalic_j = 1 , … , italic_N do
6:     Estimate S^j=1li=1l|Rj(i)|subscript^𝑆𝑗1𝑙superscriptsubscript𝑖1𝑙subscriptsuperscript𝑅𝑖𝑗\hat{S}_{j}=\frac{1}{l}\sum\limits_{i=1}^{l}|R^{(i)}_{j}|over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_l end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT | italic_R start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | and (σ^R2)j=1l1i=1l(|Rj(i)|S^j)2subscriptsuperscriptsubscript^𝜎𝑅2𝑗1𝑙1superscriptsubscript𝑖1𝑙superscriptsubscriptsuperscript𝑅𝑖𝑗subscript^𝑆𝑗2(\hat{\sigma}_{R}^{2})_{j}=\frac{1}{l-1}\sum\limits_{i=1}^{l}(|R^{(i)}_{j}|-% \hat{S}_{j})^{2}( over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_l - 1 end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT ( | italic_R start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | - over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT
7:     Solve rj(α)=minγ(0,11lα)σ(MΣ^M)jj1/2mlog(1γα)+cl((1γ)α)(σ^R)j+S^jsubscript𝑟𝑗𝛼subscript𝛾011𝑙𝛼𝜎superscriptsubscript𝑀^Σsuperscript𝑀𝑗𝑗12𝑚1𝛾𝛼subscript𝑐𝑙1𝛾𝛼subscriptsubscript^𝜎𝑅𝑗subscript^𝑆𝑗r_{j}(\alpha)=\min\limits_{\gamma\in\left(0,1-\frac{1}{l\alpha}\right)}\frac{% \sigma(M\hat{\Sigma}M^{*})_{jj}^{1/2}}{\sqrt{m}}\sqrt{\log\left(\frac{1}{% \gamma\alpha}\right)}+c_{l}\left((1-\gamma)\alpha\right)\cdot(\hat{\sigma}_{R}% )_{j}+\hat{S}_{j}italic_r start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) = roman_min start_POSTSUBSCRIPT italic_γ ∈ ( 0 , 1 - divide start_ARG 1 end_ARG start_ARG italic_l italic_α end_ARG ) end_POSTSUBSCRIPT divide start_ARG italic_σ ( italic_M over^ start_ARG roman_Σ end_ARG italic_M start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT end_ARG start_ARG square-root start_ARG italic_m end_ARG end_ARG square-root start_ARG roman_log ( divide start_ARG 1 end_ARG start_ARG italic_γ italic_α end_ARG ) end_ARG + italic_c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( ( 1 - italic_γ ) italic_α ) ⋅ ( over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT
8:end for
9:Output: Radii of confidence regions (rj(α))j=1Nsuperscriptsubscriptsubscript𝑟𝑗𝛼𝑗1𝑁\left(r_{j}(\alpha)\right)_{j=1}^{N}( italic_r start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) ) start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT
Algorithm 2 Construction of Confidence Regions
1:Input: Estimation function X^^𝑋\hat{X}over^ start_ARG italic_X end_ARG, dictionary matrix A𝐴Aitalic_A, correction matrix M𝑀Mitalic_M, measurement b𝑏bitalic_b, radii (rj(α))j=1Nsuperscriptsubscriptsubscript𝑟𝑗𝛼𝑗1𝑁\left(r_{j}(\alpha)\right)_{j=1}^{N}( italic_r start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) ) start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT (derived from Algorithm 1 or Theorem 3)
2:Compute estimator x^=X^(b)^𝑥^𝑋𝑏\hat{x}=\hat{X}(b)over^ start_ARG italic_x end_ARG = over^ start_ARG italic_X end_ARG ( italic_b ).
3:Construct debiased estimator via x^u=x^+1mMA(bAx^)superscript^𝑥𝑢^𝑥1𝑚𝑀superscript𝐴𝑏𝐴^𝑥\hat{x}^{u}=\hat{x}+\frac{1}{m}MA^{*}(b-A\hat{x})over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_u end_POSTSUPERSCRIPT = over^ start_ARG italic_x end_ARG + divide start_ARG 1 end_ARG start_ARG italic_m end_ARG italic_M italic_A start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( italic_b - italic_A over^ start_ARG italic_x end_ARG )
4:for j=1,,N𝑗1𝑁j=1,\ldots,Nitalic_j = 1 , … , italic_N do
5:     Construct confidence region Cj(α)={z|x^juz|rj(α)}subscript𝐶𝑗𝛼conditional-set𝑧subscriptsuperscript^𝑥𝑢𝑗𝑧subscript𝑟𝑗𝛼C_{j}(\alpha)=\{z\in\mathbb{C}\mid|\hat{x}^{u}_{j}-z|\leq r_{j}(\alpha)\}italic_C start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) = { italic_z ∈ blackboard_C ∣ | over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_u end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - italic_z | ≤ italic_r start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) }
6:end for
7:Output: Debiased estimator xusuperscript𝑥𝑢x^{u}italic_x start_POSTSUPERSCRIPT italic_u end_POSTSUPERSCRIPT and confidence regions (C1(α))j=1Nsuperscriptsubscriptsubscript𝐶1𝛼𝑗1𝑁\left(C_{1}(\alpha)\right)_{j=1}^{N}( italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_α ) ) start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT

4 Confidence Intervals for Gaussian Remainders

Valid confidence intervals can be derived most straightforwardly when the distribution of the remainder term is known and easily characterized. In such cases, more informative distributional assumptions lead to potentially tighter confidence intervals compared to Theorem 2, which makes no assumptions about the remainder component. In this section, we derive non-asymptotic confidence intervals assuming the remainder term to be approximated by a Gaussian distribution.

Theorem 3.

Let x^uNsuperscript^𝑥𝑢superscript𝑁\hat{x}^{u}\in\mathbb{C}^{N}over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_u end_POSTSUPERSCRIPT ∈ blackboard_C start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT be a debiased estimator for xN𝑥superscript𝑁x\in\mathbb{C}^{N}italic_x ∈ blackboard_C start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT with a remainder term R𝒞𝒩(0,ΣR/m)similar-to𝑅𝒞𝒩0subscriptΣ𝑅𝑚R\sim\mathcal{CN}(0,\Sigma_{R}/m)italic_R ∼ caligraphic_C caligraphic_N ( 0 , roman_Σ start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT / italic_m ). Then, Cj(α)={z|zx^ju|rj(α)}subscript𝐶𝑗𝛼conditional-set𝑧𝑧subscriptsuperscript^𝑥𝑢𝑗subscript𝑟𝑗𝛼C_{j}(\alpha)=\{z\in\mathbb{C}\mid|z-\hat{x}^{u}_{j}|\leq r_{j}(\alpha)\}italic_C start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) = { italic_z ∈ blackboard_C ∣ | italic_z - over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_u end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | ≤ italic_r start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) } with radius

rjG(α)=(σ2(MΣ^M)jj+(ΣR)jj)1/2mlog(1α).subscriptsuperscript𝑟𝐺𝑗𝛼superscriptsuperscript𝜎2subscript𝑀^Σsuperscript𝑀𝑗𝑗subscriptsubscriptΣ𝑅𝑗𝑗12𝑚1𝛼r^{G}_{j}(\alpha)=\frac{(\sigma^{2}(M\hat{\Sigma}M^{*})_{jj}+(\Sigma_{R})_{jj}% )^{1/2}}{\sqrt{m}}\sqrt{\log\left(\frac{1}{\alpha}\right)}.italic_r start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) = divide start_ARG ( italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_M over^ start_ARG roman_Σ end_ARG italic_M start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT + ( roman_Σ start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT end_ARG start_ARG square-root start_ARG italic_m end_ARG end_ARG square-root start_ARG roman_log ( divide start_ARG 1 end_ARG start_ARG italic_α end_ARG ) end_ARG . (6)

is valid, i.e. (xjCj(α))1αsubscript𝑥𝑗subscript𝐶𝑗𝛼1𝛼\mathbb{P}\left(x_{j}\in C_{j}(\alpha)\right)\geq 1-\alphablackboard_P ( italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ italic_C start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) ) ≥ 1 - italic_α.

For the proof, we refer to Appendix B. In Appendix D, we demonstrate empirically that the Gaussian assumption for the remainder term holds in a wide range of relevant practical settings. This validation enables the application of the proposed confidence intervals derived under this assumption. These confidence intervals strike a careful balance between non-asymptotic reliability, ensuring valid coverage even in finite-sample regimes, and tightness, providing informative and precise uncertainty estimates. By leveraging the Gaussian approximation, which becomes increasingly accurate in higher dimensions as illustrated in Figure 7, our framework offers a principled and computationally efficient approach to quantifying uncertainty in high-dimensional prediction problems. The variance of R𝑅Ritalic_R can be estimated with the given data using, e.g., the unbiased estimator for the variance as in Theorem 2.

5 Numerical Experiments

We evaluate the performance of our non-asymptotic confidence intervals through extensive numerical experiments across two settings: (i.) the classical debiased LASSO framework to contrast our non-asymptotic confidence intervals against the asymptotic ones. (ii.) the learned framework where we employ learned estimators, specifically the U-net [Ronneberger et al., 2015] as well as the It-Net [Genzel et al., 2022a], to reconstruct real-world MR images and quantify uncertainty. Our experiments demonstrate the importance of properly accounting for the remainder term in practical, non-asymptotic regimes. Each experiment follows the same structure:

  1. 1.

    Data Generation and Management: We fix the forward operator A𝐴Aitalic_A and generate n>2𝑛2n>2italic_n > 2 feature vectors x(i)i=1nsuperscriptsubscriptsuperscript𝑥𝑖𝑖1𝑛{x^{(i)}}_{i=1}^{n}italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT and noise vectors ε(i)i=1nsuperscriptsubscriptsuperscript𝜀𝑖𝑖1𝑛{\varepsilon^{(i)}}_{i=1}^{n}italic_ε start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT with ε(i)𝒞𝒩(0,σ2Im×m)similar-tosuperscript𝜀𝑖𝒞𝒩0superscript𝜎2subscript𝐼𝑚𝑚\varepsilon^{(i)}\sim\mathcal{CN}(0,\sigma^{2}I_{m\times m})italic_ε start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ∼ caligraphic_C caligraphic_N ( 0 , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_I start_POSTSUBSCRIPT italic_m × italic_m end_POSTSUBSCRIPT ). We obtain observations b(i)i=1nsuperscriptsubscriptsuperscript𝑏𝑖𝑖1𝑛{b^{(i)}}_{i=1}^{n}italic_b start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT via b(i)=Ax(i)+ε(i)superscript𝑏𝑖𝐴superscript𝑥𝑖superscript𝜀𝑖b^{(i)}=Ax^{(i)}+\varepsilon^{(i)}italic_b start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT = italic_A italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT + italic_ε start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT. We split the data (b(i),x(i))i=1nsuperscriptsubscriptsuperscript𝑏𝑖superscript𝑥𝑖𝑖1𝑛{(b^{(i)},x^{(i)})}_{i=1}^{n}( italic_b start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT , italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT into an estimation dataset of size l𝑙litalic_l and a test dataset of size k𝑘kitalic_k (l+k=n𝑙𝑘𝑛l+k=nitalic_l + italic_k = italic_n). If we learn an estimator, we further split the data into training, estimation, and test sets.

  2. 2.

    Reconstruction: Depending on the experiment, we obtain a reconstruction function X^^𝑋\hat{X}over^ start_ARG italic_X end_ARG in one of the following ways: for the classical LASSO setting, we use the LASSO; for the learned estimator experiment, we train a U-Net [Ronneberger et al., 2015] or It-net [Genzel et al., 2022a] on the training data to serve as the reconstruction function X^^𝑋\hat{X}over^ start_ARG italic_X end_ARG.

  3. 3.

    Estimation of Confidence Radii: We run Algorithm 1 with A,X^,M𝐴^𝑋𝑀A,\hat{X},Mitalic_A , over^ start_ARG italic_X end_ARG , italic_M (that is chosen according to [Javanmard and Montanari, 2018]), the estimation data (b(i),x(i))i=1lsuperscriptsubscriptsuperscript𝑏𝑖superscript𝑥𝑖𝑖1𝑙{(b^{(i)},x^{(i)})}_{i=1}^{l}( italic_b start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT , italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT, and a predefined significance level α(0,1)𝛼01\alpha\in(0,1)italic_α ∈ ( 0 , 1 ) to obtain radii rj(α)j=1Nsubscript𝑟𝑗superscriptsubscript𝛼𝑗1𝑁{r_{j}(\alpha)}_{j=1}^{N}italic_r start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT. To construct the final confidence intervals, the radii need to be centered according to the debiased estimator. For every new measurement b𝑏bitalic_b, we run Algorithm 2 to obtain tailored confidence intervals for the feature vector x𝑥xitalic_x corresponding to b𝑏bitalic_b. In addition, we compute the CI for the Gaussian adjustment based on Theorem 3 using the estimation set to quantify the variance of R𝑅Ritalic_R with the unbiased estimator for the variance as before.

  4. 4.

    Evaluation: We use the test dataset (b(i),x(i))i=l+1ksuperscriptsubscriptsuperscript𝑏𝑖superscript𝑥𝑖𝑖𝑙1𝑘{(b^{(i)},x^{(i)})}_{i=l+1}^{k}( italic_b start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT , italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_i = italic_l + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT to evaluate our adjustments. For each b(i)superscript𝑏𝑖b^{(i)}italic_b start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT, we run Algorithm 2 to obtain confidence intervals Cj(i)(α)j=1Nsuperscriptsubscript𝐶𝑗𝑖superscriptsubscript𝛼𝑗1𝑁{C_{j}^{(i)}(\alpha)}_{j=1}^{N}italic_C start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ( italic_α ) start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT for x(i)superscript𝑥𝑖x^{(i)}italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT. We estimate (xj(i)Cj(α))superscriptsubscript𝑥𝑗𝑖subscript𝐶𝑗𝛼\mathbb{P}(x_{j}^{(i)}\in C_{j}(\alpha))blackboard_P ( italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ∈ italic_C start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) ) by hj(α)=1ki=l+1k𝟙{xj(i)Cj(α)}subscript𝑗𝛼1𝑘superscriptsubscript𝑖𝑙1𝑘subscript1superscriptsubscript𝑥𝑗𝑖subscript𝐶𝑗𝛼h_{j}(\alpha)=\frac{1}{k}\sum_{i=l+1}^{k}\mathbbm{1}_{\{x_{j}^{(i)}\in C_{j}(% \alpha)\}}italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) = divide start_ARG 1 end_ARG start_ARG italic_k end_ARG ∑ start_POSTSUBSCRIPT italic_i = italic_l + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT blackboard_1 start_POSTSUBSCRIPT { italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ∈ italic_C start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) } end_POSTSUBSCRIPT and average over all components h(α)=1Nj=1Nhj(α)𝛼1𝑁superscriptsubscript𝑗1𝑁subscript𝑗𝛼h(\alpha)=\frac{1}{N}\sum_{j=1}^{N}h_{j}(\alpha)italic_h ( italic_α ) = divide start_ARG 1 end_ARG start_ARG italic_N end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ). Since performance on the support S𝑆Sitalic_S is crucial, we define the hit rate on S𝑆Sitalic_S as hS(i)=1|S|j=1N𝟙{xj(i)Cj(α)}superscriptsubscript𝑆𝑖1𝑆superscriptsubscript𝑗1𝑁subscript1superscriptsubscript𝑥𝑗𝑖subscript𝐶𝑗𝛼h_{S}^{(i)}=\frac{1}{|S|}\sum_{j=1}^{N}\mathbbm{1}_{\{x_{j}^{(i)}\in C_{j}(% \alpha)\}}italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT = divide start_ARG 1 end_ARG start_ARG | italic_S | end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT blackboard_1 start_POSTSUBSCRIPT { italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ∈ italic_C start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) } end_POSTSUBSCRIPT and average hS(α)=1li=1lhS(i)subscript𝑆𝛼1𝑙superscriptsubscript𝑖1𝑙superscriptsubscript𝑆𝑖h_{S}(\alpha)=\frac{1}{l}\sum_{i=1}^{l}h_{S}^{(i)}italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ( italic_α ) = divide start_ARG 1 end_ARG start_ARG italic_l end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT. Note that the support may change with i𝑖iitalic_i. Moreover, we do the same for the CI based on the Gaussian-adjusted radii.

5.1 UQ for Classical Model-Based Regression

We consider a setting aligned with existing debiased LASSO literature, e.g., [Javanmard and Montanari, 2014] to demonstrate our approach’s extension of current UQ methods. The forward operator is a complex Gaussian matrix Am×N𝐴superscript𝑚𝑁A\in\mathbb{C}^{m\times N}italic_A ∈ blackboard_C start_POSTSUPERSCRIPT italic_m × italic_N end_POSTSUPERSCRIPT with dimensions N=10000𝑁10000N=10000italic_N = 10000, m=0.6N𝑚0.6𝑁m=0.6Nitalic_m = 0.6 italic_N, and Aij𝒞𝒩(0,1)similar-tosubscript𝐴𝑖𝑗𝒞𝒩01A_{ij}\sim\mathcal{CN}(0,1)italic_A start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ∼ caligraphic_C caligraphic_N ( 0 , 1 ). We generate n=750𝑛750n=750italic_n = 750 s=0.1N𝑠0.1𝑁s=0.1Nitalic_s = 0.1 italic_N-sparse features x(i)superscript𝑥𝑖x^{(i)}italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT by randomly selecting m𝑚mitalic_m distinct indices from 1,,N1𝑁{1,\ldots,N}1 , … , italic_N and drawing magnitudes from 𝒞𝒩(0,1)𝒞𝒩01\mathcal{CN}(0,1)caligraphic_C caligraphic_N ( 0 , 1 ). With relative noise ε(i)Ax(i)0.2normsuperscript𝜀𝑖norm𝐴superscript𝑥𝑖0.2\frac{\|\varepsilon^{(i)}\|}{\|Ax^{(i)}\|}\approx 0.2divide start_ARG ∥ italic_ε start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ∥ end_ARG start_ARG ∥ italic_A italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ∥ end_ARG ≈ 0.2, we split the data (b(i),x(i))i=1nsuperscriptsubscriptsuperscript𝑏𝑖superscript𝑥𝑖𝑖1𝑛{(b^{(i)},x^{(i)})}_{i=1}^{n}( italic_b start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT , italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT into l=500𝑙500l=500italic_l = 500 estimation and k=250𝑘250k=250italic_k = 250 test data. For reconstruction, we solve the LASSO X^(b):=argminxN1mAxb+λx1assign^𝑋𝑏subscriptargmin𝑥superscript𝑁1𝑚norm𝐴𝑥𝑏𝜆subscriptnorm𝑥1\hat{X}(b):=\operatorname{argmin}_{x\in\mathbb{C}^{N}}\frac{1}{m}\|Ax-b\|+% \lambda\|x\|_{1}over^ start_ARG italic_X end_ARG ( italic_b ) := roman_argmin start_POSTSUBSCRIPT italic_x ∈ blackboard_C start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG italic_m end_ARG ∥ italic_A italic_x - italic_b ∥ + italic_λ ∥ italic_x ∥ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT with λ=10σm(2+12log(N))𝜆10𝜎𝑚212𝑁\lambda=10\frac{\sigma}{\sqrt{m}}(2+\sqrt{12\log(N)})italic_λ = 10 divide start_ARG italic_σ end_ARG start_ARG square-root start_ARG italic_m end_ARG end_ARG ( 2 + square-root start_ARG 12 roman_log ( italic_N ) end_ARG ) following [Hoppe et al., 2022].

With significance level α=0.05𝛼0.05\alpha=0.05italic_α = 0.05, we run Algorithm 1 to obtain confidence radii, choosing M=IN×N𝑀subscript𝐼𝑁𝑁M=I_{N\times N}italic_M = italic_I start_POSTSUBSCRIPT italic_N × italic_N end_POSTSUBSCRIPT [Javanmard and Montanari, 2018] and exploiting the relaxation (9). Averaged over the l𝑙litalic_l estimation data points, the 2subscript2\ell_{2}roman_ℓ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and subscript\ell_{\infty}roman_ℓ start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT norm ratios are: R2W2=0.9993subscriptnorm𝑅2subscriptnorm𝑊20.9993\frac{\|R\|_{2}}{\|W\|_{2}}=0.9993divide start_ARG ∥ italic_R ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG start_ARG ∥ italic_W ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG = 0.9993 and RW=1.1581subscriptnorm𝑅subscriptnorm𝑊1.1581\frac{\|R\|_{\infty}}{\|W\|_{\infty}}=1.1581divide start_ARG ∥ italic_R ∥ start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT end_ARG start_ARG ∥ italic_W ∥ start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT end_ARG = 1.1581. In existing literature, the subscript\ell_{\infty}roman_ℓ start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT norm is typically measured when the remainder term vanishes, as it is relevant for pixel-wise confidence intervals. Here, the remainder term is of comparable order as the Gaussian term and hence, too significant to neglect in confidence intervals derivation.

Evaluating it on the remaining k=250𝑘250k=250italic_k = 250 data points, the data-driven and Gaussian-adjusted averaged hit rates are h(0.05)=10.051h(0.05)=1italic_h ( 0.05 ) = 1, hS(0.05)=1subscript𝑆0.051h_{S}(0.05)=1italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ( 0.05 ) = 1 and hG(0.05)=0.9691superscript𝐺0.050.9691h^{G}(0.05)=0.9691italic_h start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( 0.05 ) = 0.9691, hSG(0.05)=0.8948subscriptsuperscript𝐺𝑆0.050.8948h^{G}_{S}(0.05)=0.8948italic_h start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ( 0.05 ) = 0.8948, respectively. Neglecting the remainder term yields hW(0.05)=0.8692superscript𝑊0.050.8692h^{W}(0.05)=0.8692italic_h start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT ( 0.05 ) = 0.8692 and hSW(0.05)=0.6783subscriptsuperscript𝑊𝑆0.050.6783h^{W}_{S}(0.05)=0.6783italic_h start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ( 0.05 ) = 0.6783, which is substantially lower and violates the specified 0.050.050.050.05 significance level. Fig. 2 presents confidence intervals of each type for one data point x(i)superscript𝑥𝑖x^{(i)}italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT and a detailed visualization of hj(0.05)subscript𝑗0.05h_{j}(0.05)italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( 0.05 ), hS(i)(0.05)superscriptsubscript𝑆𝑖0.05h_{S}^{(i)}(0.05)italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ( 0.05 ), hjG(0.05)superscriptsubscript𝑗𝐺0.05h_{j}^{G}(0.05)italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( 0.05 ), and (hSG)(i)(0.05)superscriptsubscriptsuperscript𝐺𝑆𝑖0.05(h^{G}_{S})^{(i)}(0.05)( italic_h start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ( 0.05 ). Further experiments with different sparse regression settings, including subsampled Fourier matrices, are presented in Appendix C.

Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Refer to caption
(e)
Figure 2: Confidence intervals of asymptotic type 2(a), with Gaussian adjustment 2(b) and data-driven adjustment 2(c) for one evaluation feature vector in the sparse regression setting described in Section 5.1. Box plots of hit rates hj(0.05)subscript𝑗0.05h_{j}(0.05)italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( 0.05 ) and hjG(0.05)superscriptsubscript𝑗𝐺0.05h_{j}^{G}(0.05)italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( 0.05 ), j=1,,N𝑗1𝑁j=1,\ldots,Nitalic_j = 1 , … , italic_N, averaged over feature vectors x(1),,x(500)superscript𝑥1superscript𝑥500x^{(1)},\ldots,x^{(500)}italic_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT , … , italic_x start_POSTSUPERSCRIPT ( 500 ) end_POSTSUPERSCRIPT 2(d) and hit rates hS(0.05)(i)subscript𝑆superscript0.05𝑖h_{S}(0.05)^{(i)}italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ( 0.05 ) start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT and (hSG)(i)(0.05)superscriptsuperscriptsubscript𝑆𝐺𝑖0.05(h_{S}^{G})^{(i)}(0.05)( italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ( 0.05 ), i=1,,k𝑖1𝑘i=1,\ldots,kitalic_i = 1 , … , italic_k, averaged over components j𝑗jitalic_j 2(e).

5.2 UQ for MRI Reconstruction with Neural Networks

We extend the debiasing approach to model-based deep learning for MRI reconstruction using the U-Net and It-Net on single-coil knee images from the NYU fastMRI dataset 222We obtained the data, which we used for conducting the experiments in this paper from the NYU fastMRI Initiative database (fastmri.med.nyu.edu) [Zbontar et al., 2019, Knoll et al., 2020]. The data was only obtained from the NYU fastMRI investigators, but they did not contribute any ideas, analysis, or writing to this paper. The list of the NYU fastMRI investigators, can be found at fastmri.med.nyu.edu, it is subject to updates. [Zbontar et al., 2019, Knoll et al., 2020]. Here, the forward operator is the undersampled Fourier operator 𝒫m×N𝒫superscript𝑚𝑁\mathcal{P}\mathcal{F}\in\mathbb{C}^{m\times N}caligraphic_P caligraphic_F ∈ blackboard_C start_POSTSUPERSCRIPT italic_m × italic_N end_POSTSUPERSCRIPT with N=320×320𝑁320320N=320\times 320italic_N = 320 × 320, m=0.6N𝑚0.6𝑁m=0.6Nitalic_m = 0.6 italic_N, the Fourier matrix \mathcal{F}caligraphic_F and a radial mask 𝒫𝒫\mathcal{P}caligraphic_P, see Figure 5(b). The noise level σ𝜎\sigmaitalic_σ is chosen such that the relative noise is approximately 0.10.10.10.1. The data is split into training (33370 slices), validation (5346 slices), estimation (1372 slices), and test (100 slices) datasets.

We then train an It-Net [Genzel et al., 2022a] with 8888 layers, a combination of MS-SSIM [Wang et al., 2003] and 1subscript1\ell_{1}roman_ℓ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT-losses and Adam optimizer with learning rate 5e55superscript𝑒55e^{-5}5 italic_e start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT for 15151515 epochs to obtain our reconstruction function X^^𝑋\hat{X}over^ start_ARG italic_X end_ARG.

With significance level α=0.1𝛼0.1\alpha=0.1italic_α = 0.1, we run Algorithm 1 to construct confidence radii, choosing M=IN×N𝑀subscript𝐼𝑁𝑁M=I_{N\times N}italic_M = italic_I start_POSTSUBSCRIPT italic_N × italic_N end_POSTSUBSCRIPT [Javanmard and Montanari, 2018] and exploiting the relaxation (9). Averaged over the l𝑙litalic_l estimation data points, we have R2W2=0.38subscriptnorm𝑅2subscriptnorm𝑊20.38\frac{\|R\|_{2}}{\|W\|_{2}}=0.38divide start_ARG ∥ italic_R ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG start_ARG ∥ italic_W ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG = 0.38 and RW=0.49subscriptnorm𝑅subscriptnorm𝑊0.49\frac{\|R\|_{\infty}}{\|W\|_{\infty}}=0.49divide start_ARG ∥ italic_R ∥ start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT end_ARG start_ARG ∥ italic_W ∥ start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT end_ARG = 0.49, which indicates that the remainder term is significant and cannot be neglected. Evaluating the test data, the averages of the data-driven adjustment hit rates are h(0.1)=0.99990.10.9999h(0.1)=0.9999italic_h ( 0.1 ) = 0.9999, hS(0.1)=0.9998subscript𝑆0.10.9998h_{S}(0.1)=0.9998italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ( 0.1 ) = 0.9998, and the averages of the Gaussian adjusted hit rates are hG(0.1)=0.9752superscript𝐺0.10.9752h^{G}(0.1)=0.9752italic_h start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( 0.1 ) = 0.9752, hSG(0.1)=0.98subscriptsuperscript𝐺𝑆0.10.98h^{G}_{S}(0.1)=0.98italic_h start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ( 0.1 ) = 0.98. Neglecting the remainder term, the hit rates of the asymptotic CIs are hW(0.1)=0.9502superscript𝑊0.10.9502h^{W}(0.1)=0.9502italic_h start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT ( 0.1 ) = 0.9502 and hSW(0.1)=0.87subscriptsuperscript𝑊𝑆0.10.87h^{W}_{S}(0.1)=0.87italic_h start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ( 0.1 ) = 0.87. As in the sparse regression setting, they are significantly lower. Fig. 3 presents confidence intervals based on the data-driven adjustment and the asymptotic confidence intervals for a region in one image x(i)superscript𝑥𝑖x^{(i)}italic_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT. In addition, it contains a box plot showing the distribution of the hit rates based on the Gaussian adjustment and the asymptotic hit rates. More experiments for UQ for MRI reconstruction can be found in Appendix C and Tables 2 and 3.

Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Figure 3: Reconstruction obtained with the It-Net as described in 5.2. Data-driven adjustment confidence intervals 3(b) and asymptotic confidence intervals 3(c) for the region (50 pixels) in 320x320 knee image 3(a); Box plots of hit rates 3(d) for 90%percent9090\%90 % confidence level for the Gaussian adjusted and asymptotic confidence intervals.

6 Final Remarks

In this work, we proposed a data-driven uncertainty quantification method that derives non-asymptotic confidence intervals based on debiased estimators. Our approach corrects asymptotic confidence intervals by incorporating an estimate of the remainder component and has solid theoretical foundations. While the correction can be based on prior knowledge, e.g., a Gaussian distribution of the remainder term, we also derive CI based on a data-driven adjustment without further information. This data-driven nature enhances its applicability to a wide range of estimators, including model-based deep-learning techniques. We conducted experiments that confirm our theoretical findings, demonstrating that even in classical sparse regression settings, the remainder term is too significant to be neglected. Furthermore, we applied the proposed method to MRI, achieving significantly better rates on the image support.

While our method corrects for the remainder term, larger remainder terms necessitate greater corrections, resulting in wider confidence intervals. Therefore, it is crucial to achieve a small remainder term to avoid excessively large confidence intervals. Additionally, the accuracy of our method depends on the quality of the estimates for the mean and variance of the remainder term, which improves with more available data. Additionally, the length of the intervals can be minimized over a larger parameter set, provided that more data is available. We leave as a future direction to study the sharpness of the proposed confidence intervals and radii for a given amount of data. Moreover, we would like to investigate how the length of the confidence intervals could be improved when estimating higher moments. We believe that our method is applicable to a wide variety of deep learning architectures, including vision transformers in MRI, e.g., [Lin and Heckel, 2022]. Testing the generality of the method with state-of-the-art architectures for different problems would demonstrate its broad usefulness.

Acknowledgments

We gratefully acknowledge financial support with funds provided by the German Federal Ministry of Education and Research in the grant “SparseMRI3D+: Compressive Sensing und Quantifizierung von Unsicherheiten für die beschleunigte multiparametrische quantitative Magnetresonanztomografie (FZK 05M20WOA)”.

References

  • [Aberdam et al., 2021] Aberdam, A., Golts, A., and Elad, M. (2021). Ada-lista: Learned solvers adaptive to varying models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(12):9222–9235.
  • [Adler and Öktem, 2018] Adler, J. and Öktem, O. (2018). Learned primal-dual reconstruction. IEEE transactions on medical imaging, 37(6):1322–1332.
  • [Aggarwal et al., 2018] Aggarwal, H. K., Mani, M. P., and Jacob, M. (2018). MoDL: Model-based deep learning architecture for inverse problems. IEEE transactions on medical imaging, 38(2):394–405.
  • [Aja-Fernández and Vegas-Sánchez-Ferrero, 2016] Aja-Fernández, S. and Vegas-Sánchez-Ferrero, G. (2016). Statistical analysis of noise in MRI. Switzerland: Springer International Publishing.
  • [Arridge et al., 2019] Arridge, S., Maass, P., Öktem, O., and Schönlieb, C.-B. (2019). Solving inverse problems using data-driven models. Acta Numerica, 28:1–174.
  • [Beck and Teboulle, 2009] Beck, A. and Teboulle, M. (2009). A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2(1):183–202.
  • [Becker et al., 2011] Becker, S. R., Candès, E. J., and Grant, M. C. (2011). Templates for convex cone problems with applications to sparse signal recovery. Mathematical programming computation, 3(3):165–218.
  • [Bellec and Tan, 2024] Bellec, P. C. and Tan, K. (2024). Uncertainty quantification for iterative algorithms in linear models with application to early stopping. arXiv preprint arXiv:2404.17856.
  • [Bellec and Zhang, 2022] Bellec, P. C. and Zhang, C.-H. (2022). De-biasing the lasso with degrees-of-freedom adjustment. Bernoulli, 28(2):713–743.
  • [Bellec and Zhang, 2023] Bellec, P. C. and Zhang, C.-H. (2023). Debiasing convex regularized estimators and interval estimation in linear models. The Annals of Statistics, 51(2):391–436.
  • [Belloni et al., 2011] Belloni, A., Chernozhukov, V., and Wang, L. (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika, 98(4):791–806.
  • [Bogdan et al., 2015] Bogdan, M., Van Den Berg, E., Sabatti, C., Su, W., and Candès, E. (2015). SLOPE–Adaptive Variable Selection via Convex Optimization. The Annals of Applied Statistics, 9(65):1103–1140.
  • [Bunea et al., 2007] Bunea, F., Tsybakov, A., and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electronic Journal of Statistics, 1:169–194.
  • [Cai and Guo, 2017] Cai, T. T. and Guo, Z. (2017). Confidence Intervals for High-Dimensional Linear Regression: Minimax Rates and Adaptivity. The Annals of Statistics, pages 615–646.
  • [Chen and Donoho, 1995] Chen, S. and Donoho, D. L. (1995). Examples of basis pursuit. In Wavelet Applications in Signal and Image Processing III, volume 2569, pages 564–574. SPIE.
  • [Chen et al., 2022] Chen, T., Chen, X., Chen, W., Heaton, H., Liu, J., Wang, Z., and Yin, W. (2022). Learning to optimize: A primer and a benchmark. Journal of Machine Learning Research, 23(189):1–59.
  • [Chen et al., 2021] Chen, X., Liu, J., Wang, Z., and Yin, W. (2021). Hyperparameter tuning is all you need for LISTA. Advances in Neural Information Processing Systems, 34:11678–11689.
  • [Cherkaoui et al., 2020] Cherkaoui, H., Sulam, J., and Moreau, T. (2020). Learning to solve TV regularised problems with unrolled algorithms. Advances in Neural Information Processing Systems, 33:11513–11524.
  • [Dalalyan et al., 2017] Dalalyan, A., Hebiri, M., and Lederer, J. C. (2017). On the prediction performance of the Lasso. Bernoulli, 23(1):552–581.
  • [Daubechies et al., 2004] Daubechies, I., Defrise, M., and De Mol, C. (2004). An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 57(11):1413–1457.
  • [Dicker, 2014] Dicker, L. H. (2014). Variance estimation in high-dimensional linear models. Biometrika, 101(2):269–284.
  • [Ekmekci and Cetin, 2022] Ekmekci, C. and Cetin, M. (2022). Uncertainty quantification for deep unrolling-based computational imaging. IEEE Transactions on Computational Imaging, 8:1195–1209.
  • [Fan and Li, 2001] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American statistical Association, 96(456):1348–1360.
  • [Foucart and Rauhut, 2013] Foucart, S. and Rauhut, H. (2013). A Mathematical Introduction to Compressive Sensing. Springer New York, New York, NY.
  • [Foucart et al., 2023] Foucart, S., Tadmor, E., and Zhong, M. (2023). On the sparsity of LASSO minimizers in sparse data recovery. Constructive Approximation, 57(2):901–919.
  • [Gal and Ghahramani, 2016] Gal, Y. and Ghahramani, Z. (2016). Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In international conference on machine learning, pages 1050–1059. PMLR.
  • [Gawlikowski et al., 2023] Gawlikowski, J., Tassi, C. R. N., Ali, M., Lee, J., Humt, M., Feng, J., Kruspe, A., Triebel, R., Jung, P., Roscher, R., et al. (2023). A survey of uncertainty in deep neural networks. Artificial Intelligence Review, 56(Suppl 1):1513–1589.
  • [Genzel et al., 2022a] Genzel, M., Gühring, I., Macdonald, J., and März, M. (2022a). Near-exact recovery for tomographic inverse problems via deep learning. In International Conference on Machine Learning, pages 7368–7381. PMLR.
  • [Genzel et al., 2022b] Genzel, M., Macdonald, J., and März, M. (2022b). Solving inverse problems with deep neural networks–robustness included? IEEE transactions on pattern analysis and machine intelligence, 45(1):1119–1134.
  • [Giraud, 2021] Giraud, C. (2021). Introduction to high-dimensional statistics. Chapman and Hall/CRC.
  • [Giraud et al., 2012] Giraud, C., Huet, S., and Verzelen, N. (2012). High-dimensional regression with unknown variance. Statist. Sci., 27(4):500–518.
  • [Gregor and LeCun, 2010] Gregor, K. and LeCun, Y. (2010). Learning fast approximations of sparse coding. In Proceedings of the 27th international conference on machine learning, pages 399–406.
  • [Guo et al., 2022] Guo, Z., Ćevid, D., and Bühlmann, P. (2022). Doubly debiased lasso: High-dimensional inference under hidden confounding. Annals of statistics, 50(3):1320.
  • [Hastie et al., 2009] Hastie, T., Tibshirani, R., Friedman, J. H., and Friedman, J. H. (2009). The elements of statistical learning: data mining, inference, and prediction, volume 2. Springer.
  • [Hoppe et al., 2022] Hoppe, F., Krahmer, F., Mayrink Verdun, C., Menzel, M. I., and Rauhut, H. (2022). Uncertainty quantification for sparse Fourier recovery. arXiv preprint arXiv:2212.14864.
  • [Hoppe et al., 2023a] Hoppe, F., Krahmer, F., Mayrink Verdun, C., Menzel, M. I., and Rauhut, H. (2023a). High-Dimensional Confidence Regions in Sparse MRI. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE.
  • [Hoppe et al., 2024] Hoppe, F., Mayrink Verdun, C., Laus, H., Endt, S., Menzel, M. I., Krahmer, F., and Rauhut, H. (2024). Imaging with Confidence: Uncertainty Quantification for High-dimensional Undersampled MR Images. In Proceedings of European Conference on Computer Vision (ECCV).
  • [Hoppe et al., 2023b] Hoppe, F., Mayrink Verdun, C., Laus, H., Krahmer, F., and Rauhut, H. (2023b). Uncertainty quantification for learned ISTA. In 2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing (MLSP), pages 1–6. IEEE.
  • [Ito et al., 2019] Ito, D., Takabe, S., and Wadayama, T. (2019). Trainable ISTA for sparse signal recovery. IEEE Transactions on Signal Processing, 67(12):3113–3125.
  • [Javanmard and Montanari, 2014] Javanmard, A. and Montanari, A. (2014). Confidence intervals and hypothesis testing for high-dimensional regression. Journal of Machine Learning Research, 15:2869–2909.
  • [Javanmard and Montanari, 2018] Javanmard, A. and Montanari, A. (2018). Debiasing the lasso: Optimal sample size for Gaussian designs. The Annals of Statistics, 46(6A).
  • [Kennedy and Ward, 2020] Kennedy, C. and Ward, R. (2020). Greedy variance estimation for the LASSO. Appl. Math. Optim., 82(3):1161–1182.
  • [Knight and Fu, 2000] Knight, K. and Fu, W. (2000). Asymptotics for Lasso-Type Estimators. The Annals of Statistics, 28(5):1356–1378.
  • [Knoll et al., 2020] Knoll, F., Zbontar, J., Sriram, A., Muckley, M. J., Bruno, M., Defazio, A., Parente, M., Geras, K. J., Katsnelson, J., Chandarana, H., Zhang, Z., Drozdzalv, M., Romero, A., Rabbat, M., Vincent, P., Pinkerton, J., Wang, D., Yakubova, N., Owens, E., Zitnick, C. L., Recht, M. P., Sodickson, D. K., and Lui, Y. W. (2020). fastMRI: A Publicly Available Raw k-Space and DICOM Dataset of Knee Images for Accelerated MR Image Reconstruction Using Machine Learning. Radiology: Artificial Intelligence, 2(1):e190007. PMID: 32076662.
  • [Koltchinskii, 2009] Koltchinskii, V. (2009). Sparsity in penalized empirical risk minimization. In Annales de l’IHP Probabilités et statistiques, volume 45, pages 7–57.
  • [Koltchinskii et al., 2011] Koltchinskii, V., Lounici, K., and Tsybakov, A. B. (2011). Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. The Annals of Statistics, 39(5):2302 – 2329.
  • [Kümmerle et al., 2021] Kümmerle, C., Mayrink Verdun, C., and Stöger, D. (2021). Iteratively reweighted least squares for basis pursuit with global linear convergence rate. Advances in Neural Information Processing Systems, 34:2873–2886.
  • [Li and Malik, 2016] Li, K. and Malik, J. (2016). Learning to Optimize. In International Conference on Learning Representations.
  • [Li, 2020] Li, S. (2020). Debiasing the debiased Lasso with bootstrap. Electronic Journal of Statistics, 14:2298–2337.
  • [Li et al., 2018] Li, X., Sun, D., and Toh, K.-C. (2018). A highly efficient semismooth Newton augmented Lagrangian method for solving Lasso problems. SIAM Journal on Optimization, 28(1):433–458.
  • [Lin and Heckel, 2022] Lin, K. and Heckel, R. (2022). Vision transformers enable fast and robust accelerated mri. In International Conference on Medical Imaging with Deep Learning, pages 774–795. PMLR.
  • [Liu and Chen, 2019] Liu, J. and Chen, X. (2019). ALISTA: Analytic weights are as good as learned weights in LISTA. In International Conference on Learning Representations (ICLR).
  • [Liu et al., 2019] Liu, R., Cheng, S., Ma, L., Fan, X., and Luo, Z. (2019). Deep proximal unrolling: Algorithmic framework, convergence analysis and applications. IEEE Transactions on Image Processing, 28(10):5013–5026.
  • [Liu et al., 2020] Liu, X., Zheng, S., and Feng, X. (2020). Estimation of error variance via ridge regression. Biometrika, 107(2):481–488.
  • [Mayrink Verdun et al., 2024] Mayrink Verdun, C., Melnyk, O., Krahmer, F., and Jung, P. (2024). Fast, blind, and accurate: Tuning-free sparse regression with global linear convergence. In The Thirty Seventh Annual Conference on Learning Theory, pages 3823–3872. PMLR.
  • [Monga et al., 2021] Monga, V., Li, Y., and Eldar, Y. C. (2021). Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing. IEEE Signal Processing Magazine, 38(2):18–44.
  • [Rakotomamonjy et al., 2022] Rakotomamonjy, A., Flamary, R., Salmon, J., and Gasso, G. (2022). Convergent working set algorithm for lasso with non-convex sparse regularizers. In International Conference on Artificial Intelligence and Statistics, pages 5196–5211. PMLR.
  • [Raskutti et al., 2011] Raskutti, G., Wainwright, M. J., and Yu, B. (2011). Minimax rates of estimation for high-dimensional linear regression over qsubscript𝑞\ell_{q}roman_ℓ start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT-balls. IEEE transactions on information theory, 57(10):6976–6994.
  • [Reid et al., 2016] Reid, S., Tibshirani, R., and Friedman, J. (2016). A study of error variance estimation in lasso regression. Statist. Sinica, pages 35–67.
  • [Ronneberger et al., 2015] Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18, pages 234–241. Springer.
  • [Saw et al., 1984] Saw, J. G., Yang, M. C. K., and Mo, T. C. (1984). Chebyshev Inequality with Estimated Mean and Variance. The American Statistician, 38(2):130–132.
  • [Schlemper et al., 2017] Schlemper, J., Caballero, J., Hajnal, J. V., Price, A. N., and Rueckert, D. (2017). A deep cascade of convolutional neural networks for dynamic MR image reconstruction. IEEE transactions on Medical Imaging, 37(2):491–503.
  • [Sidky and Pan, 2022] Sidky, E. Y. and Pan, X. (2022). Report on the AAPM deep-learning sparse-view CT grand challenge. Medical physics, 49(8):4935–4943.
  • [Stellato et al., 2017] Stellato, B., van Parys, B. P. G., and Goulart, P. J. (2017). Multivariate Chebyshev Inequality With Estimated Mean and Variance. The American Statistician, 71(2):123–127.
  • [Sun and Zhang, 2012] Sun, T. and Zhang, C.-H. (2012). Scaled sparse linear regression. Biometrika, 99(4):879–898.
  • [Tibshirani, 1996] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1):267–288.
  • [van de Geer et al., 2014] van de Geer, S., Bühlmann, P., Ritov, Y., and Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics, 42(3).
  • [Wainwright, 2009] Wainwright, M. J. (2009). Sharp thresholds for High-Dimensional and noisy sparsity recovery using 1subscript1\ell_{1}roman_ℓ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT-Constrained Quadratic Programming (Lasso). IEEE transactions on information theory, 55(5):2183–2202.
  • [Wainwright, 2019] Wainwright, M. J. (2019). High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge university press.
  • [Wang et al., 2003] Wang, Z., Simoncelli, E. P., and Bovik, A. C. (2003). Multiscale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, volume 2, pages 1398–1402. Ieee.
  • [Wright and Ma, 2022] Wright, J. and Ma, Y. (2022). High-dimensional data analysis with low-dimensional models: Principles, computation, and applications. Cambridge University Press.
  • [Wu et al., 2019] Wu, K., Guo, Y., Li, Z., and Zhang, C. (2019). Sparse coding with gated learned ISTA. In International conference on learning representations.
  • [Ye and Zhang, 2010] Ye, F. and Zhang, C.-H. (2010). Rate minimaxity of the Lasso and Dantzig selector for the qsubscript𝑞\ell_{q}roman_ℓ start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT loss in rsubscript𝑟\ell_{r}roman_ℓ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT balls. The Journal of Machine Learning Research, 11:3519–3540.
  • [Yu and Bien, 2019] Yu, G. and Bien, J. (2019). Estimating the error variance in a high-dimensional linear model. Biometrika, 106(3):533–546.
  • [Yuan and Lin, 2005] Yuan, M. and Lin, Y. (2005). Model Selection and Estimation in Regression with Grouped Variables. Journal of the Royal Statistical Society Series B: Statistical Methodology, 68(1):49–67.
  • [Zbontar et al., 2019] Zbontar, J., Knoll, F., Sriram, A., Murrell, T., Huang, Z., Muckley, M. J., Defazio, A., Stern, R., Johnson, P., Bruno, M., Parente, M., Geras, K. J., Katsnelson, J., Chandarana, H., Zhang, Z., Drozdzal, M., Romero, A., Rabbat, M., Vincent, P., Yakubova, N., Pinkerton, J., Wang, D., Owens, E., Zitnick, C. L., Recht, M. P., Sodickson, D. K., and Lui, Y. W. (2019). fastMRI: An Open Dataset and Benchmarks for Accelerated MRI.
  • [Zhang, 2010] Zhang, C. H. (2010). Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics, 38(2):894–942.
  • [Zhang and Huang, 2008] Zhang, C. H. and Huang, J. (2008). The sparsity and bias of the lasso selection in high-dimensional linear regression. Annals of Statistics, 36(4):1567–1594.
  • [Zhang and Zhang, 2014] Zhang, C.-H. and Zhang, S. S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1):217–242.
  • [Zhang et al., 2023] Zhang, J., Chen, B., Xiong, R., and Zhang, Y. (2023). Physics-inspired compressive sensing: Beyond deep unrolling. IEEE Signal Processing Magazine, 40(1):58–72.
  • [Zhao and Yu, 2006] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. The Journal of Machine Learning Research, 7:2541–2563.
  • [Zheng et al., 2017] Zheng, L., Maleki, A., Weng, H., Wang, X., and Long, T. (2017). Does psubscript𝑝\ell_{p}roman_ℓ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT-Minimization Outperform 1subscript1\ell_{1}roman_ℓ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT-Minimization? IEEE Transactions on Information Theory, 63(11):6896–6935.
  • [Zheng et al., 2022] Zheng, Z., Dai, W., Xue, D., Li, C., Zou, J., and Xiong, H. (2022). Hybrid ISTA: Unfolding ISTA with convergence guarantees using free-form deep neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3):3226–3244.
  • [Zou and Hastie, 2005] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology, 67(2):301–320.

Supplementary material to the paper Non-Asymptotic Uncertainty Quantification in High-Dimensional Learning.

In this supplement to the paper, we present in Section A a detailed discussion about aspects of the main result that are not mentioned in the main body of the paper. Moreover, Section B presents the proof Theorem 2 and Theorem 3. The former establishes data-driven confidence intervals, while the latter assumes the remainder component to be approximated by a Gaussian distribution. In Section C, we confirm our theoretical findings with several numerical experiments for classical high-dimensional regression as well as model-based neural networks. In Section D, we visualize the approximate Gaussian distribution of the remainder terms, demonstrating the applicability of Theorem 3 in relevant settings.

Appendix A Further Discussion of Main Result

Length of radius: To minimize the length of the radius, γj(0,11lα)subscript𝛾𝑗011𝑙𝛼\gamma_{j}\in\left(0,1-\frac{1}{l\alpha}\right)italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ ( 0 , 1 - divide start_ARG 1 end_ARG start_ARG italic_l italic_α end_ARG ) should be chosen as the minimizer of the problem

minγj(0,11lα)σ(MΣ^M)jj1/2mlog(1γjα)+l21l2(1γj)αl(σ^R)j,subscriptsubscript𝛾𝑗011𝑙𝛼𝜎superscriptsubscript𝑀^Σsuperscript𝑀𝑗𝑗12𝑚1subscript𝛾𝑗𝛼superscript𝑙21superscript𝑙21subscript𝛾𝑗𝛼𝑙subscriptsubscript^𝜎𝑅𝑗\min\limits_{\gamma_{j}\in\left(0,1-\frac{1}{l\alpha}\right)}\frac{\sigma(M% \hat{\Sigma}M^{*})_{jj}^{1/2}}{\sqrt{m}}\sqrt{\log\left(\frac{1}{\gamma_{j}% \alpha}\right)}+\sqrt{\frac{l^{2}-1}{l^{2}(1-\gamma_{j})\alpha-l}}\cdot(\hat{% \sigma}_{R})_{j},roman_min start_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ ( 0 , 1 - divide start_ARG 1 end_ARG start_ARG italic_l italic_α end_ARG ) end_POSTSUBSCRIPT divide start_ARG italic_σ ( italic_M over^ start_ARG roman_Σ end_ARG italic_M start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT end_ARG start_ARG square-root start_ARG italic_m end_ARG end_ARG square-root start_ARG roman_log ( divide start_ARG 1 end_ARG start_ARG italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_α end_ARG ) end_ARG + square-root start_ARG divide start_ARG italic_l start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 1 end_ARG start_ARG italic_l start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( 1 - italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) italic_α - italic_l end_ARG end_ARG ⋅ ( over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , (7)

In order to minimize over a large set for a given significance level α𝛼\alphaitalic_α, a large number of data l𝑙litalic_l is needed. For fixed estimates S^jsubscript^𝑆𝑗\hat{S}_{j}over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and (σ^R2)jsubscriptsuperscriptsubscript^𝜎𝑅2𝑗(\hat{\sigma}_{R}^{2})_{j}( over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, more data leads to a potentially smaller confidence interval length. If we assume Rj=0subscript𝑅𝑗0R_{j}=0italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = 0, it follows that S^j=0subscript^𝑆𝑗0\hat{S}_{j}=0over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = 0 and (σ^R2)j=0subscriptsuperscriptsubscript^𝜎𝑅2𝑗0(\hat{\sigma}_{R}^{2})_{j}=0( over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = 0. Then, γ=1𝛾1\gamma=1italic_γ = 1 is a valid choice, for which the function σ(MΣ^M)jj1/2mlog(1γjα)𝜎superscriptsubscript𝑀^Σsuperscript𝑀𝑗𝑗12𝑚1subscript𝛾𝑗𝛼\frac{\sigma(M\hat{\Sigma}M^{*})_{jj}^{1/2}}{\sqrt{m}}\sqrt{\log\left(\frac{1}% {\gamma_{j}\alpha}\right)}divide start_ARG italic_σ ( italic_M over^ start_ARG roman_Σ end_ARG italic_M start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT end_ARG start_ARG square-root start_ARG italic_m end_ARG end_ARG square-root start_ARG roman_log ( divide start_ARG 1 end_ARG start_ARG italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_α end_ARG ) end_ARG is well-defined and is minimized. In this case, the radius coincides with the asymptotic radius derived in [Javanmard and Montanari, 2014, Javanmard and Montanari, 2018, van de Geer et al., 2014] (except for that these works handle the real case) and the ones in [Hoppe et al., 2023b] with M=IN×N𝑀subscript𝐼𝑁𝑁M=I_{N\times N}italic_M = italic_I start_POSTSUBSCRIPT italic_N × italic_N end_POSTSUBSCRIPT. In this sense, the asymptotic confidence intervals can be seen as a special case of the proposed method.

The significance level α𝛼\alphaitalic_α depends on γjsubscript𝛾𝑗\gamma_{j}italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and l𝑙litalic_l to assure cl()subscript𝑐𝑙c_{l}(\,\cdot\,)italic_c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( ⋅ ) to be well-defined. For a large dataset x(1),,x(l)superscript𝑥1superscript𝑥𝑙x^{(1)},\ldots,x^{(l)}italic_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT , … , italic_x start_POSTSUPERSCRIPT ( italic_l ) end_POSTSUPERSCRIPT, i.e. if l𝑙litalic_l is large, then it holds that limlcl(α)=liml11l2(1γj)α1l=1(1γj)αsubscript𝑙subscript𝑐𝑙𝛼subscript𝑙11superscript𝑙21subscript𝛾𝑗𝛼1𝑙11subscript𝛾𝑗𝛼\lim\limits_{l\to\infty}c_{l}(\alpha)=\lim\limits_{l\to\infty}\sqrt{\frac{1-% \frac{1}{l^{2}}}{(1-\gamma_{j})\alpha-\frac{1}{l}}}=\frac{1}{\sqrt{(1-\gamma_{% j})\alpha}}roman_lim start_POSTSUBSCRIPT italic_l → ∞ end_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_α ) = roman_lim start_POSTSUBSCRIPT italic_l → ∞ end_POSTSUBSCRIPT square-root start_ARG divide start_ARG 1 - divide start_ARG 1 end_ARG start_ARG italic_l start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG end_ARG start_ARG ( 1 - italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) italic_α - divide start_ARG 1 end_ARG start_ARG italic_l end_ARG end_ARG end_ARG = divide start_ARG 1 end_ARG start_ARG square-root start_ARG ( 1 - italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) italic_α end_ARG end_ARG.

Probabilistic discussion: The probability in (5) is over the randomness of the noise as well as \mathbb{Q}blackboard_Q. The confidence circles Cj(α)subscript𝐶𝑗𝛼C_{j}(\alpha)italic_C start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) consist of two random variables, the debiased estimator x^jusubscriptsuperscript^𝑥𝑢𝑗\hat{x}^{u}_{j}over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_u end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and the radius rj(α)subscript𝑟𝑗𝛼r_{j}(\alpha)italic_r start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ). The former depends on the random noise and potentially on training data, while the latter depends on the estimators S^jsubscript^𝑆𝑗\hat{S}_{j}over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and σ^Rsubscript^𝜎𝑅\hat{\sigma}_{R}over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT, which in turn depend on both the noise and the data x(1),,x(l)superscript𝑥1superscript𝑥𝑙x^{(1)},\ldots,x^{(l)}italic_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT , … , italic_x start_POSTSUPERSCRIPT ( italic_l ) end_POSTSUPERSCRIPT.

A crucial requirement of applying the empirical version of Chebyshev’s inequality [Saw et al., 1984] is the independence and identical distribution of the variables |Rj(1)|,,|Rj(l)|superscriptsubscript𝑅𝑗1superscriptsubscript𝑅𝑗𝑙|R_{j}^{(1)}|,\ldots,|R_{j}^{(l)}|| italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT | , … , | italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_l ) end_POSTSUPERSCRIPT |. Therefore, it is essential that the estimator function X^^𝑋\hat{X}over^ start_ARG italic_X end_ARG is independent of the data x(1),,x(l)superscript𝑥1superscript𝑥𝑙x^{(1)},\ldots,x^{(l)}italic_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT , … , italic_x start_POSTSUPERSCRIPT ( italic_l ) end_POSTSUPERSCRIPT. To achieve this, we train the estimator function X^^𝑋\hat{X}over^ start_ARG italic_X end_ARG using a dataset that is independent of the data x(1),,x(l)superscript𝑥1superscript𝑥𝑙x^{(1)},\ldots,x^{(l)}italic_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT , … , italic_x start_POSTSUPERSCRIPT ( italic_l ) end_POSTSUPERSCRIPT, used for estimating R(1),,R(l)superscript𝑅1superscript𝑅𝑙R^{(1)},\ldots,R^{(l)}italic_R start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT , … , italic_R start_POSTSUPERSCRIPT ( italic_l ) end_POSTSUPERSCRIPT. However, the mean and variance of |R(1)|superscript𝑅1|R^{(1)}|| italic_R start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT | and hence of |R(i)|superscript𝑅𝑖|R^{(i)}|| italic_R start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT | depend on the variance of the noise ε(1)superscript𝜀1\varepsilon^{(1)}italic_ε start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT, i.e. σ2superscript𝜎2\sigma^{2}italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. Thus, different noise levels σ𝜎\sigmaitalic_σ require a new estimation of the mean and variance of |R(1)|superscript𝑅1|R^{(1)}|| italic_R start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT |. Throughout this paper, we assume the noise level to be fixed and known. The latter assumption is motivated by two factors. First, the size of the confidence intervals relies on σ𝜎\sigmaitalic_σ. Given that the primary focus of this paper is to determine the size of the confidence intervals based on the remainder term R(1)superscript𝑅1R^{(1)}italic_R start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT, we seek to mitigate other influencing factors such as noise level estimation. Second, in domains like medical imaging, there is substantial knowledge about the noise level. For instance, the noise level in MRI can be directly measured from the scanner [Aja-Fernández and Vegas-Sánchez-Ferrero, 2016]. If the noise level is unknown, there are methods to estimate it. In the debiased LASSO literature, the most used method is the scaled LASSO [Sun and Zhang, 2012]. Other methods for sparse regression, either in the LASSO context or more general for high-dimensional models, are [Reid et al., 2016, Kennedy and Ward, 2020, Giraud et al., 2012, Dicker, 2014, Liu et al., 2020, Yu and Bien, 2019].

Relaxation of assumptions in practice: In practice, it is often the case, that |R1(i)|,,|RN(i)|subscriptsuperscript𝑅𝑖1subscriptsuperscript𝑅𝑖𝑁|R^{(i)}_{1}|,\ldots,|R^{(i)}_{N}|| italic_R start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT | , … , | italic_R start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT | are identical distributed resulting in μ1==μNsubscript𝜇1subscript𝜇𝑁\mu_{1}=\ldots=\mu_{N}italic_μ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = … = italic_μ start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT and (σR2)1==(σR2)Nsubscriptsuperscriptsubscript𝜎𝑅21subscriptsuperscriptsubscript𝜎𝑅2𝑁(\sigma_{R}^{2})_{1}=\ldots=(\sigma_{R}^{2})_{N}( italic_σ start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = … = ( italic_σ start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT. Although the proof requires independence of the |Rj(i)|superscriptsubscript𝑅𝑗𝑖|R_{j}^{(i)}|| italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT |, there are cases when it might suffice to relax this assumption by estimating the mean and variance pixel-wise uniformly, i.e.,

S^=1lNi=1lj=1NRj(i)andσ^R2=1lN1i=1lj=1N(Rj(i)S^)2.formulae-sequence^𝑆1𝑙𝑁superscriptsubscript𝑖1𝑙superscriptsubscript𝑗1𝑁superscriptsubscript𝑅𝑗𝑖andsuperscriptsubscript^𝜎𝑅21𝑙𝑁1superscriptsubscript𝑖1𝑙superscriptsubscript𝑗1𝑁superscriptsuperscriptsubscript𝑅𝑗𝑖^𝑆2\hat{S}=\frac{1}{l\cdot N}\sum\limits_{i=1}^{l}\sum\limits_{j=1}^{N}R_{j}^{(i)% }\qquad\text{and}\qquad\hat{\sigma}_{R}^{2}=\frac{1}{l\cdot N-1}\sum\limits_{i% =1}^{l}\sum\limits_{j=1}^{N}(R_{j}^{(i)}-\hat{S})^{2}.over^ start_ARG italic_S end_ARG = divide start_ARG 1 end_ARG start_ARG italic_l ⋅ italic_N end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT and over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_l ⋅ italic_N - 1 end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ( italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT - over^ start_ARG italic_S end_ARG ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (8)

In addition to saving computational resources, accuracy improves due to the higher number of samples. Furthermore, instead of solving the optimization problem (7) for every j{1,,N}𝑗1𝑁j\in\{1,\ldots,N\}italic_j ∈ { 1 , … , italic_N }, it might be a good idea to choose γ1==γNsubscript𝛾1subscript𝛾𝑁\gamma_{1}=\ldots=\gamma_{N}italic_γ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = … = italic_γ start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT as the minimizer of

minγj(0,11lα)σj=1N(MΣ^M)jj1/2mNlog(1γjα)+cl((1γj)α)1Nj=1N(σ^R)j.subscriptsubscript𝛾𝑗011𝑙𝛼𝜎superscriptsubscript𝑗1𝑁superscriptsubscript𝑀^Σsuperscript𝑀𝑗𝑗12𝑚𝑁1subscript𝛾𝑗𝛼subscript𝑐𝑙1subscript𝛾𝑗𝛼1𝑁superscriptsubscript𝑗1𝑁subscriptsubscript^𝜎𝑅𝑗\min\limits_{\gamma_{j}\in\left(0,1-\frac{1}{l\alpha}\right)}\frac{\sigma\sum% \limits_{j=1}^{N}(M\hat{\Sigma}M^{*})_{jj}^{1/2}}{\sqrt{m}N}\sqrt{\log\left(% \frac{1}{\gamma_{j}\alpha}\right)}+c_{l}\left((1-\gamma_{j})\alpha\right)\cdot% \frac{1}{N}\sum\limits_{j=1}^{N}(\hat{\sigma}_{R})_{j}.roman_min start_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ ( 0 , 1 - divide start_ARG 1 end_ARG start_ARG italic_l italic_α end_ARG ) end_POSTSUBSCRIPT divide start_ARG italic_σ ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ( italic_M over^ start_ARG roman_Σ end_ARG italic_M start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT end_ARG start_ARG square-root start_ARG italic_m end_ARG italic_N end_ARG square-root start_ARG roman_log ( divide start_ARG 1 end_ARG start_ARG italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_α end_ARG ) end_ARG + italic_c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( ( 1 - italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) italic_α ) ⋅ divide start_ARG 1 end_ARG start_ARG italic_N end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ( over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT . (9)

Then, one γ𝛾\gammaitalic_γ can be used for computing the potentially different radii rj(α)subscript𝑟𝑗𝛼r_{j}(\alpha)italic_r start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ).

Appendix B Proofs

Proof of Theorem 2.

The statement xj(l+1)Cj(α)subscriptsuperscript𝑥𝑙1𝑗subscript𝐶𝑗𝛼x^{(l+1)}_{j}\in C_{j}(\alpha)italic_x start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ italic_C start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) is equivalent to |(x^u)j(l+1)xj(l+1)|rj(α)subscriptsuperscriptsuperscript^𝑥𝑢𝑙1𝑗subscriptsuperscript𝑥𝑙1𝑗subscript𝑟𝑗𝛼|(\hat{x}^{u})^{(l+1)}_{j}-x^{(l+1)}_{j}|\leq r_{j}(\alpha)| ( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_u end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - italic_x start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | ≤ italic_r start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ). To prove (5), we show that

(|(x^u)j(l+1)xj(l+1)|rj(α))αsubscriptsuperscriptsuperscript^𝑥𝑢𝑙1𝑗subscriptsuperscript𝑥𝑙1𝑗subscript𝑟𝑗𝛼𝛼\mathbb{P}\left(|(\hat{x}^{u})^{(l+1)}_{j}-x^{(l+1)}_{j}|\geq r_{j}(\alpha)% \right)\leq\alphablackboard_P ( | ( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_u end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - italic_x start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | ≥ italic_r start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) ) ≤ italic_α

In the next step, we write the radius r(α)𝑟𝛼r(\alpha)italic_r ( italic_α ) as the sum r(α)=rW(α)+rR(α)𝑟𝛼superscript𝑟𝑊𝛼superscript𝑟𝑅𝛼r(\alpha)=r^{W}(\alpha)+r^{R}(\alpha)italic_r ( italic_α ) = italic_r start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT ( italic_α ) + italic_r start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT ( italic_α ). According to the decomposition (x^u)j(l+1)xj(l+1)=Wj+Rjsubscriptsuperscriptsuperscript^𝑥𝑢𝑙1𝑗subscriptsuperscript𝑥𝑙1𝑗subscript𝑊𝑗subscript𝑅𝑗(\hat{x}^{u})^{(l+1)}_{j}-x^{(l+1)}_{j}=W_{j}+R_{j}( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_u end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - italic_x start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_W start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT we obtain for fixed j{1,,N}𝑗1𝑁j\in\{1,\ldots,N\}italic_j ∈ { 1 , … , italic_N }

(|(x^ju)(l+1)xj(l+1)|rjW(α)+rjR(α))=(|Wj+Rj|rjW(α)+rjR(α))superscriptsubscriptsuperscript^𝑥𝑢𝑗𝑙1superscriptsubscript𝑥𝑗𝑙1subscriptsuperscript𝑟𝑊𝑗𝛼subscriptsuperscript𝑟𝑅𝑗𝛼subscript𝑊𝑗subscript𝑅𝑗subscriptsuperscript𝑟𝑊𝑗𝛼subscriptsuperscript𝑟𝑅𝑗𝛼\displaystyle\mathbb{P}\left(|(\hat{x}^{u}_{j})^{(l+1)}-x_{j}^{(l+1)}|\geq r^{% W}_{j}(\alpha)+r^{R}_{j}(\alpha)\right)=\mathbb{P}\left(|W_{j}+R_{j}|\geq r^{W% }_{j}(\alpha)+r^{R}_{j}(\alpha)\right)blackboard_P ( | ( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_u end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT - italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT | ≥ italic_r start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) + italic_r start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) ) = blackboard_P ( | italic_W start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | ≥ italic_r start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) + italic_r start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) )
\displaystyle\leq (|Wj|+|Rj|rjW(α)+rjR(α))(|Wj|rjW(α))+(|Rj|rjR(α))subscript𝑊𝑗subscript𝑅𝑗superscriptsubscript𝑟𝑗𝑊𝛼superscriptsubscript𝑟𝑗𝑅𝛼subscript𝑊𝑗subscriptsuperscript𝑟𝑊𝑗𝛼subscript𝑅𝑗subscriptsuperscript𝑟𝑅𝑗𝛼\displaystyle\mathbb{P}\left(|W_{j}|+|R_{j}|\geq r_{j}^{W}(\alpha)+r_{j}^{R}(% \alpha)\right)\leq\mathbb{P}\left(|W_{j}|\geq r^{W}_{j}(\alpha)\right)+\mathbb% {P}\left(|R_{j}|\geq r^{R}_{j}(\alpha)\right)blackboard_P ( | italic_W start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | + | italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | ≥ italic_r start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT ( italic_α ) + italic_r start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT ( italic_α ) ) ≤ blackboard_P ( | italic_W start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | ≥ italic_r start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) ) + blackboard_P ( | italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | ≥ italic_r start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) )

where the last step follows from the pigeonhole principle. To estimate the first summand, we set rjW(α):=σ(MΣ^M)jj1/2mlog(1γjα)assignsubscriptsuperscript𝑟𝑊𝑗𝛼𝜎superscriptsubscript𝑀^Σsuperscript𝑀𝑗𝑗12𝑚1subscript𝛾𝑗𝛼r^{W}_{j}(\alpha):=\frac{\sigma(M\hat{\Sigma}M^{*})_{jj}^{1/2}}{\sqrt{m}}\sqrt% {\log\left(\frac{1}{\gamma_{j}\alpha}\right)}italic_r start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) := divide start_ARG italic_σ ( italic_M over^ start_ARG roman_Σ end_ARG italic_M start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT end_ARG start_ARG square-root start_ARG italic_m end_ARG end_ARG square-root start_ARG roman_log ( divide start_ARG 1 end_ARG start_ARG italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_α end_ARG ) end_ARG. Since |Wj|Rice(0,σ(MΣ^M)jj1/22m)similar-tosubscript𝑊𝑗Rice0𝜎superscriptsubscript𝑀^Σsuperscript𝑀𝑗𝑗122𝑚|W_{j}|\sim\operatorname{Rice}\left(0,\frac{\sigma(M\hat{\Sigma}M^{*})_{jj}^{1% /2}}{\sqrt{2m}}\right)| italic_W start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | ∼ roman_Rice ( 0 , divide start_ARG italic_σ ( italic_M over^ start_ARG roman_Σ end_ARG italic_M start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT end_ARG start_ARG square-root start_ARG 2 italic_m end_ARG end_ARG ) we obtain

(|Wj|rjW(α))subscript𝑊𝑗subscriptsuperscript𝑟𝑊𝑗𝛼\displaystyle\mathbb{P}\left(|W_{j}|\geq r^{W}_{j}(\alpha)\right)blackboard_P ( | italic_W start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | ≥ italic_r start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) ) =2mσ2Σ^jjrjW(α)xexp(x2mσ2(MΣ^M)jj)𝑑x=(rjW(α))2mσ2(MΣ^M)jjexp(u)𝑑uabsent2𝑚superscript𝜎2subscript^Σ𝑗𝑗superscriptsubscriptsubscriptsuperscript𝑟𝑊𝑗𝛼𝑥superscript𝑥2𝑚superscript𝜎2subscript𝑀^Σsuperscript𝑀𝑗𝑗differential-d𝑥subscriptsuperscriptsubscriptsuperscript𝑟𝑊𝑗𝛼2𝑚superscript𝜎2subscript𝑀^Σsuperscript𝑀𝑗𝑗𝑢differential-d𝑢\displaystyle=\frac{2m}{\sigma^{2}\hat{\Sigma}_{jj}}\int\limits_{r^{W}_{j}(% \alpha)}^{\infty}x\exp\left(-\frac{x^{2}m}{\sigma^{2}(M\hat{\Sigma}M^{*})_{jj}% }\right)dx=\int\limits_{\frac{(r^{W}_{j}(\alpha))^{2}m}{\sigma^{2}(M\hat{% \Sigma}M^{*})_{jj}}}\exp(-u)du= divide start_ARG 2 italic_m end_ARG start_ARG italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT over^ start_ARG roman_Σ end_ARG start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT end_ARG ∫ start_POSTSUBSCRIPT italic_r start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_x roman_exp ( - divide start_ARG italic_x start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_m end_ARG start_ARG italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_M over^ start_ARG roman_Σ end_ARG italic_M start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT end_ARG ) italic_d italic_x = ∫ start_POSTSUBSCRIPT divide start_ARG ( italic_r start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_m end_ARG start_ARG italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_M over^ start_ARG roman_Σ end_ARG italic_M start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT end_ARG end_POSTSUBSCRIPT roman_exp ( - italic_u ) italic_d italic_u
=exp((rjW(α))2mσ2(MΣ^M)jj)=exp(log(1/γjα))=γjα.absentsuperscriptsubscriptsuperscript𝑟𝑊𝑗𝛼2𝑚superscript𝜎2subscript𝑀^Σsuperscript𝑀𝑗𝑗1subscript𝛾𝑗𝛼subscript𝛾𝑗𝛼\displaystyle=\exp\left(-\frac{(r^{W}_{j}(\alpha))^{2}m}{\sigma^{2}(M\hat{% \Sigma}M^{*})_{jj}}\right)=\exp\left(-\log(1/\gamma_{j}\alpha)\right)=\gamma_{% j}\alpha.= roman_exp ( - divide start_ARG ( italic_r start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_m end_ARG start_ARG italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_M over^ start_ARG roman_Σ end_ARG italic_M start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT end_ARG ) = roman_exp ( - roman_log ( 1 / italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_α ) ) = italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_α .

For estimating the term (|Rj|rjR(α))subscript𝑅𝑗subscriptsuperscript𝑟𝑅𝑗𝛼\mathbb{P}\left(|R_{j}|\geq r^{R}_{j}(\alpha)\right)blackboard_P ( | italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | ≥ italic_r start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) ), we set rjR(α)=cl(α)(σ^R)j+S^jsubscriptsuperscript𝑟𝑅𝑗𝛼subscript𝑐𝑙𝛼subscriptsubscript^𝜎𝑅𝑗subscript^𝑆𝑗r^{R}_{j}(\alpha)=c_{l}(\alpha)\cdot(\hat{\sigma}_{R})_{j}+\hat{S}_{j}italic_r start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) = italic_c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_α ) ⋅ ( over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT. This choice leads to

(|Rj|rjR(α))subscript𝑅𝑗subscriptsuperscript𝑟𝑅𝑗𝛼\displaystyle\mathbb{P}\left(|R_{j}|\geq r^{R}_{j}(\alpha)\right)blackboard_P ( | italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | ≥ italic_r start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) ) =(|Rj|S^jrjR(α)S^j)(||Rj|S^j|rjR(α)S^j)absentsubscript𝑅𝑗subscript^𝑆𝑗subscriptsuperscript𝑟𝑅𝑗𝛼subscript^𝑆𝑗subscript𝑅𝑗subscript^𝑆𝑗subscriptsuperscript𝑟𝑅𝑗𝛼subscript^𝑆𝑗\displaystyle=\mathbb{P}\left(|R_{j}|-\hat{S}_{j}\geq r^{R}_{j}(\alpha)-\hat{S% }_{j}\right)\leq\mathbb{P}\left(\left||R_{j}|-\hat{S}_{j}\right|\geq r^{R}_{j}% (\alpha)-\hat{S}_{j}\right)= blackboard_P ( | italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | - over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ≥ italic_r start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) - over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) ≤ blackboard_P ( | | italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | - over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | ≥ italic_r start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) - over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT )
=(||Rj|S^j|(σ^R)jrjR(α)S^j(σ^R)j)=(||Rj|S^j|(σ^R)jcl(α)).absentsubscript𝑅𝑗subscript^𝑆𝑗subscriptsubscript^𝜎𝑅𝑗subscriptsuperscript𝑟𝑅𝑗𝛼subscript^𝑆𝑗subscriptsubscript^𝜎𝑅𝑗subscript𝑅𝑗subscript^𝑆𝑗subscriptsubscript^𝜎𝑅𝑗subscript𝑐𝑙𝛼\displaystyle=\mathbb{P}\left(\frac{\left||R_{j}|-\hat{S}_{j}\right|}{(\hat{% \sigma}_{R})_{j}}\geq\frac{r^{R}_{j}(\alpha)-\hat{S}_{j}}{(\hat{\sigma}_{R})_{% j}}\right)=\mathbb{P}\left(\frac{\left||R_{j}|-\hat{S}_{j}\right|}{(\hat{% \sigma}_{R})_{j}}\geq c_{l}(\alpha)\right).= blackboard_P ( divide start_ARG | | italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | - over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | end_ARG start_ARG ( over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG ≥ divide start_ARG italic_r start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) - over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG start_ARG ( over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG ) = blackboard_P ( divide start_ARG | | italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | - over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | end_ARG start_ARG ( over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG ≥ italic_c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_α ) ) .

Now, we apply an empirical version of Chebyshev’s inequality [Saw et al., 1984, Stellato et al., 2017]. This leads to

(||Rj|S^j|(σ^R)jcl(α))subscript𝑅𝑗subscript^𝑆𝑗subscriptsubscript^𝜎𝑅𝑗subscript𝑐𝑙𝛼\displaystyle\mathbb{P}\left(\frac{\left||R_{j}|-\hat{S}_{j}\right|}{(\hat{% \sigma}_{R})_{j}}\geq c_{l}(\alpha)\right)blackboard_P ( divide start_ARG | | italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | - over^ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | end_ARG start_ARG ( over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG ≥ italic_c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_α ) ) min{1,1l+1(l+1)(l21+lcl(α)2)l2cl(α)2}absent11𝑙1𝑙1superscript𝑙21𝑙subscript𝑐𝑙superscript𝛼2superscript𝑙2subscript𝑐𝑙superscript𝛼2\displaystyle\leq\min\left\{1,\frac{1}{l+1}\left\lfloor\frac{(l+1)(l^{2}-1+lc_% {l}(\alpha)^{2})}{l^{2}c_{l}(\alpha)^{2}}\right\rfloor\right\}≤ roman_min { 1 , divide start_ARG 1 end_ARG start_ARG italic_l + 1 end_ARG ⌊ divide start_ARG ( italic_l + 1 ) ( italic_l start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 1 + italic_l italic_c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_α ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_ARG start_ARG italic_l start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_α ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ⌋ }
min{1,l21+lcl(α)2l2cl(α)2}=min{1,l21+l21l(1γj)α1l(l21)l(1γj)α1}absent1superscript𝑙21𝑙subscript𝑐𝑙superscript𝛼2superscript𝑙2subscript𝑐𝑙superscript𝛼21superscript𝑙21superscript𝑙21𝑙1subscript𝛾𝑗𝛼1𝑙superscript𝑙21𝑙1subscript𝛾𝑗𝛼1\displaystyle\leq\min\left\{1,\frac{l^{2}-1+lc_{l}(\alpha)^{2}}{l^{2}c_{l}(% \alpha)^{2}}\right\}=\min\left\{1,\frac{l^{2}-1+\frac{l^{2}-1}{l(1-\gamma_{j})% \alpha-1}}{\frac{l(l^{2}-1)}{l(1-\gamma_{j})\alpha-1}}\right\}≤ roman_min { 1 , divide start_ARG italic_l start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 1 + italic_l italic_c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_α ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_l start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_α ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG } = roman_min { 1 , divide start_ARG italic_l start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 1 + divide start_ARG italic_l start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 1 end_ARG start_ARG italic_l ( 1 - italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) italic_α - 1 end_ARG end_ARG start_ARG divide start_ARG italic_l ( italic_l start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 1 ) end_ARG start_ARG italic_l ( 1 - italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) italic_α - 1 end_ARG end_ARG }
=min{1,1+1l(1γj)α1ll(1γj)α1}=min{1,(1γj)α}=(1γj)α,absent111𝑙1subscript𝛾𝑗𝛼1𝑙𝑙1subscript𝛾𝑗𝛼111subscript𝛾𝑗𝛼1subscript𝛾𝑗𝛼\displaystyle=\min\left\{1,\frac{1+\frac{1}{l(1-\gamma_{j})\alpha-1}}{\frac{l}% {l(1-\gamma_{j})\alpha-1}}\right\}=\min\left\{1,(1-\gamma_{j})\alpha\right\}=(% 1-\gamma_{j})\alpha,= roman_min { 1 , divide start_ARG 1 + divide start_ARG 1 end_ARG start_ARG italic_l ( 1 - italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) italic_α - 1 end_ARG end_ARG start_ARG divide start_ARG italic_l end_ARG start_ARG italic_l ( 1 - italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) italic_α - 1 end_ARG end_ARG } = roman_min { 1 , ( 1 - italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) italic_α } = ( 1 - italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) italic_α ,

where we used in the last step, that (1γj)α<α<11subscript𝛾𝑗𝛼𝛼1(1-\gamma_{j})\alpha<\alpha<1( 1 - italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) italic_α < italic_α < 1. To summarize,

(|(x^u)j(l+1)xj(l+1)|rj(α))subscriptsuperscriptsuperscript^𝑥𝑢𝑙1𝑗subscriptsuperscript𝑥𝑙1𝑗subscript𝑟𝑗𝛼\displaystyle\mathbb{P}\left(|(\hat{x}^{u})^{(l+1)}_{j}-x^{(l+1)}_{j}|\geq r_{% j}(\alpha)\right)blackboard_P ( | ( over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_u end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - italic_x start_POSTSUPERSCRIPT ( italic_l + 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | ≥ italic_r start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) ) (|Wj|rjW(α))+(|Rj|rjR(α))absentsubscript𝑊𝑗superscriptsubscript𝑟𝑗𝑊𝛼subscript𝑅𝑗superscriptsubscript𝑟𝑗𝑅𝛼\displaystyle\leq\mathbb{P}\left(|W_{j}|\geq r_{j}^{W}(\alpha)\right)+\mathbb{% P}\left(|R_{j}|\geq r_{j}^{R}(\alpha)\right)≤ blackboard_P ( | italic_W start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | ≥ italic_r start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT ( italic_α ) ) + blackboard_P ( | italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | ≥ italic_r start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT ( italic_α ) )
γjα+(1γj)α=α.absentsubscript𝛾𝑗𝛼1subscript𝛾𝑗𝛼𝛼\displaystyle\leq\gamma_{j}\alpha+(1-\gamma_{j})\alpha=\alpha.≤ italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_α + ( 1 - italic_γ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) italic_α = italic_α .

Proof of Theorem 3.

Since W𝒞𝒩(0,σ2mMΣ^M)similar-to𝑊𝒞𝒩0superscript𝜎2𝑚𝑀^Σsuperscript𝑀W\sim\mathcal{CN}(0,\frac{\sigma^{2}}{m}M\hat{\Sigma}M^{*})italic_W ∼ caligraphic_C caligraphic_N ( 0 , divide start_ARG italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_m end_ARG italic_M over^ start_ARG roman_Σ end_ARG italic_M start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) and R𝒞𝒩(0,1mΣR)similar-to𝑅𝒞𝒩01𝑚subscriptΣ𝑅R\sim\mathcal{CN}(0,\frac{1}{m}\Sigma_{R})italic_R ∼ caligraphic_C caligraphic_N ( 0 , divide start_ARG 1 end_ARG start_ARG italic_m end_ARG roman_Σ start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ) the estimation error x^ux0=W+Rsuperscript^𝑥𝑢superscript𝑥0𝑊𝑅\hat{x}^{u}-x^{0}=W+Rover^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_u end_POSTSUPERSCRIPT - italic_x start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT = italic_W + italic_R follows again a multivariate normal distribution with zero mean and covariance matrix 1m(σ2MΣ^M+ΣR)1𝑚superscript𝜎2𝑀^Σsuperscript𝑀subscriptΣ𝑅\frac{1}{m}(\sigma^{2}M\hat{\Sigma}M^{*}+\Sigma_{R})divide start_ARG 1 end_ARG start_ARG italic_m end_ARG ( italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_M over^ start_ARG roman_Σ end_ARG italic_M start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + roman_Σ start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ). By exploiting the Gaussian distribution, we obtain

(|Wj+Rj|>rjG(α))subscript𝑊𝑗subscript𝑅𝑗subscriptsuperscript𝑟𝐺𝑗𝛼\displaystyle\mathbb{P}\left(|W_{j}+R_{j}|>r^{G}_{j}(\alpha)\right)blackboard_P ( | italic_W start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | > italic_r start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) ) =2mσ2(MΣ^M)jj+(ΣR)jjrjG(α)xexp(x2mσ2(MΣ^M)jj+(ΣR)jj)𝑑xabsent2𝑚superscript𝜎2subscript𝑀^Σsuperscript𝑀𝑗𝑗subscriptsubscriptΣ𝑅𝑗𝑗superscriptsubscriptsubscriptsuperscript𝑟𝐺𝑗𝛼𝑥superscript𝑥2𝑚superscript𝜎2subscript𝑀^Σsuperscript𝑀𝑗𝑗subscriptsubscriptΣ𝑅𝑗𝑗differential-d𝑥\displaystyle=\frac{2m}{\sigma^{2}(M\hat{\Sigma}M^{*})_{jj}+(\Sigma_{R})_{jj}}% \int\limits_{r^{G}_{j}(\alpha)}^{\infty}x\exp\left(-\frac{x^{2}m}{\sigma^{2}(M% \hat{\Sigma}M^{*})_{jj}+(\Sigma_{R})_{jj}}\right)dx= divide start_ARG 2 italic_m end_ARG start_ARG italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_M over^ start_ARG roman_Σ end_ARG italic_M start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT + ( roman_Σ start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT end_ARG ∫ start_POSTSUBSCRIPT italic_r start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_x roman_exp ( - divide start_ARG italic_x start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_m end_ARG start_ARG italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_M over^ start_ARG roman_Σ end_ARG italic_M start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT + ( roman_Σ start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT end_ARG ) italic_d italic_x
=rjG(α)2mσ2(MΣ^M)jj+(ΣR)jjexp(u)𝑑u=exp(rjG(α)2mσ2(MΣ^M)jj+(ΣR)jj)absentsubscriptsubscriptsuperscript𝑟𝐺𝑗superscript𝛼2𝑚superscript𝜎2subscript𝑀^Σsuperscript𝑀𝑗𝑗subscriptsubscriptΣ𝑅𝑗𝑗𝑢differential-d𝑢subscriptsuperscript𝑟𝐺𝑗superscript𝛼2𝑚superscript𝜎2subscript𝑀^Σsuperscript𝑀𝑗𝑗subscriptsubscriptΣ𝑅𝑗𝑗\displaystyle=\int\limits_{\frac{r^{G}_{j}(\alpha)^{2}m}{\sigma^{2}(M\hat{% \Sigma}M^{*})_{jj}+(\Sigma_{R})_{jj}}}\exp(-u)du=\exp\left(-\frac{r^{G}_{j}(% \alpha)^{2}m}{\sigma^{2}(M\hat{\Sigma}M^{*})_{jj}+(\Sigma_{R})_{jj}}\right)= ∫ start_POSTSUBSCRIPT divide start_ARG italic_r start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_m end_ARG start_ARG italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_M over^ start_ARG roman_Σ end_ARG italic_M start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT + ( roman_Σ start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT end_ARG end_POSTSUBSCRIPT roman_exp ( - italic_u ) italic_d italic_u = roman_exp ( - divide start_ARG italic_r start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_m end_ARG start_ARG italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_M over^ start_ARG roman_Σ end_ARG italic_M start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT + ( roman_Σ start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT end_ARG )

Thus, we have

(|x^juxj|>rjG(α))exp(rjG(α)2mσ2(MΣ^M)jj+(ΣR)jj),subscriptsuperscript^𝑥𝑢𝑗superscriptsubscript𝑥𝑗subscriptsuperscript𝑟𝐺𝑗𝛼subscriptsuperscript𝑟𝐺𝑗superscript𝛼2𝑚superscript𝜎2subscript𝑀^Σsuperscript𝑀𝑗𝑗subscriptsubscriptΣ𝑅𝑗𝑗\mathbb{P}(|\hat{x}^{u}_{j}-x_{j}^{*}|>r^{G}_{j}(\alpha))\leq\exp\left(-\frac{% r^{G}_{j}(\alpha)^{2}m}{\sigma^{2}(M\hat{\Sigma}M^{*})_{jj}+(\Sigma_{R})_{jj}}% \right),blackboard_P ( | over^ start_ARG italic_x end_ARG start_POSTSUPERSCRIPT italic_u end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT | > italic_r start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) ) ≤ roman_exp ( - divide start_ARG italic_r start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_m end_ARG start_ARG italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_M over^ start_ARG roman_Σ end_ARG italic_M start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT + ( roman_Σ start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT end_ARG ) ,

which needs to be equal to α>0𝛼0\alpha>0italic_α > 0. Therefore,

rjG(α)=(σ(MΣ^Mjj+(ΣR)jj)1/2mlog(1α).r^{G}_{j}(\alpha)=\frac{(\sigma(M\hat{\Sigma}M_{jj}+(\Sigma_{R})_{jj})^{1/2}}{% \sqrt{m}}\sqrt{\log\left(\frac{1}{\alpha}\right)}.italic_r start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_α ) = divide start_ARG ( italic_σ ( italic_M over^ start_ARG roman_Σ end_ARG italic_M start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT + ( roman_Σ start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT end_ARG start_ARG square-root start_ARG italic_m end_ARG end_ARG square-root start_ARG roman_log ( divide start_ARG 1 end_ARG start_ARG italic_α end_ARG ) end_ARG .

Appendix C Further Numerical Evaluation

To confirm our theoretical findings claiming that the incorporation of the bias component renders the confidence intervals more robust, we present additional numerical experiments here.

UQ for Classical Model-Based Regression

For the experiments described here, we use Tfocs [Becker et al., 2011]. Analogous to the experiment described in Section 5.1, we run further experiments in the classical sparse regression setting when the measurement matrix is a Gaussian and subsampled Fourier matrix. The different settings, including the results, can be found in Table 1. The results show that the Gaussian adjustment of our proposed method significantly increases the hit rates, especially on the support, while moderately increasing the confidence interval length. Our data-driven adjustment achieves even better hit rates, but the confidence intervals are larger. Although in well-posed settings, like the second column of Table 1, the hit rates hW(0.05)superscript𝑊0.05h^{W}(0.05)italic_h start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT ( 0.05 ) based on asymptotic confidence intervals lead overall to almost 95%percent9595\%95 %, however on the support, which are the crucial features, the asymptotic hit rates fail. In particular, our corrections are essential in ill-posed regression problems as the third Gaussian column. The hit rates for the asymptotic CIs and the corrected ones with Gaussian adjustment are visualized in more detail in Figure 4.

Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Refer to caption
(e)
Refer to caption
(f)
Refer to caption
(g)
Refer to caption
(h)
Refer to caption
(i)
Refer to caption
(j)
Refer to caption
(k)
Refer to caption
(l)
Figure 4: Box plots for hit rates of sparse regression experiments. The settings are those described in Table 1. The first row presents the hit rates over all components, and the second the hit rates of the support, e.g., 4(a) and 4(g) correspond to the first column of the table, 4(b) and 4(h) to the second one and so forth. In each plot, the left box represents the asymptotic hit rates, and the right one the Gaussian-adjusted hit rates.
Gaussian Fourier
feature dimension 1000 10000 1000 10000 100000
undersampling 50% 40% 60% 40% 60% 50%
sparsity 7.5% 2% 10% 5% 10% 5%
relative noise 15% 10% 20% 15% 5% 10%
R/W 2subscript2\ell_{2}roman_ℓ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT-ratio 0.7062 0.5117 0.9993 0.5446 0.5924 0.4701
R/W subscript\ell_{\infty}roman_ℓ start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT-ratio 0.8191 0.5161 1.1581 0.5752 0.6088 0.4794
average asympt. radius rW(0.05)superscript𝑟𝑊0.05r^{W}(0.05)italic_r start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT ( 0.05 ) 0.0116 0.0027 0.0049 0.0130 0.0011 0.0008
av. radius rG(0.05)superscript𝑟𝐺0.05r^{G}(0.05)italic_r start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( 0.05 ) (Thm.3) 0.0142 0.0031 0.0069 0.0148 0.0013 0.0009
av. radius r(0.05)𝑟0.05r(0.05)italic_r ( 0.05 ) (Thm.2) 0.0304 0.0060 0.0155 0.0295 0.0026 0.0016
hW(0.05)superscript𝑊0.05h^{W}(0.05)italic_h start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT ( 0.05 ) 0.9437 0.9353 0.8692 0.9582 0.9246 0.9444
hG(0.05)superscript𝐺0.05h^{G}(0.05)italic_h start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( 0.05 ) 0.9787 0.9684 0.9691 0.9799 0.9665 0.9687
h(0.05)0.05h(0.05)italic_h ( 0.05 ) 1 1 1 1 1 0.9999
hSW(0.05)subscriptsuperscript𝑊𝑆0.05h^{W}_{S}(0.05)italic_h start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ( 0.05 ) 0.7661 0.8941 0.6783 0.8629 0.8745 0.9041
hSG(0.05)subscriptsuperscript𝐺𝑆0.05h^{G}_{S}(0.05)italic_h start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ( 0.05 ) 0.8852 0.9421 0.8948 0.9226 0.9396 0.9425
hS(0.05)subscript𝑆0.05h_{S}(0.05)italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ( 0.05 ) 0.9999 1 1 0.9999 1 0.9998
Table 1: Experiments for sparse regression for Gaussian and Fourier matrix. Every experiment uses 500 estimation and 250 evaluation data.

UQ for MRI Reconstruction with Neural Networks

In this section, we present more experiments for UQ for MRI reconstruction with neural networks. Our experimental settings, as well as our code for this experiment, are based on the paper and code 333https://github.com/jmaces/robust-nets by [Genzel et al., 2022a]. The dataset used for conducting the experiments is the fastMRI single-coil knee dataset. For documentation, see [Zbontar et al., 2019, Knoll et al., 2020].
Table 2 represents the results obtained by learning the reconstruction function X^^𝑋\hat{X}over^ start_ARG italic_X end_ARG using the It-net with 8888 layers, with 60%,40%percent60percent4060\%,40\%60 % , 40 % and 30%percent3030\%30 % radial undersampling and for noise levels obtained by adding complex gaussian noise with standard deviation σ=60𝜎60\sigma=60italic_σ = 60 and σ=84𝜎84\sigma=84italic_σ = 84, respectively. Similarly, Table 3 shows the results obtained by the U-Net. In Figure 6, the asymptotic hit rates and the Gaussian adjusted ones for the 95%percent9595\%95 % confidence level are compared in a box plot for each experiment.

All It-Net and U-Nets are trained with a combination of the MS-SSIM-loss [Wang et al., 2003], the 1subscript1\ell_{1}roman_ℓ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT-loss and the Adam optimizer with a learning rate of 5e55superscript𝑒55e^{-5}5 italic_e start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT, epsilon of 1e41superscript𝑒41e^{-4}1 italic_e start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT and weight decay parameter 1e51superscript𝑒51e^{-5}1 italic_e start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT. The It-Nets were trained for 15151515 epochs, and the U-Nets were trained for 20202020 epochs, both with batch size 40404040. Every U-Net has 2222 input and output channels, 24242424 base channels, and encodes the image to a size of 20×20202020\times 2020 × 20 and at most 384384384384 channels. The It-Net employs the U-Net in each layer as a residual network and has a data consistency part around each U-Net in every layer.

Comparing the tables, the It-Net has, in general, better hit rates as well as a better R/W ratio than the U-Net due to its more accurate reconstruction. Further, the hit rates for all the pixels are higher than those obtained only for the support. For achieving reliable results for safety-critical applications, obtaining hit rates higher than the confidence level is crucial, especially on the support, i.e., on the non-zeros pixels. Otherwise, one might achieve a certain confidence level overall but cannot trust the pixels of interest.

The experiments were conducted using Pytorch 1.91.91.91.9 on a desktop with AMD EPYC 7F52 16-Core CPUs and NVIDIA A100 PCIe030 030 40GB GPUs. The code for the experiments can be found in the supplementary material. The execution time of the code is around 5555 hours for each It-Net, 2222 hours for each U-Net, and around 30303030 minutes for the rest of each experiment. So, in total, this gives us a time of 48484848 hours for the MRI reconstruction experiments. The execution time for the classical model-based regression experiments takes 5555 to 30303030 minutes each; therefore, in total, it is less than 3333 hours.

Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Figure 5: Knee MRI groundtruth image from fastMRI dataset 5(a) [Zbontar et al., 2019, Knoll et al., 2020], radial sampling mask 5(b) and undersampled k-space data 5(c).
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Refer to caption
(e)
Refer to caption
(f)
Refer to caption
(g)
Refer to caption
(h)
Refer to caption
(i)
Refer to caption
(j)
Refer to caption
(k)
Refer to caption
(l)
Figure 6: Box plots for hit rates of neural network experiments. The settings are those described in Table 2 and Table 3. The first row presents the hit rates for the different It-nets for 95%percent9595\%95 % confident intervals and the second the hit rates for the U-Nets for 95%percent9595\%95 % confident intervals, e.g., 6(a) and 6(g) correspond to the first columns of the tables. In each plot, the left box represents the asymptotic hit rates, the right the Gaussian-adjusted ones.
undersampling 60% 40% 30%
noise level 15% 10% 12% 8% 10% 7%
R/W 2subscript2\ell_{2}roman_ℓ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT-ratio 0.3558 0.3800 0.6332 0.6922 0.8759 0.5759
R/W subscript\ell_{\infty}roman_ℓ start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT-ratio 0.3842 0.4929 0.6268 0.6924 1.0614 0.6282
h(0.05)0.05h(0.05)italic_h ( 0.05 ) 1.0 1.0 1.0 1.0 1.0 1.0
hG(0.05)superscript𝐺0.05h^{G}(0.05)italic_h start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( 0.05 ) 0.9904 0.9899 0.9925 0.9919 0.9926 0.9893
hW(0.05)superscript𝑊0.05h^{W}(0.05)italic_h start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT ( 0.05 ) 0.9737 0.9787 0.9632 0.9744 0.8825 0.9806
h(0.1)0.1h(0.1)italic_h ( 0.1 ) 0.9999 0.9999 1.0 1.0 1.0 1.0
hG(0.1)superscript𝐺0.1h^{G}(0.1)italic_h start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( 0.1 ) 0.9754 0.9752 0.9800 0.9793 0.98 0.9741
hW(0.1)superscript𝑊0.1h^{W}(0.1)italic_h start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT ( 0.1 ) 0.9403 0.9502 0.9196 0.9409 0.8094 0.9581
hS(0.05)subscript𝑆0.05h_{S}(0.05)italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ( 0.05 ) 1.0 1.0 1.0 1.0 1.0 1.0
hSG(0.05)superscriptsubscript𝑆𝐺0.05h_{S}^{G}(0.05)italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( 0.05 ) 0.965 0.99 0.995 0.9995 0.985 0.985
hSW(0.05)superscriptsubscript𝑆𝑊0.05h_{S}^{W}(0.05)italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT ( 0.05 ) 0.94 0.91 0.955 0.915 0.975 0.965
hS(0.1)subscript𝑆0.1h_{S}(0.1)italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ( 0.1 ) 1.0 1.0 1.0 1.0 1.0 1.0
hSG(0.1)superscriptsubscript𝑆𝐺0.1h_{S}^{G}(0.1)italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( 0.1 ) 0.955 0.98 0.985 0.995 0.98 0.975
hSW(0.1)superscriptsubscript𝑆𝑊0.1h_{S}^{W}(0.1)italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT ( 0.1 ) 0.89 0.87 0.92 0.845 0.955 0.94
Table 2: Experiments for It-Net with 8 iterations. Results of hit rates averaged over k=100𝑘100k=100italic_k = 100 samples.
undersampling 60% 40% 30%
noise level 15% 10% 12% 8% 10% 7%
R/W 2subscript2\ell_{2}roman_ℓ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT-ratio 0.2747 0.3976 0.6182 0.6671 1.2641 1.3399
R/W subscript\ell_{\infty}roman_ℓ start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT-ratio 0.3923 0.4292 0.5681 0.7082 1.1668 1.2501
h(0.05)0.05h(0.05)italic_h ( 0.05 ) 1.0 1.0 1.0 1.0 1.0 1.0
hG(0.05)superscript𝐺0.05h^{G}(0.05)italic_h start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( 0.05 ) 0.9831 0.9857 0.9885 0.9903 0.9952 0.9943
hW(0.05)superscript𝑊0.05h^{W}(0.05)italic_h start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT ( 0.05 ) 0.9642 0.9687 0.9439 0.9562 0.7277 0.7002
h(0.1)0.1h(0.1)italic_h ( 0.1 ) 0.9994 0.9999 1.0 1.0 1.0 1.0
hG(0.1)superscript𝐺0.1h^{G}(0.1)italic_h start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( 0.1 ) 0.9623 0.9674 0.9723 0.9763 0.9867 0.9853
hW(0.1)superscript𝑊0.1h^{W}(0.1)italic_h start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT ( 0.1 ) 0.9230 0.9310 0.8869 0.9091 0.6088 0.5585
hS(0.05)subscript𝑆0.05h_{S}(0.05)italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ( 0.05 ) 1.0 1.0 1.0 1.0 1.0 1.0
hSG(0.05)superscriptsubscript𝑆𝐺0.05h_{S}^{G}(0.05)italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( 0.05 ) 0.99 0.995 0.995 0.99 0.99 0.995
hSW(0.05)superscriptsubscript𝑆𝑊0.05h_{S}^{W}(0.05)italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT ( 0.05 ) 0.945 0.93 0.935 0.955 0.925 0.695
hS(0.1)subscript𝑆0.1h_{S}(0.1)italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT ( 0.1 ) 1.0 1.0 1.0 1.0 1.0 1.0
hSG(0.1)superscriptsubscript𝑆𝐺0.1h_{S}^{G}(0.1)italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_G end_POSTSUPERSCRIPT ( 0.1 ) 0.985 0.965 0.985 0.975 0.98 0.995
hSW(0.1)superscriptsubscript𝑆𝑊0.1h_{S}^{W}(0.1)italic_h start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_W end_POSTSUPERSCRIPT ( 0.1 ) 0.895 0.895 0.91 0.895 0.875 0.605
Table 3: Experiments for U-Nets. Results of hit rates averaged over k=100𝑘100k=100italic_k = 100 samples.

Appendix D Distribution Visualization of Remainder Term

In Figure 7, we present a series of histograms illustrating the empirical distribution of the remainder term’s real part across all experimental settings in sparse regression conducted in this paper. These histograms provide evidence that the remainder term can be approximated by a Gaussian distribution, with the approximation becoming increasingly precise as the dimensionality increases. Across low-dimensional scenarios, the empirical distributions exhibit some deviations from the Gaussian form, but these discrepancies diminish as the dimensionality grows larger. In high-dimensional regimes, the empirical distributions demonstrate an exceptional degree of convergence to the Gaussian approximation. This close alignment lends strong support to the validity of the key assumption of Theorem 3, allowing a Gaussian adjustment to the confidence intervals.

In Figure 8 we present a series of histograms representing the empirical distribution of the remainder term’s real part for the six different experimental settings, for the U-Net, conducted in this paper. Figure 9 represents the histograms for the It-Net experiments. In most scenarios, the real part of the remainder term is Gaussian distributed with mean 00. The only exceptions are Figures 8(e) and 8(f), which correspond to the U-Net experiments with 30%percent3030\%30 % undersampling.

Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Refer to caption
(e)
Refer to caption
(f)
Figure 7: Histograms showing the empirical distribution of the remainder component. The real part of R𝑅Ritalic_R is plotted as a histogram and the red line corresponds to a Gaussian fit. One realization of the remainder term is visualized for each of the experiments described in Table 1. (7(a) corresponds to the first column, 7(b) to the second column and so on. The plots for the imaginary part look similar.)
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Refer to caption
(e)
Refer to caption
(f)
Figure 8: Histograms showing the empirical distribution of the remainder component. The real part of R𝑅Ritalic_R is plotted as a histogram and the red line corresponds to a Gaussian fit. One realization of the remainder term is visualized for each of the experiments described in Table 3. (8(a) corresponds to the first column, 8(b) to the second column and so on.)
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Refer to caption
(e)
Refer to caption
(f)
Figure 9: Histograms showing the empirical distribution of the remainder component. The real part of R𝑅Ritalic_R is plotted as a histogram and the red line corresponds to a Gaussian fit. One realization of the remainder term is visualized for each of the experiments described in Table 2. (9(a) corresponds to the first column, 9(b) to the second column and so on. )