Image super-resolution with PCA reduced generalized Gaussian mixture models

Dang-Phuong-Lan Nguyen ∗†, Johannes Hertrich ‡, Jean-Francois Aujol ∗, Yannick Berthoumieu †

May 11, 2022

Abstract

Patch-based single-image super-resolution algorithms have received much attention and have been widely used over the past decade. Recently, generalized Gaussian mixture models (GGMMs) have been shown to be a suitable tool for many image processing problems because of their flexible shape parameter. In this work, we first propose to use a joint GGMM learned from concatenated vectors of high- and low-resolution training patches. For each low-resolution patch, we compute the minimum mean square error (MMSE) estimator and generate the high-resolution image by averaging these estimates. We select the MMSE approach with the GGMM because the method is invariant to affine contrast changes and does not require knowledge of the linear super-resolution operator. Unfortunately, the large dimension of the concatenated high- and low-resolution patches leads to instabilities and an intractable computational effort when estimating the parameters of the GGMM. Thus, we propose to combine a GGMM with a principal component analysis and derive an EM algorithm for estimating the parameters of the arising model. We demonstrate the performance of our model by numerical examples on synthetic and real images of materials' microstructures.

Key words: Generalized Gaussian mixture model, image super-resolution, high-dimensional data, dimensionality reduction.

1 Introduction

Recent developments in imaging techniques deeply modify the way materials scientists consider their objects of research. The motivation to develop new super-resolution (SR) methods originates in these modifications: material images have moved from micron-scale resolution to true nano-scale. Studying material data is very challenging because high-resolution (HR) and low-resolution (LR) images have different contrasts. Moreover, the super-resolution operator is not known. In this article, we introduce a novel SR method that deals with the unknown operator and with the problem of an affine contrast change.

Single-image super-resolution (SISR) aims to reconstruct a high-resolution (HR) image from a single low-resolution (LR) input image. In general, the LR image is generated by

X_L = A X_H + ε,

where X_L is the low-resolution image, X_H is the high-resolution image, and A is an unknown super-resolution operator. Image restoration techniques can be grouped into two main approaches: pixel-based and patch-based.
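To make the degradation model concrete, the following NumPy sketch generates a low-resolution observation from a high-resolution image by Gaussian blurring, strided subsampling and additive white Gaussian noise. It is only an illustration of a possible operator A, not the paper's implementation (which was written in Matlab); the operator actually used in the experiments is defined in Section 6.1, and the blur width, stride and noise level below are placeholder values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(x_h, q=2, blur_sigma=0.5, noise_sigma=0.02, rng=None):
    """Toy forward model X_L = A X_H + eps: blur, subsample by q, add noise."""
    rng = np.random.default_rng() if rng is None else rng
    blurred = gaussian_filter(x_h, sigma=blur_sigma)   # blur operator H
    x_l = blurred[::q, ::q]                            # simple strided downsampling S
    return x_l + noise_sigma * rng.standard_normal(x_l.shape)
```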
Patch-based methods are a popular and successful class of image restoration techniques, especially in super-resolution.

∗ Univ. Bordeaux, Bordeaux INP, CNRS, IMB, UMR 5251, F-33400 Talence, France, {lan.nguyen, jeanfrancois.aujol}@math.u-bordeaux.fr
† Univ. Bordeaux, Bordeaux INP, CNRS, IMS, UMR 5218, F-33400 Talence, France, yannick.berthoumieu@imsbordeaux.fr
‡ TU Berlin, Straße des 17. Juni 136, D-10587 Berlin, Germany, j.hertrich@math.tu-berlin.de

Among the patch-based methods for image super-resolution, Zoran and Weiss [1] proposed a method called expected patch log likelihood (EPLL), which can be applied to several inverse problems. This method uses the negative log-likelihood function of a Gaussian mixture model (GMM) as a regularizer of the inverse problem. The high-resolution image X_H is estimated by solving

\hat{X}_H = \operatorname{argmin}_{X_H} \|A X_H - X_L\|^2 - λ \sum_{i \in I} \log p(X_{H,i}),

where p is the probability density function of the GMM and (X_{H,i})_{i∈I} are the patches of the high-resolution image X_H. In the following years, S. Parameswaran et al. extended the EPLL method by various accelerations in [2], called fast EPLL (FEPLL). Before the reconstruction step, these approaches must learn the parameters of the GMM from the HR training patches. A GMM can be learned by the expectation-maximization (EM) algorithm [3, 4]. In the past decade, various works [5, 6, 7] have shown that the generalized Gaussian mixture model (GGMM) has more flexibility to adapt to the shape of the data probability density function and is less sensitive to overfitting the number of classes than the GMM. In [8], C. Deledalle et al. proposed a method that uses the GGMM within the EPLL algorithm (EPLL-GGMM). They showed that the GGMM fits the distribution of patches better than a GMM and that it outperforms the GMM when used in the EPLL framework. However, the EPLL-GGMM model learns the parameters of the mixture model by estimating the covariance matrix with the formula

Σ_k = \frac{\sum_{i=1}^N ξ_{i,k} X_{H,i} X_{H,i}^T}{\sum_{i=1}^N ξ_{i,k}},

where {X_{H,i}}_{i=1}^N is the set of all HR patches and ξ_{i,k} is the conditional probability of component k given the patch X_{H,i}. Unfortunately, this formula is not valid in the GGMM setting. In [6], Z. Boukouvalas et al. introduced a Riemannian averaged fixed-point (RA-FP) algorithm for multivariate generalized Gaussian distribution (MGGD) parameter estimation. The RA-FP algorithm can effectively estimate the scatter matrix for any value of the shape parameter: for instance, β = 1 is the Gaussian distribution and β = 0.5 is the Laplacian distribution. For the generalized Gaussian mixture model, we generalize the algorithm from [9] to the weighted maximum likelihood estimation within the EM algorithm, based on a fixed-point iteration. This algorithm estimates the parameters of the mixture model, including the shape parameter of each component; we call it the FP-EM algorithm for generalized Gaussian mixture models. Furthermore, in some cases, the HR and LR images of real data may not have the same contrast. Both EPLL and FEPLL, for GMM or GGMM models, require knowledge of the operator A of the super-resolution problem and of the contrast change parameters, which we do not know in practice. To avoid this requirement, P. Sandeep et al. [10] proposed a method for the SR problem using a joint GMM, which is invariant to affine changes of contrast.
It describes the distribution of the concatenated vectors of high- and low-resolution patches sampled from a large database of pairs of high-resolution images and the corresponding low-resolution ones. Each high-resolution patch is reconstructed from the low-resolution one by using the minimum mean squared error (MMSE) estimator,

\hat{X}_H = T_{\mathrm{MMSE}}(X_L), \quad T_{\mathrm{MMSE}} \in \operatorname{argmin}_T \mathrm{MSE}(T(X_L) - X_H).

In our work, we provide a method that uses the MMSE estimator for super-resolution based on a GGMM for joint HR-LR modeling, called MMSE-GGMM. This method adapts the FP-EM algorithm to learn the GGMM models. Our method not only avoids learning the operator A, but is also invariant to affine contrast changes. This is demonstrated by Theorem 4 as well as by the experimental results in Section 6. However, the EM algorithm for these models (GMMs or GGMMs) becomes very slow when the number of data points is large and when three-dimensional images lead to high-dimensional patches. To overcome these issues, we propose a dimensionality reduction method.

The standard method for dimensionality reduction is the principal component analysis (PCA) [11], which was introduced in 1901 by Karl Pearson. Based on this method, [12] introduced the probabilistic principal component analysis (PPCA) for the Gaussian distribution. It was extended to Gaussian mixture models in [13] with the mixture of probabilistic principal component analysers (MPPCA) model. Following these works, [14, 15] directly proposed a GMM with PCA-based constrained covariance matrices. J. Hertrich et al. proposed the PCA-GMM model in [16], which combines the GMM with a PCA by adding the PCA objective to the negative log-likelihood function of the GMM evaluated on the dimensionality-reduced data points. The authors rewrote this minimization problem as the negative log-likelihood function of a Gaussian mixture model with additional constraints on the parameters.

To extend the combination of the PCA with the GMM to the GGMM, we introduce the PCA-GGMM model. The PCA-GGMM model combines the PCA objective with the negative log-likelihood function of the GGMM on the low-dimensional data points. We derive an EM algorithm with a special M-step for finding a minimizer of our objective function. The M-step requires solving maximization problems with constraints on the Stiefel manifold. Due to the complexity of the GGMMs and the non-convexity of the PCA-GGMM objective, we use half-quadratic splitting [17] to tackle the minimization problem in the M-step of the EM algorithm. Finally, the PCA-GGMM model can be used for the super-resolution method based on the MMSE estimator for GGMMs.

The main contribution of the paper is to use the GGMM model for SR. We also show that the MMSE method for the GGMM model is invariant to affine contrast changes. Finally, this paper adapts the PCA method of dimensionality reduction to the GGMM model.

The paper is organized as follows. Section 2 briefly recalls the super-resolution approach introduced in [10] using joint mixture models. We revisit the EM algorithm for mixture models and describe the FP-EM algorithm for generalized Gaussian mixture models in Section 3. In Section 4, the PCA-reduced generalized Gaussian distribution is introduced and we derive an algorithm for estimating its parameters. In Section 5, the MMSE estimator for the generalized Gaussian distribution and the affine transformation of contrast and brightness are discussed. Finally, Section 6 shows numerical examples of super-resolution based on our models on 2D and 3D images.
The paper concludes with a discussion in Section 7.

2 Super-Resolution via Joint Mixture Models

In this section, we briefly revisit the super-resolution approach of [10] using joint mixture models. Let {P_θ : θ ∈ Θ} be a parametric family of probability distributions with density functions p(·|θ). Then, a mixture model is a probability distribution defined by the probability density function

F(x | w, Θ) = \sum_{k=1}^K w_k p(x | θ_k),  w = (w_1, ..., w_K),  Θ = (θ_1, ..., θ_K),    (1)

where the weights w_k are non-negative and sum up to one. If θ = (µ, Σ) and P_θ = N(µ, Σ), we speak of a Gaussian mixture model (GMM).

Throughout this paper, we aim to reconstruct the unknown high-resolution image X_H based on a low-resolution observation X_L. Here, we assume that we are given a reference image for which the high-resolution image x̃_H as well as the low-resolution observation x̃_L are both known. In this setting, Sandeep and Jacob proposed in [10] to reconstruct X_H using the following steps.

1. Learning a joint mixture model. In a first step, for given low-resolution patches x̃_{L,i} ∈ R^{τ²} of an image and their high-resolution counterparts x̃_{H,i} ∈ R^{q²τ²}, q ∈ N, q ≥ 2, i = 1, ..., N, we approximate the distribution of the joint high- and low-resolution patches

x_i = \begin{pmatrix} x̃_{H,i} \\ x̃_{L,i} \end{pmatrix} ∈ R^n,  n = (q²+1)τ²,

by a mixture model. Here, Sandeep and Jacob use GMMs such that the resulting approximation is characterized by the parameters w = (w_k)_k, µ = (µ_k)_k, Σ = (Σ_k)_k with

µ_k = \begin{pmatrix} µ_{H,k} \\ µ_{L,k} \end{pmatrix},  Σ_k = \begin{pmatrix} Σ_{H,k} & Σ_{HL,k} \\ Σ_{HL,k}^T & Σ_{L,k} \end{pmatrix}.

2. Estimation of the high-resolution patches using the MMSE estimator. In the second step, we estimate the high-resolution correspondence of a given low-resolution patch x_L ∈ R^{τ²}. For this purpose, we first select the component k* such that the likelihood that x_L belongs to the k*-th component is maximal, i.e., we compute

k* = \operatorname{argmax}_{k=1,...,K} w_k p(x_L | µ_{L,k}, Σ_{L,k}).

Afterwards, we estimate the high-resolution patch x_H ∈ R^{q²τ²} as the minimum mean square estimator (MMSE) of X_H given X_L = x_L for a random variable X = (X_H, X_L) ∼ P_{θ_{k*}}. For Gaussian distributions, this corresponds to

x̂_H = µ_{H,k*} + Σ_{HL,k*} Σ_{L,k*}^{-1} (x_L − µ_{L,k*}).

We give a more detailed explanation of MMSE estimators in Section 5.

3. Reconstruction of the high-resolution image by patch averaging. Finally, we estimate for any patch within the low-resolution image the high-resolution correspondence. Afterwards, we reconstruct the high-resolution image as follows: let x_H = (x_{k,l})_{k,l=1}^{qτ} ∈ R^{qτ×qτ} be a two-dimensional high-resolution patch. Then, we assign to each pixel x_{k,l} the weight

ρ_{k,l} := \exp\Big(−\frac{γ}{2}\big((k − \tfrac{qτ+1}{2})^2 + (l − \tfrac{qτ+1}{2})^2\big)\Big).

After that, we add up for each pixel in the high-resolution image the corresponding weighted pixel values and normalize the result by dividing by the sum of the weights. A small numerical sketch of steps 2 and 3 is given below.

Contributions of this paper. Sandeep and Jacob considered Gaussian mixture models in [10]. However, it is well known that GMMs are very flat-tailed and therefore limited in their ability to approximate patch distributions. Furthermore, for large magnification factors or three-dimensional images, the dimension of the joint mixture model becomes very high. Unfortunately, this leads to numerical instabilities and an intractable computational effort for estimating the joint mixture model. Thus, we propose within this paper two improvements of Sandeep and Jacob's super-resolution method [10]. First, we replace the Gaussian distributions by generalized Gaussian distributions.
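As a reading aid, the following NumPy sketch illustrates steps 2 and 3 above in the Gaussian baseline of [10]: the most likely component for a low-resolution patch is selected, the conditional mean serves as the high-resolution estimate, and the Gaussian pixel weights ρ_{k,l} are computed for the averaging step. The variable names are ours and the snippet is only a sketch of the principle, not the paper's implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mmse_gaussian_patch(x_l, weights, mu_H, mu_L, Sigma_L, Sigma_HL):
    """Step 2: select the most likely component k* for x_l and return the
    conditional mean mu_H[k*] + Sigma_HL[k*] Sigma_L[k*]^{-1} (x_l - mu_L[k*])."""
    logp = [np.log(w) + multivariate_normal.logpdf(x_l, m, S)
            for w, m, S in zip(weights, mu_L, Sigma_L)]
    k = int(np.argmax(logp))
    return mu_H[k] + Sigma_HL[k] @ np.linalg.solve(Sigma_L[k], x_l - mu_L[k])

def averaging_weights(q, tau, gamma=1.0):
    """Step 3: Gaussian weights rho_{k,l} centred in a (q*tau) x (q*tau) patch."""
    c = (q * tau + 1) / 2.0
    k, l = np.meshgrid(np.arange(1, q * tau + 1), np.arange(1, q * tau + 1), indexing="ij")
    return np.exp(-gamma / 2.0 * ((k - c) ** 2 + (l - c) ** 2))
```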
Generalized Gaussian distributions contain a shape parameter, which can adapt to more heavy-tailed or more flat-tailed distributions. Second, we incorporate a dimensionality reduction within mixture models of elliptical distributions and derive an algorithm for estimating its parameters. In addition to these methodological improvements, we apply our method to three-dimensional real-world images showing material microstructures.

3 Parameter Estimation of Mixture Models

In this section, we focus on the parameter estimation of mixture models. For this purpose, we employ the expectation-maximization (EM) algorithm. First, in Subsection 3.1, we revisit the generic form of the EM algorithm for mixture models. This algorithm contains as a subproblem the weighted maximum likelihood estimation for the corresponding parametric distribution family. Thus, we consider this subproblem in Subsection 3.2 for some elliptical distributions, namely the Gaussian distribution and the generalized Gaussian distribution.

3.1 EM algorithm for Mixture Models

In the following, we consider the EM algorithm to estimate the parameters of mixture models. EM algorithms were introduced in [18] and can be used for various probability distributions; we refer to [3] for an overview. Given samples x_1, ..., x_N, the EM algorithm aims to minimize the negative log-likelihood function

L(w, Θ) = −\frac{1}{N} \sum_{i=1}^N \log(F(x_i | w, Θ)),

where F is defined as in (1). Then, the EM algorithm for mixture models reads as Algorithm 1, see e.g. [19].

Algorithm 1 EM Algorithm for Mixture Models
Input: x = (x_1, ..., x_N) ∈ R^{n×N}, initial estimate w^{(0)}, Θ^{(0)}.
for r = 0, 1, ... do
  E-Step: for k = 1, ..., K and i = 1, ..., N compute
    α_{i,k}^{(r)} = \frac{w_k^{(r)} p(x_i | θ_k^{(r)})}{\sum_{j=1}^K w_j^{(r)} p(x_i | θ_j^{(r)})}.
  M-Step: for k = 1, ..., K compute
    w_k^{(r+1)} = \frac{1}{N} \sum_{i=1}^N α_{i,k}^{(r)},
    θ_k^{(r+1)} = \operatorname{argmax}_{θ_k} \sum_{i=1}^N α_{i,k}^{(r)} \log(p(x_i | θ_k)).
end for

As outlined in the algorithm, we have to compute the weighted maximum likelihood estimator as a subproblem of the EM algorithm, i.e., we need to maximize

ℓ(θ) = \sum_{i=1}^N α_i \log(p(x_i | θ))    (2)

for the underlying parametric distribution family. For readability, in the remainder of the article we denote α_{i,k} by α_i and θ_k by θ. In the following sections, we show how the EM algorithm can be carried out for various distributions.

3.2 Weighted Maximum Likelihood Estimation for the Generalized Gaussian Distribution

In order to solve subproblem (2) within the EM algorithm for mixture models, we consider in the following the weighted maximum likelihood estimation for several elliptical distributions. Let SPD(n) be the set of symmetric positive definite n × n matrices. A probability distribution E_n(µ, Σ, g) with µ ∈ R^n and Σ ∈ SPD(n) is called an elliptical distribution if it admits a density function

p(x | µ, Σ, g) ∝ |Σ|^{−1/2} g((x − µ)^T Σ^{−1} (x − µ)),

where g : R_{≥0} → R_{≥0} is Lebesgue measurable with \int_0^∞ t^{n/2−1} g(t)\, dt < ∞. In this paper, we consider the Gaussian and the generalized Gaussian distribution as examples of elliptical distributions.

Gaussian distribution. A probability distribution is a Gaussian distribution N(µ, Σ) with mean µ ∈ R^n and covariance matrix Σ ∈ SPD(n) if it has the probability density function

p(x | µ, Σ) = \frac{1}{(2π)^{n/2} |Σ|^{1/2}} \exp\big(−\tfrac{1}{2}(x − µ)^T Σ^{−1} (x − µ)\big).

The Gaussian distribution is an elliptical distribution with g(t) = exp(−t/2). For samples x_1, ..., x_N with weights α_1, ..., α_N, we can calculate the weighted maximum likelihood estimator of µ and Σ, i.e., the solution of (2), by setting the gradients of the objective function to zero.
This leads to the estimators

µ̂ = \frac{1}{\sum_{i=1}^N α_i} \sum_{i=1}^N α_i x_i,  Σ̂ = \frac{1}{\sum_{i=1}^N α_i} \sum_{i=1}^N α_i (x_i − µ̂)(x_i − µ̂)^T = \frac{1}{\sum_{i=1}^N α_i} \sum_{i=1}^N α_i x_i x_i^T − µ̂ µ̂^T.

Generalized Gaussian distribution. We call a probability distribution a generalized Gaussian distribution GG(µ, Σ, β) with location µ ∈ R^n, scatter matrix Σ ∈ SPD(n) and shape parameter β ∈ R_{>0} if it has the probability density function

p(x | µ, Σ, β) = \frac{C(β)}{|Σ|^{1/2}} \exp\Big(−\tfrac{1}{2}\big((x − µ)^T Σ^{−1} (x − µ)\big)^β\Big),

where the normalizing constant C is given by

C(β) = \frac{β Γ(\tfrac{n}{2})}{2^{n/(2β)} π^{n/2} Γ(\tfrac{n}{2β})}.

Here, Γ denotes the gamma function. For small values of the shape parameter β, we obtain heavy-tailed distributions, while large values of β mean that the corresponding generalized Gaussian distribution is flat-tailed. The generalized Gaussian distribution includes the Gaussian distribution for β = 1 and the Laplace distribution for β = 1/2. The generalized Gaussian distribution is an elliptical distribution with g(t) = exp(−½ t^β), β > 0.

In the literature, several algorithms have been proposed to estimate the parameters of a generalized Gaussian distribution from samples x_1, ..., x_N. Most of these methods are based on fixed-point (FP) iterations [7, 20] or on first-order computations on the Riemannian manifold of parameters [6]. For this work, we generalize the algorithm from [9] to the weighted maximum likelihood estimation (2), i.e., to maximizing

ℓ(µ, Σ, β) = \sum_{i=1}^N α_i \log(p(x_i | µ, Σ, β))

for some weights α_1, ..., α_N > 0. Here, setting the gradient to zero yields that any maximizer (µ, Σ, β) of ℓ fulfills

µ = \frac{\sum_{i=1}^N α_i δ_i^{β−1} x_i}{\sum_{i=1}^N α_i δ_i^{β−1}},  Σ = \frac{\sum_{i=1}^N α_i δ_i^{β−1} (x_i − µ)(x_i − µ)^T}{\sum_{i=1}^N α_i},  β = β + ρ ∇_β ℓ(µ, Σ, β),

where δ_i = (x_i − µ)^T Σ^{−1} (x_i − µ) and ρ > 0. This motivates us to maximize ℓ by generating a sequence (µ^{(r)}, Σ^{(r)}, β^{(r)}) with the fixed-point (FP) iteration

µ^{(r+1)} = \frac{\sum_{i=1}^N α_i (δ_i^{(r)})^{β^{(r)}−1} x_i}{\sum_{i=1}^N α_i (δ_i^{(r)})^{β^{(r)}−1}},
Σ^{(r+1)} = \frac{\sum_{i=1}^N α_i (δ_i^{(r)})^{β^{(r)}−1} (x_i − µ^{(r)})(x_i − µ^{(r)})^T}{\sum_{i=1}^N α_i},
β^{(r+1)} = β^{(r)} + ρ ∇_β ℓ(µ^{(r)}, Σ^{(r)}, β^{(r)}),

where δ_i^{(r)} = (x_i − µ^{(r)})^T (Σ^{(r)})^{−1} (x_i − µ^{(r)}). Note that ∇_β ℓ(µ, Σ, β) can be explicitly derived as

∇_β ℓ(µ, Σ, β) = \Big(\frac{1}{β} + \frac{n}{2β^2}\big(ψ(\tfrac{n}{2β}) + \log 2\big)\Big) \sum_{i=1}^N α_i − \frac{1}{2} \sum_{i=1}^N α_i δ_i^β \log(δ_i),

where ψ denotes the digamma function. The EM algorithm with this FP iteration in the M-step for the GGMM is called the FP-EM algorithm. For the sake of completeness, we add the detailed computations in Appendix A.

Remark. In the unweighted case, i.e., α_1 = ... = α_N, the authors of [9] show convergence of the above iteration.

Remark. Note that for the estimation of β, several other approaches could be considered, such as the Newton-Raphson method [21].

4 Dimensionality-Reduced Generalized Gaussian Distribution

4.1 Combining PCA with the generalized Gaussian distribution

When considering high-dimensional probability distributions, the EM algorithm (Algorithm 1) for mixture models can be numerically unstable and computationally costly. Thus, we aim to reduce the dimension of the corresponding probability distributions. For this purpose, we combine principal component analysis (PCA) with generalized Gaussian distributions. The PCA finds the affine subspace {U t + b : t ∈ R^d}, 1 ≤ d ≪ n, with the smallest squared distance from the samples x_1, ..., x_N by minimizing

P(U, b) = \sum_{i=1}^N \|(U U^T − I)(x_i − b)\|^2

for b ∈ R^n and U ∈ St(d, n), where St(d, n) := {U ∈ R^{n×d} : U^T U = I} is the Stiefel manifold; a small sketch of the classical PCA solution is given below. In the following, we consider samples x_1, ..., x_N ∈ R^n which are approximately located in a d-dimensional subspace of R^n.
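For orientation, the classical PCA problem P(U, b) has a well-known closed-form solution: b is the sample mean and U collects the d leading principal directions of the centred samples. The following NumPy sketch is our own illustration of this fact, not code from the paper.

```python
import numpy as np

def pca_subspace(x, d):
    """Minimize P(U, b) = sum_i ||(U U^T - I)(x_i - b)||^2 for samples x of shape (n, N).

    Returns U in St(d, n) (the d leading left singular vectors of the centred data)
    and the sample mean b."""
    b = x.mean(axis=1, keepdims=True)
    U_full, _, _ = np.linalg.svd(x - b, full_matrices=False)
    return U_full[:, :d], b.ravel()
```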
Then, our objective is to find simultaneously the subspace {U t + b : t ∈ R^d}, U ∈ St(d, n), b ∈ R^n, that contains the samples x_i and a generalized Gaussian distribution GG(µ, Σ, β) with density function q(·|µ, Σ, β). For this, we propose to minimize the function

ℓ_PCA(U, b, µ, Σ, β) := \sum_{i=1}^N \frac{1}{2σ^2} \|(U U^T − I)(x_i − b)\|^2 − \log(q(U^T (x_i − b) | µ, Σ, β)),    (3)

which is a weighted sum of a PCA term and the negative log-likelihood function of q within the lower-dimensional subspace. The following proposition gives a general property of the function ℓ_PCA for the generalized Gaussian distribution. The proof is given in Appendix B.

Proposition 1. Let GG(µ, Σ, β) be a generalized Gaussian distribution with density function q(·|µ, Σ, β). The function ℓ_PCA(U, b, µ, Σ, β) is, up to a constant, the negative log-likelihood function of the probability density

p(x | U, b, µ, Σ, β) := \frac{1}{(2πσ^2)^{(n−d)/2}} \exp\big(−\tfrac{1}{2σ^2} \|(U U^T − I)(x − b)\|^2\big)\, q(U^T (x − b) | µ, Σ, β).

The distribution P_θ with θ = (U, b, µ, Σ, β) corresponding to this density is given by

P_θ = r_\#\big(GG(µ, Σ, β) ⊗ N(0, σ^2 I_{n−d})\big),  r(x) = V x + b,

where V = (U | Ũ) is an orthogonal matrix, ⊗ denotes the product measure and r_\# P is the push-forward measure of P under r.

Due to its relationship to the PCA, we call the probability distribution p(·|U, b, µ, Σ, β) a PCA-reduced generalized Gaussian distribution.

Remark (PCA-reduced Gaussian distributions). The authors of [16] used mixture models of PCA-reduced Gaussian distributions. In particular, they proved that a PCA-reduced Gaussian distribution is again a Gaussian distribution with a specially structured covariance matrix. Using this result, they derived an EM algorithm for PCA-reduced Gaussian mixture models, which showed a significant speed-up compared to a high-dimensional GMM. However, for the generalized Gaussian distribution, the PCA-reduced counterpart is not a generalized Gaussian distribution, such that the analysis of [16] is no longer applicable.

In the following subsection, we introduce a method for estimating the parameters of mixture models of PCA-reduced generalized Gaussian distributions. We call this mixture model the PCA-GGMM model.

4.2 Weighted Maximum Likelihood Estimation

To estimate the parameters of a PCA-reduced generalized Gaussian distribution from samples x_1, ..., x_N with weights α_1, ..., α_N, our objective is to minimize the weighted negative log-likelihood function

−\sum_{i=1}^N α_i \log(p(x_i | U, b, µ, Σ, β)).

By Proposition 1, this is equivalent to minimizing the function

ℓ_{PCA,α}(U, b, µ, Σ, β) := \sum_{i=1}^N α_i \Big( \frac{1}{2σ^2} \|(U U^T − I)(x_i − b)\|^2 − \log(q(U^T (x_i − b) | µ, Σ, β)) \Big),

which is, up to the weights α, the same function as ℓ_PCA from (3). Unfortunately, we cannot minimize ℓ_{PCA,α} directly. Instead, we apply a technique called half-quadratic splitting [17]. That is, instead of considering ℓ_{PCA,α}(U, b, µ, Σ, β), we minimize for some large η > 0 the function

H_{σ,η}(z, U, b, µ, Σ, β) := \sum_{i=1}^N α_i \Big( \frac{1}{2σ^2} \|(U U^T − I)(x_i − b)\|^2 + η \|U^T (x_i − b) − z_i\|^2 − \log(q(z_i | µ, Σ, β)) \Big),

i.e., we replace U^T(x_i − b) in the last summand by z_i and penalize the squared distance between U^T(x_i − b) and z_i. Thus, for η → ∞, the minimizers of H_{σ,η} converge to the minimizers of ℓ_{PCA,α}.
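To make the splitting concrete, the following sketch (our own illustration, not code from the paper) evaluates H_{σ,η} for given parameters and splitting variables z_i, using the generalized Gaussian log-density from Section 3.2; the values of σ and η are placeholders. The alternating minimization described next simply decreases this value in each block of variables.

```python
import numpy as np
from scipy.special import gammaln

def gg_logpdf(z, mu, Sigma, beta):
    """log-density of GG(mu, Sigma, beta), evaluated column-wise on z of shape (d, N)."""
    d = z.shape[0]
    logC = (np.log(beta) + gammaln(d / 2) - (d / (2 * beta)) * np.log(2)
            - (d / 2) * np.log(np.pi) - gammaln(d / (2 * beta)))
    diff = z - mu[:, None]
    delta = np.sum(diff * np.linalg.solve(Sigma, diff), axis=0)  # (z_i-mu)^T Sigma^{-1} (z_i-mu)
    return logC - 0.5 * np.linalg.slogdet(Sigma)[1] - 0.5 * delta**beta

def H(z, U, b, mu, Sigma, beta, x, alpha, sigma=1.0, eta=10.0):
    """Half-quadratic surrogate H_{sigma,eta}; x: (n, N), z: (d, N), U: (n, d), alpha: (N,)."""
    resid = (U @ U.T - np.eye(U.shape[0])) @ (x - b[:, None])    # PCA residuals
    coupling = U.T @ (x - b[:, None]) - z                        # U^T(x_i - b) - z_i
    return np.sum(alpha * (np.sum(resid**2, axis=0) / (2 * sigma**2)
                           + eta * np.sum(coupling**2, axis=0)
                           - gg_logpdf(z, mu, Sigma, beta)))
```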
Now, we minimize H_{σ,η} iteratively with respect to (U, b), z and (µ, Σ, β), i.e., we generate a sequence (U^{(r)}, b^{(r)}, µ^{(r)}, Σ^{(r)}, β^{(r)})_r by the iteration

U^{(r)} ∈ \operatorname{argmin}_{U ∈ St(d,n)} \sum_{i=1}^N α_i \Big( \frac{1}{2σ^2} \|(U U^T − I)(x_i − b^{(r−1)})\|^2 + η \|U^T (x_i − b^{(r−1)}) − z_i^{(r−1)}\|^2 \Big),    (4)

b^{(r)} ∈ \operatorname{argmin}_{b ∈ R^n} \sum_{i=1}^N α_i \Big( \frac{1}{2σ^2} \|(U^{(r)} (U^{(r)})^T − I)(x_i − b)\|^2 + η \|(U^{(r)})^T (x_i − b) − z_i^{(r−1)}\|^2 \Big),    (5)

z^{(r)} ∈ \operatorname{argmin}_{z = (z_1,...,z_N) ∈ R^{d×N}} \sum_{i=1}^N α_i \Big( η \|(U^{(r)})^T (x_i − b^{(r)}) − z_i\|^2 − \log(q(z_i | µ^{(r−1)}, Σ^{(r−1)}, β^{(r−1)})) \Big),    (6)

and finally

(µ^{(r)}, Σ^{(r)}, β^{(r)}) ∈ \operatorname{argmax}_{µ ∈ R^d, Σ ∈ SPD(d), β > 0} \sum_{i=1}^N α_i \log(q(z_i^{(r)} | µ, Σ, β)).    (7)

The final step (7) is the weighted maximum likelihood estimation of the generalized Gaussian distribution q, which was discussed in the previous section. It remains to solve the first three steps.

Solving (4) by Uzawa's algorithm. Using that for any y ∈ R^n and U ∈ St(d, n) it holds that

\|(U U^T − I) y\|^2 = y^T (U U^T − I)^2 y = y^T (U U^T U U^T − 2 U U^T + I) y = y^T (I − U U^T) y = \|y\|^2 − \|U^T y\|^2,

the optimization problem in (4) reads as

\operatorname{argmin}_{U ∈ St(d,n)} \sum_{i=1}^N α_i \Big( −\frac{1}{2σ^2} \|U^T (x_i − b)\|^2 + η \|U^T (x_i − b) − z_i\|^2 \Big).

Now, we transform the problem into an unconstrained problem by considering the Lagrangian function

L(U, Λ) = \sum_{i=1}^N α_i \Big( −\frac{1}{2σ^2} \|U^T (x_i − b)\|^2 + η \|U^T (x_i − b) − z_i\|^2 \Big) + \operatorname{tr}(Λ (U^T U − I)).

Then, solving (4) is equivalent to solving the saddle-point problem

\operatorname{argmin}_{U ∈ R^{n×d}} \max_{Λ ∈ R^{d×d}} L(U, Λ).

To solve this saddle-point problem, we use Uzawa's algorithm [22], which consists of the following two steps: first, we minimize L with respect to U; second, we perform a gradient ascent step with respect to Λ. The second step is given by

Λ^{(r+1)} = Λ^{(r)} + ρ \big((U^{(r+1)})^T U^{(r+1)} − I\big).

In the case of our specific Lagrangian L, the first step is characterized by the following proposition. The proof is given in Appendix C.

Proposition 2. Let Λ be fixed. Then, any minimizer of L(U, Λ) solves the Sylvester equation

\Big(η − \frac{1}{2σ^2}\Big) \Big(\sum_{i=1}^N α_i (x_i − b)(x_i − b)^T\Big) U + U Λ = η \sum_{i=1}^N α_i (x_i − b) z_i^T.

Solving (5). Setting the gradient of the objective function to zero leads to

\sum_{i=1}^N α_i \Big( \Big(\frac{1}{2σ^2} (I − U U^T) + η U U^T\Big)(b − x_i) − η U z_i \Big) = 0,

which is equivalent to

b = \frac{\sum_{i=1}^N α_i x_i + η \big(\frac{1}{2σ^2}(I − U U^T) + η U U^T\big)^{-1} \sum_{i=1}^N α_i U z_i}{\sum_{i=1}^N α_i}.

Solving (6) using gradient descent. Due to the sum structure of problem (6), the solution decouples into

z_i^{(r)} ∈ \operatorname{argmin}_{z_i ∈ R^d} η \|(U^{(r)})^T (x_i − b^{(r)}) − z_i\|^2 − \log(q(z_i | µ^{(r−1)}, Σ^{(r−1)}, β^{(r−1)})).    (8)

This is a d-dimensional optimization problem which, owing to its low dimension, can be solved efficiently via a gradient descent scheme; a small sketch is given at the end of this subsection.

Remark (Differentiability of the objective). In the case of generalized Gaussian distributions, the function g is given by g(x) = exp(−½ x^β), which is differentiable on R_{>0}. Thus, the objective function in (8) is differentiable for any z_i ≠ µ^{(r−1)}. In the case β ≥ 1, the derivative of g can be continuously extended to 0; in this case, the objective function in (8) is differentiable everywhere.

The reduction in dimensionality of the PCA-GGMM model reduces the execution time of the E-step, but the M-step is slower than for the plain GGMM due to its higher complexity. However, all computations in the M-step are d-dimensional optimization problems, and the FP iteration is applied to d-dimensional data. This limits the numerical instabilities that arise when estimating a generalized Gaussian mixture model of the original high-dimensional data with the FP-EM algorithm.
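As an illustration of step (8), the following sketch performs a few plain gradient-descent iterations for a single z_i in the case β ≥ 1, where the objective is differentiable everywhere. Only the δ^β term of −log q depends on z_i, so the constants of the density are dropped; the step size and iteration count are arbitrary placeholder choices, not values from the paper.

```python
import numpy as np

def solve_z(y, mu, Sigma, beta, eta, n_iter=100, step=1e-2):
    """Gradient descent for z -> eta*||y - z||^2 - log q(z | mu, Sigma, beta), beta >= 1.

    y = U^T (x_i - b) is the projected, centred sample."""
    Sigma_inv = np.linalg.inv(Sigma)
    z = y.copy()                            # the quadratic term alone is minimized at z = y
    for _ in range(n_iter):
        diff = z - mu
        delta = diff @ Sigma_inv @ diff     # (z - mu)^T Sigma^{-1} (z - mu)
        grad = -2 * eta * (y - z) + beta * delta ** (beta - 1) * (Sigma_inv @ diff)
        z = z - step * grad
    return z
```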
Moreover, the PCA-reduced generalized Gaussian distribution can be generalized to a PCA-reduced elliptical distribution by combining the PCA with the elliptical distribution E_n(µ, Σ, g) with density function q(·|µ, Σ, g). We denote by p(·|U, b, µ, Σ, g) the probability density function of the PCA-reduced elliptical distribution. Using the half-quadratic splitting technique, we can apply the iterations (4), (5), (6) and (7) with q(·|µ, Σ, g) instead of q(·|µ, Σ, β) to estimate the parameters of a PCA-reduced elliptical distribution p(·|U, b, µ, Σ, g).

5 MMSE Estimator for Generalized Gaussian Distributions

In the following, we consider the computation of a high-resolution patch x_H given the corresponding low-resolution patch x_L and a mixture model (1), where the distribution P_{θ_k} is a generalized Gaussian distribution. In particular, the parameters are given by θ_k = (µ_k, Σ_k, β_k) for some location µ_k ∈ R^n, a scatter matrix Σ_k ∈ SPD(n) and β_k > 0. As in [10], we first split the parameters

µ_k = \begin{pmatrix} µ_{H,k} \\ µ_{L,k} \end{pmatrix},  Σ_k = \begin{pmatrix} Σ_{H,k} & Σ_{HL,k} \\ Σ_{HL,k}^T & Σ_{L,k} \end{pmatrix}

into the components belonging to the high- and low-resolution parts of the distribution p(·|θ_k). Then, we select the component k* which best matches the low-resolution patch x_L by

k* := \operatorname{argmax}_{k=1,...,K} p(x_L | µ_{L,k}, Σ_{L,k}, β_k).

Afterwards, we compute x_H as the minimum mean square estimator (MMSE) of X_H given X_L = x_L, where X = (X_H, X_L) ∼ P_{θ_{k*}}.

Remark. Given a random variable X_L : Ω → R^d on a probability space (Ω, A, P), we wish to estimate a random variable X_H : Ω → R^D, i.e., we seek an estimator T : R^d → R^D such that X̂_H = T(X_L) approximates X_H. A common quality measure for this task is the mean square error E\|X_H − T(X_L)\|_2^2, which gives rise to the definition of the minimum mean square estimator

T_MMSE ∈ \operatorname{argmin}_T E\|X_H − T(X_L)\|_2^2.    (9)

Under weak additional regularity assumptions on the estimator T, the Lehmann-Scheffé theorem [23] states that the general solution of the minimization problem (9) is given by T_MMSE(X_L) = E(X_H | X_L).

In general, it is not possible to give an analytical expression of the MMSE estimator T_MMSE. In the case of elliptical distributions, the following theorem can be found, e.g., in [24, Theorem 8].

Theorem 3. Assume that X = (X_H, X_L) : Ω → R^n has an elliptical distribution P_θ with parameters θ = (µ, Σ, g), where

µ = \begin{pmatrix} µ_H \\ µ_L \end{pmatrix},  Σ = \begin{pmatrix} Σ_H & Σ_{HL} \\ Σ_{HL}^T & Σ_L \end{pmatrix}.

Then, for P_{X_L}-almost every x_L, the conditional distribution P_{X_H | X_L = x_L} is given by the elliptical distribution P_θ̂, where the parameters θ̂ = (µ̂, Σ̂, ĝ) are given by

µ̂ = µ_H + Σ_{HL} Σ_L^{−1} (x_L − µ_L),  Σ̂ = Σ_H − Σ_{HL} Σ_L^{−1} Σ_{HL}^T,  ĝ(t) = g(t + t_0)  with  t_0 = (x_L − µ_L)^T Σ_L^{−1} (x_L − µ_L).

Since a generalized Gaussian distribution is also an elliptical distribution, the MMSE estimator T_MMSE for the generalized Gaussian distribution is given by

T_MMSE(x_L) = E(X_H | X_L = x_L) = µ_H + Σ_{HL} Σ_L^{−1} (x_L − µ_L).    (10)

In some applications, the high-resolution image and its low-resolution correspondence have different brightness and contrast. The following theorem shows that the MMSE estimator is invariant under affine transformations of the observation. In particular, the MMSE estimator compensates for the change in contrast and brightness.

Theorem 4. Assume that X = (X_H, X_L) : Ω → R^n has a generalized Gaussian distribution GG(µ, Σ, β), where

µ = \begin{pmatrix} µ_H \\ µ_L \end{pmatrix},  Σ = \begin{pmatrix} Σ_H & Σ_{HL} \\ Σ_{HL}^T & Σ_L \end{pmatrix}.

Further, define the random variable X̃_L := α_1 X_L + α_2 with α_1 > 0, α_2 ∈ R. Then it holds that

E(X_H | X̃_L) = E(X_H | X_L) = µ_H + Σ_{HL} Σ_L^{−1} (X_L − µ_L).

Proof.
The first equality holds true since

σ(X̃_L) = σ(\{X̃_L^{−1}((−∞, x]) : x ∈ R^d\}) = σ(\{X̃_L^{−1}((−∞, α_1 x + α_2]) : x ∈ R^d\}) = σ(\{X_L^{−1}((−∞, x]) : x ∈ R^d\}) = σ(X_L).

The second equality follows directly from Theorem 3.

Remark (PCA-reduced generalized Gaussian distribution with the MMSE estimator). In the PCA-GGMM model, the latent data U^T(x_i − b) have a d-dimensional generalized Gaussian distribution GG(µ, Σ, β). Due to the following proposition and the definition of the affine subspace {U t + b : t ∈ R^d}, the high-dimensional samples x_1, ..., x_N ∈ R^n have a generalized Gaussian distribution GG(b + U µ, U Σ U^T, β). Finally, we can apply the MMSE estimator in equation (10) to the generalized Gaussian distribution GG(b + U µ, U Σ U^T, β) of the high-dimensional data.

Proposition 5. Let Y ∼ E_d(µ, Σ, g), let B be an n × d matrix, and let b ∈ R^n. Then b + B Y ∼ E_n(b + B µ, B Σ B^T, g).

Proof. See Lemma 3.1, p. 5, in [25].

6 Numerical Results

In this section, we demonstrate the performance of our models, MMSE-GGMM and PCA-GGMM, on some 2D images such as Gold Hill, Barbara and Camera Man, and on our material data. All implementations were done in Matlab. We ran all experiments on PlaFRIM (https://www.plafrim.fr/) with 32 cores (4 GB/core) of an AMD EPYC 7452.

6.1 Generation of the test examples

In the experiments, we create the low-resolution image from the original image by X_{L,i} = A X_{H,i} + ε, where ε is white Gaussian noise with standard deviation σ = 0.02. For the operator A, we use the implementation of [2]:

A = S H.    (11)

The blur operator H is given by a convolution with a Gaussian kernel with standard deviation 0.5. The downsampling operator S : R^{m_1,n_1} → R^{m_2,n_2} (m_1 > m_2 and n_1 > n_2) is given by

S = \frac{m_2 n_2}{m_1 n_1} F_{m_2,n_2}^{−1} D F_{m_1,n_1},

where D : R^{m_1,n_1} → R^{m_2,n_2} and, for x ∈ R^{m_1,n_1}, the (i, j)-th entry of D(x) is given by

D(x)_{i,j} = \begin{cases} x_{i,j}, & \text{if } i ≤ \tfrac{m_2}{2} \text{ and } j ≤ \tfrac{n_2}{2},\\ x_{i+m_1−m_2,\,j}, & \text{if } i > \tfrac{m_2}{2} \text{ and } j ≤ \tfrac{n_2}{2},\\ x_{i,\,j+n_1−n_2}, & \text{if } i ≤ \tfrac{m_2}{2} \text{ and } j > \tfrac{n_2}{2},\\ x_{i+m_1−m_2,\,j+n_1−n_2}, & \text{if } i > \tfrac{m_2}{2} \text{ and } j > \tfrac{n_2}{2}. \end{cases}

Here, F_{m_1,n_1} is the two-dimensional discrete Fourier transform defined by F_{m_1,n_1} := F_{n_1} ⊗ F_{m_1}, where F_{m_1} = (\exp(−2πi k l / m_1))_{k,l=0}^{m_1−1}.

Figure 1: Top: images for estimating the mixture models. Bottom: ground truth for reconstruction. First column: material "FS", second column: material "SiC Diamonds", third column: Gold Hill image.

Training data. For estimating the parameters of the mixture model, we use a part of the original HR image, such as the top left quarter shown in the top row of Figure 1, and the corresponding part of the LR image. From the training part of the LR image, we define a set of overlapping patches of size τ × τ, {x̃_{L,i}}_{i=1}^N with x̃_{L,i} ∈ R^{τ²}. For the HR image X_H, we also obtain a set of qτ × qτ HR patches {x̃_{H,i}}_{i=1}^N, where x̃_{H,i} ∈ R^{q²τ²} corresponds to the LR patch x̃_{L,i}. Finally, we define a set {x_i}_i of vectors x_i ∈ R^{τ²(q²+1)} by concatenating the HR and LR patches as

x_i = \begin{pmatrix} x̃_{H,i} \\ x̃_{L,i} \end{pmatrix} ∈ R^n.

6.2 Material data preprocessing

In this part, we discuss the material data and the pre-processing step for super-resolution. In the frame of the ITN MUMMERING, a series of multiscale 3D images have been acquired by synchrotron microtomography at the SLS beamline TOMCAT. The materials of the two samples were selected to provide 3D images with diverse levels of complexity:

• The first one is a sample of Fontainebleau sandstone ("FS"), a natural rock, rather homogeneous and commonly used in the oil industry for flow experiments.
• The second one is a composite ("SiC Diamonds") obtained by microwave sintering of silicon and diamonds, see [26].

In Figure 2, the first column shows the HR image and the second column displays the LR image, both acquired by synchrotron microtomography. It is easy to see that the LR images contain a much larger black region than the HR ones. An important task is therefore to find the position in the LR image that corresponds to the HR image. For more than a decade, the scale-invariant feature transform (SIFT) [27] has arguably been the most popular keypoint detection and matching method. The SIFT algorithm proposed by David Lowe consists of a keypoint detection which is invariant to image rotation, scale change, affine transformation, intensity variation, and viewpoint change. Using the SIFT algorithm, we determine all the matching points between the HR and LR images, as shown in Figure 3. Then, the boundary of the matching region in the LR image is found by selecting the minimum and maximum coordinates of the keypoints. Finally, we obtain the LR image which matches the HR image. These matched LR images are shown in the third column of Figure 2.

Figure 2: Top: material "FS", bottom: material "SiC Diamonds". First column: ground-truth HR image, second column: input LR image, third column: matched LR images. The red lines in the second column are the boundaries of the LR image that correspond to the HR image.

Figure 3: Matched points between the HR and LR image obtained with SIFT.

6.3 MMSE-GGMM

First, to underline the advantages of the proposed MMSE-GGMM method, we compare it with MMSE-GMM [10], EPLL-GMM [1] and EPLL-GGMM [8]. In this subsection, we use all models on standard images of size 512 × 512, such as Gold Hill, Camera Man and Barbara, and on material images, for the magnification factor q = 2. Low-resolution images are created using the operator A in equation (11). To estimate the parameters of the mixture model, we extract overlapping patches of size τ = 4 from the low-resolution image; the corresponding high-resolution patches have size qτ = 8. This yields N ≈ 1500 patches of training data to learn the mixture model. Finally, the high-resolution images are reconstructed from the learned mixture model and the LR images based on the MMSE estimator as described in Section 2.

To implement the EPLL method, we estimate the mixture models using the EM algorithm introduced in [8]. EPLL requires the knowledge of the operator A in the image estimation step. For the standard image experiments, we reconstruct the HR image with the operator A defined by equation (11), called "EPLL with given A". Besides that, we implement the EPLL method with the operator A estimated as in [28] from the reference HR and LR images, called "EPLL with learned A". Since the operator A is learned from the known quarter of the image, it has the same size as the reference image. Thus, we upsample the operator A by a zero-padding technique so that it has the same size as the whole image.

Table 1 shows the performance of the MMSE and EPLL methods on synthetic images. Both methods are compared based on the GMM, the LMM, GGMMs with fixed shape parameter β (β = 0.8, 0.4, 0.25), and the GGMM which learns the shape parameter with the FP-EM algorithm. The PSNR values of the MMSE-GGMM with the FP algorithm are slightly higher than those of the MMSE-GMM and MMSE-LMM models. To compare our EM algorithm for the GGMM model with the EM algorithm proposed by Deledalle et al.
[8], we compare the MMSE method with the FP algorithm (MMSE-GGMM FP) and Deledalle's approach (MMSE-GGMM Deledalle) in Table 1. MMSE-GGMM FP gives better results than MMSE-GGMM Deledalle. Furthermore, we observe that our method obtains results similar to those of EPLL with given A, although we do not have to learn the super-resolution operator A. Besides, the results of our method are 1.5 dB higher than those of EPLL with the learned operator A.

In practice, LR and HR images are acquired under different conditions. Therefore, they may have different contrasts, so that the contrast change parameters between the LR and HR images have to be estimated. However, Theorem 4 proves that MMSE-GGMM is invariant to an affine contrast change. Therefore, MMSE-GGMM requires neither learning these parameters nor knowledge of the operator A, while the EPLL method does. To demonstrate these arguments, we apply the MMSE and the EPLL approaches to our material data. Table 2 gives the PSNR values of the HR reconstructions of real material images using the MMSE and EPLL approaches. The second column group gives the PSNR values of the EPLL approach with a learned operator A, which is estimated from the known part of the HR image. The PSNR values in the third column group are obtained from the EPLL method with learned contrast parameters and operator A; this operator is learned as in [28] from the observed LR image and the whole HR ground-truth image. We call this approach "EPLL with given A". The PSNRs in Table 2 show that our method achieves significantly better results than the EPLL method with the operator A learned from the known HR part. Our method gives PSNRs 1 dB lower than the EPLL method with given A, while knowledge of the operator A is not required in our method.

Table 1: PSNRs of the reconstructions of synthetically downsampled 2D images using either the MMSE or the EPLL approach for GMM, LMM and GGMM with different shape parameters β. The magnification factor is q = 2.

                                         Hill     Camera   Barbara
  MMSE                 GMM               31.60    32.75    25.27
                       LMM               31.69    32.82    25.31
                       GGMM β = 0.8      31.70    32.81    25.29
                       GGMM β = 0.4      31.68    32.84    25.20
                       GGMM β = 0.25     31.60    32.80    25.29
                       GGMM FP           31.70    32.86    25.33
                       GGMM Deledalle    31.52    32.56    25.20
  EPLL with given A    GMM               31.62    32.91    25.39
                       LMM               31.46    32.85    25.30
                       GGMM β = 0.8      31.58    32.86    25.28
                       GGMM β = 0.4      31.43    32.83    25.30
                       GGMM β = 0.25     31.26    32.65    25.21
                       GGMM              31.58    32.94    25.33
  EPLL with learned A  GMM               30.64    30.74    24.32
                       LMM               30.57    30.88    24.29
                       GGMM β = 0.8      30.56    30.79    24.25
                       GGMM β = 0.4      30.42    30.76    24.31
                       GGMM β = 0.25     30.40    30.67    24.19
                       GGMM              30.67    30.99    24.36

Table 2: PSNRs of the reconstructions of material 2D images with a contrast change problem, using either the MMSE estimator or the EPLL approach for GMM, LMM and GGMM. The magnification factor is q = 2.

                        MMSE                     EPLL with learned A       EPLL with given A
                 GMM     LMM     GGMM      GMM     LMM     GGMM      GMM     LMM     GGMM
  FS             33.09   33.32   33.35     32.25   32.31   32.39     33.61   33.86   34.22
  SiC Diamonds   28.00   28.07   28.08     26.75   26.78   26.82     29.11   29.29   29.45

6.4 PCA-GGMM

In this second set of experiments, we focus on the dimensionality reduction task of our PCA-GGMM model. We compare our model with PCA-GMM [16] on the 2D synthetic images of the previous subsection and on the 2D and 3D material images "FS" and "SiC Diamonds" with magnification factors q = 2, 4. As in the previous experiments with MMSE-GGMM, we use the patch size τ = 4 for the low-resolution and qτ = 8 (or qτ = 16) for the high-resolution images.
For the material images with size 2500 × 2500 of HR, this leads to N ≈ 400000 patches for q = 2 and N ≈ 100000 patches for q = 4. These patches will be used for the training step of the PCA-GGMM model. In this step, we use 15 (a) HR (b) MMSE-GMM (c) MMSE-GGMM (d) LR (e) EPLL-GMM (f) EPLL-GGMM Figure 4: Reconstructions of 2D Sic Diamonds image with magnification factor q = 2 by using MMSE and EPLL method.  2 2 q + 1 = 80 or K = 100 components for the mixture model. Each training patch V of size D = τ  2 2 D = τ q + 1 = 272 is reduced to d low dimensional for d ∈ {4, 8, 12, 16, 20}. The PSNR values in Tables 3, 4 and 5 demonstrate that our PCA-GGMM model produces results almost as good as MMSE-GGMM and even slightly better than MMSE-GMM. For the 3D material data, we crop a 600 × 600 × 600 image from the material images FS and SiC Diamonds. Training data are taken from the upper left part of the 3D image 300×300×300. We use the magnification factor q = 2, the low-resolution patch size τ = 4, and the high-resolution patch size  qτ = 8. Thus, the dimension of the pair high and low resolution patches is D = τ 3 q 3 + 1 = 576. These high-dimensional data are reduced to d for d ∈ {20, 40, 60} in the PCA-GGMM model. Table 6 shows the PSNR values of the MMSE method for the GMM, LMM, GGMM model and the PCA-GMM, PCA-GGMM model. 7 Conclusion This paper proposed a new algorithm to perform image super-resolution. We extended the image super-resolution using the GMM method provided by Sandeep and Jacob [10] to the GGMM model, which is learned by the FP algorithm. We also derived a new method based on a Fixed Point approach to estimate the parameters of GGMM model. As in the previous work by J.Hertrich et al. [16] (in the case of GMM), we introduced a new Generalized Gaussian Mixture Model for dimensionality reduction. The new model, called PCA-GGMM, incorporates the PCA model with the GGMM model on the low-dimensional data. Experiments with 2D, 3D synthetic, and real material images demonstrated the effectiveness of the PCA-GGMM models. They showed that 16 MMSE-GMM MMSE-LMM MMSE-GGMM PCA-GMM PCA-LMM PCA-GGMM d 80 80 80 20 16 12 8 4 20 16 12 8 4 20 16 12 8 4 Hill 31.60 31.69 31.70 31.59 31.46 31.23 30.83 30.71 31.61 31.60 31.49 31.09 30.80 31.63 31.60 31.54 31.18 31.02 Camera 32.75 32.82 32.86 32.69 32.61 32.58 32.36 32.19 32.72 32.64 32.61 32.46 32.32 32.74 32.70 32.65 32.53 32.40 Barbara 25.27 25.31 25.33 25.26 25.23 25.11 24.82 24.75 25.29 25.28 25.20 25.02 24.93 25.30 25.28 25.25 25.11 24.94 FS 35.48 35.53 35.57 35.45 35.43 35.46 35.29 34.71 35.51 35.49 35.48 35.36 34.85 35.56 35.52 35.50 35.41 34.97 Diamonds 37.23 37.34 37.39 36.22 36.20 36.15 36.67 35.41 37.32 37.29 37.23 36.88 35.57 37.38 37.35 37.30 36.97 35.71 Table 3: PSNRs of the reconstructions of 2D images using MMSE method with GMM, LMM and GGMM (with learned shape parameter β) models, and the PCA with PCA-GMM, PCA-LMM, PCA-GGMM. The magnification factor is set to q = 2. d MMSE FS 20 16 12 8 4 PCA MMSE Diamonds 20 16 12 8 4 PCA GMM 33.09 33.03 32.99 32.86 32.41 32.10 28.00 27.99 27.98 27.85 27.67 27.14 LMM 33.32 33.25 33.21 33.17 32.68 32.43 28.07 28.02 28.00 27.91 27.70 27.21 GGMM 33.35 33.29 33.25 33.20 32.91 32.44 28.08 28.05 28.02 27.94 27.70 27.23 Table 4: PSNRs of the reconstructions of real material 2D image for magnification factor is set to q = 2 using MMSE method with GMM, LMM and GGMM (with learned shape parameter β) models, and the PCA with PCA-GMM, PCA-LMM, PCA-GGMM. 
17 MMSE-GMM MMSE-LMM MMSE-GGMM PCA-GMM PCA-LMM PCA-GGMM d 20 16 12 8 4 20 16 12 8 4 20 16 12 8 4 Hill 27.15 27.16 27.18 26.89 26.81 26.73 26.41 26.22 27.14 27.05 26.84 26.60 26.46 27.15 27.01 26.83 26.71 26.57 Camera 26.19 26.21 26.23 26.15 26.07 25.96 25.69 25.19 26.18 26.12 26.03 25.87 25.34 26.20 26.14 26.03 25.89 25.40 Barbara 23.64 23.67 23.72 23.47 23.32 23.27 23.01 22.86 23.50 23.41 23.29 23.12 22.94 23.65 23.58 23.43 23.21 23.02 FS 30.72 30.76 30.83 30.73 30.71 30.68 30.47 29.80 30.74 30.71 30.70 30.52 29.84 30.81 30.78 30.74 30.59 29.93 Diamonds 30.74 30.79 30.81 30.73 30.61 30.46 30.18 29.23 30.76 30.68 30.53 30.24 29.41 30.78 30.73 30.59 30.30 29.48 Table 5: PSNRs of the reconstructions of 2D images for magnification factor is set to q = 4 using MMSE method with GMM, LMM and GGMM (with learned shape parameter β) models, and the PCA with PCA-GMM, PCA-LMM, PCA-GGMM. d MMSE FS 60 40 20 PCA MMSE Diamonds 60 40 20 PCA GMM 33.34 33.28 33.25 33.17 30.68 30.63 30.55 30.36 LMM 33.36 33.31 33.27 33.19 30.70 30.64 30.57 30.42 GGMM 33.41 33.36 33.35 33.23 30.75 30.67 30.62 30.48 Table 6: PSNRs of the reconstructions of 3D images using MMSE method and PCA-GGMM with q = 2. 18 our model gives a similar quality for the results as MMSE-GGMM and is better than the PCAGMM model of [16]. Furthermore, our method, which estimates the parameters of the PCA-reduced generalized model, could be extended to the case of elliptical distributions such as the Student-t distribution or any other one. Finally, we are aware of the effectiveness of deep learning approaches for super-resolution, e.g. [29, 30, 31]. In future work, we will consider some deep learning approaches for super-resolution with high magnification factor and high-dimensional data. Acknowledgment Funding by the French Agence Nationale de la Recherche (ANR) under reference ANR-18-CE920050 SUPREMATIM as well as by the German Research Foundation (DFG) at Technical University of Berlin within the project STE 571/16-1 , is gratefully acknowledged. Phuong-Lan NGUYEN expresses her sincere thanks to Dr. Laurent Facq. He taught her how to use PlaFRIM (a computational platform) of the University of Bordeaux. The experiments presented in thisarticle were carried out using the PlaFRIM experimental testbed, supported by Inria, CNRS (LABRI and IMB), University of Bordeaux, Bordeaux INP, and Conseil Régional d’Aquitaine (see https://www.plafrim.fr/). We also would like to express our deep gratitude to Professor Dominique Bernard and Dr Abdellatif Saadaldin f rom ICMCB, University of Bordeaux, for the generation and discussion of the material images in Section 6. A Computations for Generalized Gaussian Distributions In the following, we compute the derivatives of ℓ(µ, Σ, β) := N X αi log fβ (xi |µ, Σ), i=1 which is up to a constant equal to N X αi log i=1  1 β  Cp (β) exp − (xi − µ)T Σ−1 (xi − µ) 1/2 2 |Σ| Using the notation δi = (xi − µ)T Σ−1 (xi − µ), the gradient with respect to µ is given by ∇µ ℓ(µ, Σ, β) = N X αi βΣ−1 (xi − µ)δiβ−1 i=1 Setting the gradient to zero yields PN µ = Pi=1 N αi xi δiβ−1 i=1 Plugging in the formulas ∂aT Σ−1 b = Σ−T abT Σ−T ∂Σ and αi δiβ−1 . ∂ log |Σ−1 | ∂ log |Σ−1 | ∂Σ−1 = = ΣΣ−2 = Σ−1 ∂Σ ∂Σ−1 ∂Σ from [32], we obtain that ∇Σ ℓ(µ, Σ, β) = N X i=1 αi  1 −1 2Σ −  β β−1 −1 δi Σ (xi − µ)(xi − µ)T Σ−1 . 2 19 Setting the gradient to zero and multiplying by Σ from the left and from the right yields that PN αi δiβ−1 (xi − µ)(xi − µ)T . 
Σ = i=1 PN i=1 αi Finally, the gradient with respect to β is given by ∇β ℓ(µ, Σ, β) = N X αi ∇β log(Cn (β)) − 12 ∇β δiβ i=1 Here, we have that ∇β δiβ = δiβ log(δi ) and ∇β log(Cn (β)) = digamma function. Hence, we obtain ∇β ℓ(µ, Σ, β) = B 1 β + 1 β + n 2β 2   n ψ( 2β ) + log 2 , where ψ is the N N X  X n n ) + log 2 α + αi δiβ log(δi ). ψ( i 2β 2β 2 i=1 i=1 Proof of Proposition 1 Using that U T U = Id and Ũ T Ũ = In−d , we have that the distribution GG(µ, Σ, β) ⊗ N (0, σ 2 In−d ) has at x = (x1 , x2 ) ∈ Rn the density 1 exp(− 2σ1 2 ∥Ũ T Ũ x2 ∥2 )q(U T U x1 |µ, Σ, β) (2πσ 2 )(n−d)/2 As U T Ũ = 0 and Ũ T U = 0, we obtain, that this is equal to 1 (2πσ 2 )(n−d)/2 = exp(− 2σ1 2 ∥Ũ T (U x1 + Ũ x2 )∥2 )q(U T (U x1 + Ũ x2 )|µ, Σ, β) 1 exp(− 2σ1 2 ∥Ũ T V x∥2 )q(U T V x|µ, Σ, β) (2πσ 2 )(n−d)/2 Using the change of variables formula for push-forward measures, we obtain that Pθ has the density 1 (2πσ 2 )(n−d)/2 exp(− 2σ1 2 ∥Ũ T V r−1 (x)∥2 )q(U T V r−1 (x)|µ, Σ, β)|det(∇r−1 (x))| As it holds r−1 (x) = V T (x − b) and as ∇r−1 (x) = V T is an orthogonal matrix, we obtain that this is equal to 1 exp(− 2σ1 2 ∥Ũ T (x − b)∥2 )q(U T (x − b)|µ, Σ, β) (12) 2 (2πσ )(n−d)/2 where we used V V T = In . Finally, the fact that the mapping x 7→ Ũ x is an isometry and that I − U U T = Ũ Ũ T imply that ∥Ũ T (x − b)∥2 = ∥Ũ Ũ T (x − b)∥2 = ∥(U U T − I)(x − b)∥2 such that the density (12) coincides with p(x|U, b, µ, Σ, β). Taking the negative logarithm and sub2 tracting n−d 2 log(2πσ ) shows that the negative log-likelihood function of p(·|U, b, µ, Σ, β) coincides up to a constant with ℓPCA (U, b, µ, Σ, β). □ C Proof of Proposition 2 We aim to minimize the Lagrangian function L(U, Λ) = N X   1 αi − 2 ∥U T (xi − b)∥2 + η∥U T (xi − b) − zi ∥2 + tr(Λ(U T U − I)). 2σ i=1 20 for fixed Λ with respect to U by setting the gradient to zero. Since it holds by [32] that ∂aT U U T b = U T (abT + baT ) ∂U and ∂tr(ΛU T U ) ∂tr(ΛU T U ) ∂U T U ) = = 2ΛT U T , ∂U ∂(U T U ) ∂U Thus, it holds that ∇U L(U, Λ) = 2(η − 1 2σ 2 ) N X αi (xi − b)(xi − b)T U − 2η N X αi (xi − b)ziT + 2U Λ. i=1 i=1 Setting the gradient to zero, it shows that U is a solution of  (η − 1 2σ 2 ) N X i=1 N  X αi (xi − b)(xi − b)T U + U Λ = η αi (xi − b)ziT . i=1 This finishes the proof. □ References [1] D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 479–486, 2011. [2] S. Parameswaran, C. Deledalle, L. Denis, and T. Q. Nguyen. Accelerating gmm-based patch priors for image restoration: Three ingredients for a 100x speed-up. IEEE Transactions on Image Processing, 28(2):687-698, 2019. [3] C. L. Byrne. The EM Algorithm: Theory, Applications and Related Methods. Lecture Notes, University of Massachusetts, 2017. [4] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1-38, 1977. [5] M. Saıd Allili. Wavelet modeling using finite mixtures of generalized Gaussian distributions: Application to texture discrimination and retrieval. IEEE Transactions on Image Processing, 21(4):1452-1464, 2012. [6] Z. Boukouvalas, S. Said, L. Bombrun, Y. Berthoumieu, and T. Adali. A new Riemannian averaged fixed-point algorithm for MGGD parameter estimation. IEEE Signal Processing Letters, 22(12):2314-2318, Dec 2015. [7] F. Pascal, L. Bombrun, J.-Y. Tourneret, and Y. Berthoumieu. 
Parameter estimation for multivariate generalized Gaussian distributions. IEEE Transactions on Image Processing, 61(23):59605971, 2013. [8] CA Deledalle, S Parameswaran, and TQ Nguyen. Image denoising with generalized Gaussian mixture model patch priors. SIAM Journal on Imaging Sciences, 11(4):2568-2609. [9] B. Wang, H. Zhang, Z. Zhao and Y. Sun, Globally Convergent Algorithms for Learning Multivariate Generalized Gaussian Distributions. IEEE Statistical Signal Processing Workshop (SSP), 2021, pp. 336-340, doi: 10.1109/SSP 49050.2021.9513857. [10] P. Sandeep and T. Jacob. Single image super-resolution using a joint GMM method. IEEE Transactions on Image Processing, 25(9): 4233-4244, 2016. [11] K. Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(11):559-572, 1901. 21 [12] M. E. Tipping , C. M. Bishop. Probabilistic Principal Component Analysis. Journal of the royal statistical society, Series B, Vol. 61, No. 3 (1999), 611-622. [13] M. E. Tipping and C. M. Bishop. Mixture of probabilistic principal component analyzers. Neural Computation, 11(2):443-482, 1999. [14] C. Bouveyron, S. Girard, and C. Schmid. High-dimensional data clustering. Computational Statistics and Data Analysis, 52(1): 502-519, 2007. [15] A. Houdard, C. Bouveyron, and J. Delon. High-dimensional mixture model for unsupervised image denoising (HDMI). SIAM Journal on Imaging Sciences, 11(4):2815-2846, 2018. [16] J. Hertrich, D-P-L. Nguyen, J-F. Aujol, D. Bernard, Y. Berthoumieu, A. Saadaldin, and G. Steidl. PCA Reduced Gaussian Mixture Models with Applications in Superresolution, Inverse Problems in Imaging, 2022, 16 (2) : 341-366. doi: 10.3934/ipi.2021053. [17] D. Geman and C. Yang. Nonlinear image recovery with half-quadratic regularization. IEEE Transactions on Image Processing, 5(7):932–946, 1995. [18] A. P. Dempster , N. M. Laird , D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm. Journal of the royal statistical society, SERIES B 1977. [19] G. McLachlan, D. Peel, Finite Mixture Models, Wiley-Interscience, New York, 2000. [20] F. Najar, S. Bourouis, N. Bouguila and S. Belghith. A Fixed-point Estimation Algorithm for Learning The Multivariate GGMM: Application to Human Action Recognition. IEEE Canadian Conference on Electrical & Computer Engineering (CCECE), 2018. [21] Tjalling J. Ypma, Historical development of the Newton–Raphson method, SIAM Review 37 (4), 531–551, 1995. [22] K. J. Arrow, L. Hurwicz, and H. Uzawa. Studies in linear and nonlinear programming. In Stanford Mathematical Studies in the Social Sciences, volume II. Stanford University Press, Stanford, 1958. [23] E. Lehmann and H. Scheffé, Completeness, similar regions, and unbiased estimation: Part I, Sankhyā, 10 (1950), 305–340. [24] E. Gomez, M.A Gomez-Villegas and J.M. Marin. A survey on continuous Elliptical vector distributions. [25] H. Hult and L. Filip, Multivariate extremes, aggregation and dependence in elliptical distributions. Advances in Applied probability 34.3 (2002): 587-608. [26] S. Vaucher, P. Unifantowicz, C. Ricard, L. Dubois, M. Kuball, J.-M. Catala-Civera, D. Bernard, M. Stampanoni and R. Nicula, On-line tools for microscopic and macroscopic monitoring of microwave processing, Physica B: Condensed Matter, 398 (2007), 191-195. [27] Lowe, David G. (1999). Object recognition from local scale-invariant features. Proceedings of the International Conference on Computer Vision. 2. pp. 1150–1157. [28] F. Altekruger, J. Hertrich. 
WPPNets: Unsupervised CNN Training with Wasserstein Patch Priors for Image Superresolution. arXiv preprint arXiv:2201.08157, 2022. [29] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, Enhanced deep residual networks for single image super-resolution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 136–144. [30] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, Residual dense network for image superresolution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2472–2481. 22 [31] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., Photo-realistic single image super-resolution using a generative adversarial network. Proceedings of the IEEE Conference on Computer Vision and pattern Recognition, 2017, pp. 4681–4690. [32] Petersen, K.B., Pedersen, M.S.: The Matrix Cookbook. Technical University of Denmark, Lecture Notes (2008). 23