Abstract
The contribution of this paper is twofold. First, we introduce a generalized myriad filter, which is a method to compute the joint maximum likelihood estimator of the location and the scale parameter of the Cauchy distribution. Estimating only the location parameter is known as myriad filtering. We propose an efficient algorithm to compute the generalized myriad filter and prove its convergence. Special cases of this algorithm result in the classical myriad filtering and an algorithm for estimating only the scale parameter. Based on an asymptotic analysis, we develop a second, even faster generalized myriad filtering technique. Second, we use our new approaches within a nonlocal, fully unsupervised method to denoise images corrupted by Cauchy noise. Special attention is paid to the determination of similar patches in noisy images. Numerical examples demonstrate the excellent performance of our algorithms, which moreover have the advantage of being robust with respect to the parameter choice.
References
Arce, G.R.: Nonlinear Signal Processing: A Statistical Approach. Wiley, New York (2005)
Banerjee, S., Agrawal, M.: Underwater acoustic communication in the presence of heavy-tailed impulsive noise with bi-parameter Cauchy-Gaussian mixture model. In: Ocean Electronics (SYMPOL), 2013, pp. 1–7. IEEE (2013)
Barnett, V.: Order statistics estimators of the location of the Cauchy distribution. J. Am. Stat. Assoc. 61(316), 1205–1218 (1966)
Besbeas, P., Morgan, B.J.: Integrated squared error estimation of Cauchy parameters. Stat. Prob. Lett. 55(4), 397–401 (2001)
Bloch, D.: A note on the estimation of the location parameter of the Cauchy distribution. J. Am. Stat. Assoc. 61(315), 852–855 (1966)
Brown, C.L., Brcich, R.E., Taleb, A.: Suboptimal robust estimation using rank score functions. In: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings.(ICASSP’03), vol. 6, pp. VI–753. IEEE (2003)
Buades, A., Coll, B., Morel, J.-M.: A non-local algorithm for image denoising. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 2, pp. 60–65. IEEE (2005)
Cane, G.J.: Linear estimation of parameters of the Cauchy distribution based on sample quantiles. J. Am. Stat. Assoc. 69(345), 243–245 (1974)
Casella, G., Berger, R.L.: Statistical Inference, vol. 2. Duxbury, Pacific Grove (2002)
Chatterjee, P., Milanfar, P.: Is denoising dead? IEEE Trans. Image Process. 19(4), 895–911 (2010)
Copas, J.: On the unimodality of the likelihood for the Cauchy distribution. Biometrika 62(3), 701–704 (1975)
Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising by sparse 3-d transform-domain collaborative filtering. IEEE Trans. Image Process. 16(8), 2080–2095 (2007)
Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: BM3D image denoising with shape-adaptive principal component analysis. In: SPARS'09 - Signal Processing with Adaptive Sparse Structured Representations (2009)
Deledalle, C.-A., Denis, L., Tupin, F.: Iterative weighted maximum likelihood denoising with probabilistic patch-based weights. IEEE Trans. Image Process. 18(12), 2661–2672 (2009)
Deledalle, C.-A., Denis, L., Tupin, F.: How to compare noisy patches? Patch similarity beyond Gaussian noise. Int. J. Comput. Vis. 99(1), 86–102 (2012)
Ferguson, T.S.: Maximum likelihood estimates of the parameters of the Cauchy distribution for samples of size 3 and 4. J. Am. Stat. Assoc. 73(361), 211–213 (1978)
Freue, G.V.C.: The Pitman estimator of the Cauchy location parameter. J. Stat. Plan. Inference 137(6), 1900–1913 (2007)
Gabrielsen, G.: On the unimodality of the likelihood for the Cauchy distribution: some comments. Biometrika 69(3), 677–678 (1982)
Gelfand, I.M., Kapranov, M., Zelevinsky, A.: Discriminants, Resultants, and Multidimensional Determinants. Springer, Berlin (2008)
Gilboa, G., Osher, S.: Nonlocal operators with applications to image processing. SIAM J. Multiscale Model. Simul. 7(3), 1005–1028 (2008)
Gonzalez, J.G., Arce, G.R.: Weighted myriad filters: a robust filtering framework derived from alpha-stable distributions. In: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-96, vol. 5, pp. 2833–2836. IEEE (1996)
Gonzalez, J.G., Arce, G.R.: Optimality of the myriad filter in practical impulsive-noise environments. IEEE Trans. Signal Process. 49(2), 438–441 (2001)
Haas, G., Bain, L., Antle, C.: Inferences for the Cauchy distribution based on maximum likelihood estimators. Biometrika 57(2), 403–408 (1970)
Hamza, A.B., Krim, H.: Image denoising: a nonlinear robust statistical approach. IEEE Trans. Signal Process. 49(12), 3045–3054 (2001)
Higgins, J., Tichenor, D.: Window estimates of location and scale with applications to the Cauchy distribution. Appl. Math. Comput. 3(2), 113–126 (1977)
Howlader, H., Weiss, G.: On Bayesian estimation of the Cauchy parameters. Sankhyā Indian J. Stat. Ser. B 50, 350–361 (1988)
Kalluri, S., Arce, G.R.: Fast algorithms for weighted myriad computation by fixed-point search. IEEE Trans. Signal Process. 48(1), 159–171 (2000)
Kendall, M.G.: A new measure of rank correlation. Biometrika 30(1/2), 81–93 (1938)
Kendall, M.G.: The treatment of ties in ranking problems. Biometrika 33, 239–251 (1945)
Koutrouvelis, I.A.: Estimation of location and scale in Cauchy distributions using the empirical characteristic function. Biometrika 69(1), 205–213 (1982)
Kravchuk, O., Pollett, P.: Hodges-Lehmann scale estimator for Cauchy distribution. Commun. Stat. Theory Methods 41(20), 3621–3632 (2012)
Kuruoglu, E.E., Fitzgerald, W.J., Rayner, P.J.: Near optimal detection of signals in impulsive noise modeled with a symmetric \(\alpha \)-stable distribution. IEEE Commun. Lett. 2(10), 282–284 (1998)
Laus, F., Nikolova, M., Persch, J., Steidl, G.: A nonlocal denoising algorithm for manifold-valued images using second order statistics. SIAM J. Imaging Sci. 10(1), 416–448 (2017)
Lebrun, M., Buades, A., Morel, J.-M.: Implementation of the “Non-Local Bayes” (NL-Bayes) image denoising algorithm. Image Process. Line 3, 1–42 (2013)
Lebrun, M., Buades, A., Morel, J.-M.: A nonlocal Bayesian image denoising algorithm. SIAM J. Imaging Sci. 6(3), 1665–1688 (2013)
Lebrun, M., Colom, M., Buades, A., Morel, J.: Secrets of image denoising cuisine. Acta Numer. 21, 475–576 (2012)
Levin, A., Nadler, B.: Natural image denoising: optimality and inherent bounds. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2833–2840. IEEE (2011)
Matsui, M., Takemura, A.: Empirical characteristic function approach to goodness-of-fit tests for the Cauchy distribution with parameters estimated by MLE or EISE. Ann. Inst. Stat. Math. 57(1), 183–199 (2005)
Mei, J.-J., Dong, Y., Huang, T.-Z., Yin, W.: Cauchy noise removal by nonconvex ADMM with convergence guarantees. J. Sci. Comput. 74, 1–24 (2017)
Middleton, D.: Statistical-physical models of electromagnetic interference. IEEE Trans. Electromagn. Compat. 3, 106–127 (1977)
Núñez, R.C., Gonzalez, J.G., Arce, G.R., Nolan, J.P.: Fast and accurate computation of the myriad filter via branch-and-bound search. IEEE Trans. Signal Process. 56(7), 3340–3346 (2008)
Pander, T.: New polynomial approach to myriad filter computation. Signal Process. 90(6), 1991–2001 (2010)
Pander, T.: The iterative trimming approach to the myriad filter computation. In: Network Intelligence Conference (ENIC), 2016 Third European, pp. 209–216. IEEE (2016)
Ram, I., Elad, M., Cohen, I.: Patch-ordering-based wavelet frame and its use in inverse problems. IEEE Trans. Image Process. 23(7), 2779–2792 (2014)
Rothenberg, T.J., Fisher, F.M., Tilanus, C.B.: A note on estimation from a Cauchy sample. J. Am. Stat. Assoc. 59(306), 460–463 (1964)
Salmon, J.: On two parameters for denoising with non-local means. IEEE Signal Process. Lett. 17(3), 269–272 (2010)
Scholz, F.: Maximum likelihood estimation for type I censored Weibull data including covariates. Technical Report ISSTECH-96-022, Boeing Information and Support Services, Seattle (1996)
Sciacchitano, F., Dong, Y., Zeng, T.: Variational approach for restoring blurred images with Cauchy noise. SIAM J. Imaging Sci. 8(3), 1894–1922 (2015)
Shinde, M., Gupta, S.: Signal detection in the presence of atmospheric noise in tropics. IEEE Trans. Commun. 22(8), 1055–1063 (1974)
Steidl, G., Teuber, T.: Removing multiplicative noise by Douglas–Rachford splitting methods. J. Math. Imaging Vis. 36(2), 168–184 (2010)
Sutour, C., Deledalle, C.-A., Aujol, J.-F.: Estimation of the noise level function based on a nonparametric detection of homogeneous image regions. SIAM J. Imaging Sci. 8(4), 2622–2661 (2015)
Teuber, T., Lang, A.: A new similarity measure for nonlocal filtering in the presence of multiplicative noise. Comput. Stat. Data Anal. 56(12), 3821–3842 (2012)
Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization. arXiv:1511.06324 (2015). Accessed 13 Apr 2018
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
Wiest-Daesslé, N., Prima, S., Coupé, P., Morrissey, S.P., Barillot, C.: Non-local means variants for denoising of diffusion-weighted and diffusion tensor MRI. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, vol. 4792, pp. 344–351. Springer, Berlin (2007)
Yu, G., Sapiro, G., Mallat, S.: Solving inverse problems with piecewise linear estimators: from Gaussian mixture models to structured sparsity. IEEE Trans. Image Process. 21(5), 2481–2499 (2012)
Zhang, J.: A highly efficient L-estimator for the location parameter of the Cauchy distribution. Comput. Stat. 25(1), 97–105 (2010)
Zurbach, P., Gonzalez, J., Arce, G.R.: Weighted myriad filters for image processing. In: 1996 IEEE International Symposium on Circuits and Systems, 1996. ISCAS’96., Connecting the World, vol. 2, pp. 726–729. IEEE (1996)
Acknowledgements
Funding by the German Research Foundation (DFG) within the Research Training Group 1932, project area P3, is gratefully acknowledged. Further, we wish to thank Yiqiu Dong for fruitful discussions and an interesting talk about Cauchy noise removal, as well as the anonymous referees for their careful examination of our manuscript.
Appendices
Appendix
This appendix contains the proofs of Sect. 3.
1.1 Proof of Theorem 1:
Proof
-
1.
First, we prove that all critical points \(({\hat{a}},\hat{\gamma })\) of L are strict minimizers, by showing that \(\nabla ^2 L\) is positive definite at all critical points. To simplify the notation, we set \(a_i :=x_i - a\) and \(\hat{a}_i :=x_i - \hat{a}\), \(i=1,\ldots ,n\).
Estimation of \(\frac{\partial ^2 L}{\partial a^2}\): We split the sum in (2) as follows:
$$\begin{aligned} \sum _{i:a_i^2> \gamma ^2} w_i \frac{a_i^2-\gamma ^2 }{\bigl (a_i^2 + \gamma ^2\bigr )^2}&< \sum _{i:a_i^2> \gamma ^2} w_i \frac{a_i^2-\gamma ^2 }{\bigl ( a_i^2 + \gamma ^2\bigr ) (\gamma ^2 + \gamma ^2 )}\\&= \frac{1}{2\gamma ^2} \sum _{i:a_i^2 > \gamma ^2}w_i \frac{a_i^2-\gamma ^2 }{ a_i^2 + \gamma ^2 } \end{aligned}$$and similarly, since in this case the summands are negative,
$$\begin{aligned} \sum _{i:a_i^2<\gamma ^2} w_i \frac{a_i^2-\gamma ^2 }{\bigl (a_i^2 + \gamma ^2\bigr )^2}< \frac{1}{2\gamma ^2} \sum _{i:a_i^2<\gamma ^2}w_i \frac{a_i^2-\gamma ^2 }{ a_i^2 + \gamma ^2 }. \end{aligned}$$Since \(n \ge 3\) and \(a_i^2 = \gamma ^2\) can hold for at most two indices with pairwise distinct sample points, at least one of these sums is nonempty. Thus, for \(\hat{\gamma }\) we have
$$\begin{aligned} \frac{\partial ^2 L}{\partial a^2}(\hat{a}, \hat{\gamma })&= - 2 \ \sum _{i=1}^n w_i\frac{\hat{a}_i^2 -{\hat{\gamma }}^2}{\bigl (\hat{a}_i^2 + {\hat{\gamma }}^2\bigr )^2}\\&>- \frac{1}{{\hat{\gamma }}^2} \sum _{i=1}^n w_i\frac{\hat{a}_i^2-{\hat{\gamma }}^2 }{ \hat{a}_i^2 + {\hat{\gamma }}^2 }\\&= - \frac{1}{{\hat{\gamma }}^2} \sum _{i=1}^n w_i \frac{\hat{a}_i^2+{\hat{\gamma }}^2-2{\hat{\gamma }}^2 }{ \hat{a}_i^2 + {\hat{\gamma }}^2 }\\&= - \frac{1}{{\hat{\gamma }}^2} + 2 \sum _{i=1}^n w_i \frac{1 }{ \hat{a}_i^2 + {\hat{\gamma }}^2 } = 0, \end{aligned}$$where the last equation follows by (5).
Estimation of \(\det \left( \nabla ^2 L({\hat{a}},{\hat{\gamma }}) \right) \): Using (2)–(3) we obtain
$$\begin{aligned}&\frac{1}{4} \det \left( \nabla ^2 L({\hat{a}},{\hat{\gamma }}) \right) \\&\quad =\left( \sum \limits _{i=1}^n w_i \frac{{\hat{\gamma }}^2 - {\hat{a}}_i^2}{\bigl ({\hat{a}}_i^2 + {\hat{\gamma }}^2\bigr )^2}\right) \left( \sum \limits _{i=1}^n w_i \frac{{\hat{a}}_i^2-{\hat{\gamma }}^2}{\bigl ({\hat{a}}_i^2 + {\hat{\gamma }}^2\bigr )^2}+\frac{1}{2{\hat{\gamma }}^2} \right) \\&\qquad -4\left( \sum \limits _{i=1}^n w_i\frac{{\hat{\gamma }} \hat{a}_i}{\bigl ({\hat{a}}_i^2 + {\hat{\gamma }}^2\bigr )^2}\right) ^2. \end{aligned}$$With (5) we rewrite the first term as
$$\begin{aligned} \sum _{i=1}^n w_i\frac{{\hat{a}}_i^2-{\hat{\gamma }}^2 }{ \bigl ({\hat{a}}_i^2 + {\hat{\gamma }}^2\bigr )^2 }&= \sum _{i=1}^n w_i\frac{{\hat{a}}_i^2+ {\hat{\gamma }}^2 -2{\hat{\gamma }}^2 }{ \bigl ({\hat{a}}_i^2 + {\hat{\gamma }}^2\bigr )^2 }\\&= \frac{1}{2{\hat{\gamma }}^2} - 2{\hat{\gamma }}^2 \sum _{i=1}^n w_i\frac{1}{ \bigl ({\hat{a}}_i^2 + {\hat{\gamma }}^2\bigr )^2 }\\&= -\frac{1}{2{\hat{\gamma }}^2} + 2 \sum _{i=1}^n w_i\frac{1}{{\hat{a}}_i^2 + {\hat{\gamma }}^2}\\&\quad - 2 {\hat{\gamma }}^2\sum _{i=1}^n w_i\frac{1}{ \bigl ({\hat{a}}_i^2 + {\hat{\gamma }}^2\bigr )^2 } \\&= -\frac{1}{2{\hat{\gamma }}^2} \sum _{i=1}^n w_i\frac{ \bigl ({\hat{a}}_i^2- {\hat{\gamma }}^2\bigr )^2 }{ \bigl ({\hat{a}}_i^2 + {\hat{\gamma }}^2\bigr )^2 } . \end{aligned}$$With the help of (5), we can simplify the second term as
$$\begin{aligned} \frac{\partial ^2 L}{\partial \gamma ^2}(a,{\hat{\gamma }}) = 4 \sum _{i=1}^n w_i\frac{a_i^2 }{\bigl (a_i^2 + \hat{\gamma }^2\bigr )^2} \end{aligned}$$and the third one using (4) by
$$\begin{aligned}&\sum _{i=1}^n w_i\frac{{\hat{a}}_i }{ \bigl ({\hat{a}}_i^2 + {\hat{\gamma }}^2\bigr )^2 } \\&\quad = -\frac{1}{2{\hat{\gamma }}^2} \sum _{i=1}^n w_i\frac{{\hat{a}}_i}{{\hat{a}}_i^2 + {\hat{\gamma }}^2} + \sum _{i=1}^n w_i \frac{{\hat{a}}_i }{ \bigl ({\hat{a}}_i^2 + {\hat{\gamma }}^2\bigr )^2 } \\&\quad = -\frac{1}{2{\hat{\gamma }}^2}\sum _{i=1}^n w_i\frac{ {\hat{a}}_i \bigl ({\hat{a}}_i^2 -{\hat{\gamma }}^2 \bigr )}{ \bigl ({\hat{a}}_i^2 + {\hat{\gamma }}^2\bigr )^2 }. \end{aligned}$$Therewith we obtain
$$\begin{aligned}&\frac{1}{4} \det \left( \nabla ^2 L(\hat{a},{\hat{\gamma }}) \right) \\&\quad = \frac{1}{{\hat{\gamma }}^2} \left( \left( \sum _{i=1}^n w_i\frac{ \bigl ({\hat{a}}_i^2- {\hat{\gamma }}^2\bigr )^2 }{ \bigl ({\hat{a}}_i^2 + {\hat{\gamma }}^2\bigr )^2 }\right) \left( \sum _{i=1}^n w_i\frac{{\hat{a}}_i^2 }{\bigl ({\hat{a}}_i^2 + {\hat{\gamma }}^2\bigr )^2}\right) \right. \\&\left. \qquad - \left( \sum _{i=1}^n w_i\frac{{\hat{a}}_i\bigl ({\hat{a}}_i^2 -{\hat{\gamma }}^2 \bigr )}{ \bigl ({\hat{a}}_i^2 + {\hat{\gamma }}^2\bigr )^2 }\right) ^2 \right) \end{aligned}$$and by Cauchy–Schwarz’ inequality finally
$$\begin{aligned} \det \left( \nabla ^2 L({\hat{a}},{\hat{\gamma }}) \right) >0. \end{aligned}$$Note that the inequality is indeed strict: equality in the Cauchy–Schwarz inequality would require the existence of \(\lambda \in {\mathbb {R}}\) such that \({\hat{a}}_i = \lambda ({\hat{a}}_i^2 - {\hat{\gamma }}^2)\) for all \(i=1,\ldots ,n\), which is not possible, since the quadratic equation \(\lambda t^2 - t - \lambda {\hat{\gamma }}^2 = 0\) has at most two solutions, while the \({\hat{a}}_i\) take at least three distinct values for \(n \ge 3\).
-
2.
Next, we show that the first step implies that there is only one critical point. For any fixed \(a \in (x_1,x_n)\), let \({\hat{x}} (a) :={\hat{\gamma }} (a)^2\) denote the solution of (5), which is unique due to monotonicity, see Lemma 3. Bringing the summands in (5) to a common denominator, we see that \({\hat{x}} (a)\) is the unique real zero of a polynomial \(P(\cdot ,a)\) of degree n whose coefficients are again polynomials in a, say
$$\begin{aligned} P(x,a)&= x^n + p_{n-1} (a) x^{n-1} + \cdots + p_1(a) x + p_0(a). \end{aligned}$$We show that the zero \({\hat{x}}(a)\) of \(P(\cdot ,a)\) is differentiable in a. To this end, we consider the smooth function \(F:{\mathbb {R}}^n \times {\mathbb {R}} \rightarrow {\mathbb {R}}\) given by
$$\begin{aligned} F(c,x)&:=x^n + c_{n-1} x^{n-1} + \cdots + c_1 x + c_0,\\ c&:=(c_0,\ldots ,c_{n-1}). \end{aligned}$$For an arbitrary fixed \(a^* \in (x_1,x_n)\), we define
$$\begin{aligned} c^* :=(p_0(a^*),\ldots , p_{n-1} (a^*)) \end{aligned}$$and \(x^* :={\hat{x}} (a^*)\). Then we have \(F(c^*, x^*) = 0\) and, since \(x^*\) is a simple zero of \(P(\cdot ,a^*)\), also \( \frac{\partial }{\partial x} F(c^*,x^*) = P'(x^*,a^*) \not = 0 \). By the implicit function theorem, there exists a continuously differentiable function \(\varphi \) defined on a neighborhood of \(c^*\) such that \(F(c,\varphi (c)) = 0\) there. Thus, for \(c(a) :=(p_0(a), \ldots , p_{n-1}(a))\) with a in a neighborhood of \(a^*\) we have \({\hat{x}} (a) = \varphi (c(a))\) and
$$\begin{aligned} {\hat{x}} '(a) = \frac{\mathrm {d}}{\mathrm {d}a}\varphi (c(a)) = \nabla \varphi (c(a)) \cdot \left( p_0'(a) , \ldots ,p_{n-1}'(a) \right) , \end{aligned}$$which proves the claim.
Now, the minima of L are given by \(({\hat{a}},{\hat{\gamma }}({\hat{a}}))\). Assume that there exist two different minimizers \({\check{a}} < {\tilde{a}}\) of L. Since they are strict, \({\check{a}} \) and \({\tilde{a}}\) are also strict minimizers of the univariate function \(g(a) :=L(a,{\hat{\gamma }}(a))\). The function g is continuously differentiable by the considerations above, so that there exists a maximizer \({\bar{a}} \in ({\check{a}}, {\tilde{a}})\) of g on \([{\check{a}}, {\tilde{a}}]\) fulfilling
$$\begin{aligned} 0 = g'({\bar{a}}) = \nabla L({\bar{a}},{\hat{\gamma }}({\bar{a}})) \cdot \left( 1,{\hat{\gamma }}'({\bar{a}}) \right) . \end{aligned}$$By construction of \({\hat{\gamma }}\), we have \(\frac{\partial L}{\partial \gamma } ({\bar{a}},{\hat{\gamma }}({\bar{a}})) = 0\), which implies \(\frac{\partial L}{\partial a} ({\bar{a}},{\hat{\gamma }}({\bar{a}})) = 0\). Consequently, \(\nabla L({\bar{a}},{\hat{\gamma }}({\bar{a}})) = 0\), so that \(({\bar{a}},{\hat{\gamma }}({\bar{a}}))\) is a critical point of L which is not a strict minimizer. This contradiction shows that there is indeed only one critical point.
-
3.
Finally, to see the existence of a critical point, it remains to show that there exists an a such that \(S_1(a,{\hat{\gamma }}(a)) = 0\). By part 2 of the proof, \(S_1(a,{\hat{\gamma }}(a))\) is a continuous function of a. By the proof of Lemma 1 below, we know that \(S_1(a,{\hat{\gamma }}(a)) >0\) for \(a\le x_1\) and \(S_1(a,{\hat{\gamma }}(a)) <0\) for \(a\ge x_n\), so that the function has indeed a zero. \(\square \)
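The strict positive definiteness of \(\nabla ^2 L\) at the minimizer established in part 1 can also be verified numerically. The following sketch (Python, not part of the paper; it assumes the weighted negative log-likelihood \(L(a,\gamma ) = \sum _i w_i \log \bigl (((x_i-a)^2+\gamma ^2)/\gamma \bigr )\), which is consistent with the differences of L used in the proof of Theorem 2, and an illustrative three-point sample whose joint MLE is \(({\hat{a}},{\hat{\gamma }}) = (2, 1/\sqrt{3})\) by symmetry and (5)):

```python
import math

def L(a, gamma, x, w):
    # weighted negative Cauchy log-likelihood, up to an additive constant:
    # sum_i w_i * log(((x_i - a)^2 + gamma^2) / gamma)
    return sum(wi * math.log(((xi - a) ** 2 + gamma ** 2) / gamma)
               for xi, wi in zip(x, w))

def hessian_fd(a, gamma, x, w, h=1e-4):
    # central second differences of L at (a, gamma)
    f = lambda u, v: L(u, v, x, w)
    laa = (f(a + h, gamma) - 2 * f(a, gamma) + f(a - h, gamma)) / h ** 2
    lgg = (f(a, gamma + h) - 2 * f(a, gamma) + f(a, gamma - h)) / h ** 2
    lag = (f(a + h, gamma + h) - f(a + h, gamma - h)
           - f(a - h, gamma + h) + f(a - h, gamma - h)) / (4 * h ** 2)
    return laa, lgg, lag

# equal weights; by symmetry and (5), the joint MLE is (2, 1/sqrt(3))
x, w = [1.0, 2.0, 3.0], [1 / 3] * 3
laa, lgg, lag = hessian_fd(2.0, 1 / math.sqrt(3), x, w)
assert laa > 0 and laa * lgg - lag ** 2 > 0  # Hessian positive definite
```

The finite-difference step `h` is an illustrative choice; the assertions mirror the two conditions \(\frac{\partial ^2 L}{\partial a^2} > 0\) and \(\det (\nabla ^2 L) > 0\) from the proof.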
1.2 Proof of Lemma 1:
Proof
By (4), a critical point of \(L(\cdot ,\gamma )\) has to fulfill \(s_1(a) = 0\), where \(s_1 :=\frac{S_1(\cdot ,\gamma )}{\gamma }\). All summands in \(s_1(a)\) become positive if \(a < x_1\) and negative if \(a > x_n\). Since \(n \ge 2\), this implies \(s_1(a) > 0\) for \(a \le x_1\) and \(s_1(a) < 0\) for \(a \ge x_n\). Hence, the zeros of \(s_1\) lie in \((x_1,x_n)\), and there exists at least one zero by continuity of \(s_1\). Further, bringing the summands to a common denominator, we obtain
$$\begin{aligned} s_1(a) = \frac{\sum _{i=1}^n w_i (x_i-a) \prod _{j \not = i} \bigl ( (x_j-a)^2 + \gamma ^2 \bigr )}{\prod _{i=1}^n \bigl ( (x_i-a)^2 + \gamma ^2 \bigr )} =:\frac{P_1(a)}{\prod _{i=1}^n \bigl ( (x_i-a)^2 + \gamma ^2 \bigr )}, \end{aligned}$$so that the zeros of \(s_1\) are the real roots of the nontrivial polynomial \(P_1\) of degree \(2n-1\), of which there are at most \(2n-1\). \(\square \)
1.3 Proof of Lemma 2:
Proof
From the previous proof, we know that the zeros of \(\frac{\partial L}{\partial a}(\cdot ,\gamma )\) coincide with those of \(P_1\). By (2), we have
$$\begin{aligned} \frac{\partial ^2 L}{\partial a^2}(a ,\gamma ) = -2 \sum _{i=1}^n w_i \frac{(x_i-a)^2 - \gamma ^2}{\bigl ( (x_i-a)^2 + \gamma ^2 \bigr )^2} =:\frac{P_2(a)}{\prod _{i=1}^n \bigl ( (x_i-a)^2 + \gamma ^2 \bigr )^2}, \end{aligned}$$so that the zeros of \(\frac{\partial ^2 L}{\partial a^2}(\cdot ,\gamma )\) are those of the polynomial \(P_2\). The coefficients of the polynomials \(P_i\), \(i=1,2\), are polynomials in \(x_1,\ldots ,x_n\). Now, (7) states that \(P_1\) and \(P_2\) have a common root, which implies that the resultant \({\text {Res}}(P_1,P_2)\) of \(P_1\) and \(P_2\) equals zero, see, e.g., [19]. The resultant is the determinant of the associated Sylvester matrix, hence a polynomial expression in the coefficients of \(P_1\) and \(P_2\) as well, i.e., a polynomial in \(x_1,\ldots ,x_n\). Since the set of roots of a nontrivial n-variate polynomial has Lebesgue measure zero in \(\mathbb {R}^n\), this finishes the proof. \(\square \)
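The Sylvester-determinant characterization of the resultant used in this proof can be illustrated in a few lines. The following sketch (pure Python, not from the paper; the two quadratic example polynomials are hypothetical, chosen only for illustration) builds the Sylvester matrix of two polynomials and evaluates its determinant:

```python
def sylvester_det(p, q):
    # resultant of two polynomials as the determinant of their Sylvester
    # matrix; p, q are coefficient lists, highest-degree coefficient first
    m, n = len(p) - 1, len(q) - 1
    rows = []
    for i in range(n):                      # n shifted copies of p
        rows.append([0] * i + p + [0] * (n - 1 - i))
    for i in range(m):                      # m shifted copies of q
        rows.append([0] * i + q + [0] * (m - 1 - i))
    return det(rows)

def det(mat):
    # Laplace expansion along the first row (fine for small matrices)
    if len(mat) == 1:
        return mat[0][0]
    total = 0
    for j, entry in enumerate(mat[0]):
        if entry:
            minor = [row[:j] + row[j + 1:] for row in mat[1:]]
            total += (-1) ** j * entry * det(minor)
    return total

# (x-1)(x-2) and (x-1)(x-3) share the root 1, so the resultant vanishes
assert sylvester_det([1, -3, 2], [1, -4, 3]) == 0
# coprime polynomials, e.g. (x-1)(x-2) and x^2+1: nonzero resultant
assert sylvester_det([1, -3, 2], [1, 0, 1]) != 0
```

Exactly as in the proof, the resultant vanishes if and only if the two polynomials have a common root, and it is a polynomial expression in their coefficients.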
1.4 Proof of Lemma 3:
Proof
-
1.
By (5), the critical points of \(L(a,\cdot )\) have to fulfill \(s_0(\gamma ^2) = \frac{1}{2}\), where \(s_0 (\gamma ^2) :=S_0(a,\gamma )\). The continuous function \(s_0\) is strictly increasing in \(\gamma ^2\). Since \(s_0(0) = 0\) and \(\lim _{\gamma \rightarrow \infty } s_0(\gamma ^2) = 1\), we conclude that \(s_0(\gamma ^2) = \frac{1}{2}\) has a unique solution \({\hat{\gamma }}^2\). Moreover, inserting \({\hat{\gamma }}\) into (3), we obtain
$$\begin{aligned} \frac{\partial ^2 L}{\partial \gamma ^2}(a,{\hat{\gamma }}) = 4 \sum _{i=1}^n w_i\frac{(x_i-a)^2 }{\bigl ((x_i-a)^2 + \hat{\gamma }^2\bigr )^2}>0 \end{aligned}$$so that \({\hat{\gamma }}\) is a minimizer.
-
2.
Concerning the range of \(\gamma \), it follows from (5) that \({\hat{\gamma }}^2 \in \left( \min _i (x_i-a)^2,\max _i (x_i-a)^2 \right) \), which, together with \(a\in (x_1,x_n)\), gives the upper bound for \({\hat{\gamma }}\). To see the lower bound, assume that \({\hat{\gamma }}^2 \le d^2 \epsilon ^2\) and distinguish two cases:
-
(i)
First, let a be one of the sample points, say \(a=x_i\). Then, since \(s_0\) is strictly increasing and \((x_j-a)^2 \ge d^2\) for \(j\ne i\), it holds
$$\begin{aligned} S_0(a,\gamma )&< w_i + \sum _{j \not = i} w_j \frac{ d^2 \epsilon ^2}{d^2 + d^2 \epsilon ^2} \\&= w_i + (1-w_i)\frac{\epsilon ^2}{1+\epsilon ^2} \\&\le w_{\text {max}} + (1-w_{\text {max}})\frac{\epsilon ^2}{1+\epsilon ^2}\\&= \frac{1}{1+\epsilon ^2} \frac{1}{2}\le \frac{1}{2}, \end{aligned}$$which is in contradiction to (5).
-
(ii)
Next, let \(a \in (x_i,x_{i+1})\). Similarly we obtain in this case
$$\begin{aligned}&S_0(a,\gamma )\\&\quad < w_i \frac{d^2 \epsilon ^2}{(a-x_i)^2 + d^2 \epsilon ^2} + w_{i+1} \frac{d^2 \epsilon ^2}{(a-x_{i+1})^2 + d^2 \epsilon ^2}\\&\qquad + \frac{\epsilon ^2}{1+\epsilon ^2} (1-w_i - w_{i+1}). \end{aligned}$$If the weights are not equal, say \(w_i > w_{i+1}\), then the right-hand side becomes largest for \(a = x_i\) and we are in the previous case i). For \(w_i = w_{i+1} = w\), we get
$$\begin{aligned} S_0(a,\gamma )&< \epsilon ^2 w \left( \frac{1}{\left( \frac{a-x_i}{d}\right) ^2 + \epsilon ^2} + \frac{1}{\left( \frac{a-x_{i+1}}{d}\right) ^2 + \epsilon ^2}\right) \\&\quad + \frac{\epsilon ^2}{1+\epsilon ^2}(1-2w) \end{aligned}$$and by replacing d by \(x_{i+1} -x_i\) and denoting by \(z \in \left[ 0,\frac{1}{2}\right] \) the distance of a to the midpoint of the normalized interval,
$$\begin{aligned} S_0(a,\gamma )&< \epsilon ^2 w \left( \frac{1}{ \left( \frac{1}{2} + z\right) ^2 + \epsilon ^2} + \frac{1}{\left( \frac{1}{2} - z\right) ^2 + \epsilon ^2}\right) \nonumber \\&\quad + \frac{\epsilon ^2}{1+\epsilon ^2}(1-2w)\nonumber \\&= \epsilon ^2 w \left( \frac{2\left( \frac{1}{4} + z^2 + \epsilon ^2\right) }{\left( \frac{1}{4} - z^2\right) ^2 + 2 \epsilon ^2 \left( \frac{1}{4} + z^2\right) + \epsilon ^4} \right) \nonumber \\&\quad + \frac{\epsilon ^2}{1+\epsilon ^2}(1-2w)\nonumber \\&= 2 \epsilon ^2 w \left( \frac{ \frac{1}{4} + z^2 + \epsilon ^2}{\left( \frac{1}{4} + z^2 + \epsilon ^2\right) ^2 - z^2} \right) \nonumber \\&\quad +\, \frac{\epsilon ^2}{1+\epsilon ^2}(1-2w). \end{aligned}$$ (20)
Now, \(\frac{ \frac{1}{4} + z^2 + \epsilon ^2}{ \left( \frac{1}{4} + z^2 + \epsilon ^2\right) ^2 - z^2}\) becomes largest iff
$$\begin{aligned}&\frac{\left( \frac{1}{4} + z^2 + \epsilon ^2\right) ^2 - z^2}{ \frac{1}{4} + z^2 + \epsilon ^2}\\&\quad = \left( \frac{1}{4} + z^2 + \epsilon ^2\right) - \frac{z^2}{ \frac{1}{4} + z^2 + \epsilon ^2} \end{aligned}$$becomes smallest. Substituting \(y :=z^2 + \frac{1}{4} \in \left[ \frac{1}{4},\frac{1}{2}\right] \), we obtain the function
$$\begin{aligned} f(y)&:=y + \epsilon ^2 - \frac{y - \frac{1}{4}}{y + \epsilon ^2} \\&= - 1 + y + \epsilon ^2 + \frac{\frac{1}{4}+\epsilon ^2}{y + \epsilon ^2}, \end{aligned}$$whose derivatives are given by
$$\begin{aligned} f'(y) = 1-\frac{\epsilon ^2 + \frac{1}{4}}{(y + \epsilon ^2)^2},\qquad f''(y) = 2\, \frac{\epsilon ^2 + \frac{1}{4}}{(y + \epsilon ^2)^3}. \end{aligned}$$Setting the first derivative to zero results in the positive solution \(y = -\epsilon ^2 + \sqrt{\epsilon ^2 + \frac{1}{4}}\), which is the global minimum on \(\left[ \frac{1}{4},\frac{1}{2}\right] \) since f is convex. Resubstituting and plugging the result into (20) yields
$$\begin{aligned} S_0(a,\gamma )&< 2 \epsilon ^2 w \frac{1}{2\sqrt{\epsilon ^2 + \frac{1}{4}}+1}+ \frac{\epsilon ^2}{1+\epsilon ^2}(1-2w) \\&\le w \epsilon ^2+ \frac{\epsilon ^2}{1+\epsilon ^2}(1-2w)\\&= \underbrace{\epsilon ^2\left( \frac{\epsilon ^2 -1}{\epsilon ^2+1} \right) }_{<0}w + \frac{\epsilon ^2}{1+\epsilon ^2}\\&\le \frac{\epsilon ^2}{1+\epsilon ^2}\le \frac{1}{3}, \end{aligned}$$since \(\epsilon ^2 \in \left( 0,\frac{1}{2}\right) \).
\(\square \)
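Since \(s_0\) is continuous and strictly increasing with the unique root \({\hat{\gamma }}^2\) of \(s_0(\gamma ^2) = \frac{1}{2}\), the scale estimate for fixed a can be computed by simple bisection. A sketch (Python, not the authors' implementation; the bracket follows the bounds derived above, and the tolerance is an illustrative choice):

```python
import math

def scale_mle(x, w, a, tol=1e-12):
    # solve S_0(a, gamma) = 1/2 for gamma by bisection; S_0 is
    # continuous and strictly increasing in gamma (proof of Lemma 3)
    def s0(gamma):
        return sum(wi * gamma ** 2 / ((xi - a) ** 2 + gamma ** 2)
                   for xi, wi in zip(x, w))
    lo, hi = 0.0, max(abs(xi - a) for xi in x)  # bracket from the proof
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if s0(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# equal weights and a = 2: S_0 = 1/2 reduces to 2*g^2/(1 + g^2) = 1/2,
# i.e. gamma = 1/sqrt(3)
g = scale_mle([1.0, 2.0, 3.0], [1 / 3] * 3, 2.0)
assert abs(g - 1 / math.sqrt(3)) < 1e-9
```

Monotonicity guarantees that the bracket \([0, \max _i |x_i - a|]\) contains exactly one root, so the bisection cannot fail.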
Appendix
This appendix contains the proofs of Sect. 4.
1.1 Proof of Theorem 2:
Proof
-
1.
We show that the objective function \(L(a_r,\gamma _r)\) decreases for increasing r. By concavity of the logarithm, we have
$$\begin{aligned}&L(a_{r+1},\gamma _{r+1}) - L(a_r,\gamma _r) \\&\quad = \sum _{i=1}^n w_i \log \left( \frac{(x_i-a_{r+1})^2 + \gamma _{r+1}^2}{(x_i-a_{r})^2 + \gamma _{r}^2} \frac{\gamma _{r}}{\gamma _{r+1}}\right) \\&\quad \le \log \Biggl ( \underbrace{\sum _{i=1}^n w_i\frac{(x_i-a_{r+1})^2 + \gamma _{r+1}^2}{(x_i-a_{r})^2 + \gamma _{r}^2} \frac{\gamma _{r}}{\gamma _{r+1}} }_{\varUpsilon }\Biggr ), \end{aligned}$$so that it suffices to show that \(\varUpsilon \le 1\). Setting \(S_{0r} :=S_0(a_r,\gamma _r)\) and \(S_{1r} :=S_1(a_r,\gamma _r)\), we obtain with Algorithm 1
$$\begin{aligned} \varUpsilon&= \sqrt{\frac{S_{0r}}{1-S_{0r}}} \sum _{i=1}^n w_i\frac{(x_i-a_r + a_r- a_{r+1})^2 + \frac{1-S_{0r}}{S_{0r}} \gamma _{r}^2}{(x_i-a_{r})^2 + \gamma _{r}^2} \\&= \sqrt{\frac{S_{0r}}{1-S_{0r}}} \Bigg ( \underbrace{\sum _{i=1}^n w_i\frac{(x_i-a_r )^2 }{(x_i-a_{r})^2 + \gamma _{r}^2}}_{1-S_{0r}}\\&\quad +2 \underbrace{(a_r - a_{r+1} )}_{-\gamma _r \tfrac{S_{1r}}{S_{0r}}} \underbrace{\sum _{i=1}^n w_i\frac{x_i-a_r }{(x_i-a_{r})^2 + \gamma _{r}^2}}_{\tfrac{S_{1r}}{\gamma _r}}\\&\quad + \sum _{i=1}^n w_i\frac{\overbrace{(a_r-a_{r+1})^2}^{\gamma _r^2\tfrac{S_{1r}^2}{S_{0r}^2}} }{(x_i-a_{r})^2 + \gamma _{r}^2}\\&\quad + \frac{1-S_{0r}}{S_{0r}} \underbrace{\sum _{i=1}^n w_i \frac{\gamma _r^2 }{(x_i-a_{r})^2 + \gamma _{r}^2}}_{S_{0r}} \Bigg )\\&= 2 \sqrt{S_{0r} (1-S_{0r})} - 2 \frac{S_{1r}^2}{\sqrt{S_{0r} (1-S_{0r}) }}\\&\quad +\sqrt{\frac{S_{0r}}{1-S_{0r}}} \frac{S_{1r}^2}{S_{0r}^2} S_{0r}\\&= 2 \sqrt{S_{0r} (1-S_{0r})} - \frac{S_{1r}^2}{\sqrt{S_{0r} (1-S_{0r})}}. \end{aligned}$$The function
$$\begin{aligned} f:(0,1) \rightarrow {\mathbb {R}},\quad f(z) :=2\sqrt{z(1-z)} - \frac{\alpha ^2}{\sqrt{z(1-z)}} \end{aligned}$$with \(\alpha :=S_{1r}\) attains its global maximum at \(z = \frac{1}{2}\), where \(f\left( \frac{1}{2}\right) = 1 - 2\alpha ^2 \le 1\). Since \(\varUpsilon = f(S_{0r})\), we conclude \( \varUpsilon \le 1 \) with equality if and only if \(S_{1r} = 0\) and \(S_{0r} = \frac{1}{2}\), that is, \((a_{r+1},\gamma _{r+1}) = (a_r,\gamma _r)\).
-
2.
By (5) and (4), a point is a fixed point of the operator T given by \((a_{r+1},\gamma _{r+1}) :=T(a_r,\gamma _r)\) in Algorithm 1 if and only if it is the minimizer of L. Let \((a_{r+1},\gamma _{r+1}) \not = (a_r,\gamma _r)\) for all \(r \in {\mathbb {N}}_0\). The sequence \(\{(a_r,\gamma _r)\}_{r\in \mathbb {N}}\) is bounded: by (8), \(a_r\) is always a convex combination of the \(x_i\), so that \(a_r \in (x_1,x_n)\); for \(\gamma _r\), boundedness follows from Lemma 3 and Theorem 5, which is shown later on. Together with part 1 of the proof, we see that \(L_r :=L(a_r,\gamma _r)\) is a strictly decreasing, bounded sequence of numbers which must converge to some number \({\hat{L}}\). Further, \(\{ (a_r,\gamma _r)\}_{r\in \mathbb {N}}\) contains a convergent subsequence \(\{ (a_{r_j},\gamma _{r_j})\}_{j\in \mathbb {N}}\) with limit \(({\hat{a}},{\hat{\gamma }})\). By the continuity of L and T, we obtain
$$\begin{aligned} L ({\hat{a}},{\hat{\gamma }})&= \lim _{j\rightarrow \infty } L(a_{r_j}, \gamma _{r_j}) = \lim _{j\rightarrow \infty } L_{r_j}=\lim _{j\rightarrow \infty } L_{r_j+1} \\&= \lim _{j\rightarrow \infty } L(a_{r_j+1}, \gamma _{r_j+1}) \\&= \lim _{j\rightarrow \infty } L\left( T(a_{r_j}, \gamma _{r_j}) \right) = L\left( T({\hat{a}},{\hat{\gamma }}) \right) . \end{aligned}$$However, this implies \(({\hat{a}},{\hat{\gamma }}) = T({\hat{a}},{\hat{\gamma }})\) so that \(({\hat{a}},{\hat{\gamma }})\) is a fixed point of T and consequently the minimizer. Since the minimizer is unique, the whole sequence \(\{(a_r,\gamma _r)\}_{r\in \mathbb {N}}\) converges to \(({\hat{a}},{\hat{\gamma }})\) and we are done.
\(\square \)
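For illustration, the iteration T analyzed in this proof, with the updates \(a_{r+1} = a_r + \gamma _r S_{1r}/S_{0r}\) and \(\gamma _{r+1}^2 = \gamma _r^2 (1-S_{0r})/S_{0r}\) that appear in the computation of \(\varUpsilon \), can be sketched as follows (Python, not the authors' implementation; the initialization and the fixed iteration count are illustrative choices):

```python
import math

def generalized_myriad(x, w, a=None, gamma=None, iters=500):
    # fixed-point iteration T from the proof of Theorem 2:
    #   a_{r+1}     = a_r + gamma_r * S_1 / S_0
    #   gamma_{r+1} = gamma_r * sqrt((1 - S_0) / S_0)
    a = sum(wi * xi for xi, wi in zip(x, w)) if a is None else a
    gamma = (max(x) - min(x)) / 2 if gamma is None else gamma
    for _ in range(iters):
        d = [(xi - a) ** 2 + gamma ** 2 for xi in x]
        s0 = sum(wi * gamma ** 2 / di for wi, di in zip(w, d))
        s1 = sum(wi * gamma * (xi - a) / di
                 for xi, wi, di in zip(x, w, d))
        a = a + gamma * s1 / s0
        gamma = gamma * math.sqrt((1 - s0) / s0)
    return a, gamma

# symmetric three-point sample: the joint MLE is (2, 1/sqrt(3))
a_hat, g_hat = generalized_myriad([1.0, 2.0, 3.0], [1 / 3] * 3)
assert abs(a_hat - 2.0) < 1e-8
assert abs(g_hat - 1 / math.sqrt(3)) < 1e-8
```

By (8), each location update is a convex combination of the \(x_i\), so the iterates stay in \((x_1,x_n)\); by Theorem 2, the sequence converges to the unique minimizer of L.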
1.2 Proof of Theorem 3:
Proof
We follow the lines of the proof of Theorem 2. Recall that \(Q(\cdot ) = L(\cdot ,\gamma )\) with fixed \(\gamma > 0\).
-
1.
First, we show that the objective function \(Q(a_r)\) decreases for increasing r. By concavity of the logarithm, we have
$$\begin{aligned} Q(a_{r+1})-Q(a_r)&= L(a_{r+1},\gamma ) - L(a_r,\gamma ) \\&= \sum _{i=1}^n w_i \log \left( \frac{(x_i-a_{r+1})^2 + \gamma ^2}{(x_i-a_{r})^2 + \gamma ^2} \right) \\&\le \log \Biggl ( \underbrace{\sum _{i=1}^n w_i\frac{(x_i-a_{r+1})^2 + \gamma ^2}{(x_i-a_{r})^2 + \gamma ^2} }_{\varUpsilon }\Biggr ), \end{aligned}$$and it suffices to show that \(\varUpsilon \le 1\). Setting \(S_{0r} :=S_0(a_r,\gamma )\) and \(S_{1r} :=S_1(a_r,\gamma )\) (note that \(\gamma \) is fixed here) we obtain with Algorithm 2
$$\begin{aligned} \varUpsilon&= \sum _{i=1}^n w_i\frac{(x_i-a_{r+1})^2 + \gamma ^2}{(x_i-a_{r})^2 + \gamma ^2}\\&= 1-S_{0r} - \frac{S_{1r}^2}{S_{0r}} + S_{0r}= 1 - \frac{S_{1r}^2}{S_{0r} }\le 1 \end{aligned}$$with equality if and only if \(S_{1r} = 0\), i.e., \(a_{r+1}= a_r\), in which case \({\hat{a}} :=a_r\) is a critical point of Q.
-
2.
If \(a_{r+1}\not = a_r\) for all \(r \in {\mathbb {N}}_0\), the sequence \(Q_r :=Q( a_r)\) is strictly decreasing and bounded below by \(\log (\gamma ^2)\), so that \(Q_r \rightarrow {\hat{Q}}\) as \(r \rightarrow \infty \). Further, since Q is continuous and coercive, the sequence \(\{a_r\}_{r\in \mathbb {N}}\) is bounded. Consequently, it contains a convergent subsequence \(\{a_{r_j} \}_{j\in \mathbb {N}}\) which converges to some \({\hat{a}}\) and by continuity of Q we have \(Q({\hat{a}}) = {\hat{Q}}\). By continuity of Q and the operator \(T_1\) given by \(a_{r+1} :=T_1(a_r)\) in Algorithm 2, it follows
$$\begin{aligned} Q({\hat{a}})&= \lim _{j \rightarrow \infty } Q(a_{r_j}) = \lim _{j \rightarrow \infty } Q_{r_j} = \lim _{j \rightarrow \infty } Q_{r_j+1}\\&= \lim _{j \rightarrow \infty } Q(a_{r_j+1}) = \lim _{j \rightarrow \infty } Q(T_1(a_{r_j})) = Q(T_1({\hat{a}})). \end{aligned}$$By the first part of the proof, this implies that \({\hat{a}}\) is a fixed point of \(T_1\) and thus a critical point of Q.
-
3.
Observing that \(\frac{S_{1r}^2}{S_{0r}} = \frac{S_{0r}}{\gamma ^2}(a_{r+1}-a_r)^2\) and \(- \log (1-y) \ge y\), \(y \in (0,1)\), we have
$$\begin{aligned} Q(a_r) - Q(a_{r+1})&\ge - \log \left( 1- \frac{S_{0r}}{\gamma ^2}(a_{r+1}-a_r)^2\right) \\&\ge \frac{S_{0r}}{\gamma ^2} (a_r-a_{r+1})^2. \end{aligned}$$Since \(1+|x-y|^2 <2 (1+|x|^2)(1+|y|^2)\) and \(a_r\in [x_1,x_n]\), we estimate
$$\begin{aligned} \frac{S_{0r}}{\gamma ^2}&= \sum _{i=1}^n w_i \frac{1 }{(x_i-a_r)^2+\gamma ^2} \\&\ge \frac{\gamma ^2}{2\bigl ( a_r^2+\gamma ^2 \bigr )} \sum _{i=1}^n w_i \frac{1 }{x_i^2+ \gamma ^2} \\&\ge \frac{\gamma ^2}{2\bigl (\max \{ x_1^2,x_n^2\} +\gamma ^2 \bigr )} \sum _{i=1}^n w_i\frac{1 }{x_i^2+\gamma ^2} \\&=:\tau _0 > 0, \end{aligned}$$which results in
$$\begin{aligned} Q(a_r) - Q(a_{r+1}) \ge \tau _0 (a_r-a_{r+1})^2. \end{aligned}$$Since by the second part of the proof \(\lim _{r\rightarrow \infty } Q(a_r) - Q(a_{r+1}) = 0\), we also have \(\lim _{r\rightarrow \infty } |a_r-a_{r+1}| = 0\).
-
4.
Assume now that there exists a subsequence \(\{a_{r_l} \}_{l\in \mathbb {N}}\) which converges to some \(a^*\ne {\hat{a}}\). Since the set of critical points is finite, there exists \(\varepsilon > 0\) such that \(|{\hat{a}} - a^*| \ge \varepsilon \). On the other hand, we have by the third part of the proof for l, j large enough that \(\varepsilon > |a_{r_l} - a_{r_j}|\). For \(l,j \rightarrow \infty \), this leads to a contradiction. \(\square \)
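The fixed-point iteration \(T_1\) for the location with fixed scale, i.e., the update \(a_{r+1} = a_r + \gamma S_{1r}/S_{0r}\) used in this proof, rearranges to a weighted mean and can be sketched as follows (Python, not the authors' implementation; sample, weights, and iteration count are illustrative):

```python
def myriad(x, w, gamma, a0, iters=200):
    # fixed-point iteration T_1 for the location only (gamma fixed):
    # a_{r+1} = a_r + gamma * S_1 / S_0, equivalently the weighted mean
    #   a_{r+1} = (sum_i w_i x_i / d_i) / (sum_i w_i / d_i),
    #   d_i = (x_i - a_r)^2 + gamma^2
    a = a0
    for _ in range(iters):
        d = [(xi - a) ** 2 + gamma ** 2 for xi in x]
        a = (sum(wi * xi / di for xi, wi, di in zip(x, w, d))
             / sum(wi / di for wi, di in zip(w, d)))
    return a

# outlier-contaminated sample: the myriad stays near the bulk of the
# data, unlike the arithmetic mean
x = [0.9, 1.0, 1.1, 100.0]
a_hat = myriad(x, [0.25] * 4, gamma=1.0, a0=0.0)
assert abs(a_hat - 1.0) < 0.5
```

Since each update is a convex combination of the \(x_i\), the iterates remain in \([x_1,x_n]\), and by Theorem 3 they converge to a critical point of Q.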
1.3 Proof of Theorem 4:
Proof
By Theorem 3, we know that \({\hat{a}} = \lim _{r\rightarrow \infty } a_r\) exists and that \({\hat{a}} = T({\hat{a}})\) is a stationary point of Q fulfilling \(Q'({\hat{a}}) = 0\). We distinguish the following cases:
We show that Case I occurs with probability zero and that in Case II, the probability of \({\hat{a}}\) being a local minimum is one. By (8), we get
Rearranging yields
where \(p_1\) and \(p_2\) are polynomials. This polynomial equation in \(a_r\) has only finitely many solutions (at most \(2(n-1)\)). Backtracking each of the possible values for \(a_r\) recursively in the same way, we end up with at most \(2^{r+1} (n-1)^{r+1}\) starting points \(a_0 \in (x_1,x_n)\) that can lead to the point \(a_{r+1}\) after exactly \(r+1\) iterations.
Case I As seen above, there are only finitely many starting points \(a_0\) for which the sequence \(\{ a_r\}_{r\in \mathbb {N}}\) reaches a fixed point after exactly \(r_0\) steps. Since the set of natural numbers \(\mathbb {N}\) is countable and countable unions of finite sets are countable, the set of starting points leading to Case I is countable and consequently has Lebesgue measure zero.
Case II Since Q is smooth, the following cases can occur for the critical point \({\hat{a}}\): a) \(Q''({\hat{a}}) <0\) (local maximum), b) \(Q''({\hat{a}}) =0\) (saddle point), c) \(Q''({\hat{a}}) >0\) (local minimum). Case a) cannot happen, since we have seen in the proof of Theorem 3 that \(\{Q_r\}_{r\in \mathbb {N}}\) is decreasing. Concerning case b), according to Lemma 2 the function Q has with probability one only minima and maxima, but no saddle points. Since cases a) and b) each occur with probability zero, case c) occurs with probability one. This finishes the proof. \(\square \)
1.4 Proof of Theorem 5:
Proof
1.
First, we show property (11). From (5), we see immediately
$$\begin{aligned} S_0(a,\gamma ) {\left\{ \begin{array}{ll}< \frac{1}{2}&{}\quad \text {if } \gamma < {\hat{\gamma }},\\ = \frac{1}{2}&{}\quad \text {if } \gamma = {\hat{\gamma }},\\> \frac{1}{2}&{}\quad \text {if } \gamma > {\hat{\gamma }}, \end{array}\right. } \end{aligned}$$so that \(\gamma _r < {\hat{\gamma }}\) implies \(\gamma _{r+1} > \gamma _r\) and \(\gamma _r > {\hat{\gamma }}\) results in \(\gamma _{r+1} < \gamma _r\). To see that the iterates cannot skip \({\hat{\gamma }}\) we consider the quotient
$$\begin{aligned} \frac{\gamma _{r+1}^2}{{\hat{\gamma }}^2} = \frac{\gamma _r^2}{{\hat{\gamma }}^2}\frac{1-S_0(a,\gamma _r)}{S_0(a,\gamma _r)} = \frac{\sum \nolimits _{i=1}^n w_i \frac{a_i^2}{\alpha a_i^2+ {\hat{\gamma }}^2} }{\sum \nolimits _{i=1}^n w_i \frac{{\hat{\gamma }}^2}{\alpha a_i^2 + {\hat{\gamma }}^2 }}, \end{aligned}$$where \(\alpha :=\left( \frac{{\hat{\gamma }} }{\gamma _r} \right) ^2\). We have to show that \(\alpha < 1\) implies \(\frac{\gamma _{r+1}^2}{{\hat{\gamma }}^2} > 1\) and conversely, \(\alpha > 1\) implies \(\frac{\gamma _{r+1}^2}{{\hat{\gamma }}^2} < 1\). Alternatively, we can prove that the function
$$\begin{aligned} f(\alpha ) = \sum _{i=1}^n w_i \frac{\gamma ^2-a_i^2 }{\alpha a_i^2 + \gamma ^2} \end{aligned}$$fulfills
$$\begin{aligned} f(\alpha ) \left\{ \begin{array}{ll} < 0 &{}\quad \mathrm {if} \; \alpha \in (0,1), \\ > 0 &{}\quad \mathrm {if} \; \alpha \in (1,+\infty ). \end{array} \right. \end{aligned}$$We have \(f(1) = 0\), and the derivatives of f are given by
$$\begin{aligned} f'(\alpha )&= \sum _{i=1}^nw_i \frac{a_i^2(a_i^2 -\gamma ^2)}{(\alpha a_i^2 + \gamma ^2)^2},\\ f''(\alpha )&=2 \sum _{i=1}^n w_i \frac{a_i^4(\gamma ^2-a_i^2)}{(\alpha a_i^2 + \gamma ^2)^3}. \end{aligned}$$For \(f'\) we estimate similarly as in the proof of Theorem 1,
$$\begin{aligned}&\sum _{i:a_i^2> \gamma ^2} w_i \frac{a_i^2(a_i^2 -\gamma ^2)}{(\alpha a_i^2 + \gamma ^2)^2}\\&\quad> \sum _{i:a_i^2> \gamma ^2} w_i \frac{a_i^2(a_i^2 -\gamma ^2)}{(\alpha a_i^2 + a_i^2)(\alpha a_i^2 + \gamma ^2)} \\&\quad =\frac{1}{\alpha +1}\sum _{i:a_i^2 > \gamma ^2} w_i \frac{a_i^2 - \gamma ^2 }{\alpha a_i^2 + \gamma ^2} \end{aligned}$$and analogously for the negative summands, so that in summary
$$\begin{aligned} f'(\alpha )&> \frac{1}{\alpha +1}\sum _{i=1}^n w_i \frac{a_i^2 -\gamma ^2}{\alpha a_i^2 + \gamma ^2} = -\frac{1}{\alpha +1}f(\alpha ). \end{aligned}$$(21)
Analogously, we obtain for \(f''\)
$$\begin{aligned} f''(\alpha )&< \frac{2}{\alpha +1}\sum _{i=1}^n w_i \frac{a_i^2(\gamma ^2-a_i^2)}{(\alpha a_i^2 + \gamma ^2)^2} = -\frac{2}{\alpha +1}f'(\alpha ). \end{aligned}$$(22)
From (21), it follows \(f'(1) > -\frac{1}{2} f(1) = 0\). Moreover, (21) forces \(f' > 0\) at every zero of f, so f cannot vanish on (0, 1) while being negative immediately to the left of 1; hence \(f(\alpha ) <0 \) for \(\alpha \in (0,1)\). Consider the case \(\alpha > 1\). By continuity of f, we have \(f(\alpha )>0\) for \(\alpha \) sufficiently close to 1. Since \(\lim _{\alpha \rightarrow \infty } f(\alpha ) = 0\), we conclude that \(f'\) has at least one root for \(\alpha > 1\). On the other hand, \(f'\) has at most one root, since according to (22) any root of \(f'\) is a local maximum of f. Thus, \(f'\) has exactly one root, so that f has exactly one critical point (a local maximum) on \((1,\infty )\). Since \(\lim _{\alpha \rightarrow \infty } f(\alpha ) = 0\), this implies \(f(\alpha )>0\) for all \(\alpha >1\).
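The sign pattern of f just established can be checked numerically. The sketch below picks residuals \(a_i\) and weights \(w_i\), determines \({\hat{\gamma }}\) as the root of \(S_0 = \sum _i w_i {\hat{\gamma }}^2/(a_i^2+{\hat{\gamma }}^2) = \tfrac{1}{2}\) by bisection (this concrete form of \(S_0\) is a reconstruction, not quoted from (5)), and evaluates f with this \(\gamma \):

```python
a_i = [0.3, 1.1, 2.0, 5.7, -0.4]   # stand-ins for the residuals a_i of the proof
w = [0.2] * 5

def S0(g):
    """Assumed form S_0 = sum_i w_i g^2 / (a_i^2 + g^2); increasing in g."""
    return sum(wi * g ** 2 / (ai ** 2 + g ** 2) for ai, wi in zip(a_i, w))

# gamma_hat solves S_0(gamma_hat) = 1/2 (bisection on the increasing map S0)
lo, hi = 1e-9, 1e9
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if S0(mid) < 0.5:
        lo = mid
    else:
        hi = mid
gamma = 0.5 * (lo + hi)

def f(alpha):
    """f(alpha) = sum_i w_i (gamma^2 - a_i^2) / (alpha a_i^2 + gamma^2)."""
    return sum(wi * (gamma ** 2 - ai ** 2) / (alpha * ai ** 2 + gamma ** 2)
               for ai, wi in zip(a_i, w))
```

With this choice \(f(1) = 2S_0({\hat{\gamma }}) - 1 \approx 0\), f is negative on (0, 1) and positive on \((1,\infty )\), exactly the pattern the proof derives.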
2.
Since \(L(a,\cdot )\) is continuous and has only one critical point, it follows immediately from (11) that \(L(a,\gamma _r) \ge L(a,\gamma _{r+1}) = L(a,T_2 (\gamma _r))\) with equality if and only if \(\gamma _r = \gamma _{r+1} = {\hat{\gamma }}\). In the latter case, we are done, so assume that \(\gamma _r \not = \gamma _{r+1}\) for all \(r \in {\mathbb {N}}_0\). By part 1 of the proof, the sequence \(\{\gamma _r\}_r\) is monotone and bounded, so it converges to some \(\gamma ^*\). By continuity of \(L(a,\cdot )\) and \(T_2\), we get
$$\begin{aligned} L(a,\gamma ^*)&= \lim _{r \rightarrow \infty } L(a,\gamma _r) = \lim _{r \rightarrow \infty } L(a,\gamma _{r+1})\\&= \lim _{r \rightarrow \infty } L(a,T_2(\gamma _r)) = L(a,T_2(\gamma ^*)), \end{aligned}$$which is only possible if \(\gamma ^* = T_2(\gamma ^*)\), i.e., if \(\gamma ^* = {\hat{\gamma }}\).
\(\square \)
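Parts 1 and 2 of the proof of Theorem 5 can be illustrated with a small numerical sketch. It assumes \(S_0(a,\gamma ) = \sum _i w_i \gamma ^2/((x_i-a)^2+\gamma ^2)\) and the update \(\gamma _{r+1}^2 = \gamma _r^2 (1-S_0)/S_0\), the form that the quotient in part 1 suggests for \(T_2\) (a reconstruction, not the verbatim algorithm):

```python
import math

x = [0.3, 1.1, 2.0, 5.7, -0.4]           # arbitrary sample
w = [0.2] * 5
a = 1.0                                  # the location is held fixed here

def S0(g):
    """Assumed S_0(a, g) = sum_i w_i g^2 / ((x_i - a)^2 + g^2)."""
    return sum(wi * g ** 2 / ((xi - a) ** 2 + g ** 2) for xi, wi in zip(x, w))

def T2(g):
    """Assumed scale update: g_{r+1}^2 = g_r^2 (1 - S_0) / S_0."""
    s = S0(g)
    return g * math.sqrt((1 - s) / s)

gammas = [0.05]                          # start well below the fixed point
for _ in range(300):
    gammas.append(T2(gammas[-1]))
```

Starting below \({\hat{\gamma }}\), the iterates increase monotonically without overshooting, and in the limit \(S_0 \approx \tfrac{1}{2}\), i.e., \(\gamma ^* = T_2(\gamma ^*) = {\hat{\gamma }}\), as property (11) predicts.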
Appendix
This appendix contains the proofs of Sect. 5.
1.1 Proof of Lemma 4:
Proof
Consider the functions \(g(x) :=\frac{1}{1 + x^2}\) and \(h(x) :=\frac{x}{1 + x^2}\). Both functions are measurable and since
where p denotes the density function of \(C(a,\gamma )\), the expected values \(\mathbb {E}(Y),\mathbb {E}(Z)\) exist.
For g and \(a\ne 0\), we compute
Since \(\lim _{x\rightarrow \pm \infty }\log \left( \frac{(x-a)^2 + \gamma ^2}{x^2 + 1}\right) = 0\) and \(\lim _{x\rightarrow \pm \infty } \arctan (x) = \pm \frac{\pi }{2}\), we obtain the first equation in (13). For \(a=0\), we have
which results in the second equation in (13). Similarly, for h and \( (a,\gamma ) \not = (0,1)\),
so that the first equation in (14) follows. Finally, for \((a,\gamma ) = (0,1)\) it holds
and consequently \(\mathbb {E}(Z) = 0\). \(\square \)
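The closed forms of Lemma 4 can be sanity-checked by numerical quadrature. The sketch assumes that (13) and (14) read \(\mathbb {E}(Y) = (1+\gamma )/\bigl ((1+\gamma )^2 + a^2\bigr )\) and \(\mathbb {E}(Z) = a/\bigl ((1+\gamma )^2 + a^2\bigr )\) for \(X \sim C(a,\gamma )\), \(Y = g(X)\), \(Z = h(X)\) — a reconstruction consistent with the special cases treated above (e.g., \(\mathbb {E}(Z) = 0\) at \((a,\gamma )=(0,1)\)):

```python
import math

def cauchy_pdf(t, a, gamma):
    return gamma / (math.pi * ((t - a) ** 2 + gamma ** 2))

def expect(func, a, gamma, n=200_000):
    """E[func(X)] for X ~ C(a, gamma): midpoint rule after the substitution
    x = tan(u), which maps the real line onto (-pi/2, pi/2)."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        u = -math.pi / 2 + (k + 0.5) * h
        t = math.tan(u)
        total += func(t) * cauchy_pdf(t, a, gamma) * (1 + t ** 2) * h
    return total

a, gamma = 1.5, 0.7
EY = expect(lambda t: 1 / (1 + t ** 2), a, gamma)   # E(Y), Y = g(X)
EZ = expect(lambda t: t / (1 + t ** 2), a, gamma)   # E(Z), Z = h(X)
EY_closed = (1 + gamma) / ((1 + gamma) ** 2 + a ** 2)
EZ_closed = a / ((1 + gamma) ** 2 + a ** 2)
```

The quadrature values agree with the assumed closed forms to high accuracy; the tangent substitution conveniently cancels the \(1/(1+x^2)\) factor in g and keeps the transformed integrand bounded.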
1.2 Proof of Corollary 1:
Proof
By Proposition 1, we have \(X_r\sim C\left( \tfrac{a-a_r}{\gamma _r},\tfrac{\gamma }{\gamma _r}\right) \). Setting \(\tilde{a} = \frac{a-a_r}{\gamma _r}\) and \(\tilde{\gamma } = \frac{\gamma }{\gamma _r}\), and applying the results of Lemma 4, we obtain
Similarly, we compute
\(\square \)
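The transformation rule used above — for \(X \sim C(a,\gamma )\) and \(s > 0\), the variable \((X-b)/s\) follows \(C\bigl ((a-b)/s, \gamma /s\bigr )\) — can be verified through the Cauchy CDF \(F_{C(m,g)}(t) = \tfrac{1}{2} + \tfrac{1}{\pi }\arctan \bigl ((t-m)/g\bigr )\). The sketch compares the CDF of the transformed variable with the claimed one (the numerical values of \(a, \gamma , b, s\) are arbitrary):

```python
import math

def cauchy_cdf(t, m, g):
    """CDF of the Cauchy distribution C(m, g)."""
    return 0.5 + math.atan((t - m) / g) / math.pi

a, gamma = 2.0, 1.3        # parameters of X ~ C(a, gamma)
b, s = 0.7, 2.5            # affine map t -> (t - b) / s

# P((X - b)/s <= t) = F_X(s t + b); the claim is that this equals
# the CDF of C((a - b)/s, gamma/s) at t.
for t in (-10.0, -1.0, 0.0, 0.4, 3.0, 25.0):
    lhs = cauchy_cdf(s * t + b, a, gamma)
    rhs = cauchy_cdf(t, (a - b) / s, gamma / s)
    assert abs(lhs - rhs) < 1e-12
```

The identity holds exactly since \((st+b-a)/\gamma = \bigl (t-(a-b)/s\bigr )/(\gamma /s)\); with \(b = a_r\) and \(s = \gamma _r\) this is precisely the statement \(X_r\sim C\bigl (\tfrac{a-a_r}{\gamma _r},\tfrac{\gamma }{\gamma _r}\bigr )\).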
1.3 Proof of Theorem 6:
Proof
(i) Since \(\gamma ,\tilde{\gamma }_0>0\), it follows inductively from (18) that \(\tilde{\gamma }_r>0\). Further, if \(\tilde{\gamma }_r<\gamma \), then
On the other hand, if \(\tilde{\gamma }_r>\gamma \), then it holds
Thus, in summary, \(\tilde{\gamma }^2_{r+1}\ge \min \{\tilde{\gamma }^2_r,\gamma ^2\}\), and inductively we obtain \(\tilde{\gamma }^2_{r+1}\ge \min \{\tilde{\gamma }^2_0,\gamma ^2\}\).
(ii) Since \(\gamma ,\tilde{\gamma }_r>0\), this is a direct consequence of (16).
(iii) Let \(q = \max \left\{ \frac{1}{2},\frac{\gamma }{\gamma + \tilde{\gamma }_0}\right\} \), so that \(\frac{1}{2} \le q < 1\). For the sequence \(\{\tilde{a}_r\}_{r\in \mathbb {N}}\) we estimate
Similarly, we obtain for the sequence \(\{\tilde{\gamma }_r\}_{r\in \mathbb {N}}\),
\(\square \)
1.4 Proof of Theorem 7:
Proof
By strict concavity of the logarithm function and since \(w_i > 0\), we have
with equality if and only if \((a_r,\gamma _r) = (a_{r+1},\gamma _{r+1})\). From Algorithm 4, we obtain, similarly as in the proof of Theorem 2,
Thus, \(L(a_{r+1},\gamma _{r+1}) \le L(a_{r},\gamma _{r})\) with equality if and only if \((a_r,\gamma _r) = (a_{r+1},\gamma _{r+1})\). The convergence result follows as in part 2 of the proof of Theorem 2. \(\square \)
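The monotone decrease of L under a joint update can be illustrated by alternating the location and scale steps from the previous proofs. This is only a plausible sketch in the spirit of Algorithm 4 (whose exact joint update is not reproduced here); it assumes \(L(a,\gamma ) = \sum _i w_i \log ((x_i-a)^2+\gamma ^2) - \log \gamma \) with \(\sum _i w_i = 1\):

```python
import math

x = [0.3, 1.1, 2.0, 5.7, -0.4]
w = [0.2] * 5                         # weights summing to one

def L(a, g):
    """Assumed negative log-likelihood (up to an additive constant)."""
    return sum(wi * math.log((xi - a) ** 2 + g ** 2)
               for xi, wi in zip(x, w)) - math.log(g)

def step(a, g):
    """One alternating sweep: a myriad step in a, then the S_0-based step
    in g. A sketch only; Algorithm 4's exact joint update may differ."""
    wts = [wi / ((xi - a) ** 2 + g ** 2) for xi, wi in zip(x, w)]
    a_new = sum(c * xi for c, xi in zip(wts, x)) / sum(wts)
    s = sum(wi * g ** 2 / ((xi - a_new) ** 2 + g ** 2) for xi, wi in zip(x, w))
    g_new = g * math.sqrt((1 - s) / s)
    return a_new, g_new

a, g = 0.0, 3.0
vals = [L(a, g)]
for _ in range(500):
    a, g = step(a, g)
    vals.append(L(a, g))
```

Each sweep decreases L (the a-step by the monotonicity argument of Theorem 3, the \(\gamma \)-step by Theorem 5), and in the limit both stationarity conditions hold, so the sweep map has the joint estimate as a fixed point.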
Appendix
This appendix contains the proof of Sect. 6.
1.1 Proof of Lemma 5:
Proof
Under \(\mathcal {H}_0\) (i.e., \(n=2\)), the ML estimate is not unique, but one easily verifies using (4) that
and therewith
Note that although the ML estimate \(\hat{\theta } \) is not unique, the value of the log-likelihood function does not change when using another solution.
Under \(\mathcal {H}_1\) (i.e., \(n=1\)), the ML estimate simply reads as
resulting in
Therewith, the LR statistic becomes
\(\square \)
Laus, F., Pierre, F. & Steidl, G. Nonlocal Myriad Filters for Cauchy Noise Removal. J Math Imaging Vis 60, 1324–1354 (2018). https://doi.org/10.1007/s10851-018-0816-y