Linear Convergence in Hilbert’s Projective Metric for Computing Augustin Information
and a Rényi Information Measure
Abstract
Consider the problems of computing the Augustin information and a Rényi information measure of statistical independence, previously explored by Lapidoth and Pfister (IEEE Information Theory Workshop, 2018) and Tomamichel and Hayashi (IEEE Trans. Inf. Theory, 64(2):1064–1082, 2018).
Both quantities are defined as solutions to optimization problems and lack closed-form expressions.
This paper analyzes two iterative algorithms: Augustin’s fixed-point iteration for computing the Augustin information, and the algorithm by Kamatsuka et al. (arXiv:2404.10950) for the Rényi information measure.
Previously, it was only known that these algorithms converge asymptotically.
We establish the linear convergence of Augustin’s algorithm for the Augustin information of order and Kamatsuka et al.’s algorithm for the Rényi information measure of order , using Hilbert’s projective metric.
1 Introduction
Denote by the set of probability distributions over the finite set . For any , the order- Augustin information is defined by the following optimization problem [Augustin, 1978]:
(1) |
where is a given probability distribution over , and
is the order- Rényi divergence. The Augustin information characterizes, e.g., the cutoff rate, the strong converse exponent, and the error exponent in the channel coding problem [Arimoto, 1973, Csiszár, 1995, Csiszár and Körner, 2011, Nakiboğlu, 2019, Wang et al., 2024]. When , the optimization problem (1) specializes to the definition of the log-optimal portfolio [Cover, 1984], and is equivalent to the definition of the maximum-likelihood estimate in Poisson inverse problems [Vardi and Lee, 1993].
The optimization problem (1) does not admit a closed-form expression. While the optimization problem is convex, the objective function violates the standard smoothness assumption in the optimization literature. Therefore, even the convergence guarantees of projected gradient descent, arguably the simplest convex optimization algorithm, do not directly apply [You et al., 2022].
Augustin [1978] proposed the following fixed-point iteration to solve the optimization problem (1):
(2) |
where is the normalizing constant, ensuring that remains a probability distribution, and denotes the entry-wise product. The algorithm was later rediscovered by Karakos et al. [2008], who proposed an alternating minimization method whose iteration consists of two steps; combining the two steps yields Augustin’s fixed-point iteration. When , this fixed-point iteration coincides with Cover’s method for computing the log-optimal portfolio [Cover, 1984], and is equivalent to the expectation maximization algorithm for solving Poisson inverse problems [Richardson, 1972, Lucy, 1974, Shepp and Vardi, 1982, Vardi and Lee, 1993].
Recently, Kamatsuka et al. [2024] proposed an algorithm similar to Augustin’s fixed-point iteration to compute a Rényi information measure of statistical independence, which was explored by Lapidoth and Pfister [2019] and Tomamichel and Hayashi [2018]. For any , this order- Rényi information measure is defined by the following optimization problem:
(3) |
where is a given probability distribution over and denotes the tensor product. The Rényi information measure emerges in the error exponent of a hypothesis testing problem, where we test against the independence of two random variables given independent and identically distributed (i.i.d.) samples from their joint distribution [Lapidoth and Pfister, 2018, 2019, Tomamichel and Hayashi, 2018].
Kamatsuka et al.’s algorithm to compute the Rényi information measure iterates as:
(4) | ||||
where and are normalizing constants, ensuring that and remain probability distributions. The notation denotes the entry-wise power for any vector and number . This iterative algorithm is reminiscent of Augustin’s fixed-point iteration but differs in the powers applied to the gradients.
The convergence behaviors of Augustin’s fixed-point iteration and Kamatsuka et al.’s algorithm remain largely unclear. For Augustin’s fixed-point iteration, Karakos et al. [2008] and Nakiboğlu [2019] have shown that it asymptotically converges for ; Iusem [1992] and Lin et al. [2021] have proved a convergence rate of for the case where approaches zero. For Kamatsuka et al.’s algorithm, Kamatsuka et al. [2024] have shown that it asymptotically converges for .
We aim to carry out non-asymptotic analyses for the two algorithms. One common approach to analyzing an iterative method is to show that it is contractive under a suitable metric. Since the two algorithms (2) and (4) map positive vectors to positive vectors, we view them as positive dynamical systems and consider the so-called Hilbert’s projective metric [Lemmens and Nussbaum, 2012, Krause, 2015].
In this work, we prove that with respect to Hilbert’s projective metric, Augustin’s fixed-point iteration is contractive for , and Kamatsuka et al.’s algorithm is also contractive for . Based on these contractivity results, we establish the following non-asymptotic convergence guarantees for the two algorithms.
• For computing the Augustin information of order , Augustin’s fixed-point iteration converges at a rate of with respect to Hilbert’s projective metric. This improves on the previous asymptotic convergence guarantee [Karakos et al., 2008, Nakiboğlu, 2019] when and extends the range of convergence to include .
• For computing the Rényi information measure of order , the iterative algorithm of Kamatsuka et al. converges at a rate of with respect to Hilbert’s projective metric. When , this method also converges linearly if has full support. This improves on the previous asymptotic convergence guarantee [Kamatsuka et al., 2024].
Notations
We write and for the sets of non-negative and strictly positive numbers, respectively. For any positive integer , we write for the set . Let and . We write for the -th entry of the vector , and the -th entry of the matrix . We write for the entry-wise product between and . We write for the matrix . For a set , we denote by its relative interior. We will adopt the convention that , , , for any , and . We call the probability simplex and view elements in as -dimensional vectors.
2 Related Work
We have discussed Augustin’s fixed-point iteration and Kamatsuka et al.’s algorithm in Section 1. This section reviews other optimization algorithms for computing the Augustin information and the Rényi information measure.
For computing the Augustin information of order , entropic mirror descent with Armijo line search [Li et al., 2018] and with the Polyak step size [You et al., 2022], as well as a variant of Augustin’s fixed-point iteration explored by Cheng and Nakiboğlu [2021, Lemma 6], all achieve asymptotic convergence for all . Riemannian gradient descent with the Poincaré metric [Wang et al., 2024] converges at a rate of for all . An alternating minimization method due to Kamatsuka et al. [2024] also achieves a convergence rate of , but for a narrower range of . (Kamatsuka et al. [2024] only claimed an asymptotic convergence guarantee in their paper; we find that their Lemma 2 in fact implies a convergence rate of .) None of the existing works has yet established a linear convergence rate.
For computing the Rényi information measure of order , entropic mirror descent with Armijo line search [Li et al., 2018] and with the Polyak step size [You et al., 2022] both asymptotically converge for . However, when , the optimization problem (3) becomes non-convex [Lapidoth and Pfister, 2019], and currently, there are no known algorithms that provably solve this problem. Similarly to the computation of the Augustin information, none of the existing works have established a linear convergence rate.
3 Preliminaries
Our analyses are based on properties of Hilbert’s projective metric and Birkhoff’s contraction theorem, which we introduce in this section.
Let be a closed cone in a finite-dimensional real vector space, such as the positive orthant and the set of Hermitian positive semidefinite matrices. For any , we write if . For any , define
(5) |
If the set is empty, then .
Definition 1.
Hilbert’s projective metric is defined as
In addition, is defined to be .
The following lemma shows that is indeed a metric on the set of rays.
Lemma 2.
The following properties hold.
(i) For any and any , we have .
(ii) We have if and only if for some .
In the rest of the paper, we will only consider the cone .
Lemma 3.
Consider Hilbert’s projective metric on the cone .
(i) For any , we have
(6)
(ii) is a metric space [Lemmens and Nussbaum, 2012, Proposition 2.1.1].
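The displayed formula (6) is not reproduced in this text. As a minimal sketch, assuming it is the standard closed form on the positive orthant, namely the logarithm of the largest entry-wise ratio of the two vectors times the largest inverse ratio, the metric can be evaluated as follows. The function name `hilbert_metric` and the sample vectors are illustrative choices, not part of the paper.

```python
import math

def hilbert_metric(x, y):
    """Hilbert's projective metric between strictly positive vectors x and y.

    Assumes the standard closed form on the positive orthant:
    d(x, y) = log( max_i x_i / y_i ) + log( max_j y_j / x_j ).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if min(x) <= 0 or min(y) <= 0:
        raise ValueError("this sketch only handles strictly positive vectors")
    log_ratios = [math.log(a) - math.log(b) for a, b in zip(x, y)]
    # max log-ratio minus min log-ratio equals the closed form above
    return max(log_ratios) - min(log_ratios)

x = [0.2, 0.3, 0.5]
y = [0.5, 0.25, 0.25]
d_xy = hilbert_metric(x, y)
# Rescaling either argument by a positive constant leaves the value unchanged.
d_scaled = hilbert_metric([3.0 * v for v in x], [7.0 * v for v in y])
# Two vectors on the same ray are at distance zero.
d_ray = hilbert_metric(x, [10.0 * v for v in x])
```

The two extra evaluations illustrate why the quantity is a metric on rays rather than on vectors: it ignores positive rescaling, so it only becomes a genuine metric after normalization, for instance on the relative interior of the probability simplex.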
Given the second item above, we will measure the errors of both Augustin’s fixed-point iteration and Kamatsuka et al.’s algorithm in terms of Hilbert’s projective metric between their iterates and the minimizer. The following lemma lists several properties of Hilbert’s projective metric, which are direct consequences of Corollary 2.1.4 and Corollary 2.1.5 of Lemmens and Nussbaum [2012].
Lemma 4.
The following properties hold.
(i) for any and any .
(ii) for any .
(iii) for any and .
When the matrix in Lemma 4 (iii) is entry-wise strictly positive, Birkhoff [1957] showed that the linear transformation it defines is a contraction.
Theorem 5 (Birkhoff [1957]).
Let . It holds that
where
and
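The two displayed quantities in Theorem 5 are missing from this text. The following sketch assumes the classical statement of Birkhoff's theorem: for an entry-wise strictly positive matrix, the contraction ratio is the hyperbolic tangent of one quarter of the projective diameter, and the projective diameter is the largest log cross-ratio of the matrix entries. The helper names and the sample matrix are hypothetical.

```python
import math
from itertools import product

def hilbert_metric(x, y):
    """Hilbert's projective metric between strictly positive vectors."""
    log_ratios = [math.log(a) - math.log(b) for a, b in zip(x, y)]
    return max(log_ratios) - min(log_ratios)

def birkhoff_coefficient(A):
    """Contraction ratio tanh(diameter / 4) of a strictly positive matrix A.

    Assumption: the projective diameter is
    max over i, j, k, l of log( A[i][k] * A[j][l] / (A[j][k] * A[i][l]) ).
    """
    rows, cols = range(len(A)), range(len(A[0]))
    diameter = max(
        math.log(A[i][k] * A[j][l] / (A[j][k] * A[i][l]))
        for i, j in product(rows, rows)
        for k, l in product(cols, cols)
    )
    return math.tanh(diameter / 4.0)

def matvec(A, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

A = [[2.0, 1.0], [1.0, 3.0]]
x = [0.9, 0.1]
y = [0.2, 0.8]
lam = birkhoff_coefficient(A)
lhs = hilbert_metric(matvec(A, x), matvec(A, y))  # distance after applying A
rhs = lam * hilbert_metric(x, y)                  # contracted original distance
```

Because the ratio is strictly less than one whenever all entries are strictly positive, `lhs` should not exceed `rhs` for any pair of strictly positive inputs.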
4 Linear Rate of Augustin’s Fixed-Point Iteration
In this section, we show that Augustin’s fixed-point iteration converges linearly with respect to Hilbert’s projective metric for computing the Augustin information of order .
4.1 Augustin’s Fixed-Point Iteration
Define the following operators:
(7) |
Augustin’s fixed-point iteration (2) can be equivalently written as follows:
1. Initialize .
2. For all , compute .
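The operators in (7) are not reproduced in this text. The sketch below assumes the usual form of the Augustin operator for a finitely supported distribution: each support point is raised entry-wise to the power alpha, multiplied entry-wise by the current iterate raised to the power 1 - alpha, normalized to sum to one, and the results are averaged with the given weights. The function names, the uniform initialization, the iteration count, and the sample data are all illustrative assumptions.

```python
def augustin_step(x, points, weights, alpha):
    """One step of the (assumed) Augustin operator.

    points  : list of probability vectors (the support of the distribution)
    weights : probabilities attached to those vectors, summing to one
    """
    n = len(x)
    new_x = [0.0] * n
    for w, p in zip(weights, points):
        # Unnormalized tilted vector: p^alpha times x^(1 - alpha), entry-wise.
        tilted = [(pi ** alpha) * (xi ** (1.0 - alpha)) for pi, xi in zip(p, x)]
        total = sum(tilted)
        for i in range(n):
            new_x[i] += w * tilted[i] / total
    return new_x

def augustin_iteration(points, weights, alpha, num_iters=200):
    """Run the fixed-point iteration from the uniform distribution."""
    n = len(points[0])
    x = [1.0 / n] * n  # a point in the relative interior of the simplex
    for _ in range(num_iters):
        x = augustin_step(x, points, weights, alpha)
    return x

points = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.25, 0.25, 0.5]]
weights = [0.5, 0.3, 0.2]
alpha = 0.8
x_hat = augustin_iteration(points, weights, alpha)
# Fixed-point residual: how far x_hat moves under one more step.
x_next = augustin_step(x_hat, points, weights, alpha)
residual = max(abs(a - b) for a, b in zip(x_hat, x_next))
```

Each step is a convex combination of probability vectors, so the iterate stays on the simplex without an explicit projection. The fixed-point residual is a cheap stopping criterion, although the guarantees in this paper are stated in Hilbert's projective metric rather than in this norm.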
4.2 Linear Rate Guarantee
The main result of this section is the following theorem, which bounds the Lipschitz constant of the mapping with respect to Hilbert’s projective metric. Its proof is postponed to the next subsection.
Theorem 7.
For , we have
(8) |
where .
Linear convergence of Augustin’s fixed-point iteration for immediately follows.
Corollary 8.
4.3 Proof of Theorem 7
The proof primarily consists of two steps, which are reflected by the following two lemmas. First, we show that the operators are Lipschitz with respect to Hilbert’s projective metric and bound the Lipschitz constant. Then, given that , we prove a general lemma that bounds Hilbert’s projective metric between two random probability vectors in terms of Hilbert’s projective metric between their realizations, which is of independent interest. The proofs of both lemmas are deferred to Appendix A.
Lemma 10.
For any and ,
Lemma 11.
Let be two random probability vectors, where denotes the sample space. We have
5 Linear Rate of Kamatsuka et al.’s Algorithm
In this section, we show that Kamatsuka et al.’s algorithm converges linearly with respect to Hilbert’s projective metric for computing the Rényi information measure of order .
For convenience, we will view any as a matrix in whose entries sum to . We will denote Hilbert’s projective metric on both and by . The associated cone should be clear from the context.
5.1 Kamatsuka et al.’s Algorithm
Define the following two operators:
(9) |
Kamatsuka et al.’s algorithm (4) can be equivalently written as follows:
1. Initialize , and compute .
2. For all , compute and .
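The operators in (9) are not reproduced in this text. The sketch below assumes they take the form suggested by the fixed-point conditions of Lapidoth and Pfister: each marginal is updated by raising the joint distribution entry-wise to the power alpha, multiplying by the other marginal raised to the power 1 - alpha, summing out the other variable, raising the result to the power 1/alpha, and normalizing. The names `update_x`, `update_y`, and `kamatsuka_iteration`, the uniform initialization, and the sample joint distribution are hypothetical.

```python
def normalize(v):
    """Rescale a positive vector so that its entries sum to one."""
    s = sum(v)
    return [a / s for a in v]

def update_x(y, P, alpha):
    """Assumed update: x(i) proportional to
    ( sum_j P[i][j]^alpha * y(j)^(1 - alpha) )^(1 / alpha)."""
    return normalize([
        sum((pij ** alpha) * (yj ** (1.0 - alpha)) for pij, yj in zip(row, y))
        ** (1.0 / alpha)
        for row in P
    ])

def update_y(x, P, alpha):
    """Same update applied to the transpose of P."""
    P_transposed = [list(col) for col in zip(*P)]
    return update_x(x, P_transposed, alpha)

def kamatsuka_iteration(P, alpha, num_iters=200):
    """Alternate the two updates, starting from a uniform x."""
    m = len(P)
    x = [1.0 / m] * m
    y = update_y(x, P, alpha)
    for _ in range(num_iters):
        x = update_x(y, P, alpha)
        y = update_y(x, P, alpha)
    return x, y

P = [[0.3, 0.1], [0.2, 0.4]]  # joint distribution with full support
alpha = 2.0
x_hat, y_hat = kamatsuka_iteration(P, alpha)
# Residuals of the two fixed-point equations at the returned pair.
res_x = max(abs(a - b) for a, b in zip(x_hat, update_x(y_hat, P, alpha)))
res_y = max(abs(a - b) for a, b in zip(y_hat, update_y(x_hat, P, alpha)))
```

Writing the second update as the first one applied to the transposed matrix keeps the two operators symmetric, which mirrors the way the analysis treats them with a single Lipschitz constant.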
This algorithm is inspired by the following lemma [Lapidoth and Pfister, 2019, Lemma 16].
Lemma 12.
For , every minimizer of the optimization problem (3) satisfies and .
5.2 Linear Rate Guarantee
The following theorem presents a key observation, showing that the operators and have a Lipschitz constant of with respect to Hilbert’s projective metric. Its proof is postponed to the next subsection.
Theorem 13.
For , we have
(10) | ||||||
where . Moreover, if , then the Lipschitz constant can be improved to
where is defined in Theorem 5.
Theorem 13 implies the following corollary, showing that the iterative algorithm converges linearly. The proof of the corollary is deferred to Appendix A.
5.3 Proof of Theorem 13
6 Discussions
We have proved that Augustin’s fixed-point iteration converges at a linear rate for computing the Augustin information of order , and that Kamatsuka et al.’s algorithm converges at a linear rate for computing the Rényi information measure of order . In contrast, existing results are asymptotic and apply to a narrower range of . Our proofs are simple, demonstrating the effectiveness of selecting an appropriate mathematical structure.
Preliminary numerical experiments indicate that Augustin’s fixed-point iteration may converge linearly for . This observed range is broader than the one we have established. A natural direction for future work is to extend the range of for which linear convergence can be proved.
Acknowledgements
We thank Marco Tomamichel and Rubboli Roberto for discussions. C.-E. Tsai, G.-R. Wang, and Y.-H. Li are supported by the Young Scholar Fellowship (Einstein Program) of the National Science and Technology Council of Taiwan under grant number NSTC 112-2636-E-002-003, by the 2030 Cross-Generation Young Scholars Program (Excellent Young Scholars) of the National Science and Technology Council of Taiwan under grant number NSTC 112-2628-E-002-019-MY3, by the research project “Pioneering Research in Forefront Quantum Computing, Learning and Engineering” of National Taiwan University under grant numbers NTU-CC-112L893406 and NTU-CC-113L891606, and by the Academic Research-Career Development Project (Laurel Research Project) of National Taiwan University under grant numbers NTU-CDP-112L7786 and NTU-CDP-113L7763.
H.-C. Cheng is supported by the Young Scholar Fellowship (Einstein Program) of the National Science and Technology Council, Taiwan (R.O.C.) under Grants No. NSTC 112-2636-E-002-009, No. NSTC 113-2119-M-007-006, No. NSTC 113-2119-M-001-006, No. NSTC 113-2124-M-002-003, and No. NSTC 113-2628-E-002-029, by the Yushan Young Scholar Program of the Ministry of Education, Taiwan (R.O.C.) under Grant No. NTU-112V1904-4, and by the research project “Pioneering Research in Forefront Quantum Computing, Learning and Engineering” of National Taiwan University under Grants No. NTU-CC-112L893405 and No. NTU-CC-113L891605. H.-C. Cheng acknowledges the support from the “Center for Advanced Computing and Imaging in Biomedicine (NTU-113L900702)” through The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan.
References
- Arimoto [1973] S. Arimoto. On the converse to the coding theorem for discrete memoryless channels (corresp.). IEEE Trans. Inf. Theory, 19(3):357–359, 1973.
- Augustin [1978] U. Augustin. Noisy channels. Habilitation Thesis, Universität Erlangen-Nürnberg, 1978.
- Birkhoff [1957] G. Birkhoff. Extensions of Jentzsch’s theorem. Trans. Amer. Math. Soc., 85(1):219–227, 1957.
- Cheng and Nakiboğlu [2021] H.-C. Cheng and B. Nakiboğlu. On the existence of the Augustin means. In 2021 IEEE Information Theory Workshop (ITW), 2021.
- Cover [1984] T. M. Cover. An algorithm for maximizing expected log investment return. IEEE Trans. Inf. Theory, 30(2):369–373, 1984.
- Csiszár [1995] I. Csiszár. Generalized cutoff rates and Rényi’s information measures. IEEE Trans. Inf. Theory, 41(1):26–34, 1995.
- Csiszár and Körner [2011] I. Csiszár and J. Körner. Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge University Press, 2nd edition, 2011.
- Iusem [1992] A. N. Iusem. A short convergence proof of the EM algorithm for a specific Poisson model. Braz. J. Probab. Stat., pages 57–67, 1992.
- Kamatsuka et al. [2024] A. Kamatsuka, K. Kazama, and T. Yoshida. Algorithms for computing the Augustin–Csiszár mutual information and Lapidoth–Pfister mutual information. arXiv preprint arXiv:2404.10950, 2024.
- Karakos et al. [2008] D. Karakos, S. Khudanpur, and C. E. Priebe. Computation of Csiszár’s mutual information of order α. In IEEE Int. Symp. Information Theory, pages 2106–2110, 2008.
- Krause [2015] U. Krause. Positive Dynamical Systems in Discrete Time: Theory, Models, and Applications. De Gruyter, 2015.
- Lapidoth and Pfister [2018] A. Lapidoth and C. Pfister. Testing against independence and a Rényi information measure. In IEEE Information Theory Workshop (ITW), pages 1–5, 2018.
- Lapidoth and Pfister [2019] A. Lapidoth and C. Pfister. Two measures of dependence. Entropy, 21(8), 2019.
- Lemmens and Nussbaum [2012] B. Lemmens and R. Nussbaum. Nonlinear Perron–Frobenius Theory. Cambridge University Press, 2012.
- Li et al. [2018] Y.-H. Li, C. A. Riofrío, and V. Cevher. A general convergence result for mirror descent with Armijo line search. arXiv preprint arXiv:1805.12232, 2018.
- Lin et al. [2021] C.-M. Lin, H.-C. Cheng, and Y.-H. Li. Maximum-likelihood quantum state tomography by Cover’s method with non-asymptotic analysis. arXiv preprint arXiv:2110.00747, 2021.
- Lucy [1974] L. B. Lucy. An iterative technique for the rectification of observed distributions. Astron. J., 79:745, 1974.
- Nakiboğlu [2019] B. Nakiboğlu. The Augustin capacity and center. Probl. Inf. Transm., 55(4):299–342, 2019.
- Richardson [1972] W. H. Richardson. Bayesian-based iterative method of image restoration. J. Opt. Soc. Am., 62(1):55–59, 1972.
- Shepp and Vardi [1982] L. A. Shepp and Y. Vardi. Maximum likelihood reconstruction for emission tomography. IEEE Trans. Med. Imaging, 1(2):113–122, 1982.
- Tomamichel and Hayashi [2018] M. Tomamichel and M. Hayashi. Operational interpretation of Rényi information measures via composite hypothesis testing against product and Markov distributions. IEEE Trans. Inf. Theory, 64(2):1064–1082, 2018.
- Vardi and Lee [1993] Y. Vardi and D. Lee. From image deblurring to optimal investments: Maximum likelihood solutions for positive linear inverse problems. J. R. Stat. Soc. Series B Stat. Methodol., 55(3):569–598, 1993.
- Wang et al. [2024] G.-R. Wang, C.-E. Tsai, H.-C. Cheng, and Y.-H. Li. Computing Augustin information via hybrid geodesically convex optimization. In IEEE Int. Symp. Information Theory, 2024.
- You et al. [2022] J.-K. You, H.-C. Cheng, and Y.-H. Li. Minimizing quantum Rényi divergences via mirror descent with Polyak step size. In IEEE Int. Symp. Information Theory, pages 252–257, 2022.
Appendix A Omitted Proofs
A.1 Proof of Lemma 10
A.2 Proof of Lemma 11
We will use the following lemma, whose proof is postponed to the next subsection.
Lemma 16.
Let be two random probability vectors. We have
A.3 Proof of Lemma 16
Let . We have
The lemma follows from the definition of .
A.4 Proof of Corollary 14
A.5 Lemma 17
We prove the following lemma.
Lemma 17.
For , let be a minimizer of the optimization problem (3) and assume . Then, we have and .
By Lemma 12, we have
Since and , the vector is entry-wise strictly positive. This implies that is entry-wise strictly positive, and hence . To show , consider the equation and apply the same argument. This completes the proof.