
Linear Convergence in Hilbert’s Projective Metric for Computing Augustin Information
and a Rényi Information Measure

Chung-En Tsai^∗
Department of Computer Science and Information Engineering, National Taiwan University

Guan-Ren Wang^∗
Graduate Institute of Networking and Multimedia, National Taiwan University

Hao-Chung Cheng
Department of Electrical Engineering and Graduate Institute of Communication Engineering, National Taiwan University; Department of Mathematics, National Taiwan University; Center for Quantum Science and Engineering, National Taiwan University; Physics Division, National Center for Theoretical Sciences; Hon Hai (Foxconn) Quantum Computing Centre

Yen-Huan Li
Department of Computer Science and Information Engineering, National Taiwan University; Graduate Institute of Networking and Multimedia, National Taiwan University; Department of Mathematics, National Taiwan University; Center for Quantum Science and Engineering, National Taiwan University

Abstract

Consider the problems of computing the Augustin information and a Rényi information measure of statistical independence, previously explored by Lapidoth and Pfister (IEEE Information Theory Workshop, 2018) and Tomamichel and Hayashi (IEEE Trans. Inf. Theory, 64(2):1064–1082, 2018). Both quantities are defined as solutions to optimization problems and lack closed-form expressions. This paper analyzes two iterative algorithms: Augustin’s fixed-point iteration for computing the Augustin information, and the algorithm by Kamatsuka et al. (arXiv:2404.10950) for the Rényi information measure. Previously, it was only known that these algorithms converge asymptotically. We establish the linear convergence of Augustin’s algorithm for the Augustin information of order $\alpha\in(1/2,1)\cup(1,3/2)$ and Kamatsuka et al.’s algorithm for the Rényi information measure of order $\alpha\in[1/2,1)\cup(1,\infty)$ , using Hilbert’s projective metric.

^∗Both authors contributed equally to this work.

1 Introduction

Denote by $\Delta([d])$ the set of probability distributions over the finite set $[d]\coloneqq\set{1,\ldots,d}$ . For any $\alpha\in\interval[openright]{0}{1}\cup\interval[open]{1}{\infty}$ , the order- $\alpha$ Augustin information is defined by the following optimization problem [Augustin, 1978]:

\min_{x\in\Delta([d])}f_{\mathrm{Aug}}(x),\quad f_{\mathrm{Aug}}(x):=\mathbb{E}_{p\sim P}\left[D_{\alpha}(p\parallel x)\right],

(1)

where $P$ is a given probability distribution over $\Delta([d])$ , and

D_{\alpha}(p\parallel q)\coloneqq\frac{1}{\alpha-1}\log\sum_{s\in S}p(s)^{\alpha}q(s)^{1-\alpha},\quad\forall p,q\in\Delta(S)

is the order- $\alpha$ Rényi divergence. The Augustin information characterizes, e.g., the cutoff rate, the strong converse exponent, and the error exponent in the channel coding problem [Arimoto, 1973, Csiszár, 1995, Csiszár and Körner, 2011, Nakiboğlu, 2019, Wang et al., 2024]. When $\alpha=0$ , the optimization problem (1) specializes to the definition of the log-optimal portfolio [Cover, 1984], and is equivalent to the definition of the maximum-likelihood estimate in Poisson inverse problems [Vardi and Lee, 1993].
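As an illustration of the two definitions above, the following is a minimal Python sketch, assuming that $P$ is supported on finitely many distributions and is given as a list `support` with matching probabilities `weights`. The function names `renyi_divergence` and `augustin_objective` are hypothetical and chosen only for this example. Terms with $p(s)=0$ are skipped, which matches the convention $0^{0}=0$ adopted later in the paper.

```python
import math


def renyi_divergence(p, q, alpha):
    """Order-alpha Renyi divergence D_alpha(p || q), for alpha != 1.

    Assumes q is entry-wise strictly positive. Entries with p(s) = 0 are
    skipped, which implements the convention 0^0 = 0.
    """
    total = sum(
        ps ** alpha * qs ** (1.0 - alpha) for ps, qs in zip(p, q) if ps > 0
    )
    return math.log(total) / (alpha - 1.0)


def augustin_objective(x, support, weights, alpha):
    """f_Aug(x) = E_{p ~ P}[D_alpha(p || x)] for a finitely supported P.

    `support` lists the distributions p that P charges and `weights` lists
    their probabilities (illustrative representation, not from the paper).
    """
    return sum(
        w * renyi_divergence(p, x, alpha) for p, w in zip(support, weights)
    )
```

Evaluating `augustin_objective` along the iterates of an algorithm gives a simple way to monitor progress on problem (1).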

The optimization problem (1) does not admit a closed-form expression. While the optimization problem is convex, the objective function violates the standard smoothness assumption in the optimization literature. Therefore, even the convergence guarantees of projected gradient descent, arguably the simplest convex optimization algorithm, do not directly apply [You et al., 2022].

Augustin [1978] proposed the following fixed-point iteration to solve the optimization problem (1):

x_{t+1}=Z_{t}^{-1}\cdot x_{t}\odot(-\nabla f_{\mathrm{Aug}}(x_{t})),\quad\forall t\in\mathbb{N},

(2)

where $Z_{t}$ is the normalizing constant, ensuring that $x_{t+1}$ remains a probability distribution, and $\odot$ denotes the entry-wise product. The algorithm was later rediscovered by Karakos et al. [2008].\footnote{Karakos et al. [2008] proposed an alternating minimization method whose iteration consists of two steps. Combining the two steps yields Augustin’s fixed-point iteration.} When $\alpha=0$ , this fixed-point iteration coincides with Cover’s method for computing the log-optimal portfolio [Cover, 1984], and is equivalent to the expectation maximization algorithm for solving Poisson inverse problems [Richardson, 1972, Lucy, 1974, Shepp and Vardi, 1982, Vardi and Lee, 1993].

Recently, Kamatsuka et al. [2024] proposed an algorithm similar to Augustin’s fixed-point iteration to compute a Rényi information measure of statistical independence, which was explored by Lapidoth and Pfister [2019] and Tomamichel and Hayashi [2018]. For any $\alpha\in[0,1)\cup(1,\infty)$ , this order- $\alpha$ Rényi information measure is defined by the following optimization problem:

\min_{x\in\Delta([m])}\min_{y\in\Delta([n])}f_{\mathrm{Ren}}(x,y),\quad f_{\mathrm{Ren}}(x,y)\coloneqq D_{\alpha}(p\parallel x\otimes y),

(3)

where $p$ is a given probability distribution over $[m]\times[n]$ and $\otimes$ denotes the tensor product. The Rényi information measure emerges in the error exponent of a hypothesis testing problem, where we test against the independence of two random variables given independent and identically distributed (i.i.d.) samples from their joint distribution [Lapidoth and Pfister, 2018, 2019, Tomamichel and Hayashi, 2018].

Kamatsuka et al.’s algorithm to compute the Rényi information measure iterates as:

\begin{split}x_{t+1}&=Z_{1,t}^{-1}\cdot x_{t}\odot(-\nabla_{x}f_{\mathrm{Ren}}(x_{t},y_{t}))^{1/\alpha},\\ y_{t+1}&=Z_{2,t}^{-1}\cdot y_{t}\odot(-\nabla_{y}f_{\mathrm{Ren}}(x_{t+1},y_{t}))^{1/\alpha},\end{split}

(4)

where $Z_{1,t}$ and $Z_{2,t}$ are normalizing constants, ensuring that $x_{t+1}$ and $y_{t+1}$ remain probability distributions. The notation $v^{r}$ denotes the entry-wise power for any vector $v$ and number $r$ . This iterative algorithm is reminiscent of Augustin’s fixed-point iteration but differs in the powers applied to the gradients.

The convergence behaviors of Augustin’s fixed-point iteration and Kamatsuka et al.’s algorithm remain largely unclear. For Augustin’s fixed-point iteration, Karakos et al. [2008] and Nakiboğlu [2019] have shown that it asymptotically converges for $\alpha\in\interval[open]{0}{1}$ ; Iusem [1992] and Lin et al. [2021] have proved a convergence rate of $O(1/t)$ for the case where $\alpha$ approaches zero. For Kamatsuka et al.’s algorithm, Kamatsuka et al. [2024] have shown that it asymptotically converges for $\alpha\in\interval[openright]{1/2}{1}\cup\interval[open]{1}{\infty}$ .

We aim to carry out non-asymptotic analyses for the two algorithms. One common approach to analyzing an iterative method is to show that it is contractive under a suitable metric. Since the two algorithms (2) and (4) map positive vectors to positive vectors, we view them as positive dynamical systems and consider the so-called Hilbert’s projective metric [Lemmens and Nussbaum, 2012, Krause, 2015].

In this work, we prove that with respect to Hilbert’s projective metric, Augustin’s fixed-point iteration is contractive for $\alpha\in(1/2,1)\cup(1,3/2)$ , and Kamatsuka et al.’s algorithm is also contractive for $\alpha\in(1/2,1)\cup(1,\infty)$ . Based on these contractivity results, we establish the following non-asymptotic convergence guarantees for the two algorithms.

•

For computing the Augustin information of order $\alpha\in\interval[open]{1/2}{1}\cup\interval[open]{1}{3/2}$ , Augustin’s fixed-point iteration converges at a rate of $O((2\left\lvert 1-\alpha\right\rvert)^{t})$ with respect to Hilbert’s projective metric. This improves on the previous asymptotic convergence guarantee [Karakos et al., 2008, Nakiboğlu, 2019] when $\alpha\in(1/2,1)$ and extends the range of convergence to include $\alpha\in(1,3/2)$ .
•

For computing the Rényi information measure of order $\alpha\in(1/2,1)\cup(1,\infty)$ , the iterative algorithm of Kamatsuka et al. converges at a rate of $O(\left\lvert 1-1/\alpha\right\rvert^{2t})$ with respect to Hilbert’s projective metric. When $\alpha=1/2$ , this method also converges linearly if $p$ has full support. This improves on the previous asymptotic convergence guarantee [Kamatsuka et al., 2024].

Notations

We write $\mathbb{R}_{+}$ and $\mathbb{R}_{++}$ for the sets of non-negative and strictly positive numbers, respectively. For any positive integer $n$ , we write $[n]$ for the set $\set{1,\ldots,n}$ . Let $v\in\mathbb{R}^{d}$ and $A,B\in\mathbb{R}^{m\times n}$ . We write $v(i)$ for the $i$ -th entry of the vector $v$ , and $A(i,j)$ the $(i,j)$ -th entry of the matrix $A$ . We write $A\odot B$ for the entry-wise product between $A$ and $B$ . We write $A^{r}$ for the matrix $(A(i,j)^{r})_{1\leq i\leq m,1\leq j\leq n}$ . For a set $S\subseteq\mathbb{R}^{d}$ , we denote by $\operatorname{ri}S$ its relative interior. We will adopt the convention that $0^{0}=0$ , $0/0=1$ , $\infty\cdot\infty=\infty$ , $a\cdot\infty=\infty$ for any $a>0$ , and $\log\infty=\infty$ . We call $\Delta([d])$ the probability simplex and view elements in $\Delta([d])$ as $d$ -dimensional vectors.

2 Related Work

We have discussed Augustin’s fixed-point iteration and Kamatsuka et al.’s algorithm in Section 1. This section reviews other optimization algorithms for computing the Augustin information and the Rényi information measure.

For computing the Augustin information of order $\alpha$ , entropic mirror descent with Armijo line search [Li et al., 2018] and with the Polyak step size [You et al., 2022], as well as a variant of Augustin’s fixed-point iteration explored by Cheng and Nakiboğlu [2021, Lemma 6], all achieve asymptotic convergence for all $\alpha\in\interval[openright]{0}{1}\cup\interval[open]{1}{\infty}$ . Riemannian gradient descent with the Poincaré metric [Wang et al., 2024] converges at a rate of $O(1/t)$ for all $\alpha\in\interval[openright]{0}{1}\cup\interval[open]{1}{\infty}$ . An alternating minimization method due to Kamatsuka et al. [2024]\footnote{Kamatsuka et al. [2024] only claimed an asymptotic convergence guarantee in their paper. We find that their Lemma 2 indeed implies a convergence rate of $O(1/t)$ .} also achieves a convergence rate of $O(1/t)$ , but for a narrower range of $\alpha\in\interval[open]{1}{\infty}$ . None of the existing works has established a linear convergence rate.

For computing the Rényi information measure of order $\alpha$ , entropic mirror descent with Armijo line search [Li et al., 2018] and with the Polyak step size [You et al., 2022] both asymptotically converge for $\alpha\in\interval[openright]{1/2}{1}\cup\interval[open]{1}{\infty}$ . However, when $\alpha\in\interval[open]{0}{1/2}$ , the optimization problem (3) becomes non-convex [Lapidoth and Pfister, 2019], and currently, there are no known algorithms that provably solve this problem. Similarly to the computation of the Augustin information, none of the existing works have established a linear convergence rate.

3 Preliminaries

Our analyses are based on properties of Hilbert’s projective metric and Birkhoff’s contraction theorem, which we introduce in this section.

Let $K$ be a closed cone in a finite-dimensional real vector space, such as the positive orthant and the set of Hermitian positive semidefinite matrices. For any $x,y\in K$ , we write $x\leq y$ if $y-x\in K$ . For any $x,y\in K\setminus\{0\}$ , define

M(x/y)\coloneqq\inf\{\beta\geq 0\mid x\leq\beta y\}>0.

(5)

If the set is empty, then $M(x/y)\coloneqq\infty$ .

Definition 1.

Hilbert’s projective metric is defined as

d_{\mathrm{H}}(x,y)\coloneqq\log(M(x/y)M(y/x))\in[0,\infty],\quad\forall x,y\in K\setminus\set{0}.

In addition, $d_{\mathrm{H}}(0,0)$ is defined to be $0$ .

The following lemma shows that $d_{\mathrm{H}}$ is indeed a metric on the set of rays.

Lemma 2.

The following properties hold.

(i)

For any $x,y\in K$ and any $\alpha,\beta>0$ , we have $d_{\mathrm{H}}(\alpha x,\beta y)=d_{\mathrm{H}}(x,y)$ .
(ii)

We have $d_{\mathrm{H}}(x,y)=0$ if and only if $x=ry$ for some $r>0$ .

In the rest of the paper, we will only consider the cone $K=\mathbb{R}_{+}^{d}$ .

Lemma 3.

Consider Hilbert’s projective metric $d_{\mathrm{H}}$ on the cone $K=\mathbb{R}_{+}^{d}$ .

(i)

For any $x,y\in\mathbb{R}_{+}^{d}\setminus\{0\}$ , we have

M(x/y)=\max_{i\in[d]}\frac{x(i)}{y(i)},\quad d_{\mathrm{H}}(x,y)=\log\max_{i,j\in[d]}\frac{x(i)y(j)}{y(i)x(j)}.

(6)

(ii)

$(\operatorname{ri}\Delta([d]),d_{\mathrm{H}})$ is a metric space [Lemmens and Nussbaum, 2012, Proposition 2.1.1].
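The closed form (6) is straightforward to evaluate numerically. The following is a minimal Python sketch, assuming both vectors are entry-wise strictly positive so that every ratio is finite; the name `hilbert_metric` is hypothetical. It uses the fact that $M(y/x)=1/\min_{i}(x(i)/y(i))$ for strictly positive vectors.

```python
import math


def hilbert_metric(x, y):
    """Hilbert's projective metric on the positive orthant, via (6).

    Assumes x and y are entry-wise strictly positive and of equal length.
    d_H(x, y) = log( max_i x(i)/y(i) * max_j y(j)/x(j) ).
    """
    ratios = [xi / yi for xi, yi in zip(x, y)]
    return math.log(max(ratios) / min(ratios))
```

For instance, `hilbert_metric([1, 2, 3], [2, 4, 6])` is zero because the two vectors lie on the same ray, in line with Lemma 2.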

Given the second item above, we will measure the errors of both Augustin’s fixed-point iteration and Kamatsuka et al.’s algorithm in terms of Hilbert’s projective metric between their iterates and the minimizer. The following lemma lists several properties of Hilbert’s projective metric, which are direct consequences of Corollary 2.1.4 and Corollary 2.1.5 of Lemmens and Nussbaum [2012].

Lemma 4.

The following properties hold.

(i)

$d_{\mathrm{H}}(x^{r},y^{r})\leq\left\lvert r\right\rvert d_{\mathrm{H}}(x,y)$ for any $x,y\in\mathbb{R}_{+}^{d}$ and any $r\in\mathbb{R}\setminus\{0\}$ .
(ii)

$d_{\mathrm{H}}(v\odot x,v\odot y)\leq d_{\mathrm{H}}(x,y)$ for any $x,y,v\in\mathbb{R}_{+}^{d}$ .
(iii)

$d_{\mathrm{H}}(Ax,Ay)\leq d_{\mathrm{H}}(x,y)$ for any $x,y\in\mathbb{R}_{+}^{d}$ and $A\in\mathbb{R}_{+}^{d^{\prime}\times d}$ .

When the matrix in Lemma 4 (iii) is entry-wise strictly positive, Birkhoff [1957] showed that the linear transformation defined by it is a contraction.

Theorem 5 (Birkhoff [1957]).

Let $A\in\mathbb{R}_{++}^{m\times n}$ . It holds that

d_{\mathrm{H}}(Ax,Ay)\leq\lambda(A)\cdot d_{\mathrm{H}}(x,y),\quad\forall x,y\in\mathbb{R}_{+}^{n},

where

\lambda(A)\coloneqq\tanh\left(\frac{\delta(A)}{4}\right)<1,

and

\delta(A)\coloneqq\log\max_{(i,j),(i^{\prime},j^{\prime})\in[m]\times[n]}\frac{A(i,j)A(i^{\prime},j^{\prime})}{A(i^{\prime},j)A(i,j^{\prime})}\geq 0.
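To make the contraction ratio in Theorem 5 concrete, here is a small Python sketch that evaluates $\delta(A)$ and $\lambda(A)$ by brute force over all index pairs. It assumes $A$ is given as a list of rows with strictly positive entries; the helper names `birkhoff_coefficient`, `matvec`, and `hilbert_metric` are hypothetical.

```python
import math
from itertools import product


def hilbert_metric(x, y):
    """Hilbert's projective metric for strictly positive vectors, via (6)."""
    ratios = [xi / yi for xi, yi in zip(x, y)]
    return math.log(max(ratios) / min(ratios))


def birkhoff_coefficient(A):
    """lambda(A) = tanh(delta(A) / 4) for an entry-wise positive matrix A."""
    m, n = len(A), len(A[0])
    # delta(A): log of the largest cross-ratio over rows (i, k), columns (j, l).
    delta = math.log(max(
        A[i][j] * A[k][l] / (A[k][j] * A[i][l])
        for i, k in product(range(m), repeat=2)
        for j, l in product(range(n), repeat=2)
    ))
    return math.tanh(delta / 4.0)


def matvec(A, x):
    """Plain matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]
```

For $A=\begin{pmatrix}2&1\\1&2\end{pmatrix}$ the largest cross-ratio is $4$, so $\delta(A)=\log 4$ and $\lambda(A)=\tanh(\log(4)/4)=1/3$.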

4 Linear Rate of Augustin’s Fixed-Point Iteration

In this section, we show that Augustin’s fixed-point iteration converges linearly with respect to Hilbert’s projective metric for computing the Augustin information of order $\alpha\in(1/2,1)\cup(1,3/2)$ .

4.1 Augustin’s Fixed-Point Iteration

Define the following operators:

T_{\alpha}(x):=\mathbb{E}_{p\sim P}\left[T_{\alpha,p}(x)\right],\quad T_{\alpha,p}(x):=\frac{p^{\alpha}\odot x^{1-\alpha}}{\braket{p^{\alpha},x^{1-\alpha}}},\quad\forall p\in\Delta([d]).

(7)

Augustin’s fixed-point iteration (2) can be equivalently written as follows:

1.

Initialize $x_{1}\in\Delta([d])$ .
2.

For all $t\in\mathbb{N}$ , compute $x_{t+1}=T_{\alpha}(x_{t})$ .
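The two steps above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes $P$ is finitely supported and represented by a list `support` of distributions with probabilities `weights`, and that the initial point is entry-wise strictly positive. All function names are hypothetical.

```python
import math


def hilbert_metric(x, y):
    """Hilbert's projective metric for strictly positive vectors, via (6)."""
    ratios = [xi / yi for xi, yi in zip(x, y)]
    return math.log(max(ratios) / min(ratios))


def augustin_step(x, support, weights, alpha):
    """One application of T_alpha from (7) for a finitely supported P."""
    out = [0.0] * len(x)
    for p, w in zip(support, weights):
        # Unnormalized T_{alpha,p}(x); entries with p(i) = 0 contribute 0.
        unnorm = [pi ** alpha * xi ** (1.0 - alpha) if pi > 0 else 0.0
                  for pi, xi in zip(p, x)]
        z = sum(unnorm)  # the inner product <p^alpha, x^(1 - alpha)>
        for i, u in enumerate(unnorm):
            out[i] += w * u / z
    return out


def augustin_fixed_point(x1, support, weights, alpha, num_iters):
    """Run x_{t+1} = T_alpha(x_t) for num_iters steps starting from x1."""
    x = list(x1)
    for _ in range(num_iters):
        x = augustin_step(x, support, weights, alpha)
    return x
```

Tracking `hilbert_metric` between consecutive iterates offers a simple stopping heuristic, since the iterates stay in the relative interior of the simplex whenever `x1` does and the average of the support points is strictly positive.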

Lemma 6 ([Nakiboğlu, 2019, Lemma 13]).

For $\alpha\in(0,1)\cup(1,\infty)$ , the optimization problem (1) has a unique minimizer $x^{\star}$ , which satisfies the fixed-point equation $x^{\star}=T_{\alpha}(x^{\star})$ .

4.2 Linear Rate Guarantee

The main result of this section is the following theorem, which bounds the Lipschitz constant of the mapping $T_{\alpha}$ with respect to Hilbert’s projective metric. Its proof is postponed to the next subsection.

Theorem 7.

For $\alpha\in[0,1)\cup(1,\infty)$ , we have

d_{\mathrm{H}}(T_{\alpha}(x),T_{\alpha}(y))\leq\gamma\cdot d_{\mathrm{H}}(x,y),\quad\forall x,y\in\Delta([d]),

(8)

where $\gamma:=2|1-\alpha|$ .

Since $\gamma<1$ exactly when $\alpha\in(1/2,3/2)$ , linear convergence of Augustin’s fixed-point iteration for $\alpha\in\interval[open]{1/2}{1}\cup\interval[open]{1}{3/2}$ immediately follows.

Corollary 8.

For any $\alpha\in(1/2,1)\cup(1,3/2)$ , let $x^{\star}$ be the minimizer of the optimization problem (1) and $\{x_{t}\}$ be the iterates of Augustin’s fixed-point iteration (2). We have

d_{\mathrm{H}}(x_{t+1},x^{\star})\leq\gamma^{t}\cdot d_{\mathrm{H}}(x_{1},x^{\star}),

for all $t\in\mathbb{N}$ , where $\gamma<1$ is defined in Theorem 7.
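The bound in Corollary 8 translates directly into an iteration count: $\gamma^{t}d_{\mathrm{H}}(x_{1},x^{\star})\leq\varepsilon$ holds once $t\geq\log(\varepsilon/d_{\mathrm{H}}(x_{1},x^{\star}))/\log\gamma$. The Python sketch below computes this count. The function name `iterations_needed` and the arguments `d0` (standing for $d_{\mathrm{H}}(x_{1},x^{\star})$, which is generally unknown in practice and must be bounded separately) and `eps` are hypothetical.

```python
import math


def iterations_needed(alpha, d0, eps):
    """Smallest t with gamma**t * d0 <= eps, where gamma = 2 * |1 - alpha|.

    d0 stands for the initial distance d_H(x_1, x*), assumed finite and
    positive; eps > 0 is the target accuracy in Hilbert's projective metric.
    """
    gamma = 2.0 * abs(1.0 - alpha)
    if not 0.0 < gamma < 1.0:
        raise ValueError("the bound requires alpha in (1/2, 1) or (1, 3/2)")
    if d0 <= eps:
        return 0
    return math.ceil(math.log(eps / d0) / math.log(gamma))
```

For example, with $\alpha=0.75$ the rate is $\gamma=1/2$, so reducing an initial distance of $1$ to $10^{-3}$ takes $10$ iterations under this bound.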

Remark 9.

Corollary 8 is meaningful only when $d_{\mathrm{H}}(x_{1},x^{\star})<\infty$ . Lemma 13 of Nakiboğlu [2019] shows that $\mathbb{E}_{p\sim P}[p]\in\operatorname{ri}\Delta([d])$ implies $x^{\star}\in\operatorname{ri}\Delta([d])$ . In this case, it suffices to choose $x_{1}\in\operatorname{ri}\Delta([d])$ to ensure that $d_{\mathrm{H}}(x_{1},x^{\star})<\infty$ .

4.3 Proof of Theorem 7

The proof primarily consists of two steps, which are reflected by the following two lemmas. First, we show that the operators $T_{\alpha,p}$ are Lipschitz with respect to Hilbert’s projective metric and bound the Lipschitz constant. Then, given that $T_{\alpha}(\cdot)=\mathbb{E}_{p}\left[T_{\alpha,p}(\cdot)\right]$ , we prove a general lemma that bounds Hilbert’s projective metric between two random probability vectors in terms of Hilbert’s projective metric between their realizations, which is of independent interest. The proofs of both lemmas are deferred to Appendix A.

Lemma 10.

For any $\alpha\in[0,1)\cup(1,\infty)$ and $p\in\Delta([d])$ ,

d_{\mathrm{H}}(T_{\alpha,p}(x),T_{\alpha,p}(y))\leq\left\lvert 1-\alpha\right\rvert d_{\mathrm{H}}(x,y),\quad\forall x,y\in\Delta([d]).

Lemma 11.

Let $X,Y:\Omega\to\Delta([d])$ be two random probability vectors, where $\Omega$ denotes the sample space. We have

d_{\mathrm{H}}\left(\mathbb{E}[X],\mathbb{E}[Y]\right)\leq 2\sup_{\omega\in\Omega}d_{\mathrm{H}}(X(\omega),Y(\omega)).

Theorem 7 follows immediately: By Lemma 11, we write

d_{\mathrm{H}}(T_{\alpha}(x),T_{\alpha}(y))\leq 2\sup_{p\in\Delta([d])}d_{\mathrm{H}}(T_{\alpha,p}(x),T_{\alpha,p}(y)).

Then, by Lemma 10, we obtain

d_{\mathrm{H}}(T_{\alpha}(x),T_{\alpha}(y))\leq 2\sup_{p\in\Delta([d])}\left\lvert 1-\alpha\right\rvert d_{\mathrm{H}}(x,y)=2\left\lvert 1-\alpha\right\rvert d_{\mathrm{H}}(x,y).

This completes the proof.
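The averaging step (Lemma 11) can be sanity-checked numerically on a toy example. The sketch below assumes a finite sample space, with the realizations of $X$ and $Y$ given as lists of strictly positive probability vectors and the probabilities of the outcomes given in `probs`. The names `expectation` and `lemma11_sides` are hypothetical, and a numerical check of this kind illustrates the inequality but does not replace the proof.

```python
import math


def hilbert_metric(x, y):
    """Hilbert's projective metric for strictly positive vectors, via (6)."""
    ratios = [xi / yi for xi, yi in zip(x, y)]
    return math.log(max(ratios) / min(ratios))


def expectation(vectors, probs):
    """Entry-wise expectation of a random vector on a finite sample space."""
    d = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, probs)) for i in range(d)]


def lemma11_sides(X, Y, probs):
    """Return (d_H(E[X], E[Y]), 2 * max_omega d_H(X(omega), Y(omega)))."""
    lhs = hilbert_metric(expectation(X, probs), expectation(Y, probs))
    rhs = 2.0 * max(hilbert_metric(x, y) for x, y in zip(X, Y))
    return lhs, rhs
```

On the example `X = [[0.6, 0.4], [0.2, 0.8]]`, `Y = [[0.5, 0.5], [0.3, 0.7]]`, `probs = [0.3, 0.7]`, the expectations are $(0.32,0.68)$ and $(0.36,0.64)$, and the left-hand side is well below the right-hand side.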

5 Linear Rate of Kamatsuka et al.’s Algorithm

In this section, we show that Kamatsuka et al.’s algorithm converges linearly with respect to Hilbert’s projective metric for computing the Rényi information measure of order $\alpha\in[1/2,1)\cup(1,\infty)$ .

For convenience, we will view any $p\in\Delta([m]\times[n])$ as a matrix in $\mathbb{R}_{+}^{m\times n}$ whose entries sum to $1$ . We will denote Hilbert’s projective metric on both $\mathbb{R}_{++}^{m}$ and $\mathbb{R}_{++}^{n}$ by $d_{\mathrm{H}}$ . The associated cone should be clear from the context.

5.1 Kamatsuka et al.’s Algorithm

Define the following two operators:

\begin{split}U_{\alpha}(y)&:=\frac{\left(p^{\alpha}y^{1-\alpha}\right)^{1/\alpha}}{\left\lVert\left(p^{\alpha}y^{1-\alpha}\right)^{1/\alpha}\right\rVert_{1}},\\ V_{\alpha}(x)&:=\frac{\big((p^{\alpha})^{\top}x^{1-\alpha}\big)^{1/\alpha}}{\left\lVert\big((p^{\alpha})^{\top}x^{1-\alpha}\big)^{1/\alpha}\right\rVert_{1}}.\end{split}

(9)

Kamatsuka et al.’s algorithm (4) can be equivalently written as follows:

1.

Initialize $x_{1}\in\Delta([m])$ , and compute $y_{1}=V_{\alpha}(x_{1})$ .
2.

For all $t\in\mathbb{N}$ , compute $x_{t+1}=U_{\alpha}(y_{t})$ and $y_{t+1}=V_{\alpha}(x_{t+1})$ .
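The operators in (9) and the two steps above can be sketched in Python as follows. This is an illustrative sketch under the assumption that $p$ is given as a list of rows with strictly positive entries and that the initial point is strictly positive; the names `U`, `V`, and `kamatsuka_iterate` are hypothetical. Since $V_{\alpha}$ is $U_{\alpha}$ applied to the transposed matrix, the sketch reuses one routine for both.

```python
def U(p, y, alpha):
    """U_alpha(y): normalize the entry-wise power (p^alpha y^(1-alpha))^(1/alpha).

    p is a list of m rows of length n with strictly positive entries,
    and y is a strictly positive vector of length n.
    """
    v = [
        sum(pij ** alpha * yj ** (1.0 - alpha) for pij, yj in zip(row, y))
        ** (1.0 / alpha)
        for row in p
    ]
    total = sum(v)
    return [vi / total for vi in v]


def V(p, x, alpha):
    """V_alpha(x): the same map as U_alpha, applied to the transpose of p."""
    return U(list(zip(*p)), x, alpha)


def kamatsuka_iterate(p, x1, alpha, num_iters):
    """Compute y_1 = V(x_1), then alternate x_{t+1} = U(y_t), y_{t+1} = V(x_{t+1})."""
    x = list(x1)
    y = V(p, x, alpha)
    for _ in range(num_iters):
        x = U(p, y, alpha)
        y = V(p, x, alpha)
    return x, y
```

As a quick consistency check, if $p=a\otimes b$ is a product distribution, then $U_{\alpha}(y)=a$ and $V_{\alpha}(x)=b$ for any strictly positive $x$ and $y$, because the common factor cancels in the normalization.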

This algorithm is inspired by the following lemma [Lapidoth and Pfister, 2019, Lemma 16].

Lemma 12.

For $\alpha\in[1/2,1)\cup(1,\infty)$ , every minimizer $(x^{\star},y^{\star})$ of the optimization problem (3) satisfies $x^{\star}=U_{\alpha}(y^{\star})$ and $y^{\star}=V_{\alpha}(x^{\star})$ .

5.2 Linear Rate Guarantee

The following theorem presents a key observation, showing that the operators $U_{\alpha}$ and $V_{\alpha}$ have a Lipschitz constant of $\left\lvert 1-\alpha^{-1}\right\rvert$ with respect to Hilbert’s projective metric. Its proof is postponed to the next subsection.

Theorem 13.

For $\alpha\in[1/2,1)\cup(1,\infty)$ , we have

\begin{split}d_{\mathrm{H}}(V_{\alpha}(x),V_{\alpha}(x^{\prime}))&\leq\gamma^{\prime}\cdot d_{\mathrm{H}}(x,x^{\prime}),\quad\forall x,x^{\prime}\in\Delta([m]),\\ d_{\mathrm{H}}(U_{\alpha}(y),U_{\alpha}(y^{\prime}))&\leq\gamma^{\prime}\cdot d_{\mathrm{H}}(y,y^{\prime}),\quad\forall y,y^{\prime}\in\Delta([n]),\end{split}

(10)

where $\gamma^{\prime}:=\left\lvert 1-(1/\alpha)\right\rvert$ . Moreover, if $p\in\mathbb{R}_{++}^{m\times n}$ , then the Lipschitz constant $\gamma^{\prime}$ can be improved to

\gamma^{\prime\prime}:=\left\lvert 1-\frac{1}{\alpha}\right\rvert\lambda(p^{\alpha})<\gamma^{\prime},

where $\lambda(\cdot)$ is defined in Theorem 5.

Theorem 13 implies the following corollary, showing that the iterative algorithm converges linearly. The proof of the corollary is deferred to Appendix A.

Corollary 14.

Let $(x^{\star},y^{\star})$ be a minimizer of the optimization problem (3) and $\{(x_{t},y_{t})\}$ be the iterates of the iterative algorithm (4).

(i)

If $\alpha\in(1/2,1)\cup(1,\infty)$ , then

\begin{split}d_{\mathrm{H}}(x_{t+1},x^{\star})&\leq(\gamma^{\prime})^{2t}\cdot d_{\mathrm{H}}(x_{1},x^{\star}),\\ d_{\mathrm{H}}(y_{t+1},y^{\star})&\leq(\gamma^{\prime})^{2t+1}\cdot d_{\mathrm{H}}(x_{1},x^{\star}),\end{split}

for all $t\in\mathbb{N}$ , where $\gamma^{\prime}<1$ is defined in Theorem 13.

(ii)

If $\alpha\in[1/2,1)\cup(1,\infty)$ and $p\in\mathbb{R}_{++}^{m\times n}$ , then

\begin{split}d_{\mathrm{H}}(x_{t+1},x^{\star})&\leq(\gamma^{\prime\prime})^{2t}\cdot d_{\mathrm{H}}(x_{1},x^{\star}),\\ d_{\mathrm{H}}(y_{t+1},y^{\star})&\leq(\gamma^{\prime\prime})^{2t+1}\cdot d_{\mathrm{H}}(x_{1},x^{\star}),\end{split}

for all $t\in\mathbb{N}$ , where $\gamma^{\prime\prime}<1$ is defined in Theorem 13.

Remark 15.

Corollary 14 is meaningful only when $d_{\mathrm{H}}(x_{1},x^{\star})<\infty$ . For $\alpha\in[1/2,1)\cup(1,\infty)$ , Lemma 17 in Appendix A ensures that $x^{\star}\in\operatorname{ri}\Delta([m])$ whenever $p\in\mathbb{R}_{++}^{m\times n}$ . In this case, it suffices to choose $x_{1}\in\operatorname{ri}\Delta([m])$ to ensure $d_{\mathrm{H}}(x_{1},x^{\star})<\infty$ .

5.3 Proof of Theorem 13

By Lemma 2 and Lemma 4 (i), we have

\begin{split}d_{\mathrm{H}}(V_{\alpha}(x),V_{\alpha}(x^{\prime}))&=d_{\mathrm{H}}\left(((p^{\alpha})^{\top}x^{1-\alpha})^{1/\alpha},((p^{\alpha})^{\top}(x^{\prime})^{1-\alpha})^{1/\alpha}\right)\\ &\leq\alpha^{-1}d_{\mathrm{H}}\left((p^{\alpha})^{\top}x^{1-\alpha},(p^{\alpha})^{\top}(x^{\prime})^{1-\alpha}\right).\end{split}

By Lemma 4 (iii) and (i), we have

d_{\mathrm{H}}(V_{\alpha}(x),V_{\alpha}(x^{\prime}))\leq\left\lvert 1-\alpha^{-1}\right\rvert d_{\mathrm{H}}(x,x^{\prime}).

This proves the first inequality in (10). The second inequality follows from a similar argument.

Assume $p\in\mathbb{R}_{++}^{m\times n}$ . We can apply Birkhoff’s contraction theorem (Theorem 5) instead of Lemma 4 (iii) to obtain

d_{\mathrm{H}}(V_{\alpha}(x),V_{\alpha}(x^{\prime}))\leq\left\lvert 1-\alpha^{-1}\right\rvert\lambda\left((p^{\alpha})^{\top}\right)\cdot d_{\mathrm{H}}(x,x^{\prime}).

The theorem follows by noticing that $\lambda\left((p^{\alpha})^{\top}\right)=\lambda(p^{\alpha})<1$ .

6 Discussions

We have proved that Augustin’s fixed-point iteration converges at a linear rate for computing the Augustin information of order $\alpha\in\interval[open]{1/2}{1}\cup\interval[open]{1}{3/2}$ , and that Kamatsuka et al.’s algorithm converges at a linear rate for computing the Rényi information measure of order $\alpha\in\interval[openright]{1/2}{1}\cup\interval[open]{1}{\infty}$ . In contrast, existing results are asymptotic and apply to a narrower range of $\alpha$ . Our proofs are simple, demonstrating the effectiveness of selecting an appropriate mathematical structure.

Preliminary numerical experiments indicate that Augustin’s fixed-point iteration may converge linearly for $\alpha\in\interval[open]{0}{1}\cup\interval[open]{1}{2}$ . This observed range is broader than the one we have established. A natural direction is to extend the range of $\alpha$ that admits linear convergence.

Acknowledgements

We thank Marco Tomamichel and Rubboli Roberto for discussions. C.-E. Tsai, G.-R. Wang, and Y.-H. Li are supported by the Young Scholar Fellowship (Einstein Program) of the National Science and Technology Council of Taiwan under grant number NSTC 112-2636-E-002-003, by the 2030 Cross-Generation Young Scholars Program (Excellent Young Scholars) of the National Science and Technology Council of Taiwan under grant number NSTC 112-2628-E-002-019-MY3, by the research project “Pioneering Research in Forefront Quantum Computing, Learning and Engineering” of National Taiwan University under grant numbers NTU-CC-112L893406 and NTU-CC-113L891606, and by the Academic Research-Career Development Project (Laurel Research Project) of National Taiwan University under grant numbers NTU-CDP-112L7786 and NTU-CDP-113L7763.

H.-C. Cheng is supported by the Young Scholar Fellowship (Einstein Program) of the National Science and Technology Council, Taiwan (R.O.C.) under Grants No. NSTC 112-2636-E-002-009, No. NSTC 113-2119-M-007-006, No. NSTC 113-2119-M-001-006, No. NSTC 113-2124-M-002-003, and No. NSTC 113-2628-E-002-029, by the Yushan Young Scholar Program of the Ministry of Education, Taiwan (R.O.C.) under Grant No. NTU-112V1904-4, and by the research project “Pioneering Research in Forefront Quantum Computing, Learning and Engineering” of National Taiwan University under Grants No. NTU-CC-112L893405 and No. NTU-CC-113L891605. H.-C. Cheng acknowledges the support from the “Center for Advanced Computing and Imaging in Biomedicine (NTU-113L900702)” through The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan.

References

Arimoto [1973] S. Arimoto. On the converse to the coding theorem for discrete memoryless channels (corresp.). IEEE Trans. Inf. Theory, 19(3):357–359, 1973.
Augustin [1978] U. Augustin. Noisy channels. Habilitation Thesis, Universität Erlangen-Nürnberg, 1978.
Birkhoff [1957] G. Birkhoff. Extensions of Jentzsch’s theorem. Trans. Amer. Math. Soc., 85(1):219–227, 1957.
Cheng and Nakiboğlu [2021] H.-C. Cheng and B. Nakiboğlu. On the existence of the Augustin means. In 2021 IEEE Information Theory Workshop (ITW), 2021.
Cover [1984] T. M. Cover. An algorithm for maximizing expected log investment return. IEEE Trans. Inf. Theory, 30(2):369–373, 1984.
Csiszár [1995] I. Csiszár. Generalized cutoff rates and Rényi’s information measures. IEEE Trans. Inf. Theory, 41(1):26–34, 1995.
Csiszár and Körner [2011] I. Csiszár and J. Körner. Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge University Press, 2nd edition, 2011.
Iusem [1992] A. N. Iusem. A short convergence proof of the EM algorithm for a specific Poisson model. Braz. J. Probab. Stat., pages 57–67, 1992.
Kamatsuka et al. [2024] A. Kamatsuka, K. Kazama, and T. Yoshida. Algorithms for computing the Augustin–Csiszár mutual information and Lapidoth–Pfister mutual information. arXiv preprint arXiv:2404.10950, 2024.
Karakos et al. [2008] D. Karakos, S. Khudanpur, and C. E. Priebe. Computation of Csiszár’s mutual information of order $\alpha$. In IEEE Int. Symp. Information Theory, pages 2106–2110, 2008.
Krause [2015] U. Krause. Positive Dynamical Systems in Discrete Time: Theory, Models, and Applications. De Gruyter, 2015.
Lapidoth and Pfister [2018] A. Lapidoth and C. Pfister. Testing against independence and a Rényi information measure. In IEEE Information Theory Workshop (ITW), pages 1–5, 2018.
Lapidoth and Pfister [2019] A. Lapidoth and C. Pfister. Two measures of dependence. Entropy, 21(8), 2019.
Lemmens and Nussbaum [2012] B. Lemmens and R. Nussbaum. Nonlinear Perron–Frobenius Theory. Cambridge University Press, 2012.
Li et al. [2018] Y.-H. Li, C. A. Riofrío, and V. Cevher. A general convergence result for mirror descent with Armijo line search. arXiv preprint arXiv:1805.12232, 2018.
Lin et al. [2021] C.-M. Lin, H.-C. Cheng, and Y.-H. Li. Maximum-likelihood quantum state tomography by Cover’s method with non-asymptotic analysis. arXiv preprint arXiv:2110.00747, 2021.
Lucy [1974] L. B. Lucy. An iterative technique for the rectification of observed distributions. Astron. J., 79:745, 1974.
Nakiboğlu [2019] B. Nakiboğlu. The Augustin capacity and center. Probl. Inf. Transm., 55(4):299–342, 2019.
Richardson [1972] W. H. Richardson. Bayesian-based iterative method of image restoration. J. Opt. Soc. Am., 62(1):55–59, Jan 1972.
Shepp and Vardi [1982] L. A. Shepp and Y. Vardi. Maximum likelihood reconstruction for emission tomography. IEEE Trans. Med. Imaging, 1(2):113–122, 1982.
Tomamichel and Hayashi [2018] M. Tomamichel and M. Hayashi. Operational interpretation of Rényi information measures via composite hypothesis testing against product and Markov distributions. IEEE Trans. Inf. Theory, 64(2):1064–1082, 2018.
Vardi and Lee [1993] Y. Vardi and D. Lee. From image deblurring to optimal investments: Maximum likelihood solutions for positive linear inverse problems. J. R. Stat. Soc. Series B Stat. Methodol., 55(3):569–598, 1993.
Wang et al. [2024] G.-R. Wang, C.-E. Tsai, H.-C. Cheng, and Y.-H. Li. Computing Augustin information via hybrid geodesically convex optimization. In IEEE Int. Symp. Information Theory, 2024.
You et al. [2022] J.-K. You, H.-C. Cheng, and Y.-H. Li. Minimizing quantum Rényi divergences via mirror descent with Polyak step size. In IEEE Int. Symp. Information Theory, pages 252–257, 2022.

Appendix A Omitted Proofs

A.1 Proof of Lemma 10

By Lemma 2, we have

d_{\mathrm{H}}(T_{\alpha,p}(x),T_{\alpha,p}(y))=d_{\mathrm{H}}(p^{\alpha}\odot x^{1-\alpha},p^{\alpha}\odot y^{1-\alpha}).

Applying Lemma 4 (i) and (ii) gives

d_{\mathrm{H}}(T_{\alpha,p}(x),T_{\alpha,p}(y))\leq d_{\mathrm{H}}(x^{1-\alpha},y^{1-\alpha})\leq\left\lvert 1-\alpha\right\rvert d_{\mathrm{H}}(x,y).

This proves the lemma.

A.2 Proof of Lemma 11

We will use the following lemma, whose proof is postponed to the next subsection.

Lemma 16.

Let $X,Y:\Omega\to\Delta([d])$ be two random probability vectors. We have

M(\mathbb{E}[X]/\mathbb{E}[Y])\leq\sup_{\omega\in\Omega}M(X(\omega)/Y(\omega)).

By Lemma 16, we have

\begin{split}M(\mathbb{E}[X]/\mathbb{E}[Y])&\leq\sup_{\omega\in\Omega}M(X(\omega)/Y(\omega)),\\ M(\mathbb{E}[Y]/\mathbb{E}[X])&\leq\sup_{\omega\in\Omega}M(Y(\omega)/X(\omega)).\end{split}

Then,

d_{\mathrm{H}}(\mathbb{E}[X],\mathbb{E}[Y])\leq\sup_{\omega\in\Omega}\log M(X(\omega)/Y(\omega))+\sup_{\omega\in\Omega}\log M(Y(\omega)/X(\omega)).

Since $X(\omega),Y(\omega)\in\Delta([d])$ , we have

M(X(\omega)/Y(\omega))\geq 1\quad\text{and}\quad M(Y(\omega)/X(\omega))\geq 1.

This implies that

\begin{split}M(X(\omega)/Y(\omega))&\leq M(X(\omega)/Y(\omega))\cdot M(Y(\omega)/X(\omega)),\\ M(Y(\omega)/X(\omega))&\leq M(X(\omega)/Y(\omega))\cdot M(Y(\omega)/X(\omega)),\end{split}

and hence

\begin{split}d_{\mathrm{H}}(\mathbb{E}[X],\mathbb{E}[Y])&\leq\sup_{\omega\in\Omega}d_{\mathrm{H}}(X(\omega),Y(\omega))+\sup_{\omega\in\Omega}d_{\mathrm{H}}(X(\omega),Y(\omega))\\ &=2\sup_{\omega\in\Omega}d_{\mathrm{H}}(X(\omega),Y(\omega)),\end{split}

which completes the proof.

A.3 Proof of Lemma 16

Let $\overline{M}:=\sup_{\omega\in\Omega}M(X(\omega)/Y(\omega))$ . We have

\overline{M}\mathbb{E}[Y]=\mathbb{E}[\overline{M}Y]\geq\mathbb{E}[M(X/Y)Y]\geq\mathbb{E}[X].

The lemma follows from the definition of $M(\mathbb{E}[X]/\mathbb{E}[Y])$ .

A.4 Proof of Corollary 14

For both (i) and (ii), by Lemma 12 and Theorem 13, we have

d_{\mathrm{H}}(x_{t+1},x^{\star})=d_{\mathrm{H}}(U_{\alpha}(y_{t}),U_{\alpha}(y^{\star}))\leq\tilde{\gamma}\cdot d_{\mathrm{H}}(y_{t},y^{\star}),

and

d_{\mathrm{H}}(y_{t+1},y^{\star})=d_{\mathrm{H}}(V_{\alpha}(x_{t+1}),V_{\alpha}(x^{\star}))\leq\tilde{\gamma}\cdot d_{\mathrm{H}}(x_{t+1},x^{\star}),

where $\tilde{\gamma}=\gamma^{\prime}$ for (i) and $\tilde{\gamma}=\gamma^{\prime\prime}$ for (ii). The corollary follows by applying the above two inequalities alternately, together with $y_{1}=V_{\alpha}(x_{1})$ .

A.5 Lemma 17

We prove the following lemma.

Lemma 17.

For $\alpha\in[0,1)\cup(1,\infty)$ , let $(x^{\star},y^{\star})$ be a minimizer of the optimization problem (3) and assume $p\in\mathbb{R}_{++}^{m\times n}$ . Then, we have $x^{\star}\in\operatorname{ri}\Delta([m])$ and $y^{\star}\in\operatorname{ri}\Delta([n])$ .

By Lemma 12, we have

x^{\star}=U_{\alpha}(y^{\star})=\frac{\left(p^{\alpha}(y^{\star})^{1-\alpha}\right)^{1/\alpha}}{\left\lVert\left(p^{\alpha}(y^{\star})^{1-\alpha}\right)^{1/\alpha}\right\rVert_{1}}.

Since $p\in\mathbb{R}_{++}^{m\times n}$ and $y^{\star}\in\Delta([n])$ , the vector $p^{\alpha}(y^{\star})^{1-\alpha}$ is entry-wise strictly positive. This implies that $x^{\star}$ is entry-wise strictly positive, and hence $x^{\star}\in\operatorname{ri}\Delta([m])$ . To show $y^{\star}\in\operatorname{ri}\Delta([n])$ , consider the equation $y^{\star}=V_{\alpha}(x^{\star})$ and apply the same argument. This completes the proof.
