
Linear Convergence in Hilbert’s Projective Metric for Computing Augustin Information
and a Rényi Information Measure

Chung-En Tsai
Department of Computer Science and Information Engineering, National Taiwan University

Guan-Ren Wang
Graduate Institute of Networking and Multimedia, National Taiwan University

Hao-Chung Cheng
Department of Electrical Engineering and Graduate Institute of Communication Engineering, National Taiwan University
Department of Mathematics, National Taiwan University
Center for Quantum Science and Engineering, National Taiwan University
Physics Division, National Center for Theoretical Sciences
Hon Hai (Foxconn) Quantum Computing Centre

Yen-Huan Li
Department of Computer Science and Information Engineering, National Taiwan University
Graduate Institute of Networking and Multimedia, National Taiwan University
Department of Mathematics, National Taiwan University
Center for Quantum Science and Engineering, National Taiwan University

Abstract
Abstract

Consider the problems of computing the Augustin information and a Rényi information measure of statistical independence, previously explored by Lapidoth and Pfister (IEEE Information Theory Workshop, 2018) and Tomamichel and Hayashi (IEEE Trans. Inf. Theory, 64(2):1064–1082, 2018). Both quantities are defined as solutions to optimization problems and lack closed-form expressions. This paper analyzes two iterative algorithms: Augustin’s fixed-point iteration for computing the Augustin information, and the algorithm by Kamatsuka et al. (arXiv:2404.10950) for the Rényi information measure. Previously, these algorithms were only known to converge asymptotically. Using Hilbert’s projective metric, we establish the linear convergence of Augustin’s algorithm for the Augustin information of order $\alpha\in(1/2,1)\cup(1,3/2)$ and of Kamatsuka et al.’s algorithm for the Rényi information measure of order $\alpha\in[1/2,1)\cup(1,\infty)$.

Both authors contributed equally to this work.

1 Introduction

Denote by $\Delta([d])$ the set of probability distributions over the finite set $[d]\coloneqq\{1,\ldots,d\}$. For any $\alpha\in[0,1)\cup(1,\infty)$, the order-$\alpha$ Augustin information is defined by the following optimization problem [Augustin, 1978]:

$$\min_{x\in\Delta([d])} f_{\mathrm{Aug}}(x),\quad f_{\mathrm{Aug}}(x)\coloneqq\mathbb{E}_{p\sim P}\left[D_\alpha(p\parallel x)\right], \tag{1}$$

where $P$ is a given probability distribution over $\Delta([d])$, and

$$D_\alpha(p\parallel q)\coloneqq\frac{1}{\alpha-1}\log\sum_{s\in S}p(s)^\alpha q(s)^{1-\alpha},\quad\forall p,q\in\Delta(S)$$

is the order-$\alpha$ Rényi divergence. The Augustin information characterizes, e.g., the cutoff rate, the strong converse exponent, and the error exponent in the channel coding problem [Arimoto, 1973, Csiszár, 1995, Csiszár and Körner, 2011, Nakiboğlu, 2019, Wang et al., 2024]. When $\alpha=0$, the optimization problem (1) specializes to the definition of the log-optimal portfolio [Cover, 1984] and is equivalent to the definition of the maximum-likelihood estimate in Poisson inverse problems [Vardi and Lee, 1993].
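The following minimal Python sketch evaluates the Rényi divergence and the objective of (1) as a concrete reference point. It assumes that $P$ is finitely supported and is passed as a list of `(weight, p)` pairs. This input format and the names `renyi_divergence` and `f_aug` are illustrative assumptions, not part of the paper. Entries with $p(s)=0$ are skipped, which implements the convention $0^0=0$ when $\alpha=0$. The sketch also assumes $q$ is strictly positive.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical input format: a finitely supported P as (weight, distribution) pairs.
Mixture = List[Tuple[float, Sequence[float]]]


def renyi_divergence(p: Sequence[float], q: Sequence[float], alpha: float) -> float:
    """Order-alpha Renyi divergence D_alpha(p || q) on a finite set.

    Assumes alpha in [0, 1) or (1, inf) and q strictly positive. Entries with
    p(s) = 0 are skipped, which implements the convention 0^0 = 0 for alpha = 0.
    """
    if alpha == 1:
        raise ValueError("alpha = 1 is excluded from this definition")
    total = sum(ps ** alpha * qs ** (1 - alpha) for ps, qs in zip(p, q) if ps > 0)
    return math.log(total) / (alpha - 1)


def f_aug(x: Sequence[float], P: Mixture, alpha: float) -> float:
    """Augustin objective f_Aug(x) = E_{p~P}[D_alpha(p || x)] from (1)."""
    return sum(w * renyi_divergence(p, x, alpha) for w, p in P)


# Usage: D_2((1/2, 1/2) || (1/4, 3/4)) equals log(4/3) up to rounding.
value = renyi_divergence([0.5, 0.5], [0.25, 0.75], 2.0)
```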

The optimization problem (1) does not admit a closed-form solution. Although the problem is convex, its objective function violates the standard smoothness assumption in the optimization literature (Lipschitz continuity of the gradient). Therefore, even the convergence guarantees of projected gradient descent, arguably the simplest convex optimization algorithm, do not directly apply [You et al., 2022].

Augustin [1978] proposed the following fixed-point iteration to solve the optimization problem (1):

$$x_{t+1}=Z_t^{-1}\cdot x_t\odot\left(-\nabla f_{\mathrm{Aug}}(x_t)\right),\quad\forall t\in\mathbb{N}, \tag{2}$$

where $Z_t$ is the normalizing constant ensuring that $x_{t+1}$ remains a probability distribution, and $\odot$ denotes the entry-wise product. The algorithm was later rediscovered by Karakos et al. [2008].¹ When $\alpha=0$, this fixed-point iteration coincides with Cover’s method for computing the log-optimal portfolio [Cover, 1984] and is equivalent to the expectation-maximization algorithm for solving Poisson inverse problems [Richardson, 1972, Lucy, 1974, Shepp and Vardi, 1982, Vardi and Lee, 1993].

¹ Karakos et al. [2008] proposed an alternating minimization method whose iteration consists of two steps. Combining the two steps yields Augustin’s fixed-point iteration.
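A literal transcription of one step of (2) may help. For strictly positive $x$, differentiating (1) gives $-\nabla f_{\mathrm{Aug}}(x)=\mathbb{E}_{p\sim P}\big[p^\alpha\odot x^{-\alpha}/\langle p^\alpha,x^{1-\alpha}\rangle\big]$. The sketch below multiplies this vector entry-wise by $x_t$ and normalizes. It assumes the same hypothetical `(weight, p)` format for $P$, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple

Mixture = List[Tuple[float, Sequence[float]]]  # hypothetical format for P


def grad_f_aug(x: Sequence[float], P: Mixture, alpha: float) -> List[float]:
    """Gradient of f_Aug at a strictly positive x.

    Each p contributes -w * p^alpha * x^(-alpha) / <p^alpha, x^(1-alpha)>.
    """
    g = [0.0] * len(x)
    for w, p in P:
        pa = [pi ** alpha if pi > 0 else 0.0 for pi in p]  # convention 0^0 = 0
        denom = sum(a * xi ** (1 - alpha) for a, xi in zip(pa, x))
        for i, (a, xi) in enumerate(zip(pa, x)):
            g[i] -= w * a * xi ** (-alpha) / denom
    return g


def augustin_step(x: Sequence[float], P: Mixture, alpha: float) -> Tuple[List[float], float]:
    """One step of (2); returns x_{t+1} and the normalizing constant Z_t."""
    g = grad_f_aug(x, P, alpha)
    unnormalized = [xi * (-gi) for xi, gi in zip(x, g)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized], z


P = [(0.5, [0.7, 0.2, 0.1]), (0.5, [0.1, 0.3, 0.6])]
x_next, z = augustin_step([1 / 3, 1 / 3, 1 / 3], P, alpha=0.75)
```

Each $p$ contributes a vector that sums to its weight. Consequently, when the weights of $P$ sum to one, $x_t\odot(-\nabla f_{\mathrm{Aug}}(x_t))$ already sums to one, so $Z_t$ equals 1 up to rounding in this sketch.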

Recently, Kamatsuka et al. [2024] proposed an algorithm similar to Augustin’s fixed-point iteration to compute a Rényi information measure of statistical independence, which was explored by Lapidoth and Pfister [2019] and Tomamichel and Hayashi [2018]. For any $\alpha\in[0,1)\cup(1,\infty)$, this order-$\alpha$ Rényi information measure is defined by the following optimization problem:

$$\min_{x\in\Delta([m])}\min_{y\in\Delta([n])} f_{\mathrm{Ren}}(x,y),\quad f_{\mathrm{Ren}}(x,y)\coloneqq D_\alpha(p\parallel x\otimes y), \tag{3}$$

where $p$ is a given probability distribution over $[m]\times[n]$ and $\otimes$ denotes the tensor product. The Rényi information measure emerges in the error exponent of a hypothesis testing problem, where one tests against the independence of two random variables given independent and identically distributed (i.i.d.) samples from their joint distribution [Lapidoth and Pfister, 2018, 2019, Tomamichel and Hayashi, 2018].

Kamatsuka et al.’s algorithm to compute the Rényi information measure iterates as:

$$\begin{aligned} x_{t+1} &= Z_{1,t}^{-1}\cdot x_t\odot\left(-\nabla_x f_{\mathrm{Ren}}(x_t,y_t)\right)^{1/\alpha}, \\ y_{t+1} &= Z_{2,t}^{-1}\cdot y_t\odot\left(-\nabla_y f_{\mathrm{Ren}}(x_{t+1},y_t)\right)^{1/\alpha}, \end{aligned} \tag{4}$$

where $Z_{1,t}$ and $Z_{2,t}$ are normalizing constants ensuring that $x_{t+1}$ and $y_{t+1}$ remain probability distributions. The notation $v^r$ denotes the entry-wise power for any vector $v$ and number $r$. This iterative algorithm is reminiscent of Augustin’s fixed-point iteration but differs in the power $1/\alpha$ applied to the gradients.
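Unrolling the gradients in (4) shows what the update actually computes. We have $-\nabla_x f_{\mathrm{Ren}}(x,y)(i)=x(i)^{-\alpha}\sum_j p(i,j)^\alpha y(j)^{1-\alpha}/S(x,y)$ with $S(x,y)=\sum_{i',j}p(i',j)^\alpha x(i')^{1-\alpha}y(j)^{1-\alpha}$. Hence the factor $x_t$ cancels after raising to the power $1/\alpha$, and $x_{t+1}(i)\propto\big(\sum_j p(i,j)^\alpha y_t(j)^{1-\alpha}\big)^{1/\alpha}$. The $y$-update is symmetric. The Python sketch below implements this closed form. It assumes $p$ is given as an $m\times n$ nested list and $y_t$ is strictly positive; the name `kamatsuka_step` is illustrative.

```python
from typing import List, Sequence, Tuple


def kamatsuka_step(
    y: Sequence[float], p: Sequence[Sequence[float]], alpha: float
) -> Tuple[List[float], List[float]]:
    """One iteration of (4) in closed form; returns (x_{t+1}, y_{t+1}).

    p is the joint pmf as an m-by-n nested list and y = y_t is strictly positive.
    x_t is not an argument because it cancels inside the x-update.
    """
    m, n = len(p), len(p[0])
    pa = [[v ** alpha if v > 0 else 0.0 for v in row] for row in p]  # convention 0^0 = 0
    # x_{t+1}(i) proportional to (sum_j p(i,j)^alpha * y_t(j)^(1-alpha))^(1/alpha)
    ux = [sum(pa[i][j] * y[j] ** (1 - alpha) for j in range(n)) ** (1 / alpha) for i in range(m)]
    zx = sum(ux)  # normalizing constant for the x-update
    x_new = [u / zx for u in ux]
    # y_{t+1}(j) proportional to (sum_i p(i,j)^alpha * x_{t+1}(i)^(1-alpha))^(1/alpha)
    uy = [sum(pa[i][j] * x_new[i] ** (1 - alpha) for i in range(m)) ** (1 / alpha) for j in range(n)]
    zy = sum(uy)  # normalizing constant for the y-update
    y_new = [u / zy for u in uy]
    return x_new, y_new


# Usage: run a few iterations on a small dependent joint distribution.
p = [[0.3, 0.1], [0.2, 0.4]]
y = [0.5, 0.5]
for _ in range(100):
    x, y = kamatsuka_step(y, p, alpha=2.0)
```

If $p=a\otimes b$ is a product distribution, then $(a,b)$ is a fixed point of this map. This is consistent with $D_\alpha(p\parallel a\otimes b)=0$.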

The convergence behavior of Augustin’s fixed-point iteration and of Kamatsuka et al.’s algorithm remains largely unclear. For Augustin’s fixed-point iteration, Karakos et al. [2008] and Nakiboğlu [2019] have shown that it converges asymptotically for $\alpha\in(0,1)$. Iusem [1992] and Lin et al. [2021] have proved a convergence rate of $O(1/t)$ for the case where $\alpha$ approaches zero. For Kamatsuka et al.’s algorithm, Kamatsuka et al. [2024] have shown that it converges asymptotically for $\alpha\in[1/2,1)\cup(1,\infty)$.

We aim to carry out non-asymptotic analyses of the two algorithms. A common approach to analyzing an iterative method is to show that its update map is contractive under a suitable metric. Since the two algorithms (2) and (4) map positive vectors to positive vectors, we view them as positive dynamical systems and use Hilbert’s projective metric [Lemmens and Nussbaum, 2012, Krause, 2015].

In this work, we prove that, with respect to Hilbert’s projective metric, Augustin’s fixed-point iteration is contractive for $\alpha\in(1/2,1)\cup(1,3/2)$, and Kamatsuka et al.’s algorithm is contractive for $\alpha\in(1/2,1)\cup(1,\infty)$. Based on these contractivity results, we establish the following non-asymptotic convergence guarantees for the two algorithms.

  • For computing the Augustin information of order $\alpha\in(1/2,1)\cup(1,3/2)$, Augustin’s fixed-point iteration converges at a rate of $O((2|1-\alpha|)^t)$ with respect to Hilbert’s projective metric. This improves on the previous asymptotic convergence guarantee [Karakos et al., 2008, Nakiboğlu, 2019] when $\alpha\in(1/2,1)$ and extends the range of convergence to include $\alpha\in(1,3/2)$.

  • For computing the Rényi information measure of order $\alpha\in(1/2,1)\cup(1,\infty)$, the iterative algorithm of Kamatsuka et al. converges at a rate of $O(|1-1/\alpha|^{2t})$ with respect to Hilbert’s projective metric. When $\alpha=1/2$, the algorithm also converges linearly if $p$ has full support. This improves on the previous asymptotic convergence guarantee [Kamatsuka et al., 2024].
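The following back-of-the-envelope sketch shows where the factor $|1-1/\alpha|^{2}$ can come from. It rests on two assumptions: the minimizer $(x^\star,y^\star)$ of (3) is a fixed point of (4), and all vectors are strictly positive. After $x_t$ cancels, the $x$-update of (4) is, up to normalization, $x_{t+1}\propto(p^\alpha y_t^{1-\alpha})^{1/\alpha}$, where $p^\alpha$ is the entry-wise power of $p$ viewed as an $m\times n$ matrix. Lemma 2 (i) and Lemma 4 (i) and (iii) then give:

```latex
\begin{aligned}
d_{\mathrm{H}}(x_{t+1},x^\star)
  &\le \tfrac{1}{\alpha}\, d_{\mathrm{H}}\big(p^\alpha y_t^{1-\alpha},\, p^\alpha (y^\star)^{1-\alpha}\big)
  && \text{Lemma 2 (i), Lemma 4 (i) with } r = 1/\alpha \\
  &\le \tfrac{1}{\alpha}\, d_{\mathrm{H}}\big(y_t^{1-\alpha},\, (y^\star)^{1-\alpha}\big)
  && \text{Lemma 4 (iii)} \\
  &\le \tfrac{|1-\alpha|}{\alpha}\, d_{\mathrm{H}}(y_t,y^\star)
   = \big|1-\tfrac{1}{\alpha}\big|\, d_{\mathrm{H}}(y_t,y^\star)
  && \text{Lemma 4 (i) with } r = 1-\alpha.
\end{aligned}
```

The same steps, applied to the $y$-update with $(p^\alpha)^\top$ in place of $p^\alpha$, give $d_{\mathrm{H}}(y_{t+1},y^\star)\le|1-1/\alpha|\,d_{\mathrm{H}}(x_{t+1},x^\star)\le|1-1/\alpha|^2\,d_{\mathrm{H}}(y_t,y^\star)$, which matches the rate above. At $\alpha=1/2$ the factor $|1-1/\alpha|$ equals one, so this crude bound alone gives no contraction. Presumably the full-support condition is what allows Birkhoff’s Theorem 5 to supply a strict contraction in that case.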

Notations

We write $\mathbb{R}_+$ and $\mathbb{R}_{++}$ for the sets of non-negative and strictly positive real numbers, respectively. For any positive integer $n$, we write $[n]$ for the set $\{1,\ldots,n\}$. Let $v\in\mathbb{R}^d$ and $A,B\in\mathbb{R}^{m\times n}$. We write $v(i)$ for the $i$-th entry of the vector $v$, and $A(i,j)$ for the $(i,j)$-th entry of the matrix $A$. We write $A\odot B$ for the entry-wise product of $A$ and $B$, and $A^r$ for the matrix $(A(i,j)^r)_{1\leq i\leq m,\,1\leq j\leq n}$. For a set $S\subseteq\mathbb{R}^d$, we denote by $\operatorname{ri}S$ its relative interior. We adopt the conventions $0^0=0$, $0/0=1$, $\infty\cdot\infty=\infty$, $a\cdot\infty=\infty$ for any $a>0$, and $\log\infty=\infty$.
We call $\Delta([d])$ the probability simplex and view its elements as $d$-dimensional vectors.

2 Related Work

We have discussed Augustin’s fixed-point iteration and Kamatsuka et al.’s algorithm in Section 1. This section reviews other optimization algorithms for computing the Augustin information and the Rényi information measure.

For computing the Augustin information of order $\alpha$, three methods converge asymptotically for all $\alpha\in[0,1)\cup(1,\infty)$: entropic mirror descent with Armijo line search [Li et al., 2018], entropic mirror descent with the Polyak step size [You et al., 2022], and a variant of Augustin’s fixed-point iteration explored by Cheng and Nakiboğlu [2021, Lemma 6]. Riemannian gradient descent with the Poincaré metric [Wang et al., 2024] converges at a rate of $O(1/t)$ for all $\alpha\in[0,1)\cup(1,\infty)$. An alternating minimization method due to Kamatsuka et al. [2024]² also achieves a convergence rate of $O(1/t)$, but only for the narrower range $\alpha\in(1,\infty)$. None of these works establishes a linear convergence rate.

² Kamatsuka et al. [2024] only claimed an asymptotic convergence guarantee in their paper. We find that their Lemma 2 indeed implies a convergence rate of $O(1/t)$.

For computing the Rényi information measure of order $\alpha$, entropic mirror descent with Armijo line search [Li et al., 2018] and with the Polyak step size [You et al., 2022] both converge asymptotically for $\alpha\in[1/2,1)\cup(1,\infty)$. However, when $\alpha\in(0,1/2)$, the optimization problem (3) becomes non-convex [Lapidoth and Pfister, 2019], and there is currently no known algorithm that provably solves it. As with the Augustin information, none of the existing works establishes a linear convergence rate.

3 Preliminaries

Our analyses are based on properties of Hilbert’s projective metric and Birkhoff’s contraction theorem, which we introduce in this section.

Let $K$ be a closed cone in a finite-dimensional real vector space, e.g., the positive orthant or the cone of Hermitian positive semidefinite matrices. For any $x,y\in K$, we write $x\leq y$ if $y-x\in K$. For any $x,y\in K\setminus\{0\}$, define

$$M(x/y)\coloneqq\inf\{\beta\geq 0\mid x\leq\beta y\}>0. \tag{5}$$

If no such $\beta$ exists, we set $M(x/y)\coloneqq\infty$.

Definition 1.

Hilbert’s projective metric is defined as

$$d_{\mathrm{H}}(x,y)\coloneqq\log\big(M(x/y)\,M(y/x)\big)\in[0,\infty],\quad\forall x,y\in K\setminus\{0\}.$$

In addition, $d_{\mathrm{H}}(0,0)$ is defined to be $0$.

The following lemma shows that $d_{\mathrm{H}}$ depends only on the rays spanned by its arguments and vanishes exactly when the two rays coincide. In this sense, $d_{\mathrm{H}}$ is a metric on the set of rays.

Lemma 2.

The following properties hold.

  (i) For any $x,y\in K$ and any $\alpha,\beta>0$, we have $d_{\mathrm{H}}(\alpha x,\beta y)=d_{\mathrm{H}}(x,y)$.

  (ii) We have $d_{\mathrm{H}}(x,y)=0$ if and only if $x=ry$ for some $r>0$.

In the rest of the paper, we only consider the cone $K=\mathbb{R}_+^d$.

Lemma 3.

Consider Hilbert’s projective metric $d_{\mathrm{H}}$ on the cone $K=\mathbb{R}_+^d$.

  (i) For any $x,y\in\mathbb{R}_+^d\setminus\{0\}$, we have

    $$M(x/y)=\max_{i\in[d]}\frac{x(i)}{y(i)},\quad d_{\mathrm{H}}(x,y)=\log\max_{i,j\in[d]}\frac{x(i)\,y(j)}{y(i)\,x(j)}. \tag{6}$$

  (ii) $(\operatorname{ri}\Delta([d]),d_{\mathrm{H}})$ is a metric space [Lemmens and Nussbaum, 2012, Proposition 2.1.1].
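Formula (6) translates directly into code. The following sketch, with the illustrative name `hilbert_metric`, covers only strictly positive vectors. For such vectors, the double maximum in (6) equals $\max_i x(i)/y(i)$ divided by $\min_j x(j)/y(j)$. Vectors with zero entries would additionally need the conventions from the Notations paragraph.

```python
import math
from typing import Sequence


def hilbert_metric(x: Sequence[float], y: Sequence[float]) -> float:
    """Hilbert's projective metric on the positive orthant, via formula (6).

    Restricted to strictly positive vectors, where the double maximum in (6)
    equals (max_i x(i)/y(i)) / (min_j x(j)/y(j)).
    """
    if any(v <= 0 for v in x) or any(v <= 0 for v in y):
        raise ValueError("this sketch handles strictly positive vectors only")
    ratios = [xi / yi for xi, yi in zip(x, y)]
    return math.log(max(ratios) / min(ratios))


# Lemma 2 (i): rescaling an argument does not change the distance.
x, y = [0.2, 0.3, 0.5], [0.4, 0.4, 0.2]
assert abs(hilbert_metric(x, y) - hilbert_metric([3 * v for v in x], y)) < 1e-12
```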

Given the second item above, we will measure the errors of both Augustin’s fixed-point iteration and Kamatsuka et al.’s algorithm in terms of Hilbert’s projective metric between their iterates and the minimizer. The following lemma lists several properties of Hilbert’s projective metric, which are direct consequences of Corollary 2.1.4 and Corollary 2.1.5 of Lemmens and Nussbaum [2012].

Lemma 4.

The following properties hold.

  (i) $d_{\mathrm{H}}(x^r,y^r)\leq|r|\,d_{\mathrm{H}}(x,y)$ for any $x,y\in\mathbb{R}_+^d$ and any $r\in\mathbb{R}\setminus\{0\}$.

  (ii) $d_{\mathrm{H}}(v\odot x,v\odot y)\leq d_{\mathrm{H}}(x,y)$ for any $x,y,v\in\mathbb{R}_+^d$.

  (iii) $d_{\mathrm{H}}(Ax,Ay)\leq d_{\mathrm{H}}(x,y)$ for any $x,y\in\mathbb{R}_+^d$ and $A\in\mathbb{R}_+^{d'\times d}$.
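For intuition, Lemma 4 can be probed numerically. The randomized check below is a sanity test rather than a proof, and it covers only strictly positive inputs. On such inputs, (i) and (ii) in fact hold with equality, while (iii) is a genuine inequality. The helper names and the sampling ranges are illustrative choices.

```python
import math
import random
from typing import List, Sequence


def hilbert_metric(x: Sequence[float], y: Sequence[float]) -> float:
    """d_H for strictly positive vectors, via formula (6)."""
    ratios = [xi / yi for xi, yi in zip(x, y)]
    return math.log(max(ratios) / min(ratios))


def matvec(A: List[List[float]], x: Sequence[float]) -> List[float]:
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def check_lemma4(trials: int = 2000, d: int = 4, seed: int = 0, tol: float = 1e-9) -> bool:
    """Randomized check of Lemma 4 (i)-(iii) on strictly positive inputs.

    Returns False as soon as an inequality is violated by more than tol.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(0.01, 1.0) for _ in range(d)]
        y = [rng.uniform(0.01, 1.0) for _ in range(d)]
        v = [rng.uniform(0.01, 1.0) for _ in range(d)]
        A = [[rng.uniform(0.0, 1.0) for _ in range(d)] for _ in range(d + 1)]  # d' = d + 1
        r = rng.choice([-1, 1]) * rng.uniform(0.1, 3.0)  # a nonzero exponent
        base = hilbert_metric(x, y)
        lhs_i = hilbert_metric([a ** r for a in x], [b ** r for b in y])
        lhs_ii = hilbert_metric([s * a for s, a in zip(v, x)], [s * b for s, b in zip(v, y)])
        lhs_iii = hilbert_metric(matvec(A, x), matvec(A, y))
        if lhs_i > abs(r) * base + tol or lhs_ii > base + tol or lhs_iii > base + tol:
            return False
    return True


print(check_lemma4())
```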

When the matrix in Lemma 4 (iii) is entry-wise strictly positive, Birkhoff [1957] showed that the linear map it defines is a strict contraction.

Theorem 5 (Birkhoff [1957]).

Let $A\in\mathbb{R}_{++}^{m\times n}$. It holds that

$$d_{\mathrm{H}}(Ax,Ay)\leq\lambda(A)\cdot d_{\mathrm{H}}(x,y),\quad\forall x,y\in\mathbb{R}_+^n,$$

where

$$\lambda(A)\coloneqq\tanh\left(\frac{\delta(A)}{4}\right)<1,$$

and

$$\delta(A)\coloneqq\log\max_{(i,j),(i',j')\in[m]\times[n]}\frac{A(i,j)\,A(i',j')}{A(i',j)\,A(i,j')}\geq 0.$$
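The quantities $\delta(A)$ and $\lambda(A)$ in Theorem 5 are straightforward to compute for small matrices. The sketch below computes them by brute force over all index quadruples, which costs $O(m^2n^2)$ and is meant only for illustration. It then checks the contraction inequality on random positive vectors. The helper names are illustrative, and $A$ is assumed strictly positive.

```python
import math
import random
from typing import List, Sequence, Tuple


def hilbert_metric(x: Sequence[float], y: Sequence[float]) -> float:
    """d_H for strictly positive vectors, via formula (6)."""
    ratios = [xi / yi for xi, yi in zip(x, y)]
    return math.log(max(ratios) / min(ratios))


def birkhoff_coefficient(A: List[List[float]]) -> Tuple[float, float]:
    """Return (delta(A), lambda(A)) as in Theorem 5, for a strictly positive matrix A."""
    m, n = len(A), len(A[0])
    delta = max(
        math.log(A[i][j] * A[i2][j2] / (A[i2][j] * A[i][j2]))
        for i in range(m)
        for i2 in range(m)
        for j in range(n)
        for j2 in range(n)
    )
    return delta, math.tanh(delta / 4)


def matvec(A: List[List[float]], x: Sequence[float]) -> List[float]:
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


A = [[1.0, 2.0], [3.0, 4.0]]
delta, lam = birkhoff_coefficient(A)  # here delta = log(3/2), so lam = tanh(log(1.5) / 4) < 1

# Check d_H(Ax, Ay) <= lam * d_H(x, y) on random strictly positive vectors.
rng = random.Random(0)
for _ in range(1000):
    x = [rng.uniform(0.01, 1.0) for _ in range(2)]
    y = [rng.uniform(0.01, 1.0) for _ in range(2)]
    assert hilbert_metric(matvec(A, x), matvec(A, y)) <= lam * hilbert_metric(x, y) + 1e-12
```

For a rank-one positive matrix $A=ab^\top$, every cross-ratio equals one, so $\delta(A)=0$ and $\lambda(A)=0$: such an $A$ maps every positive vector onto the ray of $a$.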

4 Linear Rate of Augustin’s Fixed-Point Iteration

In this section, we show that Augustin’s fixed-point iteration converges linearly with respect to Hilbert’s projective metric for computing the Augustin information of order $\alpha\in(1/2,1)\cup(1,3/2)$.

4.1 Augustin’s Fixed-Point Iteration

Define the following operators:

$$T_\alpha(x)\coloneqq\mathbb{E}_{p\sim P}\left[T_{\alpha,p}(x)\right],\quad T_{\alpha,p}(x)\coloneqq\frac{p^\alpha\odot x^{1-\alpha}}{\langle p^\alpha,x^{1-\alpha}\rangle},\quad\forall p\in\Delta([d]). \tag{7}$$

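Since $x\odot(-\nabla f_{\mathrm{Aug}}(x))=T_\alpha(x)$ already lies in $\Delta([d])$, we have $Z_t=1$. Augustin’s fixed-point iteration (2) can therefore be equivalently written as follows: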

  1. Initialize $x_1\in\Delta([d])$.

  2. For all $t\in\mathbb{N}$, compute $x_{t+1}=T_\alpha(x_t)$.
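The two-step description above can be sketched in Python as follows. The sketch again assumes a finitely supported $P$ in the hypothetical `(weight, p)` format and a strictly positive starting point; `T_alpha` and `augustin_iteration` are illustrative names. The last iterate of a long run serves as a numerical stand-in for the minimizer.

```python
from typing import List, Sequence, Tuple

Mixture = List[Tuple[float, Sequence[float]]]  # hypothetical format for P


def T_alpha(x: Sequence[float], P: Mixture, alpha: float) -> List[float]:
    """Operator (7) for a finitely supported P and a strictly positive x."""
    out = [0.0] * len(x)
    for w, p in P:
        # T_{alpha,p}(x) = p^alpha ⊙ x^(1-alpha) / <p^alpha, x^(1-alpha)>, with 0^0 = 0
        num = [(pi ** alpha if pi > 0 else 0.0) * xi ** (1 - alpha) for pi, xi in zip(p, x)]
        z = sum(num)
        for i, v in enumerate(num):
            out[i] += w * v / z
    return out


def augustin_iteration(
    P: Mixture, alpha: float, x1: Sequence[float], num_iters: int = 100
) -> List[List[float]]:
    """Return the iterates x_1, ..., x_{num_iters+1} of x_{t+1} = T_alpha(x_t)."""
    xs = [list(x1)]
    for _ in range(num_iters):
        xs.append(T_alpha(xs[-1], P, alpha))
    return xs


P = [(0.5, [0.7, 0.2, 0.1]), (0.5, [0.1, 0.3, 0.6])]
xs = augustin_iteration(P, alpha=0.75, x1=[1 / 3, 1 / 3, 1 / 3], num_iters=200)
x_star = xs[-1]  # numerical stand-in for the fixed point x* = T_alpha(x*)
```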

Lemma 6 ([Nakiboğlu, 2019, Lemma 13]).

For $\alpha\in(0,1)\cup(1,\infty)$, the optimization problem (1) has a unique minimizer $x^\star$, which satisfies the fixed-point equation $x^\star=T_\alpha(x^\star)$.

4.2 Linear Rate Guarantee

The main result of this section is the following theorem, which bounds the Lipschitz constant of the mapping $T_\alpha$ with respect to Hilbert’s projective metric. Its proof is postponed to the next subsection.

Theorem 7.

For $\alpha\in[0,1)\cup(1,\infty)$, we have

$$d_{\mathrm{H}}(T_\alpha(x),T_\alpha(y))\leq\gamma\cdot d_{\mathrm{H}}(x,y),\quad\forall x,y\in\Delta([d]), \tag{8}$$

where $\gamma\coloneqq 2|1-\alpha|$.
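Theorem 7 can be spot-checked numerically. The sketch below samples random strictly positive $P$, $x$, and $y$, and reports the largest observed ratio $d_{\mathrm{H}}(T_\alpha(x),T_\alpha(y))/d_{\mathrm{H}}(x,y)$ next to $\gamma=2|1-\alpha|$. It is an illustration under these assumptions, not a substitute for the proof. The sampling scheme and the helper names are arbitrary choices.

```python
import math
import random
from typing import List, Sequence, Tuple

Mixture = List[Tuple[float, Sequence[float]]]  # hypothetical format for P


def hilbert_metric(x: Sequence[float], y: Sequence[float]) -> float:
    """d_H for strictly positive vectors, via formula (6)."""
    ratios = [xi / yi for xi, yi in zip(x, y)]
    return math.log(max(ratios) / min(ratios))


def T_alpha(x: Sequence[float], P: Mixture, alpha: float) -> List[float]:
    """Operator (7), assuming every p in the support of P is strictly positive."""
    out = [0.0] * len(x)
    for w, p in P:
        num = [pi ** alpha * xi ** (1 - alpha) for pi, xi in zip(p, x)]
        z = sum(num)
        for i, v in enumerate(num):
            out[i] += w * v / z
    return out


def random_simplex_point(rng: random.Random, d: int) -> List[float]:
    """A random strictly positive probability vector (not uniform on the simplex)."""
    v = [rng.uniform(0.01, 1.0) for _ in range(d)]
    s = sum(v)
    return [a / s for a in v]


def worst_ratio(alpha: float, trials: int = 500, d: int = 4, k: int = 3, seed: int = 0) -> float:
    """Largest observed d_H(T_alpha(x), T_alpha(y)) / d_H(x, y) over random x, y, and P."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        P = [(w, random_simplex_point(rng, d)) for w in random_simplex_point(rng, k)]
        x, y = random_simplex_point(rng, d), random_simplex_point(rng, d)
        base = hilbert_metric(x, y)
        if base > 1e-3:  # skip near-identical pairs, where rounding dominates the ratio
            ratio = hilbert_metric(T_alpha(x, P, alpha), T_alpha(y, P, alpha)) / base
            worst = max(worst, ratio)
    return worst


for alpha in (0.6, 0.9, 1.2, 1.4):
    print(f"alpha={alpha}: observed {worst_ratio(alpha):.3f}, gamma={2 * abs(1 - alpha):.3f}")
```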

Since $\gamma<1$ exactly when $\alpha\in(1/2,1)\cup(1,3/2)$, and the minimizer $x^\star$ is a fixed point of $T_\alpha$ by Lemma 6, linear convergence of Augustin’s fixed-point iteration for this range of $\alpha$ immediately follows.

Corollary 8.

For any $\alpha \in (1/2,1) \cup (1,3/2)$, let $x^\star$ be the minimizer of the optimization problem (1) and $\{x_t\}$ be the iterates of Augustin’s fixed-point iteration (2). We have

$$
d_{\mathrm{H}}(x_{t+1}, x^\star) \le \gamma^t \cdot d_{\mathrm{H}}(x_1, x^\star),
$$

for all $t \in \mathbb{N}$, where $\gamma < 1$ is defined in Theorem 7.
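Corollary 8 can be illustrated by running the iteration $x_{t+1} = T_\alpha(x_t)$ and comparing $d_{\mathrm{H}}(x_{t+1}, x^\star)$ with $\gamma^t d_{\mathrm{H}}(x_1, x^\star)$. This is a sketch under the same assumed finite-support form of $T_\alpha$ as in the previous example; since $x^\star$ has no closed form, the code uses the iterate after many steps as a stand-in for it, and all names and the random instance are hypothetical.

```python
import math
import random


def hilbert_metric(x, y):
    """Hilbert's projective metric between two strictly positive vectors."""
    ratios = [a / b for a, b in zip(x, y)]
    return math.log(max(ratios)) - math.log(min(ratios))


def normalize(v):
    total = sum(v)
    return [vi / total for vi in v]


def augustin_map(x, ps, ws, alpha):
    """Assumed T_alpha(x) = sum_k w_k * normalize(p_k^alpha * x^(1 - alpha))."""
    out = [0.0] * len(x)
    for p, w in zip(ps, ws):
        t = normalize([pi ** alpha * xi ** (1 - alpha) for pi, xi in zip(p, x)])
        out = [o + w * ti for o, ti in zip(out, t)]
    return out


rng = random.Random(1)
d, k, alpha = 5, 4, 0.8
ps = [normalize([rng.random() + 1e-3 for _ in range(d)]) for _ in range(k)]
ws = normalize([rng.random() + 1e-3 for _ in range(k)])
gamma = 2 * abs(1 - alpha)  # 0.4 < 1 for this alpha

# Stand-in for the minimizer x_star: the iterate after many steps.
x_star = [1.0 / d] * d
for _ in range(500):
    x_star = augustin_map(x_star, ps, ws, alpha)

# errors[t] = d_H(x_{t+1}, x_star) for t = 0, ..., 30, starting from a random x_1.
x = normalize([rng.random() + 1e-3 for _ in range(d)])
d1 = hilbert_metric(x, x_star)
errors = [d1]
for _ in range(30):
    x = augustin_map(x, ps, ws, alpha)
    errors.append(hilbert_metric(x, x_star))

for t in (0, 5, 10, 20):
    print(f"t={t:2d}  d_H(x_(t+1), x*) = {errors[t]:.2e}  bound = {gamma ** t * d1:.2e}")
```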

Remark 9.

Corollary 8 is meaningful only when $d_{\mathrm{H}}(x_1, x^\star) < \infty$. Lemma 13 of Nakiboğlu [2019] shows that $\mathbb{E}_p[p] \in \operatorname{ri} \Delta([d])$ implies $x^\star \in \operatorname{ri} \Delta([d])$. In this case, it suffices to choose $x_1 \in \operatorname{ri} \Delta([d])$ to ensure that $d_{\mathrm{H}}(x_1, x^\star) < \infty$.
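The role of the relative interior in Remark 9 can be seen on a small example: Hilbert’s projective metric is finite only between vectors with the same support. The sketch below uses the common convention of ignoring coordinates where both vectors vanish and returning infinity when the supports differ; the function name and the test vectors are illustrative.

```python
import math


def hilbert_metric(x, y):
    """Hilbert's projective metric on the nonnegative orthant.

    Coordinates where both vectors vanish are skipped; if exactly one of
    them vanishes somewhere, the supports differ and the distance is infinite.
    """
    ratios = []
    for a, b in zip(x, y):
        if a == 0 and b == 0:
            continue
        if a == 0 or b == 0:
            return math.inf
        ratios.append(a / b)
    return math.log(max(ratios)) - math.log(min(ratios))


x1 = [1 / 3, 1 / 3, 1 / 3]   # a point of ri Delta([3])
boundary = [1 / 2, 1 / 2, 0.0]  # a point on the boundary of Delta([3])
interior = [0.2, 0.3, 0.5]   # another point of ri Delta([3])

print(hilbert_metric(x1, boundary))            # inf
print(round(hilbert_metric(x1, interior), 4))  # 0.9163, i.e. log(2.5)
```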

4.3 Proof of Theorem 7

The proof consists of two steps, reflected in the following two lemmas. First, we show that each operator $T_{\alpha,p}$ is Lipschitz with respect to Hilbert’s projective metric and bound its Lipschitz constant. Then, since $T_\alpha(\cdot) = \mathbb{E}_p[T_{\alpha,p}(\cdot)]$, we prove a general lemma, of independent interest, that bounds Hilbert’s projective metric between the expectations of two random probability vectors in terms of Hilbert’s projective metric between their realizations. The proofs of both lemmas are deferred to Appendix A.

Lemma 10.

For any $\alpha \in [0,1) \cup (1,\infty)$ and $p \in \Delta([d])$,

$$
d_{\mathrm{H}}(T_{\alpha,p}(x), T_{\alpha,p}(y)) \le |1-\alpha| \, d_{\mathrm{H}}(x, y), \quad \forall x, y \in \Delta([d]).
$$
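When $p$ is strictly positive, both steps in the proof of Lemma 10 (Appendix A) hold with equality, so the constant $|1-\alpha|$ is attained for the single-outcome operator. The sketch below checks this numerically, assuming $T_{\alpha,p}(x) = p^\alpha \odot x^{1-\alpha} / \lVert p^\alpha \odot x^{1-\alpha} \rVert_1$; the function name `t_alpha_p` and the random vectors are hypothetical.

```python
import math
import random


def hilbert_metric(x, y):
    """Hilbert's projective metric between two strictly positive vectors."""
    ratios = [a / b for a, b in zip(x, y)]
    return math.log(max(ratios)) - math.log(min(ratios))


def normalize(v):
    total = sum(v)
    return [vi / total for vi in v]


def t_alpha_p(x, p, alpha):
    """Assumed single-outcome operator T_{alpha,p}(x) = normalize(p^alpha * x^(1 - alpha))."""
    return normalize([pi ** alpha * xi ** (1 - alpha) for pi, xi in zip(p, x)])


rng = random.Random(2)
d = 6
results = []  # (alpha, d_H(T(x), T(y)), |1 - alpha| * d_H(x, y))
for alpha in (0.0, 0.4, 0.9, 1.5, 3.0):
    p, x, y = (normalize([rng.random() + 1e-3 for _ in range(d)]) for _ in range(3))
    lhs = hilbert_metric(t_alpha_p(x, p, alpha), t_alpha_p(y, p, alpha))
    rhs = abs(1 - alpha) * hilbert_metric(x, y)
    results.append((alpha, lhs, rhs))
    print(f"alpha={alpha}: lhs={lhs:.6f}, rhs={rhs:.6f}")
```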
Lemma 11.

Let $X, Y \colon \Omega \to \Delta([d])$ be two random probability vectors, where $\Omega$ denotes the sample space. We have

$$
d_{\mathrm{H}}(\mathbb{E}[X], \mathbb{E}[Y]) \le 2 \sup_{\omega \in \Omega} d_{\mathrm{H}}(X(\omega), Y(\omega)).
$$

Theorem 7 follows immediately. Applying Lemma 11 with $X = T_{\alpha,p}(x)$ and $Y = T_{\alpha,p}(y)$, viewed as functions of the random vector $p$, we obtain

$$
d_{\mathrm{H}}(T_\alpha(x), T_\alpha(y)) \le 2 \sup_{p \in \Delta([d])} d_{\mathrm{H}}(T_{\alpha,p}(x), T_{\alpha,p}(y)).
$$

Then, by Lemma 10, we obtain

$$
d_{\mathrm{H}}(T_\alpha(x), T_\alpha(y)) \le 2 \sup_{p \in \Delta([d])} |1-\alpha| \, d_{\mathrm{H}}(x, y) = 2|1-\alpha| \, d_{\mathrm{H}}(x, y).
$$

This completes the proof.
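Lemma 11, used in the proof above, can be sanity-checked on a finite sample space. The sketch assumes $\Omega = \{1, \dots, k\}$ with probabilities `weights`, and represents $X$ and $Y$ by their $k$ realizations; it counts how many random instances violate the factor-2 bound. All names are hypothetical.

```python
import math
import random


def hilbert_metric(x, y):
    """Hilbert's projective metric between two strictly positive vectors."""
    ratios = [a / b for a, b in zip(x, y)]
    return math.log(max(ratios)) - math.log(min(ratios))


def normalize(v):
    total = sum(v)
    return [vi / total for vi in v]


def expectation(samples, weights):
    """E[X] for a random vector equal to samples[j] with probability weights[j]."""
    return [sum(w * s[i] for s, w in zip(samples, weights)) for i in range(len(samples[0]))]


rng = random.Random(3)
d, k = 4, 5
violations = 0
for _ in range(500):
    weights = normalize([rng.random() + 1e-3 for _ in range(k)])
    X = [normalize([rng.random() + 1e-3 for _ in range(d)]) for _ in range(k)]
    Y = [normalize([rng.random() + 1e-3 for _ in range(d)]) for _ in range(k)]
    lhs = hilbert_metric(expectation(X, weights), expectation(Y, weights))
    rhs = 2 * max(hilbert_metric(x, y) for x, y in zip(X, Y))
    if lhs > rhs + 1e-12:
        violations += 1

print("random instances violating Lemma 11:", violations)
```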

5 Linear Rate of Kamatsuka et al.’s Algorithm

In this section, we show that Kamatsuka et al.’s algorithm converges linearly with respect to Hilbert’s projective metric for computing the Rényi information measure of order $\alpha \in [1/2,1) \cup (1,\infty)$.

For convenience, we view any $p \in \Delta([m] \times [n])$ as a matrix in $\mathbb{R}_+^{m \times n}$ whose entries sum to $1$. We denote Hilbert’s projective metric on both $\mathbb{R}_{++}^m$ and $\mathbb{R}_{++}^n$ by $d_{\mathrm{H}}$; the associated cone should be clear from the context.

5.1 Kamatsuka et al.’s Algorithm

Define the following two operators:

$$
\begin{aligned}
U_\alpha(y) &:= \frac{\left(p^\alpha y^{1-\alpha}\right)^{1/\alpha}}{\left\lVert \left(p^\alpha y^{1-\alpha}\right)^{1/\alpha} \right\rVert_1}, \\
V_\alpha(x) &:= \frac{\left((p^\alpha)^\top x^{1-\alpha}\right)^{1/\alpha}}{\left\lVert \left((p^\alpha)^\top x^{1-\alpha}\right)^{1/\alpha} \right\rVert_1}.
\end{aligned} \tag{9}
$$

Kamatsuka et al.’s algorithm (4) can be equivalently written as follows:

  1. Initialize $x_1 \in \Delta([m])$, and compute $y_1 = V_\alpha(x_1)$.

  2. For all $t \in \mathbb{N}$, compute $x_{t+1} = U_\alpha(y_t)$ and $y_{t+1} = V_\alpha(x_{t+1})$.
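The two steps above, together with the operators in (9), can be sketched in plain Python. This is an illustrative sketch, not the authors’ implementation: it assumes $p$ is given as an $m \times n$ list of rows with strictly positive entries, and the names `u_alpha`, `v_alpha`, and `kamatsuka_iterates` are hypothetical. Powers are taken entrywise and $p^\alpha y^{1-\alpha}$ is a matrix-vector product, as in (9).

```python
import random


def normalize(v):
    total = sum(v)
    return [vi / total for vi in v]


def u_alpha(y, p, alpha):
    """U_alpha(y) from (9): normalize((p^alpha y^(1-alpha))^(1/alpha)); returns a point of Delta([m])."""
    return normalize([
        sum(pij ** alpha * yj ** (1 - alpha) for pij, yj in zip(row, y)) ** (1 / alpha)
        for row in p
    ])


def v_alpha(x, p, alpha):
    """V_alpha(x) from (9): normalize(((p^alpha)^T x^(1-alpha))^(1/alpha)); returns a point of Delta([n])."""
    return normalize([
        sum(pij ** alpha * xi ** (1 - alpha) for pij, xi in zip(col, x)) ** (1 / alpha)
        for col in zip(*p)  # col = j-th column of p
    ])


def kamatsuka_iterates(p, alpha, x1, num_steps):
    """Step 1: y_1 = V(x_1). Step 2: x_{t+1} = U(y_t), y_{t+1} = V(x_{t+1})."""
    x, y = x1, v_alpha(x1, p, alpha)
    history = [(x, y)]
    for _ in range(num_steps):
        x = u_alpha(y, p, alpha)
        y = v_alpha(x, p, alpha)
        history.append((x, y))
    return history


rng = random.Random(4)
m, n, alpha = 3, 4, 2.0
raw = [[rng.random() + 1e-3 for _ in range(n)] for _ in range(m)]
total = sum(map(sum, raw))
p = [[v / total for v in row] for row in raw]  # a joint pmf on [m] x [n]

history = kamatsuka_iterates(p, alpha, [1.0 / m] * m, 200)
x_last, y_last = history[-1]
print([round(v, 6) for v in x_last], [round(v, 6) for v in y_last])
```

After enough steps, the last pair approximately satisfies the fixed-point relations of Lemma 12.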

This algorithm is inspired by the following lemma [Lapidoth and Pfister, 2019, Lemma 16].

Lemma 12.

For $\alpha \in [1/2,1) \cup (1,\infty)$, every minimizer $(x^\star, y^\star)$ of the optimization problem (3) satisfies $x^\star = U_\alpha(y^\star)$ and $y^\star = V_\alpha(x^\star)$.

5.2 Linear Rate Guarantee

The following theorem presents a key observation: the operators $U_\alpha$ and $V_\alpha$ are Lipschitz with constant $|1-\alpha^{-1}|$ with respect to Hilbert’s projective metric. Its proof is postponed to the next subsection.

Theorem 13.

For $\alpha \in [1/2,1) \cup (1,\infty)$, we have

$$
\begin{aligned}
d_{\mathrm{H}}(V_\alpha(x), V_\alpha(x')) &\le \gamma' \cdot d_{\mathrm{H}}(x, x'), &\quad& \forall x, x' \in \Delta([m]), \\
d_{\mathrm{H}}(U_\alpha(y), U_\alpha(y')) &\le \gamma' \cdot d_{\mathrm{H}}(y, y'), &\quad& \forall y, y' \in \Delta([n]),
\end{aligned} \tag{10}
$$

where $\gamma' := |1 - (1/\alpha)|$. Moreover, if $p \in \mathbb{R}_{++}^{m \times n}$, then the Lipschitz constant $\gamma'$ can be improved to

$$
\gamma'' := \left|1 - \frac{1}{\alpha}\right| \lambda(p^\alpha) < \gamma',
$$

where $\lambda(\cdot)$ is defined in Theorem 5.
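Both constants in Theorem 13 can be compared with observed ratios for $V_\alpha$. The sketch below assumes that $\lambda(\cdot)$ in Theorem 5 is Birkhoff’s contraction coefficient $\tanh(\Delta(A)/4)$, where $\Delta(A)$ is the largest Hilbert distance between two columns of the positive matrix $A$; this form is an assumption, since Theorem 5 is not reproduced here. The helper names and the random strictly positive $p$ are hypothetical.

```python
import math
import random


def hilbert_metric(x, y):
    """Hilbert's projective metric between two strictly positive vectors."""
    ratios = [a / b for a, b in zip(x, y)]
    return math.log(max(ratios)) - math.log(min(ratios))


def normalize(v):
    total = sum(v)
    return [vi / total for vi in v]


def v_alpha(x, p, alpha):
    """V_alpha(x) from (9), with p given as an m x n list of rows."""
    return normalize([
        sum(pij ** alpha * xi ** (1 - alpha) for pij, xi in zip(col, x)) ** (1 / alpha)
        for col in zip(*p)
    ])


def birkhoff_coefficient(a):
    """Assumed lambda(a) = tanh(Delta(a) / 4), Delta(a) = max d_H between two columns of a."""
    cols = list(zip(*a))
    diameter = max(hilbert_metric(c1, c2) for c1 in cols for c2 in cols)
    return math.tanh(diameter / 4)


rng = random.Random(5)
m, n, alpha = 4, 3, 0.7
raw = [[rng.random() + 1e-2 for _ in range(n)] for _ in range(m)]
total = sum(map(sum, raw))
p = [[v / total for v in row] for row in raw]       # strictly positive joint pmf
p_alpha = [[v ** alpha for v in row] for row in p]
p_alpha_t = [list(col) for col in zip(*p_alpha)]    # the matrix (p^alpha)^T applied by V_alpha

gamma1 = abs(1 - 1 / alpha)                         # gamma'
gamma2 = gamma1 * birkhoff_coefficient(p_alpha_t)   # gamma''

worst = 0.0
for _ in range(300):
    x = normalize([rng.random() + 1e-3 for _ in range(m)])
    xp = normalize([rng.random() + 1e-3 for _ in range(m)])
    ratio = hilbert_metric(v_alpha(x, p, alpha), v_alpha(xp, p, alpha)) / hilbert_metric(x, xp)
    worst = max(worst, ratio)

print(f"gamma' = {gamma1:.3f}, gamma'' = {gamma2:.3f}, worst observed ratio = {worst:.3f}")
```

Under this definition, the coefficient computed from $p^\alpha$ and from its transpose coincide, matching the identity used at the end of the proof in Section 5.3.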

Theorem 13 implies the following corollary, showing that the iterative algorithm converges linearly. The proof of the corollary is deferred to Appendix A.

Corollary 14.

Let $(x^\star, y^\star)$ be a minimizer of the optimization problem (3) and $\{(x_t, y_t)\}$ be the iterates of the iterative algorithm (4).

  (i) If $\alpha \in (1/2,1) \cup (1,\infty)$, then

    $$
    \begin{aligned}
    d_{\mathrm{H}}(x_{t+1}, x^\star) &\le (\gamma')^{2t} \cdot d_{\mathrm{H}}(x_1, x^\star), \\
    d_{\mathrm{H}}(y_{t+1}, y^\star) &\le (\gamma')^{2t+1} \cdot d_{\mathrm{H}}(x_1, x^\star),
    \end{aligned}
    $$

    for all $t \in \mathbb{N}$, where $\gamma' < 1$ is defined in Theorem 13.

  (ii) If $\alpha \in [1/2,1) \cup (1,\infty)$ and $p \in \mathbb{R}_{++}^{m \times n}$, then

    $$
    \begin{aligned}
    d_{\mathrm{H}}(x_{t+1}, x^\star) &\le (\gamma'')^{2t} \cdot d_{\mathrm{H}}(x_1, x^\star), \\
    d_{\mathrm{H}}(y_{t+1}, y^\star) &\le (\gamma'')^{2t+1} \cdot d_{\mathrm{H}}(x_1, x^\star),
    \end{aligned}
    $$

    for all $t \in \mathbb{N}$, where $\gamma'' < 1$ is defined in Theorem 13.
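Part (i) of Corollary 14 can be illustrated by tracking both coordinates of the iteration. This sketch assumes a strictly positive $p$ stored as an $m \times n$ list of rows, and uses the fixed point of $x \mapsto U_\alpha(V_\alpha(x))$, reached after many steps, as a stand-in for $(x^\star, y^\star)$; all names are hypothetical.

```python
import math
import random


def hilbert_metric(x, y):
    """Hilbert's projective metric between two strictly positive vectors."""
    ratios = [a / b for a, b in zip(x, y)]
    return math.log(max(ratios)) - math.log(min(ratios))


def normalize(v):
    total = sum(v)
    return [vi / total for vi in v]


def u_alpha(y, p, alpha):
    """U_alpha from (9), powers entrywise."""
    return normalize([sum(pij ** alpha * yj ** (1 - alpha) for pij, yj in zip(row, y)) ** (1 / alpha)
                      for row in p])


def v_alpha(x, p, alpha):
    """V_alpha from (9), powers entrywise."""
    return normalize([sum(pij ** alpha * xi ** (1 - alpha) for pij, xi in zip(col, x)) ** (1 / alpha)
                      for col in zip(*p)])


rng = random.Random(7)
m, n, alpha = 3, 5, 1.5
raw = [[rng.random() + 1e-3 for _ in range(n)] for _ in range(m)]
total = sum(map(sum, raw))
p = [[v / total for v in row] for row in raw]
gamma1 = abs(1 - 1 / alpha)  # gamma' = 1/3 here

# Stand-in for (x_star, y_star): the fixed point of x -> U(V(x)) after many steps.
x_star = [1.0 / m] * m
for _ in range(300):
    x_star = u_alpha(v_alpha(x_star, p, alpha), p, alpha)
y_star = v_alpha(x_star, p, alpha)

x1 = normalize([rng.random() + 1e-3 for _ in range(m)])
d1 = hilbert_metric(x1, x_star)
x, y = x1, v_alpha(x1, p, alpha)  # (x_1, y_1)
records = []                      # (t, d_H(x_{t+1}, x*), d_H(y_{t+1}, y*))
for t in range(1, 12):
    x = u_alpha(y, p, alpha)      # x_{t+1} = U(y_t)
    y = v_alpha(x, p, alpha)      # y_{t+1} = V(x_{t+1})
    records.append((t, hilbert_metric(x, x_star), hilbert_metric(y, y_star)))

for t, ex, ey in records[:4]:
    print(f"t={t}: {ex:.2e} <= {gamma1 ** (2 * t) * d1:.2e}, {ey:.2e} <= {gamma1 ** (2 * t + 1) * d1:.2e}")
```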

Remark 15.

Corollary 14 is meaningful only when $d_{\mathrm{H}}(x_1, x^\star) < \infty$. For $\alpha \in [1/2,1) \cup (1,\infty)$, Lemma 17 in Appendix A ensures that $x^\star \in \operatorname{ri} \Delta([m])$ whenever $p \in \mathbb{R}_{++}^{m \times n}$. In this case, it suffices to choose $x_1 \in \operatorname{ri} \Delta([m])$ to ensure $d_{\mathrm{H}}(x_1, x^\star) < \infty$.

5.3 Proof of Theorem 13

By Lemma 2 and Lemma 4 (i), we have

$$
\begin{aligned}
d_{\mathrm{H}}(V_\alpha(x), V_\alpha(x')) &= d_{\mathrm{H}}\left( \left((p^\alpha)^\top x^{1-\alpha}\right)^{1/\alpha}, \left((p^\alpha)^\top (x')^{1-\alpha}\right)^{1/\alpha} \right) \\
&\le \alpha^{-1} d_{\mathrm{H}}\left( (p^\alpha)^\top x^{1-\alpha}, (p^\alpha)^\top (x')^{1-\alpha} \right).
\end{aligned}
$$

By Lemma 4 (iii) and (i), we further have $d_{\mathrm{H}}((p^\alpha)^\top x^{1-\alpha}, (p^\alpha)^\top (x')^{1-\alpha}) \le d_{\mathrm{H}}(x^{1-\alpha}, (x')^{1-\alpha}) \le |1-\alpha| \, d_{\mathrm{H}}(x, x')$. Since $\alpha^{-1} |1-\alpha| = |1-\alpha^{-1}|$ for $\alpha > 0$, we obtain

$$
d_{\mathrm{H}}(V_\alpha(x), V_\alpha(x')) \le |1-\alpha^{-1}| \, d_{\mathrm{H}}(x, x').
$$

This proves the first inequality in (10). The second inequality follows from a similar argument.

Assume $p \in \mathbb{R}_{++}^{m \times n}$. We can apply Birkhoff’s contraction theorem (Theorem 5) instead of Lemma 4 (iii) to obtain

$$
d_{\mathrm{H}}(V_\alpha(x), V_\alpha(x')) \le |1-\alpha^{-1}| \, \lambda\big((p^\alpha)^\top\big) \cdot d_{\mathrm{H}}(x, x').
$$

The theorem follows by noticing that $\lambda\big((p^\alpha)^\top\big) = \lambda(p^\alpha) < 1$.

6 Discussions

We have proved that Augustin’s fixed-point iteration converges at a linear rate for computing the Augustin information of order $\alpha \in (1/2,1) \cup (1,3/2)$, and that Kamatsuka et al.’s algorithm converges at a linear rate for computing the Rényi information measure of order $\alpha \in [1/2,1) \cup (1,\infty)$. In contrast, existing results are asymptotic and apply to a narrower range of $\alpha$. Our proofs are short, which illustrates the benefit of analyzing these iterations in a suitable geometry, namely Hilbert’s projective metric.

Preliminary numerical experiments indicate that Augustin’s fixed-point iteration may converge linearly for $\alpha \in (0,1) \cup (1,2)$, a range broader than the one we have established. A natural direction for future work is to extend the range of $\alpha$ for which linear convergence can be proved.

Acknowledgements

We thank Marco Tomamichel and Roberto Rubboli for discussions. C.-E. Tsai, G.-R. Wang, and Y.-H. Li are supported by the Young Scholar Fellowship (Einstein Program) of the National Science and Technology Council of Taiwan under grant number NSTC 112-2636-E-002-003, by the 2030 Cross-Generation Young Scholars Program (Excellent Young Scholars) of the National Science and Technology Council of Taiwan under grant number NSTC 112-2628-E-002-019-MY3, by the research project “Pioneering Research in Forefront Quantum Computing, Learning and Engineering” of National Taiwan University under grant numbers NTU-CC-112L893406 and NTU-CC-113L891606, and by the Academic Research-Career Development Project (Laurel Research Project) of National Taiwan University under grant numbers NTU-CDP-112L7786 and NTU-CDP-113L7763.

H.-C. Cheng is supported by the Young Scholar Fellowship (Einstein Program) of the National Science and Technology Council, Taiwan (R.O.C.) under Grants No. NSTC 112-2636-E-002-009, No. NSTC 113-2119-M-007-006, No. NSTC 113-2119-M-001-006, No. NSTC 113-2124-M-002-003, and No. NSTC 113-2628-E-002-029, by the Yushan Young Scholar Program of the Ministry of Education, Taiwan (R.O.C.) under Grant No. NTU-112V1904-4, and by the research project “Pioneering Research in Forefront Quantum Computing, Learning and Engineering” of National Taiwan University under Grants No. NTU-CC-112L893405 and NTU-CC-113L891605. H.-C. Cheng acknowledges support from the “Center for Advanced Computing and Imaging in Biomedicine (NTU-113L900702)” through The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan.

References

  • Arimoto [1973] S. Arimoto. On the converse to the coding theorem for discrete memoryless channels (corresp.). IEEE Trans. Inf. Theory, 19(3):357–359, 1973.
  • Augustin [1978] U. Augustin. Noisy channels. Habilitation Thesis, Universität Erlangen-Nürnberg, 1978.
  • Birkhoff [1957] G. Birkhoff. Extensions of Jentzsch’s theorem. Trans. Amer. Math. Soc., 85(1):219–227, 1957.
  • Cheng and Nakiboğlu [2021] H.-C. Cheng and B. Nakiboğlu. On the existence of the Augustin means. In 2021 IEEE Information Theory Workshop (ITW), 2021.
  • Cover [1984] T. M. Cover. An algorithm for maximizing expected log investment return. IEEE Trans. Inf. Theory, 30(2):369–373, 1984.
  • Csiszár [1995] I. Csiszár. Generalized cutoff rates and Rényi’s information measures. IEEE Trans. Inf. Theory, 41(1):26–34, 1995.
  • Csiszár and Körner [2011] I. Csiszár and J. Körner. Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge University Press, 2nd edition, 2011.
  • Iusem [1992] A. N. Iusem. A short convergence proof of the EM algorithm for a specific Poisson model. Braz. J. Probab. Stat., pages 57–67, 1992.
  • Kamatsuka et al. [2024] A. Kamatsuka, K. Kazama, and T. Yoshida. Algorithms for computing the Augustin–Csiszár mutual information and Lapidoth–Pfister mutual information. arXiv preprint arXiv:2404.10950, 2024.
  • Karakos et al. [2008] D. Karakos, S. Khudanpur, and C. E. Priebe. Computation of Csiszár’s mutual information of order α. In IEEE Int. Symp. Information Theory, pages 2106–2110, 2008.
  • Krause [2015] U. Krause. Positive Dynamical Systems in Discrete Time: Theory, Models, and Applications. De Gruyter, 2015.
  • Lapidoth and Pfister [2018] A. Lapidoth and C. Pfister. Testing against independence and a Rényi information measure. In IEEE Information Theory Workshop (ITW), pages 1–5, 2018.
  • Lapidoth and Pfister [2019] A. Lapidoth and C. Pfister. Two measures of dependence. Entropy, 21(8), 2019.
  • Lemmens and Nussbaum [2012] B. Lemmens and R. Nussbaum. Nonlinear Perron–Frobenius Theory. Cambridge University Press, 2012.
  • Li et al. [2018] Y.-H. Li, C. A. Riofrío, and V. Cevher. A general convergence result for mirror descent with Armijo line search. arXiv preprint arXiv:1805.12232, 2018.
  • Lin et al. [2021] C.-M. Lin, H.-C. Cheng, and Y.-H. Li. Maximum-likelihood quantum state tomography by Cover’s method with non-asymptotic analysis. arXiv preprint arXiv:2110.00747, 2021.
  • Lucy [1974] L. B. Lucy. An iterative technique for the rectification of observed distributions. Astron. J., 79:745, 1974.
  • Nakiboğlu [2019] B. Nakiboğlu. The Augustin capacity and center. Probl. Inf. Transm., 55(4):299–342, 2019.
  • Richardson [1972] W. H. Richardson. Bayesian-based iterative method of image restoration. J. Opt. Soc. Am., 62(1):55–59, Jan 1972.
  • Shepp and Vardi [1982] L. A. Shepp and Y. Vardi. Maximum likelihood reconstruction for emission tomography. IEEE Trans. Med. Imaging, 1(2):113–122, 1982.
  • Tomamichel and Hayashi [2018] M. Tomamichel and M. Hayashi. Operational interpretation of Rényi information measures via composite hypothesis testing against product and Markov distributions. IEEE Trans. Inf. Theory, 64(2):1064–1082, 2018.
  • Vardi and Lee [1993] Y. Vardi and D. Lee. From image deblurring to optimal investments: Maximum likelihood solutions for positive linear inverse problems. J. R. Stat. Soc. Series B Stat. Methodol., 55(3):569–598, 1993.
  • Wang et al. [2024] G.-R. Wang, C.-E. Tsai, H.-C. Cheng, and Y.-H. Li. Computing Augustin information via hybrid geodesically convex optimization. In IEEE Int. Symp. Information Theory, 2024.
  • You et al. [2022] J.-K. You, H.-C. Cheng, and Y.-H. Li. Minimizing quantum Rényi divergences via mirror descent with Polyak step size. In IEEE Int. Symp. Information Theory, pages 252–257, 2022.

Appendix A Omitted Proofs

A.1 Proof of Lemma 10

By Lemma 2, we have

$$ d_{\mathrm{H}}(T_{\alpha,p}(x),T_{\alpha,p}(y))=d_{\mathrm{H}}(p^{\alpha}\odot x^{1-\alpha},p^{\alpha}\odot y^{1-\alpha}). $$

Applying Lemma 4 (i) and (ii) gives

$$ d_{\mathrm{H}}(T_{\alpha,p}(x),T_{\alpha,p}(y))\leq d_{\mathrm{H}}(x^{1-\alpha},y^{1-\alpha})\leq\lvert 1-\alpha\rvert\, d_{\mathrm{H}}(x,y). $$

This proves the lemma.
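The contraction factor $\lvert 1-\alpha\rvert$ can be checked numerically. The sketch below assumes the standard form $d_{\mathrm{H}}(u,v)=\log M(u/v)+\log M(v/u)$ with $M(u/v)=\max_i u_i/v_i$ for entry-wise positive vectors; the helper names are hypothetical, and the check covers only the map $x\mapsto p^{\alpha}\odot x^{1-\alpha}$ on the right-hand side of the identity from Lemma 2, not $T_{\alpha,p}$ itself.

```python
import math
import random

def hilbert_metric(u, v):
    """Hilbert's projective metric log M(u/v) + log M(v/u) for positive vectors."""
    ratios = [ui / vi for ui, vi in zip(u, v)]
    return math.log(max(ratios)) + math.log(max(1.0 / r for r in ratios))

def scaled_power(p, x, alpha):
    """Entry-wise map x -> p^alpha (.) x^(1 - alpha) for a positive vector p (hypothetical helper)."""
    return [pi ** alpha * xi ** (1.0 - alpha) for pi, xi in zip(p, x)]

def random_positive(d, rng):
    """A random entry-wise positive vector of length d."""
    return [rng.uniform(0.1, 1.0) for _ in range(d)]

rng = random.Random(0)
p, x, y = (random_positive(4, rng) for _ in range(3))
for alpha in (0.6, 0.9, 1.2, 1.4):
    lhs = hilbert_metric(scaled_power(p, x, alpha), scaled_power(p, y, alpha))
    rhs = abs(1.0 - alpha) * hilbert_metric(x, y)
    # The positive scaling by p^alpha cancels in every ratio, and the power
    # 1 - alpha multiplies every log-ratio, so lhs and rhs agree up to rounding.
    print(alpha, lhs, rhs)
```

For these entry-wise maps the two inequalities in the display above hold with equality, which is why the printed values coincide.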

A.2 Proof of Lemma 11

We will use the following lemma, whose proof is postponed to the next subsection.

Lemma 16.

Let $X,Y:\Omega\to\Delta([d])$ be two random probability vectors. We have

$$ M(\mathbb{E}[X]/\mathbb{E}[Y])\leq\sup_{\omega\in\Omega}M(X(\omega)/Y(\omega)). $$

By Lemma 16, we have

$$
\begin{aligned}
M(\mathbb{E}[X]/\mathbb{E}[Y]) &\leq \sup_{\omega\in\Omega} M(X(\omega)/Y(\omega)),\\
M(\mathbb{E}[Y]/\mathbb{E}[X]) &\leq \sup_{\omega\in\Omega} M(Y(\omega)/X(\omega)).
\end{aligned}
$$

Then, since $d_{\mathrm{H}}(u,v)=\log M(u/v)+\log M(v/u)$, we obtain

$$ d_{\mathrm{H}}(\mathbb{E}[X],\mathbb{E}[Y])\leq\sup_{\omega\in\Omega}\log M(X(\omega)/Y(\omega))+\sup_{\omega\in\Omega}\log M(Y(\omega)/X(\omega)). $$

Since $X(\omega)$ and $Y(\omega)$ both lie in $\Delta([d])$, neither can be entry-wise strictly smaller than the other, so we have

$$ M(X(\omega)/Y(\omega))\geq 1\quad\text{and}\quad M(Y(\omega)/X(\omega))\geq 1. $$

This implies that

$$
\begin{aligned}
M(X(\omega)/Y(\omega)) &\leq M(X(\omega)/Y(\omega))\cdot M(Y(\omega)/X(\omega)),\\
M(Y(\omega)/X(\omega)) &\leq M(X(\omega)/Y(\omega))\cdot M(Y(\omega)/X(\omega)),
\end{aligned}
$$

and hence

$$
\begin{aligned}
d_{\mathrm{H}}(\mathbb{E}[X],\mathbb{E}[Y]) &\leq \sup_{\omega\in\Omega} d_{\mathrm{H}}(X(\omega),Y(\omega)) + \sup_{\omega\in\Omega} d_{\mathrm{H}}(X(\omega),Y(\omega))\\
&= 2\sup_{\omega\in\Omega} d_{\mathrm{H}}(X(\omega),Y(\omega)),
\end{aligned}
$$

which completes the proof.
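The factor-two bound just derived can be sanity-checked on a finite sample space. In this sketch, which is only an illustration under stated assumptions, $\Omega$ is assumed finite with a random probability measure, $X(\omega)$ and $Y(\omega)$ are random strictly positive probability vectors, and all helper names are hypothetical.

```python
import math
import random

def hilbert_metric(u, v):
    """d_H(u, v) = log max_i(u_i / v_i) - log min_i(u_i / v_i) for positive vectors."""
    ratios = [ui / vi for ui, vi in zip(u, v)]
    return math.log(max(ratios)) - math.log(min(ratios))

def random_simplex(d, rng):
    """A random probability vector with strictly positive entries."""
    w = [rng.uniform(0.05, 1.0) for _ in range(d)]
    s = sum(w)
    return [wi / s for wi in w]

def mixture(weights, vectors):
    """Expectation sum_k weights[k] * vectors[k] over a finite sample space."""
    d = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(d)]

rng = random.Random(1)
d, k = 5, 4                          # vector dimension and |Omega|
weights = random_simplex(k, rng)     # probability of each omega
X = [random_simplex(d, rng) for _ in range(k)]
Y = [random_simplex(d, rng) for _ in range(k)]

lhs = hilbert_metric(mixture(weights, X), mixture(weights, Y))
worst = max(hilbert_metric(xw, yw) for xw, yw in zip(X, Y))
print(lhs, 2 * worst)                # the proof guarantees lhs <= 2 * worst
```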

A.3 Proof of Lemma 16

Let $\overline{M}:=\sup_{\omega\in\Omega}M(X(\omega)/Y(\omega))$. We have

$$ \overline{M}\,\mathbb{E}[Y]=\mathbb{E}[\overline{M}Y]\geq\mathbb{E}[M(X/Y)Y]\geq\mathbb{E}[X]. $$

Since these inequalities hold entry-wise, the lemma follows from the definition of $M(\mathbb{E}[X]/\mathbb{E}[Y])$ as the smallest $\lambda$ satisfying $\mathbb{E}[X]\leq\lambda\,\mathbb{E}[Y]$.
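The chain of entry-wise inequalities in this proof can be traced numerically. The sketch assumes a finite $\Omega$ and computes $M(u/v)$ as $\max_i u_i/v_i$, which matches $\inf\{\lambda : u\leq\lambda v\}$ when $v$ is entry-wise positive; `max_ratio` and the other names are hypothetical.

```python
import random

def max_ratio(u, v):
    """M(u/v) = inf{lam : u <= lam * v entry-wise} = max_i u_i / v_i for v > 0."""
    return max(ui / vi for ui, vi in zip(u, v))

def random_simplex(d, rng):
    """A random probability vector with strictly positive entries."""
    w = [rng.uniform(0.05, 1.0) for _ in range(d)]
    s = sum(w)
    return [wi / s for wi in w]

rng = random.Random(2)
d, k = 4, 6
prob = random_simplex(k, rng)                   # P(omega) on a finite Omega
X = [random_simplex(d, rng) for _ in range(k)]
Y = [random_simplex(d, rng) for _ in range(k)]
EX = [sum(pw * x[i] for pw, x in zip(prob, X)) for i in range(d)]
EY = [sum(pw * y[i] for pw, y in zip(prob, Y)) for i in range(d)]

m_bar = max(max_ratio(x, y) for x, y in zip(X, Y))   # sup over omega
# The proof shows m_bar * E[Y] >= E[X] entry-wise, hence M(E[X]/E[Y]) <= m_bar.
print([m_bar * eyi - exi for eyi, exi in zip(EY, EX)])  # nonnegative up to rounding
print(max_ratio(EX, EY), m_bar)
```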

A.4 Proof of Corollary 14

For both (i) and (ii), by Lemma 12 and Theorem 13, we have

$$ d_{\mathrm{H}}(x_{t+1},x^{\star})=d_{\mathrm{H}}(U_{\alpha}(y_{t}),U_{\alpha}(y^{\star}))\leq\tilde{\gamma}\cdot d_{\mathrm{H}}(y_{t},y^{\star}), $$

and

$$ d_{\mathrm{H}}(y_{t+1},y^{\star})=d_{\mathrm{H}}(V_{\alpha}(x_{t+1}),V_{\alpha}(x^{\star}))\leq\tilde{\gamma}\cdot d_{\mathrm{H}}(x_{t+1},x^{\star}), $$

where $\tilde{\gamma}=\gamma'$ for (i) and $\tilde{\gamma}=\gamma''$ for (ii). The corollary follows by applying the above two inequalities alternately.
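For illustration, here is one way to unroll the two inequalities, assuming the iteration is started from some $y_0$ (the exact form of the corollary is the one stated in the main text). Chaining them gives, for every $t\geq 1$,

$$
d_{\mathrm{H}}(y_{t},y^{\star}) \leq \tilde{\gamma}\, d_{\mathrm{H}}(x_{t},x^{\star}) \leq \tilde{\gamma}^{2}\, d_{\mathrm{H}}(y_{t-1},y^{\star}),
$$

so by induction

$$
d_{\mathrm{H}}(y_{t},y^{\star}) \leq \tilde{\gamma}^{2t}\, d_{\mathrm{H}}(y_{0},y^{\star}),
\qquad
d_{\mathrm{H}}(x_{t+1},x^{\star}) \leq \tilde{\gamma}^{2t+1}\, d_{\mathrm{H}}(y_{0},y^{\star}),
$$

which is a linear (geometric) rate whenever $\tilde{\gamma}<1$.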

A.5 Lemma 17

We prove the following lemma.

Lemma 17.

For $\alpha\in[0,1)\cup(1,\infty)$, let $(x^{\star},y^{\star})$ be a minimizer of the optimization problem (3) and assume $p\in\mathbb{R}_{++}^{m\times n}$. Then $x^{\star}\in\operatorname{ri}\Delta([m])$ and $y^{\star}\in\operatorname{ri}\Delta([n])$.

By Lemma 12, we have

$$ x^{\star}=U_{\alpha}(y^{\star})=\frac{\left(p^{\alpha}(y^{\star})^{1-\alpha}\right)^{1/\alpha}}{\left\lVert\left(p^{\alpha}(y^{\star})^{1-\alpha}\right)^{1/\alpha}\right\rVert_{1}}. $$

Since $p\in\mathbb{R}_{++}^{m\times n}$ and $y^{\star}\in\Delta([n])$ has at least one strictly positive entry, the vector $p^{\alpha}(y^{\star})^{1-\alpha}$ is entry-wise strictly positive. This implies that $x^{\star}$ is entry-wise strictly positive, and hence $x^{\star}\in\operatorname{ri}\Delta([m])$. To show $y^{\star}\in\operatorname{ri}\Delta([n])$, consider the equation $y^{\star}=V_{\alpha}(x^{\star})$ and apply the same argument. This completes the proof.
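A small sketch of the map $U_{\alpha}$ displayed in this proof makes the positivity argument concrete. It assumes that powers act entry-wise and that $p^{\alpha}(y)^{1-\alpha}$ is the matrix-vector product of the $m\times n$ matrix $p^{\alpha}$ with the $n$-vector $y^{1-\alpha}$; the function name `u_alpha` and the example matrix are hypothetical.

```python
def u_alpha(p, y, alpha):
    """Sketch of U_alpha(y) = (p^alpha y^(1-alpha))^(1/alpha) / ||.||_1.

    Assumes p is an m-by-n list of lists with strictly positive entries,
    powers are entry-wise, and p^alpha y^(1-alpha) is a matrix-vector product.
    """
    w = [yj ** (1.0 - alpha) for yj in y]                              # y^(1-alpha)
    v = [sum(pij ** alpha * wj for pij, wj in zip(row, w)) for row in p]
    z = [vi ** (1.0 / alpha) for vi in v]
    total = sum(z)
    return [zi / total for zi in z]                                    # normalize to Delta([m])

p = [[0.20, 0.10, 0.30],
     [0.05, 0.25, 0.10]]
y_vertex = [1.0, 0.0, 0.0]   # a vertex of Delta([3]), outside its relative interior
x = u_alpha(p, y_vertex, 0.7)
print(x)                      # every entry is strictly positive
```

Even though `y_vertex` lies on the boundary, each row of `p` meets its positive entry, so the output lies in the relative interior. A boundary point is used only with $\alpha\in(0,1)$ here, since for $\alpha>1$ a zero entry of $y$ would be raised to a negative power in this sketch.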