Improved Channel Coding Performance Through Cost Variability

Adeel Mahmood and Aaron B. Wagner
School of Electrical and Computer Engineering, Cornell University
Abstract

Channel coding for discrete memoryless channels (DMCs) with mean and variance cost constraints has been recently introduced. We show that there is an improvement in coding performance due to cost variability, both with and without feedback. We demonstrate this improvement over the traditional almost-sure cost constraint (also called the peak-power constraint) that prohibits any cost variation above a fixed threshold. Our result simultaneously shows that feedback does not improve the second-order coding rate of simple-dispersion DMCs under the peak-power constraint. This finding parallels similar results for unconstrained simple-dispersion DMCs, additive white Gaussian noise (AWGN) channels and parallel Gaussian channels.

Index Terms:
Channel coding, feedback communications, second-order coding rate, stochastic control.

I Introduction

Channel coding is a fundamental problem focused on the reliable transmission of information over a noisy channel. Information transmission with arbitrarily small error probability is possible at all rates below the capacity $C$ of the channel, if the number $n$ of channel uses (also called the blocklength) is permitted to grow without bound [1]. At finite blocklengths, there is an unavoidable backoff from capacity due to the random nature of the channel. The second-order coding rate (SOCR) ([2, 3, 4, 5, 6]) quantifies the $O(n^{-1/2})$ convergence to the capacity.

In many practical scenarios, the channel input is subject to some cost constraints which limit the amount of resources that can be used for transmission. With a cost constraint present, the role of capacity is replaced by the capacity-cost function [1, Theorem 6.11]. One common form of the cost constraint is the almost-sure (a.s.) cost constraint ([3, 7]) which bounds the time-average cost of the channel input $X^n$ over all messages, realizations of any side randomness, channel noise (if there is feedback), etc.:

$$\frac{1}{n}\sum_{i=1}^{n} c(X_i) \leq \Gamma \quad \text{almost surely,} \tag{1}$$

where $c(\cdot)$ is the cost function. Under the almost-sure (a.s.) cost constraint, the optimal first-order coding rate is the capacity-cost function, the strong converse holds [1, Theorem 6.11], and the optimal SOCR is also known [3, Theorem 3].

The a.s. cost constraint is quite unforgiving, never allowing the cost to exceed the threshold under any circumstances. Our first result (Theorem 1) shows that the SOCR can be strictly improved by merely allowing the cost to fluctuate above the threshold in a manner consistent with a noise process, i.e., the fluctuations have a variance of $O(1/n)$. Our second result (Theorem 2) shows that the a.s. cost framework does not allow feedback improvement to SOCR for simple-dispersion DMCs (see Definition 1). This again contrasts with the scenario where random fluctuations with variance $O(1/n)$ above the threshold are allowed, as shown in [8, Theorem 3]. This highlights the important role cost variability plays in enabling feedback mechanisms to improve coding performance.

These findings raise the question of whether it is necessary to impose a constraint as stringent as (1). Cost constraints in communication systems are typically imposed to achieve goals such as operating circuitry in the linear regime, minimizing power consumption, and reducing interference with other terminals. It is worth noting that these goals do not always necessitate the use of the strict a.s. cost constraint. For example, the expected cost constraint is often used in wireless communication literature (see, e.g., [9, 10, 11, 12]) because it allows for a dynamic allocation of power based on the current channel state. The expected cost constraint bounds the cost averaged over time and the ensemble:

$$\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} c(X_i)\right] \leq \Gamma. \tag{2}$$

Yet, if the a.s. constraint is too strict, the expectation constraint is arguably too weak. The expectation constraint allows highly non-ergodic use of power, as shown in Section II-A, which is problematic both from the vantage points of operating circuitry in the linear regime and interference management.

The $O(1/n)$ variance allowance is a feature of a new cost formulation, referred to as mean and variance cost constraints in [8]. This formulation replaces (1) with the following conditions:

\begin{align}
\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} c(X_i)\right] &\leq \Gamma, \tag{3} \\
\text{Var}\left(\frac{1}{n}\sum_{i=1}^{n} c(X_i)\right) &\leq \frac{V}{n}. \tag{4}
\end{align}

The mean and variance cost constraints were introduced as a relaxed version of the a.s. cost constraint that permits a small amount of stochastic fluctuation above the threshold $\Gamma$ while providing an ergodicity guarantee. Consider a random channel codebook whose codewords satisfy (3) with equality. For a given input $x^n$, define an ergodicity metric $\mathcal{E}_m$ as

$$\mathcal{E}_m(x^n) := \frac{\max\left(\frac{1}{n}\sum_{i=1}^{n} c(x_i) - \Gamma,\, 0\right)}{\Gamma}. \tag{5}$$

The definition in (5) only penalizes cost variation above the threshold and normalizes by the mean cost $\Gamma$. Let $\alpha > 0$ be the desired ergodicity parameter. We say that a transmission $x^n$ is $\alpha$-ergodic if $\mathcal{E}_m(x^n) \leq \alpha$. Let $\beta$ be the desired uncertainty parameter. We say that a random codebook is $(\alpha, \beta)$-ergodic if $\mathbb{P}\left(\mathcal{E}_m(X^n) \leq \alpha\right) \geq 1 - \beta$, where $X^n$ is a random transmission from the codebook.
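
The metric in (5) and the $\alpha$-ergodicity test can be evaluated directly for a given input sequence. The following Python sketch is only an illustration: the function names, the binary alphabet, and the cost function that charges each symbol its own value are assumptions made for the example and do not appear in the paper.

```python
from typing import Callable, Sequence


def average_cost(x: Sequence[int], cost: Callable[[int], float]) -> float:
    """Time-average cost (1/n) * sum_i c(x_i) of a channel input x^n."""
    return sum(cost(symbol) for symbol in x) / len(x)


def ergodicity_metric(x: Sequence[int], cost: Callable[[int], float], gamma: float) -> float:
    """E_m(x^n) as in (5): relative overshoot above gamma, zero at or below it."""
    return max(average_cost(x, cost) - gamma, 0.0) / gamma


def is_alpha_ergodic(x: Sequence[int], cost: Callable[[int], float],
                     gamma: float, alpha: float) -> bool:
    """True if the transmission x^n satisfies E_m(x^n) <= alpha."""
    return ergodicity_metric(x, cost, gamma) <= alpha


def symbol_cost(a: int) -> float:
    """Hypothetical cost function: the cost of a symbol is its value."""
    return float(a)


# Hypothetical binary inputs of length n = 4 with threshold gamma = 0.5.
x_low = [0, 1, 0, 0]   # average cost 0.25, below the threshold
x_high = [1, 1, 1, 0]  # average cost 0.75, above the threshold
print(ergodicity_metric(x_low, symbol_cost, 0.5))
print(ergodicity_metric(x_high, symbol_cost, 0.5))
```

Only the second sequence incurs a positive penalty, which reflects the one-sided nature of the metric.
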

Under the mean and variance cost formulation, we have $\mathbb{P}\left(\mathcal{E}_m(X^n) \leq \alpha\right) \geq 1 - \beta$ if

$$n \geq n_c := \frac{V}{\beta\alpha^2\Gamma^2}, \tag{6}$$

where we call $n_c$ the critical blocklength. Thus, the critical blocklength specifies the minimum blocklength of a channel code for which transmission behaves ergodically with high probability. For fixed $\alpha$, $\beta$, and $\Gamma$, the parameters $n_c$ and $V$ are in one-to-one correspondence, so one can view the choice of $V$ in (4) as specifying the critical blocklength. Note that with an expectation-only constraint, we effectively have $V = \infty$, so the transmission is not guaranteed to be ergodic at any blocklength. Furthermore, unlike the expected cost constraint, the mean and variance cost formulation:

  • allows for a strong converse [13, Theorem 77], [8],

  • allows for a finite second-order coding rate [8],

  • does not allow blasting power on errors in the feedback case [14].
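
One way to see why (6) suffices is Chebyshev's inequality: since the mean of $\frac{1}{n}\sum_{i} c(X_i)$ is at most $\Gamma$ by (3), the event $\mathcal{E}_m(X^n) > \alpha$ implies that the time-average cost exceeds its mean by more than $\alpha\Gamma$, which by (4) has probability at most $V/(n\alpha^2\Gamma^2)$, and this is at most $\beta$ exactly when $n \geq n_c$. The Python sketch below evaluates both quantities. The function names and the numerical parameter values are illustrative assumptions, not values taken from [8].

```python
import math


def critical_blocklength(V: float, alpha: float, beta: float, gamma: float) -> float:
    """n_c = V / (beta * alpha^2 * gamma^2), as in (6)."""
    if min(V, alpha, beta, gamma) <= 0:
        raise ValueError("V, alpha, beta and gamma must all be positive")
    return V / (beta * alpha ** 2 * gamma ** 2)


def chebyshev_violation_bound(n: int, V: float, alpha: float, gamma: float) -> float:
    """Chebyshev upper bound V / (n * alpha^2 * gamma^2) on P(E_m(X^n) > alpha)."""
    return V / (n * alpha ** 2 * gamma ** 2)


# Hypothetical parameters: 10% relative overshoot tolerated with probability 0.95.
V, alpha, beta, gamma = 1.0, 0.1, 0.05, 1.0
n_c = critical_blocklength(V, alpha, beta, gamma)
n = math.ceil(n_c) + 1  # any blocklength at or above n_c works
print(n_c, chebyshev_violation_bound(n, V, alpha, gamma))
```

Doubling $V$ doubles $n_c$, which is the one-to-one correspondence between $V$ and the critical blocklength noted above.
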

The results of this paper also have significance in the context of previous works. Our result in Theorem 2 extends the previously known result that feedback does not improve the second-order performance for simple-dispersion DMCs without cost constraints [4]. It is also similar to the result in [15] that feedback does not improve the second-order performance for AWGN channels.

Random channel coding schemes often use independent and identically distributed (i.i.d.) codewords. It was noted in [16] that the a.s. cost constraint, which is the most commonly considered cost constraint in the context of discrete memoryless channels (DMCs), prohibits the use of i.i.d. codewords. It was shown in [16] that a feedback scheme that uses both i.i.d. and constant-composition codewords leads to an improved SOCR compared to the best non-feedback SOCR achievable under the a.s. cost constraint. Our result in Theorem 2 strengthens the result in [16] by showing that the aforementioned improvement also holds compared to the best feedback SOCR achievable under the a.s. cost constraint.

I-A Related Work

The second- and third-order asymptotics for DMCs with the a.s. cost constraint in the non-feedback setting have been characterized in [3] and [17], respectively. The second-order asymptotics in the feedback setting of DMCs that are not simple-dispersion are studied in [4] without cost constraints. There are more feedback results available for AWGN channels compared to DMCs under the a.s. cost constraint. For example, the result in [15] also addresses the third-order performance with feedback while [18] gives the result that feedback does not improve the second-order performance for parallel Gaussian channels. The second-order performance for the AWGN channel with an expected cost constraint is characterized in [19]. Table I summarizes these results across different settings in channel coding.

| Paper | Channel | Performance | Cost Constraint | Feedback | Non-feedback |
|---|---|---|---|---|---|
| Hayashi [3] | DMC, AWGN | 2nd order | a.s. | No | Yes |
| Tan and Tomamichel [20] | AWGN | 3rd order | a.s. | No | Yes |
| Kostina and Verdú [17] | DMC | 3rd order | a.s. | No | Yes |
| Fong and Tan [15] | AWGN | 2nd and 3rd order | a.s. | Yes | No |
| Wagner, Shende and Altuğ [4] | DMC | 2nd order | none | Yes | No |
| Mahmood and Wagner [8] | DMC | 2nd order | mean and variance | Yes | Yes |
| This paper | DMC | 2nd order | mean and variance, a.s. | Yes | Yes |
| Polyanskiy [13, Th. 78] | Parallel AWGN | 2nd order | a.s. | No | Yes |
| Fong and Tan [18] | Parallel AWGN | 2nd order | a.s. | Yes | No |
| Polyanskiy [13, Th. 77] | AWGN | 1st order | expected cost | No | Yes |
| Yang et al. [19] | AWGN | 2nd order | expected cost | No | Yes |

TABLE I: Relevant results across different settings in channel coding.

Our proof technique for Theorem 2 is more closely aligned with that used in [4] for DMCs than in [15] for AWGN channels. Both proofs show converse bounds with feedback that match the previously known non-feedback achievability results for DMCs and AWGN channels, respectively. A common technique used in both converse proofs is a result from binary hypothesis testing, which is used in the derivation of Lemma 1 in our paper and a similar result in [15, (17)]. We then proceed with the proof by using a Berry-Esseen-type result for bounded martingale difference sequences whereas [15] uses the usual Berry-Esseen theorem by first showing equality in distribution of the information density with a sum of i.i.d. random variables.

II Preliminaries

Let $\mathcal{A}$ and $\mathcal{B}$ be finite input and output alphabets, respectively, of the DMC $W$, where $W$ is a stochastic matrix from $\mathcal{A}$ to $\mathcal{B}$. For a given sequence $x^n \in \mathcal{A}^n$, the $n$-type $t = t(x^n)$ of $x^n$ is defined as

$$t(a) = \frac{1}{n}\sum_{i=1}^{n} \mathds{1}(x_i = a)$$

for all $a \in \mathcal{A}$, where $\mathds{1}(\cdot)$ is the standard indicator function. For a given sequence $x^n \in \mathcal{A}^n$, we will use $t(x^n)$ or $P_{x^n}$ to denote its type. Let $\mathcal{P}_n(\mathcal{A})$ be the set of $n$-types on $\mathcal{A}$. For a given $t \in \mathcal{P}_n(\mathcal{A})$, $T^n_{\mathcal{A}}(t)$ denotes the type class, i.e., the set of sequences $x^n \in \mathcal{A}^n$ with empirical distribution equal to $t$. For a random variable $Z$, $\|Z\|_\infty$ denotes its essential supremum (that is, the infimum of those numbers $z$ such that $\mathbb{P}(Z \leq z) = 1$). We will write $\log$ to denote logarithm to the base $e$ and $\exp(x)$ to denote $e$ to the power of $x$.
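
As a small illustration of the $n$-type and type class definitions above, the following Python sketch computes $t(x^n)$ with exact rational arithmetic and tests whether two sequences lie in the same type class. The function names and the three-letter alphabet are assumptions made only for this example.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, Sequence


def n_type(x: Sequence[str], alphabet: Sequence[str]) -> Dict[str, Fraction]:
    """Return t(a) = (1/n) * #{i : x_i = a} for every letter a of the alphabet."""
    n = len(x)
    counts = Counter(x)  # letters that never occur get count 0
    return {a: Fraction(counts[a], n) for a in alphabet}


def same_type_class(x: Sequence[str], y: Sequence[str], alphabet: Sequence[str]) -> bool:
    """True if x and y have equal length and the same empirical distribution."""
    return len(x) == len(y) and n_type(x, alphabet) == n_type(y, alphabet)


# Hypothetical alphabet {a, b, c} and sequences of length n = 4.
alphabet = "abc"
print(n_type("abca", alphabet))
print(same_type_class("aabc", "caba", alphabet))
```

Two sequences belong to the same type class precisely when one is a permutation of the other, which is what the comparison of types checks.
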
The cost function is denoted by $c(\cdot)$ where $c : \mathcal{A} \to [0, c_{\max}]$ and $c_{\max} > 0$ is a constant. Let $\Gamma_0 = \min_{a \in \mathcal{A}} c(a)$. Let $\Gamma^*$ denote the smallest $\Gamma$ such that the capacity-cost function $C(\Gamma)$ is equal to the unconstrained capacity. We assume $\Gamma^* > \Gamma_0$ and $\Gamma \in (\Gamma_0, \Gamma^*)$ throughout the paper. For $\Gamma \in (\Gamma_0, \Gamma^*)$, the capacity-cost function is defined as

$$C(\Gamma) = \max_{\substack{P \in \mathcal{P}(\mathcal{A}) \\ c(P) \leq \Gamma}} I(P, W), \tag{7}$$

where $c(P) := \sum_{a \in \mathcal{A}} P(a) c(a)$. The function $C(\Gamma)$ is strictly increasing and differentiable [1, Problem 8.4] in the interval $(\Gamma_0, \Gamma^*)$. For a given $x^n \in \mathcal{A}^n$, we define

$$c(x^n) := \frac{1}{n}\sum_{i=1}^{n} c(x_i).$$

Let $\Pi^*_{W,\Gamma}$ be the set of all capacity-cost-achieving distributions, i.e., the set of maximizing distributions in (7). For any $P^* \in \Pi^*_{W,\Gamma}$, let $Q^* = P^* W$ be the marginal distribution on $\mathcal{B}$. Note that the output distribution $Q^*$ is always unique, and without loss of generality, $Q^*$ can be assumed to satisfy $Q^*(b) > 0$ for all $b \in \mathcal{B}$ [21, Corollaries 1 and 2 to Theorem 4.5.1].
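
To make the maximization in (7) concrete, the following Python sketch approximates $C(\Gamma)$ in nats for a channel with a two-letter input alphabet by a plain grid search over input distributions. This is not a method used in the paper. The function names, the grid resolution, and the example channel (a binary symmetric channel with crossover probability 0.1 and costs 0 and 1) are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def mutual_information(P: Sequence[float], W: Sequence[Sequence[float]]) -> float:
    """I(P, W) in nats for input distribution P and channel matrix W[a][b]."""
    num_outputs = len(W[0])
    Q = [sum(P[a] * W[a][b] for a in range(len(P))) for b in range(num_outputs)]
    total = 0.0
    for a, p_a in enumerate(P):
        for b in range(num_outputs):
            if p_a > 0 and W[a][b] > 0:  # zero-probability terms contribute nothing
                total += p_a * W[a][b] * math.log(W[a][b] / Q[b])
    return total


def capacity_cost_binary(W: Sequence[Sequence[float]], cost: Sequence[float],
                         gamma: float, grid: int = 10_000) -> float:
    """Grid-search approximation of C(gamma) in (7) for a two-letter input alphabet."""
    best = 0.0
    for k in range(grid + 1):
        p = k / grid
        P: List[float] = [1.0 - p, p]
        # Keep only input distributions that satisfy the cost constraint c(P) <= gamma.
        if P[0] * cost[0] + P[1] * cost[1] <= gamma + 1e-12:
            best = max(best, mutual_information(P, W))
    return best


# Hypothetical example: BSC(0.1) with c(0) = 0 and c(1) = 1.
W_bsc = [[0.9, 0.1], [0.1, 0.9]]
print(capacity_cost_binary(W_bsc, [0.0, 1.0], 0.3))
```

In this example $\Gamma_0 = 0$ and $\Gamma^* = 1/2$, and the constraint is active for $\Gamma = 0.3$, so the maximizer places probability $0.3$ on the costly letter.
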

The following definitions will remain in effect throughout the paper:

\begin{align*}
\nu_a &:= \text{Var}\left(\log\frac{W(Y|a)}{Q^*(Y)}\right), \quad \text{where } Y \sim W(\cdot|a), \\
\nu_{\max} &:= \max_{a \in \mathcal{A}} \nu_a, \\
i(a,b) &:= \log\frac{W(b|a)}{Q^*(b)}.
\end{align*}

Definition 1 (cf. [4])

A DMC $W$ is called simple-dispersion at the cost $\Gamma \in (\Gamma_0, \Gamma^*)$ if

$$\min_{P^* \in \Pi^*_{W,\Gamma}} \sum_{a \in \mathcal{A}} P^*(a)\nu_a = \max_{P^* \in \Pi^*_{W,\Gamma}} \sum_{a \in \mathcal{A}} P^*(a)\nu_a.$$

We will only focus on simple-dispersion channels for a fixed cost $\Gamma \in (\Gamma_0, \Gamma^*)$ and thus define

$$V(\Gamma) := \sum_{a \in \mathcal{A}} P^*(a)\nu_a$$

for any $P^* \in \Pi^*_{W,\Gamma}$.
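
The quantities $\nu_a$ and $V(\Gamma)$ are straightforward to evaluate once a capacity-cost-achieving distribution is available. The Python sketch below does so under the assumption that the supplied distribution `P_star` is capacity-cost-achieving; the code does not verify this. The function names and the example (a binary symmetric channel with crossover probability 0.1, costs 0 and 1, and $\Gamma = 0.3$, for which the maximizer is unique and the channel is therefore trivially simple-dispersion) are illustrative assumptions.

```python
import math
from typing import List, Sequence


def output_distribution(P: Sequence[float], W: Sequence[Sequence[float]]) -> List[float]:
    """Q = P W, the output marginal induced by input distribution P."""
    return [sum(P[a] * W[a][b] for a in range(len(P))) for b in range(len(W[0]))]


def info_variances(W: Sequence[Sequence[float]], Q: Sequence[float]) -> List[float]:
    """nu_a = Var(log W(Y|a) / Q(Y)) with Y ~ W(.|a), for each input letter a."""
    variances = []
    for row in W:
        # Pairs (probability, information density) over the support of W(.|a).
        support = [(w, math.log(w / q)) for w, q in zip(row, Q) if w > 0]
        mean = sum(w * i for w, i in support)
        variances.append(sum(w * (i - mean) ** 2 for w, i in support))
    return variances


def dispersion(P_star: Sequence[float], W: Sequence[Sequence[float]]) -> float:
    """V(Gamma) = sum_a P*(a) nu_a, assuming P_star is capacity-cost-achieving."""
    Q_star = output_distribution(P_star, W)
    nu = info_variances(W, Q_star)
    return sum(p * v for p, v in zip(P_star, nu))


# Hypothetical example: BSC(0.1), costs (0, 1), Gamma = 0.3, so P* = (0.7, 0.3).
W_bsc = [[0.9, 0.1], [0.1, 0.9]]
P_star = [0.7, 0.3]
print(info_variances(W_bsc, output_distribution(P_star, W_bsc)))
print(dispersion(P_star, W_bsc))
```
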

With a blocklength $n$ and a fixed rate $R > 0$, let $\mathcal{M}_R = \{1, \ldots, \lceil \exp(nR) \rceil\}$ denote the message set. Let $M \in \mathcal{M}_R$ denote the random message drawn uniformly from the message set.

Definition 2

An $(n, R)$ code for a DMC consists of an encoder $f$ which, for each message $m \in \mathcal{M}_R$, chooses an input $X^n = f(m) \in \mathcal{A}^n$, and a decoder $g$ which maps the output $Y^n$ to $\hat{m} \in \mathcal{M}_R$. The code $(f, g)$ is random if $f$ or $g$ is random.

Definition 3

An $(n, R)$ code with ideal feedback for a DMC consists of an encoder $f$ which, at each time instant $k$ ($1 \leq k \leq n$) and for each message $m \in \mathcal{M}_R$, chooses an input $x_k = f(m, x^{k-1}, y^{k-1}) \in \mathcal{A}$, and a decoder $g$ which maps the output $y^n$ to $\hat{m} \in \mathcal{M}_R$. The code $(f, g)$ is random if $f$ or $g$ is random.

Definition 4

An $(n, R, \Gamma)$ code for a DMC is an $(n, R)$ code such that $c(X^n) \leq \Gamma$ almost surely, where the message $M \sim \text{Unif}(\mathcal{M}_R)$ has a uniform distribution over the message set $\mathcal{M}_R$.

Definition 5

An $(n, R, \Gamma)$ code with ideal feedback for a DMC is an $(n, R)$ code with ideal feedback such that $c(X^n) \leq \Gamma$ almost surely, where the message $M \sim \text{Unif}(\mathcal{M}_R)$ has a uniform distribution over the message set $\mathcal{M}_R$.

Definition 6

An $(n, R, \Gamma, V)$ code for a DMC is an $(n, R)$ code such that $\mathbb{E}\left[\sum_{i=1}^{n} c(X_i)\right] \leq n\Gamma$ and $\text{Var}\left(\sum_{i=1}^{n} c(X_i)\right) \leq nV$, where the message $M \sim \text{Unif}(\mathcal{M}_R)$ has a uniform distribution over the message set $\mathcal{M}_R$.

Definition 7

An $(n, R, \Gamma, V)$ code with ideal feedback for a DMC is an $(n, R)$ code with ideal feedback such that $\mathbb{E}\left[\sum_{i=1}^{n} c(X_i)\right] \leq n\Gamma$ and $\text{Var}\left(\sum_{i=1}^{n} c(X_i)\right) \leq nV$, where the message $M \sim \text{Unif}(\mathcal{M}_R)$ has a uniform distribution over the message set $\mathcal{M}_R$.
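
For a deterministic encoder without feedback, the only randomness in $X^n = f(M)$ is the uniform message, so the cost conditions in Definitions 4 and 6 reduce to statements about the list of codewords. The Python sketch below checks them for a toy codebook under exactly that assumption. The function names, the codebook, and the cost function are hypothetical and serve only to contrast the two constraints.

```python
from statistics import fmean, pvariance
from typing import Callable, List, Sequence


def total_costs(codebook: Sequence[Sequence[int]], cost: Callable[[int], float]) -> List[float]:
    """Total cost sum_i c(x_i) of every codeword."""
    return [sum(cost(symbol) for symbol in codeword) for codeword in codebook]


def satisfies_as_constraint(codebook: Sequence[Sequence[int]],
                            cost: Callable[[int], float], gamma: float) -> bool:
    """Definition 4: every codeword has total cost at most n * gamma."""
    n = len(codebook[0])
    return all(total <= n * gamma for total in total_costs(codebook, cost))


def satisfies_mean_variance_constraints(codebook: Sequence[Sequence[int]],
                                        cost: Callable[[int], float],
                                        gamma: float, V: float) -> bool:
    """Definition 6 for a uniform message: E[sum] <= n*gamma and Var(sum) <= n*V."""
    n = len(codebook[0])
    totals = total_costs(codebook, cost)
    # Population mean and variance over codewords equal the moments under uniform M.
    return fmean(totals) <= n * gamma and pvariance(totals) <= n * V


def symbol_cost(a: int) -> float:
    """Hypothetical cost function: the cost of a symbol is its value."""
    return float(a)


# Hypothetical codebook with n = 4 and total costs 2, 3 and 1.
codebook = [[0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1]]
print(satisfies_as_constraint(codebook, symbol_cost, 0.5))
print(satisfies_mean_variance_constraints(codebook, symbol_cost, 0.5, 0.25))
```

The toy codebook violates the a.s. constraint at $\Gamma = 0.5$ because one codeword overshoots, yet it meets the mean and variance constraints for $V = 0.25$.
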

Given $\epsilon \in (0, 1)$, define

$$M^*_{\text{fb}}(n, \epsilon, \Gamma) := \max\{\lceil \exp(nR) \rceil : \bar{P}_{\text{e,fb}}(n, R, \Gamma) \leq \epsilon\},$$

where $\bar{P}_{\text{e,fb}}(n, R, \Gamma)$ denotes the minimum average error probability attainable by any random $(n, R, \Gamma)$ code with feedback. Similarly, define

$$M^*(n, \epsilon, \Gamma) := \max\{\lceil \exp(nR) \rceil : \bar{P}_{\text{e}}(n, R, \Gamma) \leq \epsilon\},$$

where $\bar{P}_{\text{e}}(n, R, \Gamma)$ denotes the minimum average error probability attainable by any random $(n, R, \Gamma)$ code without feedback. Define $M^*_{\text{fb}}(n, \epsilon, \Gamma, V)$ and $M^*(n, \epsilon, \Gamma, V)$ similarly for codes with mean and variance cost constraints.

II-A Expectation-only cost constraint

Under this cost formulation, the average cost of the codewords is constrained in expectation only:

\begin{align}
\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} c(X_i)\right] \leq \Gamma. \tag{8}
\end{align}

We now illustrate a codebook construction (adapted from [17]) that has an average error probability of at most $\epsilon \in (0,1)$ and meets the cost threshold $\Gamma$ according to (8), but whose codeword costs are non-ergodic, i.e., $\frac{1}{n}\sum_{i=1}^{n} c(X_i)$ does not converge to $\Gamma$. Consider a codebook $\mathcal{C}_n$ with rate $C(\Gamma) < R < C\left(\frac{\Gamma}{1-\epsilon}\right)$ whose average error probability $\epsilon_n \to 0$ and each of whose codewords has average cost equal to $\frac{\Gamma}{1-\epsilon}$. Such a codebook exists because $R < C\left(\frac{\Gamma}{1-\epsilon}\right)$. Assuming $\Gamma_0 = \min_{a \in \mathcal{A}} c(a) = 0$ without loss of generality, one could modify the codebook $\mathcal{C}_n$ by replacing an $\epsilon$-fraction of its codewords with the all-zero codeword. The modified codebook $\mathcal{C}_n'$ has average error probability at most $\epsilon_n' \to \epsilon$ and meets the cost threshold $\Gamma$ according to (8). But $\frac{1}{n}\sum_{i=1}^{n} c(X_i)$ is either $0$ or $\frac{\Gamma}{1-\epsilon}$. This construction also shows that the strong converse does not hold under the expected cost constraint.
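The cost bookkeeping of this construction can be checked with a small Python sketch. It is illustrative only: the function name `modified_codebook_costs`, the codebook size and the numerical values of $\Gamma$ and $\epsilon$ are assumptions chosen for the example, and the codewords are represented only by their average costs.

```python
def modified_codebook_costs(num_codewords, gamma, eps):
    """Per-codeword average costs after an eps-fraction of the codewords
    is replaced by the all-zero (zero-cost) codeword.

    Hypothetical helper: only costs are tracked, not the codewords.
    """
    num_zero = round(eps * num_codewords)   # codewords replaced by all-zero
    high_cost = gamma / (1.0 - eps)         # cost of every remaining codeword
    return [0.0] * num_zero + [high_cost] * (num_codewords - num_zero)


# Assumed example values (not taken from the paper).
costs = modified_codebook_costs(num_codewords=1000, gamma=0.2, eps=0.1)

# With a uniformly distributed message, the expectation in (8) is the
# plain average over codewords.
expected_cost = sum(costs) / len(costs)
distinct_costs = sorted(set(costs))

print("expected cost:", expected_cost)
print("distinct per-codeword costs:", distinct_costs)
```

The expected cost matches $\Gamma$ up to floating-point rounding, while every individual codeword has cost either $0$ or $\frac{\Gamma}{1-\epsilon}$, which is the non-ergodic behavior described above.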

The mean and variance cost constraints ensure that the average cost of the codewords concentrates around the cost threshold $\Gamma$, thereby disallowing codebook constructions with irregular or non-ergodic power consumption.

III Main Results

We prove coding performance improvement in terms of the second-order coding rate, although equivalent results in terms of the average error probability improvement can also be shown as in [8, Theorems 1-3]. Let

  • $r_{\text{a.s.}}(\epsilon,\Gamma)$ denote the optimal SOCR with the a.s. cost constraint without feedback,

  • $r_{\text{a.s.,fb}}(\epsilon,\Gamma)$ denote the optimal SOCR with the a.s. cost constraint with feedback,

  • $r_{\text{m.v.}}(\epsilon,\Gamma,V)$ denote the optimal SOCR with the mean and variance cost constraints without feedback and

  • $r_{\text{m.v.,fb}}(\epsilon,\Gamma,V)$ denote the optimal SOCR with the mean and variance cost constraints with feedback

for channel codes operating with an average error probability of at most $\epsilon \in (0,1)$. In the non-feedback case, $C(\Gamma)$ is the optimal first-order rate for DMCs with the a.s. cost constraint [1, Theorem 6.11] as well as the mean and variance cost formulation [8, Theorems 1 and 2], i.e.,

\begin{align}
\lim_{n\to\infty}\frac{1}{n}\log M^{*}(n,\epsilon,\Gamma) &= C(\Gamma) \tag{9}\\
\lim_{n\to\infty}\frac{1}{n}\log M^{*}(n,\epsilon,\Gamma,V) &= C(\Gamma) \tag{10}
\end{align}

for all $\epsilon \in (0,1)$. The results (9) and (10) imply that the strong converse holds. We thus define the second-order rates with respect to the capacity-cost function $C(\Gamma)$ as follows:

\begin{align*}
r_{\text{a.s.}}(\epsilon,\Gamma) &:= \liminf_{n\to\infty}\frac{\log M^{*}(n,\epsilon,\Gamma) - nC(\Gamma)}{\sqrt{n}}\\
r_{\text{m.v.}}(\epsilon,\Gamma,V) &:= \liminf_{n\to\infty}\frac{\log M^{*}(n,\epsilon,\Gamma,V) - nC(\Gamma)}{\sqrt{n}}.
\end{align*}

For the feedback case, we adopt the convention of defining the SOCR with respect to $C(\Gamma)$ as follows:

\begin{align*}
r_{\text{a.s.,fb}}(\epsilon,\Gamma) &:= \liminf_{n\to\infty}\frac{\log M^{*}_{\text{fb}}(n,\epsilon,\Gamma) - nC(\Gamma)}{\sqrt{n}}\\
r_{\text{m.v.,fb}}(\epsilon,\Gamma,V) &:= \liminf_{n\to\infty}\frac{\log M^{*}_{\text{fb}}(n,\epsilon,\Gamma,V) - nC(\Gamma)}{\sqrt{n}}.
\end{align*}

For the a.s. cost constraint, this convention is justified because, by Theorem 2, $C(\Gamma)$ is the optimal first-order rate, in the sense analogous to (9), for DMCs with feedback. For DMCs without cost constraints, Shannon [22, Theorem 6] showed that feedback does not increase the capacity.

III-A Performance improvement for non-feedback codes

From [3, Theorem 3], we have $r_{\text{a.s.}}(\epsilon,\Gamma) = \sqrt{V(\Gamma)}\,\Phi^{-1}(\epsilon)$ for a simple-dispersion DMC $W$ (the result in [3, Theorem 3] is not restricted to simple-dispersion DMCs). On the other hand, [8, Theorems 1 and 2] proved that

\begin{align}
r_{\text{m.v.}}(\epsilon,\Gamma,V) = \max\left\{ r \in \mathbb{R} : \mathcal{K}\left(\frac{r}{\sqrt{V(\Gamma)}}, \frac{C'(\Gamma)^{2} V}{V(\Gamma)}\right) \leq \epsilon \right\} \tag{11}
\end{align}

for a DMC $W$ such that $|\Pi_{W,\Gamma}^{*}| = 1$ and $V(\Gamma) > 0$, where the function $\mathcal{K} : \mathbb{R} \times (0,\infty) \to (0,1)$ is given by

\begin{align}
\mathcal{K}(r,V) = \min_{\substack{\Pi:\\ \mathbb{E}[\Pi] = r\\ \text{Var}(\Pi) \leq V\\ |\text{supp}(\Pi)| \leq 3}} \mathbb{E}\left[\Phi(\Pi)\right]. \tag{12}
\end{align}

The maximum and the minimum in (11) and (12), respectively, are attained [8, Lemmas 3 and 4].
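The minimization in (12) can be explored numerically. The Python sketch below is not the authors' procedure: it searches a grid of two-point distributions only, so it returns an upper bound on $\mathcal{K}(r,V)$ rather than its exact value (three-point distributions are also feasible in (12)). The names `Phi` and `K_upper_two_point` and the grid resolution are assumptions made for illustration.

```python
import math


def Phi(x):
    """Standard Gaussian CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def K_upper_two_point(r, V, grid=200):
    """Upper bound on K(r, V) from a grid search over two-point
    distributions with mean r and variance at most V (up to rounding).

    Mass p sits at r + d and mass 1 - p at r - p*d/(1-p), which gives
    mean r and variance p/(1-p) * d^2.
    """
    best = Phi(r)  # the point mass at r is always feasible
    for i in range(1, grid):
        p = i / grid
        d_max = math.sqrt(V * (1.0 - p) / p)  # largest d meeting Var <= V
        for j in range(1, grid + 1):
            d = d_max * j / grid
            hi = r + d
            lo = r - p * d / (1.0 - p)
            best = min(best, p * Phi(hi) + (1.0 - p) * Phi(lo))
    return best


# Assumed example arguments: compare the bound against Phi(r).
for r in (-1.0, 0.0, 1.0):
    print(r, K_upper_two_point(r, V=1.0), Phi(r))
```

In each printed row the two-point bound should fall below $\Phi(r)$, the value attained by the point mass at $r$. A finer grid or a three-point search can only lower the bound further.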

Theorem 1

Fix an arbitrary $\epsilon \in (0,1)$. Then for any $\Gamma \in (\Gamma_0,\Gamma^{*})$, $V > 0$ and a DMC $W$ such that $|\Pi_{W,\Gamma}^{*}| = 1$ and $V(\Gamma) > 0$, we have $r_{\text{m.v.}}(\epsilon,\Gamma,V) > r_{\text{a.s.}}(\epsilon,\Gamma)$.

Proof: The proof is given in Section IV.

The improvement in Theorem 1 is illustrated in Fig. 1 and Fig. 2, which plot the second-order coding rate as a function of the average error probability for a binary symmetric channel with parameter $p = 0.3$, alphabets $\mathcal{A} = \mathcal{B} = \{0,1\}$, cost threshold $\Gamma = 0.2$ and cost function $c(x) = x$.

Figure 1: The SOCR is compared between the almost-sure cost constraint and the mean and variance cost constraints for different values of $V$. The plots for the mean and variance cost constraints are lower bounds to the SOCR since they are obtained through a non-exhaustive search of the feasible region in the maximization and minimization in (11) and (12), respectively.

As discussed in (6), the choice of $V$ together with the desired values of $\alpha$ and $\beta$ specifies the critical blocklength beyond which the coding scheme is guaranteed to be $(\alpha,\beta)$-ergodic. In practice, the choice of blocklength is more fundamental as it affects complexity and latency. Therefore, it is more prudent for the value of $V$ to be dictated by the blocklength $n$ and the desired $(\alpha,\beta)$-ergodicity via the relation $V = n\beta\alpha^{2}\Gamma^{2}$ derived from (6). With the same channel $(p = 0.3)$ and cost $(\Gamma = 0.2)$ parameters as used in Fig. 1, Fig. 2 shows the second-order performance for different critical blocklengths for an $(\alpha,\beta)$-ergodic codebook with $\alpha = \beta = 0.1$.
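As a small arithmetic illustration of the relation $V = n\beta\alpha^{2}\Gamma^{2}$, the following Python sketch evaluates $V$ for a few blocklengths with $\alpha = \beta = 0.1$ and $\Gamma = 0.2$. The function name `variance_budget` and the listed blocklengths are assumptions for the example and are not necessarily the blocklengths used in Fig. 2.

```python
def variance_budget(n, alpha, beta, gamma):
    """V = n * beta * alpha^2 * gamma^2, the relation quoted in the text."""
    return n * beta * alpha ** 2 * gamma ** 2


# Assumed example blocklengths; alpha, beta and Gamma follow the text.
for n in (100, 1000, 10000):
    print(n, variance_budget(n, alpha=0.1, beta=0.1, gamma=0.2))
```

For fixed $\alpha$, $\beta$ and $\Gamma$, the variance budget grows linearly in $n$, so longer critical blocklengths correspond to larger values of $V$.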

Figure 2: The SOCR is compared between the almost-sure cost constraint and the mean and variance cost constraints for an $(\alpha,\beta)$-ergodic random codebook with $\alpha = \beta = 0.1$.

III-B Performance improvement for feedback codes

Definition 8

A controller is a function $F : (\mathcal{A} \times \mathcal{B})^{*} \to \mathcal{P}(\mathcal{A})$.

Random feedback codes can be constructed from controllers. Given a message $m \in \mathcal{M}_R$ and the past channel inputs and outputs $(x^{k-1}, y^{k-1})$, the channel input $X_k = f(m, x^{k-1}, y^{k-1})$ at time instant $k$ is distributed according to $F(x^{k-1}, y^{k-1})$. There is a one-to-one correspondence between a random feedback code and a controller-based code.

  • A random feedback code $(f,g)$ is equivalently specified by the joint distribution

    \begin{align*}
    p_{M,X^{n},Y^{n},\hat{M}} = p_{M}\left(\prod_{i=1}^{n} p_{X_i|M,X^{i-1},Y^{i-1}}\, p_{Y_i|X_i}\right) p_{\hat{M}|Y^{n}},
    \end{align*}

    where $\hat{M}$ is the decoded message. Marginalizing out $M$ and $\hat{M}$, we obtain

    \begin{align*}
    p_{X^{n},Y^{n}} = \prod_{i=1}^{n} p_{X_i|X^{i-1},Y^{i-1}}\, p_{Y_i|X_i}
    \end{align*}

    from which a controller $F$ can be obtained with the specification

    \begin{align*}
    F(x_k|x^{k-1},y^{k-1}) = p_{X_k|X^{k-1},Y^{k-1}}(x_k|x^{k-1},y^{k-1})
    \end{align*}

    for each time $k$.

  • Likewise, a controller $F$ specifies a random feedback code by inducing the following joint distribution:

    \begin{align*}
    p_{M,X^{n},Y^{n},\hat{M}}(m,x^{n},y^{n},\hat{m}) = p_{M}(m)\left(\prod_{i=1}^{n} F(x_i|x^{i-1},y^{i-1})\, p_{Y_i|X_i}(y_i|x_i)\right) p_{\hat{M}|Y^{n}}(\hat{m}|y^{n}).
    \end{align*}

A controller $F$ together with the channel $W$ specifies a joint distribution over $\mathcal{A}^{n} \times \mathcal{B}^{n}$ with probability assignments given by

\begin{align}
(F \circ W)(x^{n},y^{n}) = \prod_{k=1}^{n} F(x_k|x^{k-1},y^{k-1})\, W(y_k|x_k). \tag{13}
\end{align}
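The product form in (13) can be made concrete with a short Python sketch. Everything specific here is an assumption for illustration: the channel is a binary symmetric channel with crossover probability $0.3$, and the `controller` function is a hypothetical rule that makes the next input depend on whether the previous input was received correctly. The sketch enumerates all $(x^{n},y^{n})$ pairs for $n = 3$ and checks that the assigned probabilities sum to one.

```python
import itertools


def bsc(y, x, p=0.3):
    """W(y|x) for a binary symmetric channel with crossover probability p."""
    return 1.0 - p if y == x else p


def controller(x_past, y_past):
    """Hypothetical controller F(. | x^{k-1}, y^{k-1}) over inputs {0, 1}.

    Uniform at the first time step; afterwards the probability of
    sending 1 depends on whether the last input was received correctly.
    """
    if not x_past:
        return {0: 0.5, 1: 0.5}
    q1 = 0.5 if x_past[-1] == y_past[-1] else 0.1
    return {0: 1.0 - q1, 1: q1}


def joint_prob(xs, ys, F=controller, W=bsc):
    """(F o W)(x^n, y^n) = prod_k F(x_k | x^{k-1}, y^{k-1}) W(y_k | x_k)."""
    prob = 1.0
    for k in range(len(xs)):
        prob *= F(xs[:k], ys[:k])[xs[k]] * W(ys[k], xs[k])
    return prob


n = 3
total = sum(
    joint_prob(xs, ys)
    for xs in itertools.product((0, 1), repeat=n)
    for ys in itertools.product((0, 1), repeat=n)
)
print("total probability:", total)
```

Because each factor $F(\cdot|x^{k-1},y^{k-1})$ and $W(\cdot|x_k)$ is itself a probability distribution, the total equals one up to floating-point rounding for any controller of this form.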
Lemma 1

Consider a channel $W$ with cost constraint $\Gamma \in (\Gamma_0,\Gamma^{*})$. Then for every $n, \rho > 0$ and $\epsilon \in (0,1)$,

\begin{align}
\log M^{*}_{\text{fb}}(n,\epsilon,\Gamma) \leq \log\rho - \log\left[\left(1 - \epsilon - \sup_{F}\,\inf_{q \in \mathcal{P}(\mathcal{B}^{n})} (F \circ W)\left(\frac{W(Y^{n}|X^{n})}{q(Y^{n})} > \rho\right)\right)^{+}\right], \tag{14}
\end{align}

where the supremum in (14) is over all controllers $F$ satisfying

\begin{align}
(F \circ W)\left(c(X^{n}) \leq \Gamma\right) &= \sum_{x^{n} : c(x^{n}) \leq \Gamma}\ \sum_{y^{n} \in \mathcal{B}^{n}}\ \prod_{k=1}^{n} F(x_k|x^{k-1},y^{k-1})\, W(y_k|x_k) \notag\\
&= 1. \tag{15}
\end{align}

Proof: Lemma 1 is similar to [23, Theorem 27], [18, (42)], [4, Lemma 15], [8, Lemma 2] and others. Its proof is omitted.

Theorem 2

Fix an arbitrary $\epsilon \in (0,1)$. Then for any $\Gamma \in (\Gamma_0,\Gamma^{*})$ and a simple-dispersion DMC $W$ such that $V(\Gamma) > 0$, we have $r_{\text{a.s.,fb}}(\epsilon,\Gamma) = r_{\text{a.s.}}(\epsilon,\Gamma)$.

Proof: The proof is given in Section V.

We only prove the converse, namely the following upper bound:

\begin{align}
r_{\text{a.s.,fb}}(\epsilon,\Gamma) \leq \sqrt{V(\Gamma)}\,\Phi^{-1}(\epsilon). \tag{16}
\end{align}

The result of Theorem 2 follows by combining (16) with the existing achievability result (without feedback) from [3, Theorem 3].

From Theorems 1 and 2 alone, we have that $r_{\text{m.v.,fb}}(\epsilon,\Gamma,V) > r_{\text{a.s.,fb}}(\epsilon,\Gamma)$. More importantly, the mean and variance cost formulation admits feedback mechanisms that improve the SOCR, even if the capacity-cost-achieving distribution is unique, i.e., $|\Pi_{W,\Gamma}^{*}| = 1$. This is the more interesting case since for compound-dispersion channels where $|\Pi_{W,\Gamma}^{*}| > 1$, feedback is already known to improve second-order performance via timid/bold coding [4].

In contrast to the almost-sure constraint where $r_{\text{a.s.,fb}}(\epsilon,\Gamma) = r_{\text{a.s.}}(\epsilon,\Gamma)$, we observe that $r_{\text{m.v.,fb}}(\epsilon,\Gamma,V) > r_{\text{m.v.}}(\epsilon,\Gamma,V)$ for simple-dispersion channels with $|\Pi_{W,\Gamma}^{*}| = 1$ [8, Theorem 3]. In summary, for any $\epsilon \in (0,1)$, $\Gamma \in (\Gamma_0,\Gamma^{*})$, $V > 0$ and a simple-dispersion DMC $W$ such that $|\Pi_{W,\Gamma}^{*}| = 1$ and $V(\Gamma) > 0$,

\begin{align*}
r_{\text{m.v.,fb}}(\epsilon,\Gamma,V) > r_{\text{m.v.}}(\epsilon,\Gamma,V) > r_{\text{a.s.,fb}}(\epsilon,\Gamma) = r_{\text{a.s.}}(\epsilon,\Gamma),
\end{align*}

where the last equality above has been proven without the assumption $|\Pi_{W,\Gamma}^{*}| = 1$.

IV Proof of Theorem 1

Since $\mathcal{K}(r,V)$ is a continuous function [8, Lemma 3], it suffices to show that for all $r \in \mathbb{R}$ and $V > 0$,

\begin{align}
\min_{\substack{\Pi:\\ \mathbb{E}[\Pi] = r\\ \text{Var}(\Pi) \leq V\\ |\text{supp}(\Pi)| \leq 2}} \mathbb{E}\left[\Phi(\Pi)\right] < \Phi(r). \tag{17}
\end{align}

The LHS of (17)17(\ref{bv})( ) can be written as

minp,π:0p1p1p(πr)2V[pΦ(π)+(1p)Φ(rpπ1p)],subscript:𝑝𝜋absent0𝑝1𝑝1𝑝superscript𝜋𝑟2𝑉𝑝Φ𝜋1𝑝Φ𝑟𝑝𝜋1𝑝\displaystyle\min_{\begin{subarray}{c}p,\pi:\\ 0\leq p\leq 1\\ \frac{p}{1-p}(\pi-r)^{2}\leq V\end{subarray}}\left[p\Phi(\pi)+(1-p)\Phi\left(% \frac{r-p\pi}{1-p}\right)\right],roman_min start_POSTSUBSCRIPT start_ARG start_ROW start_CELL italic_p , italic_π : end_CELL end_ROW start_ROW start_CELL 0 ≤ italic_p ≤ 1 end_CELL end_ROW start_ROW start_CELL divide start_ARG italic_p end_ARG start_ARG 1 - italic_p end_ARG ( italic_π - italic_r ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ italic_V end_CELL end_ROW end_ARG end_POSTSUBSCRIPT [ italic_p roman_Φ ( italic_π ) + ( 1 - italic_p ) roman_Φ ( divide start_ARG italic_r - italic_p italic_π end_ARG start_ARG 1 - italic_p end_ARG ) ] ,

where we used the constraint 𝔼[Π]=r𝔼delimited-[]Π𝑟\mathbb{E}[\Pi]=rblackboard_E [ roman_Π ] = italic_r to eliminate one of the decision variables.
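The elimination step can be checked numerically. The following Python sketch takes a two-point distribution with mass $p$ at $\pi$, places the remaining mass at $(r-p\pi)/(1-p)$ as the mean constraint requires, and confirms that the variance equals $\frac{p}{1-p}(\pi-r)^{2}$, which is the quantity bounded by $V$ in the reduced problem. The function names are illustrative and do not come from the paper.

```python
from typing import Tuple


def second_atom(p: float, pi: float, r: float) -> float:
    """Location of the second point mass, forced by the mean constraint E[Pi] = r."""
    if not 0.0 <= p < 1.0:
        raise ValueError("need 0 <= p < 1 so that the second atom is well defined")
    return (r - p * pi) / (1.0 - p)


def two_point_moments(p: float, pi: float, r: float) -> Tuple[float, float]:
    """Mean and variance of the law with mass p at pi and mass 1 - p at second_atom."""
    other = second_atom(p, pi, r)
    mean = p * pi + (1.0 - p) * other
    var = p * (pi - mean) ** 2 + (1.0 - p) * (other - mean) ** 2
    return mean, var


if __name__ == "__main__":
    p, pi, r = 0.2, 1.5, 0.5
    mean, var = two_point_moments(p, pi, r)
    # The last two printed numbers should agree up to rounding.
    print(mean, var, p / (1.0 - p) * (pi - r) ** 2)
```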

Assume by contradiction that

$$p\Phi(\pi)+(1-p)\Phi\left(\frac{r-p\pi}{1-p}\right)\geq\Phi(r) \tag{18}$$

for all $\pi\geq r$, $p\in[0,1]$ and $\frac{p}{1-p}(\pi-r)^{2}\leq V$. The assumption $\pi\geq r$ is without loss of generality since one of the two point masses must be greater than or equal to $r$.

If (18) holds, then

$$p\Phi(\pi)+(1-p)\Phi\left(\frac{r-p\pi}{1-p}\right)\geq\Phi(r)$$

for all $\pi\geq r$, $p\in[0,1]$ and $\frac{p}{1-p}(\pi-r)^{2}=V$. Since $\pi=r+\sqrt{\frac{V(1-p)}{p}}$ in this case, we must have

$$\begin{aligned}
p\,\Phi\left(r+\sqrt{\frac{V(1-p)}{p}}\right)+(1-p)\Phi\left(\frac{r}{1-p}-\frac{p}{1-p}\left(r+\sqrt{\frac{V(1-p)}{p}}\right)\right)&\geq\Phi(r)\\
p\,\Phi\left(r+\sqrt{\frac{V(1-p)}{p}}\right)+(1-p)\Phi\left(r-\frac{p}{1-p}\sqrt{\frac{V(1-p)}{p}}\right)&\geq\Phi(r)\\
p\,\Phi\left(r+\sqrt{\frac{V(1-p)}{p}}\right)+(1-p)\Phi\left(r-\sqrt{\frac{Vp}{1-p}}\right)&\geq\Phi(r)
\end{aligned}\tag{19}$$

for all $p\in[0,1]$.

Consider the function

$$f(p)=p\,\Phi\left(r+\sqrt{\frac{V(1-p)}{p}}\right)+(1-p)\Phi\left(r-\sqrt{\frac{Vp}{1-p}}\right)-\Phi(r)$$

with domain $p\in[0,1]$. For any $p\in(0,1)$,

$$\begin{aligned}
\frac{f(p)-f(0)}{p}&=\Phi\left(r+\sqrt{\frac{V(1-p)}{p}}\right)+\frac{1-p}{p}\Phi\left(r-\sqrt{\frac{Vp}{1-p}}\right)-\frac{\Phi(r)}{p}\\
&\leq\Phi\left(r+\sqrt{\frac{V(1-p)}{p}}\right)+\frac{1}{p}\Phi\left(r-\sqrt{\frac{Vp}{1-p}}\right)-\frac{\Phi(r)}{p}\\
&\stackrel{(a)}{=}\Phi\left(r+\sqrt{\frac{V(1-p)}{p}}\right)+\frac{1}{p}\left[\Phi(r)-\phi(\tilde{r})\sqrt{\frac{Vp}{1-p}}-\Phi(r)\right]\\
&=\Phi\left(r+\sqrt{\frac{V(1-p)}{p}}\right)-\phi(\tilde{r})\sqrt{\frac{V}{p(1-p)}}.
\end{aligned}\tag{20}$$

In equality $(a)$, we have $r-\sqrt{\frac{Vp}{1-p}}<\tilde{r}<r$ by the mean value theorem. For sufficiently small $p>0$, the expression in (20) is negative: its first term is at most $1$, while $\tilde{r}\to r$ and hence $\phi(\tilde{r})\sqrt{\frac{V}{p(1-p)}}\to\infty$ as $p\to 0^{+}$. Since $f(0)=0$, we have $f(p)<0$ for some $p>0$, which contradicts (19).
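The sign change of $f$ near $p=0$ can also be observed numerically. The sketch below evaluates $f(p)$ as defined above, using the error function for $\Phi$, and halves $p$ until $f(p)<0$. The helper `first_negative_p` and its search schedule are illustrative choices, not part of the proof.

```python
import math
from typing import Optional


def Phi(x: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def f(p: float, r: float, V: float) -> float:
    """f(p) from the proof, with the convention f(0) = 0."""
    if p == 0.0:
        return 0.0
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    upper = r + math.sqrt(V * (1.0 - p) / p)
    lower = r - math.sqrt(V * p / (1.0 - p))
    return p * Phi(upper) + (1.0 - p) * Phi(lower) - Phi(r)


def first_negative_p(r: float, V: float, start: float = 0.5,
                     max_steps: int = 200) -> Optional[float]:
    """Halve p from `start` until f(p) < 0; return None if no such p is found."""
    p = start
    for _ in range(max_steps):
        if f(p, r, V) < 0.0:
            return p
        p *= 0.5
    return None


if __name__ == "__main__":
    # r = -3 lies where Phi is convex, so f(p) > 0 for moderate p,
    # yet f still becomes negative once p is small enough.
    for r in (-3.0, 0.0, 3.0):
        p = first_negative_p(r, 1.0)
        print(r, p, None if p is None else f(p, r, 1.0))
```

The case $r<0$ is the informative one: Jensen's inequality alone would suggest that spreading mass cannot help where $\Phi$ is convex, but the upper atom escapes to the concave region as $p\to 0$.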

V Proof of Theorem 2

For any $t\in\mathcal{P}_{n}(\mathcal{A})$, define

$$d_{W}(t):=\inf_{P\in\Pi_{W,\Gamma}^{*}}\|t-P\|_{1}.$$

For any $0<\gamma\leq\frac{V(\Gamma)}{4|\mathcal{A}|\nu_{\max}}$, define

$$\begin{aligned}
\mathcal{P}_{n}^{\gamma}&=\left\{x^{n}\in\mathcal{A}^{n}:c(x^{n})\leq\Gamma\text{ and }d_{W}(t(x^{n}))>\gamma\right\}\\
\mathcal{P}_{n}^{\gamma,c}&=\left\{x^{n}\in\mathcal{A}^{n}:c(x^{n})\leq\Gamma\text{ and }d_{W}(t(x^{n}))\leq\gamma\right\}.
\end{aligned}\tag{21}$$
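As a concrete illustration of (21), the following Python sketch computes the type of a sequence, its $\ell_1$ distance to a set of optimal input distributions, and the resulting classification. It makes two simplifying assumptions that are not in the text: $\Pi_{W,\Gamma}^{*}$ is passed in as a finite list (so the infimum becomes a minimum), and $c(x^{n})$ is taken to be the time-average of a per-letter cost table. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def empirical_type(xn: Sequence[Hashable], alphabet: Sequence[Hashable]) -> List[float]:
    """Type t(x^n): relative frequency of each letter of the alphabet in x^n."""
    counts = Counter(xn)
    return [counts[a] / len(xn) for a in alphabet]


def d_W(t: Sequence[float], optimal_dists: Sequence[Sequence[float]]) -> float:
    """l1 distance from t to the nearest distribution in a finite list (assumption)."""
    return min(sum(abs(ti - pi) for ti, pi in zip(t, P)) for P in optimal_dists)


def classify(xn: Sequence[Hashable], alphabet: Sequence[Hashable],
             cost: Dict[Hashable, float], Gamma: float,
             optimal_dists: Sequence[Sequence[float]], gamma: float) -> str:
    """Return 'near' for P_n^{gamma,c}, 'far' for P_n^{gamma}, else 'infeasible'."""
    avg_cost = sum(cost[x] for x in xn) / len(xn)  # assumed form of c(x^n)
    if avg_cost > Gamma:
        return "infeasible"
    dist = d_W(empirical_type(xn, alphabet), optimal_dists)
    return "near" if dist <= gamma else "far"


if __name__ == "__main__":
    alphabet = [0, 1]
    cost = {0: 0.0, 1: 1.0}
    optimal = [[0.5, 0.5]]  # toy stand-in for the set of optimal input distributions
    for xn in [(0, 1, 0, 1), (0, 0, 0, 0), (1, 1, 1, 0)]:
        print(xn, classify(xn, alphabet, cost, 0.5, optimal, 0.25))
```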
Definition 9

For any distribution $P\in\mathcal{P}(\mathcal{A})$ and $S\subset\mathcal{A}$ such that $P(S)>0$, define the probability measure

$$P|_{S}(x)=\begin{cases}\frac{P(x)}{P(S)}&x\in S\\ 0&\text{otherwise}.\end{cases}$$
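Definition 9 is ordinary conditioning on the event $S$. A minimal Python sketch, with a distribution represented as a dictionary from letters to probabilities (a representation chosen here for illustration):

```python
from typing import Dict, Hashable, Set


def restrict(P: Dict[Hashable, float], S: Set[Hashable]) -> Dict[Hashable, float]:
    """Return P|_S: P renormalized on S and set to zero elsewhere. Requires P(S) > 0."""
    mass = sum(prob for x, prob in P.items() if x in S)
    if mass <= 0.0:
        raise ValueError("P(S) must be positive")
    return {x: (prob / mass if x in S else 0.0) for x, prob in P.items()}


if __name__ == "__main__":
    P = {"a": 0.5, "b": 0.25, "c": 0.25}
    print(restrict(P, {"a", "b"}))
```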
Definition 10

For any $k\geq 0$ and any $x^{k}\in\mathcal{A}^{k}$, let${}^{3}$

$$\mathcal{A}_{x^{k}}=\left\{x\in\mathcal{A}:(x^{k},x)\text{ is a prefix of some }x^{n}\in\mathcal{P}_{n}^{\gamma,c}\right\}.$$

${}^{3}$For $k\geq n$, $\mathcal{A}_{x^{k}}=\emptyset$ and for $k=0$, $(x^{k},x)=(x,)$.

Fix $(a_{0},a_{1},\ldots,a_{n-1})\in\mathcal{P}_{n}^{\gamma,c}$ arbitrarily and let $\mathds{1}_{a_{i}}\in\mathcal{P}(\mathcal{A})$ denote a single point-mass distribution at $a_{i}$. Then for any $0\leq k\leq n-1$, $x^{k}\in\mathcal{A}^{k}$, $y^{k}\in\mathcal{B}^{k}$ and a controller $F$ satisfying (15), we define the controller $F_{\gamma}$ as

$$F_{\gamma}(x^{k},y^{k}):=\begin{cases}F(x^{k},y^{k})|_{\mathcal{A}_{x^{k}}}&\text{ if }F(\mathcal{A}_{x^{k}}|x^{k},y^{k})>0\\ \text{Unif}(\mathcal{A}_{x^{k}})&\text{ if }F(\mathcal{A}_{x^{k}}|x^{k},y^{k})=0\text{ and }|\mathcal{A}_{x^{k}}|\neq 0\\ \mathds{1}_{a_{k}}&\text{ otherwise}.\end{cases}$$
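The three cases of Definition 10 can be exercised on a toy example. The Python sketch below is only an illustration under stated assumptions: the target set plays the role of $\mathcal{P}_{n}^{\gamma,c}$ and is given explicitly as a set of tuples, a controller is a function returning a dictionary of next-letter probabilities, the channel is a small binary symmetric channel, and all probabilities are computed by brute-force enumeration. None of these names or representations come from the paper.

```python
import itertools
from typing import Callable, Dict, Sequence, Set, Tuple

Controller = Callable[[Tuple[int, ...], Tuple[int, ...]], Dict[int, float]]


def allowed_next(prefix: Tuple[int, ...], target: Set[Tuple[int, ...]]) -> Set[int]:
    """Letters x such that (prefix, x) is a prefix of some sequence in the target set."""
    k = len(prefix)
    return {xn[k] for xn in target if len(xn) > k and xn[:k] == prefix}


def modified_controller(F: Controller, target: Set[Tuple[int, ...]],
                        fallback: Tuple[int, ...]) -> Controller:
    """Build the modified controller from F, following the three cases of the definition."""
    def F_mod(xk: Tuple[int, ...], yk: Tuple[int, ...]) -> Dict[int, float]:
        base = F(xk, yk)
        A = allowed_next(xk, target)
        mass = sum(base.get(x, 0.0) for x in A)
        if mass > 0.0:                      # case 1: restrict F to the allowed letters
            return {x: base.get(x, 0.0) / mass for x in A}
        if A:                               # case 2: F puts no mass on a nonempty allowed set
            return {x: 1.0 / len(A) for x in A}
        return {fallback[len(xk)]: 1.0}     # case 3: point mass at a_k
    return F_mod


def prob_of_set(F: Controller, W: Dict[int, Dict[int, float]],
                target: Set[Tuple[int, ...]], inputs: Sequence[int],
                outputs: Sequence[int], n: int) -> float:
    """(F o W)(target): sum over x^n in target and all y^n of prod_k F(x_k|.) W(y_k|x_k)."""
    total = 0.0
    for xn in itertools.product(inputs, repeat=n):
        if xn not in target:
            continue
        for yn in itertools.product(outputs, repeat=n):
            pr = 1.0
            for k in range(n):
                pr *= F(xn[:k], yn[:k]).get(xn[k], 0.0) * W[xn[k]][yn[k]]
            total += pr
    return total


if __name__ == "__main__":
    n = 3
    target = {(0, 0, 1), (0, 1, 0), (1, 0, 0)}           # toy stand-in for the target set
    W = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}       # illustrative BSC
    F = lambda xk, yk: {0: 0.5, 1: 0.5}                  # uniform controller, ignores feedback
    F_mod = modified_controller(F, target, fallback=(0, 0, 1))
    print(prob_of_set(F, W, target, [0, 1], [0, 1], n))
    print(prob_of_set(F_mod, W, target, [0, 1], [0, 1], n))
```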
Remark 1

Given any controller $F$ satisfying (15), Definition 10 constructs a modified controller $F_{\gamma}$ which satisfies

$$\begin{aligned}
(F_{\gamma}\circ W)\left(\mathcal{P}_{n}^{\gamma,c}\right)&:=\sum_{x^{n}\in\mathcal{P}_{n}^{\gamma,c}}\sum_{y^{n}\in\mathcal{B}^{n}}\prod_{k=1}^{n}F_{\gamma}(x_{k}|x^{k-1},y^{k-1})W(y_{k}|x_{k})\\
&=1.
\end{aligned}\tag{22}$$

Intuitively, $F_{\gamma}$ amplifies the probability assignments of $F$ over the set $\mathcal{P}_{n}^{\gamma,c}$ and nullifies the probability assignments of $F$ over the set $\mathcal{P}_{n}^{\gamma}$ so that $X^{n}\in\mathcal{P}_{n}^{\gamma,c}$ almost surely for $(X^{n},Y^{n})\sim(F_{\gamma}\circ W)$. Definition 10 is inspired by, and corrects an error in, [4, Def. 8]. With the definition given in [4, Def. 8], the analogue of (22) does not hold, although it is asserted in the proof of [4, Thm. 3]. This can be rectified by using Definition 10 in place of [4, Def. 8]. An analogous comment applies to the next definition and [4, Def. 9].

Definition 11

For any type $t\in\mathcal{P}_{n}(\mathcal{A})$ such that $T^{n}_{\mathcal{A}}(t)\subset\mathcal{P}_{n}^{\gamma}$, $k\geq 0$ and any $x^{k}\in\mathcal{A}^{k}$, let

$$\mathcal{A}_{x^{k}}^{t}=\left\{x\in\mathcal{A}:(x^{k},x)\text{ is a prefix of some }x^{n}\in T^{n}_{\mathcal{A}}(t)\right\}.$$

Fix $(a_{0},a_{1},\ldots,a_{n-1})\in T^{n}_{\mathcal{A}}(t)$ arbitrarily and let $\mathds{1}_{a_{i}}\in\mathcal{P}(\mathcal{A})$ denote a single point-mass distribution at $a_{i}$. Then for any $0\leq k\leq n-1$, $x^{k}\in\mathcal{A}^{k}$, $y^{k}\in\mathcal{B}^{k}$ and a controller $F$ satisfying (15), we define the controller $F_{t}$ as

$$F_{t}(x^{k},y^{k}):=\begin{cases}F(x^{k},y^{k})|_{\mathcal{A}_{x^{k}}^{t}}&\text{ if }F(\mathcal{A}^{t}_{x^{k}}|x^{k},y^{k})>0\\ \text{Unif}(\mathcal{A}^{t}_{x^{k}})&\text{ if }F(\mathcal{A}^{t}_{x^{k}}|x^{k},y^{k})=0\text{ and }|\mathcal{A}^{t}_{x^{k}}|\neq 0\\ \mathds{1}_{a_{k}}&\text{ otherwise}.\end{cases}$$
Remark 2

Given any controller $F$ satisfying (15), Definition 11 constructs a modified controller $F_{t}$ which satisfies $(F_{t}\circ W)\left(T^{n}_{\mathcal{A}}(t)\right)=1$ for $T^{n}_{\mathcal{A}}(t)\subset\mathcal{P}_{n}^{\gamma}$.

Now let

$$q(y^{n})=\frac{1}{2}\prod_{i=1}^{n}Q^{*}(y_{i})+\frac{1}{2}\frac{1}{|\mathcal{P}_{n}(\mathcal{A})|}\sum_{t\in\mathcal{P}_{n}(\mathcal{A})}\prod_{i=1}^{n}q_{t}(y_{i}), \tag{23}$$

where

$$q_{t}(b):=\sum_{a\in\mathcal{A}}t(a)W(b|a).$$
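For small alphabets and blocklengths, the mixture in (23) can be evaluated directly. The Python sketch below enumerates all types in $\mathcal{P}_{n}(\mathcal{A})$, forms the induced output distributions $q_{t}$, and averages the corresponding product measures with the i.i.d. $Q^{*}$ term. The channel matrix `W[a][b]` and the vector `Q_star` are placeholder inputs; how $Q^{*}$ is obtained is not addressed here.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def types(n: int, m: int) -> List[Tuple[float, ...]]:
    """All types with denominator n on an alphabet of size m."""
    return [tuple(c / n for c in counts)
            for counts in itertools.product(range(n + 1), repeat=m)
            if sum(counts) == n]


def output_dist(t: Sequence[float], W: Sequence[Sequence[float]]) -> List[float]:
    """q_t(b) = sum_a t(a) W(b|a), with W indexed as W[a][b]."""
    return [sum(t[a] * W[a][b] for a in range(len(W))) for b in range(len(W[0]))]


def q_mixture(yn: Sequence[int], Q_star: Sequence[float],
              W: Sequence[Sequence[float]]) -> float:
    """Evaluate the mixture output distribution q(y^n) for output indices yn."""
    n = len(yn)
    all_types = types(n, len(W))
    iid_term = math.prod(Q_star[y] for y in yn)
    type_term = sum(math.prod(output_dist(t, W)[y] for y in yn)
                    for t in all_types) / len(all_types)
    return 0.5 * iid_term + 0.5 * type_term


if __name__ == "__main__":
    W = [[0.9, 0.1], [0.1, 0.9]]   # illustrative BSC
    Q_star = [0.5, 0.5]            # placeholder for the optimal output distribution
    n = 3
    total = sum(q_mixture(yn, Q_star, W) for yn in itertools.product(range(2), repeat=n))
    print(len(types(n, 2)), total)  # q is a probability distribution, so total should be 1
```

Because $q$ is a convex combination of product distributions, it sums to one over $\mathcal{B}^{n}$, which gives a quick sanity check on any implementation.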

Let $P$ denote the distribution $F\circ W$. Let $P_{\gamma}$ denote the distribution $F_{\gamma}\circ W$. Let $P_{t}$ denote the distribution $F_{t}\circ W$ for each $t\in\mathcal{P}_{n}(\mathcal{A})$ such that $T^{n}_{\mathcal{A}}(t)\subset\mathcal{P}_{n}^{\gamma}$. Note that all controllers $F$, $F_{\gamma}$ and $F_{t}$ satisfy (15). We have

$$\begin{aligned}
&P\left(\frac{W(Y^{n}|X^{n})}{q(Y^{n})}>\rho\right)\\
&=P\left(\frac{W(Y^{n}|X^{n})}{q(Y^{n})}>\rho\cap d_{W}(t(X^{n}))\leq\gamma\right)+P\left(\frac{W(Y^{n}|X^{n})}{q(Y^{n})}>\rho\cap d_{W}(t(X^{n}))>\gamma\right)\\
&=P\left(\frac{W(Y^{n}|X^{n})}{q(Y^{n})}>\rho\cap d_{W}(t(X^{n}))\leq\gamma\right)+\sum_{t:T^{n}_{\mathcal{A}}(t)\subset\mathcal{P}_{n}^{\gamma}}P\left(\frac{W(Y^{n}|X^{n})}{q(Y^{n})}>\rho\cap t(X^{n})=t\right)\\
&\stackrel{(a)}{\leq}P_{\gamma}\left(\frac{W(Y^{n}|X^{n})}{q(Y^{n})}>\rho\right)+\sum_{t:T^{n}_{\mathcal{A}}(t)\subset\mathcal{P}_{n}^{\gamma}}P_{t}\left(\frac{W(Y^{n}|X^{n})}{q(Y^{n})}>\rho\right).
\end{aligned}\tag{24}$$

Inequality $(a)$ follows from the following argument. For any $(x^n, y^n)$ such that $x^n \in \mathcal{P}_n^{\gamma,c}$, note that for all $1 \leq k \leq n$, $\mathcal{A}_{x^{k-1}} \neq \emptyset$ and $x_k \in \mathcal{A}_{x^{k-1}}$, so that

$$F(x_k | x^{k-1}, y^{k-1}) \leq F(\mathcal{A}_{x^{k-1}} | x^{k-1}, y^{k-1}). \tag{25}$$

Then

$$\begin{aligned}
P_\gamma\left((x^n, y^n)\right) &= \prod_{k=1}^{n} F_\gamma(x_k | x^{k-1}, y^{k-1})\, W(y_k | x_k) \\
&\stackrel{(b)}{\geq} \prod_{k=1}^{n} \frac{F(x_k | x^{k-1}, y^{k-1})}{F(\mathcal{A}_{x^{k-1}} | x^{k-1}, y^{k-1})}\, W(y_k | x_k) \\
&\geq \prod_{k=1}^{n} F(x_k | x^{k-1}, y^{k-1})\, W(y_k | x_k) \\
&= P((x^n, y^n)).
\end{aligned}$$

With some abuse of notation, we assume in inequality $(b)$ above that if $F(\mathcal{A}_{x^{k-1}} | x^{k-1}, y^{k-1}) = 0$, then

$$\frac{F(x_k | x^{k-1}, y^{k-1})}{F(\mathcal{A}_{x^{k-1}} | x^{k-1}, y^{k-1})} = 0,$$

which is justified by (25).
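The second inequality in the chain above, together with the zero-denominator convention, can be checked factor by factor. The following is a short worked step; the shorthand $F_k := F(x_k | x^{k-1}, y^{k-1})$ and $F_k(\mathcal{A}) := F(\mathcal{A}_{x^{k-1}} | x^{k-1}, y^{k-1})$ is introduced here only for illustration and does not appear in the original text.

```latex
\begin{align*}
0 < F_k(\mathcal{A}) \leq 1
  &\;\Longrightarrow\; \frac{F_k}{F_k(\mathcal{A})} \geq F_k, \\
F_k(\mathcal{A}) = 0
  &\;\stackrel{(25)}{\Longrightarrow}\; F_k = 0
  \;\Longrightarrow\; \frac{F_k}{F_k(\mathcal{A})} := 0 \geq F_k .
\end{align*}
```

In either case each factor of the product is at least $F_k$, so multiplying over $k$ and by the common factors $W(y_k | x_k)$ preserves the inequality.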

A similar derivation gives

$$P_t((x^n, y^n)) \geq P(x^n, y^n)$$

for all $(x^n, y^n)$ such that $c(x^n) \leq \Gamma$ and $t(x^n) = t$, where $T^n_{\mathcal{A}}(t) \subset \mathcal{P}_n^{\gamma}$.

Let $\rho = \exp\left(nC(\Gamma) + \sqrt{n}\,r\right)$, where $r$ will be specified later. Define (footnote 4: This proof follows that of [4, Thm. 3] and corrects an error therein: the $\mathcal{F}_i$ filtration defined before (98) should be defined like $\mathcal{G}_i$ here.)

$$\begin{aligned}
\mathcal{G}_i &:= \sigma(X_1, \ldots, X_{i+1}, Y_1, \ldots, Y_i) \\
Z_i &:= i(X_i, Y_i) - \mathbb{E}\left[i(X_i, Y_i) \,|\, \mathcal{G}_{i-1}\right] \\
\mathcal{F}_i &:= \sigma(Z_1, Z_2, \ldots, Z_i).
\end{aligned}$$

Two things are important to note here. First, by the Markov property $(X^{i-1}, Y^{i-1}) - X_i - Y_i$, we have $\mathbb{E}\left[i(X_i, Y_i) \,|\, \mathcal{G}_{i-1}\right] = \mathbb{E}\left[i(X_i, Y_i) \,|\, X_i\right]$ a.s. Second, $\mathcal{F}_i \subset \mathcal{G}_i$.

The first term in (24) can be upper bounded as follows:

$$\begin{aligned}
P_\gamma\left(\frac{W(Y^n|X^n)}{q(Y^n)} > \rho\right)
&\leq P_\gamma\left(\prod_{i=1}^{n} \frac{W(Y_i|X_i)}{Q^*(Y_i)} > \frac{\rho}{2}\right) \\
&= P_\gamma\left(\sum_{i=1}^{n} \left[\log\left(\frac{W(Y_i|X_i)}{Q^*(Y_i)}\right) - C(\Gamma)\right] > \sqrt{n}\,r - \log(2)\right) \\
&\stackrel{(a)}{\leq} P_\gamma\left(\sum_{i=1}^{n} \Big[i(X_i, Y_i) - \mathbb{E}\left[i(X_i, Y_i) \,|\, \mathcal{G}_{i-1}\right]\Big] > \sqrt{n}\,r - \log(2)\right) \\
&= P_\gamma\left(\sum_{i=1}^{n} Z_i > \sqrt{n}\,r - \log(2)\right).
\end{aligned} \tag{26}$$

In inequality $(a)$, we used the following lemma and the fact that $c(X^n) \leq \Gamma$ almost surely.

Lemma 2

For $\Gamma \in (\Gamma_0, \Gamma^*)$,

$$\mathbb{E}\left[i(X, Y) \,|\, X\right] \leq C(\Gamma) - C'(\Gamma)\left(\Gamma - c(X)\right) \tag{27}$$

almost surely, where $X$ has an arbitrary distribution and $Y$ is the output of the channel $W$ when $X$ is the input.

Proof: See [24, Proposition 1] and its references.
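The way Lemma 2 yields inequality $(a)$ in (26) can be spelled out in one line. The sketch below assumes, as the surrounding derivation suggests but this passage does not restate, that $i(x,y) = \log\frac{W(y|x)}{Q^*(y)}$, that $c(X^n) = \frac{1}{n}\sum_{i=1}^{n} c(X_i)$, and that $C'(\Gamma) \geq 0$ (the capacity-cost function is non-decreasing). It also uses the identity $\mathbb{E}[i(X_i,Y_i)\,|\,\mathcal{G}_{i-1}] = \mathbb{E}[i(X_i,Y_i)\,|\,X_i]$ noted above.

```latex
\begin{align*}
\sum_{i=1}^{n} \mathbb{E}\left[i(X_i, Y_i) \,|\, \mathcal{G}_{i-1}\right]
  &\leq \sum_{i=1}^{n} \Big[ C(\Gamma) - C'(\Gamma)\big(\Gamma - c(X_i)\big) \Big] \\
  &= nC(\Gamma) - n\,C'(\Gamma)\big(\Gamma - c(X^n)\big) \\
  &\leq nC(\Gamma) \qquad \text{a.s., since } c(X^n) \leq \Gamma .
\end{align*}
```

Under these assumptions, $\sum_{i}\left[i(X_i,Y_i) - C(\Gamma)\right] \leq \sum_{i} Z_i$ almost surely, so the event in the second line of (26) is contained in the event in the third line.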

We will now apply a martingale central limit theorem [25, Corollary to Theorem 2] to the expression in (26). We first verify that the hypotheses of [25, Corollary to Theorem 2] are satisfied:

  1. First, we require that

    $$\max_{1 \leq k \leq n} |Z_k| < \infty.$$

    Since $Q^*(b) > 0$ for all $b \in \mathcal{B}$ by assumption and $W(Y_k | X_k) > 0$ almost surely for each channel input and output pair $(X_k, Y_k)$, we have

    $$|Z_k| \leq \max_{a \in \mathcal{A},\, b \in \mathcal{B}:\, W(b|a) > 0} 2\,|i(a, b)| := 2 i_{\max} < \infty$$

    for all $1 \leq k \leq n$.

  2. Second, we require that

    $$\mathbb{E}\left[Z_k \,|\, \mathcal{F}_{k-1}\right] = 0$$

    almost surely for all $1 \leq k \leq n$ [25, p. 672]. This is true because $\mathbb{E}\left[Z_k \,|\, \mathcal{G}_{k-1}\right] = 0$ implies, by the tower property with $\mathcal{F}_{k-1} \subset \mathcal{G}_{k-1}$,

    $$\begin{aligned}
    \mathbb{E}\left[\mathbb{E}\left[Z_k \,|\, \mathcal{G}_{k-1}\right] \,|\, \mathcal{F}_{k-1}\right] &= 0 \\
    \mathbb{E}\left[Z_k \,|\, \mathcal{F}_{k-1}\right] &= 0.
    \end{aligned}$$
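The two hypotheses above can be sanity-checked numerically on a small example. The following is a minimal sketch, not the paper's construction: the binary channel `W` and the full-support output distribution `Q_STAR` are hypothetical values chosen only for illustration, and the conditional mean is computed given $X_k$ alone, which the Markov property noted earlier permits.

```python
import math

# Hypothetical toy channel W(b|a) with binary input and output (illustrative only).
W = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
# Assumed stand-in for Q*: any output distribution with full support.
Q_STAR = {0: 0.55, 1: 0.45}


def info_density(a, b):
    """i(a, b) = log( W(b|a) / Q*(b) ), in nats."""
    return math.log(W[a][b] / Q_STAR[b])


def conditional_mean(a):
    """E[i(X, Y) | X = a], averaging over the channel output."""
    return sum(p * info_density(a, b) for b, p in W[a].items() if p > 0)


def z_value(a, b):
    """Realization of Z_k when (X_k, Y_k) = (a, b)."""
    return info_density(a, b) - conditional_mean(a)


# i_max as defined in condition 1: maximum over pairs with W(b|a) > 0.
I_MAX = max(abs(info_density(a, b))
            for a in W for b, p in W[a].items() if p > 0)

for a in W:
    # Condition 2: the centered variable has zero mean given the input.
    mean_z = sum(p * z_value(a, b) for b, p in W[a].items())
    print(f"a={a}: E[Z | X=a] = {mean_z:.2e}")
    for b in W[a]:
        # Condition 1: |Z_k| <= 2 * i_max for every reachable pair.
        print(f"  b={b}: |Z| = {abs(z_value(a, b)):.4f}, 2*i_max = {2 * I_MAX:.4f}")
```

The bound $2 i_{\max}$ is just the triangle inequality: both $|i(a,b)|$ and its conditional mean are at most $i_{\max}$.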

Under the above two conditions, it follows from [25, Corollary to Theorem 2] that there exists a constant $\kappa > 0$ depending only on $i_{\max}$ such that for any $s \in \mathbb{R}$,

$$P_\gamma\left(\frac{1}{\sqrt{\sum_{i=1}^{n} \mathbb{E}_\gamma\left[Z_i^2\right]}} \sum_{i=1}^{n} Z_i \leq s\right) \geq \Phi(s) - \kappa\left[\frac{n \log n}{\left(\sum_{i=1}^{n} \mathbb{E}_\gamma\left[Z_i^2\right]\right)^{3/2}} + \left\| \frac{\sum_{i=1}^{n} \mathbb{E}_\gamma\left[Z_i^2 \,|\, \mathcal{F}_{i-1}\right]}{\sum_{i=1}^{n} \mathbb{E}_\gamma\left[Z_i^2\right]} - 1 \right\|_\infty^{1/2}\right]. \tag{28}$$

Using Lemma 3 in (28), we obtain

$$\begin{aligned}
P_\gamma\left(\frac{1}{\sqrt{\sum_{i=1}^{n} \mathbb{E}_\gamma\left[Z_i^2\right]}} \sum_{i=1}^{n} Z_i \leq s\right)
&\geq \Phi(s) - \kappa\left[\frac{n \log n}{\left(nV(\Gamma) - n|\mathcal{A}|\gamma\nu_{\max}\right)^{3/2}} + \left(\frac{4|\mathcal{A}|\gamma\nu_{\max}}{V(\Gamma)}\right)^{1/2}\right] \\
&\geq \Phi(s) - \beta_\gamma,
\end{aligned} \tag{29}$$

where the last inequality holds for sufficiently large $n$ and some constant $\beta_\gamma > 0$, which can be chosen such that $\beta_\gamma \to 0$ as $\gamma \to 0$.
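To see why a constant $\beta_\gamma$ with $\beta_\gamma \to 0$ suffices, it helps to evaluate the two bracketed terms in (29) separately. The sketch below is illustrative only: the function name `clt_penalty_terms` and all numerical values are hypothetical, and the unknown constant $\kappa$ is left out.

```python
import math


def clt_penalty_terms(n, V, gamma, nu_max, alphabet_size):
    """Return the two bracketed terms of (29), without the factor kappa.

    n             -- blocklength
    V             -- stands for V(Gamma)
    gamma, nu_max -- stand for gamma and nu_max
    alphabet_size -- stands for |A|
    """
    var_lower = V - alphabet_size * gamma * nu_max
    if var_lower <= 0:
        raise ValueError("gamma is too large: the variance lower bound is not positive")
    # First term: n log n / (n * var_lower)^(3/2) = log n / (sqrt(n) * var_lower^(3/2)).
    first = n * math.log(n) / (n * var_lower) ** 1.5
    # Second term: does not depend on n, only on gamma.
    second = math.sqrt(4 * alphabet_size * gamma * nu_max / V)
    return first, second


# Hypothetical parameter values, chosen only to show the trend.
for n in (10**3, 10**5, 10**7):
    first, second = clt_penalty_terms(n, V=1.0, gamma=0.01, nu_max=1.0, alphabet_size=2)
    print(f"n={n:>8}: first={first:.5f}, second={second:.5f}")
```

The first term decays like $\log n / \sqrt{n}$, so for large $n$ the bracket is dominated by the second term, which scales as $\sqrt{\gamma}$ and vanishes as $\gamma \to 0$.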

Lemma 3

We have

$$V(\Gamma) - 2\gamma\nu_{\max} \leq \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}_\gamma\left[Z_i^2\right] \leq V(\Gamma) + 2\gamma\nu_{\max}.$$

Furthermore, for $\gamma \leq \frac{V(\Gamma)}{4\nu_{\max}}$,

$$\left\| \frac{\sum_{i=1}^{n} \mathbb{E}_\gamma\left[Z_i^2 \,|\, \mathcal{F}_{i-1}\right]}{\sum_{i=1}^{n} \mathbb{E}_\gamma\left[Z_i^2\right]} - 1 \right\|_\infty \leq \frac{8\gamma\nu_{\max}}{V(\Gamma)}$$

almost surely according to the probability measure $P_\gamma$.

Proof: The proof of Lemma 3 is given in Appendix A.

Using the result in (29) and Lemma 3 in the expression in (26), we obtain

$$\begin{aligned}
P_\gamma\left(\sum_{i=1}^{n} Z_i > \sqrt{n}\,r - \log(2)\right)
&= P_\gamma\left(\frac{1}{\sqrt{\sum_{i=1}^{n} \mathbb{E}_\gamma\left[Z_i^2\right]}} \sum_{i=1}^{n} Z_i > \frac{\sqrt{n}\,r - \log(2)}{\sqrt{\sum_{i=1}^{n} \mathbb{E}_\gamma\left[Z_i^2\right]}}\right) \\
&= 1 - P_\gamma\left(\frac{1}{\sqrt{\sum_{i=1}^{n} \mathbb{E}_\gamma\left[Z_i^2\right]}} \sum_{i=1}^{n} Z_i \leq \frac{\sqrt{n}\,r - \log(2)}{\sqrt{\sum_{i=1}^{n} \mathbb{E}_\gamma\left[Z_i^2\right]}}\right) \\
&\leq \begin{cases}
1 - P_\gamma\left(\dfrac{1}{\sqrt{\sum_{i=1}^{n} \mathbb{E}_\gamma\left[Z_i^2\right]}} \sum_{i=1}^{n} Z_i \leq \dfrac{r - \frac{\log(2)}{\sqrt{n}}}{\sqrt{V(\Gamma) + |\mathcal{A}|\gamma\nu_{\max}}}\right) & \text{if } r \geq \frac{\log 2}{\sqrt{n}} \\[2ex]
1 - P_\gamma\left(\dfrac{1}{\sqrt{\sum_{i=1}^{n} \mathbb{E}_\gamma\left[Z_i^2\right]}} \sum_{i=1}^{n} Z_i \leq \dfrac{r - \frac{\log(2)}{\sqrt{n}}}{\sqrt{V(\Gamma) - |\mathcal{A}|\gamma\nu_{\max}}}\right) & \text{if } r < \frac{\log 2}{\sqrt{n}}
\end{cases} \\
&\leq \begin{cases}
1 - \Phi\left(\dfrac{r - \frac{\log(2)}{\sqrt{n}}}{\sqrt{V(\Gamma) + |\mathcal{A}|\gamma\nu_{\max}}}\right) + \beta_\gamma & \text{if } r \geq \frac{\log 2}{\sqrt{n}} \\[2ex]
1 - \Phi\left(\dfrac{r - \frac{\log(2)}{\sqrt{n}}}{\sqrt{V(\Gamma) - |\mathcal{A}|\gamma\nu_{\max}}}\right) + \beta_\gamma & \text{if } r < \frac{\log 2}{\sqrt{n}}.
\end{cases}
\end{aligned} \tag{30}$$

Let

$$r=\begin{cases}\sqrt{V(\Gamma)+|\mathcal{A}|\gamma\nu_{\max}}\,\Phi^{-1}\left(\epsilon+3\beta_{\gamma}\right)+\frac{\log 2}{\sqrt{n}}&\text{ if }\epsilon\in\left[\frac{1}{2}-3\beta_{\gamma},1\right)\\ \sqrt{V(\Gamma)-|\mathcal{A}|\gamma\nu_{\max}}\,\Phi^{-1}\left(\epsilon+3\beta_{\gamma}\right)+\frac{\log 2}{\sqrt{n}}&\text{ if }\epsilon\in\left(0,\frac{1}{2}-3\beta_{\gamma}\right).\end{cases}\tag{31}$$

Note that the upper bound in (30) holds for any $r$. For a given error probability $\epsilon\in(0,1)$, we choose $r$ according to (31). Then using (31) in (30), we obtain for any given $\epsilon\in(0,1)$ that

$$P_{\gamma}\left(\sum_{i=1}^{n}Z_{i}>\sqrt{n}\,r-\log(2)\right)\leq 1-\epsilon-2\beta_{\gamma}.\tag{32}$$

Inequality (32) provides an upper bound on the first term in (24).
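The substitution of (31) into (30) can be checked numerically. The following is a minimal sketch, not part of the original proof: the numeric values of $V(\Gamma)$, $|\mathcal{A}|$, $\gamma$, $\nu_{\max}$, $\beta_{\gamma}$ and $n$ are hypothetical placeholders, and the function names `choose_r` and `bound_30` are introduced here only for illustration. It evaluates the right-hand side of (30) at the $r$ from (31) and compares it with $1-\epsilon-2\beta_{\gamma}$.

```python
from math import log, sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution (CDF Phi and its inverse)


def choose_r(eps, n, V, A_size, gamma, nu_max, beta):
    """r as in (31). All numeric inputs are illustrative placeholders."""
    if eps >= 0.5 - 3 * beta:
        scale = sqrt(V + A_size * gamma * nu_max)
    else:
        scale = sqrt(V - A_size * gamma * nu_max)
    return scale * PHI.inv_cdf(eps + 3 * beta) + log(2) / sqrt(n)


def bound_30(r, n, V, A_size, gamma, nu_max, beta):
    """Right-hand side of the last inequality in (30)."""
    shift = log(2) / sqrt(n)
    if r >= shift:
        scale = sqrt(V + A_size * gamma * nu_max)
    else:
        scale = sqrt(V - A_size * gamma * nu_max)
    return 1 - PHI.cdf((r - shift) / scale) + beta


if __name__ == "__main__":
    # Hypothetical parameters; gamma < V / (4 * A_size * nu_max) as in the proof.
    params = dict(n=10_000, V=0.5, A_size=2, gamma=0.01, nu_max=1.0, beta=0.01)
    for eps in (0.1, 0.3, 0.6, 0.9):
        r = choose_r(eps, **params)
        print(eps, bound_30(r, **params), 1 - eps - 2 * params["beta"])
```

The two case splits agree because $r\geq\frac{\log 2}{\sqrt{n}}$ exactly when $\Phi^{-1}(\epsilon+3\beta_{\gamma})\geq 0$, that is, when $\epsilon\geq\frac{1}{2}-3\beta_{\gamma}$.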

We now upper bound the second term in (24).

Using again the choice of $q$ in (23), we have

$$\begin{aligned}
&\sum_{t:T^{n}_{\mathcal{A}}(t)\subset\mathcal{P}_{n}^{\gamma}}P_{t}\left(\frac{W(Y^{n}|X^{n})}{q(Y^{n})}>\rho\right)\\
&\leq\sum_{t:T^{n}_{\mathcal{A}}(t)\subset\mathcal{P}_{n}^{\gamma}}P_{t}\left(\frac{W(Y^{n}|X^{n})}{\prod_{i=1}^{n}q_{t}(Y_{i})}>\frac{\rho}{2|\mathcal{P}_{n}(\mathcal{A})|}\right)\\
&\leq\sum_{t:T^{n}_{\mathcal{A}}(t)\subset\mathcal{P}_{n}^{\gamma}}P_{t}\left(\sum_{i=1}^{n}\log\frac{W(Y_{i}|X_{i})}{q_{t}(Y_{i})}>nC(\Gamma)+\sqrt{n}\,r-\log\left(2(n+1)^{|\mathcal{A}|}\right)\right)\\
&\stackrel{(a)}{=}\sum_{t:T^{n}_{\mathcal{A}}(t)\subset\mathcal{P}_{n}^{\gamma}}W\left(\sum_{i=1}^{n}\log\frac{W(Y_{i}|x_{t,i})}{q_{t}(Y_{i})}>nC(\Gamma)+\sqrt{n}\,r-\log\left(2(n+1)^{|\mathcal{A}|}\right)\right),
\end{aligned}\tag{33}$$

where in equality $(a)$, $(x_{t,1},\ldots,x_{t,n})$ is an arbitrary sequence from the type class $T^{n}_{\mathcal{A}}(t)$. Equality $(a)$ holds because under the probability measure $P_{t}$, $t(X^{n})=t$ a.s. (see Remark 2) and the distribution of

$$\sum_{i=1}^{n}\log\frac{W(Y_{i}|X_{i})}{q_{t}(Y_{i})}$$

depends on $X^{n}$ only through its type. Continuing from (33), we have

$$\begin{aligned}
&\sum_{t:T^{n}_{\mathcal{A}}(t)\subset\mathcal{P}_{n}^{\gamma}}P_{t}\left(\frac{W(Y^{n}|X^{n})}{q(Y^{n})}>\rho\right)\\
&\leq\sum_{t:T^{n}_{\mathcal{A}}(t)\subset\mathcal{P}_{n}^{\gamma}}W\Bigg(\sum_{i=1}^{n}\left[\log\frac{W(Y_{i}|x_{t,i})}{q_{t}(Y_{i})}-\mathbb{E}_{W}\left[\log\frac{W(Y|x_{t,i})}{q_{t}(Y)}\right]\right]>\\
&\qquad\qquad\qquad n\left[C(\Gamma)-I(t,W)\right]+\sqrt{n}\,r-\log\left(2(n+1)^{|\mathcal{A}|}\right)\Bigg)\\
&\leq\sum_{t:T^{n}_{\mathcal{A}}(t)\subset\mathcal{P}_{n}^{\gamma}}W\left(\sum_{i=1}^{n}\left[\log\frac{W(Y_{i}|x_{t,i})}{q_{t}(Y_{i})}-\mathbb{E}_{W}\left[\log\frac{W(Y|x_{t,i})}{q_{t}(Y)}\right]\right]>n\frac{K}{2}\right)
\end{aligned}\tag{34}$$

where the last inequality holds for sufficiently large $n$ because $r$, as defined in (31), is an $O(1)$ term, and from the construction of the set $\mathcal{P}_{n}^{\gamma}$, we have

$$\inf_{t:T^{n}_{\mathcal{A}}(t)\subset\mathcal{P}_{n}^{\gamma}}\,d_{W}(t)\geq\gamma>0$$

which implies

$$\inf_{t:T^{n}_{\mathcal{A}}(t)\subset\mathcal{P}_{n}^{\gamma}}\left[C(\Gamma)-I(t,W)\right]>K$$

for some constant $K>0$.

Let $i_{\max,t}:=\max_{a,b:q_{t}(b)W(b|a)>0}\big|\log\frac{W(b|a)}{q_{t}(b)}\big|$. We now show that $i_{\max,t}\leq 2\log n$ for all $t$. Let $W_{\min}:=\min_{a,b:W(b|a)>0}W(b|a)$ and $q_{\min,t}:=\min_{b:q_{t}(b)>0}q_{t}(b)$. Then

$$\begin{aligned}
q_{\min,t}&=\min_{b:q_{t}(b)>0}\sum_{a\in\mathcal{A}}t(a)W(b|a)\\
&\geq\min_{a,b:W(b|a)>0}W(b|a)\,\min_{a:t(a)>0}t(a)\\
&\geq\frac{W_{\min}}{n},
\end{aligned}$$
where the last step holds because every positive entry of a type $t$ with denominator $n$ is at least $1/n$.
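The lower bound $q_{\min,t}\geq W_{\min}/n$ can be checked by brute force on a small example. The sketch below is only an illustration: the channel matrix `W` is a hypothetical toy channel, not one from the paper, and the helper names are introduced here. It enumerates every type with denominator $n$ and checks the bound for each.

```python
from itertools import product

# Hypothetical toy channel: W[a][b] = W(b|a). Zero entries are allowed.
W = [
    [0.9, 0.1, 0.0],
    [0.0, 0.2, 0.8],
]

# Smallest positive transition probability, W_min in the text.
W_MIN = min(w for row in W for w in row if w > 0)


def types(n, k):
    """Yield all types (empirical distributions) with denominator n on k letters."""
    for counts in product(range(n + 1), repeat=k):
        if sum(counts) == n:
            yield [c / n for c in counts]


def q_min(t):
    """Smallest positive entry of the output distribution q_t(b) = sum_a t(a) W(b|a)."""
    q = [sum(t[a] * W[a][b] for a in range(len(W))) for b in range(len(W[0]))]
    return min(x for x in q if x > 0)


def bound_holds(n):
    """Check q_min(t) >= W_min / n for every type t (small float tolerance)."""
    return all(q_min(t) >= W_MIN / n - 1e-12 for t in types(n, len(W)))


if __name__ == "__main__":
    print(all(bound_holds(n) for n in range(1, 30)))
```

The enumeration grows like $(n+1)^{|\mathcal{A}|}$, so this check is only practical for small alphabets and blocklengths.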

Thus,

$$\begin{aligned}
i_{\max,t}&=\max_{a,b:q_{t}(b)W(b|a)>0}\Big|\log\frac{W(b|a)}{q_{t}(b)}\Big|\\
&\leq\max_{a,b:q_{t}(b)W(b|a)>0}\big|\log W(b|a)\big|+\max_{b:q_{t}(b)>0}\big|\log q_{t}(b)\big|\\
&\leq\log\frac{n}{W_{\min}^{2}}\\
&\leq 2\log n
\end{aligned}$$

for all sufficiently large $n$. Hence, we can use Azuma's inequality [26, (33), p. 61] to upper bound (34), giving us

$$\begin{aligned}
&\sum_{t:T^{n}_{\mathcal{A}}(t)\subset\mathcal{P}_{n}^{\gamma}}P_{t}\left(\frac{W(Y^{n}|X^{n})}{q(Y^{n})}>\rho\right)\\
&\leq(n+1)^{|\mathcal{A}|}\exp\left(-\frac{nK^{2}}{128\log^{2}n}\right)
\end{aligned}\tag{35}$$

which goes to zero as $n\to\infty$.
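To get a feel for how fast (35) decays, the sketch below evaluates the natural logarithm of the bound for a few blocklengths. It is an illustration under stated assumptions: the values of $K$ and $|\mathcal{A}|$ are hypothetical, and the function name `log_bound_35` is introduced here. The constant $128$ is consistent with the standard bounded-difference form of Azuma's inequality, $\exp(-\lambda^{2}/(2nc^{2}))$, taking $\lambda=nK/2$ and $c=2\,i_{\max,t}\leq 4\log n$; whether [26] states the inequality in exactly this form is an assumption.

```python
from math import log


def log_bound_35(n, A_size, K):
    """Natural log of (n+1)^{|A|} * exp(-n K^2 / (128 log^2 n)), the bound in (35)."""
    return A_size * log(n + 1) - n * K**2 / (128 * log(n) ** 2)


if __name__ == "__main__":
    # Hypothetical constants: |A| = 2 and K = 0.1.
    for n in (10**4, 10**6, 10**9, 10**12):
        print(n, log_bound_35(n, A_size=2, K=0.1))
```

With these placeholder constants the bound is vacuous (its logarithm is positive) at moderate blocklengths and becomes small only for very large $n$, which is consistent with the statement being asymptotic.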

Substituting the upper bounds (32) and (35) in (24), we obtain

$$(F\circ W)\left(\frac{W(Y^{n}|X^{n})}{q(Y^{n})}>\exp\left(nC(\Gamma)+\sqrt{n}\,r\right)\right)\leq 1-\epsilon-\beta_{\gamma}$$

for sufficiently large $n$. Since the controller $F$ was arbitrary, we can apply Lemma 1 to obtain

$$\begin{aligned}
\log M^{*}_{\text{fb}}(n,\epsilon,\Gamma)&\leq nC(\Gamma)+\sqrt{n}\,r-\log\beta_{\gamma}\\
\frac{\log M^{*}_{\text{fb}}(n,\epsilon,\Gamma)-nC(\Gamma)}{\sqrt{n}}&\leq r-\frac{\log\beta_{\gamma}}{\sqrt{n}}\\
\limsup_{n\to\infty}\frac{\log M^{*}_{\text{fb}}(n,\epsilon,\Gamma)-nC(\Gamma)}{\sqrt{n}}&\leq r^{\prime},
\end{aligned}$$

where $r^{\prime}$ is obtained from the expression of $r$ in (31) after taking the limit as $n\to\infty$, i.e.,

$$r^{\prime}=\begin{cases}\sqrt{V(\Gamma)+|\mathcal{A}|\gamma\nu_{\max}}\,\Phi^{-1}\left(\epsilon+3\beta_{\gamma}\right)&\text{ if }\epsilon\in\left[\frac{1}{2}-3\beta_{\gamma},1\right)\\ \sqrt{V(\Gamma)-|\mathcal{A}|\gamma\nu_{\max}}\,\Phi^{-1}\left(\epsilon+3\beta_{\gamma}\right)&\text{ if }\epsilon\in\left(0,\frac{1}{2}-3\beta_{\gamma}\right).\end{cases}$$

Finally, since $\frac{V(\Gamma)}{4|\mathcal{A}|\nu_{\max}}>\gamma>0$ was arbitrary, we can take $\gamma$ and $\beta_{\gamma}$ arbitrarily small, giving us the converse result

$$\limsup_{n\to\infty}\frac{\log M^{*}_{\text{fb}}(n,\epsilon,\Gamma)-nC(\Gamma)}{\sqrt{n}}\leq\sqrt{V(\Gamma)}\,\Phi^{-1}(\epsilon).$$
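The limiting step can be illustrated numerically. The sketch below is a hedged illustration: the values of $V(\Gamma)$, $|\mathcal{A}|$ and $\nu_{\max}$ are hypothetical, and tying $\beta_{\gamma}$ to $\gamma$ by setting them equal is purely an assumption made here for the demonstration (the proof only needs both to be arbitrarily small). It shows the gap between $r^{\prime}$ and $\sqrt{V(\Gamma)}\,\Phi^{-1}(\epsilon)$ shrinking as $\gamma\to 0$.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def r_prime(eps, V, A_size, gamma, nu_max, beta):
    """r' as in the case expression above; inputs are illustrative placeholders."""
    z = PHI.inv_cdf(eps + 3 * beta)
    sign = 1 if eps >= 0.5 - 3 * beta else -1
    return sqrt(V + sign * A_size * gamma * nu_max) * z


def socr_limit(eps, V):
    """sqrt(V) * Phi^{-1}(eps), the claimed limit of r'."""
    return sqrt(V) * PHI.inv_cdf(eps)


def gap(eps, gamma, V=0.5, A_size=2, nu_max=1.0):
    """Gap r' - sqrt(V) Phi^{-1}(eps), with beta set equal to gamma (an assumption)."""
    return r_prime(eps, V, A_size, gamma, nu_max, beta=gamma) - socr_limit(eps, V)


if __name__ == "__main__":
    for gamma in (1e-2, 1e-3, 1e-4):
        print(gamma, gap(0.1, gamma))
```

In both cases of the definition, $r^{\prime}$ is at least $\sqrt{V(\Gamma)}\,\Phi^{-1}(\epsilon)$, so the gap is nonnegative and the converse bound is approached from above.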

Since this matches the optimal non-feedback SOCR of simple-dispersion DMCs with a peak-power cost constraint, we have

$$\lim_{n\to\infty}\frac{\log M^{*}_{\text{fb}}(n,\epsilon,\Gamma)-nC(\Gamma)}{\sqrt{n}}=\sqrt{V(\Gamma)}\,\Phi^{-1}(\epsilon)\tag{36}$$

for simple-dispersion DMCs with a peak-power cost constraint.

Appendix A Proof of Lemma 3

We have

\begin{align*}
\sum_{i=1}^{n}\mathbb{E}_{\gamma}\left[Z_{i}^{2}\,\middle|\,\mathcal{G}_{i-1}\right]
&=\sum_{i=1}^{n}\mathbb{E}_{\gamma}\left[\left(i(X_{i},Y_{i})-\mathbb{E}\left[i(X_{i},Y_{i})\,\middle|\,X_{i}\right]\right)^{2}\,\middle|\,\mathcal{G}_{i-1}\right]\\
&=\sum_{i=1}^{n}\mathbb{E}_{\gamma}\left[\left(i(X_{i},Y_{i})-\mathbb{E}\left[i(X_{i},Y_{i})\,\middle|\,X_{i}\right]\right)^{2}\,\middle|\,X_{i}\right]\\
&=\sum_{i=1}^{n}\text{Var}\left(i(X_{i},Y_{i})\,\middle|\,X_{i}\right)\\
&=\sum_{i=1}^{n}\sum_{a\in\mathcal{A}}\mathds{1}\left(X_{i}=a\right)\nu_{a}\\
&=n\sum_{a\in\mathcal{A}}P_{X^{n}}(a)\nu_{a}.
\end{align*}
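The last two equalities are a counting argument: summing $\nu_{X_i}$ over time is the same as weighting each $\nu_a$ by how often $a$ appears in $X^n$. The following is a minimal numerical sketch of that identity, not part of the original proof. The toy channel `W`, the reference output distribution `Q`, and the form $i(a,b)=\log\left(W(b|a)/Q(b)\right)$ are illustrative assumptions, as are all function and variable names.

```python
import math
import random


def conditional_variances(W, Q):
    """Return nu_a = Var(i(a, Y)) for Y ~ W(.|a).

    Assumes i(a, b) = log(W(b|a) / Q(b)); this form is an assumption
    made for illustration only.
    """
    nus = []
    for row in W:
        terms = [(p, math.log(p / q)) for p, q in zip(row, Q) if p > 0]
        mean = sum(p * v for p, v in terms)
        nus.append(sum(p * (v - mean) ** 2 for p, v in terms))
    return nus


# Hypothetical toy channel: row a is the output distribution W(.|a).
W = [[0.8, 0.1, 0.1],
     [0.1, 0.8, 0.1],
     [0.3, 0.3, 0.4]]
Q = [0.4, 0.35, 0.25]  # hypothetical reference output distribution

nu = conditional_variances(W, Q)

random.seed(0)
n = 50
x = [random.randrange(len(W)) for _ in range(n)]  # an arbitrary input sequence

# Left-hand side: sum over time of Var(i(X_i, Y_i) | X_i) = nu_{X_i}.
lhs = sum(nu[xi] for xi in x)

# Right-hand side: n * sum_a P_{X^n}(a) * nu_a, where P_{X^n} is the type of x.
type_x = [x.count(a) / n for a in range(len(W))]
rhs = n * sum(p * v for p, v in zip(type_x, nu))

print(abs(lhs - rhs) < 1e-9)
```

The identity holds for each realized sequence, so it does not matter how the sequence was generated. This is why it remains valid when $X_i$ depends on past channel outputs through feedback.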

From Remark 1, since $d_{W}(t(X^{n}))\leq\gamma$ a.s., there exists a $\tilde{P}\in\Pi_{W,\Gamma}^{*}$ such that $\|t(X^{n})-\tilde{P}\|_{1}\leq 2\gamma$. Hence,

\begin{align*}
nV(\Gamma)-2n\gamma\nu_{\max}
&\leq\sum_{i=1}^{n}\mathbb{E}_{\gamma}\left[Z_{i}^{2}\,\middle|\,\mathcal{G}_{i-1}\right]\leq nV(\Gamma)+2n\gamma\nu_{\max}
\end{align*}

a.s., where we used the fact that $W$ is simple-dispersion at the cost $\Gamma$. Furthermore, since $\mathcal{F}_{i-1}\subset\mathcal{G}_{i-1}$,

\begin{align*}
nV(\Gamma)-2n\gamma\nu_{\max}
&\leq\sum_{i=1}^{n}\mathbb{E}_{\gamma}\left[Z_{i}^{2}\,\middle|\,\mathcal{F}_{i-1}\right]\leq nV(\Gamma)+2n\gamma\nu_{\max}
\end{align*}

a.s. and

\begin{align*}
nV(\Gamma)-2n\gamma\nu_{\max}
&\leq\sum_{i=1}^{n}\mathbb{E}_{\gamma}\left[Z_{i}^{2}\right]\leq nV(\Gamma)+2n\gamma\nu_{\max}.
\end{align*}
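The first and last of the three sandwich bounds above can be traced in a few lines. The sketch below rests on three readings inferred from context rather than restated here: that simple-dispersion at the cost $\Gamma$ means $\sum_{a\in\mathcal{A}}\tilde{P}(a)\nu_{a}=V(\Gamma)$ for every $\tilde{P}\in\Pi_{W,\Gamma}^{*}$, that $\nu_{\max}=\max_{a\in\mathcal{A}}\nu_{a}$, and that $t(X^{n})$ and $P_{X^{n}}$ both denote the type of $X^{n}$.

```latex
\begin{align*}
% Step 1: compare the type of X^n with a nearby \tilde{P}.
\left|\sum_{a\in\mathcal{A}}P_{X^{n}}(a)\nu_{a}-V(\Gamma)\right|
&=\left|\sum_{a\in\mathcal{A}}\left(P_{X^{n}}(a)-\tilde{P}(a)\right)\nu_{a}\right|\\
&\leq\nu_{\max}\,\|t(X^{n})-\tilde{P}\|_{1}
\leq 2\gamma\nu_{\max}\quad\text{a.s.}
\end{align*}
% Multiplying by n gives the bound on the sum conditioned on \mathcal{G}_{i-1}.
\begin{align*}
% Step 2: take expectations, using the tower property and linearity.
\sum_{i=1}^{n}\mathbb{E}_{\gamma}\left[Z_{i}^{2}\right]
&=\mathbb{E}_{\gamma}\left[\sum_{i=1}^{n}\mathbb{E}_{\gamma}\left[Z_{i}^{2}\,\middle|\,\mathcal{G}_{i-1}\right]\right]
\in\left[nV(\Gamma)-2n\gamma\nu_{\max},\;nV(\Gamma)+2n\gamma\nu_{\max}\right].
\end{align*}
```

The second step uses only that the expectation of a random variable lying almost surely in an interval also lies in that interval.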

Finally, we have

\begin{align*}
\Bigg|\frac{\sum_{i=1}^{n}\mathbb{E}_{\gamma}[Z_{i}^{2}|\mathcal{F}_{i-1}]}{\sum_{i=1}^{n}\mathbb{E}_{\gamma}[Z_{i}^{2}]}-1\Bigg|
&\leq\Bigg|\frac{V(\Gamma)+2\gamma\nu_{\max}}{V(\Gamma)-2\gamma\nu_{\max}}-1\Bigg|\\
&=\Bigg|\frac{4\gamma\nu_{\max}}{V(\Gamma)-2\gamma\nu_{\max}}\Bigg|\\
&\leq\frac{8\gamma\nu_{\max}}{V(\Gamma)},
\end{align*}

assuming $\gamma\leq\frac{V(\Gamma)}{4\nu_{\max}}$.
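The role of the condition on $\gamma$ in the final inequality can be made explicit. The following short derivation is a sketch that additionally assumes $V(\Gamma)>0$ and $\nu_{\max}>0$, so that all denominators are positive.

```latex
\begin{align*}
\gamma\leq\frac{V(\Gamma)}{4\nu_{\max}}
&\;\Longrightarrow\;2\gamma\nu_{\max}\leq\frac{V(\Gamma)}{2}
\;\Longrightarrow\;V(\Gamma)-2\gamma\nu_{\max}\geq\frac{V(\Gamma)}{2}>0,\\
\frac{4\gamma\nu_{\max}}{V(\Gamma)-2\gamma\nu_{\max}}
&\leq\frac{4\gamma\nu_{\max}}{V(\Gamma)/2}
=\frac{8\gamma\nu_{\max}}{V(\Gamma)}.
\end{align*}
```

As a purely illustrative check with hypothetical values $V(\Gamma)=1$, $\nu_{\max}=2$ and $\gamma=0.1$ (which satisfies $\gamma\leq 1/8$), the left side is $0.8/0.6\approx 1.33$ and the right side is $1.6$.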

Acknowledgment

This research was supported by the US National Science Foundation under grant CCF-1956192.

References

  • [1] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, 2nd ed. Cambridge University Press, 2011.
  • [2] V. Strassen, “Asymptotische Abschätzungen in Shannon’s Informationstheorie,” in Proc. Trans. 3rd Prague Conf. Inf. Theory, Prague, Czech, 1962, pp. 689–723.
  • [3] M. Hayashi, “Information spectrum approach to second-order coding rate in channel coding,” IEEE Transactions on Information Theory, vol. 55, no. 11, pp. 4947–4966, 2009.
  • [4] A. B. Wagner, N. V. Shende, and Y. Altuğ, “A new method for employing feedback to improve coding performance,” IEEE Transactions on Information Theory, vol. 66, no. 11, pp. 6660–6681, 2020.
  • [5] Y. Y. Shkel, V. Y. F. Tan, and S. C. Draper, “Second-order coding rate for $m$-class source-channel codes,” in 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2015, pp. 620–626.
  • [6] M. Tomamichel and V. Y. F. Tan, “Second-order coding rates for channels with state,” IEEE Transactions on Information Theory, vol. 60, no. 8, pp. 4427–4448, 2014.
  • [7] C. E. Shannon, “Probability of error for optimal codes in a Gaussian channel,” The Bell System Technical Journal, vol. 38, no. 3, pp. 611–656, 1959.
  • [8] A. Mahmood and A. B. Wagner, “Channel coding with mean and variance cost constraints,” in 2024 IEEE International Symposium on Information Theory (ISIT), 2024, pp. 510–515.
  • [9] R. Yates, “A framework for uplink power control in cellular radio systems,” IEEE Journal on Selected Areas in Communications, vol. 13, no. 7, pp. 1341–1347, 1995.
  • [10] A. Goldsmith and P. Varaiya, “Capacity of fading channels with channel side information,” IEEE Transactions on Information Theory, vol. 43, no. 6, pp. 1986–1992, 1997.
  • [11] S. Hanly and D. Tse, “Multiaccess fading channels. II. Delay-limited capacities,” IEEE Transactions on Information Theory, vol. 44, no. 7, pp. 2816–2831, 1998.
  • [12] A. Lozano and N. Jindal, “Transmit diversity vs. spatial multiplexing in modern MIMO systems,” IEEE Transactions on Wireless Communications, vol. 9, no. 1, pp. 186–197, 2010.
  • [13] Y. Polyanskiy, “Channel coding: Non-asymptotic fundamental limits,” Ph.D. dissertation, Dept. Elect. Eng., Princeton Univ., Princeton, NJ, USA, 2010.
  • [14] A. Sahai, S. Draper, and M. Gastpar, “Boosting reliability over AWGN networks with average power constraints and noiseless feedback,” in Proceedings. International Symposium on Information Theory, 2005. ISIT 2005., 2005, pp. 402–406.
  • [15] S. L. Fong and V. Y. F. Tan, “Asymptotic expansions for the AWGN channel with feedback under a peak power constraint,” in 2015 IEEE International Symposium on Information Theory (ISIT), 2015, pp. 311–315.
  • [16] A. Mahmood and A. B. Wagner, “Timid/bold coding for channels with cost constraints,” in 2023 IEEE International Symposium on Information Theory (ISIT), 2023, pp. 1442–1447.
  • [17] V. Kostina and S. Verdú, “Channels with cost constraints: Strong converse and dispersion,” IEEE Transactions on Information Theory, vol. 61, no. 5, pp. 2415–2429, 2015.
  • [18] S. L. Fong and V. Y. F. Tan, “A tight upper bound on the second-order coding rate of the parallel Gaussian channel with feedback,” IEEE Transactions on Information Theory, vol. 63, no. 10, pp. 6474–6486, 2017.
  • [19] W. Yang, G. Caire, G. Durisi, and Y. Polyanskiy, “Optimum power control at finite blocklength,” IEEE Transactions on Information Theory, vol. 61, no. 9, pp. 4598–4615, 2015.
  • [20] V. Y. F. Tan and M. Tomamichel, “The third-order term in the normal approximation for the AWGN channel,” IEEE Transactions on Information Theory, vol. 61, no. 5, pp. 2430–2438, 2015.
  • [21] R. G. Gallager, Information Theory and Reliable Communication. New York, NY, USA: Wiley, 1968.
  • [22] C. Shannon, “The zero error capacity of a noisy channel,” IRE Transactions on Information Theory, vol. 2, no. 3, pp. 8–19, 1956.
  • [23] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Transactions on Information Theory, vol. 56, no. 5, pp. 2307–2359, 2010.
  • [24] A. Mahmood and A. B. Wagner, “Channel coding with mean and variance cost constraints,” 2024. [Online]. Available: https://arxiv.org/abs/2401.16417
  • [25] E. Bolthausen, “Exact convergence rates in some martingale central limit theorems,” The Annals of Probability, vol. 10, no. 3, pp. 672–688, Aug. 1982.
  • [26] B. Bercu, B. Delyon, and E. Rio, Concentration Inequalities for Sums and Martingales. Cham, Switzerland: Springer, 2015.