INSTITUTE OF PHYSICS PUBLISHING
JOURNAL OF PHYSICS A: MATHEMATICAL AND GENERAL
J. Phys. A: Math. Gen. 38 (2005) 10859–10872
doi:10.1088/0305-4470/38/49/024
Random matrix theory of multi-antenna
communications: the Ricean channel
Aris L Moustakas1,3 and Steven H Simon2
1 Department of Physics, University of Athens, Panepistimiopolis, Athens 15784, Greece
2 Bell Labs, Lucent Technologies, 600 Mountain Avenue, Murray Hill, NJ 07974, USA
Received 25 July 2005, in final form 26 July 2005
Published 22 November 2005
Online at stacks.iop.org/JPhysA/38/10859
Abstract
The use of multi-antenna arrays in wireless communications through disordered
media promises huge increases in the information transmission rate. It is
therefore important to analyse the information capacity of such systems in
realistic situations of microwave transmission, where the statistics of the
transmission amplitudes (channel) may be coloured. Here, we present an
approach that provides analytic expressions for the statistics, i.e. the moments
of the distribution, of the mutual information for general Gaussian channel
statistics. The mathematical method applies tools developed originally in the
context of coherent wave propagation in disordered media, such as random
matrix theory and replicas. Although it is valid formally for large antenna
numbers, this approach produces extremely accurate results even for arrays with
as few as two antennas. We also develop a method to analytically optimize over
the input signal distribution, which enables us to calculate analytic capacities
when the transmitter has knowledge of the statistics of the channel. The
emphasis of this paper is on elucidating the novel mathematical methods used.
We do this by analysing a specific case when the channel matrix is a complex
Gaussian with arbitrary mean and unit covariance, which is usually called the
Ricean channel.
PACS numbers: 84.40.Ba, 02.10.Yn, 07.50.Qx
1. Introduction
Recently, there has been increased interest in using multi-antenna arrays simultaneously in transmission and reception of microwave signals. Theoretical work in [1] and [2] has
shown that for sufficiently rich scattering environments the Shannon capacity of an nt -element
transmitting and an nr -element receiving array is roughly proportional to min(nr , nt ) for large
numbers of antennas. This is significantly more than the usual logarithmic increase in
3 Author to whom correspondence should be addressed.
0305-4470/05/4910859+14$30.00 © 2005 IOP Publishing Ltd Printed in the UK
capacity for increasing antenna numbers when only a single path (plane-wave) connects the
transmission and reception arrays (line-of-sight). Intuitively, one can understand this result by
observing that if the scattering is rich enough, then there is an independent channel from each
transmission antenna to each reception antenna. Therefore, one can send independent signals
from each transmission antenna.
Both indoor [3] and outdoor [4] measurements have shown promising throughput gains
for such MIMO (multiple input multiple output) technologies. Therefore it is important to be
able to assess the capacity gains of MIMO technologies in realistic situations. Such situations
include the case where spatial correlations between transmission and/or reception antennas
exist, which tend to reduce the effective number of independent channels (paths) between the
transmitter and the receiver [5]. Another interesting situation corresponds to the case where
the transmitter has only partial knowledge of the channel itself. This may arise when the channel feedback to the transmitter from the receiver (which is assumed always to know the channel) is impaired or noisy. This channel is usually called the Ricean channel in the literature.
For large antenna numbers and Gaussian channel distributions the analysis of MIMO
ergodic capacities (expectation value of mutual information over channel realizations) is
greatly facilitated by the use of asymptotic techniques of random matrix theory (RMT). These
methods were introduced in this context by various authors, starting with Foschini [1] and
Telatar [2]. Verdú and Shamai [6] derived the expression of the capacity for infinite antennas
with uncorrelated channels in the context of CDMA codes, and more recently Rapajic and
Popescu [7] applied it in the context of multi-antenna systems. Lozano and Tulino [8], using methods developed by Tse and Hanly [9], calculated the infinite antenna-number capacity
with spatially uncorrelated channels and uncorrelated interferers. Also, Chuah et al [10]
extended the results of [6] to calculate the mutual information of spatially correlated channels
in the infinite antenna limit. In all previous studies, the capacity has been assumed to be
asymptotically proportional to min(nt , nr ) and only the corresponding proportionality factor
was studied as the number of antennas grows indefinitely. However, for finite antenna numbers or for slowly decaying spatial correlations, the capacity cannot be described as simply linear in the antenna number. For example, as shown in figure 2
of [5], for square arrays with λ/2 spacing the capacity per antenna does not reach the limiting
value even at 10 000 antennas per array! In [5], using techniques developed by Sengupta and
Mitra [11] in a different context, a method was developed to analytically calculate the capacity
of spatially correlated channels for large but finite antenna numbers.
In this paper we extend work done in [5, 12, 13] to provide analytic expressions for the
statistics of the mutual information in the case of the Ricean channel. We apply a method from
physics, known as the replica approach, to analyse the resulting random matrix problems. The
replica approach, first introduced in [14], has been used heavily in physics for understanding
random systems [15, 16]. One of the first applications of this method in communication theory
was by Sourlas [17] in the field of error correcting codes. More recently, this method has seen
increased application in the field of information theory [18–20].
We will use the replica method to average over the channel realizations and obtain moments
of the distribution of the mutual information. We find that for large antenna numbers n, the
second and third moments of the mutual information are of O(1) and O(1/n), respectively, which
shows that the mutual information distribution approaches a Gaussian. It should be noted that
recently the generating function of the mutual information for the Ricean channel [21, 22],
as well as some other channels [23], was calculated in closed form using other methods.
Nevertheless, the replica method produces far simpler equations, while the equations resulting
from other methods become increasingly difficult to evaluate for large antenna numbers.
Surprisingly, the replica method also gives accurate results when applied to arrays with as few as two or three antennas. Thus, this analytic approach provides a powerful tool for analysing
antenna systems with even a few antennas.
In the remainder of this section we provide some notational definitions used in this paper
(section 1.1) and define several quantities of interest (section 1.2). In section 2 we describe
the mathematical framework of the methods used to calculate the statistics of the mutual
information. In the next section (section 3) we maximize over the input signal distribution to
calculate the maximum mutual information (channel capacity). In section 4 we discuss the
Gaussian character of the distribution by presenting a few numerical examples. Finally, in the
appendix we provide without proof several useful identities regarding complex integrals.
1.1. Notation
1.1.1. Vectors/matrices. Throughout this paper we will use bold-faced upper-case letters to
denote matrices, e.g. X, with elements given by Xab , bold-faced lower-case letters for column
vectors, e.g. x, with elements xa , and non-bold lower-case letters for scalar quantities. Also
the superscripts T and † will indicate transpose and Hermitian conjugate operations and In will
represent the n-dimensional identity matrix. In addition, K = M ⊗ N will denote the tensor (Kronecker) product between the m × m-dimensional matrix M and the n × n-dimensional matrix N.
1.1.2. Order of number of antennas O(nk ). We will be examining quantities in the limit
when all nt , nr , ni are large but their ratios are fixed and finite. We will denote collectively
the order in an expansion over the antenna numbers as O(n), O(1), O(1/n), etc, irrespective
of whether the particular term involves nt , nr .
1.1.3. Integral measures. In this paper we will be dealing with two general types of integrals
over matrix elements. We will therefore adopt the following notation for their corresponding
integration measures. In the first type we will be integrating over the real and imaginary parts
of the elements of a complex mrows × mcols matrix X. The integral measure will be denoted by
DX = ∏_{a=1}^{m_rows} ∏_{α=1}^{m_cols} (d Re X_aα d Im X_aα)/(2π).    (1)
The second type of integration is over pairs of complex square matrices T and R. Each
element of T and R will be integrated over a contour in the complex plane (to be specified).
The corresponding measure will be described as
dµ(T, R) = ∏_{a=1}^{ν} ∏_{α=1}^{ν} (dT_aα dR_αa)/(2πi).    (2)
1.2. Definitions
We consider the case of single-user transmission from nt transmit antennas to nr receive
antennas over a narrow band fading channel. The received nr -dimensional complex signal
vector y can be written as
y = √(ρ/nt) Gx + z,    (3)
where G is an nr × nt complex matrix with the channel coefficients from the transmitting to
the receiving arrays, while x and z are nt - and nr -dimensional vectors of the transmitted signal
and the additive noise, respectively, both assumed to be zero-mean Gaussian. The signal
covariance Q = E[xx† ] is normalized so that Tr Q = nt . For simplicity, the noise vector z is
assumed to be white with unit covariance, normalized so that E[zz† ] = Inr . Finally, ρ is the
signal-to-noise ratio (SNR). It is assumed that the receiver knows the channel matrix G and
ρ. The transmitter, on the other hand, knows only the statistics of the noise, p(y|{x, G}), as
well as the statistics of the channel p(G).
The associated mutual information, i.e. the reduction of the entropy of the random variable x given knowledge of the received signal y, can be expressed as [1]
I(y; x|G) = h(x|G) − h(x|y, G) = log det( Inr + (ρ/nt) GQG† ).    (4)
The log above (and throughout the whole paper) represents the natural logarithm and thus I is
expressed in nats.
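In code, (4) is a single log-determinant evaluation. A minimal sketch follows; the dimensions, the SNR and the isotropic choice Q = Int below are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
nt, nr, rho = 3, 4, 2.0

# one channel realization with unit-variance complex Gaussian entries
G = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
Q = np.eye(nt)  # isotropic input signal covariance, Tr Q = nt

# equation (4): I = log det(Inr + (rho/nt) G Q G^dagger), in nats
M = np.eye(nr) + (rho / nt) * G @ Q @ G.conj().T
I_nats = np.linalg.slogdet(M)[1]
```

Here slogdet is used rather than det so that the evaluation stays numerically stable for larger arrays.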
Due to the underlying randomness of G, I is also a random quantity. The average mutual
information can then be expressed as
⟨I(y; x|G)⟩ = ⟨ log det( Inr + (ρ/nt) GQG† ) ⟩.    (5)
When this is maximized over the signal covariance Q, one obtains the capacity of the channel.
This is the maximum error-free information transmission rate when the channel matrix G
varies through its whole distribution p(G). This is why it is also called the ergodic capacity.
Another important metric of the information capacity of the channel, the so-called outage
capacity [24], is obtained by inverting the expression below with respect to Iout ,
Pout = Prob(I < Iout),    (6)
where Prob(I < Iout) is the probability that the mutual information is less than a given value Iout.
In this review we will analyse the case where G has Gaussian statistics with a non-zero
mean and unit covariance, i.e. with
⟨G_aα⟩ = µ_aα √p_c    (7)
⟨G_aα G*_bβ⟩ − ⟨G_aα⟩⟨G_bβ⟩* = p_d δ_αβ δ_ab.    (8)
(8)
In the above, pd signifies the fraction of the power due to diffuse (ergodic) components of the
channel, while pc = 1 − pd is the stationary part of the channel, which is not averaged over.
These are related to the commonly used Ricean factor K, by K = pc /pd . We will also be
using the notation ρd = ρpd and ρc = ρpc for the diffuse and stationary fractions of the signal
to noise ratio.
The simple form of the covariance matrix in (8) has been shown [5, 25] to be the leading
term in a controlled approximation for diffusive environments. More complicated correlations
between matrix elements have been analysed elsewhere [13]. The non-zero mean of G can be
thought to be present due to a non-diffusive and non-ergodic term in the propagation process,
such as a line-of-sight component.
We define the notation ⟨·⟩ to denote the ensemble average over channel realizations. Thus for O(G), an arbitrary function of G, we have
⟨O⟩ = ∫ DG exp( −(1/2) Tr{(G − µ)(G − µ)†} ) O.    (9)
2. Mathematical framework
The purpose of this paper is to analyse the statistics of the mutual information I in (4) for
Gaussian channels having statistics given by (7). In this section we introduce the mathematical
framework necessary for deriving analytic expressions of the cumulant moments of I. This
method was introduced in this context in [12, 13]. We first introduce the generating function g(ν) of I:
g(ν) = ⟨ det( Inr + (ρ/nt) GQG† )^{−ν} ⟩ = ⟨e^{−νI}⟩ = 1 − ν⟨I⟩ + (ν^2/2)⟨I^2⟩ + · · · .    (10)
Assuming that g(ν) is analytic at least in the vicinity of ν = 0, we can express log g(ν) as
follows:
log g(ν) = −ν⟨I⟩ + ∑_{p=2}^{∞} ((−ν)^p/p!) Cp,    (11)
where Cp is the pth cumulant moment of I. For example, C2 = Var(I) = ⟨(I − ⟨I⟩)^2⟩ is the variance and C3 = Sk(I) = ⟨(I − ⟨I⟩)^3⟩ is the skewness of the distribution. It should be noted that ⟨I⟩ and Cp for p > 2 are implicit functions of Q and ρ. For notational simplicity we
will be suppressing this dependence. Thus to obtain the moments of the mutual information
distribution we need to calculate g(ν) for ν in the vicinity of ν = 0. This is not necessarily any
easier than evaluating the moments Cp directly, which is a notoriously difficult task, since one
has to average products of logarithms of random quantities. In contrast, averaging g(ν) for
integer values of ν involves averages over integer powers of determinants of random quantities,
in which case some analytic progress can be made. We will therefore make the following
assumption:
Assumption 1 (replica method). g(ν) evaluated for positive integer values of ν can be
analytically continued for real ν, specifically in the vicinity of ν = 0+ .
This assumption, used also in [15, 16, 20], alleviates the problem of dealing with averages
of logarithms of random quantities, since the logarithm is obtained after calculating g(ν). It
has seen widespread use in the field of physics for more than 25 years [14], and, in many cases [11], can be shown to produce exactly the same results as systematic series expansions.
Thus, the replica method can be seen essentially as a bookkeeping tool. Here, we will be using
it without any direct proof, although we will be comparing some of our final results to Monte
Carlo simulations to demonstrate their validity.
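Such a Monte Carlo comparison is easy to sketch: draw channels with the statistics (7), (8), evaluate I of (4) for each realization, and form empirical cumulants. All parameter values, and the randomly drawn mean µ, below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
nt = nr = 4
rho, p_d = 1.0, 0.5            # SNR and diffuse power fraction
p_c = 1.0 - p_d                # stationary (Ricean) power fraction
mu = (rng.standard_normal((nr, nt))
      + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)  # example fixed component

def mutual_info(G):
    M = np.eye(nr) + (rho / nt) * G @ G.conj().T          # Q = Int
    return np.linalg.slogdet(M)[1]

# sample G = sqrt(p_c) mu + sqrt(p_d) W, with W having unit-variance entries
I = np.array([
    mutual_info(np.sqrt(p_c) * mu + np.sqrt(p_d)
                * (rng.standard_normal((nr, nt))
                   + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2))
    for _ in range(20000)
])
C1, C2 = I.mean(), I.var()          # empirical mean and variance
C3 = ((I - C1) ** 3).mean()         # empirical third cumulant (skewness)
```

As n grows, the empirical variance stays O(1) while the standardized skewness C3/C2^{3/2} shrinks, in line with the results derived below.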
Therefore, in the following analysis we will assume ν to be an arbitrary positive integer.
Using identity 1 in the appendix we can write g(ν) as
g(ν) = ⟨ ∫ DX e^{−(1/2)Tr{X†X}} exp( −(ρ/(2nt)) Tr{X†GQG†X} ) ⟩    (12)
where X is a complex nr × ν matrix. The bracketed quantity in (12) can be rewritten, using identity 2 (with A = B = √(ρ/nt) Q^{1/2}G†X, where Q^{1/2} is the matrix square root of Q, which is well defined since Q is non-negative definite) by introducing an integral over a
complex nt × ν matrix Y,
exp( −(ρ/(2nt)) Tr{X†GQG†X} )
= ∫ DY e^{−(1/2)Tr{Y†Y}} exp( −√(ρ/(4nt)) Tr{X†GQ^{1/2}Y − Y†Q^{1/2}G†X} ).    (13)
At this point we can use identity 2 to integrate over G. Combining the result of (9), (12), (13),
g(ν) can be expressed as
g(ν) = ∫ DX DY exp( −(1/2) Tr{ X†X + Y†Y + (ρd/(2nt)) Y†QY X†X } )
× exp( −√(ρc/(4nt)) Tr{X†µQ^{1/2}Y − Y†Q^{1/2}µ†X} ).    (14)
To make progress we now need to express the term in the exponent proportional to ρd in a quadratic form in terms of X, Y. This can be done by using identity 3 and introducing ν × ν matrices T, R. Thus the last term in the exponent of (14) becomes
exp( −(ρd/(4nt)) Tr{Y†QY X†X} ) = ∫ dµ(T, R) exp( Tr{TR} − √(ρd/(4nt)) Tr{T Y†QY + X†X R} ).    (15)
The application of identity 3 and the introduction of matrices R and T is a particular form of
a Hubbard–Stratonovich transformation. The usefulness of this method is in that it allows the
integration of certain quantities (X, Y) of limited relevance, and introduces auxiliary quantities
(such as R and T ), which will prove to have particular importance in the final answer. This is
in a sense a mean-field theory approach, which will end up being exact in the large-n limit.
Combining (14), (15) and using identity 1, we can now integrate out X, Y, resulting in
g(ν) = ∫ dµ(T, R) e^{−S}    (16)
where
S = −Tr{TR} + log det( Inr ⊗ Iν + √(ρd/nt) Inr ⊗ R )
+ log det( Int ⊗ Iν + √(ρd/nt) Q ⊗ T + (ρc/nt) Q^{1/2}µ†µQ^{1/2} ⊗ ( Iν + √(ρd/nt) R )^{−1} ).    (17)
In the above, ν is still a positive integer, which should be taken to zero following
assumption 1. However, before doing this, we will first take the limit of large antenna numbers
nt , nr ≫ 1. In this limit the saddle-point method of evaluating the integral (described below)
becomes accurate. Subsequently, we will take the ν → 0+ limit.
Assumption 2 (interchanging 2 limits). The limits n → ∞ and ν → 0+ in evaluating g(ν) in
(16) can be interchanged by first taking the former and then the latter without changing the
final answer.
As we shall see below, the two limits of large antenna numbers and small ν are related: higher
terms in the expansion in ν involve successively higher terms in a 1/n expansion.
We will now use the saddle-point method to calculate the above integral. In the case of
(16) and (17), when nt , nr ≫ 1, the exponent S is nominally of order O(νn). Following
assumption 2 and thus keeping ν fixed we may apply the saddle-point approximation to
calculate the integral of (16) and then take the ν → 0+ limit. It should be stressed that for a
fixed positive integer ν, the saddle-point analysis of S and g(ν) is a straightforward exercise
in asymptotic analysis. The only additional complexity to the standard textbook treatment of
this topic [26] is that S involves integrals over multiple variables (the elements of T , R).
Rather than search for saddle-point solutions over all possible complex T and R matrices,
we are going to invoke the following hypothesis without proof.
Assumption 3 (replica invariance). The relevant saddle-point solution for (16) involves
matrices T , R, which are invariant in replica space and thus are proportional to the identity
matrix Iν .
The above hypothesis, used heavily in physics [15, 16], basically states that there is no
preferred direction in the space of replicas, and thus if any saddle-point solution is valid, so is
any unitary transformation thereof in replica space. Although we are not going to provide a
proof here, it should be noted that in [11] it is shown that the results obtained by this method
are identical to those using a systematic expansion.
2.1. Saddle-point analysis
The assumed form of T and R at the saddle point is t√nt Iν and r√nt Iν, respectively. The extra factor of √nt has been included for convenience, as will become evident below. To consider the vicinity around the saddle point, we thus rewrite T, R as
T = t√nt Iν + δT,    R = r√nt Iν + δR    (18)
where δ T, δ R are ν × ν matrices representing deviations around the saddle point. One can
then expand S of (17) in a Taylor series of increasing powers of δ T, δ R as follows:
S = S0 + S1 + S2 + S3 + S4 + · · ·
(19)
with Sp containing pth-order terms in δ T, δ R. These terms can be obtained explicitly by
differentiating (17):
S0 = ν [ nr log(1 + √ρd r) − nt rt + Tr log( Int + √ρd tQ + ρc Qµ†µ/(nt(1 + √ρd r)) ) ] ≡ νΓ    (20)

S1 = (Tr{δR}/√nt) [ (√ρd/(1 + √ρd r)) ( nr − Tr{ (ρc/nt) Qµ†µ [ (1 + √ρd r)(Int + √ρd tQ) + (ρc/nt) Qµ†µ ]^{−1} } ) − nt t ]
+ (Tr{δT}/√nt) [ Tr{ √ρd (1 + √ρd r) Q [ (1 + √ρd r)(Int + √ρd tQ) + (ρc/nt) Qµ†µ ]^{−1} } − nt r ]    (21)

S2 = (1/2) ∑_{a,b=1}^{ν} [δR_ab δT_ab] Σ [δR_ba δT_ba]^T,    (22)
where the 2 × 2 Hessian Σ is given by
Σ11 = −ρd (nr − nt)/(nt (1 + √ρd r)^2) − (ρd/nt) Tr{ (Int + √ρd tQ)^2 [ (1 + √ρd r)(Int + √ρd tQ) + (ρc/nt) Qµ†µ ]^{−2} }
Σ22 = −(ρd/nt) (1 + √ρd r)^2 Tr{ Q^2 [ (1 + √ρd r)(Int + √ρd tQ) + (ρc/nt) Qµ†µ ]^{−2} }
Σ12 = Σ21 = −( 1 − (ρd ρc/nt^2) Tr{ µ†µ Q^2 [ (1 + √ρd r)(Int + √ρd tQ) + (ρc/nt) Qµ†µ ]^{−2} } ).    (23)
For p > 2 the expanded terms can be written as
Sp = −((−1)^p/p) (ρd/nt)^{p/2} [ (nr − nt) Tr{(δR)^p}/(1 + √ρd r)^p
+ Tr{ ( [ (Int + √ρd tQ) δR + (1 + √ρd r) Q δT ] [ (1 + √ρd r)(Int + √ρd tQ) + (ρc/nt) Qµ†µ ]^{−1} )^p } ].    (24)
The saddle-point solution of (16) and hence the corresponding values of t, r are found by
demanding that S is stationary with respect to variations in T , R [26], resulting in S1 = 0.
This produces the following saddle-point equations:
r/(√ρd (1 + √ρd r)) = (1/nt) Tr{ Q [ (1 + √ρd r)(Int + √ρd tQ) + (ρc/nt) Qµ†µ ]^{−1} }    (25)

t(1 + √ρd r)/√ρd = nr/nt − (1/nt) Tr{ (ρc/nt) Qµ†µ [ (1 + √ρd r)(Int + √ρd tQ) + (ρc/nt) Qµ†µ ]^{−1} }.    (26)
(26)
It is interesting to note that the solutions to the above two equations extremize Γ for real and positive t, r. Note also that for generic full-rank matrices Q and µ†µ, both r and t are generally of order unity, r, t = O(1). Thus the expansion coefficients multiplying the terms δT^p, δR^p, etc are generally of order O(n^{1−p/2}), successively decreasing in size for increasing p.
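As an illustration of how the saddle-point equations are used in practice, consider the purely diffuse limit ρc = 0 with Q = Int. The traces then collapse and, after a short calculation, (25) and (26) reduce to the scalar relations r = √ρd/(1 + √ρd t) and t = (nr/nt)√ρd/(1 + √ρd r), which can be solved by fixed-point iteration. A sketch under these simplifying assumptions:

```python
import numpy as np

def rayleigh_saddle(nt, nr, rho_d, iters=200):
    """Iterate the scalar form of (25), (26) for rho_c = 0, Q = I."""
    s = np.sqrt(rho_d)
    r = t = 0.5                        # any positive starting point works here
    for _ in range(iters):
        r = s / (1.0 + s * t)
        t = (nr / nt) * s / (1.0 + s * r)
    # Gamma of (20) in the same limit
    gamma = nr * np.log1p(s * r) + nt * np.log1p(s * t) - nt * r * t
    return gamma, r, t

gamma, r, t = rayleigh_saddle(nt=4, nr=4, rho_d=1.0)
```

For nt = nr and ρd = 1 the iteration converges to r = t = (√5 − 1)/2, and γ/nt reproduces the classical large-n Rayleigh value 2 log((1 + √(1 + 4ρ))/2) − (√(1 + 4ρ) − 1)^2/(4ρ) ≈ 0.580 nats per antenna.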
The small parameter controlling this approximation is therefore n^{−1/2}, making the saddle-point solution increasingly accurate for large n. The aim of this analysis is therefore to calculate successively higher order terms in n^{−1/2} and to assign each resulting term, according to its power of ν, to the appropriate cumulant moment Cp in (11). This matching of powers of ν implicitly assumes that the expansion in (11) is valid for integer ν, as described in assumption 1. Thus, for example, terms of orders O(νn) and O(ν/n) will both contribute to ⟨I⟩, as we shall see.
2.2. Ergodic mutual information
We start with the leading term of g(ν) in the saddle-point approximation: g(ν) ≈ exp(−S0) = exp(−νΓ), where Γ is evaluated in (20) using (25) and (26). We thus see from (10), (11) that the leading term in the expansion of ⟨I⟩ is Γ and note that ⟨I⟩ = O(n).
2.3. Variance of the mutual information
To obtain the O(ν 2 ) term in the expansion of log g(ν) in (11) we need to include the next
non-vanishing term, S2 . Thus, for the moment we neglect higher order terms Sp for p > 2.
Noting the measure-preserving transformation T, R → δT, δR of (18) we have
g(ν) = e^{−S0} ∫ dµ(δT, δR) e^{−S2}
= e^{−S0} ∫ dµ(δT, δR) exp( −(1/2) ∑_{a,b=1}^{ν} [δR_ab δT_ab] Σ [δR_ba δT_ba]^T ).    (27)
To diagonalize the exponent of the above equation, we rotate each pair δT, δR to a new basis of ν × ν-dimensional matrices W1 and W2. In particular, [W1,ab W2,ab]^T = U [δR_ab δT_ab]^T, where U is an orthogonal matrix such that UΣU^T is a diagonal matrix with the diagonal given by v = [v1 v2]^T, the vector of eigenvalues of Σ. We may now rewrite (27) as
g(ν) = e^{−S0} ∫ dµ(W1, W2) exp( −(1/2) ∑_{a,b=1}^{ν} [ v1 W1,ab W1,ba + v2 W2,ab W2,ba ] ).    (28)
We now take appropriate paths to integrate W1,2,ab , resulting in
g(ν) = e^{−S0} |v1 v2|^{−ν^2/2} = e^{−S0} |det Σ|^{−ν^2/2}.    (29)
Comparing (11) to (29) and matching order by order the terms of the ν-Taylor expansion of the
exponent of g(ν), we can identify the leading term in the variance of the mutual information
to be
⟨I^2⟩ − ⟨I⟩^2 = C2 = −log|det Σ| + · · · .    (30)
We note that since Σij = O(1), the variance is also O(1) in the expansion in n^{−1/2} when both nt and nr are of the same order. (However, if nr is fixed while nt increases, we find that C2 = O(n^{−1}), in agreement with [27].) Also, we see that no term proportional to ν is produced from S2. Thus no term of O(1) in the antenna number appears in ⟨I⟩, resulting in ⟨I⟩ = Γ + o(1).
2.4. Higher order terms
To obtain higher order corrections in the small parameter n−1/2 , we need to take into account
the terms Sp for p > 2 in (24). Details of the method can be found elsewhere [13]. For
simplicity, here we will briefly describe how to set up the perturbation expansion and discuss
its implications for the distribution of the mutual information for large n.
We define an expectation bracket of an arbitrary operator O(δ T, δ R) as
⟨⟨O⟩⟩ = |det Σ|^{ν^2/2} ∫ dµ(δT, δR) e^{−S2} O(δT, δR).    (31)
The integration over δ T, δ R is performed as described in the previous section. We can obtain
the expectations of quadratic terms in δ T, δ R, written here in a compact form as follows:
⟨⟨ [δR_ab δT_ab]^T [δR_cd δT_cd] ⟩⟩ = δ_ad δ_bc Σ^{−1}.    (32)
In addition, the expectation of any odd power of δ T, δ R must vanish by symmetry. As a
result, only integer powers of 1/n survive in the perturbative expansion.
With this bracket notation we can rewrite
g(ν) = e^{−S0} |det Σ|^{−ν^2/2} ⟨⟨ e^{−(S3 + S4 + ···)} ⟩⟩    (33)
= e^{−νΓ} |det Σ|^{−ν^2/2} [ 1 − ⟨⟨S3 + S4 + · · ·⟩⟩ + (1/2)⟨⟨(S3 + S4 + · · ·)^2⟩⟩ − · · · ].    (34)
At this point, it is interesting to count powers of n in the various terms of the expansion. Using simple power-counting arguments, we see that the term Sp is of order n^{−p/2+1}, but its average vanishes for p odd. Also, ⟨⟨Sp Sq⟩⟩ is of order n^{−p/2−q/2+2} but is zero for p + q odd, and so forth. By regrouping the terms in the above expansion by their order in 1/n we obtain the following expansion:
g(ν) = e^{−S0} |det Σ|^{−ν^2/2} [ 1 + D1 + D2 + D3 + · · · ]    (35)
where
D1 = ⟨⟨ −S4 + (1/2) S3^2 ⟩⟩
D2 = ⟨⟨ −S6 + (1/2) S4^2 + S3 S5 − (1/2) S3^2 S4 + (1/24) S3^4 ⟩⟩
...    (36)
Here we have regrouped all terms which are of order 1/n^p into the term Dp. Thus, for example, D1 contains all terms of order 1/n. As in [13], we can evaluate the averages above using (32) and Wick's theorem. We find that D1 produces 1/n corrections to the mean mutual information and its skewness. All additional terms Dp provide higher order corrections in 1/n. We thus see that in the large-n limit only the first two cumulants of the distribution survive. Therefore, asymptotically the mutual information distribution approaches a Gaussian.
2.5. Summary of results
To summarize, we have derived the following results, with Γ given by (20), Σ given by (23), and with the parameters r and t given by (25) and (26):
⟨I⟩ = Γ + O(1/n)    (37)
C2 = Var(I) = ⟨I^2⟩ − ⟨I⟩^2 = −log|det Σ| + O(1/n^2)    (38)
C3 = Sk(I) = O(1/n).    (39)
The expansion can be continued to higher order in 1/n straightforwardly. We thus see that for
large n the distribution of the mutual information approaches a Gaussian. Similar results have
been obtained for other types of statistics of the Gaussian matrix G [13].
3. Capacity-achieving signal covariance Q
As discussed in section 1.2, instead of instantaneous channel information, the transmitter has
statistical information for the channel G, namely only µ and ρ are known. Based on this
information, the signal covariance can be optimized to maximize a particular metric of the
mutual information distribution. Here we describe how to optimize Q in order to maximize the average mutual information, keeping only the O(n) term in (20), i.e. to find max_Q ⟨I⟩, in the large antenna limit.
We start by observing that Γ depends on Q through the last term in (20). Expressing the determinant of this term in the eigen-basis of µ†µ, it can be written as
det( Int + Q̃ ( √ρd t Int + ρc M̃/(1 + √ρd r) ) )    (40)
where M̃ is a diagonal matrix with the eigenvalues of µ†µ/nt, given by µk^2 for k = 1, . . . , nt, on the diagonal, where µk are the singular values of µ/√nt, and Q̃ is the original matrix Q expressed in the eigen-basis of µ†µ. We now use the fact that for any non-negative definite matrix A, det(A) ≤ ∏_k A_kk, where A_kk are the diagonal elements [2]. Applying this inequality to (40) we get
det( Int + Q̃ ( √ρd t Int + ρc M̃/(1 + √ρd r) ) ) ≤ ∏_{k=1}^{nt} ( 1 + Q̃kk ( √ρd t + ρc µk^2/(1 + √ρd r) ) )    (41)
with equality when Q̃ is diagonal. Thus the Q maximizing Γ is simultaneously diagonalizable with µ†µ.
Once the optimal eigen-basis of Q has been determined to be the same as that of µ†µ, one needs to find its optimal eigenvalues qk for k = 1, . . . , nt. Γ has to be optimized subject to the power constraint Tr Q = nt. This constraint is enforced by adding a Lagrange multiplier Λ to Γ, i.e.
Γ → Γ − Λ ( ∑_{k=1}^{nt} qk − nt ).    (42)
Maximizing (42), it is easy to see that the optimal eigenvalues of Q are then given by
qk = [ 1/Λ − (1 + √ρd r)/( √ρd t (1 + √ρd r) + ρc µk^2 ) ]_+ ,    (43)
where [x]_+ = (x + |x|)/2, i.e. max(x, 0). Here, Λ > 0 is determined by imposing the power constraint
Tr Q = ∑_{k=1}^{nt} qk = nt.    (44)
Solving (43), (44) together with (25), (26) allows us to calculate ⟨I⟩ and Var(I) in (37), (38) to obtain the ergodic capacity and the variance of the distribution around it. Note that the optimization over Q at the transmitter is based on statistical rather than instantaneous information about the channel. Therefore it depends only on statistical quantities (µ†µ and ρ rather than G itself). As a result, the transmitter needs to be updated with channel information only at a relatively slow rate.
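Numerically, imposing (44) amounts to a one-dimensional root search for the Lagrange multiplier. The sketch below bisects on the multiplier; the saddle-point values r, t and the eigenvalues µk^2 used in the example are placeholder inputs for illustration (in practice they come from solving (25), (26)):

```python
import numpy as np

def optimal_q(mu2, rho_c, rho_d, r, t, iters=200):
    """Eigenvalues q_k of (43), with the multiplier fixed by Tr Q = nt, (44)."""
    nt = len(mu2)
    s = np.sqrt(rho_d)
    c = (1 + r * s) / (s * t * (1 + r * s) + rho_c * mu2)   # per-mode threshold

    def q_of(lam):
        return np.maximum(1.0 / lam - c, 0.0)

    lo, hi = 1e-9, 1e9              # bracket for the Lagrange multiplier
    for _ in range(iters):          # bisection: sum(q) decreases as lam grows
        lam = np.sqrt(lo * hi)
        if q_of(lam).sum() > nt:
            lo = lam
        else:
            hi = lam
    return q_of(lam)

# example with placeholder saddle-point values r = t = 0.6
q = optimal_q(np.array([0.1, 0.5, 1.0, 2.0]), rho_c=0.5, rho_d=0.5, r=0.6, t=0.6)
```

Note that modes with larger µk^2 receive more power under (43), since their effective channel gain is larger.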
4. Validity of the Gaussian approximation N(⟨I⟩, Var(I))
In section 2.5 we saw that in the limit of large n, the distribution of the mutual information
approaches a Gaussian with mean Γ and variance equal to the calculated variance of I, in the
sense that all higher moments and corrections tend to zero. Surprisingly, this approximation
is valid even for a small number of antennas. We demonstrate this property by comparing numerically the Gaussian distribution N(⟨I⟩, Var(I)), calculated using (37) and (38), with the simulated distribution resulting from the generation of a large number of random matrix realizations. This comparison can be seen in figure 1, where the agreement, not only with a Gaussian shape but also with the calculated means and variances, is striking for both small
[Figure 1: probability distribution of the mutual information for nt = nr = 2, 10; y-axis: percentage, x-axis: mutual information (nats).]
Figure 1. Cumulative distribution (CDF) of mutual information per antenna (I /nt ) for
nr = nt = 2, 10. The signal to noise ratio is set to ρ = 1, while the diffuse component is
50% of the power ρd = ρc = 0.5. The dotted lines correspond to the numerically generated
curves, while the solid lines are the theoretical ones, generated as Gaussian distributions with mean
and variance the ones obtained by applying the methods of this paper. We see that the agreement
is very good. We also see that for n = 10 the distribution is more narrowly peaked. In both
cases the non-zero mean component of the channel (µ) was generated randomly from a Gaussian
distribution. Among the pairs corresponding to the same number of antennas, those to the left have
the transmission covariance matrix Q optimized with respect to the known µ† µ, while those to the
right simply have Q = Int .
n = 2 and large n = 10 antenna numbers. This allows us to accurately calculate not only the
ergodic capacity but also the outage capacity, defined in (6).
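The Gaussian fit can also be probed directly in code by comparing the empirical distribution of I with a normal law of matched mean and variance. The sketch below uses a purely diffuse 4 × 4 channel; all parameter choices are illustrative:

```python
import math
import numpy as np

rng = np.random.default_rng(2)
nt = nr = 4
rho = 1.0

# sample the mutual information over many channel realizations
I = []
for _ in range(5000):
    G = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
    M = np.eye(nr) + (rho / nt) * G @ G.conj().T
    I.append(np.linalg.slogdet(M)[1])
I = np.sort(np.array(I))

z = (I - I.mean()) / I.std()                     # standardize
emp = (np.arange(len(z)) + 0.5) / len(z)         # empirical CDF
gauss = np.array([0.5 * (1 + math.erf(x / math.sqrt(2))) for x in z])
ks = np.abs(emp - gauss).max()                   # Kolmogorov-Smirnov distance
```

For this example the resulting distance is small (of the order of a per cent), consistent with the close agreement seen in figure 1.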
5. Conclusion
In conclusion, we have presented an analytic approach to calculate the statistics of the mutual information of MIMO systems for the case of Ricean statistics. To this end we applied tools developed for the analysis of mesoscopic systems, such as replicas and random matrix theory. In addition, we have used this method to find the optimal signal covariance Q and thus analytically calculate the capacity when the statistics of the (Gaussian) channel are known at the transmitter. These methods, although formally valid for large antenna numbers, apply with very high accuracy to arrays with only a few antennas. This allows us to accurately
evaluate the outage capacity for any number of antennas. We demonstrated this by comparing
to numerical simulations.
This analytic approach provides the framework and a simple tool to accurately analyse
the statistics of throughput of even small arrays. It is a simple example where mesoscopic
physics methods can have technological applications.
Acknowledgments
We wish to acknowledge enlightening discussions with Harold Baranger and Anirvan
Sengupta. After this work was completed we became aware of a work [28] that, using
different methods, provides analytic results only for the average of the mutual information in
the infinite-antenna limit.
Appendix. Complex integrals
Identity 1. Let M be a Hermitian positive-definite square m × m matrix, and let X be a
complex m × n matrix. Then
\[
(\det M)^{-n} = \int DX \, e^{-\frac{1}{2}\operatorname{Tr}\{X^\dagger M X\}} \tag{A.1}
\]
where the integration measure DX is given by (1).
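Identity (A.1) is easy to test numerically in the scalar case. The sketch below assumes (since the measure (1) is defined outside this excerpt) that DX normalizes each complex entry of X as d(Re x) d(Im x)/(2π); under that assumption, for m = n = 1 the angular integral cancels the 1/(2π) and the identity reduces to a one-dimensional radial Gaussian integral:

```python
import numpy as np
from scipy.integrate import quad

# Scalar (m = n = 1) check of identity (A.1), assuming the measure DX
# normalizes each complex entry as d(Re x) d(Im x) / (2*pi).
# In polar coordinates the angular integral supplies a factor 2*pi that
# cancels the normalization, leaving
#   (det M)^{-1} = int_0^inf exp(-M r^2 / 2) r dr = 1/M.
M = 3.0
val, err = quad(lambda r: np.exp(-M * r**2 / 2) * r, 0, np.inf)
assert abs(val - 1.0 / M) < 1e-10
```

The same radial reduction works for any positive M, which is the scalar analogue of the positive-definiteness condition on the matrix M.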
Identity 2. Let X, A, B be m × n complex matrices. Then, the following equality holds:
\[
\int DX \exp\left[-\frac{1}{2}\operatorname{Tr}\{X^\dagger X + A^\dagger X - X^\dagger B\}\right] = \exp\left[-\frac{1}{2}\operatorname{Tr}\{A^\dagger B\}\right]. \tag{A.2}
\]
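Identity (A.2) can likewise be checked in the scalar case. The sketch below takes m = n = 1 with real a, b (a simplifying choice made here, not in the original text) and the same assumed measure d(Re x) d(Im x)/(2π). Writing x = u + iv, the imaginary part of the exponent contributes a factor cos((a+b)v/2) after the odd sine part integrates to zero by symmetry:

```python
import numpy as np
from scipy.integrate import dblquad

# Scalar (m = n = 1) check of identity (A.2) with real a, b, assuming
# DX = d(Re x) d(Im x) / (2*pi). With x = u + i*v the exponent is
#   -1/2 (u^2 + v^2 + (a - b) u) - i (a + b) v / 2,
# and only the even cosine part of the oscillatory factor survives.
a, b = 0.4, 0.7
integrand = lambda v, u: (np.exp(-0.5 * (u*u + v*v + (a - b)*u))
                          * np.cos(0.5 * (a + b) * v))
val, err = dblquad(integrand, -10.0, 10.0, -10.0, 10.0)
# Dividing out the 2*pi of the measure should recover exp(-a*b/2).
assert abs(val / (2 * np.pi) - np.exp(-0.5 * a * b)) < 1e-6
```

Completing the square in u and v separately reproduces the right-hand side analytically: the u shift contributes exp((a−b)²/8), the v shift exp(−(a+b)²/8), and their product is exp(−ab/2).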
Identity 3 (Hubbard–Stratonovich transformation). Let U, V be arbitrary complex ν × ν
matrices, where ν is an arbitrary positive integer. Then the following identity holds:
\[
\exp\left[-\operatorname{Tr}\{UV\}\right] = \int d\mu(T, R) \exp\left[\operatorname{Tr}\{RT - UT - RV\}\right]. \tag{A.3}
\]
In the above equation, the auxiliary matrices T, R are general complex ν × ν matrices and
their integration measure is given by (2). The elements of R and T are integrated along
contours in the complex plane parallel to the real and imaginary axes.
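For ν = 1 the mechanism behind (A.3) can be sketched as follows (schematically; the contour choice r = iρ with ρ real is made here for illustration, and the normalization fixed by the measure (2) is absorbed into dµ). The r integral along its imaginary contour produces a delta function, which the t integral then saturates:

```latex
\int_{-\infty}^{\infty} \frac{d\rho}{2\pi}\, e^{i\rho(t - v)} \, e^{-ut}
  \;\longrightarrow\; \delta(t - v)\, e^{-ut},
\qquad
\int dt\, \delta(t - v)\, e^{-ut} = e^{-uv},
```

recovering the left-hand side exp[−uv] of (A.3) in the scalar case.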
References
[1] Foschini G J and Gans M J 1998 On limits of wireless communications in a fading environment when using
multiple antennas Wirel. Pers. Commun. 6 311–35
[2] Telatar I E 1999 Capacity of multi-antenna Gaussian channels Eur. Trans. Telecommun. Relat. Technol. 10
585–96
[3] Wolniansky P W, Foschini G J, Golden G D and Valenzuela R A 1998 V-BLAST: an architecture for realizing
very high data rates over the rich-scattering wireless channel URSI Int. Symp. Signals Syst. Electronics
pp 295–300
[4] Ling J, Chizhik D, Wolniansky P and Valenzuela R 2001 Multiple transmit multiple receive (MTMR) capacity
survey in Manhattan IEEE Electronics Lett. 37 1041–2
[5] Moustakas A L, Baranger H U, Balents L, Sengupta A M and Simon S H 2000 Communication through a
diffusive medium: coherence and capacity Science 287 287–90 (Preprint cond-mat/0009097)
[6] Verdú S and Shamai S 1999 Spectral efficiency of CDMA with random spreading IEEE Trans. Inform. Theory
45 622–40
[7] Rapajic P B and Popescu D 2000 Information capacity of a random signature multiple-input multiple-output
channel IEEE Trans. Commun. 48 1245
[8] Lozano A and Tulino A M 2002 Capacity of multiple-transmit multiple-receive antenna architectures IEEE
Trans. Inform. Theory 48 3117–28
[9] Tse D N and Hanly S V 1999 Linear multiuser receivers: effective interference, effective bandwidth and user
capacity IEEE Trans. Inform. Theory 45 641
[10] Chuah C N, Tse D, Kahn J and Valenzuela R A 2002 Capacity scaling in MIMO wireless systems under
correlated fading IEEE Trans. Inform. Theory 48 637
[11] Sengupta A M and Mitra P P 1999 Distributions of singular values for some random matrices Phys. Rev. E 60
3389–92
[12] Sengupta A M and Mitra P P 2000 Capacity of multivariate channels with multiplicative noise: I. Random
matrix techniques and large-n expansions for full transfer matrices Preprint physics/0010081
[13] Moustakas A L, Simon S H and Sengupta A M 2003 MIMO capacity through correlated channels in the presence
of correlated interferers and noise: a (not so) large N analysis IEEE Trans. Inform. Theory 45 2545–61
[14] Edwards S F and Anderson P W 1975 Theory of spin glasses J. Phys. F: Met. Phys. 5 965–74
[15] Mézard M, Parisi G and Virasoro M A 1987 Spin Glass Theory and Beyond (Singapore: World Scientific)
[16] Itzykson C and Drouffe J-M 1989 Statistical Field Theory (Cambridge: Cambridge University Press)
[17] Sourlas N 1989 Spin glass models as error correcting codes Nature 339 693–5
[18] Montanari A and Sourlas N 2000 Statistical mechanics of turbo codes Eur. Phys. J. B 18 107–19
[19] Montanari A 2000 Turbo codes: the phase transition Eur. Phys. J. B 18 121–36
[20] Tanaka T 2002 A statistical-mechanics approach to large-system analysis of CDMA multiuser detectors IEEE
Trans. Inform. Theory 48 2888–910
[21] Kang M and Alouini M-S 2002 Capacity of MIMO Rician channels Proc. 40th Annual Conference on
Communication, Control, and Computing (Monticello, IL)
[22] Simon S H, Moustakas A L and Marinelli L 2004 Capacity and character expansions: moment generating
function and other exact results for MIMO correlated channels (submitted) (Preprint cs.IT/0509080)
[23] Simon S H and Moustakas A L 2004 Eigenvalue density of correlated random Wishart matrices Phys. Rev. E
69 (Preprint math-ph/0401038)
[24] Ozarow L H, Shamai S and Wyner A D 1994 Information theoretic considerations for cellular mobile radio
IEEE Trans. Veh. Technol. 43 359–78
[25] Simon S H, Moustakas A L, Stoytchev M and Safar H 2001 Communication in a disordered world Phys. Today
September 38–43
[26] Bender C M and Orszag S A 1978 Advanced Mathematical Methods for Scientists and Engineers (New York:
McGraw-Hill)
[27] Hochwald B M, Marzetta T L and Tarokh V 2004 Multi-antenna channel hardening and its implications for rate
feedback and scheduling IEEE Trans. Inform. Theory 50 1893–909
[28] Dumont J, Loubaton P, Lasaulce S and Debbah M 2005 On the asymptotic performance of MIMO correlated
Ricean channels IEEE Int. Conf. Acoustics, Speech and Signal Processing