TELE9754 L1-ProbTheory


UNSW

TELE9754 Coding & Information Theory - Probability Theory

Wei Zhang

School of Electrical Engineering & Telecommunications


The University of New South Wales, Sydney, Australia

E-mail: w.zhang@unsw.edu.au

T3 2024

Outline

Probability Theory

Random Process

Random Signals

- Random signals and noise: "random" is used to describe erratic and apparently unpredictable variations of an observed signal.
- This randomness or unpredictability is a fundamental property of information.
- Noise may be defined as any unwanted signal interfering with or distorting the signal being communicated.

Probability and Random Variables

Relative-Frequency Approach
- The relative frequency of an event A is n_A / n, where n_A is the number of times A occurs in n trials of the experiment. It is a nonnegative real number less than or equal to one:

  0 ≤ n_A / n ≤ 1

- The experiment exhibits statistical regularity if, for any sequence of n trials, the relative frequency converges to a limit as n becomes large:

  P(A) = lim_{n→∞} n_A / n
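As an illustration (not part of the original slides), a minimal simulation sketch of statistical regularity: the relative frequency n_A / n of the event "a fair coin shows heads" settles toward P(A) = 0.5 as n grows. The variable names and the use of NumPy are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Event A: a fair coin shows heads (probability 0.5).
# Track the relative frequency n_A / n as the number of trials n grows.
for n in [10, 100, 1_000, 10_000, 100_000]:
    flips = rng.integers(0, 2, size=n)   # 1 = heads, 0 = tails
    n_A = flips.sum()                    # number of times event A occurred
    print(f"n = {n:>6d}   n_A/n = {n_A / n:.4f}")

# The printed relative frequencies approach P(A) = 0.5, illustrating
# statistical regularity: n_A / n -> P(A) as n -> infinity.
```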

Sample Space

[Figure: illustration of a sample space, its sample points, and events]
Random Variables

- A function whose domain is a sample space and whose range is a set of real numbers is called a random variable of the experiment.
- There may be more than one random variable associated with the same random experiment. The concept of a random variable is illustrated in Fig. 8.3.
- For a discrete-valued random variable, the probability mass function describes the probability of each possible value of the random variable.

Random Variables

[Fig. 8.3: a random variable maps each sample point of the sample space to a real number]
Distribution Function

The distribution function F_X(x) is defined as

F_X(x) = P[X ≤ x]

Two basic properties:

- The distribution function F_X(x) is bounded between zero and one.
- The distribution function F_X(x) is a monotone nondecreasing function of x; that is,

  F_X(x1) ≤ F_X(x2), if x1 ≤ x2

Probability Density Function
The probability density function f_X(x) is defined as the derivative of the distribution function:

f_X(x) = d F_X(x) / dx

Three basic properties:
1. Since the distribution function is monotone nondecreasing, it follows that the density function is nonnegative for all values of x.
2. The distribution function may be recovered from the density function by integration, as shown by

   F_X(x) = ∫_{−∞}^{x} f_X(s) ds

3. Property 2 implies that the total area under the curve of the density function is unity.
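A small numerical sketch (my own addition, not from the slides) of Properties 2 and 3 for a density that is uniform on [0, 1]: integrating f_X on a grid recovers a nondecreasing F_X, and the total area is one. The grid spacing and names are assumptions.

```python
import numpy as np

# Uniform density on [0, 1]: f_X(x) = 1 on [0, 1] and 0 elsewhere.
x = np.linspace(-0.5, 1.5, 2001)
f = np.where((x >= 0) & (x <= 1), 1.0, 0.0)

# Property 2: recover F_X(x) by integrating f_X from the left edge up to x
# (trapezoidal rule on the grid).
F = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(x))))

i = np.searchsorted(x, 0.5)
print(F[0], F[i], F[-1])   # ~0.0, ~0.5, ~1.0
# F is nondecreasing, and the value at the right edge is the total area
# under f_X, which is (approximately) unity (Property 3).
```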

Uniform Distribution

[Figure: probability density function and distribution function of a uniformly distributed random variable]
Joint Random Variables
The joint distribution function F_{X,Y}(x, y) is

F_{X,Y}(x, y) = P[X ≤ x, Y ≤ y]

The joint probability density function f_{X,Y}(x, y) is

f_{X,Y}(x, y) = ∂² F_{X,Y}(x, y) / (∂x ∂y)

The marginal density functions are obtained by integrating the joint density over the other variable:

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy

f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx
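A numerical sketch of marginalization (my own addition): for two independent standard Gaussian variables, integrating the joint density f_{X,Y}(x, y) over y on a grid recovers the marginal f_X(x). The grid size and names are assumptions.

```python
import numpy as np

# Joint density of two independent standard Gaussian RVs on a grid.
x = np.linspace(-5, 5, 401)
y = np.linspace(-5, 5, 401)
X, Y = np.meshgrid(x, y, indexing="ij")
f_XY = np.exp(-(X**2 + Y**2) / 2) / (2 * np.pi)

# Marginal density: f_X(x) = integral of f_XY(x, y) over y (trapezoidal rule).
dy = y[1] - y[0]
f_X = np.sum(0.5 * (f_XY[:, 1:] + f_XY[:, :-1]) * dy, axis=1)

# Compare with the closed-form standard Gaussian density at x = 0.
print(f_X[200], 1 / np.sqrt(2 * np.pi))   # both ~0.3989
```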

Conditional Probability
P[Y|X] denotes the probability of Y given that X has occurred:

P[Y|X] = P[X, Y] / P[X]

P[X|Y] denotes the probability of X given that Y has occurred:

P[X|Y] = P[X, Y] / P[Y]

X and Y are statistically independent if the outcome of X does not affect the outcome of Y. That is,

P[Y|X] = P[Y]
P[X|Y] = P[X]
P[X, Y] = P[X] P[Y]
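A tiny discrete sketch (my own addition, not from the slides): two binary variables with an assumed joint pmf; the conditional probabilities are computed from P[X, Y] / P[X], and independence is checked via P[X, Y] = P[X] P[Y].

```python
import numpy as np

# Assumed joint pmf P[X = i, Y = j] for two binary random variables.
P_XY = np.array([[0.1, 0.3],
                 [0.2, 0.4]])

P_X = P_XY.sum(axis=1)   # marginal P[X = i]
P_Y = P_XY.sum(axis=0)   # marginal P[Y = j]

# Conditional probability P[Y = j | X = i] = P[X = i, Y = j] / P[X = i].
P_Y_given_X = P_XY / P_X[:, None]
print(P_Y_given_X)

# Independence would require P[X = i, Y = j] = P[X = i] P[Y = j] for all i, j.
print(np.allclose(P_XY, np.outer(P_X, P_Y)))   # False for this joint pmf
```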

Expectation

For a discrete random variable X, the mean (expected value) is the sum of the possible outcomes weighted by their probabilities:

µ_X = E[X] = Σ_x x P[X = x]

For a continuous random variable X, the expected value is

µ_X = E[X] = ∫_{−∞}^{∞} x f_X(x) dx

Variance

For a discrete random variable X, the variance is

σ²_X = Var(X) = E[(X − µ_X)²] = E[X²] − µ²_X

For a continuous random variable X, the variance is

σ²_X = Var(X) = ∫_{−∞}^{∞} (x − µ_X)² f_X(x) dx

Covariance

The covariance of two random variables, X and Y, is given by

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)] = E[XY] − µ_X µ_Y

where

E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y f_{X,Y}(x, y) dx dy

If X and Y are independent, then

E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y f_X(x) f_Y(y) dx dy = E[X] E[Y]

and therefore Cov(X, Y) = 0; that is, the covariance of independent random variables is zero. However, zero covariance does not, in general, imply independence.
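A sketch of the last remark (my own addition): with X uniform on [−1, 1] and Y = X², the two variables are clearly dependent, yet their covariance is approximately zero, since E[XY] = E[X³] = 0 and E[X] = 0 by symmetry.

```python
import numpy as np

rng = np.random.default_rng(0)

# X uniform on [-1, 1]; Y = X**2 is a deterministic function of X (dependent).
X = rng.uniform(-1, 1, size=1_000_000)
Y = X**2

cov = np.mean(X * Y) - np.mean(X) * np.mean(Y)   # E[XY] - E[X]E[Y]
print(cov)   # ~0: uncorrelated, even though Y is completely determined by X
```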

Transformation of Random Variables

Suppose the distribution function of X is F_X(x). Let Y = aX + b with a > 0. Then, what is the distribution function of Y?

F_Y(y) = P[Y ≤ y] = P[aX + b ≤ y] = P[X ≤ (y − b)/a] = F_X((y − b)/a)

More generally, if Y = g(X) with g strictly increasing (so that g^{-1} exists), then

F_Y(y) = F_X(g^{-1}(y))
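A quick Monte Carlo check of F_Y(y) = F_X((y − b)/a) for Y = aX + b with a > 0 (my own addition); the exponential distribution and the parameter values are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 1.0

# X exponentially distributed with rate 1, so F_X(x) = 1 - exp(-x) for x >= 0.
X = rng.exponential(scale=1.0, size=1_000_000)
Y = a * X + b

y = 3.0
F_Y_empirical = np.mean(Y <= y)          # P[Y <= y] estimated from samples
F_Y_formula = 1 - np.exp(-(y - b) / a)   # F_X((y - b) / a)
print(F_Y_empirical, F_Y_formula)        # both ~0.632
```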

Gaussian Random Variables
f_X(x) = (1 / √(2πσ²)) exp(−(x − µ)² / (2σ²))

1. A Gaussian RV is completely characterized by its mean µ and variance σ².
2. A Gaussian RV plus a constant is another Gaussian RV with the mean adjusted by the constant.
3. A Gaussian RV multiplied by a constant is another Gaussian RV where both the mean and variance are affected by the constant.
4. The weighted sum of N independent Gaussian RVs is a Gaussian RV.
5. If two jointly Gaussian RVs have zero covariance (are uncorrelated), they are also independent.

Gaussian Random Variables
For the special case of a Gaussian random variable with µ = 0 and σ = 1, called the normalized Gaussian RV, the pdf is

f_X(x) = (1 / √(2π)) e^{−x²/2}

Its distribution function is

F_X(x) = ∫_{−∞}^{x} f_X(s) ds = (1 / √(2π)) ∫_{−∞}^{x} e^{−s²/2} ds

The Q function is the complement of the normalized Gaussian distribution function, given by

Q(x) = 1 − F_X(x) = (1 / √(2π)) ∫_{x}^{∞} e^{−s²/2} ds
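In practice Q(x) is usually evaluated through the complementary error function, since Q(x) = ½ erfc(x/√2). A minimal sketch (my own addition), assuming SciPy is available:

```python
import numpy as np
from scipy.special import erfc

def Q(x):
    """Complement of the normalized Gaussian distribution function."""
    return 0.5 * erfc(x / np.sqrt(2))

print(Q(0.0), Q(1.0), Q(3.0))   # 0.5, ~0.1587, ~0.00135
```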

Gaussian Distribution

[Figure: probability density function and distribution function of the normalized Gaussian random variable]
Q Function

[Figure: the Q function Q(x) plotted versus x]
The Central Limit Theorem

Suppose
1. The X_k with k = 1, 2, 3, · · · , N are statistically independent.
2. The X_k all have the same probability density function.
3. Both the mean and the variance exist for each X_k.
Let

Y = Σ_{k=1}^{N} X_k

Then the normalized random variable

Z = (Y − E[Y]) / σ_Y

approaches a zero-mean Gaussian random variable with unit variance as N approaches infinity.

The Central Limit Theorem

Computer Experiment:
We consider the random variable

Z = Σ_{k=1}^{N} X_k

where the X_k, k = 1, 2, · · · , N, are independent RVs, each uniformly distributed on the interval from −1 to +1.
In the computer experiment, we compute 20,000 samples of Z for N = 5, and estimate the corresponding density function by forming a histogram of the results.
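A sketch of the computer experiment described above (the exact code behind the slides is not given; this is my own reconstruction): 20,000 samples of Z for N = 5, with a normalized histogram as the density estimate, compared against a Gaussian of the same mean and variance. Each X_k has variance (1 − (−1))²/12 = 1/3, so Var(Z) = 5/3.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
N, num_samples = 5, 20_000

# Z = sum of N independent RVs, each uniform on [-1, +1].
X = rng.uniform(-1, 1, size=(num_samples, N))
Z = X.sum(axis=1)

# Histogram (normalized) as an estimate of the density of Z.
plt.hist(Z, bins=60, density=True, alpha=0.6, label="histogram of Z")

# Gaussian with the same mean (0) and variance N * (2**2 / 12) = 5/3.
var_Z = N * (2**2) / 12
z = np.linspace(-4, 4, 400)
plt.plot(z, np.exp(-z**2 / (2 * var_Z)) / np.sqrt(2 * np.pi * var_Z),
         label="Gaussian fit")
plt.legend()
plt.show()
```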

The Central Limit Theorem

The results indicate how powerful the central limit theorem is and explain why Gaussian models are ubiquitous in the analysis of random signals in communications and elsewhere.
Outline

Probability Theory

Random Process

Random Process
Random processes have the following properties:
- Random processes are functions of time.
- Random processes are random in the sense that it is not possible to predict exactly what waveform will be observed in the future.
Suppose that we assign to each sample point s a function of time with the label

X(t, s), −T < t < T

For a fixed sample point s_j, the sample function (or realization) is

x_j(t) = X(t, s_j)

For a fixed time t_k, the set of numbers

{X(t_k, s_1), X(t_k, s_2), · · · , X(t_k, s_n)} = X(t_k)

is a random variable.

Random Process

[Figure: an ensemble of sample functions x_j(t) of a random process]
Some Concepts

- Stationary process: if a random process is divided into a number of time intervals, the various sections of the process exhibit essentially the same statistical properties.
- The covariance of the two random variables X(t_1) and X(t_2) is given by

  Cov(X(t_1), X(t_2)) = E[X(t_1) X(t_2)] − E[X(t_1)] E[X(t_2)]

- The autocorrelation of a random process is

  R_X(t, s) = E[X(t) X*(s)]

- For a stationary process,

  R_X(t, s) = R_X(t − s)

Wide-sense Stationary Random Process

If a random process has the following two properties, then we say it is wide-sense stationary.
1. The mean of the random process is a constant independent of time: E[X(t)] = µ_X for all t.
2. The autocorrelation of the random process depends only upon the time difference:

   E[X(t) X*(t − τ)] = R_X(τ)

   for all t and τ.

Properties of Autocorrelation Function

For a real-valued wide-sense stationary random process X(t), the autocorrelation function has three properties.

Property 1 (Power of a wide-sense stationary process):

R_X(0) = E[X(t) X(t)] = E[|X(t)|²]

Property 2 (Symmetry):

R_X(τ) = E[X(t) X(t − τ)] = E[X(t − τ) X(t)] = R_X(−τ)

Property 3 (Maximum value):

|R_X(τ)| ≤ R_X(0)

for any τ.
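A numerical sketch of these properties (my own addition): estimating R_X(τ) for a zero-mean white Gaussian sequence by time-averaging X(t) X(t − τ). The estimate is largest at τ = 0 (the power, Property 1) and near zero elsewhere, consistent with |R_X(τ)| ≤ R_X(0). Discrete lags stand in for continuous τ here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-mean white Gaussian sequence with variance 2: a discrete-time stand-in
# for a wide-sense stationary process.
x = rng.normal(0.0, np.sqrt(2.0), size=100_000)

def autocorr(x, lag):
    """Time-average estimate of R_X(lag) = E[X(t) X(t - lag)]."""
    if lag == 0:
        return np.mean(x * x)
    return np.mean(x[lag:] * x[:-lag])

for lag in [0, 1, 5]:
    print(lag, autocorr(x, lag))
# R_X(0) ~ 2 (the power); R_X(lag) ~ 0 for lag != 0, so |R_X(lag)| <= R_X(0).
```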

Autocorrelation

[Figure: example autocorrelation functions of a random process]
Ergodic Process

The ensemble average of the random process at time t = t_k, estimated by averaging over N sample functions, is

E[X(t_k)] ≈ (1/N) Σ_{j=1}^{N} x_j(t_k)

The time average of a continuous sample function x(t) drawn from a real-valued process is given by

⟨x(t)⟩ = lim_{T→∞} (1/(2T)) ∫_{−T}^{T} x(t) dt

A process is said to be ergodic if its statistical properties (such as its mean and variance) can be deduced from a single, sufficiently long sample (realization) of the process. In other words, its time averages coincide with the corresponding ensemble averages.
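A sketch contrasting the two averages (my own addition) for a process that is ergodic in the mean: the time average of one long realization and the ensemble average across many realizations at a fixed time both approach the true mean. The process model and its mean are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 1.5   # assumed true mean of the process

# Process: X(t) = mu + white Gaussian noise (i.i.d. across time and realizations).
ensemble = mu + rng.normal(0.0, 1.0, size=(500, 10_000))   # 500 realizations

time_average = ensemble[0].mean()         # average over time, single realization
ensemble_average = ensemble[:, 0].mean()  # average over realizations at t = t_k

print(time_average, ensemble_average)     # both ~1.5, consistent with ergodicity
```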

Reference

Most of the content of these lecture notes, including some figures, is adapted from Chapter 8 of the following textbook:

Simon Haykin and Michael Moher, Introduction to Analog and Digital Communications, Second Edition, John Wiley & Sons, 2006.

