TELE9754 L1-ProbTheory


UNSW

TELE9754 Coding & Information Theory - Probability Theory

Wei Zhang

School of Electrical Engineering & Telecommunications


The University of New South Wales, Sydney, Australia

E-mail: w.zhang@unsw.edu.au

T3 2024

Outline

Probability Theory

Random Process

Random Signals

- Random signals and noise: "random" is used to describe erratic and apparently unpredictable variations of an observed signal.
- This randomness or unpredictability is a fundamental property of information.
- Noise may be defined as any unwanted signal interfering with or distorting the signal being communicated.

Probability and Random Variables

Relative-Frequency Approach
- The relative frequency of an event A is n_A / n, where n_A is the number of times A occurs in n trials of the experiment. It is a nonnegative real number less than or equal to one:

  0 ≤ n_A / n ≤ 1

- The experiment exhibits statistical regularity if, for any sequence of n trials, the relative frequency converges to a limit as n becomes large:

  P(A) = lim_{n→∞} n_A / n
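As an illustration (not part of the original slides), a minimal simulation sketch of statistical regularity: the relative frequency n_A / n of the event "a fair coin shows heads" settles toward P(A) = 0.5 as n grows. The variable names and the use of NumPy are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Event A: a fair coin shows heads (probability 0.5).
# Track the relative frequency n_A / n as the number of trials n grows.
for n in [10, 100, 1_000, 10_000, 100_000]:
    flips = rng.integers(0, 2, size=n)   # 1 = heads, 0 = tails
    n_A = flips.sum()                    # number of times event A occurred
    print(f"n = {n:>6d}   n_A/n = {n_A / n:.4f}")

# The printed relative frequencies approach P(A) = 0.5, illustrating
# statistical regularity: n_A / n -> P(A) as n -> infinity.
```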

Sample Space

[Figure: illustration of a sample space, its sample points, and events]
Random Variables

- A function whose domain is a sample space and whose range is a set of real numbers is called a random variable of the experiment.
- There may be more than one random variable associated with the same random experiment. The concept of a random variable is illustrated in Fig. 8.3.
- For a discrete-valued random variable, the probability mass function describes the probability of each possible value of the random variable.

Random Variables

[Fig. 8.3: a random variable maps each sample point of the sample space to a real number]
Distribution Function

The distribution function F_X(x) is defined as

F_X(x) = P[X ≤ x]

Two basic properties:

- The distribution function F_X(x) is bounded between zero and one.
- The distribution function F_X(x) is a monotone nondecreasing function of x; that is,

  F_X(x1) ≤ F_X(x2), if x1 ≤ x2

Probability Density Function
The probability density function f_X(x) is defined as the derivative of the distribution function:

f_X(x) = d F_X(x) / dx

Three basic properties:
1. Since the distribution function is monotone nondecreasing, it follows that the density function is nonnegative for all values of x.
2. The distribution function may be recovered from the density function by integration, as shown by

   F_X(x) = ∫_{−∞}^{x} f_X(s) ds

3. Property 2 implies that the total area under the curve of the density function is unity.
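A small numerical sketch (my own addition, not from the slides) of Properties 2 and 3 for a density that is uniform on [0, 1]: integrating f_X on a grid recovers a nondecreasing F_X, and the total area is one. The grid spacing and names are assumptions.

```python
import numpy as np

# Uniform density on [0, 1]: f_X(x) = 1 on [0, 1] and 0 elsewhere.
x = np.linspace(-0.5, 1.5, 2001)
f = np.where((x >= 0) & (x <= 1), 1.0, 0.0)

# Property 2: recover F_X(x) by integrating f_X from the left edge up to x
# (trapezoidal rule on the grid).
F = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(x))))

i = np.searchsorted(x, 0.5)
print(F[0], F[i], F[-1])   # ~0.0, ~0.5, ~1.0
# F is nondecreasing, and the value at the right edge is the total area
# under f_X, which is (approximately) unity (Property 3).
```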

Uniform Distribution

[Figure: probability density function and distribution function of a uniformly distributed random variable]
Joint Random Variables
The joint distribution function F_{X,Y}(x, y) is

F_{X,Y}(x, y) = P[X ≤ x, Y ≤ y]

The joint probability density function f_{X,Y}(x, y) is

f_{X,Y}(x, y) = ∂² F_{X,Y}(x, y) / (∂x ∂y)

The marginal density functions are obtained by integrating the joint density over the other variable:

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy

f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx
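A numerical sketch of marginalization (my own addition): for two independent standard Gaussian variables, integrating the joint density f_{X,Y}(x, y) over y on a grid recovers the marginal f_X(x). The grid size and names are assumptions.

```python
import numpy as np

# Joint density of two independent standard Gaussian RVs on a grid.
x = np.linspace(-5, 5, 401)
y = np.linspace(-5, 5, 401)
X, Y = np.meshgrid(x, y, indexing="ij")
f_XY = np.exp(-(X**2 + Y**2) / 2) / (2 * np.pi)

# Marginal density: f_X(x) = integral of f_XY(x, y) over y (trapezoidal rule).
dy = y[1] - y[0]
f_X = np.sum(0.5 * (f_XY[:, 1:] + f_XY[:, :-1]) * dy, axis=1)

# Compare with the closed-form standard Gaussian density at x = 0.
print(f_X[200], 1 / np.sqrt(2 * np.pi))   # both ~0.3989
```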

Conditional Probability
P[Y|X] denotes the probability of Y given that X has occurred:

P[Y|X] = P[X, Y] / P[X]

P[X|Y] denotes the probability of X given that Y has occurred:

P[X|Y] = P[X, Y] / P[Y]

X and Y are statistically independent if the outcome of X does not affect the outcome of Y. That is,

P[Y|X] = P[Y]
P[X|Y] = P[X]
P[X, Y] = P[X] P[Y]
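A tiny discrete sketch (my own addition, not from the slides): two binary variables with an assumed joint pmf; the conditional probabilities are computed from P[X, Y] / P[X], and independence is checked via P[X, Y] = P[X] P[Y].

```python
import numpy as np

# Assumed joint pmf P[X = i, Y = j] for two binary random variables.
P_XY = np.array([[0.1, 0.3],
                 [0.2, 0.4]])

P_X = P_XY.sum(axis=1)   # marginal P[X = i]
P_Y = P_XY.sum(axis=0)   # marginal P[Y = j]

# Conditional probability P[Y = j | X = i] = P[X = i, Y = j] / P[X = i].
P_Y_given_X = P_XY / P_X[:, None]
print(P_Y_given_X)

# Independence would require P[X = i, Y = j] = P[X = i] P[Y = j] for all i, j.
print(np.allclose(P_XY, np.outer(P_X, P_Y)))   # False for this joint pmf
```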

Expectation

For a discrete random variable X, the mean (expected value) is the sum of the possible outcomes weighted by their probabilities:

µ_X = E[X] = Σ_x x P[X = x]

For a continuous random variable X, the expected value is

µ_X = E[X] = ∫_{−∞}^{∞} x f_X(x) dx

Variance

For a discrete random variable X, the variance is

σ²_X = Var(X) = E[(X − µ_X)²] = E[X²] − µ²_X

For a continuous random variable X, the variance is

σ²_X = Var(X) = ∫_{−∞}^{∞} (x − µ_X)² f_X(x) dx

Covariance

The covariance of two random variables, X and Y, is given by

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)] = E[XY] − µ_X µ_Y

where

E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y f_{X,Y}(x, y) dx dy

If X and Y are independent, then

E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y f_X(x) f_Y(y) dx dy = E[X] E[Y]

and therefore Cov(X, Y) = 0; that is, the covariance of independent random variables is zero. However, zero covariance does not, in general, imply independence.
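A sketch of the last remark (my own addition): with X uniform on [−1, 1] and Y = X², the two variables are clearly dependent, yet their covariance is approximately zero, since E[XY] = E[X³] = 0 and E[X] = 0 by symmetry.

```python
import numpy as np

rng = np.random.default_rng(0)

# X uniform on [-1, 1]; Y = X**2 is a deterministic function of X (dependent).
X = rng.uniform(-1, 1, size=1_000_000)
Y = X**2

cov = np.mean(X * Y) - np.mean(X) * np.mean(Y)   # E[XY] - E[X]E[Y]
print(cov)   # ~0: uncorrelated, even though Y is completely determined by X
```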

Transformation of Random Variables

Suppose the distribution function of X is F_X(x). Let Y = aX + b with a > 0. Then, what is the distribution function of Y?

F_Y(y) = P[Y ≤ y] = P[aX + b ≤ y] = P[X ≤ (y − b)/a] = F_X((y − b)/a)

More generally, if Y = g(X) with g strictly increasing (so that g^{-1} exists), then

F_Y(y) = F_X(g^{-1}(y))
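A quick Monte Carlo check of F_Y(y) = F_X((y − b)/a) for Y = aX + b with a > 0 (my own addition); the exponential distribution and the parameter values are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 1.0

# X exponentially distributed with rate 1, so F_X(x) = 1 - exp(-x) for x >= 0.
X = rng.exponential(scale=1.0, size=1_000_000)
Y = a * X + b

y = 3.0
F_Y_empirical = np.mean(Y <= y)          # P[Y <= y] estimated from samples
F_Y_formula = 1 - np.exp(-(y - b) / a)   # F_X((y - b) / a)
print(F_Y_empirical, F_Y_formula)        # both ~0.632
```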

Gaussian Random Variables
f_X(x) = (1 / √(2πσ²)) exp(−(x − µ)² / (2σ²))

1. A Gaussian RV is completely characterized by its mean µ and variance σ².
2. A Gaussian RV plus a constant is another Gaussian RV with the mean adjusted by the constant.
3. A Gaussian RV multiplied by a constant is another Gaussian RV where both the mean and variance are affected by the constant.
4. The weighted sum of N independent Gaussian RVs is a Gaussian RV.
5. If two jointly Gaussian RVs have zero covariance (are uncorrelated), they are also independent.

Gaussian Random Variables
For the special case of a Gaussian random variable with µ = 0 and σ = 1, called the normalized Gaussian RV, the pdf is

f_X(x) = (1 / √(2π)) e^{−x²/2}

Its distribution function is

F_X(x) = ∫_{−∞}^{x} f_X(s) ds = (1 / √(2π)) ∫_{−∞}^{x} e^{−s²/2} ds

The Q function is the complement of the normalized Gaussian distribution function, given by

Q(x) = 1 − F_X(x) = (1 / √(2π)) ∫_{x}^{∞} e^{−s²/2} ds
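In practice Q(x) is usually evaluated through the complementary error function, since Q(x) = ½ erfc(x/√2). A minimal sketch (my own addition), assuming SciPy is available:

```python
import numpy as np
from scipy.special import erfc

def Q(x):
    """Complement of the normalized Gaussian distribution function."""
    return 0.5 * erfc(x / np.sqrt(2))

print(Q(0.0), Q(1.0), Q(3.0))   # 0.5, ~0.1587, ~0.00135
```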

Gaussian Distribution

[Figure: probability density function and distribution function of the normalized Gaussian random variable]
Q Function

[Figure: the Q function Q(x) plotted versus x]
The Central Limit Theorem

Suppose
1. The X_k with k = 1, 2, 3, · · · , N are statistically independent.
2. The X_k all have the same probability density function.
3. Both the mean and the variance exist for each X_k.
Let

Y = Σ_{k=1}^{N} X_k

Then the normalized random variable

Z = (Y − E[Y]) / σ_Y

approaches a zero-mean Gaussian random variable with unit variance as N approaches infinity.

The Central Limit Theorem

Computer Experiment:
We consider the random variable

Z = Σ_{k=1}^{N} X_k

where the X_k, k = 1, 2, · · · , N, are independent RVs, each uniformly distributed on the interval from −1 to +1.
In the computer experiment, we compute 20,000 samples of Z for N = 5, and estimate the corresponding density function by forming a histogram of the results.
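A sketch of the computer experiment described above (the exact code behind the slides is not given; this is my own reconstruction): 20,000 samples of Z for N = 5, with a normalized histogram as the density estimate, compared against a Gaussian of the same mean and variance. Each X_k has variance (1 − (−1))²/12 = 1/3, so Var(Z) = 5/3.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
N, num_samples = 5, 20_000

# Z = sum of N independent RVs, each uniform on [-1, +1].
X = rng.uniform(-1, 1, size=(num_samples, N))
Z = X.sum(axis=1)

# Histogram (normalized) as an estimate of the density of Z.
plt.hist(Z, bins=60, density=True, alpha=0.6, label="histogram of Z")

# Gaussian with the same mean (0) and variance N * (2**2 / 12) = 5/3.
var_Z = N * (2**2) / 12
z = np.linspace(-4, 4, 400)
plt.plot(z, np.exp(-z**2 / (2 * var_Z)) / np.sqrt(2 * np.pi * var_Z),
         label="Gaussian fit")
plt.legend()
plt.show()
```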

The Central Limit Theorem

The results indicate how powerful the central limit theorem is and explain why Gaussian models are ubiquitous in the analysis of random signals in communications and elsewhere.
Outline

Probability Theory

Random Process

Random Process
Random processes have the following properties:
- Random processes are functions of time.
- Random processes are random in the sense that it is not possible to predict exactly what waveform will be observed in the future.
Suppose that we assign to each sample point s a function of time with the label

X(t, s), −T < t < T

For a fixed sample point s_j, the sample function (or realization) is

x_j(t) = X(t, s_j)

For a fixed time t_k, the set of numbers

{X(t_k, s_1), X(t_k, s_2), · · · , X(t_k, s_n)} = X(t_k)

is a random variable.

Random Process

[Figure: an ensemble of sample functions x_j(t) of a random process]
Some Concepts

- Stationary process: if a random process is divided into a number of time intervals, the various sections of the process exhibit essentially the same statistical properties.
- The covariance of the two random variables X(t_1) and X(t_2) is given by

  Cov(X(t_1), X(t_2)) = E[X(t_1) X(t_2)] − E[X(t_1)] E[X(t_2)]

- The autocorrelation of a random process is

  R_X(t, s) = E[X(t) X*(s)]

- For a stationary process,

  R_X(t, s) = R_X(t − s)

Wide-sense Stationary Random Process

If a random process has the following two properties, then we say it is wide-sense stationary.
1. The mean of the random process is a constant independent of time: E[X(t)] = µ_X for all t.
2. The autocorrelation of the random process depends only upon the time difference:

   E[X(t) X*(t − τ)] = R_X(τ)

   for all t and τ.

Properties of Autocorrelation Function

For a real-valued wide-sense stationary random process X(t), the autocorrelation function has three properties.

Property 1 (Power of a wide-sense stationary process):

R_X(0) = E[X(t) X(t)] = E[|X(t)|²]

Property 2 (Symmetry):

R_X(τ) = E[X(t) X(t − τ)] = E[X(t − τ) X(t)] = R_X(−τ)

Property 3 (Maximum value):

|R_X(τ)| ≤ R_X(0)

for any τ.
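A numerical sketch of these properties (my own addition): estimating R_X(τ) for a zero-mean white Gaussian sequence by time-averaging X(t) X(t − τ). The estimate is largest at τ = 0 (the power, Property 1) and near zero elsewhere, consistent with |R_X(τ)| ≤ R_X(0). Discrete lags stand in for continuous τ here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-mean white Gaussian sequence with variance 2: a discrete-time stand-in
# for a wide-sense stationary process.
x = rng.normal(0.0, np.sqrt(2.0), size=100_000)

def autocorr(x, lag):
    """Time-average estimate of R_X(lag) = E[X(t) X(t - lag)]."""
    if lag == 0:
        return np.mean(x * x)
    return np.mean(x[lag:] * x[:-lag])

for lag in [0, 1, 5]:
    print(lag, autocorr(x, lag))
# R_X(0) ~ 2 (the power); R_X(lag) ~ 0 for lag != 0, so |R_X(lag)| <= R_X(0).
```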

Autocorrelation

[Figure: example autocorrelation functions of a random process]
Ergodic Process

The ensemble average of the random process at time t = t_k, estimated by averaging over N sample functions, is

E[X(t_k)] ≈ (1/N) Σ_{j=1}^{N} x_j(t_k)

The time average of a continuous sample function x(t) drawn from a real-valued process is given by

⟨x(t)⟩ = lim_{T→∞} (1/(2T)) ∫_{−T}^{T} x(t) dt

A process is said to be ergodic if its statistical properties (such as its mean and variance) can be deduced from a single, sufficiently long sample (realization) of the process. In other words, its time averages coincide with the corresponding ensemble averages.
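A sketch contrasting the two averages (my own addition) for a process that is ergodic in the mean: the time average of one long realization and the ensemble average across many realizations at a fixed time both approach the true mean. The process model and its mean are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 1.5   # assumed true mean of the process

# Process: X(t) = mu + white Gaussian noise (i.i.d. across time and realizations).
ensemble = mu + rng.normal(0.0, 1.0, size=(500, 10_000))   # 500 realizations

time_average = ensemble[0].mean()         # average over time, single realization
ensemble_average = ensemble[:, 0].mean()  # average over realizations at t = t_k

print(time_average, ensemble_average)     # both ~1.5, consistent with ergodicity
```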

Reference

Most of the content of these lecture notes, including some figures, is adapted from Chapter 8 of the following textbook:

Simon Haykin and Michael Moher, Introduction to Analog and Digital Communications, Second Edition, John Wiley & Sons, 2006.

