Review of Probability Theory
Bose
If event B is independent of A (written B ⊥ A), then P{B|A} = P{B}, since the occurrence of A does not affect the chances of occurrence of B.
Bayes Rule
P{B | A} = P{B} P{A | B} / P{A}
This relationship is extremely useful in probability calculations such as in changing conditioning of events. (Example: Changing from a priori to a posteriori probabilities)
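As a small sketch of changing from a priori to a posteriori probabilities with Bayes' rule, consider the following hypothetical example (the event names and all probability values are illustrative assumptions, not from the text):

```python
# Hypothetical example: updating an a priori probability with Bayes' rule.
# Event B: a transmitted bit is 1 (assumed a priori P{B} = 0.5).
# Event A: the receiver detects a 1, with assumed likelihoods
# P{A|B} = 0.9 and P{A|B^C} = 0.2.

p_B = 0.5           # a priori probability of B
p_A_given_B = 0.9   # likelihood of the observation when B occurs
p_A_given_Bc = 0.2  # likelihood of the observation when B does not occur

# Total probability: P{A} = P{A|B}P{B} + P{A|B^C}P{B^C}
p_A = p_A_given_B * p_B + p_A_given_Bc * (1 - p_B)

# Bayes' rule: P{B|A} = P{B}P{A|B} / P{A}  (a posteriori probability)
p_B_given_A = p_B * p_A_given_B / p_A

print(p_A)          # 0.55
print(p_B_given_A)  # 0.8181...
```

Observing A raises the probability of B from the a priori 0.5 to the a posteriori 9/11 ≈ 0.82.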
Note that if A and B are Mutually Exclusive events then P{A∩B} = 0, as in that case, probabilistically, events A and B do not occur together.

We define P{A∪B} as the probability of the union of events A and B. This is the event when either A occurs or B occurs or both occur. By definition,

P{A∪B} = P{A} + P{B} − P{A∩B}
       = P{A} + P{B}                if A and B are mutually exclusive events
       = P{A} + P{B} − P{A}P{B}     if A ⊥ B, i.e. A and B are independent events

Complementary Event

For an event A, the complementary event A^C refers to the event where A does not occur.

P{A^C} = 1 − P{A}

Note also that, for any event B, P{B} = P{B|A}P{A} + P{B|A^C}P{A^C}
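These identities can be checked by direct enumeration of a simple sample space. A minimal sketch using one roll of a fair die, with two illustrative events chosen for this example:

```python
from fractions import Fraction

# Check P{A ∪ B} = P{A} + P{B} − P{A ∩ B} by enumerating one fair die roll.
# Illustrative events (not from the text): A = roll is even, B = roll >= 4.
omega = set(range(1, 7))
A = {x for x in omega if x % 2 == 0}   # {2, 4, 6}
B = {x for x in omega if x >= 4}       # {4, 5, 6}

def prob(event):
    return Fraction(len(event), 6)     # equally likely outcomes

lhs = prob(A | B)                          # P{A ∪ B}
rhs = prob(A) + prob(B) - prob(A & B)      # inclusion-exclusion
print(lhs, rhs)   # 2/3 2/3

# Complementary event: P{A^C} = 1 − P{A}
print(prob(omega - A), 1 - prob(A))   # 1/2 1/2
```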
Discrete Random Variables

A discrete random variable X takes on discrete values xi with probabilities P{X = xi} > 0 for i = 1, 2, 3, ... and P{X ∈ (x1, x2, ...)} = 1.

Examples of Distributions for Discrete Random Variables

1. Binomial Distribution

P{X = x} = (n choose x) p^x (1 − p)^(n−x)    x = 0, 1, ..., n

2. Poisson Distribution

P{X = x} = e^(−λ) λ^x / x!    x = 0, 1, 2, ...
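As a quick sketch, both pmfs can be computed directly and checked to sum to 1 over their support (the parameter values below are arbitrary illustrative choices):

```python
import math

# Binomial pmf: P{X = x} = C(n, x) p^x (1 - p)^(n - x)
def binom_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Poisson pmf: P{X = x} = e^(-lam) lam^x / x!
def poisson_pmf(x, lam):
    return math.exp(-lam) * lam**x / math.factorial(x)

# Both pmfs sum to 1 over their support
# (the Poisson sum is truncated far into the tail).
n, p, lam = 10, 0.3, 4.0
binom_total = sum(binom_pmf(x, n, p) for x in range(n + 1))
poisson_total = sum(poisson_pmf(x, lam) for x in range(100))
print(binom_total, poisson_total)   # both ~ 1.0
```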
Continuous Random Variables

In this case, the random variable X is not limited to discrete values but can take on any value x in a range [x1, x2], i.e. x ∈ [x1, x2], where the probability of the random variable X lying between x and x+dx is given by

P{x ≤ X ≤ x+dx} = fX(x)dx

where fX(x) is referred to as the Probability Density Function (pdf) of the random variable X.

The Cumulative Distribution Function FX(x) may also be used to describe the probability distribution of a continuous random variable. This is defined as

FX(x) = P{X ≤ x} = ∫ from −∞ to x of fX(u) du

Examples of Distributions for Continuous Random Variables
1. Normal Distribution

fX(x) = (1 / (σ √(2π))) e^(−(x−μ)² / (2σ²))    −∞ < x < ∞
Memoryless Property
Let the random variable X ≥ 0 be the length of service provided to a customer when service starts from the time instant t = 0. Consider a customer who is still in service at time t and let {(X−t) | X>t} be the remaining service time for that customer. [Note that this random variable is the remaining service time when the customer is examined at time t, given (of course) that the customer is still in service at time t, i.e. the customer's service time X is greater than t.]
Note that we can write

P{(X−t)>x, X>t} = P{(X−t)>x | X>t} P{X>t}

and that, trivially, P{(X−t)>x, X>t} = P{(X−t)>x} = P{X>t+x}, since x and t are both positive.

If the service distribution is memoryless, then when we examine the customer (who started service at t=0 and is still in service) at time t, the service given in the past during the interval (0, t) is forgotten! If this is indeed the case, then it follows that

P{(X−t)>x | X>t} = P{X>x} = 1 − FX(x)

Therefore,

[1 − FX(t+x)] = [1 − FX(x)][1 − FX(t)]
The Exponential Distribution is an example of a memoryless distribution. Note that for this distribution,

fX(x) = λe^(−λx) and FX(x) = 1 − e^(−λx)    for x ≥ 0

Therefore,

[1 − FX(t+x)] = e^(−λ(t+x)) = e^(−λx) e^(−λt) = [1 − FX(x)][1 − FX(t)]

confirming the Memoryless Property.
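A minimal numerical sketch of the exponential memoryless identity [1 − FX(t+x)] = [1 − FX(x)][1 − FX(t)], with an assumed illustrative rate λ = 0.5:

```python
import math

lam = 0.5   # assumed rate parameter of the exponential distribution

def surv(x):
    """Survivor function 1 - F_X(x) = e^(-lam x)."""
    return math.exp(-lam * x)

# Memoryless identity: P{X > t + x} = P{X > x} P{X > t}
for t in (0.3, 1.0, 4.2):
    for x in (0.1, 2.5):
        assert math.isclose(surv(t + x), surv(x) * surv(t))

# Equivalently, the remaining service time seen at time t has the original
# distribution: P{X - t > x | X > t} = surv(t + x) / surv(t) = surv(x)
print(surv(1.0 + 2.5) / surv(1.0), surv(2.5))   # equal
```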
For an integer valued random variable X ∈ {0, 1, 2, ...}, the corresponding memoryless distribution is the Geometric Distribution, where P{X = n} = q^n(1−q), leading to P{X ≥ x} = q^x for x = 0, 1, 2, .... This may be verified by noting that P{X ≥ x+N} = q^(x+N) = q^x q^N = P{X ≥ x} P{X ≥ N}, as required.
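The geometric tail probability and its memoryless form can be checked numerically; the value q = 0.6 below is an arbitrary illustrative choice:

```python
# Geometric distribution: P{X = n} = q^n (1 - q), n = 0, 1, 2, ...
q = 0.6   # assumed illustrative parameter

def pmf(n):
    return q**n * (1 - q)

def tail(x, terms=2000):
    """P{X >= x}, computed by summing the pmf (truncated far into the tail)."""
    return sum(pmf(n) for n in range(x, x + terms))

# Tail probability matches the closed form P{X >= x} = q^x
for x in (0, 1, 5):
    assert abs(tail(x) - q**x) < 1e-9

# Memoryless check: P{X >= x + N} = P{X >= x} P{X >= N}
x, N = 3, 4
print(q**(x + N), q**x * q**N)   # equal
```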
Note that the memoryless property of the exponential and geometric distributions makes them easy to handle. These are therefore very useful in the analytical modeling of queuing systems and computer communications.

Joint Distributions

The joint distribution for continuous random variables X and Y is given in the following forms:

cumulative distribution function (cdf)    FXY(x, y) = P{X ≤ x, Y ≤ y}

probability density function (pdf)        fXY(x, y) = ∂²FXY(x, y) / ∂x∂y

Note that

FX(x) = FXY(x, ∞)

fX(x) = ∫ from −∞ to ∞ of fXY(x, y) dy  and  fY(y) = ∫ from −∞ to ∞ of fXY(x, y) dx
Note also that if X ⊥ Y (i.e. X and Y are independent), then fXY(x,y) = fX(x)fY(y) and FXY(x,y) = FX(x)FY(y).

Functions of Random Variables

For a random variable X, U = u(X) may be defined as another random variable which is a function of the random variable X. If u(x) is differentiable and monotone, then the pdf of the random variable U may be easily found as

fU(u) = fX(x) / |u′(x)|    or    fU(u)|du| = fX(x)|dx|

If u(x) is not monotone, then one has to be more careful, as the function X = u⁻¹(U) may have multiple roots and these should be accounted for while finding fU(u). If u(x) is not differentiable at some point in the range, then delta functions will arise in the pdf of U.

Example: Consider u(x) = x² to generate the random variable U from the random variable X, where X ∈ [−1, 1]. From the form of the function u(x) and the range of X, we can see that

fU(u)du = fX(x)dx + fX(−x)|dx|

Since du = 2x dx, we get that dx = du/(2√u). Therefore,

fU(u) = [fX(√u) + fX(−√u)] / (2√u)

If we consider the case where the random variable X is uniformly distributed in [−1, 1], then

fX(x) = 0.5    for −1 ≤ x ≤ 1
      = 0      otherwise

so that

fU(u) = 1/(2√u)  and  FU(u) = √u    for 0 ≤ u ≤ 1
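The result FU(u) = √u for U = X² with X uniform on [−1, 1] can be checked by a quick Monte Carlo sketch (sample size and seed below are arbitrary choices):

```python
import random

# Monte Carlo check of F_U(u) = sqrt(u) for U = X^2, X uniform on [-1, 1]
random.seed(1)
n = 200_000
samples = [random.uniform(-1.0, 1.0)**2 for _ in range(n)]

def emp_cdf(u):
    """Empirical cdf of U from the simulated samples."""
    return sum(s <= u for s in samples) / n

for u in (0.04, 0.25, 0.81):
    print(u, emp_cdf(u), u**0.5)   # empirical vs. theoretical sqrt(u)
```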
This approach may be extended using the Jacobian for the case of functions of more than one variable. As an example of this, consider Z = g(X, Y) and W = h(X, Y). [Note that if only one function is given, then the second function may be arbitrarily defined.] For given (z, w), solve the equations z = g(x, y) and w = h(x, y), i.e. let (x1, y1), ..., (xn, yn), ... be the real solutions to these equations for given (z, w). With the Jacobian

J(x, y) = | ∂g/∂x  ∂g/∂y |
          | ∂h/∂x  ∂h/∂y |

the joint pdf of Z and W is

fZW(z, w) = fXY(x1, y1)/|J(x1, y1)| + ... + fXY(xn, yn)/|J(xn, yn)| + ...

Note that if there are no real solutions for some values of (z, w), then for these fZW(z, w) = 0.
Expectations

The expectation of a function g(X) of the random variable X is

E{g(X)} = ∫ from −∞ to ∞ of g(x) fX(x) dx    for a continuous random variable
        = Σ over i of g(xi) P{X = xi}        for a discrete random variable

Some useful results:

E{cg(X)} = cE{g(X)}
E{g(X) + h(Y)} = E{g(X)} + E{h(Y)}
For X ⊥ Y, E{g(X)h(Y)} = E{g(X)}E{h(Y)}

The nth moment of the random variable X is defined as E{Xⁿ}. Specifically,

mean        X̄ = E{X}
variance    σX² = E{(X − X̄)²} = E{X²} − X̄²
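As a sketch, the moments of a discrete random variable can be computed directly from its pmf; the binomial parameters n = 8, p = 0.25 below are illustrative assumptions (for which the known mean np = 2.0 and variance np(1−p) = 1.5 provide a cross-check):

```python
import math

# Moments computed directly from a pmf (binomial with assumed n = 8, p = 0.25)
n, p = 8, 0.25
pmf = [math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

def moment(k):
    """k-th moment E{X^k} = sum over x of x^k P{X = x}."""
    return sum(x**k * pmf[x] for x in range(n + 1))

mean = moment(1)
var = moment(2) - mean**2     # variance = E{X^2} - (E{X})^2
print(mean, var)              # 2.0 1.5 (up to floating-point rounding)
```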
Laplace Transform

For a continuous random variable X ≥ 0 with pdf fX(x), the Laplace transform is defined as

F̃X(s) = L(fX(x)) = E{e^(−sX)} = ∫ from 0 to ∞ of e^(−sx) fX(x) dx

(a) Moment Generating Property:

E{Xⁿ} = (−1)ⁿ (dⁿ/dsⁿ) F̃X(s) |at s=0

(b) Given a transform, inverting it will provide the corresponding pdf fX(x) of X.

(c) Multiplication in the Transform Domain corresponds to Convolution in the r.v. domain, and vice-versa. In particular,

L(f1(x) * f2(x)) = L(∫ from 0 to x of f1(τ) f2(x−τ) dτ) = F̃1(s) F̃2(s)

and, conversely, convolution in the transform domain corresponds to multiplication in the r.v. domain.

(d) Transform of the sum of independent random variables = Product of the individual transforms. If random variables X and Y are such that X ⊥ Y, then we can show that

F̃(X+Y)(s) = E{e^(−s(X+Y))} = F̃X(s) F̃Y(s)
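A numerical sketch of the transform and its moment generating property, using the exponential pdf (the rate λ = 2 and step sizes are illustrative assumptions; the closed form λ/(λ+s) follows by direct integration):

```python
import math

# Laplace transform of the exponential pdf f(x) = lam e^(-lam x),
# evaluated by midpoint-rule numerical integration; closed form is lam/(lam+s).
lam = 2.0   # assumed rate parameter

def transform(s, dx=1e-4, xmax=20.0):
    acc, x = 0.0, 0.0
    while x < xmax:
        xm = x + dx / 2
        acc += math.exp(-s * xm) * lam * math.exp(-lam * xm) * dx
        x += dx
    return acc

s = 1.5
print(transform(s), lam / (lam + s))   # both ~ 0.5714

# Moment generating property: E{X} = -(d/ds) transform(s) at s = 0,
# estimated here by a central finite difference.
h = 1e-3
deriv = (transform(h) - transform(-h)) / (2 * h)
print(-deriv, 1 / lam)                 # both ~ 0.5
```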
Characteristic Function (Fourier Transform)

This type of transform is useful for continuous random variables where X may take on negative values, i.e. −∞ < X < ∞. It is defined as

ΦX(ω) = E{e^(jωX)} = F[fX(x)]

Properties similar to those described for Laplace Transforms above are also applicable here:

(a) Moment Generating Property:

E{Xⁿ} = (−j)ⁿ (dⁿ/dωⁿ) ΦX(ω) |at ω=0

(b) Multiplication in the Transform Domain corresponds to Convolution in the random variable domain, and vice versa.

(c) The characteristic function of the sum of independent random variables is the product of the characteristic functions of the individual random variables.
Generating Function or Probability Generating Function (Z-Transform of the Probability Distribution)

This transform is used for a discrete random variable X, such that X ≥ 0. It is defined as

GX(z) = E{z^X} = Σ from i=0 to ∞ of pi z^i = Z[P{X = i}]

where pi = P{X = i}.

This also has properties similar to the transforms given earlier:

(a) Moment Generating Property: X̄ = G′X(1)

(b) If X1 ⊥ X2 and Y = X1 + X2, then GY(z) = GX1(z) GX2(z) and

pY(y) = P{Y = y} = p(X1+X2)(y) = Σ from x1=0 to y of pX1(x1) pX2(y − x1)
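The convolution property can be sketched numerically: convolving two pmfs and taking the generating function of the result gives the product of the individual generating functions. The pmfs below (each uniform on {0, 1, 2, 3}) are illustrative assumptions:

```python
# PGF of a discrete distribution and the convolution property
# G_Y(z) = G_X1(z) G_X2(z) for Y = X1 + X2 with X1, X2 independent.
# Illustrative pmfs (assumed): X1 and X2 each uniform on {0, 1, 2, 3}.
p1 = [0.25] * 4
p2 = [0.25] * 4

def pgf(pmf, z):
    """G(z) = sum over i of p_i z^i."""
    return sum(p * z**i for i, p in enumerate(pmf))

# pmf of Y by discrete convolution: p_Y(y) = sum over x of p1(x) p2(y - x)
pY = [sum(p1[x] * p2[y - x] for x in range(len(p1)) if 0 <= y - x < len(p2))
      for y in range(len(p1) + len(p2) - 1)]

z = 0.7
print(pgf(pY, z), pgf(p1, z) * pgf(p2, z))   # equal
```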
Stochastic Processes

A stochastic process X(t) may be described through the joint distribution of its sample values at an arbitrary set of time instants t1, ..., tn, i.e.

F̃X(x̃; t̃) = P{X(t1) ≤ x1, ..., X(tn) ≤ xn}

which exists for all n and all choices of the time instants. If F̃X(x̃; t̃) is known as above, then all possible stochastic dependencies between sample values of X(t)
may be found. This, however, is usually hard to get. Typically, only some limited dependencies will be known and different Stochastic Processes are classified based on these known dependencies. Examples of this leading to special processes are given next. (a) Independent Process: For this type of process, {X(tn)} or {Xn} are independent random variables, and therefore,
f̃X(x̃; t̃) = ∏ from i=1 to n of fX(xi; ti)
Note that this is really a trivial case of a process, as there are no dependencies between the various Xi's.

(b) Stationary Processes: These are processes where the joint distribution of the random variables corresponding to a set of time points is invariant to a time shift of all the time points. The process is considered Strictly Stationary if the property holds for any choice of the number of time points. If the property holds for any choice of n time points or fewer, but not for any choice of n+1 time points, then the process is referred to as being Stationary of Order n. The process is referred to as being Wide Sense Stationary (WSS) if (a) E{X(t)} is independent of t and (b) E{X(t)X(t+τ)} depends only on τ and not on t. Stationary Processes will not be used in our description of queues and will not be considered further here.

(c) Markov Processes: Markov Processes are ones for which the Markov Property (given below) holds. This property states that, given the present state of the process, its future evolution is independent of the past, i.e. for t1 < t2 < ... < tn+1,

P{X(tn+1) ≤ xn+1 | X(tn) = xn, ..., X(t1) = x1} = P{X(tn+1) ≤ xn+1 | X(tn) = xn}
In a Semi-Markov Process, the distribution of time spent in a state can have an arbitrary distribution but the one-step memory feature of the Markovian property is retained. We will find processes of this type useful in some of our analyses. A Birth-Death Process is a special type of discrete-time or continuous-time Markov Chain with the restriction that at each step, the state transitions, if any, can occur only between neighboring states.
Counting Processes

These are related to random walks, except that our interest here lies in counting the number of transitions that take place as a function of time:

State at time t = Number of transitions in (0, t)

Let Xi = (ti − ti−1), for all i, be the gaps between successive transition instants, taken to be a set of i.i.d. random variables. Subject to the conditions that they are independent and have identical distributions, the random variables {Xi} can have any distribution. Note that this corresponds to a Semi-Markov Process.
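A small simulation sketch of such a counting process, for the special case where the i.i.d. gaps are exponential (which makes the count in (0, t) Poisson with mean λt); the rate, horizon, and seed below are illustrative assumptions:

```python
import random

# Counting process with i.i.d. exponential gaps (t_i - t_{i-1}):
# this special case is a Poisson process, so the count in (0, t)
# has mean lam * t.
random.seed(7)
lam, t, runs = 1.5, 10.0, 20_000   # assumed illustrative values

def count_transitions(t):
    """Number of transitions in (0, t) with i.i.d. exponential gaps."""
    clock, n = 0.0, 0
    while True:
        clock += random.expovariate(lam)   # next gap
        if clock > t:
            return n
        n += 1

mean_count = sum(count_transitions(t) for _ in range(runs)) / runs
print(mean_count, lam * t)   # both ~ 15
```

With a non-exponential gap distribution the same simulation gives a general renewal (semi-Markov) counting process.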