Mstat Note7 Random Variable f23

Uploaded by

junmokim123
Random Variables

Math & Stat for Data Science


Graduate School of Data Science
Seoul National University
This note will cover
• Random variable
• Definition, CDF, PMF, PDF
• Discrete Random Variables
• Bernoulli, Binomial, Poisson, etc
• Continuous Random Variables
• Normal, chi-squared, Exponential, etc
• Multivariate RV
• Independence, conditional dist.
• Change of variables
Random Variables
• Sample space and events
• We can assign probabilities to events
• But for analysis we need to map outcomes to real numbers

• Ex. Tossing a coin twice


• Sample space: {HH, HT, TH, TT}
• We need to convert each outcome to a number for the analysis
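As a minimal sketch of this mapping, the R snippet below enumerates the sample space of two coin tosses and assigns a number to each outcome. Taking X to be the number of heads is an assumption made for illustration; the note does not fix a particular mapping at this point.

```r
# Sample space of two coin tosses
omega <- c("HH", "HT", "TH", "TT")

# Hypothetical random variable: X(outcome) = number of heads in the outcome
X <- sapply(strsplit(omega, ""), function(tosses) sum(tosses == "H"))
names(X) <- omega
print(X)
```

Each outcome now has a real value attached, which is what the later probability calculations work with.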
Random Variables

From this point, we will directly work with random variables.


Some examples
Random variables
• Probability?
• Calculate the probability by inverting the random variable: P(X = x) = P({ω : X(ω) = x})
Random variables

Here, the values of the random variable are not one-to-one with the outcomes: several outcomes can map to the same value
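A worked instance may help. Assuming a fair coin tossed twice and taking X to be the number of heads (both are assumptions for illustration), the probability of a value is obtained from the outcomes that map to it:

```latex
P(X = 1) = P(\{\omega : X(\omega) = 1\}) = P(\{HT, TH\}) = \tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2}
```

Two different outcomes map to the same value X = 1, so the value alone does not identify the outcome.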


Probability and Distribution

• The CDF contains all the information about the random variable
• The CDF is a non-decreasing, right-continuous function
CDF
[Figure: CDF plots illustrating the right-continuous and non-decreasing properties]
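The two properties can be checked numerically. The sketch below uses the standard definition F(x) = P(X ≤ x) and, as an illustrative assumption, X ~ Binomial(2, 0.5) (the number of heads in two fair tosses).

```r
# CDF of X ~ Binomial(2, 0.5), evaluated with R's built-in pbinom
cdf <- function(x) pbinom(x, size = 2, prob = 0.5)

x <- c(-1, 0, 0.5, 1, 1.5, 2, 3)
Fx <- cdf(x)
print(Fx)

# Non-decreasing: successive differences are never negative
all(diff(Fx) >= 0)

# Right-continuous: the jump at x = 1 is already included at x = 1 itself
c(just_below = cdf(1 - 1e-6), at_one = cdf(1), just_above = cdf(1 + 1e-6))
```

The value just below 1 is 0.25 while the value at 1 and just above is 0.75, so the function is continuous from the right but not from the left at the jump.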
Probability mass function (PMF)

• Defined when X is discrete


• Can calculate CDF using PMF
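For a discrete X with PMF f(x) = P(X = x), the CDF is the running sum of the PMF over all values up to x. A short check, using Binomial(10, 0.3) as an illustrative choice:

```r
x <- 0:10
pmf <- dbinom(x, size = 10, prob = 0.3)   # P(X = x)
cdf_from_pmf <- cumsum(pmf)               # F(x) = sum of P(X = t) for t <= x

# Agrees with R's built-in CDF up to floating-point error
all.equal(cdf_from_pmf, pbinom(x, size = 10, prob = 0.3))
```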
PMF
Probability density function (PDF)
• Similarly, a PDF can be defined for a continuous random variable
Probability density function (PDF)
• PDF of uniform(0,1) distribution

• Corresponding CDF
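The formulas on this slide did not survive extraction. Under the standard definition of the Uniform(0,1) distribution, they would read as follows (the subscript notation is an assumption):

```latex
f_X(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}
\qquad
F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt = \begin{cases} 0 & x < 0 \\ x & 0 \le x \le 1 \\ 1 & x > 1 \end{cases}
```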
Probability density function (PDF)
• A PDF is not a probability!!

• For a continuous random variable, P(X = x) = 0 for every x


• A PDF can be larger than 1
• The PDF of Uniform(0, 1/5) is 5 for x in (0, 1/5)

• Mathematically, a PDF is a Radon-Nikodym derivative (of the distribution with respect to Lebesgue measure)
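The Uniform(0, 1/5) case can be verified directly in R. This is a small sketch; the interval (0.05, 0.10) in the last line is an arbitrary choice for illustration.

```r
# Density of Uniform(0, 1/5): equal to 5 inside the interval, so larger than 1
dens <- function(x) dunif(x, min = 0, max = 1/5)
dens(0.1)

# Probabilities come from integrating the density; the total area is still 1
total_area <- integrate(dens, lower = 0, upper = 1/5)$value
total_area

# P(0.05 < X < 0.10) = 5 * 0.05 = 0.25
prob_interval <- punif(0.10, 0, 1/5) - punif(0.05, 0, 1/5)
prob_interval
```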
Properties
Quantile function

0.75 quantile (third quartile)?
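For a continuous, strictly increasing CDF, the quantile function is the inverse of the CDF. The sketch below computes the 0.75 quantile for two distributions chosen only for illustration, the standard Normal and the Exponential with rate 1.

```r
# 0.75 quantile (third quartile) of the standard Normal
q3 <- qnorm(0.75)
q3
pnorm(q3)          # applying the CDF recovers 0.75

# For the Exponential with rate 1, F(x) = 1 - exp(-x), so F^{-1}(q) = -log(1 - q)
q3_exp <- c(builtin = qexp(0.75, rate = 1), by_hand = -log(1 - 0.75))
q3_exp
```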


Equal distribution
• Two random variables X and Y are equal in distribution if
• F_X(x) = F_Y(x) for all x

• This does not mean that X = Y
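A small simulation makes the distinction concrete. The construction Y = 1 - X with X ~ Bernoulli(0.5) is an example chosen here for illustration, not one taken from the slides.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
x <- rbinom(10000, 1, 0.5)        # X ~ Bernoulli(0.5)
y <- 1 - x                        # Y = 1 - X is also Bernoulli(0.5)

c(mean_x = mean(x), mean_y = mean(y))   # both close to 0.5: same distribution
mean(x == y)                            # 0: X and Y never take the same value
```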


Well known Discrete RVs
Bernoulli Distribution
[Figure: PMF of Bernoulli, p=0.3; x-axis: 0 and 1; y-axis: Probability]
Bernoulli Distribution
Examples
• Coin toss
• 0: T
• 1: H
• p: probability of H

• Disease status
• 0: no disease
• 1: disease
• p: probability of having the disease
Bernoulli Distribution
Examples
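• Suppose there are 5 individuals, and each has the disease with probability p=0.2

How do we generate a random sample?

# R-code
N <- 5            # number of individuals
p <- 0.2          # common disease probability
rbinom(N, 1, p)   # N draws from Bernoulli(p), i.e. Binomial with size 1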
Bernoulli Distribution
Examples
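• Suppose there are 5 individuals, and their probabilities of having the disease all differ:
p1=0.1, p2=0.2, p3=0.3, p4=0.4, p5=0.5

How do we generate a random sample?

# R-code
N <- 5                              # number of individuals
p <- c(0.1, 0.2, 0.3, 0.4, 0.5)     # individual-specific disease probabilities
rbinom(N, 1, p)                     # prob is vectorized: draw i uses p[i]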
Binomial Distribution
[Figure: PMF of Binomial, n=10, p=0.3; x-axis: 0 to 10; y-axis: Probability]
Binomial Distribution
• The sum of n independent Bernoulli(p) random variables follows Binomial(n, p)

• The sum of two independent Binomial random variables with the same p follows a Binomial distribution

• X1 ~ Binom(n1, p), X2 ~ Binom(n2, p), independent

• X1+X2 ~ Binom(n1 + n2, p)
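The additivity property can be checked exactly by convolving the two PMFs. The values n1 = 4, n2 = 6 and p = 0.3 below are illustrative; the property requires both variables to share the same p.

```r
n1 <- 4; n2 <- 6; p <- 0.3        # illustrative values; p must be shared

# PMF of S = X1 + X2 by convolution: P(S = s) = sum_k P(X1 = k) P(X2 = s - k)
s <- 0:(n1 + n2)
pmf_sum <- sapply(s, function(total) {
  k <- 0:total
  sum(dbinom(k, n1, p) * dbinom(total - k, n2, p))
})

# Matches the Binomial(n1 + n2, p) PMF up to floating-point error
all.equal(pmf_sum, dbinom(s, n1 + n2, p))
```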
Binomial distribution
Examples
• Coin toss
• Suppose we toss a coin 10 times
• x: the number of heads
• p: probability of H

• Disease status
• Suppose we sample 50 individuals at SNU
• x: number of individuals with the disease
• p: probability of having the disease
Binomial Distribution
• Large n: the binomial distribution has a bell shape
=> Close to the Normal distribution
[Figure: PMF of Binomial, n=1000, p=0.3; y-axis: Probability]

• Very small p (rare event): the binomial distribution does not have the bell shape
=> Close to Poisson
[Figure: PMF of Binomial, n=1000, p=0.001; x-axis: 0 to 20; y-axis: Probability]
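The closeness to the Normal distribution for large n can be checked numerically. In this sketch the cut-off 320 and the continuity correction (+0.5) are choices made here for illustration.

```r
n <- 1000; p <- 0.3
mu <- n * p                        # mean np = 300
sigma <- sqrt(n * p * (1 - p))     # standard deviation sqrt(np(1-p)), about 14.5

exact <- pbinom(320, n, p)                          # exact P(X <= 320)
normal_approx <- pnorm(320.5, mean = mu, sd = sigma) # Normal approximation
c(exact = exact, normal_approx = normal_approx)
```

The two probabilities agree to about two decimal places for these parameter values.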
Geometric Distribution
[Figure: PMF of Geometric, p=0.3; x-axis: 0 to 20; y-axis: Probability]

Ex. Number of coin tosses needed until the first head
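Two conventions exist for the Geometric distribution: counting the failures before the first success (values 0, 1, 2, ...) or counting the trials up to and including it (values 1, 2, 3, ...). R's dgeom uses the first convention, so a sketch for the "number of trials" version shifts by one.

```r
p <- 0.3
k <- 0:5

# dgeom(k, p) = (1 - p)^k * p: probability of k failures before the first success
all.equal(dgeom(k, p), (1 - p)^k * p)

# Probability that the first head occurs on trial number k + 1
trials <- k + 1
pmf_trials <- setNames(dgeom(trials - 1, p), trials)
pmf_trials
```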
Poisson Distribution
[Figure: PMF of Poisson, lambda=1; x-axis: 0 to 20; y-axis: Probability]
Poisson Distribution
Siméon Denis Poisson
[Figure: PMFs of Poisson, lambda=1 and Binomial, n=1000, p=0.001, side by side; x-axis: 0 to 20; y-axis: Probability]

• Derived to model the number of rare events

• Poisson derived it to model wrongful convictions

• Ex. Binomial(1000, 0.001) and Poisson(1) are essentially the same
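How close the two distributions are can be measured by comparing their PMFs point by point. A minimal sketch over the plotted range 0 to 20:

```r
k <- 0:20
binom_pmf <- dbinom(k, size = 1000, prob = 0.001)
pois_pmf  <- dpois(k, lambda = 1)            # lambda = n * p = 1

# First few values side by side, then the largest pointwise gap
round(cbind(k, binom_pmf, pois_pmf)[1:5, ], 4)
max_gap <- max(abs(binom_pmf - pois_pmf))
max_gap
```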
Poisson Distribution - derivation
[Figure: PMFs of Poisson, lambda=1 and Binomial, n=1000, p=0.001, side by side; x-axis: 0 to 20; y-axis: Probability]
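The derivation itself did not survive extraction. A standard sketch of the limit, assuming X ~ Binomial(n, p) with p = λ/n held so that np = λ stays fixed while n grows, and k fixed, is:

```latex
P(X = k) = \binom{n}{k} \left(\frac{\lambda}{n}\right)^{k} \left(1 - \frac{\lambda}{n}\right)^{n-k}
= \frac{\lambda^{k}}{k!} \cdot \frac{n(n-1)\cdots(n-k+1)}{n^{k}} \cdot \left(1 - \frac{\lambda}{n}\right)^{n} \left(1 - \frac{\lambda}{n}\right)^{-k}
\;\longrightarrow\; \frac{\lambda^{k} e^{-\lambda}}{k!} \quad (n \to \infty)
```

The middle ratio and the last factor both tend to 1, while the factor raised to the power n tends to e^{-λ}. This may differ in presentation from the derivation on the original slide.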
Poisson Distribution

• 𝜆: both the mean and the variance of the distribution

• The sum of two independent Poisson RVs follows a Poisson distribution
• X1 ~ Poisson(𝜆1), X2 ~ Poisson(𝜆2), independent
• X1+X2 ~ Poisson(𝜆1 + 𝜆2)
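Both properties can be checked in a few lines. The rates 1 and 2.5 and the truncation point 200 are illustrative choices, and the simulation only shows the mean and variance of the sum, not its full distribution.

```r
lambda1 <- 1; lambda2 <- 2.5       # illustrative rates
k <- 0:200                         # truncation point; the tail beyond is negligible

# Mean and variance computed from the PMF both equal lambda
m <- sum(k * dpois(k, lambda1))
v <- sum((k - m)^2 * dpois(k, lambda1))
c(mean = m, variance = v)

# Simulated sum of independent Poissons: mean and variance near lambda1 + lambda2
set.seed(1)                        # arbitrary seed
s <- rpois(1e5, lambda1) + rpois(1e5, lambda2)
c(mean = mean(s), variance = var(s))
```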

Plot from Wikipedia


Poisson Distribution
Examples
• Event incidence
• Suppose we are interested in the incidence of car accidents
• x: number of accidents per day
• 𝜆: average number per day

• DNA data
• The number of mutations in a region
• x: number of mutations
• 𝜆: average number
Well known Continuous RVs
Normal Distribution
[Figure: PDF of Normal, mu=0, sigma=1; x-axis: x from -4 to 4; y-axis: Density]
Normal Distribution

Abraham de Moivre, Carl Friedrich Gauss

• One of the most important probability distributions!!

• Derived to approximate the Binomial distribution for a large number of trials (De Moivre, 1721) and to model the error distribution in astronomy (Gauss, 1809)
Normal Distribution
• One feature of the Normal distribution is that a linear transformation of a Normal RV also follows a Normal distribution.

• Ex. X ~ N(3, 5)
• Distribution of $(X - 3)/\sqrt{5}$?

• Calculate P(X > 1)?
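A minimal sketch of this calculation follows. It assumes that the 5 in N(3, 5) is the variance, so the standard deviation is sqrt(5); under that assumption, standardizing X gives a N(0, 1) variable.

```r
mu <- 3
sigma <- sqrt(5)                   # assumes the 5 in N(3, 5) is the variance

# Direct computation of P(X > 1)
p_direct <- 1 - pnorm(1, mean = mu, sd = sigma)

# Same probability after standardizing: Z = (X - 3) / sqrt(5) ~ N(0, 1)
p_standardized <- 1 - pnorm((1 - mu) / sigma)

c(p_direct = p_direct, p_standardized = p_standardized)   # both about 0.81
```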


Normal Distribution
Examples
• Widely used to model continuous measurements

• Any measurement
• Noise (error) in the observations
• Linear regression is a good example
𝜒² distribution
Exponential Distribution
[Figure: PDF of the Exponential distribution; x-axis: x from 0 to 10; y-axis: Density]
Exponential Distribution
• CDF: $F(x) = 1 - \exp(-x/\beta)$ for $x \ge 0$

• Memorylessness: $P(X > t + s \mid X > t) = P(X > s)$

• The remaining waiting time does not depend on how long we have already waited
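Memorylessness follows from the CDF, since P(X > t + s | X > t) = exp(-(t + s)/β) / exp(-t/β) = exp(-s/β). The sketch below checks this numerically; β = 2, t = 3 and s = 1.5 are illustrative values, and R's pexp is parameterized by rate = 1/β.

```r
beta <- 2                          # illustrative mean; R uses rate = 1 / beta
surv <- function(x) 1 - pexp(x, rate = 1 / beta)   # P(X > x) = exp(-x / beta)

t0 <- 3; s0 <- 1.5                 # illustrative elapsed and additional times
lhs <- surv(t0 + s0) / surv(t0)    # P(X > t + s | X > t)
rhs <- surv(s0)                    # P(X > s)
c(lhs = lhs, rhs = rhs)
```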
Multivariate Distribution
Bivariate Distribution
• Given a pair of random variables, (X, Y), we can
describe a joint distribution
• Discrete: joint mass function

• Continuous: joint pdf


Bivariate continuous
Marginal Distribution

Find the univariate distribution of X from the joint distribution of (X, Y)!


Marginal Distribution-discrete
Marginal Distribution-continuous
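In the discrete case, a marginal PMF is obtained by summing the joint PMF over the other variable; in the continuous case the sum becomes an integral. The joint table below is hypothetical and exists only to illustrate the computation.

```r
# Hypothetical joint PMF of (X, Y); rows index x in {0, 1}, columns y in {0, 1, 2}
joint <- matrix(c(0.10, 0.20, 0.10,
                  0.15, 0.25, 0.20),
                nrow = 2, byrow = TRUE,
                dimnames = list(x = c("0", "1"), y = c("0", "1", "2")))
stopifnot(abs(sum(joint) - 1) < 1e-12)   # a valid joint PMF sums to 1

marginal_x <- rowSums(joint)       # f_X(x) = sum over y of f(x, y)
marginal_y <- colSums(joint)       # f_Y(y) = sum over x of f(x, y)
marginal_x
marginal_y
```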
Independent Random Variables
• To check independence, we need to verify equation (2.7). The following holds for continuous random variables
Example

Independent?
Example

Joint distribution
of X and Y?
Independence
• The following theorem is very useful for identifying independence
Independence

Independent?
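For discrete random variables, independence means that the joint PMF equals the product of the marginals in every cell. The sketch below applies that check to two hypothetical joint tables; the helper name is_independent is introduced here for illustration and is not from the slides.

```r
# Hypothetical joint PMFs on {0, 1} x {0, 1}
indep <- outer(c(0.3, 0.7), c(0.4, 0.6))         # built as a product, so independent
dep   <- matrix(c(0.4, 0.1,
                  0.1, 0.4), nrow = 2, byrow = TRUE)

# Independence holds when joint = (row marginal) x (column marginal) in every cell
is_independent <- function(joint) {
  isTRUE(all.equal(joint, outer(rowSums(joint), colSums(joint)),
                   check.attributes = FALSE))
}

is_independent(indep)
is_independent(dep)
```

The first table passes the check and the second fails it: in the second, both marginals are (0.5, 0.5), so independence would require every cell to be 0.25.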
Conditional Distribution

Discrete

Continuous
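The definitions on this slide did not survive extraction. The standard forms, written with notation assumed here, are:

```latex
\text{Discrete: } f_{X \mid Y}(x \mid y) = P(X = x \mid Y = y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}, \quad f_Y(y) > 0

\text{Continuous: } f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}, \qquad
P(X \in A \mid Y = y) = \int_A f_{X \mid Y}(x \mid y)\,dx, \quad f_Y(y) > 0
```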
Example

Conditional probability P(X < 1/4 | Y = 1/3)?


Example

Marginal distribution of Y?
Multivariate Dist.
• For multivariate random variables, vector notation is more convenient
• X = (X1, …, Xn)
• The corresponding PDF is f(x1, …, xn)

• Independence of X1,…, Xn
• Can be confirmed using

• Or
IID sampling

Many observed data sets can be thought of as IID samples


Multinomial
• Multivariate version of binomial
• Suppose there are k groups, and in each trial exactly one group is selected
• Ex. Dice throw
• 6 possible outcomes
• Suppose we throw the die n times.
• X = (X1, X2, …, Xk): number of times each group is selected
• p = (p1, p2, …, pk): probability of selecting each group

• X ~ Multinomial(n, p)
Multinomial
• Each element Xj marginally follows Binomial(n, pj)

• Commonly used in survey data


• Satisfaction

• Preference
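The dice example can be simulated with R's rmultinom. This is a sketch in which the fair die, n = 60 throws and the number of replicates are all assumptions chosen for illustration.

```r
set.seed(1)                        # arbitrary seed
n <- 60                            # number of throws
p <- rep(1 / 6, 6)                 # fair die (an assumption)

# One draw of X = (X1, ..., X6): counts of each face in n throws
x <- rmultinom(1, size = n, prob = p)[, 1]
x
sum(x)                             # the counts always add up to n

# Marginal check: X1 over many replicates behaves like Binomial(n, p1)
x1 <- rmultinom(10000, size = n, prob = p)[1, ]
c(mean = mean(x1), var = var(x1),
  binom_mean = n * p[1], binom_var = n * p[1] * (1 - p[1]))
```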
Multivariate Normal

• One of the most important multivariate distributions

• Two parameters
• Mean: 𝜇=(𝜇1, …, 𝜇k)
• Variance (k×k matrix): Σ
• The variance matrix should be symmetric and positive definite!!
Multivariate Normal (Extra)
• If X1, …, Xk are IID N(0, 1) (i.e., Z values), then
Multivariate Normal (Extra)

A linear transformation of an MVN random vector also follows an MVN distribution!


Multivariate Normal
Example: correlated outcomes
• Suppose we want to generate height and weight
• Height ~ N(170, 𝜎²=25)
• Weight ~ N(72, 𝜎²=16)
• Covariance = 12
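One way to generate such correlated outcomes in base R is to apply a linear transformation to IID N(0, 1) draws, using the Cholesky factor of the covariance matrix. This is a sketch of that approach and not necessarily the method used on the original slide; the sample size and seed are arbitrary.

```r
set.seed(1)                        # arbitrary seed
mu <- c(height = 170, weight = 72)
Sigma <- matrix(c(25, 12,
                  12, 16), nrow = 2)   # variances on the diagonal, covariance 12

R <- chol(Sigma)                   # upper triangular, t(R) %*% R equals Sigma
n <- 100000
Z <- matrix(rnorm(n * 2), ncol = 2)    # IID N(0, 1) entries
X <- sweep(Z %*% R, 2, mu, "+")        # each row is a draw from MVN(mu, Sigma)
colnames(X) <- names(mu)

colMeans(X)                        # close to (170, 72)
cov(X)                             # close to Sigma
```

The implied correlation between height and weight is 12 / (5 * 4) = 0.6.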
Multivariate Normal (Extra)
Transformation of RV
• In many situations we need to transform RVs
• Ex. X -> X² (for variance calculation)

• Suppose Y=r(X) is a transformation of X. The PMF of Y is $f_Y(y) = P(Y = y) = P(r(X) = y) = \sum_{x : r(x) = y} f_X(x)$


Transformation of RV
• Ex. P(X=-1)=P(X=1)=1/4, P(X=0)=1/2. Let Y=X². What is the PMF of Y?
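This example can be worked out by grouping the probabilities of X according to the value of r(X). A minimal sketch:

```r
x  <- c(-1, 0, 1)
px <- c(1/4, 1/2, 1/4)             # P(X = x) from the example

y  <- x^2                          # Y = r(X) with r(x) = x^2
py <- tapply(px, y, sum)           # P(Y = y) = sum of P(X = x) over x with r(x) = y
py
```

Both x = -1 and x = 1 map to y = 1, so P(Y = 1) = 1/4 + 1/4 = 1/2 and P(Y = 0) = 1/2.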
Transformation of RV
• Continuous case
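The content of this slide did not survive extraction. A standard approach for the continuous case, given here as a sketch with assumed notation, goes through the CDF: find the set of x values mapped at or below y, integrate the density of X over it, then differentiate.

```latex
A_y = \{x : r(x) \le y\}, \qquad
F_Y(y) = P(r(X) \le y) = \int_{A_y} f_X(x)\,dx, \qquad
f_Y(y) = F_Y'(y)
```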
Transformation of RV

Distribution of Y?
Transformation of multivariate RV
• Transform of several random variables
• Max(X, Y), Min(X, Y), X+Y, X/Y
• Ex. Minimum waiting time.
• Let Z=r(X,Y)
Transformation of multivariate RV
• Suppose X1 and X2 are independent RVs that each follow the exp(1) distribution, and Y = Min(X1, X2).
What is the distribution of Y?
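By independence, P(Y > y) = P(X1 > y) P(X2 > y) = exp(-2y), so Y is Exponential with rate 2 (mean 1/2). The sketch below checks this both exactly and by simulation; the evaluation points and seed are arbitrary.

```r
y <- c(0.1, 0.5, 1, 2)

# Exact: P(Y > y) = P(X1 > y) * P(X2 > y) by independence
surv_min <- (1 - pexp(y, rate = 1))^2
all.equal(surv_min, 1 - pexp(y, rate = 2))   # matches Exponential with rate 2

# Simulation check
set.seed(1)                        # arbitrary seed
y_sim <- pmin(rexp(1e5, 1), rexp(1e5, 1))
mean(y_sim)                        # close to 1/2
```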
Summary
• Random variable
• Maps the sample space to real numbers (or vectors)
• In data analysis we actually work with random variables (not the sample space)
• Discrete Random Variables
• Bernoulli, Binomial, Poisson, etc
• Continuous Random Variables
• Normal, chi-squared, Exponential, etc
• Multivariate RV
• Independence, conditional dist.
• Change of variables
