Lecture7 - Sampling Distribution - 0930


DOTE 2011 | Fall 2024

@ CUHK Business School

Statistical Analysis for Business Decisions


Sampling Distribution

Yunduan Lin
Assistant Professor
Department of Decisions, Operations and Technology
CUHK Business School
Agenda


01 Law of Large Numbers


o Population and sample
o Property of sample mean

02 Central Limit Theorem


o Approximation of sample mean
Homework 1 – 1(d)

The KURT function in Excel:

o Returns the sample excess kurtosis

Homework 1 – 3(b)

o We are asking for the value of a conditional probability

Bayes' theorem: P(A|B) = P(B|A)P(A)/P(B)
(or you can also start from the definition of conditional probability)

o Some terms in the equation are not directly given.
o There is also some knowledge in the statement that has not been used. How can we relate them?

P(A) = P(A∩B) + P(A∩Bᶜ)
(A and B both happen)  (A happens but B does not)

o Still, some terms in the equation are not directly given. But they are easy to derive.
Homework 1 – 3(c)

o Either A or B = union: P(A∪B) = P(A) + P(B) − P(A∩B) (it counts the case that both A and B happen only once)
o Both A and B = intersection: P(A∩B)
Homework 1 – 3(e)

o How to interpret these sentences?

o Define the events: A - has the disease; B - gets a positive report
o What do these numbers mean, and what is the problem asking for?
▪ 90% of those who have the disease will get a positive result
  Fact: P(B|A) = 0.9
▪ 10% of those who do not have the disease will get a positive result
  Fact: P(B|Aᶜ) = 0.1
▪ The probability that a person has the disease given a positive report
  What we care about: P(A|B)
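The computation these facts set up can be sketched in Python. Note the prevalence P(A) is not given in the slide; the 1% used below is an assumed value for illustration only.

```python
# Bayes' theorem for the disease-test example.
p_A = 0.01              # ASSUMED prevalence P(has disease) -- not given in the slide
p_B_given_A = 0.90      # given: P(positive | disease)
p_B_given_notA = 0.10   # given: P(positive | no disease)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|Ac)P(Ac)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(round(p_A_given_B, 4))  # 0.0833
```

Even with a 90% sensitivity, a positive report implies only about an 8% chance of disease under this assumed prevalence, which is the usual point of such exercises.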
Quiz 1 - 1
Combinations (true or false): C(n, r) = C(n, n−r)

o Choosing r objects from n objects implies that (n−r) objects remain.
o Choosing (n−r) objects from n objects implies that r objects remain.

Picking which objects to take is the same as picking which objects to leave behind,
so the two counts are equal and the statement is true.
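The symmetry identity can be spot-checked with Python's math.comb (n = 10 below is an arbitrary illustrative choice):

```python
from math import comb

# Verify C(n, r) == C(n, n - r): choosing the r objects to take
# is the same as choosing the n - r objects to leave behind.
n = 10
for r in range(n + 1):
    assert comb(n, r) == comb(n, n - r)
print("C(n, r) == C(n, n - r) holds for n = 10")
```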
Quiz 1 - 2
Pick one number from 1 to 1000 (1 and 1000 included). Suppose every number is
equally likely to be chosen. What is the probability that the number picked is not divisible
by either 2 or 5?

o Sample space = {1, 2, …, 1000}


o Every second integer is divisible by 2, so there are 500 integers divisible by 2.
o Every fifth integer is divisible by 5, so there are 200 integers divisible by 5.
o Every tenth integer is divisible by both 2 and 5, so there are 100 such integers.
o There are 500 + 200 − 100 = 600 integers divisible by either 2 or 5.

o Therefore, the required probability is (1000 − 600)/1000 = 0.4
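The inclusion-exclusion count can be confirmed by brute force:

```python
# Count the numbers in 1..1000 divisible by neither 2 nor 5.
favorable = sum(1 for k in range(1, 1001) if k % 2 != 0 and k % 5 != 0)
print(favorable, favorable / 1000)  # 400 0.4
```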


Quiz 1 - 3
A student has to sell 2 books from a collection of 6 math, 7 science, and 4 economics
books. How many choices are possible if both books are to be on the same subject?

There are 3 cases:

o Two math books: C(6, 2) = 15
o Two science books: C(7, 2) = 21
o Two economics books: C(4, 2) = 6

In total, there are 15 + 21 + 6 = 42 choices.
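The case-by-case count maps directly onto math.comb:

```python
from math import comb

# Quiz 1-3: two books on the same subject -- sum over the three cases.
choices = comb(6, 2) + comb(7, 2) + comb(4, 2)  # 15 + 21 + 6
print(choices)  # 42
```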
Recap - Discrete Random Variable
Bernoulli
o Binary outcome
o Mean: p; variance: p(1−p); PMF: P(X=1) = p, P(X=0) = 1−p

Binomial
o Count of successes in n repeated independent trials
o Mean: np; variance: np(1−p); PMF: P(X=k) = C(n, k) p^k (1−p)^(n−k)

Poisson
o Count of events over a continuous time
o Mean: λ; variance: λ; PMF: P(X=k) = e^(−λ) λ^k / k!, where e ≈ 2.718 is Euler's constant
o The binomial approaches the Poisson when n is really large and p is really small (with λ = np)
o Can be used to approximate the binomial and is easy to calculate, because it has only 1 parameter
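The large-n, small-p approximation can be checked numerically. The values n = 1000 and p = 0.003 below are illustrative, not from the slide:

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    # Binomial PMF: C(n, k) p^k (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    # Poisson PMF: e^(-lam) lam^k / k!
    return exp(-lam) * lam**k / factorial(k)

# Large n, small p: Binomial(n, p) is close to Poisson(lambda = n*p).
n, p = 1000, 0.003   # illustrative values
lam = n * p
for k in range(6):
    print(k, round(binom_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```

The two columns of probabilities agree to roughly three decimal places, which is the sense in which the Poisson "approximates" the binomial here.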
Recap - Continuous Random Variable
Exponential
o Time between independent random events
o Mean: 1/λ; variance: 1/λ²; PDF: f(x) = λe^(−λx) for x ≥ 0
o Poisson: event count -> exponential: time between events
o Memoryless property: for the exponential distribution, we have
  P(X > s + t | X > s) = P(X > t)
o e.g., the life of a light bulb
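The memoryless property can be verified numerically from the survival function P(X > x) = e^(−λx); the values of λ, s, t below are arbitrary illustrative choices:

```python
from math import exp

# Memoryless property of the exponential distribution:
# P(X > s + t | X > s) = P(X > t), using P(X > x) = exp(-lam * x).
lam, s, t = 0.5, 2.0, 3.0   # illustrative values

def survival(x):
    return exp(-lam * x)

lhs = survival(s + t) / survival(s)   # conditional probability of surviving t more
rhs = survival(t)                     # unconditional probability of surviving t
print(abs(lhs - rhs) < 1e-9)  # True
```

In words: a light bulb that has already lasted s hours is statistically "as good as new" for the next t hours.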

Normal
o Mean: μ; variance: σ²; symmetric bell-shaped PDF
Population and Sample

Population

o The objects we would like to know about

o e.g., ages and incomes of individuals in a city, satisfaction levels of consumers

Sample

o Subset of the population

Goal of Inference

Use a representative sample (small picture) to make an educated guess about the
population (big picture)
Population and Sample

Population

o represented by a bar chart/histogram
o summarized by a (relative) frequency table f(x)
o mean: μ; variance: σ²

Sample
o an observation from the population

Random Sample

o A random draw from the population
o A random variable whose probability function is the same as the frequency table f(x)
o For a sample of size n, we write X1, X2, . . . , Xn
Simple Random Sample - Definition

Simple Random Sample: the most basic random sample

o Each element has an equal probability of being selected.
o Each element is selected independently.

Explanation:
X1, …, Xn is a simple random sample if
o X1, …, Xn are independent random variables, and
o X1, …, Xn follow the same probability function: P(x) (probability mass function) or f(x) (probability density function)
Simple Random Sample - Property

Consider a population with mean μ and variance σ2.

Property of Simple Random Sample:

If X1, …, Xn is a simple random sample, then

o E[Xi] = μ for every i
o Var(Xi) = σ² for every i

A simple random sample in fact has an even stronger property:
o Each observation follows the same distribution as the population
o This includes all summary statistics
Other Sampling Methods

Simple random sampling is simple but difficult to achieve in practice:

o Online surveys likely exclude seniors who do not use the internet often

o Samples from offline surveys are likely to be dependent due to geographical correlation (e.g.,
economic conditions, location preferences)

o Advanced sampling method to reduce sampling error: stratified sampling - divide the population into
subgroups (strata), do a simple random sample within each subgroup, and take a weighted average
across subgroups
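The stratified procedure can be sketched as follows. The "urban"/"rural" strata, their sizes, and their means are assumed toy data, not from the slide:

```python
import random

# Sketch of stratified sampling: simple random sample (SRS) within each
# stratum, then combine stratum means weighted by stratum size.
random.seed(0)
strata = {
    "urban": [random.gauss(50, 10) for _ in range(800)],  # assumed toy data
    "rural": [random.gauss(30, 10) for _ in range(200)],  # assumed toy data
}
pop_size = sum(len(group) for group in strata.values())

estimate = 0.0
for name, group in strata.items():
    sample = random.sample(group, 20)              # SRS within the stratum
    stratum_mean = sum(sample) / len(sample)
    estimate += (len(group) / pop_size) * stratum_mean  # size-weighted average

print(round(estimate, 1))  # close to the weighted population mean
```

The weighting step is what keeps the estimate unbiased even though the strata are sampled separately.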
Statistics - Definition

Statistic:

A function of a sample X1, …, Xn

o Data summary
o Data reduction (simplification)

Examples: sample mean, sample variance


Sample Mean - Definition

Sample mean: X̄ = (X1 + X2 + · · · + Xn)/n
It is useful for guessing the population mean.

The sample mean is the mean of a sample.

o It varies sample by sample.
o The sample mean is therefore also a random variable.
Hence, we can also derive the expectation and
variance of the sample mean.
Sample Mean - Expectation

Expectation of sample mean: E[X̄] = μ

The expectation of the sample mean is the population mean.

Intuition:

o If we sample many times, the average of all sample means is the population mean

o This nice property is known as unbiasedness (see next chapter)


Sample Mean - Expectation
Average of sample means: rolling a die (infinitely) many times

o Amy rolls a die 5 times; the mean of Amy's sample is the average of her 5 results
o Charlie rolls a die 10 times; the mean of Charlie's sample is the average of his 10 results
Sample Mean - Expectation Example
Example:
Consider a population with three numbers: 1, 2, and 3, each with the same probability.

o Population mean: μ = (1 + 2 + 3)/3 = 2

o Consider a sample of size 1: the sample mean can be one of {1, 2, 3} with the same probability.
The expectation of the sample mean for size 1 is (1 + 2 + 3)/3 = 2

o Consider a sample of size 2: the sample mean can be one of the following 9 results with the
same probability. The expectation of the sample mean for size 2 is
(1 + 1.5 + 2 + 1.5 + 2 + 2.5 + 2 + 2.5 + 3)/9 = 2

x1\x2   1     2     3
1       1     1.5   2
2       1.5   2     2.5
3       2     2.5   3

The sample size can be larger, even larger than 3, and
there are more possibilities.
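The size-2 case can be checked by enumerating all 9 equally likely samples:

```python
from itertools import product

# Enumerate all size-2 samples from the population {1, 2, 3}
# (each equally likely) and average the 9 sample means.
population = [1, 2, 3]
sample_means = [(x1 + x2) / 2 for x1, x2 in product(population, repeat=2)]
expectation = sum(sample_means) / len(sample_means)
print(expectation)  # 2.0 -- equals the population mean
```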
Sample Mean – Expectation Proof

E[X̄] = E[(X1 + · · · + Xn)/n]
     = (1/n) E[X1 + · · · + Xn]          (linear property of expectation)
     = (1/n) (E[X1] + · · · + E[Xn])     (expectation of sum = sum of expectations)
     = (1/n) · nμ = μ


Sample Mean - Variance

Variance of sample mean: Var(X̄) = σ²/n

It is not the sample variance!
It is the population variance divided by the sample size.

Standard error of sample mean: SE(X̄) = σ/√n

The standard deviation of a statistic is often called its standard error.
The standard deviation of the sample mean is σ/√n.
Sample Mean - Variance Example
Example:
Consider a population with three numbers: 1, 2, and 3, each with the same probability.

o Population mean: μ = 2. Population variance: σ² = ((1−2)² + (2−2)² + (3−2)²)/3 = 2/3

o Consider a sample of size 1: the expectation of the sample mean is 2.
Therefore, the variance of the sample mean is ((1−2)² + (2−2)² + (3−2)²)/3 = 2/3 = σ²/1

o Consider a sample of size 2: the expectation of the sample mean is 2. Therefore, the
variance of the sample mean, averaged over the 9 equally likely results below, is 1/3 = σ²/2

x1\x2   1     2     3
1       1     1.5   2
2       1.5   2     2.5
3       2     2.5   3
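The size-2 variance can likewise be verified by enumeration:

```python
from itertools import product

# Population {1, 2, 3}: mean 2, variance ((1-2)^2 + 0 + (3-2)^2)/3 = 2/3.
population = [1, 2, 3]
mu = sum(population) / 3
sigma2 = sum((x - mu) ** 2 for x in population) / 3

# Variance of the sample mean over all 9 equally likely size-2 samples.
means = [(x1 + x2) / 2 for x1, x2 in product(population, repeat=2)]
var_mean = sum((m - mu) ** 2 for m in means) / len(means)

print(sigma2, var_mean)  # var_mean equals sigma2 / 2, as the formula predicts
```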
Sample Mean – Variance Proof

Var(X̄) = Var((X1 + · · · + Xn)/n)
       = (1/n²) Var(X1 + · · · + Xn)            (transformation of variance: Var(aX) = a²Var(X))
       = (1/n²) (Var(X1) + · · · + Var(Xn))     (variance of sum = sum of variances if independent)
       = (1/n²) · nσ² = σ²/n


Sample Mean – Large Samples

When the sample size gets larger,

o As the sample size n enlarges, the variance of the sample mean, σ²/n, shrinks

o Moreover, the variance vanishes as n goes to infinity, that is, σ²/n → 0

o As Var(X̄) → 0, when n gets larger, the sample mean eventually gets very close to the population
mean, that is, X̄ → μ
Law of Large Numbers

Let X1, . . . , Xn be a random sample from a distribution with mean μ and variance σ².

Law of large numbers:

For any ε > 0, when n is sufficiently large, we have |X̄ − μ| < ε with high probability.

Or more rigorously, P(|X̄ − μ| > ε) → 0 as n → ∞.

Loosely speaking, when the sample size is large, the variation disappears and the sample mean becomes
the population mean. Or, with a larger sample, the sample mean is closer to the population mean, and it can be
as close as we want.
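The law of large numbers can be seen in a quick simulation with die rolls (population mean 3.5):

```python
import random

# Law of large numbers: the sample mean of die rolls approaches the
# population mean 3.5 as the sample size grows.
random.seed(1)
for n in (10, 100, 10_000):
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(n, sum(rolls) / n)
```

The printed sample means wander for small n and settle near 3.5 for large n.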
Law of Large Numbers

Markov inequality
Consider a nonnegative random variable Y; then for all t > 0,
t · 1{Y > t} ≤ Y

Taking expectations, we get the Markov inequality: P(Y > t) ≤ E[Y]/t

Chebyshev inequality
Consider Y = (X̄ − μ)²; then by the Markov inequality,
P((X̄ − μ)² > ε²) ≤ E[(X̄ − μ)²]/ε²

Hence, we get the Chebyshev inequality: P(|X̄ − μ| > ε) ≤ Var(X̄)/ε²

Law of Large Numbers

As we have the Chebyshev inequality: P(|X̄ − μ| > ε) ≤ Var(X̄)/ε²

Then, since Var(X̄) = σ²/n, we have P(|X̄ − μ| > ε) ≤ σ²/(nε²)

Taking the limit on both sides as n → ∞, the right-hand side goes to 0, and we arrive at the law of large numbers.
Sample Mean – Large Samples

When the sample size gets larger,

o The law of large numbers says that the sample mean is eventually close to μ.

o But the sample mean itself is still a random variable. What is the distribution of the sample
mean when n becomes larger?

The distribution of the sample mean RATHER THAN the distribution of a sample itself

It is always approximately a normal distribution, regardless of how the population looks
Sample Mean - Distribution Example
Example:
Consider a population with three numbers: 1, 2, and 3, each with the same probability.

Let's look at the CDF of the sample mean for different sample sizes.

[Figure: CDFs of the sample mean for n = 1, 2, 10, 100, 1000, 10000 — as n grows, the CDF approaches that of a normal distribution]
Central Limit Theorem

Central limit theorem:

The sample mean approximately follows a normal distribution with a large enough sample.

When n gets large, we have

X̄ approximately ~ Normal(μ, σ²/n)

or

(X̄ − μ)/(σ/√n) approximately ~ Normal(0, 1)

Rule of thumb: sample size n is at least 35.


Central Limit Theorem - Example
Example:
Consider a population with mean 5 and variance 64. Consider a sample with size 100. What is the
probability that the sample mean is no more than 4?

No matter what the distribution of the population is, we can use a normal distribution to approximate the
sample mean with size 100.

By the central limit theorem, we have

P(X̄ ≤ 4) ≈ P(Z ≤ (4 − 5)/(8/√100)) = P(Z ≤ −1.25) ≈ 0.1056
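The CLT calculation for this example can be done with the standard library's NormalDist:

```python
from math import sqrt
from statistics import NormalDist

# CLT example: population mean 5, variance 64, sample size 100.
mu, sigma2, n = 5, 64, 100
se = sqrt(sigma2 / n)        # standard error = 8 / 10 = 0.8
z = (4 - mu) / se            # z = -1.25
prob = NormalDist().cdf(z)   # P(sample mean <= 4)
print(round(prob, 4))  # 0.1056
```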


Central Limit Theorem - Binary Variable
Example:
Consider a population that follows a Bernoulli distribution, which means that each element in the
population is either 0 or 1, and the probability of having 1 (success) is p.

o Population mean: p
o Population variance: p(1−p)

Central limit theorem for a binary variable:

When n gets large, we have

X̄ approximately ~ Normal(p, p(1−p)/n)

or

(X̄ − p)/√(p(1−p)/n) approximately ~ Normal(0, 1)

Rule of thumb: good approximation when np and n(1−p) are at least 5.


Central Limit Theorem - Binary Variable
Comparison between binomial distribution and its normal approximation:

[Figure: binomial PMFs and their normal approximations for n = 1, 2, 5, 10, 30, 100 — the approximation improves as n grows]


Central Limit Theorem - Binary Variable Example
Example:

Let X be a binomial random variable with n = 100 and p = 0.6. What is the probability that X is less than
55?

Check first that np = 100(0.6) = 60 and n(1−p) = 100(0.4) = 40 are at least 5.

We can use the normal approximation:

P(X < 55) ≈ P(Z ≤ (55 − 60)/√24) ≈ P(Z ≤ −1.02) ≈ 0.154
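The quality of the approximation can be checked against the exact binomial probability. The continuity-corrected variant below (evaluating at 54.5) is a common refinement, included here for comparison rather than taken from the slide:

```python
from math import comb, sqrt
from statistics import NormalDist

# X ~ Binomial(100, 0.6); compare P(X < 55) = P(X <= 54) exactly
# against the CLT-based normal approximation.
n, p = 100, 0.6
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(55))

mu, sigma = n * p, sqrt(n * p * (1 - p))    # 60 and sqrt(24)
approx = NormalDist(mu, sigma).cdf(55)      # plain normal approximation
approx_cc = NormalDist(mu, sigma).cdf(54.5) # with continuity correction

print(round(exact, 4), round(approx, 4), round(approx_cc, 4))
```

The continuity-corrected value tracks the exact binomial probability noticeably more closely than the plain approximation.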
A Feedback Form for the Entire Term

https://docs.google.com/forms/d/e/1FAIpQLSfsEgnMFLypI_KW6GF7j_FXtVY5E4Jrmf2P_BDwaG8GXWDc0A/viewform?usp=sf_link
