
Business Statistics 4


Business Statistics

Ikram-E-Khuda
Agenda Items
• Calculation of Z scores using area/probability values
• Critical Z values
• Empirical rule
• Central Limit Theorem
• Confidence intervals and interval estimation
• Hypothesis testing
• Hypothesis testing of mean value
Critical Z Values..
• What are the critical z values, taken symmetrically about the mean, corresponding to:
• 68% area of a standard normal distribution => approximately 1 σ
• 95% area of a standard normal distribution => approximately 2 σ (more precisely 1.96 σ)
• 99.7% area of a standard normal distribution => approximately 3 σ
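The approximate multipliers above can be checked against the exact inverse of the standard normal CDF. A minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `two_sided_critical_z` is hypothetical. For a central area A, each tail holds (1 − A)/2, so the critical value is the (1 + A)/2 quantile.

```python
from statistics import NormalDist

def two_sided_critical_z(central_area: float) -> float:
    """Return z such that P(-z < Z < z) = central_area for a standard normal Z."""
    # Each tail holds (1 - central_area) / 2, so z is the (1 + central_area) / 2 quantile.
    return NormalDist().inv_cdf((1 + central_area) / 2)

for area in (0.68, 0.95, 0.997):
    print(f"{area:.1%} central area -> z = +/-{two_sided_critical_z(area):.3f}")
```

The exact values come out at roughly 0.99, 1.96 and 2.97, which is why 1, 2 and 3 σ are used as rounded rules of thumb.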
Critical Z Values..
• What are the critical z values taken from the right tail corresponding to:
• 68% area of a standard normal distribution => ?σ
• 95% area of a standard normal distribution => ?σ
• 99.7% area of a standard normal distribution => ?σ
Critical Z Values..
• What are the critical z values taken from the left tail corresponding to:
• 68% area of a standard normal distribution => ?σ
• 95% area of a standard normal distribution => ?σ
• 99.7% area of a standard normal distribution => ?σ
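One way to compute the one-tailed values, assuming that "from the right tail" means the critical z with the given area to its left (so the remaining 1 − A sits in the right tail) and "from the left tail" is its mirror image with the given area to its right. This is a sketch under that reading; the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def right_tail_critical_z(area_below: float) -> float:
    """z with `area_below` of the curve to its left (right-tail area = 1 - area_below)."""
    return STD_NORMAL.inv_cdf(area_below)

def left_tail_critical_z(area_above: float) -> float:
    """z with `area_above` of the curve to its right (left-tail area = 1 - area_above)."""
    # By symmetry this equals the negative of the right-tail value.
    return STD_NORMAL.inv_cdf(1 - area_above)

for area in (0.68, 0.95, 0.997):
    print(f"{area:.1%}: right-tail z = {right_tail_critical_z(area):.3f}, "
          f"left-tail z = {left_tail_critical_z(area):.3f}")
```

Under this reading the 95% case gives the familiar one-tailed value of about ±1.645, smaller than the two-sided 1.96 because all of the remaining 5% sits in a single tail.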
Empirical Rule
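The empirical rule states that about 68%, 95% and 99.7% of a normal distribution lie within 1, 2 and 3 standard deviations of the mean. A minimal sketch checking these figures with the standard normal CDF from Python's `statistics.NormalDist`; `area_within` is a hypothetical helper name.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_within(k: float) -> float:
    """P(-k < Z < k): share of a normal distribution within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {area_within(k):.4f}")
```

The printed areas are about 0.6827, 0.9545 and 0.9973, the exact figures behind the rounded 68-95-99.7 rule.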
Central Limit Theorem
This implies that probabilistic and statistical methods that work for normal
distributions can be applied to many problems involving non-normal distributions.
Central Limit Theorem- Statement 1
• The central limit theorem (CLT) establishes that, when independent random
variables are added, their properly normalized sum tends toward a normal
distribution (a "bell curve") even if the original variables themselves are not
normally distributed.

• For example, suppose that a sample is obtained containing a large number
of observations, each observation being randomly generated in a way that does
not depend on the values of the other observations, and that the arithmetic
average of the observed values is computed. If this procedure is performed many
times, the central limit theorem says that the distribution of the average will be
closely approximated by a normal distribution.

• Example: rolling a die twice
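The dice example can be made concrete by computing the exact distribution of the average of n rolls: flat for one roll, triangular for two, and increasingly bell-shaped as n grows. A minimal sketch, assuming a fair six-sided die and independent rolls; `mean_of_dice_distribution` is a hypothetical name.

```python
from fractions import Fraction

def mean_of_dice_distribution(n_rolls: int) -> dict[Fraction, Fraction]:
    """Exact distribution of the average of n_rolls independent fair six-sided dice."""
    # Start from the distribution of a sum of zero rolls: P(sum = 0) = 1.
    sum_dist = {0: Fraction(1)}
    for _ in range(n_rolls):
        new_dist: dict[int, Fraction] = {}
        # Convolve with one more fair die (faces 1..6, each with probability 1/6).
        for total, p in sum_dist.items():
            for face in range(1, 7):
                new_dist[total + face] = new_dist.get(total + face, Fraction(0)) + p / 6
        sum_dist = new_dist
    # Divide each possible sum by n_rolls to turn it into an average.
    return {Fraction(total, n_rolls): p for total, p in sorted(sum_dist.items())}

for n in (1, 2, 5):
    print(f"n = {n}:")
    for avg, p in mean_of_dice_distribution(n).items():
        # Text histogram: bar length proportional to probability.
        print(f"  {float(avg):5.2f} {'#' * round(float(p) * 200)}")
```

Exact fractions are used so that the probabilities sum to exactly 1 and the shapes can be compared without rounding noise.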


Central Limit Theorem- Statement 2
• Suppose that a sample is obtained containing a large number of observations, each
observation being randomly generated in a way that does not depend on the values of the
other observations.

• The central limit theorem says that the probabilities of getting each value of the random
variable in a series of independent and identically distributed trials will approach a normal
curve (i.e. be distributed according to a normal distribution).

• According to the CLT, the distribution of all these probabilities will approach a normal
curve.
• Example:
• Plot the probability distribution of a hypergeometric random experiment with N=10, K=4, k=2
and 2≤n≤8. What distribution do you get? Can n exceed 8? Give a reason.

• Plot the probability distribution of a hypergeometric random experiment with N=10, K=4,
k=0, 1, 2, 3 and 4 for n=4. What distribution do you get?

• Plot the probability distribution of a binomial random experiment with N=10, n=4, r=0, 1, 2, 3
and 4. What distribution do you get?
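The two hypergeometric exercises can be tabulated directly from the probability mass function P(k) = C(K, k)·C(N − K, n − k)/C(N, n). A minimal sketch using `math.comb`; `hypergeom_pmf` is a hypothetical helper name.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(exactly k successes in n draws without replacement from N items, K of them successes)."""
    # math.comb(a, b) returns 0 when b > a, so impossible outcomes get probability 0.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Exercise 1: N=10, K=4, k=2, varying the number of draws n (n=9 included to probe n > 8).
for n in range(2, 10):
    print(f"n={n}: P(k=2) = {hypergeom_pmf(2, 10, 4, n):.4f}")

# Exercise 2: N=10, K=4, n=4, varying k.
for k in range(0, 5):
    print(f"k={k}: P = {hypergeom_pmf(k, 10, 4, 4):.4f}")
```

The n = 9 row prints 0: there are only N − K = 6 non-successes, so drawing 9 items forces at least 3 successes and k = 2 becomes impossible for n > 8.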
Central Limit Theorem- Statement 3
• If a random sample of n observations is selected from any population, then,
when the sample size is sufficiently large (n ≥ 30), the sampling distribution
of the mean tends to approximate the normal distribution. The larger the
sample size, n, the better the normal approximation to the sampling
distribution of the mean.
• In general, it can be shown that the mean of the sample means equals the
population mean, and that the standard error of the mean, σ/√n, is smaller than
the population standard deviation σ (for n > 1).
The real advantage of the central limit theorem is that sample data drawn
from populations that are not normally distributed, or from populations of
unknown shape, can also be analyzed by using the normal distribution, because
the sample means are approximately normally distributed for sample sizes of n ≥ 30.
• Note that the distribution of the sample means begins to approximate the normal
curve as the sample size, n, gets larger.
Central Limit Theorem- Statement 3
• The sample mean will be approximately normally distributed for large
sample sizes, regardless of the distribution from which we are sampling.
• Suppose we are sampling from a population with mean μ and standard deviation σ.

• Let $\bar{X}$ be the random variable representing the sample mean of n independently drawn
observations.

• The mean of the sampling distribution of the sample mean is equal to the population mean,
i.e.
$$\mu_{\bar{X}} = \mu$$

• The standard deviation of the sampling distribution of the sample mean $\bar{X}$ (the standard
error of the mean) is equal to the population standard deviation divided by √n, i.e.
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
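Both results can be checked by simulation from a clearly non-normal population. A minimal sketch, assuming an exponential population with μ = σ = 1 (right-skewed) and n = 36; the population, sample size, seed and the function name `simulate_sample_means` are illustrative choices, not taken from the slides.

```python
import random
import statistics

def simulate_sample_means(n: int, reps: int, seed: int = 1) -> list[float]:
    """Draw `reps` samples of size n from an exponential population and return their means."""
    rng = random.Random(seed)
    # Exponential population with rate 1: mean mu = 1 and standard deviation sigma = 1.
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

n = 36
means = simulate_sample_means(n, reps=5000)
print("mean of sample means:", round(statistics.fmean(means), 3))  # compare with mu = 1
print("sd of sample means:  ", round(statistics.stdev(means), 3))  # compare with sigma/sqrt(n)
print("sigma / sqrt(n):     ", round(1 / n ** 0.5, 3))
```

The mean of the simulated sample means should land close to 1 and their standard deviation close to 1/√36 ≈ 0.167, even though individual exponential observations are far from normal.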
Central Limit Theorem- Statement 3
• Since the central limit theorem states that sample means are approximately normally distributed for large
samples regardless of the shape of the population (and normally distributed for any sample size when the
population itself is normal), sample means can be analyzed by using z scores.

• Mathematically, this means that the z score for the random variable sample mean, $\bar{X}$, can be written as
follows (provided the mean and variance of the population are finite):

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \;\to\; \mathcal{N}(0, 1) \quad \text{as } n \to \infty$$
Here z is the z score value
$\bar{x}$ is the sample mean
μ is the population mean
σ is the population standard deviation
n is the sample size
Central Limit Theorem- Statement 3

We can often use well-developed statistical
inference procedures that are based on a normal
distribution even if we are sampling from a
population that is not normal, provided we have a
large sample size.
Central Limit Theorem- Statement 3
Central Limit Theorem in Statistics
• Ideal guidelines for a distribution to be considered normal:
• Mean = Median = Mode
• Skewness = 0
• Kurtosis = 0 (excess kurtosis, i.e. kurtosis − 3)

• Rough but practical guidelines for a distribution to be considered normal:
• Mean and median are the same, but the mode can differ
• −1 ≤ Skewness ≤ 1
• −3 ≤ Kurtosis ≤ 3 (excess kurtosis)
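The rough guidelines can be applied to a data set by computing skewness and excess kurtosis from its central moments. A minimal sketch, assuming the simple population-moment formulas and reading "kurtosis" as excess kurtosis (kurtosis − 3); many statistical packages apply small-sample corrections, so their values can differ slightly. `shape_summary` and `roughly_normal` are hypothetical names.

```python
import statistics

def shape_summary(data: list[float]) -> tuple[float, float]:
    """Return (skewness, excess kurtosis) using population moment formulas."""
    mean = statistics.fmean(data)
    n = len(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skewness, excess_kurtosis

def roughly_normal(data: list[float]) -> bool:
    """Rough guideline: -1 <= skewness <= 1 and -3 <= excess kurtosis <= 3."""
    skewness, kurt = shape_summary(data)
    return -1 <= skewness <= 1 and -3 <= kurt <= 3

print(shape_summary([1, 2, 3, 4, 5]), roughly_normal([1, 2, 3, 4, 5]))
print(shape_summary([1, 1, 1, 1, 10]), roughly_normal([1, 1, 1, 1, 10]))
```

The symmetric data set has zero skewness and passes; the data set with one large value has skewness 1.5 and fails the rough check.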

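• The random variable $\bar{X}$ can be considered approximately normally distributed if the sample size is
at least 30.
• This implies that we need a sample size of n ≥ 30 for the normal approximation to be used.

• Note: When the sample size is n < 30 and the population standard deviation is unknown (estimated by the
sample standard deviation s), the standardized sample mean follows a distribution called the t distribution
(assuming a roughly normal population).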
Student’s t Distribution

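• df is short for degrees of freedom. It shows the number of observations that are free to vary when
variability is estimated from the sample.
• The standard normal distribution does not depend on df; the t distribution approaches it as df grows.
• df < n for a t distribution (df = n − 1 for a single sample mean)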
Example Problem
• Suppose salaries at a very large corporation have a mean of PKR
50,000/= and a standard deviation of PKR 10,000/=.

• If a single employee is randomly selected, what is the probability that
their salary exceeds PKR 55,000/=?
Example Problem
• Do you proceed like this?
• Let the random variable X represent the salary of a
randomly selected employee
• P(X > 55000) = ?
$$P(X > 55000) = P\left(Z > \frac{55000 - 50000}{10000}\right) = P(Z > 0.5)$$
Example Problem
• Compare this question with the previous problem involving a
continuous random variable.

• Is there a difference between the two?

• This question cannot be answered without information about the
distribution of X: the z calculation above gives a valid probability only if X is normally distributed.
Example Problem
• Suppose salaries at a very large corporation have a mean of PKR
50,000/= and a standard deviation of PKR 10,000/=.

• If 31 employees are selected randomly, what is the probability
that their average salary exceeds PKR 55,000/=?
Example Problem
• Do you proceed like this?
• Let the random variable $\bar{X}$ represent the average salary of those 31
employees
• This random variable is approximately normally distributed by the CLT
• P($\bar{X}$ > 55000) = ?
$$P(\bar{X} > 55000) = P\left(Z > \frac{55000 - 50000}{10000/\sqrt{31}}\right) = P(Z > 2.78)$$

CORRECT

• Now, using the standard normal table, this probability is 1 − 0.99728 = 0.00272
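Both salary questions can be reproduced numerically. A minimal sketch using `statistics.NormalDist`; the single-employee line is only meaningful under the extra assumption, not given in the problem, that individual salaries are normally distributed.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 50_000, 10_000  # population mean and standard deviation of salaries (PKR)
THRESHOLD = 55_000

# Single employee: valid ONLY if individual salaries are themselves normally distributed.
z_single = (THRESHOLD - MU) / SIGMA
p_single = 1 - NormalDist().cdf(z_single)

# Average of n = 31 employees: by the CLT the sample mean is approximately normal.
n = 31
standard_error = SIGMA / sqrt(n)
z_mean = (THRESHOLD - MU) / standard_error
p_mean = 1 - NormalDist().cdf(z_mean)

print(f"single employee (if normal): z = {z_single:.2f}, P = {p_single:.4f}")
print(f"average of {n} employees:    z = {z_mean:.2f}, P = {p_mean:.5f}")
```

With the unrounded z ≈ 2.784 the tail probability is about 0.0027, in line with the table value 0.00272 obtained after rounding z to 2.78.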
Central Limit Theorem in Statistics
• We usually do not know the population distribution

• So the CLT has helped us solve the problem

• The world of statistics would be very different without the CLT

• Statistical tests that rely on the CLT are part of parametric statistical analysis

• Thanks to the CLT!
Confidence Intervals and Interval Estimation
Form of Confidence Intervals (CI)
Confidence interval estimates are intervals within which the
parameter is expected to fall, with a stated degree of confidence.

• The general form of an interval estimate using a CI:

estimate ± critical value × standard deviation of the estimate

or
estimate ± margin of error
• For example:
sample mean ± critical value × standard error of the mean

The standard error of the mean is the standard deviation of the sample mean.


Form of Confidence Intervals (CI)
The intervals obtained using CIs differ based on:

• The parameter of interest, e.g., population mean, population regression
coefficient, population proportion, difference in population means, etc.

• The design of the sample

• The confidence level or confidence coefficient, (1 − α)100%, e.g., 95%, 99%, 90%,
80%, corresponding, respectively, to α values of 0.05, 0.01, 0.1, 0.2, etc.

• α is called the significance level; it defines the area outside the confidence interval,
i.e. the rejection region. It is kept small in precise statistical tests, although a
smaller α produces a wider interval.
Example Simplified Interval Estimation Expression of Population Mean for a
95% Confidence Interval
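For a 95% confidence level, α = 0.05, so each tail holds 0.025 and the critical value is z ≈ 1.96. With σ known, the interval for the population mean takes this standard form, written in the notation used above:

$$\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}$$

For instance, with the salary figures from the example problem (σ = 10,000, n = 31), the margin of error is 1.96 × 10,000/√31 ≈ 3,520, so a hypothetical sample mean of PKR 52,000 would give an interval of roughly PKR 48,480 to 55,520.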
Interpretation of a Confidence Interval
• The CI either contains the parameter or it does not.

• For example, for a 95% CI, we say "we are 95% confident that the true
population parameter is between the lower and upper calculated values".

• The confidence level is the long-run proportion of intervals, constructed the same way
from repeated samples, that contain the true parameter.

• A high confidence level (small α) means that there is only a small probability, α, that
a result (e.g. a correlation or an association) judged significant happened purely by chance.
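The repeated-sampling meaning of "95% confident" can be checked by simulation: draw many samples, build an interval from each, and count how often the intervals contain the true mean. A minimal sketch, assuming a normal population with known σ; μ = 100, σ = 15, n = 25, the seed and the name `ci_coverage` are arbitrary illustration choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def ci_coverage(mu: float, sigma: float, n: int, level: float, reps: int, seed: int = 7) -> float:
    """Fraction of simulated z intervals (known sigma) that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf((1 + level) / 2)  # two-sided critical value, about 1.96 for 95%
    margin = z * sigma / sqrt(n)               # margin of error = critical value x standard error
    hits = 0
    for _ in range(reps):
        sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if sample_mean - margin <= mu <= sample_mean + margin:
            hits += 1
    return hits / reps

print(ci_coverage(mu=100, sigma=15, n=25, level=0.95, reps=2000))
```

The printed proportion should be close to 0.95: each individual interval either contains μ or not, but about 95% of them do.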
Interval Estimation of Mean
• Uses the central limit theorem (CLT)

Interval Estimation of Mean

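• If the population standard deviation σ is known, the interval estimate is based on the z distribution (the same basis as a z test) and is calculated as follows:
$$\bar{x} \pm z\,\frac{\sigma}{\sqrt{n}}$$

• If the population standard deviation is unknown, it is estimated by the sample standard deviation s, and the interval estimate of the mean is based on the t distribution with n − 1 degrees of freedom (the same basis as a t test). It is
calculated as follows:

$$\bar{x} \pm t\,\frac{s}{\sqrt{n}}$$
• The t score formula is:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
Here t is the t score value
$\bar{x}$ is the sample mean
μ is the population mean
s is the sample standard deviation
n is the sample size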
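The three formulas above can be sketched in Python. Python's standard library has no t quantile function, so the t critical value is passed in from a t table (2.262 for df = 9 at 95% two-sided); `scipy.stats.t.ppf(0.975, df)` would compute it if SciPy is available. The sample data, σ = 2 and μ = 13 are hypothetical values chosen for illustration, and the function names are likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def z_interval(xbar: float, sigma: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """xbar +/- z * sigma / sqrt(n), used when the population sd sigma is known."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

def t_interval(data: list[float], t_crit: float) -> tuple[float, float]:
    """xbar +/- t * s / sqrt(n); t_crit is read from a t table with df = n - 1."""
    n = len(data)
    xbar = fmean(data)
    margin = t_crit * stdev(data) / sqrt(n)  # statistics.stdev uses the n - 1 divisor
    return xbar - margin, xbar + margin

def t_score(data: list[float], mu: float) -> float:
    """t = (xbar - mu) / (s / sqrt(n))."""
    return (fmean(data) - mu) / (stdev(data) / sqrt(len(data)))

# Hypothetical sample of n = 10 observations.
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
print(z_interval(fmean(sample), sigma=2, n=len(sample)))  # assumes sigma = 2 is known
print(t_interval(sample, t_crit=2.262))                  # t table: df = 9, 95% two-sided
print(t_score(sample, mu=13))                            # hypothetical mu = 13
```

Here the sample mean is 13.5 and s/√n = 0.5, so the t interval is about 12.37 to 14.63 and the t score against μ = 13 is 1.0. The t interval is wider than a z interval with the same standard error because t exceeds 1.96 for small df.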
Practice Problems
