
Module3 Part4 One Sample T Procedure

The document discusses the one-sample t-test procedure for analyzing sample data from a normal population. Key points: 1) When the population standard deviation is unknown, the t-distribution is used rather than the normal. 2) The t-distribution is similar to the normal but has more area in the tails. As the sample size increases, the t-distribution approaches the normal. 3) A one-sample t-test can be used to test hypotheses about the population mean and construct confidence intervals when the population standard deviation is unknown.

Uploaded by

Daniel Rios
Copyright
© © All Rights Reserved

One-Sample t Procedure

What we have learned


• Draw an SRS of size n from a Normal population that has unknown mean µ and known standard deviation σ.
• Confidence interval: $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$
• To test $H_0: \mu = \mu_0$, use the test statistic $Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$ if $H_0$ is true.
What if we don’t know σ?
• When σ is known, $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$.
• When σ is unknown, we can estimate it using the sample standard deviation $s_x$.
• What is the sampling distribution of $\frac{\bar{x} - \mu}{s_x/\sqrt{n}}$?

Sampling distribution of $\frac{\bar{x} - \mu}{s_x/\sqrt{n}}$
• This new distribution is called the t distribution.


t distribution
• Specified by its degrees of freedom (df)
• Shape similar to N(0,1)
• Symmetric with a single peak at 0
• Spread wider than N(0,1)
• More area in the tails, less in the center
• As the degrees of freedom increase, the t density curve becomes ever closer to N(0,1)
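The heavier tails can be checked numerically. The sketch below is a minimal standard-library illustration, not how statistical software computes t probabilities: it evaluates the t density from its closed form, integrates it with Simpson's rule (the helper names `t_pdf` and `t_cdf` are our own), and compares the upper-tail area beyond 2 for several degrees of freedom with the N(0,1) value. For df = 2 the area is about 0.092, roughly four times the Normal value of about 0.023, and the gap shrinks as df grows.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                # area under the curve between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


# Upper-tail area beyond 2: larger for t curves, shrinking toward N(0,1) as df grows
z_tail = 1 - NormalDist().cdf(2)
tails = {df: 1 - t_cdf(2, df) for df in (2, 5, 30, 1000)}
for df, p in tails.items():
    print(f"df={df:>4}: P(T > 2) = {p:.4f}")
print(f"N(0,1):   P(Z > 2) = {z_tail:.4f}")
```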
Sampling distribution of $\frac{\bar{x} - \mu}{s_x/\sqrt{n}}$
Draw an SRS of size n from a large population that has a Normal distribution with mean µ and standard deviation σ.
• One-sample z statistic: $\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$
• One-sample t statistic: $\frac{\bar{x} - \mu}{s_x/\sqrt{n}} \sim t_{n-1}$, which is a t distribution with degrees of freedom df = n − 1.
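A small simulation makes the difference between the two statistics concrete. Assuming Normal data with n = 5 (so df = 4), the sketch draws many samples, computes the one-sample t statistic for each, and records how often |t| exceeds the z critical value 1.96 and the $t_4$ critical value 2.776 from a standard t table. If the t statistic were N(0,1), about 5% would exceed 1.96; instead roughly 12% should, while about 5% exceed 2.776. The mean, σ, seed, and number of repetitions are arbitrary choices, and `one_sample_t` is a hypothetical helper.

```python
import math
import random


def one_sample_t(sample: list[float], mu: float) -> float:
    """t = (xbar - mu) / (s_x / sqrt(n)), with s_x the sample SD (divisor n - 1)."""
    n = len(sample)
    xbar = sum(sample) / n
    s_x = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    return (xbar - mu) / (s_x / math.sqrt(n))


rng = random.Random(2024)                 # fixed seed so the run is reproducible
mu, sigma, n, reps = 10.0, 3.0, 5, 20_000
t_values = [one_sample_t([rng.gauss(mu, sigma) for _ in range(n)], mu)
            for _ in range(reps)]

beyond_z = sum(abs(t) > 1.96 for t in t_values) / reps    # z critical value, 95%
beyond_t = sum(abs(t) > 2.776 for t in t_values) / reps   # t* for df = 4, 95%
print(f"|t| > 1.960: {beyond_z:.3f}  (0.05 if t were N(0,1))")
print(f"|t| > 2.776: {beyond_t:.3f}  (t_4 critical value)")
```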


One-sample t Confidence Interval
Draw an SRS of size n from a large population that has a Normal distribution with unknown mean µ. A level C confidence interval for µ is
$$\bar{x} \pm t^* \frac{s_x}{\sqrt{n}}$$
• The margin of error is $t^* \frac{s_x}{\sqrt{n}}$.
• $t^*$ is the critical value for the $t_{n-1}$ distribution: the area between −t* and t* under the $t_{n-1}$ curve is C.
• We can use a t table to find the critical values.
Find t*
Suppose you want to construct a 95% confidence interval for the mean µ of a Normal population based on an SRS of size n = 12. From which t distribution does the critical value t* come?

df = n − 1 = 11, so t* comes from the $t_{11}$ distribution, chosen so that the area between −t* and t* under the $t_{11}$ curve is 0.95.
[Figure: $t_{11}$ density curve with central area 0.95 between −t* and t*]

Use t table to find t*
In the same setting (95% confidence, n = 12), what critical value t* should you use?
In Table D, we consult the row corresponding to df = n − 1 = 11 and move across that row to the entry that is directly above the 95% confidence level.

Upper-tail probability p
df    .05     .025    .02     .01
10    1.812   2.228   2.359   2.764
11    1.796   2.201   2.328   2.718
12    1.782   2.179   2.303   2.681
z*    1.645   1.960   2.054   2.326
      90%     95%     96%     98%
Confidence level C

The desired critical value is t* = 2.201.
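The table lookup can be cross-checked by solving $P(-t^* \le T \le t^*) = C$ directly. This sketch redefines the Simpson's-rule approximation of the t CDF (so the block runs on its own) and finds t* by bisection; `t_critical` is a hypothetical helper, and the search bracket [0, 1000] is an assumption that covers common confidence levels. In practice a library routine such as SciPy's `scipy.stats.t.ppf` would be the usual choice.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                # area under the curve between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(conf: float, df: int) -> float:
    """t* such that the area between -t* and t* under the t(df) curve equals conf."""
    target = 1 - (1 - conf) / 2         # e.g. 0.975 for 95% confidence
    lo, hi = 0.0, 1000.0                # assumed bracket; ample for common levels
    for _ in range(60):                 # bisection works because t_cdf increases in x
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n = 12
t_star = t_critical(0.95, df=n - 1)
print(f"t* = {t_star:.3f}")             # t* = 2.201, matching Table D
```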


Example
A manufacturer of high-resolution video terminals must control the tension on the mesh of fine wires that lies behind the surface of the viewing screen. The tension is measured by an electrical device with output readings in millivolts (mV). A random sample of 20 screens has the following mean and standard deviation:
$\bar{x} = 306.32$ mV and $s_x = 36.21$ mV
We want to estimate the true mean tension µ of all the video terminals produced this day at a 90% confidence level.
Example
First, we check the assumptions:
✓ SRS: We are told that the data come from a random sample of 20 screens from the population of all screens produced that day.
✓ Normal population: Nothing in the problem gives us reason to doubt the Normality of the population.
Example
• $\bar{x} = 306.32$ mV and $s_x = 36.21$ mV
• n = 20
• Confidence level C = 90%
• t* = 1.729 from a t distribution with df = n − 1 = 19
• A 90% confidence interval is
$$\bar{x} \pm t^* \frac{s_x}{\sqrt{n}} = 306.32 \pm 1.729 \frac{36.21}{\sqrt{20}} = 306.32 \pm 14.00 = (292.32,\ 320.32)$$
• We are 90% confident that the interval from 292.32 to 320.32 mV captures the true mean tension in the entire batch of video terminals produced that day.
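The interval arithmetic above is easy to script. In this sketch `t_interval` is a hypothetical helper that takes t* as an input (here 1.729 from the t table with df = 19) rather than computing it.

```python
import math


def t_interval(xbar: float, s: float, n: int, t_star: float) -> tuple[float, float]:
    """One-sample t interval xbar +/- t* s / sqrt(n); t* is supplied from a t(n-1) table."""
    margin = t_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin


low, high = t_interval(306.32, 36.21, 20, 1.729)
print(f"90% CI: ({low:.2f}, {high:.2f})")   # 90% CI: (292.32, 320.32)
```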
One-sample t test
Choose an SRS of size n from a large population with unknown mean µ. To test the hypothesis $H_0: \mu = \mu_0$, compute the one-sample t statistic
$$t = \frac{\bar{x} - \mu_0}{s_x/\sqrt{n}}$$
Find the P-value by calculating the probability, under the t distribution with df = n − 1, of getting a t statistic at least as extreme as the observed one in the direction specified by the alternative hypothesis $H_a$.
Example
The level of dissolved oxygen (DO) in a stream or river is an important indicator of the water’s ability to support aquatic life. A researcher measures the DO level at 15 randomly chosen locations along a stream. Here are the results in milligrams per liter (mg/l):
4.53 5.04 3.29 5.23 4.13 5.50 4.83 4.40
5.42 6.38 4.01 4.66 2.87 5.73 5.55
A dissolved oxygen level below 5 mg/l puts aquatic life at risk.


Steps of Significance Tests
1. Hypotheses
2. Test statistic
3. P-value
4. Conclusion
Hypotheses
• A dissolved oxygen level below 5 mg/l puts aquatic life at risk.
• µ is the actual mean dissolved oxygen level in this stream.
• We want to perform a test at the α = 0.05 significance level of:

H0: µ = 5 versus Ha: µ < 5


Check Assumptions
✓ SRS: The researcher measured the DO level at 15 randomly chosen locations.
✓ Normal: With such a small sample size (n = 15), we need to look at the data to see whether it is safe to use t procedures.
✓ The histogram looks roughly symmetric, the boxplot shows no outliers, and the Normal probability plot is fairly linear. With no outliers or strong skewness, the t procedures should be fairly accurate even if the population distribution isn’t exactly Normal.
Test Statistic
• H0: µ = 5 versus Ha: µ < 5
• $\bar{x} = 4.771$ and $s_x = 0.9396$ with n = 15
• t test statistic:
$$t = \frac{\bar{x} - \mu_0}{s_x/\sqrt{n}} = \frac{4.771 - 5}{0.9396/\sqrt{15}} = -0.94$$
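Starting from the raw measurements rather than the rounded summaries, the statistic can be reproduced with Python's `statistics` module; `statistics.stdev` uses the n − 1 divisor, matching $s_x$. This is a sketch for checking the arithmetic, and the variable names are our own.

```python
import math
import statistics

# DO readings (mg/l) at the 15 randomly chosen locations
do_levels = [4.53, 5.04, 3.29, 5.23, 4.13, 5.50, 4.83, 4.40,
             5.42, 6.38, 4.01, 4.66, 2.87, 5.73, 5.55]
mu0 = 5.0                                  # H0: mu = 5

n = len(do_levels)
xbar = statistics.mean(do_levels)
s = statistics.stdev(do_levels)            # sample SD with divisor n - 1
t = (xbar - mu0) / (s / math.sqrt(n))
print(f"n={n}, xbar={xbar:.3f}, s={s:.4f}, t={t:.2f}")
# n=15, xbar=4.771, s=0.9396, t=-0.94
```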
P-value
• H0: µ = 5 versus Ha: µ < 5
• $\bar{x} = 4.771$ and $s_x = 0.9396$ with n = 15
• t = −0.94
• The P-value is the area to the left of t = −0.94 under the t distribution curve with df = 15 − 1 = 14. By symmetry, this equals the upper-tail area to the right of 0.94.

Upper-tail probability p
df    .25     .20     .15
13    .694    .870    1.079
14    .692    .868    1.076
15    .691    .866    1.074
      50%     60%     70%
Confidence level C

• In the df = 14 row, 0.94 lies between .868 (p = .20) and 1.076 (p = .15), so the P-value is between 0.15 and 0.20.
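Software replaces the table bracket with an exact area. The sketch below redefines the same Simpson's-rule approximation of the t CDF so it runs on its own, then computes the P-value for each form of $H_a$; `p_value` and its `alternative` strings are hypothetical names chosen for illustration. For t = −0.94 with df = 14 the result should land inside the table's bracket of 0.15 to 0.20.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                # area under the curve between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def p_value(t: float, df: int, alternative: str) -> float:
    """P-value for a one-sample t statistic in the direction given by Ha."""
    if alternative == "less":           # Ha: mu < mu0, area to the left of t
        return t_cdf(t, df)
    if alternative == "greater":        # Ha: mu > mu0, area to the right of t
        return 1 - t_cdf(t, df)
    if alternative == "two-sided":      # Ha: mu != mu0, both tails
        return 2 * (1 - t_cdf(abs(t), df))
    raise ValueError(f"unknown alternative: {alternative!r}")


p = p_value(-0.94, df=14, alternative="less")
print(f"P-value = {p:.3f}")
```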
Conclusion
• H0: µ = 5 versus Ha: µ < 5
• $\bar{x} = 4.771$ and $s_x = 0.9396$ with n = 15
• t = −0.94
• The P-value is between 0.15 and 0.20.
• Since this is greater than our α = 0.05 significance level, we fail to reject H0. We don’t have enough evidence to conclude that the mean DO level in the stream is less than 5 mg/l.
Matched Pairs t Procedures
• Study designs that involve making two observations on the same individual, or one observation on each of two similar individuals, result in paired data.
• For paired data, we can make comparisons by analyzing the differences in each pair.
• If the conditions for inference are met, we can use one-sample t procedures to perform inference about the mean difference µd.
Example
Insurance adjusters are concerned about the high estimates they are receiving for auto repairs from garage I compared to garage II. To verify their suspicion, each of 15 cars recently involved in an accident was taken to both garages for separate estimates of repair costs.
Example
[Figure: repair estimates from Garage 1 and Garage 2]
Hypotheses
• Insurance adjusters are concerned about the high estimates they are receiving for auto repairs from garage I compared to garage II.
• Let µd denote the average estimate from garage 1 minus the average estimate from garage 2.
• We want to perform a test at the α = 0.05 significance level of H0: µd = 0 versus Ha: µd > 0.
• Take the difference: d = garage 1 − garage 2
One-sample t test on the differences
[Figure: histogram (density), boxplot, and Normal Q-Q plot (sample vs. theoretical quantiles) of the differences Garage 1 − Garage 2]

One-sample t test on the differences
• Take the difference: d = garage 1 − garage 2
• H0: µd = 0 versus Ha: µd > 0
• The sample mean and sample standard deviation of the differences are $\bar{d} = 0.613$ and $s_d = 0.394$.
• Sample size n = 15
• Test statistic: $t = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{0.613}{0.394/\sqrt{15}} = 6.02$
• The P-value is the area to the right of t = 6.02 under the t distribution curve with df = 15 − 1 = 14.
• From the t table, the P-value is less than .0005.
• At the α = 0.05 significance level, we reject H0. The data give significant evidence that garage I gives higher repair estimates than garage II.
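The paired analysis is a one-sample t test on the differences, so the same computation applies to the summary statistics $\bar{d} = 0.613$, $s_d = 0.394$, n = 15. This sketch redefines the Simpson's-rule t CDF so the block is self-contained. With these rounded summaries the statistic comes out near 6.03 rather than the slide's 6.02, a gap presumably due to rounding of $\bar{d}$ and $s_d$; either way the upper-tail P-value is far below .0005.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                # area under the curve between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


d_bar, s_d, n = 0.613, 0.394, 15        # summaries of d = garage 1 - garage 2
t = (d_bar - 0) / (s_d / math.sqrt(n))  # H0: mu_d = 0
p = 1 - t_cdf(t, n - 1)                 # Ha: mu_d > 0, so the area to the right of t
print(f"t = {t:.2f}, P-value = {p:.1e}")
```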
Robustness of t Procedures
• A confidence interval or significance test is called robust if the confidence level or P-value does not change very much when the conditions for use of the procedure are violated.
• Except in the case of small samples, the condition that the data are an SRS from the population of interest is more important than the condition that the population distribution is Normal.
Robustness of t Procedures
• Sample size less than 15: Use t procedures if the data appear close to Normal. If the data are clearly skewed or if outliers are present, do not use t.
• Sample size at least 15: The t procedures can be used except in the presence of outliers or strong skewness.
• Large samples: The t procedures can be used even for clearly skewed distributions when the sample is large, roughly n ≥ 40.
Power of t test
• Calculating the exact power of the t test is somewhat complex. But an approximate calculation that acts as if σ were known is almost always adequate for planning a study. This calculation is very much like that for the z test.
• When guessing σ, it is always better to err on the side of a standard deviation that is a little larger rather than smaller. We want to avoid failing to find an effect because we did not have enough data.
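Following the slide's suggestion, this sketch approximates the power of a one-sided test of H0: µ = µ0 against Ha: µ > µ0 by acting as if σ were known, which reduces to the z-test power formula; it uses Python's `statistics.NormalDist`. The helper `approx_power` and the planning numbers (µ0 = 0, µa = 1, n = 25, α = 0.05, and guesses σ = 2 or 2.5) are hypothetical.

```python
from statistics import NormalDist


def approx_power(mu0: float, mu_a: float, sigma_guess: float, n: int,
                 alpha: float = 0.05) -> float:
    """Approximate power of a one-sided test (Ha: mu > mu0), acting as if sigma were known."""
    std_normal = NormalDist()
    z_star = std_normal.inv_cdf(1 - alpha)             # reject H0 when z >= z*
    shift = (mu_a - mu0) / (sigma_guess / n ** 0.5)    # mean of z when mu = mu_a
    return 1 - std_normal.cdf(z_star - shift)


# Hypothetical planning numbers: detect mu_a = 1 when mu0 = 0, with n = 25
for sigma_guess in (2.0, 2.5):
    print(f"sigma = {sigma_guess}: power ~ {approx_power(0, 1, sigma_guess, 25):.3f}")
```

With σ = 2 the approximate power is about 0.80; guessing σ = 2.5 lowers it, so a plan built on the larger guess asks for more observations, which is the safer direction to err.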
If the population distribution is non-Normal
When your sample is large,
• Use the t procedure because of its robustness
• Use the z procedure because of the Central Limit Theorem:
$$\frac{\bar{x} - \mu}{s_x/\sqrt{n}} \sim N(0,1) \text{ approximately}$$
If the population distribution is non-Normal
When your sample is small,
• Transformation: If the data are skewed, you can attempt to transform the variable to bring it closer to Normality (e.g., a logarithm transformation). The t procedures applied to transformed data are often quite accurate even for moderate sample sizes.
• Use another distribution that might describe your data well. Many non-Normal models have been developed that provide inference procedures too.
• You can always use a distribution-free ("nonparametric") inference procedure that does not assume any specific distribution for the population. However, such a procedure is usually less powerful than distribution-based tests (e.g., the t test) when their conditions hold.
Transforming Data
• The most common transformation is the logarithm (log), which tends to pull in the right tail of a distribution.
• The data values must all be positive in order to use the log transformation.
• Instead of analyzing the original variable X, we first compute the logarithms and analyze the values of log X.
• However, we cannot simply use the confidence interval for the mean of the logs to deduce a confidence interval for the mean µ in the original scale.
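The sketch below applies the log transformation with the positivity check from the slide and illustrates the final caution: exponentiating the mean of the logs gives the geometric mean of the data, not the arithmetic mean µ. The helper `log_transform` and the three data values are hypothetical.

```python
import math
import statistics


def log_transform(values: list[float]) -> list[float]:
    """Natural log of each value; the log transformation requires all values > 0."""
    if any(v <= 0 for v in values):
        raise ValueError("log transformation needs all data values to be positive")
    return [math.log(v) for v in values]


data = [1.0, 10.0, 100.0]                          # hypothetical right-skewed values
logs = log_transform(data)

arithmetic_mean = statistics.mean(data)            # estimates mu on the original scale
back_transformed = math.exp(statistics.mean(logs)) # geometric mean of the data
print(arithmetic_mean, round(back_transformed, 6)) # 37.0 10.0
```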
