Stat - 5 Two Sample Test
Replacing the variance calculations with the actual calculation formulas for the sample
variance, the equation for the standard error of the difference becomes
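$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\left[\frac{\left(\sum x_1^2 - \frac{(\sum x_1)^2}{n_1}\right) + \left(\sum x_2^2 - \frac{(\sum x_2)^2}{n_2}\right)}{n_1 + n_2 - 2}\right]\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$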
Test Statistics for Comparing Two Sample Means (Small Samples, n ≤ 30)
Independent t-test
Use the pooled standard error of the difference between the means when the two
variances are assumed to be the same.
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$$
where $\mu_1 - \mu_2$ is zero because the hypothesized difference is zero.
Substituting the calculation equation for the standard error of the difference,
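$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left[\frac{\left(\sum x_1^2 - \frac{(\sum x_1)^2}{n_1}\right) + \left(\sum x_2^2 - \frac{(\sum x_2)^2}{n_2}\right)}{n_1 + n_2 - 2}\right]\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
or, in terms of the sample variances,
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$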
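The two forms of the t formula can be checked against each other numerically. The following is a minimal sketch using only the Python standard library; the function names and the two small data lists in the usage note are made up for illustration and are not taken from the problems in this handout.

```python
import math
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Independent t statistic, assuming equal population variances."""
    n1, n2 = len(sample1), len(sample2)
    # statistics.variance uses the n - 1 denominator (sample variance).
    s1_sq, s2_sq = variance(sample1), variance(sample2)
    # Pool the variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(sample1) - mean(sample2)) / std_error
    return t, n1 + n2 - 2  # t statistic and degrees of freedom


def std_error_from_raw_sums(sample1, sample2):
    """Same standard error, computed from sums of x and x squared."""
    def sum_of_squares(sample):
        return sum(x * x for x in sample) - sum(sample) ** 2 / len(sample)

    n1, n2 = len(sample1), len(sample2)
    pooled_var = (sum_of_squares(sample1) + sum_of_squares(sample2)) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))
```

For example, `pooled_t([2, 4, 6, 8], [1, 3, 5, 7])` gives a t of about 0.548 with 6 degrees of freedom, and `std_error_from_raw_sums` returns the same standard error that `pooled_t` uses internally.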
Problem
A researcher wishes to learn if a particular
chemical is toxic to a certain species of beetle
and interferes with the beetle's reproduction.
She obtained beetles and divided them into two
groups. She then fed one group of beetles with
the chemical and used the second group as a
control. After 2 weeks, she counted the number
of eggs produced by each beetle in each group.
Is there evidence that the chemical
reduces egg production?
Ho: μ1 = μ2
Ha: μ1 ≠ μ2
Problem 1:
A vicious debate between behaviorists and traditional medical doctors concerns the
merits of using the stimulant Ritalin in treating childhood hyperactivity. A large group
of hyperactive children in an urban school system were randomly assigned to either
a behavior modification program or a drug therapy program. The dependent variable
measured is the children's hyperactivity score after 18 weeks of treatment. Use the
data shown directly below to determine if there is a significant difference in the
children's hyperactivity scores between the two programs.
Behavior   22 24 27 18 21 19 18 17 14 17
           20 16 13 20 16 12 14 22 23 18
Drug       26 32 31 20 34 36 22 29 27 25
           24 32 24 20 32 32 29 27 33 30
Problem 2:
Below is a partial data set from the Milgram Obedience to Authority study. Only two
situations are presented: Remote and Voice. Use the data to determine if there is a
significantly greater voltage given in the Remote situation. Set your alpha level at
0.01.
Remote   300 300 315 315 330 345 375 450 450 450
         450 450 450 450 450 450 450 450 450 450
Voice    135 150 150 165 385 315 315 360 450 450
         450 450 450 450 450 450 450 450 450 450
Dependent t Formula
The t statistic is the ratio of the mean difference, minus the hypothesized
difference, to the standard error of the mean difference:
$$t = \frac{\bar{D} - \mu_D}{s_{\bar{D}}}$$
where
$$s_{\bar{D}} = \sqrt{\frac{\sum D^2 - \frac{(\sum D)^2}{n}}{n(n - 1)}}$$
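The dependent t formula translates directly into a few lines of code. This is a minimal sketch, assuming the paired scores arrive as two equal-length lists; the function name `paired_t` and the numbers in the usage note are illustrative only.

```python
import math


def paired_t(first, second, hypothesized_diff=0.0):
    """Dependent (paired) t statistic from two matched lists of scores."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    n = len(first)
    # D is the difference within each matched pair.
    diffs = [a - b for a, b in zip(first, second)]
    sum_d = sum(diffs)
    sum_d_sq = sum(d * d for d in diffs)
    mean_d = sum_d / n
    # Standard error of the mean difference, using the raw-sum form.
    std_error = math.sqrt((sum_d_sq - sum_d ** 2 / n) / (n * (n - 1)))
    t = (mean_d - hypothesized_diff) / std_error
    return t, n - 1  # t statistic and degrees of freedom
```

With the made-up pairs `paired_t([5, 7, 9, 6], [3, 6, 5, 4])`, the differences are 2, 1, 4, 2, which gives a t of about 3.58 with 3 degrees of freedom. The function divides by zero if every difference is identical, so that case would need separate handling.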
Sample Problem
Suppose a psychologist is interested in studying test anxiety in college students. She has two
treatments for students who report that they are test anxious. One treatment involves going
to a counseling session where relaxation training techniques are taught. A second treatment
involves going to a study group where the material taught in the college class is emphasized
and students are encouraged to study the material presented in the class. The psychologist
selects 20 students and then matches them on the basis of their general anxiety levels. Using
these matched pairs, the psychologist forms 10 pairs of students. One member is randomly
assigned to the experimental therapy group, the other to the study skills group. The
psychologist wants to determine if there is a difference in test anxiety scores due to these
treatments.
Problem
A large group of learning-disabled college freshmen experiencing debilitating
anxiety before major tests were matched on an index of test anxiety. Members of
these matched pairs were randomly assigned to relaxation exercises (Relax) and
study skills training (Study) for two weeks. Using the data below determine if there
is a significant difference between their final exam scores.
Relax 46 49 47 48 50 52 48 47 46 49 42
Study 50 49 51 52 50 48 52 47 50 52 53
Solution:
Using the six steps
Before 11 8 10 9 6 13 11 9 12 11 12 6 7 12 6
After 17 14 9 18 6 16 12 7 15 16 20 14 7 20 12
Problem 2:
Based upon the results of a manual dexterity test, two matched groups of BSIT Automotive
Major college students were formed. One group of students was asked to drive an automobile
simulator after having consumed three bottles of beer in one hour. The second group of
students drove the same simulator after consuming three bottles of nonalcoholic beer in the
same time period. The numbers of errors per minute were recorded in the simulator. Is there
a significant decrease in the errors per minute recording of the nonalcoholic group? Use an
alpha level of 0.01.
Alcohol       6 4 9 3 15 5 9 7 6 8
Non-alcohol   5 5 6 0 8 2 4 2 8 4
Test Statistics for Comparing Two Sample Means (Large Samples, n > 30)
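Use the standard error of the difference between the means, computed under the
assumption that the variances are the same.
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$$
where $\mu_1 - \mu_2$ is zero because the hypothesized difference is zero.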
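A large-sample z statistic can be computed from summary statistics alone. The sketch below makes one assumption that the handout does not spell out: it takes the standard error of the difference to be the square root of s1²/n1 + s2²/n2, a common large-sample choice. The function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z statistic for two large samples, with hypothesized difference zero."""
    # ASSUMPTION: unpooled standard error sqrt(s1^2/n1 + s2^2/n2).
    std_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / std_error
    # Two-tailed p-value from the standard normal distribution.
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed
```

For instance, `two_sample_z(50, 10, 100, 48, 10, 100)` gives a z of about 1.41 and a two-tailed p-value of about 0.16, so the null hypothesis would not be rejected at the 0.05 level.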
Problem
The administrator of a certain hospital was studying allegations received at the desk
that resident physicians and nurses are slow in responding to emergency calls from senior
citizens who are medical and surgical patients, while other patients receive faster service.
Unknown to the resident physicians and nurses, the administrator recorded the length of time
it took them to respond to the calls for both senior citizens and other patients, as
summarized in the table:
Problem 1:
The Dean of the College of Industrial Technology and Engineering wants to investigate
whether there is any difference between the scores of the students in mathematics for those
who attend classes during the day and those who work during the day and attend evening
classes. Records of both day and evening classes revealed that
Student   Mean Score   Standard Deviation   Number of Samples Taken
Day       90           12                   53
Evening   94           15                   46
Use the 6 step procedure and arrive at a decision from a 0.05 level of significance.
Problem 2:
Two machines fill bottles of water. Each machine's performance is checked periodically by
testing bottles removed randomly from the production line, yielding the following statistics:
Machine   Mean Weight   Standard Deviation   Number of Bottles Taken
1         203.56        3.56                 40
2         200           2.02                 50
Use the 6 steps procedure to arrive at a decision from a 0.01 alpha level.
ESTIMATING PROPORTION
$$\hat{p} = \frac{x}{n}$$ is the proportion
n = sample size
Example:
The Philippine Star, a local newspaper, conducted a poll of 1000 randomly selected
readers to determine their views concerning the lifting of the odd-even scheme in Metro
Manila. The paper found that 650 people in the sample felt that there is no effect on the
traffic flow. Compute the best point estimate for the percentage of readers who believe that
there is really no effect on traffic flow.
Solution:
Let x = number of people who believe that lifting the odd-even scheme had no
effect on traffic flow
n = number of randomly selected readers
Thus,
$$\hat{p} = \frac{x}{n} = \frac{650}{1000} = 0.65$$
$$\hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}}$$
where $z_{\alpha/2}$ is the distance from the point estimate to the end of the
interval in standard deviation units, and
$\sigma_{\hat{p}}$ is the standard deviation of $\hat{p}$
Example:
The Sky Cable Television Company thinks that 40% of their customers have more outlets
wired than what they are paying for. A random sample of 400 houses reveals that 110 of the
houses have excessive outlets. Construct a 99% confidence interval for the true proportion
of houses having too many outlets. Do you feel the company is accurate in its belief about
the proportion of customers who have more outlets wired than they are paying for?
Solution:
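Thus,
$$\hat{p} = \frac{x}{n} = \frac{110}{400} = 0.275$$
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.275 \pm 2.575\sqrt{\frac{0.275(1 - 0.275)}{400}}$$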
Interpretation: We are 99% confident that the true proportion of houses having too many
outlets is between 0.2175 and 0.3325.
Note: We do not believe that 40% of the customers have more outlets wired than they are
paying for because we are 99% confident that the true proportion is in the
interval 21.75% to 33.25% and 40% is not in the interval.
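The interval in the cable outlet example can be reproduced in code. This is a minimal sketch using only the Python standard library; the function name `proportion_ci` is an illustrative choice, and the z value comes from `statistics.NormalDist` rather than from a printed table, so it is 2.5758 instead of the rounded 2.575 used above.

```python
import math
from statistics import NormalDist


def proportion_ci(x, n, confidence):
    """Confidence interval for a proportion: p_hat +/- z * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = x / n
    # z_{alpha/2}: the standard normal quantile with alpha/2 in the upper tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(110, 400, 0.99)
print(round(low, 4), round(high, 4))  # close to 0.2175 and 0.3325
```

Because 0.40 falls above the upper limit, the code leads to the same conclusion as the note above: the 40% belief is not supported at the 99% confidence level.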
The technique for deriving the sample size parallels the discussion of the sample
mean.
Setting one-half the entire width of the confidence interval equal to the maximum
allowable error yields
$$\text{error} = z_{\alpha/2}\,\sigma_{\hat{p}} = z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}}$$
Solving for n yields
$$n = \frac{z_{\alpha/2}^2\,p(1 - p)}{\text{error}^2}$$
Generally the population proportion is unknown and is estimated from a pilot study.
In this case, the sample size necessary to estimate the proportion to
within a particular error with a certain level of confidence is given by
$$n = \frac{z_{\alpha/2}^2\,\hat{p}(1 - \hat{p})}{\text{error}^2}$$
where $\hat{p}$ is the estimate obtained from the study.
The value 0.5 maximizes the quantity p (1- p) and thus provides the most
conservative estimate of the sample size.
Example
Researchers from MCWD working in a remote mountain barangay of Cebu City feel that
40% of the families in the area are without adequate drinking water either through
contamination or unavailability. What sample size will be necessary to estimate the
percentage without adequate water to be within 5% with 95% confidence?
Solution:
$$n = \frac{z_{\alpha/2}^2\,\hat{p}(1 - \hat{p})}{\text{error}^2} = \frac{(1.96)^2(0.40)(1 - 0.40)}{(0.05)^2} = 368.79$$
say 369 families
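The sample size calculation can be sketched in code as follows. The function name is hypothetical, and the result is rounded up because a fraction of a family cannot be sampled.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p_estimate, error, confidence):
    """Smallest n that estimates a proportion to within the given error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * p_estimate * (1 - p_estimate) / error ** 2
    return math.ceil(n)  # always round up to a whole unit


print(sample_size_for_proportion(0.40, 0.05, 0.95))  # 369
print(sample_size_for_proportion(0.50, 0.05, 0.95))  # 385, the conservative choice
```

The second call uses 0.5, the value that maximizes p(1 - p), and shows how much larger the most conservative sample would be for the same error and confidence level.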
Compared with tests about a mean, the major changes are the use of the population
proportion rather than the population mean in the formulation of the hypotheses, and
corresponding changes in the calculation of the test statistic.
Example
CTU Main Campus is trying a new student registration system and would like to know if
there is sufficient evidence to conclude that at least 60% of the students favor the new
system.
Hypothesis
Ho: p ≤ 0.6 (at most 60% of the students prefer the new system)
Ha: p > 0.6 (more than 60% of the students prefer the new system)
The test statistic measures how far the sample proportion ($\hat{p}$) is
from the unknown population proportion ($p$), measured in
standard deviation units.
If $np \ge 5$ and $n(1 - p) \ge 5$, the appropriate test statistic is given by
$$z = \frac{\hat{p} - p}{\sigma_{\hat{p}}}, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$$
CALCULATION OF THE STANDARD DEVIATION OF p̂
Since the null is presumed to be true, p equals the value hypothesized in the null
hypothesis.
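A one-sample test about a proportion can be sketched as below. The handout does not give survey counts for the registration example, so the numbers in the usage note (135 of 200 students in favor) are invented purely for illustration, as is the function name.

```python
import math
from statistics import NormalDist


def one_proportion_z(x, n, p0):
    """Upper-tailed z test of Ho: p <= p0 against Ha: p > p0."""
    # The normal approximation needs np >= 5 and n(1 - p) >= 5.
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("sample too small for the normal approximation")
    p_hat = x / n
    # The null is presumed true, so p0 is used in the standard deviation.
    sigma_p_hat = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sigma_p_hat
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value
```

With the hypothetical counts, `one_proportion_z(135, 200, 0.6)` gives a z of about 2.17 and an upper-tail p-value of about 0.015.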
Example:
Assuming the null is true, p̂ has a normal distribution which is centered around 0.5.
Presuming the null is true, ordinary sampling variation would produce a value this
large about 4 times out of every 100.
Example:
A drug manufacturer believes that the manufacturing process is in control if the standard
deviation of the dosage in each tablet is at most 0.10 mg. The process will be shut down if
there is overwhelming evidence that the process has excessive variation.
Hypothesis
Ho: σ ≤ 0.10 (the variation of the process is acceptable)
Ha: σ > 0.10 (the variation of the process is excessive)
TEST STATISTIC FOR TESTING ABOUT A POPULATION VARIANCE
$$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2}$$
Degrees of freedom = n – 1
The term $\sigma_0^2$ refers to the hypothesized value of the variance.
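The chi-square statistic itself is simple to compute. The sketch below is illustrative only: the sample of 25 tablets with a sample standard deviation of 0.12 mg is hypothetical, since the handout gives no data for the tablet example, and the critical value must still be read from a chi-square table because the Python standard library has no chi-square distribution.

```python
def variance_chi_square(n, sample_sd, hypothesized_sd):
    """Chi-square statistic (n - 1) s^2 / sigma_0^2 and its degrees of freedom."""
    chi_square = (n - 1) * sample_sd ** 2 / hypothesized_sd ** 2
    return chi_square, n - 1


# Hypothetical data: 25 tablets, s = 0.12 mg, hypothesized sigma_0 = 0.10 mg.
chi_square, df = variance_chi_square(25, 0.12, 0.10)
print(round(chi_square, 2), df)  # 34.56 24
```

For 24 degrees of freedom the upper-tail table value at the 0.05 level is 36.415, so with these made-up numbers the statistic of 34.56 would not be large enough to conclude that the variation is excessive.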
Assumptions:
1. An independent experimental design.
2. The samples are large enough such that n1p1 ≥ 5, n1 (1 – p1) ≥ 5, n2p2 ≥ 5 and
n2 (1 – p2) ≥ 5, where n1 is the size of the random sample selected from the first
population and n2 is the size of the random sample selected from the second population.
Example:
A telephone executive is interested in the number of defective phones being produced by his
two plants.
Hypothesis:
Ho: p1 – p2 = 0 (There is no difference in the proportion of defective phones
produced at the two plants).
Ha: p1 – p2 ≠ 0 (There is a difference in the proportion of defective phones produced
at the two plants).
If the null hypothesis is true, then $p_1 = p_2$, and $\hat{p}_1$ and $\hat{p}_2$ are estimating the same
quantity.
Thus $\hat{p}_1$ and $\hat{p}_2$ are pooled to derive a better estimate of the population
proportion:
$$p_c = \frac{n_1}{n_1 + n_2}\,\hat{p}_1 + \frac{n_2}{n_1 + n_2}\,\hat{p}_2$$
The test statistic is a standard normal random variable and is given by
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\frac{p_c(1 - p_c)}{n_1} + \frac{p_c(1 - p_c)}{n_2}}}$$
If the null hypothesis is true, z has an approximately normal
distribution.
If the observed value of $\hat{p}_1 - \hat{p}_2$ is significantly larger or
smaller than 0, this will produce a large or small value of the
test statistic, causing us to question whether or not the null
hypothesis is in fact true.
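The pooled two-proportion test can be sketched in code as follows. The handout gives no counts for the two telephone plants, so the figures in the usage note (30 defective out of 200 at one plant, 15 out of 300 at the other) are made up for illustration, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Two-tailed z test of Ho: p1 - p2 = 0 using the pooled proportion."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    # Pooled estimate: each sample proportion weighted by its sample size.
    p_c = (n1 / (n1 + n2)) * p_hat1 + (n2 / (n1 + n2)) * p_hat2
    std_error = math.sqrt(p_c * (1 - p_c) / n1 + p_c * (1 - p_c) / n2)
    z = (p_hat1 - p_hat2 - 0) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_c, p_value
```

With the hypothetical counts, `two_proportion_z(30, 200, 15, 300)` gives a pooled proportion of 0.09 and a z of about 3.83, a value far enough from 0 to call the null hypothesis into question.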