
Stat - 5 Two Sample Test

1) Specify the null (no difference) and alternative (difference) hypotheses 2) Set the significance level at 0.01 and determine the degrees of freedom and critical t-value 3) Calculate the t-statistic using the means, standard deviations, and sample sizes of the two groups 4) Make a decision to reject or fail to reject the null hypothesis based on the t-statistic and critical value 5) Write a conclusion statement about whether the differences in scores were statistically significant

Uploaded by

Michael Jubay
Copyright
© All Rights Reserved

Two Sample Tests

These tests compare two samples drawn randomly and independently from populations with equal population variances. This is called the homogeneity of variance assumption.

The sampling distribution of the difference between the sample means is normal if the populations are normal or if n1 and n2 are large. When the population variance is estimated from the samples, the test statistic follows a t-distribution.

The sampling distribution of the difference is constructed as follows:


1. Define a population.
2. Take all possible samples of size n1 from the population and compute the mean for
each.
3. Take all possible samples of size n2 from the population and compute the mean for
each of these samples. (Note: n1 may equal n2, but it need not.)
4. Compute the differences between all pairs of sample means.
5. Create a sampling distribution of mean differences. This sampling distribution is
called the sampling distribution of the difference.
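Enumerating every possible sample is impractical for any realistic population, so the following Python sketch approximates the sampling distribution of the difference by drawing many random pairs of samples instead. The population (the integers 1 to 100), the sample sizes, the number of repetitions, and the seed are hypothetical choices made only for illustration.

```python
import random
import statistics


def simulate_difference_distribution(population, n1, n2, reps=10_000, seed=1):
    """Approximate the sampling distribution of the difference between means.

    Instead of enumerating every possible sample (steps 2-4), this draws
    `reps` random pairs of samples of sizes n1 and n2, without replacement,
    from the same population and records x̄1 - x̄2 for each pair.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diffs = []
    for _ in range(reps):
        mean1 = statistics.fmean(rng.sample(population, n1))
        mean2 = statistics.fmean(rng.sample(population, n2))
        diffs.append(mean1 - mean2)
    return diffs


# Hypothetical population: the integers 1..100
population = list(range(1, 101))
diffs = simulate_difference_distribution(population, n1=10, n2=15)
print(f"mean of differences: {statistics.fmean(diffs):.2f}")  # near 0
print(f"standard error of the difference: {statistics.stdev(diffs):.2f}")
```

Because both samples come from the same population, the simulated differences should center near zero; their standard deviation is an empirical estimate of the standard error of the difference.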

Calculating the Standard Error of the Difference

• In estimating the population variance, use both samples instead of just one.
• If one sample is larger than the other, it provides a better estimate of the population variance than the smaller sample.
• The sample variance calculated from the larger sample should therefore count for more than the sample variance calculated from the smaller sample.
• When the samples differ in size, the two sample variances must be weighted in some way when they are combined to estimate the population variance. The equation below gives the formula for pooling the variances of the two samples.

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Note that each sample variance has been multiplied by its sample size minus one to weight its importance. These weighted estimates are then added together, and a weighted average is calculated by dividing the result by n1 + n2 − 2.
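The pooling formula translates directly into a one-line Python function. This is a minimal sketch; the function name pooled_variance and the example values are hypothetical and only show how the weighting behaves.

```python
def pooled_variance(s1, n1, s2, n2):
    """Pooled variance: each s^2 is weighted by its degrees of freedom (n - 1)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


# Hypothetical values: equal sizes give the plain average of the variances ...
print(pooled_variance(s1=4.0, n1=10, s2=2.0, n2=10))  # (144 + 36) / 18 = 10.0
# ... while a larger first sample pulls the estimate toward its variance (16)
print(pooled_variance(s1=4.0, n1=31, s2=2.0, n2=11))  # (480 + 40) / 40 = 13.0
```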

49 Two Sample Test Summer 2014

The equation for the standard error of the difference is

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

Replacing each $(n - 1)s^2$ term with its computational formula, $\sum x^2 - \frac{(\sum x)^2}{n}$, the equation for the standard error of the difference becomes

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\left(\frac{\sum x_1^2 - \frac{(\sum x_1)^2}{n_1} + \sum x_2^2 - \frac{(\sum x_2)^2}{n_2}}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

Test Statistics for Comparing Two Sample Means (Small Samples, n ≤ 30)

Independent t-test
• Use when the two population variances are assumed to be equal, so that the pooled variance gives the standard error of the difference.

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$$

where $\mu_1 - \mu_2$ is zero because the hypothesized difference is zero.

Substituting the calculation equation for the standard error of the difference,

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left(\frac{\sum x_1^2 - \frac{(\sum x_1)^2}{n_1} + \sum x_2^2 - \frac{(\sum x_2)^2}{n_2}}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

Using the pooled variance, the t statistic is

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left(\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

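A minimal Python sketch of the pooled-variance t statistic computed from summary statistics (means, standard deviations, sample sizes). The function name independent_t and the example numbers are hypothetical; the resulting t still has to be compared with a tabled critical value for df = n1 + n2 − 2.

```python
import math


def independent_t(mean1, s1, n1, mean2, s2, n2):
    """Pooled-variance t statistic for H0: mu1 - mu2 = 0.

    Returns (t, df); assumes equal population variances.
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))            # standard error of the difference
    return (mean1 - mean2) / se, df


# Hypothetical summary statistics
t, df = independent_t(mean1=20.0, s1=4.0, n1=10, mean2=17.0, s2=2.0, n2=10)
print(round(t, 2), df)  # 2.12 18
```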
Notes in Statistics Chapter 5


50
Compiled by: Nolasco K. Malabago Comparing Sample Means: Two Samples
Problem
Suppose a physician is studying the effects of A1 and A1 + DDI on the life span of AIDS
patients. This physician randomly and independently samples two groups of 10 patients. He
places one group of 10 on an A1 protocol. The second group of 10 patients gets the A1 + DDI
protocol. The number of months until death is then measured for each patient in these two
independent groups. The physician wants to know if there is a significant difference
between the survival times of these two groups of patients.
In the blank spaces, fill in what is done in each step. Only the answers are provided.

Step 1. Specify the null and alternative hypotheses


H0 : μ1 = μ2
H1 : μ1 ≠ μ2

Step 2. Set the probability level


alpha = .05, df = 18, t critical value = 2.101

Step 3. Collect the data


$\sum x_1 = 482$, $\sum x_2 = 500$, $\sum x_1^2 = 23{,}642$, $\sum x_2^2 = 25{,}024$
$\bar{x}_1 = 48.2$, $\bar{x}_2 = 50.0$, $n_1 = n_2 = 10$

Step 4. Calculate the statistic


The t value = -0.82

Step 5. Make a decision concerning the null hypothesis


Fail to reject Ho

Step 6. Write a summary statement


The time to death was not significantly different between group 1 and group 2 (t
= -0.82, df = 18, p > .05, two-tailed).
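The Step 4 value can be checked with the computational (raw-score) formula. This Python sketch plugs in the Step 3 sums; the function name t_from_sums is hypothetical.

```python
import math


def t_from_sums(sum1, sumsq1, n1, sum2, sumsq2, n2):
    """Independent t from raw-score sums (the computational formula)."""
    ss1 = sumsq1 - sum1**2 / n1  # sum of squared deviations, group 1
    ss2 = sumsq2 - sum2**2 / n2  # sum of squared deviations, group 2
    df = n1 + n2 - 2
    se = math.sqrt((ss1 + ss2) / df * (1 / n1 + 1 / n2))
    return (sum1 / n1 - sum2 / n2) / se, df


# Step 3 values from the A1 vs. A1 + DDI example
t, df = t_from_sums(482, 23_642, 10, 500, 25_024, 10)
print(round(t, 2), df)  # -0.82 18
print(abs(t) < 2.101)   # True: fail to reject H0 at alpha = .05
```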

Problem
A researcher wishes to learn whether a particular chemical is toxic to a certain species of beetle and interferes with the beetle's reproduction. She obtained beetles and divided them into two groups. She fed one group of beetles the chemical and used the second group as a control. After 2 weeks, she counted the number of eggs produced by each beetle in each group. Is there evidence that the chemical reduces egg production?



Step 1. Specify the null and alternative hypotheses

H0 : μ1 = μ2
H1 : μ1 < μ2 (one-tailed: group 1, fed the chemical, produces fewer eggs)

Step 2. Set the probability level


alpha = .05
degrees of freedom =
t critical value (tcv) =

Step 3. Collect the data

Step 4. Calculate the statistic

Step 5. Make a decision concerning the null hypothesis

Step 6. Write a summary statement

Problem 1:
A vicious debate between behaviorists and traditional medical doctors concerns the
merits of using the stimulant Ritalin in treating childhood hyperactivity. A large group
of hyperactive children in an urban school system was randomly assigned to either
a behavior modification program or a drug therapy program. The dependent variable
is the children's hyperactivity score after 18 weeks of treatment. Use the data below
to determine whether there is a significant difference between the two groups'
hyperactivity scores.

Behavior 22 24 27 18 21 19 18 17 14 17 20 16 13 20 16 12 14 22 23 18
Drug 26 32 31 20 34 36 22 29 27 25 24 32 24 20 32 32 29 27 33 30
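When raw scores are available, the sample variances can be computed directly. This Python sketch applies the pooled-variance t formula to the two lists above using the standard-library statistics module; the function name independent_t_raw is hypothetical, and the decision still requires comparing |t| with the tabled critical value for df = 38.

```python
import math
import statistics


def independent_t_raw(group1, group2):
    """Pooled-variance independent t computed from raw scores; returns (t, df)."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # statistics.variance uses the n - 1 denominator (sample variance)
    sp2 = ((n1 - 1) * statistics.variance(group1)
           + (n2 - 1) * statistics.variance(group2)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (statistics.fmean(group1) - statistics.fmean(group2)) / se, df


behavior = [22, 24, 27, 18, 21, 19, 18, 17, 14, 17, 20, 16, 13, 20, 16, 12, 14, 22, 23, 18]
drug = [26, 32, 31, 20, 34, 36, 22, 29, 27, 25, 24, 32, 24, 20, 32, 32, 29, 27, 33, 30]
t, df = independent_t_raw(behavior, drug)
print(f"t = {t:.2f}, df = {df}")
```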

Problem 2:
Below is a partial data set from the Milgram Obedience to Authority study. Only two
situations are presented: Remote and Voice. Use the data to determine whether
significantly greater voltage was given in the Remote situation. Set your alpha level at
0.01.

Remote 300 300 315 315 330 345 375 450 450 450 450 450 450 450 450 450 450 450 450 450
Voice 135 150 150 165 385 315 315 360 450 450 450 450 450 450 450 450 450 450 450 450

Dependent t test (Matched/Paired Observations)
• Also called correlated-groups designs.
• Assesses whether the mean difference between paired/matched observations is significantly different from zero.
• Evaluates whether there is a significant difference between the means of two variables (test occasions or events).
• The participants are either tested under two conditions on one measure, or matched/paired on one or more characteristics and tested on one measure.
• The independent variable is dichotomous, and its levels (groups/occasions) are paired/matched in some way.

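Dependent t Formula
• The t statistic is the ratio of the mean difference minus the hypothesized difference to the standard error of the mean difference:

$$t = \frac{\bar{D} - \mu_D}{s_{\bar{D}}}$$

• The standard error of the mean difference (the standard deviation of $\bar{D}$) is computed using

$$s_{\bar{D}} = \sqrt{\frac{\sum D^2 - \frac{(\sum D)^2}{n}}{n(n - 1)}}$$

• Substituting this into the t statistic, with a hypothesized difference $\mu_D = 0$, yields

$$t = \frac{\bar{D}}{\sqrt{\frac{\sum D^2 - \frac{(\sum D)^2}{n}}{n(n - 1)}}}$$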
Sample Problem

Suppose a psychologist is interested in studying test anxiety in college students. She has two
treatments for students who report that they are test anxious. One treatment involves going
to a counseling session where relaxation training techniques are taught. A second treatment
involves going to a study group where the material taught in the college class is emphasized
and students are encouraged to study the material presented in the class. The psychologist
selects 20 students and then matches them on the basis of their general anxiety levels. Using
these matched pairs, the psychologist forms 10 pairs of students. One member is randomly
assigned to the experimental therapy group, the other to the study skills group. The
psychologist wants to determine if there is a difference in test anxiety scores due to these
treatments.

Step 1. Specify the null and alternative hypotheses


H0 : μ1 = μ2
H1 : μ1 ≠ μ2

Step 2. Set the probability level


alpha = .05, df = n – 1 (number of pairs - 1 in a matched-groups design or pretest-
posttest design), tcv = 2.262



Step 3. Collect the data
$\sum D = 65$, $\bar{D} = 6.5$, $\sum D^2 = 691$

Step 4. Calculate the statistic


Calculated t = 3.76

Step 5. Make a decision concerning the null hypothesis


Reject the null hypothesis

Step 6. Write a summary statement


There is a significant difference between the therapy and study skills groups. The
experimental therapy leads to changed test anxiety scores (t = 3.76, df = 9, p < 0.05,
two-tailed).
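The calculated t in Step 4 follows from the Step 3 sums. This Python sketch implements the computational dependent-t formula; the function name paired_t_from_sums is hypothetical.

```python
import math


def paired_t_from_sums(sum_d, sum_d2, n):
    """Dependent t = D̄ / s_D̄, with s_D̄ from the computational formula."""
    se = math.sqrt((sum_d2 - sum_d**2 / n) / (n * (n - 1)))  # standard error of D̄
    return (sum_d / n) / se, n - 1


t, df = paired_t_from_sums(sum_d=65, sum_d2=691, n=10)
print(round(t, 2), df)  # 3.76 9
print(abs(t) > 2.262)   # True: reject H0 at alpha = .05
```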

Problem
A large group of learning-disabled college freshmen experiencing debilitating
anxiety before major tests were matched on an index of test anxiety. Members of
these matched pairs were randomly assigned to relaxation exercises (Relax) or
study skills training (Study) for two weeks. Using the data below, determine whether
there is a significant difference between their final exam scores.

Relax 46 49 47 48 50 52 48 47 46 49 42

Study 50 49 51 52 50 48 52 47 50 52 53

Solution:
Using the six steps

Step 1. Specify the null and alternative hypotheses


H0 :
H1 :

Step 2. Set the probability level

Step 3. Collect the data

Step 4. Calculate the statistic

Step 5. Make a decision concerning the null hypothesis

Step 6. Write a summary statement
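For Step 4, the differences D can be formed pair by pair from the raw scores. This Python sketch does that for the Relax and Study data (taking D = Relax − Study) and applies the dependent-t formula; the function name paired_t is hypothetical, and the decision in Step 5 still needs the tabled critical value for df = n − 1.

```python
import math


def paired_t(x, y):
    """Dependent t for matched pairs, with D = x - y; returns (t, df)."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]  # difference for each matched pair
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(di * di for di in d)
    se = math.sqrt((sum_d2 - sum_d**2 / n) / (n * (n - 1)))  # standard error of D̄
    return (sum_d / n) / se, n - 1


relax = [46, 49, 47, 48, 50, 52, 48, 47, 46, 49, 42]
study = [50, 49, 51, 52, 50, 48, 52, 47, 50, 52, 53]
t, df = paired_t(relax, study)
print(f"t = {t:.2f}, df = {df}")
```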

Problem 1:
The attitudes toward obtaining an advanced college degree of fifteen undergraduate minority
students were measured before and after they participated in a federally funded program
designed to increase their awareness of the benefits of a higher degree. The higher the score,
the more positive is the subject's attitude. Did attending the program significantly change
their attitudes?

Before 11 8 10 9 6 13 11 9 12 11 12 6 7 12 6

After 17 14 9 18 6 16 12 7 15 16 20 14 7 20 12

Problem 2:
Based upon the results of a manual dexterity test, two matched groups of BSIT Automotive
Major college students were formed. One group of students was asked to drive an automobile
simulator after having consumed three bottles of beer in one hour. The second group of
students drove the same simulator after consuming three bottles of nonalcoholic beer in the
same time period. The number of errors per minute was recorded in the simulator. Did the
nonalcoholic group make significantly fewer errors per minute? Use an alpha level of 0.01.

Alcohol 6 4 9 3 15 5 9 7 6 8
Non-alcohol 5 5 6 0 8 2 4 2 8 4

Test Statistics for Comparing Two Sample Means (Large Samples, n > 30)
• Use when both samples are large; each sample variance serves as the estimate of its own population variance, so the variances are not pooled.

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$$

where $\mu_1 - \mu_2$ is zero because the hypothesized difference is zero.

• Dropping the term that equals zero and substituting for the standard error the square root of the sum of each sample variance divided by its sample size, the z statistic becomes

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

Problem
The administrator of a certain hospital was investigating allegations that resident physicians and nurses are slow in responding to emergency calls from senior citizens who are medical and surgical patients, while other patients receive faster service. Without the knowledge of the resident physicians and nurses, the administrator recorded the length of time it took them to respond to calls from senior citizens and from other patients, as summarized in the table:

Patients          Sample Mean   Sample Standard Deviation   Number of Samples Taken
Senior Citizens   6.0           0.5                         50
Other Patients    5.5           0.3                         100

At the 0.01 level of significance, the administrator wants to test whether these allegations are true.



Solution:
Using the six steps

Step 1. Specify the null and alternative hypotheses


H0 :
H1 :

Step 2. Set the probability level


alpha = 0.01; for the z test, the critical value from the table (zcv) = ±2.58

Step 3. Collect the data


$\bar{x}_1 = 6.0$, $\bar{x}_2 = 5.5$, $n_1 = 50$, $n_2 = 100$, $s_1 = 0.5$, $s_2 = 0.3$

Step 4. Calculate the statistic

$$z = \frac{6.0 - 5.5}{\sqrt{\frac{(0.5)^2}{50} + \frac{(0.3)^2}{100}}} = 6.51$$
Step 5. Make a decision concerning the null hypothesis
Reject H0 (accept H1), since 6.51 > 2.58.
Step 6. Write a summary statement
The response time to calls from senior citizens was significantly slower than the response
time to calls from other patients (z = 6.51, p < .01, two-tailed).
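The Step 4 arithmetic can be reproduced with a short Python sketch of the large-sample z statistic, using unpooled variances as in the z formula for large samples. The function name two_sample_z is hypothetical.

```python
import math


def two_sample_z(mean1, s1, n1, mean2, s2, n2):
    """Large-sample z for H0: mu1 - mu2 = 0, with unpooled variances."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # standard error of the difference
    return (mean1 - mean2) / se


z = two_sample_z(6.0, 0.5, 50, 5.5, 0.3, 100)
print(round(z, 2))    # 6.51
print(abs(z) > 2.58)  # True: reject H0 at alpha = 0.01 (two-tailed)
```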

Problem 1:
The Dean of the College of Industrial Technology and Engineering wants to investigate
whether there is any difference in mathematics scores between students who attend classes
during the day and those who work during the day and attend evening classes. Records of
both day and evening classes revealed the following:

Student   Mean Score   Standard Deviation   Number of Samples Taken
Day       90           12                   53
Evening   94           15                   46

Use the 6-step procedure and arrive at a decision at the 0.05 level of significance.

Problem 2:
Two machines fill bottles of water. Each machine's performance is checked periodically by
testing bottles removed randomly from the production line, which yielded the following statistics:

Machine   Mean Weight   Standard Deviation   Number of Bottles Taken
1         203.56        3.56                 40
2         200           2.02                 50

Use the 6-step procedure to arrive at a decision at the 0.01 alpha level.

DECISION MAKING: INFERENCES ABOUT PROPORTION

ESTIMATING PROPORTION

• Estimating the proportion of the population that possesses an attribute is straightforward.
• A random sample is selected and the sample proportion is computed as follows:

$$\hat{p} = \frac{x}{n}$$

where x = number in the sample that possess the attribute, n = sample size, and $\hat{p}$ is the sample proportion.

Example:

The Philippine Star, a local newspaper, conducted a poll of 1000 randomly selected
readers to determine their views concerning the lifting of the odd-even scheme in Metro
Manila. The paper found that 650 people in the sample felt that lifting the scheme had no
effect on traffic flow. Compute the best point estimate for the percentage of readers who
believe that lifting the scheme really had no effect on traffic flow.

Solution:

Let x = number of people who believe that lifting the odd-even scheme had no effect on traffic flow
n = number of randomly selected readers

Thus, $\hat{p} = \frac{x}{n} = \frac{650}{1000} = 0.65$

SAMPLING DISTRIBUTION OF THE POINT ESTIMATE

• In order to develop the confidence interval for a population proportion, the sampling distribution of the point estimate must be developed.
• The random variable $\hat{p}$ is based on a binomial count (it equals x/n), and its distribution is approximated with a normal random variable.

• The sample proportion $\hat{p}$ is approximately normally distributed with mean $p$ and variance $\frac{p(1 - p)}{n}$.
• The standard deviation of the sample proportion is denoted symbolically by $\sigma_{\hat{p}}$ and is given by

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} \approx \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

where $\hat{p}$ is used as an estimate of $p$ if the population proportion is unknown.

CONFIDENCE INTERVAL FOR THE POPULATION PROPORTION

Definition: If the sample size is sufficiently large ($np \ge 5$ and $n(1 - p) \ge 5$), then a $1 - \alpha$ confidence interval for the population proportion is given by the expression

$$\hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}}$$

where $z_{\alpha/2}$ is the distance from the point estimate to the end of the interval in standard deviation units, and $\sigma_{\hat{p}}$ is the standard deviation of $\hat{p}$.
Example:

The Sky Cable Television Company thinks that 40% of its customers have more outlets
wired than they are paying for. A random sample of 400 houses reveals that 110 of the
houses have excessive outlets. Construct a 99% confidence interval for the true proportion
of houses having too many outlets. Do you feel the company is accurate in its belief about
the proportion of customers who have more outlets wired than they are paying for?

Solution:

Let x = number of houses that have excessive outlets = 110
n = 400
confidence level = 0.99

Thus,

$$\hat{p} = \frac{x}{n} = \frac{110}{400} = 0.275$$

A 99% confidence interval for the true proportion of houses having too many outlets is
given by

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.275 \pm 2.575\sqrt{\frac{0.275(1 - 0.275)}{400}}$$

$$= 0.275 \pm 2.575(0.0223) = 0.275 \pm 0.0575 = (0.2175,\ 0.3325)$$

Interpretation: We are 99% confident that the true proportion of houses having too many
outlets is between 0.2175 and 0.3325.

Note: We do not believe that 40% of the customers have more outlets wired than they are
paying for because we are 99% confident that the true proportion is in the
interval 21.75% to 33.25% and 40% is not in the interval.
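The interval above can be reproduced with a small Python sketch of the large-sample confidence interval for a proportion. The function name proportion_ci is hypothetical; z = 2.575 is the value used in the example for 99% confidence.

```python
import math


def proportion_ci(x, n, z):
    """Large-sample CI for p: p̂ ± z * sqrt(p̂(1 - p̂)/n)."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard deviation of p̂
    margin = z * se
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(110, 400, z=2.575)
print(round(low, 4), round(high, 4))  # 0.2175 0.3325
print(low <= 0.40 <= high)            # False: 40% lies outside the interval
```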

ACCURACY IN ESTIMATING POPULATION PROPORTION

• Just as for the population mean, a specific level of accuracy in estimating a population proportion is desirable.
• When we estimate extremely small quantities, highly precise estimates are necessary.

DERIVING THE SAMPLE SIZE

• The technique for deriving the sample size parallels the discussion for the sample mean.
• Setting one-half the entire width of the confidence interval equal to the maximum allowable error yields

$$\text{error} = z_{\alpha/2}\,\sigma_{\hat{p}} = z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}}$$

• Solving for n yields

$$n = \frac{z_{\alpha/2}^2\,p(1 - p)}{\text{error}^2}$$



SAMPLE SIZE WHEN p̂ IS KNOWN

Generally the population proportion is unknown and is estimated from a pilot study.
In this case, the sample size necessary to estimate the proportion to within a particular error with a certain level of confidence is given by

$$n = \frac{z_{\alpha/2}^2\,\hat{p}(1 - \hat{p})}{\text{error}^2}$$

where $\hat{p}$ is the estimate obtained from the pilot study.

SAMPLE SIZE WHEN p̂ IS UNKNOWN

• If an estimate of the population proportion is not available, then the population proportion is set equal to 0.5 and the sample size is given by

$$n = \frac{z_{\alpha/2}^2\,(0.5)(1 - 0.5)}{\text{error}^2}$$

• The value 0.5 maximizes the quantity p(1 − p) and thus provides the most conservative estimate of the sample size.

Example

Researchers from MCWD working in a remote mountain barangay of Cebu City feel that
40% of the families in the area are without adequate drinking water either through
contamination or unavailability. What sample size will be necessary to estimate the
percentage without adequate water to be within 5% with 95% confidence?

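Solution:

Given: $\hat{p} = 0.40$, error = 0.05, confidence level = 0.95 ($z_{\alpha/2} = 1.96$)

The sample size is given by

$$n = \frac{z_{\alpha/2}^2\,\hat{p}(1 - \hat{p})}{\text{error}^2} = \frac{(1.96)^2(0.40)(1 - 0.40)}{(0.05)^2} = 368.79$$

Rounding up, the required sample size is 369 families.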

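The sample-size formula, including the conservative p = 0.5 case, fits in a few lines of Python. This is a sketch; the function name sample_size_for_proportion is hypothetical, and the result is rounded up with math.ceil because a fractional family cannot be sampled.

```python
import math


def sample_size_for_proportion(error, z, p_hat=0.5):
    """n = z^2 p(1 - p) / error^2, rounded up; p_hat = 0.5 is the conservative default."""
    return math.ceil(z**2 * p_hat * (1 - p_hat) / error**2)


print(sample_size_for_proportion(error=0.05, z=1.96, p_hat=0.40))  # 369
print(sample_size_for_proportion(error=0.05, z=1.96))              # 385 (no pilot estimate)
```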
HYPOTHESIS TEST ABOUT A PROPORTION

• Testing a hypothesis concerning a population proportion is nearly identical to testing a hypothesis about a population mean.
• The major changes are the use of the population proportion rather than the population mean in formulating the hypotheses, and corresponding changes in the calculation of the test statistic.

FORMULATION OF THE HYPOTHESIS

Example

CTU Main Campus is trying a new student registration system and would like to know if
there is sufficient evidence to conclude that more than 60% of the students favor the new
system.

Hypothesis
Ho: p ≤ 0.6 (at most 60% of the students prefer the new system)
Ha: p > 0.6 (more than 60% of the students prefer the new system)

TEST STATISTICS FOR TESTING ABOUT POPULATION PROPORTION

• The test statistic measures how far the sample proportion ($\hat{p}$) is from the hypothesized population proportion ($p$), measured in standard deviation units.
• If $np \ge 5$ and $n(1 - p) \ge 5$, the appropriate test statistic is given by

$$z = \frac{\hat{p} - p}{\sigma_{\hat{p}}}, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$$

CALCULATION OF THE STANDARD DEVIATION OF p̂

• Computing $\sigma_{\hat{p}}$ requires knowledge of p.
• If p is unknown, how can it be used to compute $\sigma_{\hat{p}}$?
• Since the null hypothesis is presumed to be true, p equals the value hypothesized in the null hypothesis.



CALCULATING A p-VALUE FOR A PROPORTION

Example:

• Assuming the null is true, $\hat{p}$ has a normal distribution which is centered around 0.5.
• If the null is really true, a z-value of 1.76 is uncommon.
• The probability of observing a value as large as or larger than 1.76 is the p-value:

p-value = P(z ≥ 1.76)
= 0.5 – 0.4608
= 0.0392

• Presuming the null is true, ordinary sampling variation would produce a value this large about 4 times out of every 100.
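Instead of a z table, the upper-tail area can be computed with the complementary error function from Python's math module, since P(Z ≥ z) = ½·erfc(z/√2). This sketch also includes the one-proportion z statistic with p taken from the null hypothesis; the function names and the sample values in the last line are hypothetical.

```python
import math


def one_proportion_z(p_hat, p0, n):
    """z = (p̂ - p0) / sqrt(p0(1 - p0)/n); p0 is the value stated in H0."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


def upper_tail_p_value(z):
    """P(Z >= z) for a standard normal Z, using the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


print(round(upper_tail_p_value(1.76), 4))         # 0.0392
print(round(one_proportion_z(0.55, 0.5, 100), 2))  # 1.0 (hypothetical sample)
```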

FORMULATION OF THE HYPOTHESIS

Example:

A drug manufacturer believes that the manufacturing process is in control if the standard
deviation of the dosage in each tablet is at most 0.10 mg. The process will be shut down if
there is overwhelming evidence that the process has excessive variation.

Hypothesis
Ho: σ ≤ 0.10 (the variation of the process is acceptable)
Ha: σ > 0.10 (the variation of the process is excessive)
TEST STATISTIC FOR TESTING ABOUT A POPULATION VARIANCE

$$\chi^2 = \frac{(n - 1)s^2}{\sigma_o^2}, \qquad \text{degrees of freedom} = n - 1$$

• The term $\sigma_o^2$ refers to the hypothesized value of the variance.
• The actual variance of the population is unknown.

From the previous example, $\sigma_o = 0.10$, which implies that $\sigma_o^2 = 0.01$.

• The sample variance, $s^2$, should be reasonably close to the unknown population variance.
• If the null is true, then $\sigma^2 = 0.01$ and the ratio $\frac{s^2}{\sigma_o^2}$ should be near 1, since $s^2$ should be close to $\sigma_o^2$.
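A minimal Python sketch of the chi-squared statistic for the tablet example. The sample (n = 25 tablets with s = 0.12 mg) is hypothetical, chosen only to show the arithmetic; the statistic would then be compared with a chi-squared table value for n − 1 degrees of freedom.

```python
def chi_square_variance_stat(s, n, sigma0):
    """chi^2 = (n - 1) s^2 / sigma0^2, returned with its n - 1 degrees of freedom."""
    return (n - 1) * s**2 / sigma0**2, n - 1


# Hypothetical sample for the tablet example: n = 25 tablets, s = 0.12 mg
stat, df = chi_square_variance_stat(s=0.12, n=25, sigma0=0.10)
print(round(stat, 2), df)  # 34.56 24
```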

DISTRIBUTION OF THE χ² TEST STATISTIC

The test statistic for testing a hypothesis about a population variance is called a χ² test statistic because it has a chi-squared distribution (with n − 1 degrees of freedom).

COMPARING TWO POPULATION PROPORTIONS

A methodology for comparing two population proportions is particularly useful because proportions are among the few measures that can be used for summarizing categorical data.

Assumptions:
1. An independent experimental design.
2. The samples are large enough that n1p1 ≥ 5, n1(1 – p1) ≥ 5, n2p2 ≥ 5, and n2(1 – p2) ≥ 5,
where n1 is the size of the random sample selected from the first population and n2 is the
size of the random sample selected from the second population.

FORMULATION OF THE HYPOTHESIS

Example:
A telephone executive is interested in the number of defective phones being produced by his
two plants.
Hypothesis:
Ho: p1 – p2 = 0 (There is no difference in the proportion of defective phones
produced at the two plants.)
Ha: p1 – p2 ≠ 0 (There is a difference in the proportion of defective phones produced
at the two plants.)

TEST STATISTIC WHEN COMPARING TWO POPULATION PROPORTIONS

• If the null hypothesis is true, then p1 = p2, and $\hat{p}_1$ and $\hat{p}_2$ are estimating the same quantity.
• Thus $\hat{p}_1$ and $\hat{p}_2$ are pooled to derive a better estimate of the population proportion:

$$p_c = \frac{n_1}{n_1 + n_2}\hat{p}_1 + \frac{n_2}{n_1 + n_2}\hat{p}_2$$

• The test statistic is a standard normal random variable z and is given by

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\frac{p_c(1 - p_c)}{n_1} + \frac{p_c(1 - p_c)}{n_2}}}$$

• If the null hypothesis is true, z has an approximately standard normal distribution.
• If the observed value of $\hat{p}_1 - \hat{p}_2$ is significantly larger or smaller than 0, this will produce a large positive or large negative value of the test statistic, causing us to question whether the null hypothesis is in fact true.
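A minimal Python sketch of the pooled two-proportion z statistic. The defect counts (30 of 500 phones from plant 1, 20 of 500 from plant 2) are hypothetical, since the example gives no data; the pooled proportion (x1 + x2)/(n1 + n2) equals the weighted average of $\hat{p}_1$ and $\hat{p}_2$ in the formula above.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z for H0: p1 - p2 = 0."""
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)  # pooled proportion = (n1 p̂1 + n2 p̂2) / (n1 + n2)
    se = math.sqrt(pc * (1 - pc) / n1 + pc * (1 - pc) / n2)
    return (p1 - p2 - 0) / se


# Hypothetical defect counts for the two plants
z = two_proportion_z(30, 500, 20, 500)
print(round(z, 2))  # 1.45
```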
