5 Basic Steps in Hypothesis Test
Lecture 1.3
Men willingly believe what they wish. --Julius Caesar (100-44 BC)
1. Chi-Square Test of Goodness-of-Fit
2. Conditional Probability. Independence. Chi-Square Test: Test of Independence
Population Parameter: $p$
Sample Statistic: $\hat{p} = X/n$

Test Statistic:
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0)/n}}, \quad \text{where } \hat{p} = \frac{X}{n}$$

Confidence Interval:
$$\hat{p} \pm Z_{1-(\alpha/2)} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

If the p-value ≤ α, then reject H0. If the p-value > α, then fail to reject H0.

Recall: the probability of exactly k successes in n trials is
$$P(X = k) = \frac{n!}{k!\,(n-k)!}\, p^k (1 - p)^{n-k}, \quad \text{where } n! = n (n-1) \cdots 2 \cdot 1$$
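The one-sample test statistic and the binomial probability can be sketched in Python. This is a minimal illustration rather than part of the lecture: the function names and the data (60 successes in 100 trials, tested against p0 = 0.5) are hypothetical, and the standard normal CDF is computed from `math.erf`.

```python
from math import comb, erf, sqrt


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal variable."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def one_sample_proportion_z(x, n, p0):
    """Z statistic for H0: p = p0, using the standard error under H0."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


# Hypothetical data, chosen only for illustration.
x, n, p0 = 60, 100, 0.5
z = one_sample_proportion_z(x, n, p0)
p_value = 2 * (1 - standard_normal_cdf(abs(z)))  # two-sided alternative
alpha = 0.05

print(f"Z = {z:.3f}, two-sided p-value = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
print(f"P(X = 2) for n = 4, p = 0.5: {binomial_pmf(2, 4, 0.5)}")
```

The decision line applies the rule stated above: reject H0 when the p-value is at most α.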
Two-Sample Hypothesis Test for Difference in Population Proportions (Large n1, n2)
Test Scenario: Difference in Population Proportions
Explanatory Variable: 2 Independent Group (1, 2) Samples (Gender, Student/Non-Students)
Response: Categorical (YES/NO, Success/Failure, True/False)
Example: Is there a significant difference in proportions of athletes among male and female students at BU?

Large Sample Hypothesis Test for Difference in Population Proportions
H0: p1 - p2 = 0
Decision Rule:
Ha: p1 > p2: reject H0 if $Z \ge Z_{1-\alpha}$
Ha: p1 ≠ p2: reject H0 if $|Z| \ge Z_{1-(\alpha/2)}$
Ha: p1 < p2: reject H0 if $Z \le -Z_{1-\alpha}$

Data
Population Parameter: $p_1 - p_2$
Sample Statistic: $\hat{p}_1 - \hat{p}_2$

Assumptions
Large n1, n2

Test Statistic
$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \quad \text{where } \hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$$

Confidence Interval
$$(\hat{p}_1 - \hat{p}_2) \pm Z_{1-(\alpha/2)} \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}$$

$$\bar{p} = \frac{p_1 + p_2}{2}, \quad \bar{q} = 1 - \bar{p}$$
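As a rough sketch of the two-sample procedure, the Python below computes the pooled Z statistic and the unpooled confidence interval for a difference in proportions. The function names and the counts (45 of 150 versus 30 of 150) are hypothetical, and 1.96 is the usual standard normal quantile for a 95% interval.

```python
from math import sqrt


def two_sample_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 - p2 = 0, using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error


def difference_confidence_interval(x1, n1, x2, n2, z_quantile=1.96):
    """Large-sample interval for p1 - p2 (unpooled standard error)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    standard_error = sqrt(
        p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
    )
    margin = z_quantile * standard_error
    difference = p1_hat - p2_hat
    return difference - margin, difference + margin


# Hypothetical counts, chosen only for illustration.
x1, n1, x2, n2 = 45, 150, 30, 150
z = two_sample_proportion_z(x1, n1, x2, n2)
lower, upper = difference_confidence_interval(x1, n1, x2, n2)

print(f"Z = {z:.3f}")
print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
# Two-sided alternative at alpha = 0.05: reject H0 if |Z| >= 1.96.
print("Reject H0" if abs(z) >= 1.96 else "Fail to reject H0")
```

The test uses the pooled estimate because H0 assumes the two proportions are equal, while the interval uses the separate sample proportions.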
Chi-Square Tests
Name of the Test: Goodness-of-Fit Test (how well categorical data FITS an expected distribution)
Description:
- Uses the Multinomial Distribution: the response is the choice of a category from more than 2 possible categories.
- H0 involves statements about what the proportions should be for the k response categories.
  H0: p1 = p01, p2 = p02, ..., pk = p0k
  Ha: H0 is false
  Note: all proportions add up to 1.
Examples:
- Do people park equally often on all 4 levels of a parking garage on rainy days?
- Are students equally likely to register for morning, afternoon, and evening sessions?

Name of the Test: Test of Independence (whether two categorical variables can be considered INDEPENDENT)
Description:
- Consider 1 population and 2 categorical variables.
- Test if the two categorical variables appear to be related (dependent) for a given population of interest.
  H0: [variable 1] and [variable 2] are INDEPENDENT
  Ha: [variable 1] and [variable 2] are DEPENDENT
Examples:
- Are angry people more likely to have Heart Disease?
- Are sleep-deprived students more likely to be stressed?
Chi-Square statistic:
$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$
Reject H0 if $\chi^2 \ge \chi^2_{df,\,1-\alpha}$, where $O_i$ is the observed count and $E_i$ is the expected count under the corresponding null hypothesis.

For the Goodness-of-Fit Test, the Decision Rule is: reject H0 if $\chi^2 \ge \chi^2_{df,\,1-\alpha}$ with df = k - 1, where $O_i$ is the observed count and $E_i = n\,p_{0i}$ is the expected count under the corresponding H0.
Example (D'Agostino, Example 7.10): Goodness of Fit for Teen Issues
Volunteers at a teen hotline have been assigned based on the assumption that 40% of all calls are drug related, 25% are sex related (e.g., date rape), 25% are stress related, and 10% concern educational issues. For this investigation, each call is classified into one category based on the primary issue raised by the caller. To test the hypothesis, the following data are collected from 120 randomly selected calls placed to the teen hotline. Based on the data, is the assumption regarding the distribution of topic issues appropriate?

Topic Issue:       Drugs   Sex   Stress   Education   Total
Number of calls:   52      38    21       9           n=120
Step 1: Define parameter of interest and state the hypothesis.
Parameter: pi = the proportion of calls related to issue i (1 = Drugs, 2 = Sex, 3 = Stress, 4 = Education)
H0: p1 = 0.40, p2 = 0.25, p3 = 0.25, p4 = 0.10
Ha: H0 is false
Significance Level α = 0.05
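Step 2: Summarize the data into an appropriate test statistic.
To compute the chi-square statistic, extend the table of observed counts and compute expected counts:

Topic Issue:                   Drugs   Sex     Stress   Education   Total
Number of calls = Oi           52      38      21       9           n=120
Expected Counts Ei = n p0i     48      30      30       12          120
(Oi - Ei)^2 / Ei               0.333   2.133   2.700    0.750       5.917

$\chi^2 = 5.917$ with df = k - 1 = 3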
Step 3: Assuming the H0 is true, define decision rule:
Decision Rule: Reject H0 if $\chi^2 \ge \chi^2_{df,\,1-\alpha}$. Using Table B.5 with df = 4 - 1 = 3 and α = 0.05, the critical value is 7.81.
Step 4: Decide whether or not the result is statistically significant based on rejection region:
Decision: Fail to Reject H0, since the computed $\chi^2 \approx 5.92$ is below 7.81.
Step 5: Report the conclusion in the context of the problem (question of interest).
Based on a random sample of n=120 phone calls, there is no significant evidence at the 5% significance level to conclude that volunteers at a teen hotline have been assigned inappropriately.
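The arithmetic for the hotline example can be checked with a short Python sketch. The function name is hypothetical, and the critical value 7.815 is hard-coded as the standard chi-square table value for df = 3 and α = 0.05 rather than computed.

```python
def chi_square_goodness_of_fit(observed, null_proportions):
    """Return (statistic, df, expected counts) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in null_proportions]  # Ei = n * p0i
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # k - 1 categories
    return statistic, df, expected


# Observed calls: Drugs, Sex, Stress, Education (n = 120).
observed = [52, 38, 21, 9]
null_proportions = [0.40, 0.25, 0.25, 0.10]

statistic, df, expected = chi_square_goodness_of_fit(observed, null_proportions)
critical_value = 7.815  # table value for df = 3, alpha = 0.05 (hard-coded)
reject = statistic >= critical_value

print(f"Expected counts: {expected}")
print(f"Chi-square = {statistic:.3f}, df = {df}")
print("Reject H0" if reject else "Fail to reject H0")
```

The statistic comes out at about 5.92, which is below the critical value, matching the decision to fail to reject H0.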
Exercise: According to the M&M's web site, each regular package of Milk Chocolate M&M's should contain 24% blue, 14% brown, 16% green, 20% orange, 13% red, and 14% yellow M&M's. Count candies, and use an appropriate test of hypotheses to check whether the observed color percentages are consistent with the stated distribution.

Step 1: Define parameter of interest and state the hypothesis.
H0:________________________________________________
Ha:________________________________________________
α =_____________

Step 2: Summarize the data into an appropriate test statistic:
First, create table of observed and expected counts:

Colors   Blue   Brown   Green   Orange   Red    Yellow
H0       0.24   0.14    0.16    0.20     0.13   0.14
Oi
Ei

$\chi^2$ =
df =

Step 3: Assuming the H0 is true, define decision rule:

Step 4: Decide whether or not the result is statistically significant based on rejection region:

Step 5: Report the conclusion in the context of the problem (question of interest).
Based on a random sample of n= ___________, there is __________ significant evidence, at level α = __________, to conclude that_______________________ _________________________________________________________________
Exercise: As part of an on-going study, men who quit smoking using a variety of methods are being followed for several years. A group of 350 men (n=350) who quit smoking using a nicotine patch were tracked down 3 years after their quitting date. They were asked if they had successfully quit or if they had gone back to smoking. The possible answers and number of men who answered the question that way are given below.

Outcome                                                         Number of men
I haven't smoked since (#1)                                     188
I don't smoke much anymore, but I occasionally light up (#2)    35
I smoke as much now as I did before (#3)                        111
I smoke more now than I did before (#4)                         16
Total                                                           350
The researchers wish to test the following hypothesis: H0: p1 = 0.60, p2 = 0.10, p3 = 0.25, p4 = 0.05

Step 1: Define parameter of interest and state the hypothesis.
Parameter:
H0:________________________________________________
Ha: ________________________________________________
α = 0.05

Step 2: Summarize the data into an appropriate test statistic:
$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i} = $$
with df = k-1 =

Step 3: Assuming the H0 is true, define decision rule:

Step 4: Decide whether or not the result is statistically significant based on rejection region:

Step 5: Report the conclusion in the context of the problem (question of interest).
Based on a random sample of n= ___________, there is __________ significant evidence, at level α = __________, to conclude that_____________________________________________
Test of Independence
A fundamental question: Is there a relationship between the two variables so that the chance that an individual falls into a particular category for one variable depends upon the particular category they fall into for the other variable? A procedure for assessing the statistical significance of a relationship between categorical variables is the chi-square test of independence.

Chi-Square Test of Independence
$$\chi^2 = \sum_{\text{all cells}} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
H0: [variable 1] and [variable 2] are INDEPENDENT
$O_{ij}$ is the observed count in row i and column j, and
$$E_{ij} = \frac{(\text{Row } i \text{ Total}) \times (\text{Column } j \text{ Total})}{\text{Grand Total}}$$
is the expected count under the corresponding H0.
Two random events A and B are (statistically) independent if and only if
$$P(A \cap B) = P(A \text{ and } B) = P(A)\,P(B)$$
Thus, if A and B are independent, then their joint probability can be expressed as a simple product of their individual probabilities.

Equivalently, for two independent events A and B with non-zero probabilities:
$$P(A \mid B) = P(A) \quad \text{and} \quad P(B \mid A) = P(B)$$
In other words, if A and B are independent, then the conditional probability of A given B is simply the individual (marginal) probability of A alone; likewise, the probability of B given A is simply the probability of B alone.
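The product rule and its conditional-probability form can be illustrated with a small Python sketch. The counts are hypothetical (100 individuals, 50 in A, 40 in B), the function names are made up for this illustration, and `fractions.Fraction` is used so the equality check is exact rather than subject to floating-point rounding.

```python
from fractions import Fraction


def is_independent(n_a_and_b, n_a, n_b, n_total):
    """Check whether P(A and B) equals P(A) * P(B) for the given counts."""
    p_a = Fraction(n_a, n_total)
    p_b = Fraction(n_b, n_total)
    p_a_and_b = Fraction(n_a_and_b, n_total)
    return p_a_and_b == p_a * p_b


def conditional_probability(n_a_and_b, n_b):
    """P(A | B) = P(A and B) / P(B), computed directly from counts."""
    return Fraction(n_a_and_b, n_b)


# Hypothetical counts: 100 individuals, 50 in A, 40 in B.
n_total, n_a, n_b = 100, 50, 40

# Case 1: 20 individuals are in both A and B.
independent_case = is_independent(20, n_a, n_b, n_total)
p_a_given_b = conditional_probability(20, n_b)

# Case 2: 30 individuals are in both A and B.
dependent_case = is_independent(30, n_a, n_b, n_total)

print(f"20 in both: independent = {independent_case}, P(A|B) = {p_a_given_b}")
print(f"30 in both: independent = {dependent_case}")
```

In the first case P(A | B) equals the marginal P(A) = 1/2, as the equivalent definition requires. In real samples the counts almost never satisfy the product rule exactly, which is why the chi-square test compares observed counts with the counts expected under independence.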
Example: Are angry people more likely to have Coronary Heart Disease?
Coronary heart disease (CHD) is a narrowing of the small blood vessels that supply blood and oxygen to the heart. CHD is also called coronary artery disease. People who get angry easily tend to be more likely to have heart disease. That is the conclusion of a study that followed a random sample of 12,986 people from three locations over about four years. All subjects were free of heart disease at the beginning of the study. The subjects took the Spielberger Trait Anger Scale, which measures how prone a person is to sudden anger. The 8474 people in the sample who had normal blood pressure were classified according to whether they had coronary heart disease (CHD) or not and whether they had low anger, moderate anger, or high anger according to the Anger Scale.

CHD * TEMPER Crosstabulation (Count)

           TEMPER: Moderate anger
CHD        110
No CHD     4621
Total      4731
1. What proportion of sampled subjects had CHD? (Answer: 0.022)
2. What proportion of High anger subjects had CHD? (Answer: 0.043)
3. What proportion of Moderate anger subjects had CHD? (Answer: 0.023)
Step 1: State the hypotheses.
H0: Having CHD is independent of Level of Anger
Ha: _______________________________________________________
Significance level α = 0.05
Step 2: Summarize the data into an appropriate test statistic:
NOTE: All cell counts must be ≥ 5.
First, compute expected counts:
$$E_{ij} = \frac{(\text{Row } i \text{ Total}) \times (\text{Column } j \text{ Total})}{\text{Grand Total}}$$

CHD * TEMPER Crosstabulation

                            TEMPER: Moderate anger
CHD      Count              110
         Expected Count     106.1
No CHD   Count              4621
         Expected Count     4624.9
Total    Count              4731
         Expected Count     4731.0
$\chi^2 = 16.077$ with df = (R-1)(C-1) = (2-1)(3-1) = 2

Step 3: Assuming the H0 is true, define decision rule: reject H0 if $\chi^2 \ge 5.99$

Step 4: Decide whether or not the result is statistically significant based on rejection region:
___________________________________________________________

Step 5: Report the conclusion in the context of the problem (question of interest).
Based on a random sample of n= __________, there is __________ significant evidence, at level α = __________, to conclude that_______________________ _________________________________________________________________
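The expected-count formula and the resulting statistic can be sketched for any R x C table. The table below is hypothetical and is not the CHD data; the function names are illustrative, and the critical value 5.991 is the standard chi-square table value for df = 2 and α = 0.05.

```python
def expected_counts(table):
    """Expected count per cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (statistic, df, expected counts) for a test of independence."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(table, expected)
        for o, e in zip(observed_row, expected_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)  # (R - 1)(C - 1)
    return statistic, df, expected


# Hypothetical 2 x 3 table of observed counts, for illustration only.
table = [
    [10, 20, 30],
    [20, 20, 20],
]

statistic, df, expected = chi_square_independence(table)
critical_value = 5.991  # table value for df = 2, alpha = 0.05 (hard-coded)

print(f"Expected counts: {expected}")
print(f"Chi-square = {statistic:.3f}, df = {df}")
print("Reject H0" if statistic >= critical_value else "Fail to reject H0")
```

Before trusting the result, it is worth checking the cell-count condition from the note above on the table being analyzed.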
Exercise: Are sleep-deprived students more likely to be stressed?
Using the following data, perform an appropriate test and conclude if there is a relationship between level of stress and the lack of sleep of BU students at the 5% significance level.

                 Sleep Deprived   Not Sleep Deprived   Total
Stressed         18               9
Not Stressed     7                12
Total
Step 1: Define parameter of interest and state the hypothesis.
H0:________________________________________________
Ha:________________________________________________
α =_____________

Step 2: Summarize the data into an appropriate test statistic:
$\chi^2$ =
df =

Step 3: Assuming the H0 is true, define decision rule: reject H0 if $\chi^2 \ge$ ________

Step 4: Decide whether or not the result is statistically significant based on rejection region:

Step 5: Report the conclusion in the context of the problem (question of interest).
Based on a random sample of n= ___________, there is __________ significant evidence, at level α = __________, to conclude that_______________________ _________________________________________________________________