Lecture 4
Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site.
Copyright 2006, The Johns Hopkins University and John McGready. All rights reserved. Use of these materials permitted only in accordance with license rights granted. Materials provided AS IS; no representations or warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for obtaining permissions for use from third parties as needed.
Lecture Topics
Comparing two groups: the paired data situation
Hypothesis testing: the null and alternative hypotheses
p-values: definition, calculations, and more information
Section A
The Paired t-Test and Hypothesis Testing
Paired Design
Before vs. After
Why pairing?
The sample average of the differences is 4.8 mmHg. The sample standard deviation (s) of the differences is s = 4.6 mmHg.
Note
In essence, what we have done is reduced the BP information on two samples (women prior to OC use, women after OC use) into one piece of information: information on the differences in BP between the samples. This is standard protocol for comparing paired samples with a continuous outcome measure.
Where SEM = s / √n
Notes
The number 0 is NOT in the confidence interval (1.53, 8.07). Because 0 is not in the interval, this suggests there is a non-zero change in BP over time.
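The interval quoted above can be roughly reproduced from the summary statistics. The sketch below assumes the rounded values n = 10, mean 4.8, and s = 4.6 from the slides, and takes 2.262 as the t-table cutoff for 9 degrees of freedom. Because the inputs are rounded, it gives about (1.51, 8.09) rather than exactly (1.53, 8.07).

```python
import math

# Summary statistics quoted in the lecture (paired BP differences, after - before).
n = 10
mean_diff = 4.8   # mmHg
sd_diff = 4.6     # mmHg (rounded in the slides)

# Assumption: 2.262 is the standard t-table cutoff for a 95% CI with n - 1 = 9 df.
T_CRIT_95_DF9 = 2.262

sem = sd_diff / math.sqrt(n)          # standard error of the mean difference
margin = T_CRIT_95_DF9 * sem
ci_lower = mean_diff - margin
ci_upper = mean_diff + margin

print(f"SEM = {sem:.2f} mmHg")
print(f"95% CI: ({ci_lower:.2f}, {ci_upper:.2f}) mmHg")
print("0 inside interval?", ci_lower <= 0 <= ci_upper)
```

Since 0 falls outside the computed interval, the sketch agrees with the note that a zero mean change is not a plausible value.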
The BP change could be due to factors other than oral contraceptives. A control group of comparable women who were not taking oral contraceptives would strengthen this study.
Hypothesis Testing
Want to draw a conclusion about a population parameter. In a population of women who use oral contraceptives, is the average (expected) change in blood pressure (after - before) 0 or not?
Sometimes statisticians use the term "expected" for the population average. μ is the expected (population) mean change in blood pressure.
Null hypothesis: H0: μ = 0
Alternative hypothesis: HA: μ ≠ 0
We are testing both hypotheses at the same time. Our result will allow us to either reject H0 or fail to reject H0.
We start by assuming the null (H0) is true, and asking: how likely is the result we got from our sample?
Do our sample results allow us to reject H0 in favor of HA?
x̄ would have to be far from zero to claim HA is true. But is x̄ = 4.8 big enough to claim HA is true?
Maybe we got a big sample mean of 4.8 from a chance occurrence. Maybe H0 is true, and we just got an unusual sample.
We need some measure of how probable the result from our sample is, if the null hypothesis is true.
What is the probability of having gotten such an extreme sample mean as 4.8 if the null hypothesis (H0: μ = 0) was true? (This probability is called the p-value.)
If that probability (p-value) is small, it suggests the observed result cannot be easily explained by chance.
The p-value
So what can we turn to in order to evaluate how unusual our sample statistic is when the null is true?
We need a mechanism that will explain the behavior of the sample mean across many different random samples of 10 women, when the truth is that oral contraceptives do not affect blood pressure. Luckily, we've already defined this mechanism: it's the sampling distribution!
Sampling Distribution
Recall, the sampling distribution is centered at the "truth," the underlying value of the population mean, μ. In hypothesis testing, we start under the assumption that H0 is true, so the sampling distribution under this assumption will be centered at 0, the null mean.
Getting a p-value
To compute a p-value, we need to find our value of x̄, and figure out how "unusual" it is.
In other words, we will use our knowledge about the sampling distribution of x̄ to figure out what proportion of samples from our population would have sample mean values as far away from 0 or farther, than our sample mean of 4.8.
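The "proportion of samples" idea can be sketched with a small simulation. This is an illustration under stated assumptions, not the lecture's own calculation: it assumes the BP changes are normally distributed with true mean 0, and it measures "how far from 0" in standard-error units, because the population SD is unknown and is estimated separately in each sample. The population SD of 4.6 is a placeholder that does not affect the standardized distance.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

N_WOMEN = 10          # sample size from the lecture
N_SAMPLES = 100_000   # number of simulated studies (arbitrary choice)
POP_SD = 4.6          # assumed population SD (placeholder value)

# Observed result: mean 4.8, s = 4.6, so about 3.3 standard errors from 0.
observed_distance = 4.8 / (4.6 / math.sqrt(N_WOMEN))

def standardized_mean(sample):
    """Sample mean divided by its estimated standard error (null mean is 0)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return mean / math.sqrt(var / n)

as_extreme = 0
for _ in range(N_SAMPLES):
    # Under H0 the true mean change is 0.
    sample = [random.gauss(0.0, POP_SD) for _ in range(N_WOMEN)]
    if abs(standardized_mean(sample)) >= observed_distance:
        as_extreme += 1

simulated_p = as_extreme / N_SAMPLES
print(f"Proportion of null samples at least as extreme: {simulated_p:.4f}")
```

The printed proportion should land close to 0.009, i.e. fewer than 1 in 100 samples drawn under the null are this far from 0.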
Section A
Practice Problems
Practice Problems
1. Which of the following examples involve the comparison of paired data? If so, on what are we pairing the data?
a. In Baltimore, a real estate practice known as "flipping" has elicited concern from local/federal government officials. Flipping occurs when a real estate investor buys a property for a low price, makes little or no improvement to the property, and then resells it quickly at a higher price.
This practice has raised concern, because the properties involved in flipping are generally in disrepair, and the victims are generally low-income. Fair housing advocates are launching a lawsuit against three real estate corporations accused of this practice.
As part of the suit, these advocates have collected data on all houses (purchased by these three corporations) which were sold in less than one year after they were purchased. Data were collected on the purchase price and the resale price for each of these properties.
The data were collected to show that the resale prices were, on average, higher than the initial purchase price. A confidence interval was constructed for the average profit in these quick turnover sales.
b. Researchers are testing a new blood pressure-reducing drug; participants in this study are randomized to either a drug group or a placebo group. Baseline blood pressure measurements are taken on both groups and another measurement is taken three months after the administration of the drug/placebo. Researchers are curious as to whether the drug is more effective in lowering blood pressure than the placebo.
2. Give a one-sentence description of what the p-value represents in hypothesis testing.
Section A
Practice Problem Solutions
Solutions
1(a). The "flipping" example: In this example, researchers were comparing the difference in resale price and initial purchase price for each property in the sample. This data is paired and the pairing unit is each property.
1(b). "Miracle" blood pressure treatment: Researchers used "before" and "after" blood pressure measurements to calculate individual, person-level differences. To evaluate whether the drug is effective in lowering blood pressure, the researchers will want to test whether the mean differences are the same amongst those on treatment and those on placebo. So the comparison will be made between two different groups of individuals, which is not a paired comparison.
2. The p-value is the probability of seeing a result as extreme or more extreme than the result from a given sample, if the null hypothesis is true.
Section B
The p-value in Detail
Recall the results of the example on BP/OC use from the previous lecture:
Sample included 10 women
Sample mean blood pressure change: 4.8 mmHg (sample SD, 4.6 mmHg)
What is the probability of having gotten a sample mean as extreme or more extreme than 4.8 if the null hypothesis was true (H0: μ = 0)? The answer is called the p-value. In the blood pressure example, p = .0089.
We observed a sample mean that was 3.31 standard errors of the mean (SEM) away from what we would have expected the mean to be if OC use was not associated with blood pressure.
Is a result 3.31 standard errors above its mean unusual? It depends on what kind of distribution we are dealing with.
Recall what we know about the sampling distribution of the sample mean, x̄. If our sample is large (n > 60), then the sampling distribution is approximately normal. With smaller samples, the sampling distribution is a t-distribution with n - 1 degrees of freedom.
So in the BP/OC example, we have a sample of size 10, and hence a sampling distribution that is a t-distribution with 10 - 1 = 9 degrees of freedom.
To compute a p-value, we would need to compute the probability of being 3.31 or more standard errors away from 0 on a t9 curve.
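As a rough check on that probability, the area in both tails of a t9 curve beyond 3.31 can be approximated numerically. The sketch below is one possible approach using only the standard library (the helper names `t_pdf` and `two_sided_p` are made up for this illustration); statistical software would normally do this step. It integrates the t density with Simpson's rule and should give a value near 0.009, in line with the .0089 reported from the unrounded data.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|): 1 minus the area between -|t| and +|t|.

    The area from 0 to |t| is found with Simpson's rule (steps must be even)
    and doubled, because the t curve is symmetric around 0.
    """
    a = abs(t_stat)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_a = total * h / 3
    return 1 - 2 * area_zero_to_a

p_value = two_sided_p(3.31, df=9)
print(f"two-sided p-value for t = 3.31 on 9 df: {p_value:.4f}")
```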
Stata Output
The p-value in the blood pressure/OC example is .0089. Interpretation: if the true before OC/after OC blood pressure difference is 0 amongst all women taking OCs, then the chance of seeing a mean difference as extreme as or more extreme than 4.8 in a sample of 10 women is .0089.
Recall, we specified two competing hypotheses about the underlying, true mean blood pressure change, μ: H0: μ = 0 and HA: μ ≠ 0.
We now need to use the p-value to choose a course of action: either reject H0, or fail to reject H0. We need to decide if our sample result is unlikely enough to have occurred by chance if the null was true. Our measure of this "unlikeliness" is p = 0.0089.
Establishing a cutoff
In general, to make a decision about which p-values constitute unusual results, there needs to be a cutoff, such that all p-values less than the cutoff result in rejection of the null.
Standard cutoff is .05; this is an arbitrary value. The cutoff is called the α-level of the test.
Frequently, the result of a hypothesis test with a p-value less than .05 (or some other arbitrary cutoff) is called "statistically significant." At the .05 level, we have a statistically significant blood pressure difference in the BP/OC example.
Blood Pressure
Statistical method: The changes in blood pressures after oral contraceptive use were calculated for 10 women. A paired t-test was used to determine if there was a statistically significant change in blood pressure and a 95% confidence interval was calculated for the mean blood pressure change (after - before).
Result: Blood pressure measurements increased on average 4.8 mmHg with standard deviation 4.6 mmHg. The 95% confidence interval for the mean change was 1.5 mmHg to 8.1 mmHg. The blood pressure measurements after oral contraceptive use were statistically significantly higher than before oral contraceptive use (p = .009).
Discussion: A limitation of this study is that there was no comparison group of women who did not use oral contraceptives. We do not know if blood pressures may have risen without oral contraceptive usage.
The paired t-test is a useful statistical tool for comparing mean differences between two populations which have some sort of connection or link
Example three: matched case-control scenario. Suppose we wish to compare levels of a certain biomarker in patients with a given disease versus those without.
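Compute difference in outcome for each paired set of observations
Compute x̄, sample mean of the paired differences
Compute s, sample standard deviation of the differences
Compute test statistic: t = (x̄ - μ0) / SEM, where SEM = s / √n
Usually, just: t = x̄ / SEM (because the null mean μ0 is 0)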
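These steps can be sketched as a short function. The function name `paired_t` and the biomarker values are hypothetical, made up purely to show the arithmetic; they are not data from the lecture.

```python
import math

def paired_t(first, second, null_mean=0.0):
    """Paired t-test arithmetic on two equal-length lists of matched values.

    Returns (mean difference, SD of differences, SEM, t statistic),
    where each difference is second - first.
    """
    if len(first) != len(second):
        raise ValueError("paired data need equal-length lists")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    sem = sd_diff / math.sqrt(n)
    t_stat = (mean_diff - null_mean) / sem
    return mean_diff, sd_diff, sem, t_stat

# Hypothetical biomarker levels for 5 matched case-control pairs (illustration only).
controls = [10.0, 12.0, 9.0, 11.0, 13.0]
cases = [12.0, 15.0, 10.0, 14.0, 13.0]

mean_diff, sd_diff, sem, t_stat = paired_t(controls, cases)
print(f"mean diff = {mean_diff:.2f}, s = {sd_diff:.2f}, SEM = {sem:.2f}, t = {t_stat:.2f}")
```

The resulting t would then be compared to a t-distribution with n - 1 degrees of freedom (4 here) to get a p-value.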
Section B
Practice Problems
Practice Problems
Eight counties were selected from State A. Each of these counties was matched with a county from State B, based on factors, e.g.:
Mean income
Percentage of residents living below the poverty level
Violent crime rate
Infant mortality rate (IMR) in 1996
Information on the infant mortality rate in 1997 was collected on each set of eight counties. IMR is measured in deaths per 10,000 live births. A pre- and post-neonatal care program was implemented in State B at the beginning of 1997.
This data is being used to compare the IMRs in States A and B in 1997. This comparison will be used as part of the evaluation of the neonatal care program in State B, regarding its effectiveness in reducing infant mortality.
The data is as follows:
1. What is the appropriate method for testing whether the mean IMR is the same for both states in 1997?
2. State your null and alternative hypotheses.
3. Perform this test by hand.
4. Confirm your results in Stata.
5. What would your results be if you had 32 county pairs and the mean change and standard deviation of the changes were the same?
Section B
Practice Problem Solutions
Solutions
1. What is the appropriate test for testing whether the mean IMR is the same for both states? Because the data is paired, and we are comparing means, the paired t-test is appropriate.
Solutions
2. State your null and alternative hypotheses
Three possible ways of expressing the hypotheses . . .
3. Perform this test by hand
To perform this test, we must first calculate the difference in IMR within each pair. I will take the difference to be IMRB - IMRA.
Once the differences are calculated, x̄ = -6.13 (deaths per 10,000 live births) and s = 14.5 (deaths per 10,000 live births).
To calculate our test statistic: t = x̄ / SEM = -6.13 / (14.5 / √8) ≈ -1.2
We need to compare our test statistic to a t-distribution with 8 - 1 = 7 degrees of freedom. Consulting our table, we see we must be at least about 2.36 standard errors from the mean (below or above) for the p-value to be .05 or less. We are 1.2 SEs below; therefore, our p-value will be larger than .05.
Since p > .05, we would fail to conclude there was a difference in mean IMR for State A and State B. This is as specific as we can get about the p-value from our t-table.
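The by-hand steps can be checked with a few lines of arithmetic. This sketch assumes the summary values from the solution, with the mean difference taken as negative (State B minus State A) because the solution describes the result as 1.2 SEs below 0, and it uses 2.365 as the t-table cutoff for 7 degrees of freedom.

```python
import math

n_pairs = 8
mean_diff = -6.13   # IMR_B - IMR_A, deaths per 10,000 live births (sign inferred)
sd_diff = 14.5      # SD of the paired differences

# Assumption: 2.365 is the two-sided .05 cutoff from a t-table with 7 df.
T_CRIT_05_DF7 = 2.365

sem = sd_diff / math.sqrt(n_pairs)
t_stat = mean_diff / sem
reject_h0 = abs(t_stat) >= T_CRIT_05_DF7

print(f"SEM = {sem:.2f}, t = {t_stat:.2f}")
print("reject H0 at the .05 level?", reject_h0)
```

Because |t| is well short of the cutoff, the sketch reaches the same decision as the solution: fail to reject H0.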
Solutions
4. Confirm your results in Stata
Solutions
5. What would your results be if you had 32 county pairs and the mean change and standard deviation of the changes were the same?
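One way to explore question 5 is to redo the arithmetic with n = 32 while holding the mean difference and SD fixed. The sketch below is an illustration, not the original solution slide: it assumes a mean difference of -6.13 and an SD of 14.5, and approximates the two-sided p-value by numerically integrating the t density (helper names are made up for this example).

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|) via Simpson's rule on [0, |t_stat|] (steps is even)."""
    a = abs(t_stat)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

mean_diff, sd_diff = -6.13, 14.5   # same summary values as in question 3

results = {}
for n_pairs in (8, 32):
    sem = sd_diff / math.sqrt(n_pairs)
    t_stat = mean_diff / sem
    p = two_sided_p(t_stat, df=n_pairs - 1)
    results[n_pairs] = (sem, t_stat, p)
    print(f"n = {n_pairs:2d}: SEM = {sem:.2f}, t = {t_stat:.2f}, p = {p:.3f}")
```

With four times as many pairs the SEM is halved, so t doubles in size and the p-value drops below .05, even though the estimated difference itself has not changed.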
Section C
The p-value in Even More Detail!
p-values
p-values are probabilities (numbers between 0 and 1). Small p-values mean that the sample results are unlikely when the null is true. The p-value is the probability of obtaining a result as extreme as or more extreme than the one you observed, by chance alone, assuming the null hypothesis H0 is true.
The p-value is NOT the probability that the null hypothesis is true! The p-value alone imparts no information about scientific/substantive content in the result of a study.
One-sided p-value (p = .0045): probability of a more extreme positive result than observed.
You never know what direction the study results will go. In this course, we will use two-sided p-values exclusively. The "appropriate" one-sided p-value will be lower than its two-sided counterpart.
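The relationship between the one-sided and two-sided values can be sketched numerically. This illustration assumes the BP/OC test statistic of 3.31 on 9 degrees of freedom and approximates the tail area by integrating the t density; the helper names are made up for this example.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def area_zero_to(t, df, steps=2000):
    """Area under the t curve between 0 and t >= 0 (Simpson's rule, even steps)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3

t_obs, df = 3.31, 9

# One-sided: area in the upper tail only (results more extreme in the positive direction).
one_sided = 0.5 - area_zero_to(t_obs, df)
# Two-sided: both tails; by symmetry this is exactly double the one-sided area.
two_sided = 2 * one_sided

print(f"one-sided p = {one_sided:.4f}, two-sided p = {two_sided:.4f}")
```

The one-sided value in the observed direction is always half the two-sided value, which is why it looks "more significant."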
Stata Output
The confidence interval gives plausible values for the population parameter
If 0 is not in the 95% CI, then we would reject H0 that μ = 0 at level α = .05 (the p-value < .05).
So, in this example, the 95% confidence interval tells us that the p-value is less than .05, but it doesn't tell us that p = .009.
A significant p-value is only ruling out random sampling (chance) as the explanation
Example: Blood Pressure and Oral Contraceptives. n = 100,000; x̄ = .03 mmHg; s = 4.57 mmHg; p-value = .04
Big n can sometimes produce a small p-value even though the magnitude of the effect is very small (not scientifically/substantively significant)
Very important: always report a confidence interval. 95% CI: 0.002 to 0.058 mmHg
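The numbers in this large-sample example can be reproduced approximately. The sketch assumes that with n = 100,000 the normal approximation to the sampling distribution is adequate (consistent with the n > 60 guideline given earlier), and it uses 1.96 as the 95% cutoff.

```python
import math

n = 100_000
mean_diff = 0.03   # mmHg
sd_diff = 4.57     # mmHg

sem = sd_diff / math.sqrt(n)
z = mean_diff / sem   # number of standard errors from the null mean of 0

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

p_two_sided = 2.0 * (1.0 - normal_cdf(abs(z)))
ci_lower = mean_diff - 1.96 * sem
ci_upper = mean_diff + 1.96 * sem

print(f"SEM = {sem:.4f}, z = {z:.2f}, p = {p_two_sided:.3f}")
print(f"95% CI: {ci_lower:.3f} to {ci_upper:.3f} mmHg")
```

The p-value is below .05, yet the whole confidence interval sits within a few hundredths of a mmHg of zero, which is the point of reporting the interval alongside the p-value.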
Section C
Practice Problems
Practice Problems
1. Why do you think there is such a controversy regarding one-sided versus two-sided p-values?
2. Why can a small mean difference in a paired t-test produce a small p-value if n is large?
3. If you knew that the 90% CI for the mean blood pressure difference in the oral contraceptives example did NOT include 0, what could you say about the p-value for testing H0: μ = 0 vs. HA: μ ≠ 0?
4. What if the 99% CI for mean difference did NOT include 0? What could you say about the p-value?
Section C
Practice Problem Solutions
Solutions
1. Why do you think there is such a controversy regarding one-sided versus two-sided p-values?
If the "appropriate" one-sided hypothesis test is done (the one that best supports the sample data), the p-value will be half the p-value of the two-sided test. This allows for situations where the two-sided p-value is not statistically significant, but the one-sided p-value is.
2. Why can a small mean difference in a paired t-test produce a small p-value if n is large?
When n gets large (big sample), the SEM gets very small. When the SEM gets small, t gets large, and a large t gives a small p-value.
3. If you knew that the 90% CI for the mean blood pressure difference in the oral contraceptives example did not include 0, what could you say about the p-value for testing H0: μ = 0 vs. HA: μ ≠ 0?
The p-value is less than .10 (p < .10). This is as specific as we can be with the given information.
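The link between a confidence interval excluding 0 and the p-value falling below the matching cutoff can be illustrated with the BP/OC summary statistics. The sketch assumes the rounded values n = 10, mean 4.8, and s = 4.6, and takes the cutoffs 1.833, 2.262, and 3.250 from a t-table for 9 degrees of freedom.

```python
import math

n, mean_diff, sd_diff = 10, 4.8, 4.6
sem = sd_diff / math.sqrt(n)
t_stat = mean_diff / sem

# Assumption: two-sided t-table cutoffs for 9 df at each confidence level.
T_CRIT_DF9 = {0.90: 1.833, 0.95: 2.262, 0.99: 3.250}

results = {}
for level, crit in T_CRIT_DF9.items():
    lower = mean_diff - crit * sem
    upper = mean_diff + crit * sem
    excludes_zero = not (lower <= 0.0 <= upper)
    # The CI excludes 0 exactly when |t| exceeds the cutoff, i.e. when p < 1 - level.
    results[level] = (lower, upper, excludes_zero, abs(t_stat) > crit)
    print(f"{level:.0%} CI: ({lower:.2f}, {upper:.2f})  "
          f"excludes 0: {excludes_zero}  p < {1 - level:.2f}: {abs(t_stat) > crit}")
```

In each row the two True/False columns agree, which is the reasoning behind the answer: knowing only that a 90% CI excludes 0 tells us p < .10 and nothing more precise.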
4. What if the 99% CI for mean difference did not include 0? What could you say about the p-value?
The p-value is less than .01 (p < .01).