
Unit 5: Test of Significance/Hypothesis Testing (Topics 20, 22, 23)

This unit covers hypothesis testing and tests of significance. Students will learn the formal process for hypothesis testing, including stating the null and alternative hypotheses, checking conditions, calculating test statistics, determining p-values, and making conclusions. The document provides an example of using this process to analyze data on students' perceptions of elapsed time. It outlines the six main steps and has students practice applying them to additional examples.

Uploaded by

Riddhiman Pal
Copyright
© All Rights Reserved

Unit 5: Test of Significance/Hypothesis Testing

(Topics 20, 22, 23)

In this unit we will add to what we have learned about statistical inference by studying tests of
significance. These tests will assess the degree to which sample data provide evidence against a
particular conjecture or hypothesis about the value of the population mean. We will study the
formal structure of the tests, starting with the hypotheses and ending with a conclusion. Just as a
confidence interval allows us to infer something about a population mean, so does a significance
test.
Work for Dec 7 – 15 begins here:
Estimating Elapsed Time (from Introduction to Statistical Investigations by Tintle et al.)
Does it ever seem like time drags on or time flies by? Perception, including that of time, is one of the
things that psychologists study. Students in a statistics class collected data on 48 other students’
perception of time. They told their subjects that they would be listening to some music and then after it
was over, they would be asked some questions. They played 10 seconds of the Jackson 5’s song “ABC”.
Afterward, they simply asked the subjects how long they thought the song clip lasted. They wanted to see
whether students could accurately estimate the length of this short song segment. Below is a frequency
table for the data:
Time (sec) 5 6 7 8 10 12 13 15 20 21 22 30
Frequency 1 1 3 6 11 3 3 10 4 1 1 4

Let’s explore this study using a formal, step-by-step process called a test of significance. We will outline
the six main steps in such a test throughout this activity and you will use these same steps for the next two
topics as well. Since we will be working with samples of quantitative data, we will be conducting t-tests.

a. Write out a definition of the parameter of interest in the elapsed time study and indicate what symbol
you will use to represent it.

b. In this elapsed time study, the null hypothesis is that the mean time elapsed is 10 seconds. Restate this
using the symbol from part a and the appropriate hypothesized value, instead of words:

c. In the elapsed time study, the students think the other students’ estimates will differ from the actual
time since the perception of time is inaccurate. Restate this conjecture (with symbols and a number) as an
alternative hypothesis:

Note that the symbol and hypothesized value don’t change between the null and the alternative
hypothesis; you are just selecting between less than, greater than, or not equal to based on the research
conjecture.

d. What conditions need to be met for the Central Limit Theorem to apply?

Note that you will often check these conditions assuming the null hypothesis is true. If random sampling
isn’t mentioned, it can usually be assumed because it is part of a well-designed experiment. If the sample
is not large enough (n < 30), look at displays of the distribution and/or a normal probability plot to make
sure the data appear to be normally distributed. If not, proceed with caution through the rest of the test.
If independence isn’t clear (if you don’t know how large the population is), you can usually assume a
sample is independent, but you should note you are assuming this when you state the conditions.

At this point, it is very good practice to draw a well-labeled sketch of the sampling distribution of the
sample mean, using the hypothesized mean of 10 seconds and the standard deviation of the sampling
distribution, s/√n (about 0.938 seconds here, which is the spacing of the axis marks below). You should
mark the observed value on the graph. In this case the observed value is the sample mean of
13.708 seconds, which is essentially off the graph.

[Figure: sampling distribution of the sample mean. Horizontal axis: Sample Mean, Time Elapsed in
seconds, marked at 7.185, 8.124, 9.062, 10, 10.938, 11.876, 12.815]

e. Your calculator can find the test statistic, or t-statistic since this is a t-test, but you can find it as well
by calculating how many standard deviations of the sampling distribution the sample mean is away from
the hypothesized population mean (very similar to the z-score calculation):
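If you want to check your hand calculation, the arithmetic in part e can be sketched in Python with only the standard library. The frequency table and the hypothesized mean of 10 seconds come from the study above; the variable names are illustrative choices, not part of the original activity.

```python
import math

# Frequency table from the elapsed time study: estimate (sec) -> count
freq = {5: 1, 6: 1, 7: 3, 8: 6, 10: 11, 12: 3, 13: 3,
        15: 10, 20: 4, 21: 1, 22: 1, 30: 4}

mu_0 = 10  # hypothesized mean under the null hypothesis

n = sum(freq.values())
x_bar = sum(x * f for x, f in freq.items()) / n
# Sample standard deviation (divide by n - 1)
s = math.sqrt(sum(f * (x - x_bar) ** 2 for x, f in freq.items()) / (n - 1))
se = s / math.sqrt(n)          # standard deviation of the sampling distribution
t_stat = (x_bar - mu_0) / se   # number of standard errors between x_bar and mu_0

print(f"n = {n}, mean = {x_bar:.3f}, s = {s:.3f}, SE = {se:.3f}, t = {t_stat:.2f}")
```

The sketch reproduces the sample mean of 13.708 seconds, and the t-statistic comes out near 3.95.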

f. Since our alternative hypothesis is “not equal”, we will need the probability in both tails. This is a
two-sided or two-tailed test. Since our t-statistic is positive, we will find the probability above it in the t-
distribution with the correct degrees of freedom (47 in this case) and then we will multiply that
probability by 2 to calculate the total probability in both tails. You can do this using a t-table or using
your calculator (if you use your calculator with the correct alternative hypothesis, it will multiply by 2 for
you).
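As a rough check on the table or calculator result, the doubling of the upper-tail probability can be sketched in Python. This is only an illustration under stated assumptions: the helper names `t_pdf` and `t_upper_tail` are made up for this sketch, the tail area is approximated by numerical integration (Simpson's rule) rather than by a statistics library, and the standard error of 0.938 is taken from the sketch of the sampling distribution.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # half the area lies above 0

# Values from the activity: sample mean 13.708, hypothesized mean 10,
# standard deviation of the sampling distribution about 0.938, df = 47.
t_stat = (13.708 - 10) / 0.938
p_two_sided = 2 * t_upper_tail(abs(t_stat), df=47)  # both tails
print(f"t = {t_stat:.2f}, two-sided p-value = {p_two_sided:.5f}")
```

The resulting p-value is far below .01, which you can compare with your calculator's output.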

Note that the smaller the p-value, the stronger the evidence against the null hypothesis and in favor of the
alternative hypothesis. Typical evaluations are

• A p-value above .10 constitutes little or no evidence against the null hypothesis.
• A p-value below .10 but above .05 constitutes moderately strong evidence against the null
hypothesis.
• A p-value below .05 but above .01 constitutes reasonably strong evidence against the null
hypothesis.
• A p-value below .01 constitutes very strong evidence against the null hypothesis.

In some studies, the researcher decides in advance how small the p-value must be to provide convincing
evidence against the null hypothesis. This cutoff value is called a significance level, denoted by α (alpha).
Common values are α = .10, α = .05, and α = .01. A smaller significance level indicates a stricter
standard for deciding if the null hypothesis can be rejected. If the researcher specifies a level of
significance in advance, you would then say you reject or fail to reject at that particular level. Another
common expression is to say that the data are statistically significant if they are unlikely to have occurred by
chance or sampling variability alone (assuming that the null hypothesis is true).
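The typical evaluations and the significance-level rule described above can be written as two small Python functions. This is a minimal sketch: the function names are hypothetical, and how to treat a p-value exactly equal to a cutoff is an assumption, since the guidelines above do not say.

```python
def evidence_strength(p_value):
    """Map a p-value to the typical verbal evaluation listed above."""
    if p_value > 0.10:
        return "little or no evidence"
    if p_value > 0.05:
        return "moderately strong evidence"
    if p_value > 0.01:
        return "reasonably strong evidence"
    return "very strong evidence"

def decision(p_value, alpha):
    """Reject H0 when the p-value is below the significance level alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

for p in (0.20, 0.07, 0.03, 0.004):
    print(p, "|", evidence_strength(p), "|", decision(p, alpha=0.05))
```

Note that a p-value of 0.07 counts as moderately strong evidence but still fails to reject at α = .05, which is why the significance level should be stated along with the decision.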

g. Does the p-value for the Elapsed Time study lead to rejecting or failing to reject the null hypothesis at
the .01 level?

h. Does this study provide convincing evidence that the mean estimate for time elapsed was different
than 10 seconds? Explain in context.

Practice Exercise:
Use the 6-step process outlined above to complete a test of significance for the following situation (from
AMSCO’s AP Statistics): An association of college bookstores reported that the average amount of
money spent by students on textbooks was $325.16 with a standard deviation of $76.42. A random
sample of 75 students at the local campus of the state university indicated an average bill for
textbooks for the semester in question of $312.34. Do these data provide significant evidence at a 95%
confidence level (same as a 0.05 significance level) that the actual bill will be less than the $325.16 that
was reported? Show all steps.

Walk the Line Experiment

You’ll need someone to help you with this data collection. We want to test how much eyesight helps you
keep walking in a straight line. We assume without any eye covering, you could walk a straight line
without any issues. We will have you blindfolded, either inside or outside depending on your own
situation, and try to walk down a straight line for 10 ft. Hopefully there is already a line on the floor or
the sidewalk you use. At the end of the 10 ft, measure how far off the line (to either side) you are in
inches.
What is the parameter we wish to study?

What is the null hypothesis?


What is the alternative hypothesis?

Materials: blindfold, helper to make sure you don’t run into anything, ruler

Procedure:
1. Find a spot, inside or outside, with 10 ft of a straight line.

2. Have someone blindfold you and place you at the start of the 10 foot segment.

3. Walk 10 ft ahead. Have someone stop you at the end of 10 ft.

4. Measure how far off the line you are at the end of the 10 ft in inches. Count either side as
positive.

Record your data in the data collection spreadsheet and use data from the spreadsheet to finish the
problem.

Calculate the sample mean and standard deviation:

In order to complete a t-test, verify the technical conditions:

Calculate the test statistic:

Calculate the p-value and find a confidence interval using a 95% confidence level:

Summarize your conclusion:

Now complete Activity 20-1 starting on page 422 and Activity 20-2 starting on page 426 in your
textbook.

Topic 20 Summaries

1. From the Watch-Out on page 426,


a. Is a hypothesis about a parameter or a statistic?

b. When should the alternative hypothesis be formulated?

c. What is the denominator of the test statistic?

d. When you calculate the p-value for a two-sided alternative, what area is included?

e. What does a small p-value indicate?

2. From the Watch-Out on page 430, when should you be cautious about generalizing to a larger
population?

Work for Dec 16 – Jan 3 begins here:
Complete Activity 22-1 starting on page 472 in your textbook.

Personality Quiz
Students will take a personality-style quiz using the link: https://forms.gle/6UafB96FAD3njkga7. After
answering the quiz questions, students will calculate a score. The data collection will include the score
and whether the student was born before July 1 or not. The question is whether there is a difference in
score on the personality-style quiz based on time of year you were born.

1. Identify the parameter(s) to be measured:

2. State the hypotheses:

Data Collection

Take the personality-style quiz using the link above. It is an FCPS Google link. Report your score to the
data collection spreadsheet and then use data from there to complete this problem.

3. Check Assumptions (technical conditions):

Testing the results


Let nE be the number of students born before July 1.
Let nL be the number of students born on or after July 1.
Fill in the summary statistics for each sample:

nE =          x̄E =          sE =

nL =          x̄L =          sL =

Which distribution will you use? Why? Are there any assumptions to be made that you haven't already
stated? Explain.

4. Test Statistic:

t=

5. p-value =
d.f. (if applicable) =
95% C.I. =

6. Test decision and conclusion in context:

Topic 22 Summaries

1. From the Notes on page 475


a. How large sample sizes need to be depends on how non-normal the original populations are. If

both are symmetric with similar shapes, then the sample size may be __________________

but if the populations are skewed, we prefer sample sizes be _____________________________.

b. What does it mean that the degrees of freedom convention is a conservative approximation?

2. From page 479, all else being the same, a test result becomes more statistically significant as

a.

b.

c.

3. From the Watch-Out on page 479, is failing to reject a null hypothesis the same as accepting it?

4. From the Watch-Out on page 483,


a. What is a good first step to choose a correct procedure?

b. Before performing a test, what must you check?

c. What should you always relate your conclusion to?

Work for Jan 4/5 begins here:
Complete Activity 23-1 starting on page 498 in your textbook.
Complete Activity 23-3 on page 504 in your textbook.

Heart Rate Experiment: Is there a difference between your resting and active heart rate?

For most people, their resting heart rate is lower than their active heart rate. Heart rate is measured in
beats per minute and can be calculated by finding your pulse in your wrist, counting the beats in 10
seconds, and multiplying by 6 to get beats per minute.
In a matched pairs experiment, we can test that idea. We will test the same person’s heart rate at rest and
after 1 minute of jumping jacks.

1. Parameter:

2. Hypotheses:

Materials: timer

Procedure:
1. After sitting (at rest) for at least one minute, find your resting heart rate. The pulse in your wrist
is usually easiest to find. Using two fingers (not your thumb), gently press on your wrist until you feel
your pulse. Count the number of beats in 10 seconds. Multiply that by 6 to get beats per minute.

2. Do jumping jacks for 1 minute.

3. Find your active heart rate. Immediately following your minute of jumping jacks, find your pulse
and count the number of beats in 10 seconds. Multiply that by 6 to get beats per minute.

Difference in heart rate in bpm (active – resting) = _________________________________

You will record this in the data collection spreadsheet.

3. Technical Conditions:

4. Test statistic:

5. p-value:
df =
95% Confidence Interval:

6. Test decision (Significance Level: ):


Topic 23 Summaries

From the Watch-Out on page 502,
a. How do you determine which t-procedure to use?

b. What can help you decide whether data are collected with a paired design?

c. In which type of design does mixing up the order of values in one group create a problem?

d. If a study has a different number of observations between two groups, what type of design
can it not be?

Summary

Hypothesis Testing – t-Test: Procedure to test if there is any statistical significance to your data.

Step 1: Collect data either through a sampling plan or experimental design.


Define your parameter of interest in context, using words and symbols.

Step 2: State competing claims concerning the parameter of interest. Write the null and alternative
hypothesis in words and symbols.

Step 3: Determine if the conditions are met to conduct an appropriate test.


Determine the sample size. Use it to check for independence (the population should be at least 10 times the sample size).
Plot the data to check for normality if sample size is too small.
Check for randomness.

Step 4: Calculate test statistic, some measure of difference between values (z-score, t-score, etc.).
Calculate degrees of freedom.

Step 5: Calculate the p-value associated with the test statistic.

Step 6: Draw a conclusion by deciding whether to reject or fail to reject the null hypothesis. The
p-value from Step 5 is the probability, assuming that the null hypothesis is true, of obtaining
the observed sample statistic or one more extreme. If this probability is smaller than the level
of significance, α, you should reject the null hypothesis. If this probability is larger than the
level of significance, α, you should fail to reject the null hypothesis.

Your conclusion is two parts. The first part (a) is including the p-value to state whether you
reject or fail to reject the null hypothesis. The second part (b) is stating in context of the
problem what you are rejecting or failing to reject.

Another thing to consider:

A level of significance, α, is the maximum probability of rejecting a true null hypothesis (a Type I
error) that you are willing to allow in the hypothesis testing procedure.
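Steps 4 through 6 of the summary can be sketched as a single Python function that works from summary statistics. This is a sketch under stated assumptions, not a replacement for your calculator's t-test: the function and helper names are hypothetical, the tail area is approximated with Simpson's rule, and Step 3 (checking conditions) still has to be done separately. The example call uses the numbers from the soft-drink problem in the Unit 5 review.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def one_sample_t_test(x_bar, s, n, mu_0, alternative, alpha=0.05):
    """Steps 4 to 6 for one mean, from summary statistics."""
    se = s / math.sqrt(n)
    t_stat = (x_bar - mu_0) / se               # Step 4: test statistic
    df = n - 1
    tail = t_upper_tail(abs(t_stat), df)       # area beyond |t| in one tail
    if alternative == "two-sided":             # Step 5: p-value
        p_value = 2 * tail
    elif alternative == "greater":
        p_value = tail if t_stat >= 0 else 1 - tail
    elif alternative == "less":
        p_value = tail if t_stat <= 0 else 1 - tail
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    decision = "reject H0" if p_value < alpha else "fail to reject H0"  # Step 6
    return t_stat, df, p_value, decision

# Soft-drink review problem: n = 64, mean 32, s = 3, H0: mu = 30, Ha: mu > 30
t_stat, df, p_value, decision = one_sample_t_test(32, 3, 64, 30, "greater")
print(f"t = {t_stat:.2f}, df = {df}, p-value = {p_value:.2e}, {decision}")
```

The function only returns the decision; the second part of the conclusion, stating in context what was rejected or not rejected, is still up to you.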

Unit 5 In-Class Review

1. Perform a complete hypothesis test: A state university is concerned that there is a difference in the
writing abilities of their male and female students. To test this assertion, the university took a random
sample of 60 of their first-year students and recorded their genders and SAT Writing scores. The data
appear below.

SAT Writing Scores of Female Students


480 540 620 590 530 620 580 530 530 560 510 560
560 550 520 480 560 510 500 540 490 430 610 620
510

SAT Writing Scores of Male Students


480 560 400 580 480 460 430 430 490 610 540 500
540 400 530 640 350 470 600 610 530 580 430 510
520 380 540 460 640 520 570 560 490 440 480

Use an appropriate t-test to compare these data sets and their population means.

2. A study on children’s television viewing was conducted by Stanford researchers (Robinson, 1999). At
the beginning of the study, parents of third- and fourth-grade students at two public elementary schools in
San Jose were asked to report how many hours of television the child watched in a typical week. The 198
responses had a mean of 15.41 hours and a standard deviation of 14.16 hours.

Conduct a test of whether or not these sample data provide evidence at the .05 level for concluding that
third- and fourth-grade children watch an average of more than two hours of television per day. Include
all the components of a significance test, and explain what each component reveals. Start by identifying
the observational units, variable, sample, and population.

3. Police trainees were seated in a darkened room facing a projector screen. Ten different license plates
were projected on the screen, one at a time, for 5 seconds each, separated by 15-second intervals.
After the last 15-second interval, the lights were turned on and the police trainees were asked to write
down as many of the 10 license plate numbers as possible, in any order at all.

A random sample of 15 trainees who took this test were then given a week-long memory training
course. They were then retested. The results are shown in the table (A is after training, B is before).

Test, at the 5% level of significance, that the memory course improves the ability of the trainees to
correctly identify license plates.

A B
6 6
8 5
6 6
7 5
9 7
8 5
9 4
6 6
7 7
5 8
9 4
8 5
6 4
8 6
6 7

Extra Practice: Unit 5 Review

1. Which type of sampling must be used to select the samples used for constructing confidence
intervals and performing hypothesis tests?
2. The null hypothesis is a claim about a:
a) parameter, where the claim is assumed to be false until it is declared true
b) parameter, where the claim is assumed to be true until it is declared false
c) statistic, where the claim is assumed to be false until it is declared true
d) statistic, where the claim is assumed to be true until it is declared false
3. If we want to calculate a confidence interval or perform a hypothesis test for a population mean,
when will we use the t-distribution rather than the z-distribution in the formulas and procedures?
4. The mean federal income tax paid last year by a random sample of 19 persons selected from a city
was $4275 with a standard deviation of $766. If we want to use this information to test at a 5%
significance level that the mean income tax of all persons in this city is more than $4000, we
a) could construct a Z-interval
b) could construct a T-interval
c) could perform a Z-test
d) could perform a T-test
5. In a hypothesis test, if we REJECT the null hypothesis at a 5% significance level, then it must be
that
a) P-value > 0.05
b) P-value < 0.05
c) P-value = 0.05
d) P-value > 0.025
6. A two-tailed hypothesis test using the normal distribution reveals that the area under the sampling
distribution curve of the mean and located in the tail to the right of the sample mean equals 0.028.
Consequently, the p-value for this test equals:
7. We want to know if there is a difference in pay between females and males at a large corporation. We
draw two random samples, one from the population of female employees and one from the
population of male employees at this corporation. The two samples are
a) independent
b) dependent
c) matched samples
d) paired samples
8. Drug A was given to 132 patients and Drug B was given to 127 patients in a Phase 3 clinical trial for
efficacy. Each drug claims to reduce patients' diastolic blood pressure. Blood pressure readings were
taken before and after administration of the drug to test parameters μA and μB. What type of T-test is
appropriate and how many degrees of freedom would you use in the following T-tests, using the
textbook's conservative approximations?

a) H0: The average diastolic of patients given Drug A was 85, before administration.
b) H0: The average change in diastolic of patients given Drug A was -5.
c) H0: The average diastolic of patients given Drug A was the same as patients given Drug B,
before administration.
d) H0: The average change in diastolic of patients given Drug B was -5.
e) H0: Drugs A and B are equally effective because the average change in diastolic for the two
groups is identical.
9. A soft-drink manufacturer claims that its 12-ounce cans do not contain, on average, more than 30
calories. A random sample of 64 cans of this soft drink, which were checked for calories, contained
a mean of 32 calories with a standard deviation of 3 calories. Does the sample information support
the alternative hypothesis that the manufacturer's claim is false? Use a significance level of 5%.

10. Listed below are temperatures (in °F) of subjects measured at 8:00 am and then again at 12:00 am.

a) Construct a 95% confidence interval estimate of the difference between the 8:00 am
temperatures and the 12:00 am temperatures.
b) Test at 5% significance level the claim that the body temperature is the same at both times.
c) Explain the relationship between your answers to the above 2 parts.
8:00 AM    97.0  96.2  97.6  96.4  97.8  99.9
12:00 AM   98.0  98.6  98.8  98.0  98.6  97.6

11. John read that farmers in Japan routinely subject plants to stress before transplanting from the
greenhouse to the field. Methods of stress induction included pulling on the plants and hitting them with
straw rakes. John decided to investigate this phenomenon by growing two groups of bean plants
(10/group) in a greenhouse for 15 days during which time the plants in one group were pulled on three
times daily at 8:00 in the morning and at 4:00 in the afternoon. The plants were then transplanted to a
field. John hypothesized that stressed plants would exhibit significantly larger mean plant heights after
transplanting than the non-stressed plants (control). Use a 5% significance level and complete a hypothesis
test showing all work.
Plant heights (in cm.) after 30 days were:
Stressed Plants: 55, 65, 50, 57, 59, 73, 57, 54, 62, 68
Non-stressed Plants: 48, 65, 59, 57, 51, 63, 65, 58, 44, 50

12. Car emissions on highway and in-town


Claim: Mean level of emissions is less for highway driving than for stop-and-go in-town driving.
Data: Each car is driven both on the highway and in-town

Formulate and test an appropriate hypothesis.

Car   Stop-and-Go   Highway
1     1500          941
2     870           456
3     1120          893
4     1250          1060
5     3460          3107
6     1110          1339
7     1120          1346
8     880           644

Unit 5 Review Key
1. Random sampling of independent items; 2. B;

3. Use t when you don't know σ , the population standard deviation.

4. B or D; 5. B; 6. p=2(0.028)=0.056

7. A. If the corporation is sufficiently large, it is safe to assume the samples are independent.
8. a. 1-Sample T-test with 131 degrees of freedom, b. 1-Sample Matched Pairs T-test with 131 degrees
of freedom, c. 2-Sample T-test with 126 degrees of freedom, d. 1-Sample Matched Pairs T-test with
126 degrees of freedom, e. 2-Sample T-test with 126 degrees of freedom

9. The test rejects H0 in favor of HA: μ > 30 with a t-statistic of 5.3 and a p-value less than 10^−6.

10. a) A 95% confidence interval is (−0.9094, 2.476) using the TI-84.

b) We fail to reject H0 at the 5% level because p = 0.2876.
c) We are 95% confident that the difference in means is between −0.9094 and 2.476. Since 0 is
within this confidence interval, we cannot reject H0 at the 5% level. The 95% central probability
associated with the confidence interval is the complement of the 5% alpha-region which would
allow us to reject H0.
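The link between the confidence interval and the two-sided test in problem 10 can be sketched in Python. Two assumptions are labeled in the code: the differences are taken as 12:00 minus 8:00 (the direction that matches the sign of the interval in the key), and the critical value 2.5706 is the t-table value for 95% confidence with 5 degrees of freedom rather than something the code computes.

```python
import math
from statistics import mean, stdev

temp_8am = [97.0, 96.2, 97.6, 96.4, 97.8, 99.9]
temp_12am = [98.0, 98.6, 98.8, 98.0, 98.6, 97.6]

# Assumption: difference = 12:00 reading minus 8:00 reading for each subject
diffs = [later - earlier for earlier, later in zip(temp_8am, temp_12am)]
n = len(diffs)
d_bar = mean(diffs)
se = stdev(diffs) / math.sqrt(n)

T_STAR = 2.5706  # t-table critical value, 95% confidence, df = n - 1 = 5
lower, upper = d_bar - T_STAR * se, d_bar + T_STAR * se
t_stat = d_bar / se  # test statistic for H0: mean difference = 0

print(f"95% CI: ({lower:.4f}, {upper:.4f})")
print(f"t = {t_stat:.3f}; reject H0 at 5%? {abs(t_stat) > T_STAR}")
print(f"0 inside the interval? {lower < 0 < upper}")
```

The interval contains 0 exactly when |t| is below the critical value, so the last two printed lines always give opposite answers.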
11. Two-sample t-test for the stressed and non-stressed plants:

1. μS is the mean height in cm after 30 days for the stressed plants; μN is the mean height in cm
after 30 days for the non-stressed plants

2. H0: μS = μN; Ha: μS > μN
3. Simple random sampling isn’t stated, but we can assume that the 20 plants used were
randomly sampled from a larger population. We also know that the two samples are
independent of each other and that there are far more than 200 plants, so the samples are
independent. Since the sample size, 10, is less than 30, we will check normality by
constructing a normal probability plot for each sample. The sample heights in cm are on the
x-axis and the z-score for each height is the y-value (the red squares are the stressed plants
and the blue crosses are the non-stressed plants).
[Figure: normal probability plots for the stressed and non-stressed samples]
The red squares from the stressed plants appear to form a line so we can assume the
sample is normal. The blue crosses from the non-stressed plants are not quite as linear, but
we will proceed with caution.

4. Since the two groups of plants were part of different treatment groups, either being stressed
before moving or non-stressed, we will run a two-sample t-test.
Test statistic: t = 1.240
5. p-value = 0.115, df = 17.944 or 17.945
95% confidence interval: (-2.777, 10.777)
6. With p = 0.115 > 0.05, we fail to reject the null hypothesis. We do not have sufficient
evidence to say that the mean height in cm of the stressed plants is greater than the mean
height of the non-stressed plants. The 95% confidence interval includes the value 0, which
is consistent with there being no difference in the mean plant heights in cm for the two
groups.
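The test statistic and the non-integer degrees of freedom reported for problem 11 can be reproduced with a short Python sketch. It assumes the unpooled (Welch) two-sample formula, which is what produces a fractional df like the one in the key; the variable names are illustrative.

```python
import math
from statistics import mean, variance

stressed = [55, 65, 50, 57, 59, 73, 57, 54, 62, 68]
non_stressed = [48, 65, 59, 57, 51, 63, 65, 58, 44, 50]

n1, n2 = len(stressed), len(non_stressed)
# Each sample's variance divided by its sample size
v1 = variance(stressed) / n1
v2 = variance(non_stressed) / n2

# Unpooled two-sample t statistic
t_stat = (mean(stressed) - mean(non_stressed)) / math.sqrt(v1 + v2)
# Welch approximation for the degrees of freedom
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(f"t = {t_stat:.3f}, df = {df:.3f}")
```

The textbook's conservative convention would instead use the smaller of n1 − 1 and n2 − 1, which is 9 here.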
12. Matched pairs t-test because emissions values were taken from each car, once when driven on the
highway and once when driven off the highway.
1. μNH is the mean emissions value (no units given) for non-highway driving; μH is the
mean emissions value for the highway driving; μd is the difference of the means, non-
highway minus highway

2. H0: μd = 0; Ha: μd > 0
Note: if you set up the mean of the differences the opposite way, you would choose the
opposite alternative hypothesis.
3. Simple random sampling isn’t stated, but we can assume that the 8 cars were randomly
sampled from a larger population. We also know that there are far more than 80 cars, so
the sample is independent. Since the sample size, 8, is less than 30, we will check
normality by constructing a normal probability plot for the sample that is the difference in
emissions for each car. The sample difference in emissions for each car is on the x-axis
and the z-score for each car is the y-value.
[Figure: normal probability plot of the differences in emissions]
From the normal probability plot we are not convinced that the sample data are normal
since the points do not look approximately linear, so we will proceed with caution.

4. Since each car was tested for emissions after highway and non-highway driving, we will
conduct a matched pairs one-sample t-test:
Test statistic: t = 1.896 or 1.897

5. p-value = 0.049 or 0.050; df = 7


95% confidence interval: (-47.02, 428.02)
6. The p-value = 0.049 is just at or slightly under 0.05 so we reject the null hypothesis. We
have some evidence that emissions from cars driven off the highway are higher than
emissions from cars driven on the highway. The 95% confidence interval includes 0, so
the evidence that there is a difference is minimal, but the center of the confidence interval
is 190.5 (with a margin of error of 237.52), well above 0, showing that emissions from cars
driven off the highway tend to be much higher.
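The matched pairs calculation for problem 12 can be sketched in Python from the data table. The critical value 2.3646 is the t-table value for 95% confidence with 7 degrees of freedom, entered by hand as an assumption; the variable names are illustrative.

```python
import math
from statistics import mean, stdev

stop_and_go = [1500, 870, 1120, 1250, 3460, 1110, 1120, 880]
highway = [941, 456, 893, 1060, 3107, 1339, 1346, 644]

# One difference per car: non-highway minus highway
diffs = [s - h for s, h in zip(stop_and_go, highway)]
n = len(diffs)
d_bar = mean(diffs)
se = stdev(diffs) / math.sqrt(n)
t_stat = d_bar / se  # one-sample t statistic on the differences

T_STAR = 2.3646  # t-table critical value, 95% confidence, df = n - 1 = 7
margin = T_STAR * se

print(f"mean difference = {d_bar}, t = {t_stat:.3f}, df = {n - 1}")
print(f"95% CI: ({d_bar - margin:.2f}, {d_bar + margin:.2f}), margin = {margin:.2f}")
```

Because the test works on the eight differences, the order of the cars must stay matched between the two lists.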

Glossary
Alternative hypothesis – a statement of what researchers suspect or hope to be true about the
parameter. It will take one of these three forms:
• Ha: parameter < hypothesized value
• Ha: parameter > hypothesized value
• Ha: parameter ≠ hypothesized value
The specific form (direction) of the alternative is determined by the research question, before
the sample data are collected.

Comparing two means – common inference procedure used when the response variable is
quantitative; procedure attempts to distinguish between an observed difference due to sampling
variability and one too large to have occurred by chance.

Conservative – describes decisions and/or calculations that may underestimate statistical
significance in order to avoid the worse error of saying an observed value or observed difference is
more unusual than it actually is. When calculating a confidence interval, a conservative approach
may yield a slightly wider interval. (see p.475) A test is conservative if, when conducted for a given
significance level, the true probability of incorrectly rejecting the null hypothesis is no greater than
that level.

Matched-pairs Experiment – Experiment that incorporates blocking, where the block size is 2. The
pairs may arise naturally, and they may not be independent.

Null hypothesis – A statement about the parameter of interest. Typically a statement of no effect or
no difference, the null states the parameter of interest is equal to a specific value:
H0: parameter = hypothesized value
One-tailed test – significance test conducted when the alternative hypothesis is one-sided. For
example, Ha: μ > μ0 or Ha: μ < μ0.

Practical significance – When large samples are available, even tiny deviations from the null
hypothesis will be statistically significant. But a tiny deviation may not have practical importance,
so use your common sense and look at the size of an observed difference. Ask yourself whether the
observed difference is important.
p-value – The probability, assuming the null hypothesis to be true, of obtaining a test statistic at
least as extreme as the one actually observed. Extreme means in the direction of the alternative
hypothesis.
Robust – Describes a procedure that tends to give reasonable results even for small sample sizes as
long as the population is not severely skewed and does not have extreme outliers

Significance Level – The cutoff, chosen by the researcher in advance, below which a p-value is
considered convincing evidence against the null hypothesis.

Technical conditions for t-test – The t-test requires a simple random sample from a population of
interest. The t-test also requires either a large sample size or a normally distributed population.
You can generally regard a sample of at least 30 as large enough for the procedure to be valid. If
the sample size is less than 30, examine visual displays of the sample data to see whether they
appear to follow a normal distribution.

Test decision – A comment evaluating the strength of evidence against the null hypothesis. Where a
test decision needs to be made:
If the p-value is small, reject the null hypothesis.
If the p-value is not small, fail to reject the null hypothesis.
The decision should respond to the research question, stating that you either have evidence for the
alternative hypothesis (in context) or you do not. In other words, restate your final conclusions in
the language of the research question.

Test of Significance – A significance test is a formal procedure for comparing observed data with a
claim (hypothesis) whose truth we want to assess. The claim is a statement about a parameter, like
the population mean μ. We express the results of a significance test in terms of a probability that
measures how well the data and the claim agree.

Test statistic – This is a measure of the discrepancy between our observed statistic and the
hypothesized value of the parameter. If the discrepancy is large, we have evidence against the null
hypothesis.

Two-sample t-tests – Inference procedure to compare two populations or two treatments. We
examine the observed difference in sample means and compare it to the hypothesized difference
μ1 – μ2.

Two-tailed test – When we look for results at least as extreme as the sample result in both
directions. When the alternative hypothesis is two-sided (not equal to), we find the p-value by
computing
2 ∙ P(Z > |z|)
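The two-tailed formula can be tried out in Python with the standard library's normal distribution. This is a minimal sketch for the z case written in the formula; the function name is hypothetical, and a t-test would use the t-distribution with the appropriate degrees of freedom in place of Z.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Two-tailed p-value 2 * P(Z > |z|) for a standard normal Z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

print(round(two_sided_p_value(1.96), 3))    # 0.05
print(round(two_sided_p_value(-1.96), 3))   # 0.05, the sign of z does not matter
```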
