Lecture Note On Statistical Methods With An Application

Introduction 1

Objectives of this course

• Learn how to summarize data (descriptive statistics)
• Learn how to use sample data to make inferences about population parameters
• Become an informed user of statistical analysis
• Learn to think for yourself. Knowledge of statistics will allow you to see the difference between junk science and real science
• To guide the design of an experiment or survey prior to data collection
• To analyze data using proper statistical procedures and techniques
• To present and interpret the results to researchers and other decision makers
INTRODUCTION TO
STATISTICS
A lecture note by

Dereje D. (PhD)

Department of Statistics

Hawassa University
Introduction 3

Key Terms
• Population. Universe. The entire category under consideration. This is the
data which we have not completely examined but to which our conclusions
refer. The population size is usually indicated by a capital N.
• Examples: every lawyer in the United States; all single women in the United
States.

• Sample. That portion of the population that is available, or to be made


available, for analysis. A good sample is representative of the population.
We will learn about probability samples and how they provide assurance that
a sample is indeed representative. The sample size is shown as lower case
n.
• If your company manufactures one million laptops, it might take a sample of, say, 500 of them to test quality. The population size is N = 1,000,000 and the sample size is n = 500.
Introduction 4

Key Terms
Parameter. A characteristic of a population. The
population mean, µ and the population standard
deviation, σ, are two examples of population
parameters. If you want to determine the population
parameters, you have to take a census of the entire
population. Taking a census is very costly.

Statistic. A statistic is a measure that is derived from the sample data. For example, the sample mean, x̄, and the sample standard deviation, s, are statistics. They are used to estimate the population parameters.
Introduction 5

Key Terms
• Statistical Inference. The process of using sample
statistics to draw conclusions about population parameters
is known as statistical inference.

• For instance, using x̄ (based on a sample of, say, n = 1000) to draw conclusions about µ (a population of, say, 300 million). This is a measure of performance in which the sample measurement is used to estimate the population parameter.

• Note that pollsters do not call every adult who can vote for
president. This would be very expensive. What pollsters do is call a
representative sample of about 1,000 people and use the sample
statistics (the sample proportion) to estimate who is going to win
the election (population proportion).
Introduction 6

Key Terms
Example of statistical inference from quality control:

• GE manufactures LED bulbs and wants to know how


many are defective. Suppose one million bulbs a year are
produced in its new plant in Staten Island. The company
might sample, say, 500 bulbs to estimate the proportion of
defectives.
• N = 1,000,000 and n = 500
• If 5 out of 500 bulbs tested are defective, the sample proportion
of defectives will be 1% (5/500). This statistic may be used to
estimate the true proportion of defective bulbs (the population
proportion).
• In this case, the sample proportion is used to make inferences
about the population proportion.
Introduction 7

Key Terms
• Descriptive Statistics. Those statistics that summarize a sample of
numerical data in terms of averages and other measures for the
purpose of description, such as the mean and standard deviation.

• Descriptive statistics, as opposed to inferential statistics, are not concerned


with the theory and methodology for drawing inferences that extend beyond
the particular set of data examined, in other words from the sample to the
entire population. All that we care about are the summary measurements
such as the average (mean).

• Thus, a teacher who gives a class, of say, 35 students, an exam is interested


in the descriptive statistics to assess the performance of the class. What was
the class average, the median grade, the standard deviation, etc.? The
teacher is not interested in making any inferences to some larger population.

• This includes the presentation of data in the form of graphs, charts, and
tables.
Introduction 8

Source of Data: Primary vs. Secondary Data


• Primary data. This is data that has been compiled by the
researcher using such techniques as surveys,
experiments, depth interviews, observation, focus groups.

• Types of surveys. A lot of data is obtained using


surveys. Each survey type has advantages and
disadvantages.
• Mail: lowest rate of response; usually the lowest cost
• Personally administered: can “probe”; most costly;
interviewer effects (the interviewer might influence the
response)
• Telephone: fastest
• Web: fast and inexpensive
Introduction 9

Primary vs. Secondary Data


• Secondary data. This is data that has been
compiled or published elsewhere, e.g., census
data.
• The trick is to find data that is useful. The data was
probably collected for some purpose other than helping to
solve the researcher’s problem at hand.

• Advantages: It can be gathered quickly and inexpensively. It


enables researchers to build on past research.

• Problems: Data may be outdated. Variation in definition of


terms. Different units of measurement. May not be accurate
(e.g., census undercount).
Introduction 10

Primary vs. Secondary Data


• Typical Objectives for secondary data research designs:

• Fact Finding. Examples: amount spent by industry and competition on

advertising; market share; number of computers with modems in the U.S.,
Japan, etc.

• Model Building. To specify relationships between two or more variables,


often using descriptive or predictive equations. Example: measuring
market potential as per capita income plus the number of cars bought in
various countries.

• Longitudinal vs. static studies.


Introduction 11

Errors in Survey Design


• Response Errors. Data errors that arise from issues with survey
responses.
• subject lies – question may be too personal or subject tries to give the socially
acceptable response (examples: “Have you ever used an illegal drug?” “Have you
ever driven a car while intoxicated?”)

• subject makes a mistake – subject may not remember the answer (e.g., “How
much money do you have invested in the stock market?”)

• interviewer makes a mistake – in recording or understanding subject’s response

• interviewer cheating – interviewer wants to speed things up so s/he makes up


some answers and pretends the respondent said them.

• interviewer effects – vocal intonation, age, sex, race, clothing, mannerisms of


interviewer may influence response. An elderly woman dressed very conservatively
asking young people about usage of illegal drugs may get different responses than
a young interviewer wearing jeans, with tattoos and a nose ring.
Introduction 12

Survey Errors
• Nonresponse error. If the rate of response is low, the sample may not be
representative. The people who respond may be different from the rest of the
population. Usually, respondents are more educated and more interested in
the topic of the survey. Thus, it is important to achieve a reasonably high rate
of response. (How to do this? Use follow-ups.)

• Which Sample is better?


Sample 1 Sample 2
Sample size n = 2,000 n = 1,000,000
Rate of Response 90% 20%

• Answer: A small but representative sample can be useful in making inferences. But, a large and
probably unrepresentative sample is useless. No way to correct for it. Thus, sample 1 is better
than sample 2.
Introduction 13

Sampling Techniques
• Nonprobability Samples – based on convenience or
judgment
• Convenience (or chunk) sample - students in a class, mall intercept

• Judgment sample - based on the researcher’s judgment as to what

constitutes “representativeness” e.g., he/she might say these 20 stores are


representative of the whole chain.
• Quota sample - interviewers are given quotas based on demographics; for
instance, they may each be told to interview 100 subjects – 50 males and
50 females. Of each 50, say, 10 nonwhite and 40 white.
• The problem with a nonprobability sample is that we do not know how
representative our sample is of the population.
Introduction 14

Probability Samples
• Probability Sample. A sample collected in such
a way that every element in the population has a
known chance of being selected.

• One type of probability sample is a Simple


Random Sample. This is a sample collected in
such a way that every element in the population
has an equal chance of being selected.

• How do we collect a simple random sample?


• Use a table of random numbers or a random number generator.
Introduction 15

Probability Samples
• Other kinds of probability samples (beyond the
scope of this course).
• systematic random sample.
• Choose the first element randomly, then every kth
observation, where k = N/n
• stratified random sample.
• The population is sub-divided based on a characteristic
and a simple random sample is conducted within each
stratum
• cluster sample
• First take a random sample of clusters from the population of clusters.
Then take a simple random sample within each cluster.
Examples: election districts, orchards.
Introduction 16

Types of Data
• Qualitative data result in categorical responses. Also called
Nominal, or categorical data
• Example: Sex: ☐ MALE ☐ FEMALE

• Quantitative data result in numerical responses, and may be


discrete or continuous.
• Discrete data arise from a counting process.
• Example: How many courses have you taken at this College? ____
• Continuous data arise from a measuring process.
• Example: How much do you weigh? ____

• One way to determine whether data is continuous is to ask yourself
whether you can add several decimal places to the answer.
• For example, you may weigh 150 pounds but in actuality may weigh
150.23568924567 pounds. On the other hand, if you have 2 children, you do
not have 2.3217638 children.
Introduction 17

Levels of Data: Nominal Data


• Nominal data is the same as Qualitative. It is a classification and
consists of categories. When objects are measured on a nominal
scale, all we can say is that one is different from the other.
• Examples: sex, occupation, ethnicity, marital status, etc.
• [Question: What is the average SEX in this room? What is the average
RELIGION?]
• Appropriate statistics: mode, frequency

• We cannot use an average. It would be meaningless here.


• Example: Asking about the “average sex” in this class makes no sense (!).
• Say we have 20 males and 30 females. The mode – the data value that
occurs most frequently - is ‘female’. Frequencies: 60% are female.
• Say we code the data, 1 for male and 2 for female: (20 x 1 + 30 x 2) / 50 =
1.6
• Is the average sex = 1.6? What are the units? 1.6 what? What does 1.6
mean?
Introduction 18

Levels of Data: Ordinal Data


• Ordinal data arises from ranking, but the intervals between the points are not equal

• We can say that one object has more or less of the characteristic than another object when

we rate them on an ordinal scale. Thus, a category 5 hurricane is worse than a category 4
hurricane which is worse than a category 3 hurricane, etc. Examples: social class, hardness
of minerals scale, income as categories, class standing, rankings of football teams, military
rank (general, colonel, major, lieutenant, sergeant, etc.),
• Example: Income (choose one)
__ Under 5,000 birr – checked by, say, Yaikob
__ 5,000 – 9,999 – checked by, say, Amsal
__ 10,000 and over – checked by, say, Alamodin
In this example, Alamodin checks the third category even though he earns several billion birr. The distance between
Alamodin and Amsal is not the same as the distance between Amsal and Yaikob.

• Appropriate statistics: same as those for nominal data, plus the median; but not the mean.
Introduction 19

Levels of Data: Ordinal Data


• Ranking scales are obviously ordinal. There is nothing
absolute here.
• Just because someone chooses a “top” choice does not
mean it is really a top choice.

• Example:

Please rank from 1 to 4 each of the following:


___being hit in the face with a dead rat
___being buried up to your neck in cow manure
___failing this course
___having nothing to eat except for chopped liver for a month
Introduction 20

Levels of Data: Interval Data


• Equal intervals, but no “true” zero. Examples: IQ, temperature,
GPA.

• Since there is no true zero – the complete absence of the


characteristic you are measuring – you cannot speak about ratios.

• Example: Suppose Hawassa temperature is 30 degrees and A.A


temperature is 15 degrees. Does that mean it is twice as cold in
A.A as in Hawassa? No.

• Appropriate statistics
• same as for nominal
• same as for ordinal plus,
• the mean
Introduction 21

Levels of Data: Ratio Data


• Ratio data has both equal intervals and a “true” zero.
• Examples: height, weight, length, units sold

• All scales, whether they measure weight in kilograms or


pounds, start at 0. The 0 means something and is not
arbitrary.

• 100 lbs. is double 50 lbs. (same for kilograms)

• $100 is half as much as $200


Introduction 22

What Type of Data to Collect?


• The goal of the researcher is to use the highest level of
measurement possible.

• Example: Two ways of asking about Smoking behavior. Which is better,


A or B?
(A) Do you smoke? ☐ Yes ☐ No

(B) How many cigarettes did you smoke in the last 3 days (72 hours)? __

(A) is nominal, so the best we can get from this data are frequencies. (B) is
ratio, so we can compute: mean, median, mode, frequencies.
Introduction 23

What Type of Data to Collect?


Example Two: Comparing Soft Drinks. Which is better, A or B?
(A) Please rank the taste of the following soft drinks from 1 to 5 (1=best, 2= next best, etc.)
__Coke __Pepsi __7Up __Sprite __Dr. Pepper
(B) Please rate each of the following brands of soft drink:
Coke: (1) excellent (2) very good (3) good (4) fair (5) poor (6) very poor (7) awful
Pepsi: (1) excellent (2) very good (3) good (4) fair (5) poor (6) very poor (7) awful
7Up: (1) excellent (2) very good (3) good (4) fair (5) poor (6) very poor (7) awful
Sprite: (1) excellent (2) very good (3) good (4) fair (5) poor (6) very poor (7) awful
Dr Pepper: (1) excellent (2) very good (3) good (4) fair (5) poor (6) very poor (7) awful
Scale (B) is almost interval and is usually treated so – means are computed. We call this a rating
scale. By the way, if you hate all five soft drinks, we can determine this by your responses. With
scale (A), we have no way of knowing whether you hate all five soft drinks.

• Rating Scales – what level of measurement? Probably better than ordinal,


but not necessarily exactly interval. Certainly not ratio.
• Are the intervals between, say, “excellent” and “good” equal to the interval
between “poor” and “very poor”? Probably not. Researchers typically
assume that a rating scale is interval.
DESCRIPTIVE
STATISTICS
Descriptive Statistics I 25

Descriptive Statistics
• In this lecture we discuss using descriptive statistics, as
opposed to inferential statistics.
• Here we are interested only in summarizing the data in
front of us, without assuming that it represents anything
more.
• We will look at both numerical and categorical data.
• We will look at both quantitative and graphical techniques.
• The basic overall idea is to turn data into information.
Methods of data presentation
Numerical presentation

Graphical presentation

Mathematical presentation
1- Numerical presentation
Tabular presentation (simple – complex)

Simple frequency distribution table (S.F.D.T.) layout:

Title
Name of variable        Frequency    %
(units of variable)
- Categories -
Total
Table (I): Distribution of 50 patients at the surgical department of Alexandria
hospital in May 2008 according to their ABO blood groups

Blood group   Frequency    %
A                12        24
B                18        36
AB                5        10
O                15        30
Total            50       100
Table (II): Distribution of 50 patients at the surgical department of HU Specialized
hospital in May 2008 according to their age

Age (years)   Frequency    %
20-<30           12        24
30-              18        36
40-               5        10
50+              15        30
Total            50       100
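Frequency tables like Tables (I) and (II) can be generated directly from raw data. Below is a minimal sketch, assuming Python with pandas is available; the list of blood groups is a hypothetical reconstruction matching the counts in Table (I).

```python
import pandas as pd

# Hypothetical raw data matching the counts in Table (I)
blood_groups = ["A"] * 12 + ["B"] * 18 + ["AB"] * 5 + ["O"] * 15

freq = pd.Series(blood_groups).value_counts()   # frequency of each category
pct = (freq / freq.sum() * 100).round(1)        # percentage of each category

table = pd.DataFrame({"Frequency": freq, "%": pct})
table.loc["Total"] = [freq.sum(), pct.sum()]
print(table)
```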
Complex frequency distribution table

Table (III): Distribution of 20 lung cancer patients at the chest department of
Alexandria hospital and 40 controls in May 2008 according to smoking

              Lung cancer
Smoking       Cases          Controls       Total
              No.    %       No.    %       No.    %
Smoker        15     75%      8     20%     23     38.33
Non-smoker     5     25%     32     80%     37     61.67
Complex frequency distribution table

Table (IV): Distribution of 60 patients at the chest department of Alexandria
hospital in May 2008 according to smoking & lung cancer

              Lung cancer
Smoking       Positive       Negative       Total
              No.    %       No.    %       No.    %
Smoker        15     65.2     8     34.8    23     100
Non-smoker     5     13.5    32     86.5    37     100
2- Graphical presentation

 Graphs drawn using Cartesian coordinates
• Line graph
• Frequency polygon
• Frequency curve
• Histogram
• Bar graph
• Scatter plot

 Pie chart
 Statistical maps
Line Graph

Year    MMR/1000
1960    50
1970    45
1980    26
1990    15
2000    12

Figure (1): Maternal mortality rate of (country), 1960-2000
Frequency polygon

Age (years)   Males      Females    Mid-point of interval
20-           3 (12%)    2 (10%)    (20+30) / 2 = 25
30-           9 (36%)    6 (30%)    (30+40) / 2 = 35
40-           7 (28%)    5 (25%)    (40+50) / 2 = 45
50-           4 (16%)    3 (15%)    (50+60) / 2 = 55
60-70         2 (8%)     4 (20%)    (60+70) / 2 = 65

Figure (2): Distribution of 45 patients at (place), in (time) by age and sex
Frequency curve

[Figure: smoothed frequency curves of the age distribution (age groups 20-, 30-, 40-, 50-, 60-69), drawn separately for males and females.]
Histogram
Distribution of a group of cholera patients by age

Age (years)   Frequency    %
25-               3        14.3
30-               5        23.8
40-               7        33.3
45-               4        19.0
60-65             2         9.5
Total            21       100

Figure: Distribution of the cholera patients at (place), in (time) by age
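Graphs such as the histogram above can be drawn with any plotting library. A minimal sketch, assuming matplotlib; it redraws the cholera distribution from the tabulated percentages (for simplicity it ignores the unequal class widths, which a true histogram would scale for).

```python
import matplotlib.pyplot as plt

# Age classes and percentages from the cholera table above
classes = ["25-", "30-", "40-", "45-", "60-65"]
percents = [14.3, 23.8, 33.3, 19.0, 9.5]

plt.bar(classes, percents, width=1.0, edgecolor="black")  # adjacent bars, histogram style
plt.xlabel("Age (years)")
plt.ylabel("%")
plt.title("Distribution of cholera patients by age")
plt.show()
```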
Bar chart

[Figure: bar chart of percent distribution by marital status: Single, Married, Divorced, Widowed.]
Bar chart

[Figure: grouped bar chart of percent distribution by marital status (Single, Married, Divorced, Widowed), shown separately for males and females.]
Pie chart

[Figure: pie chart with slices Translocation 79%, Inversion 18%, Deletion 3%.]
Descriptive Statistics I 41

Numerical Data – Single Variable


• Numerical data may be summarized according to several
characteristics. We will work with each in turn and then all of them
together.
• Measures of Location
• Measures of central tendency: Mean; Median; Mode
• Measures of non-central tendency: Quantiles
• Quartiles; Quintiles; Percentiles
• Measures of Dispersion
• Range
• Interquartile range
• Variance
• Standard Deviation
• Coefficient of Variation
• Measures of Shape
• Skewness and kurtosis
• 5-number summary; Box-and-whisker; Stem-and-leaf
• Standardizing Data
Descriptive Statistics I 42

Measures of Location
• Measures of location place the data set on the scale of
real numbers.

• Measures of central tendency (i.e., central location) help


find the approximate center of the dataset.
• These include the mean, the median, and the mode.
Descriptive Statistics I 43

The Mean
• The sample mean, x̄, is the sum of all the observations (∑Xi) divided by
the number of observations (n):

x̄ = ∑Xi / n,  where ∑Xi = X1 + X2 + X3 + X4 + … + Xn

• Example. 1, 2, 2, 4, 5, 10. Calculate the mean. Note: n = 6 (six observations)

∑Xi = 1 + 2 + 2 + 4 + 5 + 10 = 24
x̄ = 24 / 6 = 4.0
Descriptive Statistics I 44

The Mean
Example.
For the data: 1, 1, 1, 1, 51. Calculate the mean. Note: n = 5 (five observations)

∑Xi = 1 + 1 + 1 + 1 + 51 = 55
x̄ = 55 / 5 = 11.0

• Here we see that the mean is affected by extreme values.


Descriptive Statistics I 45

The Median
• The median is the middle value of the ordered data
• To get the median, we must first rearrange the data into
an ordered array (in ascending or descending order).
Generally, we order the data from the lowest value to
the highest value.
• Therefore, the median is the data value such that half of
the observations are larger and half are smaller. It is
also the 50th percentile (we will be learning about
percentiles in a bit).
• If n is odd, the median is the middle observation of the
ordered array. If n is even, it is midway between the two
central observations.
Descriptive Statistics I 46

The Median
0 2 3 5 20 99 100
Example:

Note: Data has been ordered from lowest to highest. Since n is odd
(n=7), the median is the (n+1)/2 ordered observation, or the 4th
observation.
Answer: The median is 5.

The mean and the median are unique for a given set of data.
There will be exactly one mean and one median.

Unlike the mean, the median is not affected by extreme values.


Q: What happens to the median if we change the 100 to 5,000? Not a
thing, the median will still be 5. Five is still the middle value of the data
set.
Descriptive Statistics I 47

The Median
10 20 30 40 50 60
Example:

Note: Data has been ordered from lowest to highest. Since n is


even (n=6), the median is the (n+1)/2 ordered observation, or the
3.5th observation, i.e., the average of observation 3 and observation
4.

Answer: The median is 35.


Descriptive Statistics I 48

The Median
• The median has 3 interesting characteristics:

• 1. The median is not affected by extreme values, only


by the number of observations.
• 2. Any observation selected at random is just as likely to
be greater than the median as less than the median.
• 3. The summation of the absolute values of the differences
about the median is a minimum:
∑ |Xi − Median| = minimum
Descriptive Statistics I 49

The Mode
• The mode is the value of the data that occurs with
the greatest frequency.

Example. 1, 1, 1, 2, 3, 4, 5
Answer. The mode is 1 since it occurs three times. The other
values each appear only once in the data set.

Example. 5, 5, 5, 6, 8, 10, 10, 10.


Answer. The mode is: 5, 10.
There are two modes. This is a bi-modal dataset.
Descriptive Statistics I 50

The Mode
• The mode is different from the mean and the median in
that those measures always exist and are always
unique. For any numeric data set there will be one
mean and one median.
• The mode may not exist.
• Data: 1, 2, 3, 4, 5, 6, 7, 8, 9, 0
• Here you have 10 observations and they are all different.

• The mode may not be unique.


• Data: 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7
• Mode = 1, 2, 3, 4, 5, and 6. There are six modes.
Descriptive Statistics I 51

Quantiles
• Measures of non-central location used to
summarize a set of data

• Examples of commonly used quantiles:


• Quartiles
• Quintiles
• Deciles
• Percentiles
Descriptive Statistics I 52

Quartiles
• Quartiles split a set of ordered data into four parts.
• Imagine cutting a chocolate bar into four equal pieces… How many cuts
would you make? (yes, 3!)
• Q1 is the First Quartile
• 25% of the observations are smaller than Q1 and 75% of the observations
are larger
• Q2 is the Second Quartile
• 50% of the observations are smaller than Q2 and 50% of the observations
are larger. Same as the Median. It is also the 50th percentile.
• Q3 is the Third Quartile
• 75% of the observations are smaller than Q3and 25% of the observations
are larger
• Some books use a formula to determine the quartiles. We prefer a quick-and-
dirty approximation method outlined in the next slide.
Descriptive Statistics I 53

Quartiles
• A quartile, like the median, either takes the value of one of the observations, or the
value halfway between two observations.
• The simple method we like to use is just to first split the data set into two equal parts
to get the median (Q2) and then get the median of each resulting subset.

• The method we are using is an approximation. If you solve this in MS Excel, which relies on
a formula, you may get an answer that is slightly different.
Descriptive Statistics I 54

Exercise
Computer Sales (n = 12 salespeople)
Original Data: 3, 10, 2, 5, 9, 8, 7, 12, 10, 0, 4, 6
Compute the mean, median, mode, quartiles.

First order the data:


0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 12
∑Xi = 76
x̄ = 76 / 12 = 6.33 computers sold
Median = 6.5 computers
Mode = 10 computers
Q1 = 3.5 computers, Q3 = 9.5 computers
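These answers are easy to check by machine. A quick sketch, assuming Python with numpy: the standard statistics module handles the mean, median, and mode, while numpy's percentile function gives quartiles. Since numpy (like Excel) interpolates, it returns Q1 = 3.75 and Q3 = 9.25 here, slightly different from the split-the-halves approximation.

```python
import statistics
import numpy as np

data = sorted([3, 10, 2, 5, 9, 8, 7, 12, 10, 0, 4, 6])

print(statistics.mean(data))    # 6.333... computers sold
print(statistics.median(data))  # 6.5
print(statistics.mode(data))    # 10
# Interpolated quartiles; these differ slightly from the quick approximation method
print(np.percentile(data, [25, 50, 75]))  # [3.75  6.5  9.25]
```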
Descriptive Statistics I 55

Other Quantiles
• Similar to what we just learned about quartiles, where 3
quartiles split the data into 4 equal parts,
• There are 9 deciles dividing the distribution into 10 equal
portions (tenths).
• There are four quintiles dividing the population into 5 equal
portions.
• … and 99 percentiles (next slide)
• In all these cases, the convention is the same. The point,
be it a quartile, decile, or percentile, takes the value of
one of the observations or it has a value halfway
between two adjacent observations. It is never necessary
to split the difference between two observations more
finely.
Descriptive Statistics I 56

Percentiles
• We use 99 percentiles to divide a data set into 100 equal
portions.

• Percentiles are used in analyzing the results of


standardized exams. For instance, a score of 40 on a
standardized test might seem like a terrible grade, but if it is
the 99th percentile, don’t worry about telling your parents. ☺

• Which percentile is Q1? Q2 (the median)? Q3?

• We will always use computer software to obtain the


percentiles.
Descriptive Statistics I 57

Some Exercises
Data (n=16):
1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 8, 10
Compute the mean, median, mode, quartiles.

Answer.
1 1 2 2 ┋ 2 2 3 3 ┋ 4 4 5 5 ┋ 6 7 8 10
Mean = 65/16 = 4.06
Median = 3.5
Mode = 2
Q1 = 2
Q2 = Median = 3.5
Q3 = 5.5
Descriptive Statistics I 58

Exercise: # absences
Data – number of absences (n=13) :
0, 5, 3, 2, 1, 2, 4, 3, 1, 0, 0, 6, 12
Compute the mean, median, mode, quartiles.

Answer. First order the data:


0, 0, 0,┋ 1, 1, 2, 2, 3, 3, 4,┋ 5, 6, 12
Mean = 39/13 = 3.0 absences
Median = 2 absences
Mode = 0 absences
Q1 = .5 absences
Q3 = 4.5 absences
Descriptive Statistics I 59

Exercise: Reading level


Data: Reading Levels of 16 eighth graders.
5, 6, 6, 6, 5, 8, 7, 7, 7, 8, 10, 9, 9, 9, 9, 9

Answer. First, order the data:


5 5 6 6 ┋ 6 7 7 7 ┋ 8 8 9 9 ┋ 9 9 9 10
Sum=120.
Mean= 120/16 = 7.5 This is the average reading level of
the 16 students.
Median = Q2 = 7.5
Q1 = 6, Q3 = 9
Mode = 9
Descriptive Statistics I 60

Exercise: Reading level


5 5 6 6 ┋ 6 7 7 7 ┋ 8 8 9 9 ┋ 9 9 9 10
Alternate method: The data can also be set up as a
frequency distribution.
Reading Level (Xi)   Frequency (fi)
5                    2
6                    3
7                    3
8                    2
9                    5
10                   1
Total                16

Measures can be computed using methods for grouped data.

Note that the sum of the frequencies, ∑fi = n = 16


Descriptive Statistics I 61

Exercise: Reading level


Xi    fi    (Xi)(fi)
5      2      10
6      3      18
7      3      21
8      2      16
9      5      45
10     1      10
      16     120

x̄ = ∑Xi·fi / n = 120 / 16 = 7.5

We see that the column total, ∑Xi·fi, is the sum of the ungrouped data.
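The grouped-data mean is just the frequency-weighted sum divided by n. A minimal sketch in Python:

```python
# Grouped-data mean: mean = sum(Xi * fi) / n, using the reading-level table above
levels = [5, 6, 7, 8, 9, 10]
freqs = [2, 3, 3, 2, 5, 1]

n = sum(freqs)                                     # 16
total = sum(x * f for x, f in zip(levels, freqs))  # 120, the Xi*fi column total
print(total / n)                                   # 7.5
```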
Descriptive Statistics I 62

Measures of Dispersion
• Dispersion is the amount of spread, or variability, in
a set of data.

• Why do we need to look at measures of dispersion?

• Consider this example:


A company is about to buy computer chips that must have
an average life of 10 years. The company has a choice of
two suppliers. Whose chips should they buy? They take a
sample of 10 chips from each of the suppliers and test
them. See the data on the next slide.
Descriptive Statistics I 63

Measures of Dispersion
We see that supplier B’s chips have a longer average life.

Supplier A chips (life in years): 11, 11, 10, 10, 11, 11, 11, 11, 10, 12
Supplier B chips (life in years): 170, 1, 1, 160, 2, 150, 150, 170, 2, 140

x̄A = 10.8 years          x̄B = 94.6 years
MedianA = 11 years       MedianB = 145 years
sA = 0.63 years          sB = 80.6 years
RangeA = 2 years         RangeB = 169 years

However, what if the company offers a 3-year warranty? Then computers
manufactured using the chips from supplier A will have no returns, while
using supplier B will result in 4/10 or 40% returns.
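A minimal Python sketch of this comparison, using the standard statistics module (statistics.stdev divides by n − 1, matching s):

```python
import statistics

a = [11, 11, 10, 10, 11, 11, 11, 11, 10, 12]    # supplier A lifetimes (years)
b = [170, 1, 1, 160, 2, 150, 150, 170, 2, 140]  # supplier B lifetimes (years)

for name, chips in [("A", a), ("B", b)]:
    print(name,
          statistics.mean(chips),             # 10.8 vs 94.6
          statistics.median(chips),           # 11.0 vs 145.0
          round(statistics.stdev(chips), 2),  # 0.63 vs 80.64
          max(chips) - min(chips))            # range: 2 vs 169
```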
Descriptive Statistics I 64

Measures of Dispersion
• We will study these five measures of dispersion
• Range
• Interquartile Range
• Standard Deviation
• Variance
• Coefficient of Variation
Descriptive Statistics I 65

The Range
• Range = Largest Value – Smallest Value

Example: 1, 2, 3, 4, 5, 8, 9, 21, 25, 30


Answer: Range = 30 – 1 = 29.

• The range is simple to use and to explain to


others.
• One problem with the range is that it is influenced
by extreme values at either end.
Descriptive Statistics I 66

Inter-Quartile Range (IQR)


• IQR = Q3 – Q1

• Example (n = 15):
0, 0, 2, 3, 4, 7, 9, 12, 17, 18, 20, 22, 45, 56, 98
Q1 = 3, Q3 = 22
IQR = 22 – 3 = 19 (Range = 98)

• This is basically the range of the central 50% of the


observations in the distribution.

• Problem: The interquartile range does not take into account


the variability of the total data (only the central 50%). We are
“throwing out” half of the data.
Descriptive Statistics I 67

Standard Deviation
• The standard deviation, s, measures a kind of
“average” deviation about the mean. It is not really the
“average” deviation, even though we may think of it
that way.

• Why can’t we simply compute the average deviation


about the mean, if that’s what we want?

• If you take a simple mean, and then add up the
deviations about the mean, ∑(Xi − x̄), this sum will be
equal to 0. Therefore, a measure of “average
deviation” will not work.
Descriptive Statistics I 68

Standard Deviation
• Instead, we use:

s = √[ ∑(Xi − x̄)² / (n − 1) ]

• This is the “definitional formula” for the standard deviation.


• The standard deviation has lots of nice properties,
including:
• By squaring the deviation, we eliminate the problem of the
deviations summing to zero.
• In addition, this sum is a minimum. No other value subtracted from
X and squared will result in a smaller sum of the deviation squared.
This is called the “least squares property.”
• Note we divide by (n-1), not n. This will be referred to as a
loss of one degree of freedom.
Descriptive Statistics I 69

Standard Deviation
Example. Two data sets, X and Y. Which of the
two data sets has greater variability? Calculate
the standard deviation for each.

We note that both sets of data have the same mean:

Xi: 1, 2, 3, 4, 5      x̄ = 3
Yi: 0, 0, 0, 5, 10     ȳ = 3

(continued…)
Descriptive Statistics I 70

Standard Deviation
X    (X − x̄)   (X − x̄)²
1      −2         4
2      −1         1
3       0         0
4       1         1
5       2         4
       ∑ = 0     10

sX = √(10 / 4) = 1.58

Y    (Y − ȳ)   (Y − ȳ)²
0      −3         9
0      −3         9
0      −3         9
5       2         4
10      7        49
       ∑ = 0     80

sY = √(80 / 4) = 4.47

[Check these results with your calculator.]
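The same check in Python, assuming numpy; ddof=1 makes numpy divide by (n − 1) rather than n, matching the definitional formula for s:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([0, 0, 0, 5, 10])

# Sample standard deviations: divide by (n - 1) via ddof=1
print(np.std(x, ddof=1))  # 1.5811... = sqrt(10/4)
print(np.std(y, ddof=1))  # 4.4721... = sqrt(80/4)
```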


Descriptive Statistics I 71

Standard Deviation: N vs. (n-1)


Note that σ = √[ ∑(Xi − µ)² / N ]  and  s = √[ ∑(Xi − x̄)² / (n − 1) ]

• You divide by N only when you have taken a census and therefore know
the population mean. This is rarely the case.

• Normally, we work with a sample and calculate sample measures, like


the sample mean and the sample standard deviation:

• The reason we divide by n-1 instead of n is to assure that s is an


unbiased estimator of σ.
• We have taken a shortcut: in the second formula we are using the sample
mean, x̄, a statistic, in lieu of μ, a population parameter. Without a correction, this
formula would have a tendency to understate the true standard deviation. We
divide by n − 1, which increases s. This makes it an unbiased estimator of σ.
• We will refer to this as “losing one degree of freedom” (to be explained more
fully later on in the course).
Descriptive Statistics I 72

Variance
The variance, s², is the standard deviation (s) squared. Conversely, s = √s².

Definitional formula: s² = ∑(Xi − x̄)² / (n − 1)

Computational formula: s² = [ ∑Xi² − (∑Xi)² / n ] / (n − 1)

The computational formula is what computer software
(e.g., MS Excel or your calculator key) uses.
Descriptive Statistics I 73

Coefficient of Variation (CV)


• The problem with s2 and s is that they are both, like the mean, in the
“original” units.
• This makes it difficult to compare the variability of two data sets that
are in different units or where the magnitude of the numbers is very
different in the two sets. For example,
• Suppose you wish to compare two stocks and one is in dollars and the other is in
yen; if you want to know which one is more volatile, you should use the coefficient of
variation.
• It is also not appropriate to compare two stocks of vastly different prices even if both
are in the same units.
• The standard deviation for a stock that sells for around $300 is going to be very
different from one with a price of around $0.25.
• The coefficient of variation will be a better measure of dispersion in
these cases than the standard deviation (see example on the next
slide).
Descriptive Statistics I 74

Coefficient of Variation (CV)

CV = (s / x̄) × 100%

CV is in terms of a percent. What we are in
effect calculating is what percent of the sample
mean is the standard deviation. If CV is 100%,
this indicates that the sample mean is equal to
the sample standard deviation. This would
demonstrate that there is a great deal of
variability in the data set. 200% would obviously
be even worse.
Descriptive Statistics I 75

Example: Stock Prices


Which stock is more volatile?

Closing prices over the last 8 months:

        Stock A    Stock B
JAN     $1.00      $180
FEB      1.50       175
MAR      1.90       182
APR       .60       186
MAY      3.00       188
JUN       .40       190
JUL      5.00       200
AUG       .20       210

Mean    $1.70      $188.88
s²       2.61       128.41
s       $1.62      $11.33

CVA = ($1.62 / $1.70) × 100% = 95.3%
CVB = ($11.33 / $188.88) × 100% = 6.0%

Answer: The standard deviation of B is higher than that of A, but A
is more volatile.
Descriptive Statistics I 76

Exercise: Test Scores


Data (n=10): 0, 0, 40, 50, 50, 60, 70, 90, 100, 100
Compute the mean, median, mode, quartiles (Q1, Q2, Q3), range,
interquartile range, variance, standard deviation, and coefficient of
variation. We shall refer to all these as the descriptive (or summary)
statistics for a set of data.

Answer. First order the data:


0, 0, 40, 50, 50 ┋ 60, 70, 90, 100, 100
• Mean: ∑Xi = 560 and n = 10, so x̄ = 560/10 = 56.
Median = Q2 = 55
• Q1 = 40 ; Q3 = 90 (Note: Excel gives these as Q1 = 42.5, Q3 = 85.)
• Mode = 0, 50, 100
Range = 100 – 0 = 100
• IQR = 90 – 40 = 50
• s2 = 11,840/9 = 1315.5
• s = √1315.5 = 36.27
• CV = (36.27/56) x 100% = 64.8%
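All of these summary statistics can be verified in a few lines, assuming Python with numpy (the interpolated quartiles are omitted since, as noted above, software values differ slightly from the approximation method):

```python
import numpy as np

scores = np.array([0, 0, 40, 50, 50, 60, 70, 90, 100, 100])

mean = scores.mean()               # 56.0
median = np.median(scores)         # 55.0
s2 = scores.var(ddof=1)            # 1315.55... = 11840/9
s = scores.std(ddof=1)             # 36.27
rng = scores.max() - scores.min()  # 100
cv = s / mean * 100                # 64.8 (percent)
print(mean, median, s2, round(s, 2), rng, round(cv, 1))
```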
INTRODUCTION TO
PROBABILITY
Probability 78

Key Terms
• Probability. The word probability is actually undefined,
but the probability of an event can be explained as the
proportion of times, under identical circumstances, that
the event can be expected to occur.
• It is the event's long-run frequency of occurrence.
• For example, the probability of getting a head on a coin toss = .5. If you
toss a coin repeatedly, for a long time, you will note that a head occurs
about one half of the time.

• Probability vs. Statistics.


• In probability, the population is known.
• In statistics you draw inferences about the population from the sample.
Probability 79

Key Terms
• Objective probabilities are long-run frequencies of occurrence, as
above. Probability, in its classical (or, objective) meaning refers to a
repetitive process, one which generates outcomes which are not
identical and not individually predictable with certainty but which may
be described in terms of relative frequencies. These processes are
called stochastic processes (or, chance processes). The individual
results of these processes are called events.

• Subjective probabilities measure the strength of personal beliefs.


For example, the probability that a new product will succeed.
Probability 80

Key Terms
• Random variable. That which is observed as the result of a stochastic
process. A random variable takes on (usually numerical) values. Associated
with each value is a probability that the value will occur.
For example, when you toss a die:
P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6

• Simple probability. P(A). The probability that an event (say, A) will occur.

• Joint probability. P(A and B). P(A ∩ B). The probability of events A and B
occurring together.

• Conditional probability. P(A|B), read "the probability of A given B." The


probability that event A will occur given event B has occurred.
Probability 81

Probability: Some Basic Rules


1. The probability of an event (say, event A) cannot be less than zero
or greater than one. The probability that you will pass this course
cannot be 150%.
So, 0 ≤ P(A) ≤ 1

2. The sum of the probabilities of all possible outcomes (events) of a


process (or, experiment) must equal one.
So, ∑Pi = 1 or P(A) + P(A') = 1 [A' means not A.]
Probability 82

Probability: Some Basic Rules


3. Rules of addition.
a. P(A or B) = P(A U B) = P(A) + P(B) if events A and B are mutually
exclusive. Two events are mutually exclusive if they cannot occur
together. E.g., male or female; heads or tails.

b. In general, P(A or B) = P(A) + P(B) - P(A and B)


This is the general formula for addition of probabilities, for any two
events A and B. P(A and B) is the joint probability of A and B occurring
together and is equal to zero if they are mutually exclusive (i.e., if they
cannot occur together).

Thus, the probability of getting a 1 or 2 when tossing a die: P (1 or 2) = P


(1) + P (2) = 1/6 + 1/6 = 1/3; The probability of getting a 1 and 2 when
tossing a die is 0, since they are mutually exclusive. P (1 and 2) = 0.
Probability 83

Probability: Some Basic Rules


• Mutually exclusive events, in Venn diagrams:

[Diagram: two disjoint circles A and B; events A, B are mutually exclusive.]

[Diagram: two overlapping circles A and B; events A, B are not mutually exclusive, and the overlap is P(A∩B).]

P(A∩B) is the intersection of A and B.

Probability 84

Probability: Some Basic Rules


• Also, we can see from the Venn diagrams that:
• P(A′) = P(not A) = 1 – P(A)
• P(A′ or B′) = 1 – P(A and B)

• Example: Suppose you go to a college where you are


allowed to double major. 10% of students major in
Sociology (A); 15% major in Anthropology (B); 3% are
double majors (both A and B).
• What is the probability of being a Sociology or Anthropology major?
• [Hint: The events are not mutually exclusive.]
• Answer: P(A ∪ B) = P(A or B) = .10 + .15 - .03 = .22
Probability 85

Probability: Some Basic Rules


4. Rules of multiplication are used for determining joint probabilities,
the probability that events A and B will occur together.

a. P(A and B) = P(A ∩ B) = P(A)⋅P(B) if events A and B are


independent. Events A and B are independent if knowledge of the
occurrence of B has no effect on the probability that A will occur. For
example,
P(Blue eyes | Male) = P(Blue eyes) because eye color and sex are independent.
P(over 6 feet tall | Female) ≠ P(over 6 feet tall)
P(getting into car accident | under 26) ≠ P(getting into car accident)
P(getting a head on the second toss of a coin | got a head on the first toss) = ?
Answer = .50. The two coin tosses are independent. What happened during the first coin toss
has no effect on what will happen during the second coin toss.
Probability 86

Probability: Some Basic Rules


b. In general, P(A and B) = P(A|B) P(B) = P(B|A) P(A)
This is the general formula for multiplying probabilities for any two events A
and B, not necessarily independent.
P(A|B) is a conditional probability.
If events A and B are independent, then P(A|B) = P(A), and
P(A and B) reduces to P(A)P(B), as in 4.a. above.

Independent events: Events A and B are independent if:


P(A|B) = P(A) or if P(A and B) = P(A)P(B)

5. Conditional probability. From the formula in 4.b., we see that we can


compute the conditional probability as
P(A|B) = P(A and B) / P(B)
Probability 87

Probability: Some Basic Rules


6. Bayes' Theorem [OPTIONAL TOPIC]

Since P(A|B) = P(A and B) / P(B)


and P(A and B) = P(B|A) P(A)
Therefore P(A|B) = P(B|A) P(A) / P(B)

[We generally need to compute P(B)].


Probability 88

Example: Readership
• In a small village in Ethiopia, we are looking at readership of the Ethiopian Herald (T)
and the Admas newspapers (W):

• P(T) = .25
P(W) = .20
P(T and W) = .05

• Question: What is the probability of reading either the Ethiopian Herald or the Admas
newspaper?

• P (T or W) = P(T) + P(W) – P (T and W) = .25 + .20 - .05 = .40


Probability 89

Example: Readership

Another way to solve this problem is with a Venn diagram:

[Venn diagram: 20 in T only, 5 in the overlap, 15 in W only, 60 outside.]

So, out of 100 people in total:
25 people read the Ethiopian Herald (T);
20 people read Admas (W);
5 people read both;
60 people read neither.

Thus, it can easily be seen that 40 people (out of 100) read either T or W.
Probability 90

Example: Readership
• Other Probabilities: P(T′ and W′) = .60
• P(T′ and W) = .15
• P(T and W′) = .20
• P(T or W) = 1 – P(T′ and W′) = 1– .60 = .40

• P (T′ or W′) = P(T′) + P(W′) – P(T′ and W′)


=.75+ .80 - .60
= .95 (95% of the people in this town fail to read at least one of the two papers; in fact, only 5%
read both).

Note that:
P(T′ or W′) = 1- P(T and W) = 1 - .05 = .95
Probability 91

Example: Readership
• Another way to do solve this problem is to construct a
table of joint probabilities:
T T′

W .05 .15 .20

W′ .20 .60 .80

.25 .75 1.00


Probability 92

Independence vs. Mutually Exclusive Events


Important: Do not confuse mutually exclusive and independent.

Mutually exclusive means that two things cannot occur at the same time [P (A
and B) = 0]. You cannot get a head and tail at the same time; you cannot be
dead and alive at the same time; you cannot pass and fail a course at the
same time; etc.

Independence has to do with the effect of, say B, on A. If knowing about B has
no effect on A, then they are independent. It is very much like saying that A
and B are unrelated.
Are waist size and gender independent of each other? Suppose I know that someone who is an
adult has a 24-inch waist, does that give me a hint as to whether that person is male or female?
How many adult men have a 24-inch waist? How many women? Is P(24-inch waist | adult male)
= P(24-inch waist | adult female)? We suspect that the two probabilities are not the same, that
there is a relationship between gender and waist size (also hand size and height, for that matter).
Thus, they are not independent.
Probability 93

Independence vs. Mutually Exclusive Events


• If two events are mutually exclusive, they cannot occur together; we only
examine events for independence (or, conversely, to see if they are related) if
they can occur together.
• Researchers are always testing for relationships. Sometimes we want to
know whether two variables are related.
• Is there a relationship between cigarette smoking and cancer or are they independent? We
know the answer to that one.
• Is there a relationship between your occupation and how long you will live (longevity) or are
they independent? Studies show that they are not independent. Librarians and professors have
relatively long life spans; coal miners have the shortest life spans. Drug dealers (is that an
occupation?) also have very short life spans.
• For women, is there a relationship between salary and how slender you are? Or, are salary and
weight independent?
• For men, is there a relationship between how many dates you will get on a website such as
eHarmony and your occupation? For women, how about number of dates and hair color?
Probability 94

Example: Smoking and Cancer


Data in a contingency table:

                 S (smoker)   S′ (non-smoker)
C (cancer)          100             50           150
C′ (no cancer)      300            550           850
                    400            600          1000

Are smoking and cancer independent?


Joint Probabilities            Marginal Probabilities
P(C and S) = .10               P(C) = .15 (150/1000)
P(C′ and S) = .30              P(C′) = .85 (850/1000)
P(C and S′) = .05              P(S) = .40 (400/1000)
P(C′ and S′) = .55             P(S′) = .60 (600/1000)


Probability 95

Example: Smoking and Cancer


Are smoking and cancer independent?
To answer, check: Are the following probabilities equal or not?
P(C) ≟ P(C|S) ≟ P(C|S′)

P(C|S) = P(C and S)/P(S) = .10/.40 = .25


P(C|S′) = P(C and S′)/P(S′) = .05/.60 = .083
P(C) = .15

Thus, cancer and smoking are not independent. There is a relationship between cancer
and smoking.

Note that P(C) is a weighted average of P(C|S) and P(C|S′)


i.e., P(C) = P(S)P(C|S) + P(S′)P(C|S′) = .40(.25) + .60 (.0833) = .15
Probability 96

Example: Smoking and Cancer


Are smoking and cancer independent? Alternate method of solving this problem:

If C and S are independent, then


P(C and S) = P(C) P(S)
.10 ?= (.15)(.40)
.10 ≠ .06

Since .10 is not equal to .06, we conclude that cancer and smoking are not independent.

These calculations are much easier if you set up a joint probability table. Coming up in
the next two slides.
Probability 97

Example: Smoking and Cancer


We use the frequencies in the contingency table (above) to compute the
probabilities in the joint probability table.
S(smoker) S′(non-smoker)

C (cancer) 100/1000 50/1000 150/1000

C′ (no cancer) 300/1000 550/1000 850/1000

400/1000 600/1000 1000/1000


S S′
(smoker) (non-smoker)

C (cancer) .10 .05 .15

C′ (no cancer) .30 .55 .85

.40 .60 1.00

Notice that the marginal probabilities are the row and column totals. This is not an
accident. The marginal probabilities are totals of the joint probabilities and weighted
averages of the conditional probabilities.
Probability 98

Example: Smoking and Cancer

Once we compute the table of joint probabilities, we can answer any


question about a probability in this problem. The joint probabilities are
in the center of the table; the marginal probabilities are in the margins,
and the conditional probabilities are easily computed by dividing a joint
probability by a marginal probability.
P(C) = .15
P(C and S) = .10
P(C|S) = P(C and S) / P(S) = .10 / .40 = .25
Probability 99

Example: Gender and Beer


M (male) F (female)
Contingency Table Data:
B (beer drinker) 450 350 800

B′ (not a beer drinker) 450 750 1200

900 1100 2000

M (male) F (female)
The Joint Probability Table:
B (beer drinker) .225 .175 .40

B′ (not a beer drinker) .225 .375 .60

.45 .55 1.00

Joint Probabilities Marginal Probabilities


P(B and M) = .225 P(B) = .40
P(B and F) = .175 P(B′) = .60
P(B′ and M) = .225 P(M) = .45
P(B′ and F) = .375 P(F) = .55
Probability 100

Example: Gender and Beer


• Given that an individual is Male, what is the probability that that person is a beer
drinker?

P(B|M) = P(B and M)/P(M) = .225/.45 = .50

• Given that an individual is Female, what is the probability that that person is a Beer
drinker?

P(B|F) = P(B and F)/P(F) = .175/.55 = .318

• Are beer drinking and gender independent?

P(B) = .40, but P(B|M) = .50 and P(B|F) = .318. Since these conditional probabilities are not equal to P(B), beer drinking and gender are not independent.
Probability 101

Example: Gender and Dove Soap


Data collected from a random sample of 1,000 adults:
M (male) F (female)

D (use Dove soap) 80 120 200

D′ (does not use Dove soap) 320 480 800

400 600 1,000

Computed probabilities:
Joint Probability Marginal Totals
P(D and M) = .08 P(D) = .20
P(D and F) = .12 P(D′) = .80
P(D′ and M) = .32 P(M) = .40
P(D′ and F) = .48 P(F) = .60
Probability 102

Example: Gender and Dove Soap


Are the events Gender and Use of Dove soap independent?
P(D) = 200/1000 = .20
P(D|M) = .08/.40 = .20
P(D|F) = .12/.60 = .20
Yes, these two events are independent.

Alternative method: Is P(M and D) ≟ P(M) P(D) ?


P(M and D) = .08
P(M) = 400/1000 = .40
P(D) = 200/1000 = .20
Is .08 ≟ (.40)(.20)?
Yes, therefore the two events M and D are independent.
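The bookkeeping in these examples, building the joint probability table from contingency counts and checking independence, is easy to mechanize. A minimal sketch, assuming numpy, using the Dove soap data:

```python
import numpy as np

# Contingency counts: rows = (D, D'), columns = (M, F)
counts = np.array([[80, 120],
                   [320, 480]])
n = counts.sum()                 # 1000

joint = counts / n               # joint probability table
p_D = joint[0].sum()             # marginal P(D) = .20
p_M = joint[:, 0].sum()          # marginal P(M) = .40
p_D_given_M = joint[0, 0] / p_M  # conditional P(D|M) = .08/.40 = .20

# Independence: P(D|M) = P(D), equivalently P(D and M) = P(D)P(M)
print(np.isclose(p_D_given_M, p_D))        # True
print(np.isclose(joint[0, 0], p_D * p_M))  # True
```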
PROBABILITY
DISTRIBUTIONS
With Expected Value
Probability Distributions 104

Types of Probability Distributions


• A random variable is a variable whose value is determined by
chance.

• There are two types of random variables:


• A Discrete random variable can take on only specified, distinct values.
• A Continuous random variable can take on any value within an
interval.

• Thus, there are also two types of probability distributions:


• Discrete probability distributions
• Continuous probability distributions
Probability Distributions 105

Discrete Probability Distributions

• A probability distribution for a discrete random variable is


a mutually exclusive listing of all possible numerical
outcomes for that random variable, such that a particular
probability of occurrence is associated with each
outcome.
Probability Distributions 106

Discrete Probability Distributions


… Continued

Example:
Probability Distribution for the Toss of a Die
Xi P(Xi)
1 1/6
2 1/6
3 1/6
4 1/6
5 1/6
6 1/6

This is an example of a uniform distribution.


Probability Distributions 107

Discrete Probability Distributions


… Continued
• Discrete Probability Distributions have these three major
properties:
1) ∑ P(X) = 1
2) P(X) ≥ 0
3) When you substitute a value of the random variable (X) into the
function*, your output is the probability that the particular value will occur.

*if the distribution is expressed as a function, e.g., Binomial Distribution.


108

COMMON DISCRETE DISTRIBUTIONS


AND THEIR PROPERTIES
109

Revision on the Definition of Discrete Random


Variable and Probability distribution

 Flip a coin three times, and let X be the number of heads in the three tosses.
S = {HHH, THH, HTH, HHT, TTH, THT, HTT, TTT}
 X(HHH) = 3
 X(HHT) = X(HTH) = X(THH) = 2
 X(HTT) = X(THT) = X(TTH) = 1
 X(TTT) = 0
X = {0, 1, 2, 3}

Discrete random variables are variables which can assume only a
specific number of values which are clearly separated, so they can
be counted.
Example:
Toss coin n times and count the number of heads.
Number of Children in a family.
Number of car accidents per week.
Number of defective items in a given company

Definition: A probability distribution is a complete list of all
possible values of a random variable and their corresponding
probabilities.
111

Discrete probability distribution: a distribution whose random
variable is discrete.
Example: Consider the possible outcomes of the experiment of tossing
three coins together.
Sample space, S = {HHH, THH, HTH, HHT, TTH, THT, HTT, TTT}
Let the random variable X be the number of heads that turn up when the
three coins are tossed.
X = {0, 1, 2, 3}
P(X = 0) = P(TTT) = 1/8
P(X = 1) = P(HTT) + P(THT) + P(TTH) = 1/8 + 1/8 + 1/8 = 3/8
P(X = 2) = P(HHT) + P(HTH) + P(THH) = 1/8 + 1/8 + 1/8 = 3/8
P(X = 3) = P(HHH) = 1/8

X=x 0 1 2 3
P(X=x) 1/8 3/8 3/8 1/8
112

COMMON DISCRETE PROBABILITY DISTRIBUTIONS

I. Binomial Distribution
The origin of the binomial experiment lies in the Bernoulli trial.
A Bernoulli trial is an experiment having only two mutually
exclusive outcomes, which are designated “success (s)” and
“failure (f)”. The sample space of a Bernoulli trial is {s, f}.
Notation: Let the probabilities of success and failure be p and q
respectively:
P(success) = P(s) = p and P(failure) = P(f) = q, where q = 1 − p.
Definition: Let X be the number of successes in n repeated
Bernoulli trials with probability of success p on each trial.
Then the probability distribution of the discrete random
variable X is called the binomial distribution.
113

Let p = the probability of success and
q = 1 − p = the probability of failure on any given trial.

A binomial random variable with parameters n and p represents
the number of successes r in n independent trials, when each trial
has probability p of success.
If X is such a random variable, then for r = 0, 1, 2, …, n:

P(X = r) = [ n! / (r!(n − r)!) ] p^r q^(n−r),  where q = 1 − p

This is the binomial probability distribution formula.
114

ASSUMPTIONS
A binomial experiment is a probability experiment that satisfies the following

requirements called assumptions of a binomial distribution.

1. The experiment consists of n identical trials.

2. Each trial has only one of the two possible mutually

exclusive outcomes, success or a failure.

3. The probability of each outcome does not change from trial to trial.

4. The trials are independent.


115

If X is a binomial random variable with parameters n and p, then
E(X) = np
Var(X) = npq
116

Example 1: The probability that a certain kind of component will
survive a shock test is 3/4. Find the probability that exactly 2 of the
next 4 components tested survive.
Solution: Assuming that the tests are independent and p = 3/4 for each
of the 4 tests, we obtain X ~ Bin(n = 4, p = 3/4), so
P(X = 2) = C(4, 2) (3/4)² (1/4)² = 6 × 9/256 = 27/128 ≈ 0.211

Example 2: The probability that a patient recovers from a rare blood
disease is 0.4. If 15 people are known to have contracted this disease,
what is the probability that
(a) at least 10 survive,
(b) from 3 to 8 survive, and
(c) exactly 5 survive?
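These binomial probabilities can be evaluated with scipy.stats, assuming it is available; a sketch covering Example 1 and the three parts of Example 2:

```python
from scipy.stats import binom

# Example 1: n = 4 trials, p = 3/4, exactly 2 survive
print(binom.pmf(2, 4, 0.75))  # 27/128 = 0.2109...

# Example 2: n = 15, p = 0.4
print(1 - binom.cdf(9, 15, 0.4))                      # (a) P(X >= 10)     = 0.0338...
print(binom.cdf(8, 15, 0.4) - binom.cdf(2, 15, 0.4))  # (b) P(3 <= X <= 8) = 0.8778...
print(binom.pmf(5, 15, 0.4))                          # (c) P(X = 5)       = 0.1859...
```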
117

II. Poisson distribution


Experiments yielding numerical values of a random variable X,
the number of outcomes occurring during a given time interval or
in a specified region, are called Poisson experiments.
The given time interval may be of any length, such as a minute, a
day, a week, a month, or even a year.
It is a discrete probability distribution which is used in the area of
rare events such as
number of car accidents in a day,
arrival of telephone calls over interval of times,
number of misprints in a typed page
natural disasters Like earth quake, etc,
a Poisson experiment can generate observations for the random variable X
representing the number of telephone calls received per hour by an office,
the number of days school is closed due to natural disasters, or
 the number of games postponed due to rain during a football season.
Assumptions
The expected occurrences of events can be estimated from past trials
(records).
The numbers of events occurring in disjoint regions or time intervals
are independent of one another.
Properties of the Poisson Process
1. The number of outcomes occurring in one time interval or specified region of
space is independent of the number that occur in any other disjoint time
interval or region. In this sense we say that the Poisson process has no
memory.
2. The probability that a single outcome will occur during a very short time
interval or in a small region is proportional to the length of the time interval or
the size of the region and does not depend on the number of outcomes
occurring outside this time interval or region.
3. The probability that more than one outcome will occur in such a short time
interval or fall in such a small region is negligible.
119

• Definition: Let X be the number of occurrences in a Poisson
process and λ be the actual average number of occurrences of the
event in a unit length of interval. The probability function for the
Poisson distribution is

P(X = x) = (λ^x e^(−λ)) / x!   for x = 0, 1, 2, …
         = 0                   otherwise

Remarks
The Poisson distribution possesses only one parameter, λ.
If X has a Poisson distribution with parameter λ, then E(X) = λ
and Var(X) = λ,
i.e. E(X) = Var(X) = λ, and ∑_{x=0}^{∞} P(X = x) = 1
120

Example 1: During a laboratory experiment, the average


number of radioactive particles passing through a counter in
1 millisecond is 4.
a) What is the probability that 6 particles enter the counter in
a given millisecond?
b) What is the probability of at least 6 particles?
Solution: Using the Poisson distribution with x = 6 and λ = 4,
we have
a) P(X = 6) = (4⁶ e⁻⁴) / 6! ≈ 0.1042

b) P(X ≥ 6) = 1 − P(X < 6)
            = 1 − [P(X=0) + P(X=1) + P(X=2) + P(X=3)
                  + P(X=4) + P(X=5)]
            ≈ 1 − 0.7851 = 0.2149
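The same Poisson calculations as a minimal scipy.stats sketch:

```python
from scipy.stats import poisson

lam = 4  # average number of particles per millisecond

print(poisson.pmf(6, lam))      # (a) P(X = 6)  = 0.1042...
print(1 - poisson.cdf(5, lam))  # (b) P(X >= 6) = 0.2149...
```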
125

COMMON CONTINUOUS
PROBABILITY DISTRIBUTIONS
126

Revision on the Definition of Continuous


Random Variable and Probability
distribution
Continuous random variable: a variable that can assume any value in
an interval.
Example:
Height of students at a certain college.
Mark of a student.
Probability density function (continuous probability
distribution): a probability distribution whose random variable is
continuous.
If f(x) is a probability density function, then

f(x) ≥ 0  and  ∫_{−∞}^{∞} f(x) dx = 1
127

 The probability of a single value is zero; the probability of an interval is
the area bounded by the curve of the probability density function and the
interval on the x-axis.
 Let a and b be any two values with a < b. The probability that X assumes a
value that lies between a and b is equal to the area under the curve between
a and b.
 Since X must assume some value, it follows that the total area
under the density curve must equal 1.


128

Uniform distribution

Definition: A random variable X is said to be uniformly distributed over
the range (A, B) if its probability density function (pdf) is given by

f(x) = 1 / (B − A)   for A < x < B
     = 0             otherwise

The cumulative distribution function of a uniform random variable on
the interval (A, B) is given by

F(x) = 0                   for x ≤ A
     = (x − A) / (B − A)   for A < x < B
     = 1                   for x ≥ B

Example: Suppose that a large conference room at a certain


company can be reserved for no more than 4 hours. Both long and
short conferences occur quite often. In fact, it can be assumed that
the length X of a conference has a uniform distribution on the
interval [0, 4]
(a) What is the probability density function?
(b) What is the probability that any given conference lasts at least 3
hours?
Solution:
(a) The appropriate density function for the uniformly distributed
random variable X in this situation is
f(x) = 1/4 for 0 ≤ x ≤ 4, and 0 otherwise.
(b) P(X ≥ 3) = ∫₃⁴ (1/4) dx = 1/4
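A sketch of the same computation with scipy.stats; note that scipy parameterizes the uniform distribution by loc = A and scale = B − A:

```python
from scipy.stats import uniform

X = uniform(loc=0, scale=4)  # uniform on [0, 4]

print(X.pdf(2))      # density is 1/4 everywhere inside [0, 4]
print(1 - X.cdf(3))  # P(X >= 3) = 0.25
```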

Normal Distributions
It is the most important distribution for describing a continuous
random variable, and it is used as an approximation of other
distributions.
A random variable X is said to have a normal distribution if its
probability density function is given by

f(x) = (1 / (σ√(2π))) e^(−(1/2)((x − µ)/σ)²)

where x is a real value of X, i.e. −∞ < x < ∞, −∞ < µ < ∞ and σ > 0,
and µ = E(X), σ² = Var(X).
 µ and σ² are the parameters of the normal distribution.
135
Properties of the Normal Distribution:
 It is bell shaped and symmetrical about its mean. The
maximum ordinate is at x = µ.
 The curve approaches the horizontal x-axis as we go in either
direction from the mean.
 The total area under the curve is 1, that is

∫ (from −∞ to ∞) f(x) dx = 1

 The probability that the random variable will have a value
between any two points is equal to the area under the curve
between those points.
 The height of the normal curve attains its maximum at x = µ;
this implies that the mean, median and mode coincide.
Standard Normal Distribution
 It is a normal distribution with mean 0 and variance 1.
 A normal distribution can be converted to the standard normal
distribution as follows: if X has a normal distribution with mean µ and
standard deviation σ, then the standard normal deviate Z is given by

Z = (X − µ) / σ

Properties of the standard normal distribution:
 The same as the normal distribution, but the mean is zero and the
variance is one.
 Areas under the standard normal distribution curve have been tabulated in
various ways. The most common ones are the areas between Z = 0 and a
positive value of Z.
 Given a normally distributed random variable X with mean µ and
standard deviation σ:

P(a < X < b) = P( (a − µ)/σ < Z < (b − µ)/σ )

NOTE:
P(X < a) = P( Z < (a − µ)/σ )
i)   P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤ X ≤ b)
ii)  P(−∞ < X < ∞) = 1
iii) P(a < Z < b) = P(Z < b) − P(Z < a)   for b > a
Examples:
1. Find the area under the standard normal distribution which lies
a) Between Z = 0 and Z = 0.96
   Solution: Area = P(0 < Z < 0.96) = 0.3315
b) Between Z = −1.45 and Z = 0
   Solution: Area = P(−1.45 < Z < 0)
                  = P(0 < Z < 1.45)
                  = 0.4265
c) To the right of Z = −0.35
   Solution: Area = P(Z > −0.35)
                  = P(−0.35 < Z < 0) + P(Z > 0)
                  = P(0 < Z < 0.35) + P(Z > 0)
                  = 0.1368 + 0.50 = 0.6368
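The tabulated areas can also be checked numerically. A minimal Python
sketch (assuming scipy is available; norm.cdf gives the area to the left
of a point):

from scipy.stats import norm

print(norm.cdf(0.96) - norm.cdf(0))    # (a) P(0 < Z < 0.96)  ~ 0.3315
print(norm.cdf(0) - norm.cdf(-1.45))   # (b) P(-1.45 < Z < 0) ~ 0.4265
print(1 - norm.cdf(-0.35))             # (c) P(Z > -0.35)     ~ 0.6368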
Areas under the standard normal curve between Z = 0 and z:

Z    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
Example:
A normal distribution has mean 62.4. Find its standard deviation
if 20.05% of the area under the normal curve lies to the right of
72.9.
Solution:
P(X > 72.9) = 0.2005
⇒ P( (X − µ)/σ > (72.9 − 62.4)/σ ) = 0.2005
⇒ P( Z > 10.5/σ ) = 0.2005
⇒ P( 0 < Z < 10.5/σ ) = 0.50 − 0.2005 = 0.2995
And from the table, P(0 < Z < 0.84) = 0.2995
⇒ 10.5/σ = 0.84
⇒ σ = 12.5
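A short sketch of the same calculation in Python (assuming scipy is
available; norm.ppf is the inverse of the standard normal CDF):

from scipy.stats import norm

z = norm.ppf(1 - 0.2005)    # z with 0.2005 of the area to its right, ~ 0.84
sigma = (72.9 - 62.4) / z   # ~ 12.5
print(sigma)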
 Example: The average grade for an exam is 74, and the standard
deviation is 7. If 12% of the class is given As, and the grades are
curved to follow a normal distribution, what is the lowest possible A
and the highest possible B?
Solution: In this example, we begin with a known area of probability,
find the z value, and then determine x from the formula x = σz + µ.
An area of 0.12 corresponds to the fraction of students receiving As.
We require a z value that leaves 0.12 of the area to the right and,
hence, an area of 0.88 to the left. From the table, P(Z < 1.18) has
the closest value to 0.88, so the desired z value is 1.18. Hence,
x = (7)(1.18) + 74 = 82.26. Therefore, the lowest A is 83
and the highest B is 82.

Exercise
1. An electrical firm manufactures light bulbs that have a life, before
burn-out, that is normally distributed with mean equal to 800
hours and a standard deviation of 40 hours. Find the probability
that a bulb burns between 778 and 834 hours.(ans. 0.5111)
2. A certain type of storage battery lasts, on average, 3.0 years with a
standard deviation of 0.5 year. Assuming that battery life is
normally distributed, find the probability that a given battery will
last less than 2.3 years.(ans. 0.0808)
3. Given a normal distribution with μ=40 and σ= 6, find the value of
x that has
(a) 45% of the area to the left and (ans x= 39.22)
(b) 14% of the area to the right. (ans x= 46.48)
Exponential Distribution
 Definition: A continuous random variable X has an exponential
distribution with parameter β if its probability density function f
is given by

f(x) = (1/β) e^(−x/β)   for x > 0
     = 0                otherwise

 The cumulative distribution function, F, is then given by

F(x) = 1 − e^(−x/β)   for x > 0, and 0 for x ≤ 0

 With this parameterization, the mean and variance are µ = β and σ² = β².
 Equivalently, writing λ = 1/β (the rate), f(x) = λe^(−λx), and the mean
and variance are 1/λ and 1/λ² respectively.
 Theorem: The memoryless property
The exponential distribution satisfies the memoryless
property. That is, if X has an exponential distribution, then
for all s, t > 0,

P(X > s + t | X > t) = P(X > s)

 So, in the context of lifetimes, the probability of surviving a
further time s, having survived time t, is the same as the
original probability of surviving a time s. This is called
the "memoryless" property of the distribution. It is therefore
the continuous analogue of the geometric distribution, which
has the same property.
 Example: Suppose that the number of miles that a car can run
before its battery wears out is exponentially distributed with an
average value of 10,000 miles. If a person desires to take a 5,000-
mile trip, what is the probability that he or she will be able to
complete the trip without having to replace the car battery?
Solution: It follows from the memoryless property of the exponential
distribution that the remaining lifetime (in thousands of miles) of
the battery is exponential with parameter λ = 1/10. Hence the
desired probability is

P(remaining lifetime > 5) = 1 − F(5) = e^(−5λ) = e^(−1/2) ≈ 0.6065
 Example: Suppose that a system contains a certain type of
component whose time to failure, in years, is given by T. The
random variable T is modeled nicely by the exponential distribution
with mean time to failure β = 5. If 5 of these components are
installed in different systems, what is the probability that at least 2
are still functioning at the end of 8 years?
Solution: The probability that a given component is still functioning
after 8 years is given by

P(T > 8) = e^(−8/5) ≈ 0.2

Let X represent the number of components functioning after 8 years.
Then, using the binomial distribution with n = 5 and p = 0.2, we have

P(X ≥ 2) = 1 − P(X ≤ 1) = 1 − [P(X=0) + P(X=1)] ≈ 0.2627
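A minimal Python sketch of this two-stage calculation (assuming scipy
is available):

import math
from scipy.stats import binom

beta = 5                       # mean time to failure, in years
p = math.exp(-8 / beta)        # P(T > 8) for one component, ~ 0.2

# X = number of components (out of 5) still functioning after 8 years
print(1 - binom.cdf(1, 5, p))  # P(X >= 2), ~ 0.26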
More on Expectation and Moments
 Every discrete random variable X has a probability associated with
each of its points. Collectively these are known as a probability mass
function, which can be used to obtain probabilities associated with the
random variable.
 Let X be a discrete random variable; then the probability mass
function is given by

f(x) = P(X = x), for real numbers x.

 A function f is a probability mass function if

f(x) ≥ 0   and   Σ (over all x) f(x) = 1
Expected Value (Mean)
 Let X be a discrete random variable whose possible values are
x1, x2, ..., xn with probabilities P(x1), P(x2), ..., P(xn) respectively.
 Then the expected value of X, E(X), is defined as:

E(X) = x1 P(x1) + x2 P(x2) + ... + xn P(xn)
     = Σ (i = 1 to n) xi P(X = xi)

Properties of expected value
 If C is a constant, then E(C) = C.
 E(CX) = C E(X), where C is a constant.
 E(X + C) = E(X) + C, where C is a constant.
 E(X + Y) = E(X) + E(Y)
Variance
 If X is a discrete random variable with expected value µ (i.e.
E(X) = µ), then the variance of X, denoted by Var(X), is defined
by

Var(X) = E[(X − µ)²]
       = E(X²) − µ²
       = Σ (i = 1 to n) xi² P(xi) − µ²

 Alternatively, Var(X) = Σ (i = 1 to n) (xi − µ)² P(xi)
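These formulas are easy to verify numerically. A sketch with a
hypothetical probability mass function (the values and probabilities
below are made up for illustration):

import numpy as np

x = np.array([0, 1, 2, 3])           # hypothetical values of X
p = np.array([0.1, 0.3, 0.4, 0.2])   # their probabilities (sum to 1)

mean = np.sum(x * p)                      # E(X)
var = np.sum((x - mean) ** 2 * p)         # E[(X - mu)^2]
var_alt = np.sum(x ** 2 * p) - mean ** 2  # E(X^2) - mu^2, same value
print(mean, var, var_alt)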
Properties of Variance
 For any random variable X and constant C, it can be shown that
 Var(CX) = C² Var(X)
 Var(X + C) = Var(X)
 If X and Y are independent random variables, then
Var(X + Y) = Var(X) + Var(Y)
 More generally, if X1, X2, ..., Xk are independent random
variables, then

Var(X1 + X2 + ... + Xk) = Var(X1) + Var(X2) + ... + Var(Xk)
i.e.  Var( Σ (i = 1 to k) Xi ) = Σ (i = 1 to k) Var(Xi)

 If X and Y are not independent, then
 Var(X + Y) = Var(X) + 2Cov(X, Y) + Var(Y)
 Var(X − Y) = Var(X) − 2Cov(X, Y) + Var(Y)
The Chi-Square Distribution
• Suppose that Zi, i = 1, ..., n are iid ~ N(0,1), and X = Σ Zi². Then
• X has a chi-square distribution with n degrees of freedom (df),
that is, X ~ χ²(n)
• If X ~ χ²(n), then E(X) = n and Var(X) = 2n
The t Distribution
• If a random variable, T, has a t distribution with n degrees
of freedom, then it is denoted as T ~ t(n)
• E(T) = 0 (for n > 1) and Var(T) = n/(n − 2) (for n > 2)
• T is a function of Z ~ N(0,1) and X ~ χ²(n) as follows:

T = Z / √(X/n)
The F Distribution
• If a random variable, F, has an F distribution with (k1, k2) df,
then it is denoted as F ~ F(k1, k2)
• F is a function of X1 ~ χ²(k1) and X2 ~ χ²(k2) as follows:

F = (X1/k1) / (X2/k2)
SAMPLING
DISTRIBUTIONS
Sampling Distributions 158

Sampling Distribution of X̅
• The sample mean, X̅ , is a random variable.

• There are a lot of different values of X̅

• Every sample we collect has a different X̅

• There is only one population mean, µ.


Sampling Distributions 159

Sampling Distribution of X̅

• If each of you in the class collected your own data, you
would each get a different X̅.
• Approximately 95% of your X̅'s would be close to µ, within ±2
standard errors (standard deviations of X̅).

• This is because X̅ is a normally distributed random


variable – for large samples

• X̅ follows a normal distribution centered about µ

• This is known as the Central Limit Theorem


Sampling Distributions 160

Central Limit Theorem

• Even if a population distribution is non normal, the


sampling distribution of X̅ may be considered to be
approximately normal for large samples.
• What’s large? At least 30.
Sampling Distributions 161

Central Limit Theorem (cont'd)

This "hypothetical" sampling distribution of the mean, as n
gets large, has the following properties:
• E(X̅) = μ. It has a mean equal to the population mean.
• It has a standard deviation (called the standard error of
the mean, σX̅ = σ/√n) equal to the population standard deviation
divided by √n.
• It is normally distributed.
Sampling Distributions 162

Central Limit Theorem (cont'd)

This means that, for large samples, the sampling
distribution of the mean (X̅) can be approximated by a
normal distribution with mean μ and standard error σ/√n,
i.e., X̅ ~ N(μ, σ²/n).
Sampling Distributions 163

A Very, Very Small Example


• Suppose that a population consists of 5 elements and
one wishes to take a random sample of 2. There are 10
possible samples which might be selected.

• Consider the Population (N=5): 1, 2, 3, 4, 5


Sampling Distributions 164

…Very Small Example…


• Since we know the entire population, we can compute the
population parameters:
• μ = 3.0

• σ = √( Σ(xi − μ)² / N ) = √2 = 1.41

• [Note: N, not n-1. This is the formula for computing the population
standard deviation, σ.]
Sampling Distributions 165

…Very Small Example…


• All possible samples of size n=2:
Sample      Mean (X̅)
(1,2) 1.5
(1,3) 2.0
(1,4) 2.5
(1,5) 3.0
(2,3) 2.5
(2,4) 3.0
(2,5) 3.5
(3,4) 3.5
(3,5) 4.0
(4,5) 4.5
30.0
• The average of all the possible sample means is

E(X̅) = 30.0 / 10 = 3.0

So, E(X̅) = μ. This property is called unbiasedness.
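This small example can be reproduced in a few lines of Python (standard
library only), enumerating all 10 samples of size 2:

from itertools import combinations
from statistics import mean

population = [1, 2, 3, 4, 5]
sample_means = [mean(s) for s in combinations(population, 2)]

print(sample_means)         # the 10 sample means listed above
print(mean(sample_means))   # 3.0 = the population mean (unbiasedness)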
Sampling Distributions 166

The Sampling Distribution of X̅


• By the Central Limit Theorem, follows a normal
distribution (for large n):
Sampling Distributions 167

The Sampling Distribution of X̅


• Since this is a normal distribution, we can standardize it
(transform to Z) just like any other normal distribution:

Z = (X̅ − E(X̅)) / σX̅ = (X̅ − μ) / (σ/√n)

• If n is large, say 30 or more, use s as an estimate of σ.
Sampling Distributions 168

Example- Steel Chains


• Suppose you have steel chains with an average breaking
strength of μ=200 lbs. with a σ=10 lbs., and you take a
sample of n=100 chains.

• What is the probability that the sample mean breaking


strength will be 195 lbs. or less? This is the same as
asking: What proportion of the sample means will be X̅
=195 lbs. or less?
Sampling Distributions 169

Example- Steel Chains


• Solution (Draw a picture!)

Z = (195 − 200) / (10/√100) = −5/1 = −5

• Ans: The probability is close to zero.
Sampling Distributions 170

Example- Bulbs
• A manufacturer claims that its bulbs have a mean life of
15,000 hours and a standard deviation of 1,000 hours. A
quality control expert takes a random sample of 100 bulbs
and finds a mean of X̅ = 14,000 hours. Should the
manufacturer revise the claim?

• NOTE: We don’t know the distribution of the lifetimes of


the bulbs. But we DO know the distribution of the mean
life of a (large enough) sample of bulbs.
Sampling Distributions 171

Example- Bulbs
• Solution (DRAW A PICTURE!)

Z = (14,000 − 15,000) / (1,000/√100) = −1,000/100 = −10

• That's 10 standard errors below the mean! Yes, the claim
should definitely be revised.
Sampling Distributions 172

Example- Hybrid Motors


• In a large automobile manufacturing company, the life of
hybrid motors is normally distributed with a mean of
100,000 miles and a standard deviation of 10,000 miles.

• (a) What is the probability that a randomly selected hybrid


motor has a life between 90,000 miles and 110,000 miles
per year?

• (b) If a random sample of 100 motors is selected, what is


the probability that the sample mean will be below 98,000
miles per year?
Sampling Distributions 173

Example- Hybrid Motors


• SOLUTION (Of course, DRAW A PICTURE!)

• (a) Convert to Z using the formula: Z = (Xi − μ) / σ

Z = (90,000 − 100,000) / 10,000 = −1
Z = (110,000 − 100,000) / 10,000 = +1

• Thus, we have to find how much area lies between −1 and +1
of the Z-distribution.
Answer = .6826
Sampling Distributions 174

Example- Hybrid Motors


• (b) SOLUTION (Of course, DRAW A PICTURE!)

• (b) Here we are looking at the sampling distribution of the mean.
Sample means follow a different distribution; to convert to Z, we use
the following formula:

Z = (X̅ − μ) / (σ/√n) = (98,000 − 100,000) / (10,000/√100) = −2

• Ans: The probability the sample mean will be below 98,000/year is .5 − .4772
= .0228.
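Both parts of this example can be checked with a short Python sketch
(assuming scipy is available); part (b) simply replaces σ by the
standard error σ/√n:

from math import sqrt
from scipy.stats import norm

mu, sigma = 100_000, 10_000

# (a) one motor: P(90,000 < X < 110,000)
print(norm.cdf(110_000, mu, sigma) - norm.cdf(90_000, mu, sigma))  # ~ .6826

# (b) mean of n = 100 motors: standard error = sigma / sqrt(n)
se = sigma / sqrt(100)
print(norm.cdf(98_000, mu, se))   # ~ .0228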
Sampling Distributions 175

Other Sampling Distributions


• We have been looking at the relationship between X̅ and
µ.
• Of course, statisticians will often be interested in
estimating other parameters such as the population
proportion (P), the population standard deviation (σ), the
population median, etc.
• In each case we use a statistic from a sample to estimate
the parameter. Each of these statistics has its own
sampling distribution.
Sampling Distributions 176

Implications
• The relationship between X̅ and µ is the foundation of
statistical inference

• Statistical inference includes estimation (of µ) and testing


hypotheses (about µ)

• Since there are so many X̅'s – as many as there are
possible samples - we use the X̅ value we happened to get
as a tool to make inferences about the one and only true mean, µ.

• Without actually conducting a census we can never know


µ with 100% certainty
Sampling Distributions 177

Importance
• This concept is what the whole rest of the course is about.
DETERMINING SAMPLE SIZE
Sample Size Determination 191

Sample Size Determination: µ

• In estimating µ based on sample statistics, how large a
sample do we need for the level of precision we want?
• To determine the sample size we need, we must know (1) the
desired precision and (2) σ.

e = Precision = Zσ/√n, from the interval X̅ ± Zσ/√n

• e, the half-width of the confidence interval estimator, is the
precision with which we are estimating. e is also called
sampling error.
Sample Size Determination 192

Sample Size Determination: µ
…continued

We use e to solve for n:

If e = Zσ/√n, then √n = Zσ/e,

and so n = Z²σ² / e²
Sample Size Determination 193

Example
Suppose σ=20 based upon previous studies. We would
like to estimate the population mean within ±10 of its true
value, at α=.05 (i.e., 95% confidence). What sample size
should we take?

n = (1.96² × 20²) / 10² = 15.4

so, we need a sample size of at least 15.4, or n=16. We
round UP, not down.
Sample Size Determination 194

Sample Size Determination: P

• Similarly, taking e (precision) from the formula for the half-
width of a confidence interval estimator for P:
• Population proportion case

n = Z² P(1 − P) / e²

• Q: If we are trying to estimate the population proportion, P,
what do we use for P in this formula?
Sample Size Determination 195

Example: Political Poll

Suppose a pollster wants a maximum error of
e = .01 with 95% confidence.

We assume that the variance is the highest possible, so we
use P = .5. This is the way we ensure that the sampling error
will be within ±.01 of the true population proportion.

Then,
n = (1.96² × .5(1 − .5)) / .01² = 9,604

That is a VERY large sample.
Sample Size Determination 196

Example: Political Poll
…continued

Let's try that again with e = .03.

n = (1.96² × .5(1 − .5)) / .03² = 1,067

This is the sample size that most pollsters work with.
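Both sample-size formulas fit in a few lines of Python (standard library
only). Note that ceil() always rounds up, as required; the slide's 1,067
comes from the unrounded value 1,067.1:

from math import ceil

z = 1.96  # 95% confidence

# Mean case: n = z^2 sigma^2 / e^2
sigma, e = 20, 10
print(ceil(z**2 * sigma**2 / e**2))      # 16

# Proportion case (worst case P = .5): n = z^2 P(1-P) / e^2
e_p = 0.03
print(ceil(z**2 * 0.5 * 0.5 / e_p**2))   # 1068 (1,067.1 before rounding up)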
Sample Size Determination 197

Homework
• Practice, practice, practice.
• As always, do lots and lots of problems. You can find these in the
online lecture notes and homework assignments.
INTRODUCTION TO
STATISTICAL INFERENCE
Estimation
Estimation 199

Statistical Inference involves:


• Estimation
• Hypothesis Testing

Both activities use sample statistics (for example, X̅) to make
inferences about a population parameter (μ).
Estimation 200

Estimation
• Why don’t we just use a single number (a point estimate)
like, say, X̅ to estimate a population parameter, μ?

• The problem with using a single point (or value) is that it


will very probably be wrong. In fact, with a continuous
random variable, the probability that the variable is equal
to a particular value is zero. So, P(X̅ =μ) = 0.

• This is why we use an interval estimator.

• We can examine the probability that the interval includes


the population parameter.
Estimation 201

Confidence Interval Estimators


• How wide should the interval be? That depends upon how much
confidence you want in the estimate.

• For instance, say you wanted a confidence interval estimator for the
mean income of a college graduate. You might have:

Confidence         That the mean income is between
100% confidence    $0 and $∞
95% confidence     $35,000 and $41,000
90% confidence     $36,000 and $40,000
80% confidence     $37,500 and $38,500
…                  …
0% confidence      $38,000 (a point estimate)

• The wider the interval, the greater the confidence you will have in it
as containing the true population parameter μ.
Estimation 202

Confidence Interval Estimators


• To construct a confidence interval estimator of μ, we use:

X̅ ± Zα σ/√n   with (1−α) confidence

where we get Zα from the Z table.

• When we don't know σ we should really be using a different table (future
lectures will cover this) but, often, if n is large (say n≥30), we may use s instead
since we assume that it is close to the value of σ.
Estimation 203

Confidence Interval Estimators


• To be more precise, the α is split in half since we are
constructing a two-sided confidence interval. However,
for the sake of simplicity, we call the z-value Zα rather than
Zα/2.

/2 /2

-Z/2 Z/2
Estimation 204

Question
• You work for a company that makes smart TVs, and your
boss asks you to determine with certainty the exact life of
a smart TV. She tells you to take a random sample of 100
TVs.

• What is the exact life of a smart TV made by this


company?
Sample Evidence:
n = 100
X̅ = 11.50 years
s = 2.50 years
Estimation 205

Answer – Take 1
• Since your boss has asked for 100% confidence, the only
answer you can accurately provide is: -∞ to + ∞ years.

• After you are fired, perhaps you can get your job back by
explaining to your boss that statisticians cannot work with
100% confidence if they are working with data from a
sample. If you want 100% confidence, you must take a
census. With a sample, you can never be absolutely certain
as to the value of the population parameter.

• This is exactly what statistical inference is: Using sample


statistics to draw conclusions (e.g., estimates) about
population parameters.
Estimation 206

The Better Answer


n = 100
X̅ = 11.50 years
S = 2.50 years

at 95% confidence:
11.50 ± 1.96*(2.50/√100)
11.50 ± 1.96*(.25)
11.50 ± .49
The 95% CIE is: 11.01 years ---- 11.99 years
[Note: Ideally we should be using σ but since n is large we assume that s is close to the
true population standard deviation.]
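The same interval in a short Python sketch (standard library only):

from math import sqrt

n, xbar, s = 100, 11.50, 2.50
z = 1.96                           # 95% confidence

half_width = z * s / sqrt(n)       # 1.96 * .25 = .49
print(xbar - half_width, xbar + half_width)   # 11.01 to 11.99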
Estimation 207

The Better Answer - Interpretation


• We are 95% confident that the interval from 11.01 years to
11.99 years contains the true population parameter, μ.
• Another way to put this is, in 95 out of 100 samples, the
population mean would lie in intervals constructed by the
same procedure (same n and same α).
• Remember – the population parameter (μ ) is fixed, it is
not a random variable. Thus, it is incorrect to say that
there is a 95% chance that the population mean will “fall”
in this interval.
Estimation 208

EXAMPLE: Life of a Refrigerator


The sample:
n = 100
X̅ = 18 years
s = 4 years

• Construct a confidence interval estimator (CIE) of the true


population mean life (µ), at each of the following levels of
confidence:
• (a)100% (b) 99% (c) 95% (d) 90% (e) 68%
Estimation 209

EXAMPLE: Life of a Refrigerator


 Again, in this example, we should ideally be using σ but
since n is large we assume that s is close to the true
population standard deviation.

• It should be noted that s² is an unbiased estimator of σ²:

E(s²) = σ², where s² = Σ(xi − X̅)² / (n − 1)
Estimation 210

EXAMPLE: Life of a Refrigerator


(a) 100% Confidence
[α = 0, Zα = ∞]
100% CIE: −∞ years ↔ +∞ years
(b) 99% Confidence
α = .01, Zα = 2.575 (from Z table)
18 ± 2.575 (4/√100)
18 ± 1.03
99% CIE: 16.97 years ↔ 19.03 years
(c) 95% Confidence
α = .05, Zα = 1.96 (from Z table)
18 ± 1.96 (4/√100)
18 ± 0.78
95% CIE: 17.22 years ↔ 18.78 years
Estimation 211

EXAMPLE: Life of a Refrigerator


(d) 90% Confidence
α = .10, Zα = 1.645 (from Z table)
18 ± 1.645 (4/√100)
18 ± 0.66
90% CIE: 17.34 years ↔ 18.66 years

(e) 68% Confidence


α = .32, Zα =1.0 (from Z table)
18 ± 1.0 (4/√100)
18 ± 0.4
68% CIE: 17.60 years ↔ 18.40 years
Estimation 212

Balancing Confidence and Width in a CIE

• How can we keep the same level of confidence and still


construct a narrower CIE?
• Let’s look at the formula one more time: X̅ ± Zασ/√n
• The sample mean is in the center. The more confidence
you want, the higher the value of Z, the larger the half-width
of the interval.
• The larger the sample size, the smaller the half-width, since
we divide by √n.
• So, what can we do? If you want a narrower interval, take a
larger sample.
• What about a smaller standard deviation? Of course, this depends on the
variability of the population. However, a more efficient sampling procedure
(e.g., stratification) may help. That topic is for a more advanced statistics
course.
Estimation 213

Key Points
• Once you are working with a sample, not the entire
population, you cannot be 100% certain of population
parameters. If you need to know the value of a parameter
with certainty, take a census.
• The more confidence you want to have in the estimator,
the larger the interval is going to be.
• Traditionally, statisticians work with 95% confidence.
However, you should be able to use the Z-table to
construct a CIE at any level of confidence.
Estimation 214

More Homework for you.

Do the rest of the problems in the lecture notes.


STATISTICAL
INFERENCE
Hypothesis Testing

Each slide has its own narration in an audio file.


For the explanation of any slide click on the audio icon to start it.

Professor Friedman's Statistics Course by H & L Friedman is licensed under a


Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Hypothesis Testing 216

Hypothesis Testing
• Testing a Claim: Companies often make claims about
products. For example, a frozen yogurt company may claim
that its product has no more than 90 calories per cup. This
claim is about a parameter – i.e., the population mean number
of calories per cup (μ).

• The claim is tested by taking a sample - say, 100 cups - and


determining the sample mean. If the sample mean is 90
calories or less we have no evidence that the company has
lied. Even if the sample mean is greater than 90 calories, it is
possible the company is still telling the truth (sampling error).
However, at some point – perhaps, say, a sample average of
500 calories per cup – it will be clear that the company has not
been completely truthful about its product.
Hypothesis Testing 217

Hypothesis Testing
• A hypothesis is made about the value of a parameter, but the only
facts available to estimate the true parameter are those provided
by the sample. If the statistic differs (and of course it will) from
the hypothesis stated about the parameter, a decision must be
made as to whether or not this difference is significant. If it is, the
hypothesis is rejected. If not, it cannot be rejected.

• H0: The null hypothesis. This contains the hypothesized


parameter value which will be compared with the sample value.

• H1: The alternative hypothesis. This will be “accepted” only if H0


is rejected.
Technically speaking, we never accept H0. What we actually say is that we do not
have the evidence to reject it.
Hypothesis Testing 218

Two Types of Errors: Alpha and Beta


• Two types of errors may occur: α (alpha) and β (beta).
The α error is often referred to as a Type I error and β
error as a Type II error.
• You are guilty of an alpha error if you reject H0 when it really is true.

• You commit a beta error if you "accept" H0 when it is false.


Hypothesis Testing 219

Two Types of Errors: Alpha and Beta


• This alpha error is related to the (1- α) we just learned
about when constructing confidence intervals. We will
soon see that an  error of .05 in testing a hypothesis
(two-tail test) is equivalent to a confidence of 95% in
constructing a two-sided interval estimator.

/2 /2

-Z/2 Z/2
Hypothesis Testing 220

Two Types of Errors: Alpha and Beta


TRADEOFF!
• There is a tradeoff between the alpha and beta errors. We cannot
simply reduce both types of error. As one goes down, the other rises.

• As we lower the  error, the β error goes up: reducing the error of
rejecting H0 (the error of rejection) increases the error of “Accepting”
H0 when it is false (the error of acceptance).

• This is similar (in fact exactly the same) to the problem we had earlier
with confidence intervals. Ideally, we would love a very narrow
interval, with a lot of confidence. But, practically, we can never have
both: there is a tradeoff.
Hypothesis Testing 221

Tradeoff in Type I / Type II Errors: Examples


• Our legal system understands this tradeoff very well.

• If we make it extremely difficult to convict criminals because we


do not want to incarcerate any innocent people we will probably
have a legal system in which no one gets convicted.

• On the other hand, if we make it very easy to convict, then we


will have a legal system in which many innocent people end up
behind bars.

• This is why our legal system does not require a guilty verdict to
be “beyond a shadow of a doubt” (i.e., complete certainty) but
“beyond reasonable doubt.”
Hypothesis Testing 222

Tradeoff in Type I / Type II Errors: Examples


• Quality Control.
• A company purchases chips for its smart phones, in batches of
50,000. The company is willing to live with a few defects per
50,000 chips. How many defects?
• If the firm randomly samples 100 chips from each batch of 50,000
and rejects the entire shipment if there are ANY defects, it may end
up rejecting too many shipments (error of rejection). If the firm is
too liberal in what it accepts and assumes everything is “sampling
error,” it is likely to make the error of acceptance.

• This is why government and industry generally work with an alpha


error of .05
Hypothesis Testing 223

Steps in Hypothesis Testing


1. Formulate H0 and H1. H0 is the null hypothesis, a hypothesis about the value of a
parameter, and H1 is an alternative hypothesis.
• e.g., H0: µ=12.7 years; H1: µ≠12.7 years

2. Specify the level of significance (α) to be used. This level of significance tells you
the probability of rejecting H0 when it is, in fact, true. (Normally, significance levels of
0.05 or 0.01 are used.)

3. Select the test statistic: e.g., Z, t, F, etc. So far, we have been using the Z
distribution. We will be learning about the t-distribution (used for small samples)
later on.

4. Establish the critical value or values of the test statistic needed to reject H 0. DRAW
A PICTURE!

5. Determine the actual value (computed value) of the test statistic.

6. Make a decision: Reject H0 or Do Not Reject H0.


Hypothesis Testing 224

One-Tail and Two-Tail Hypothesis Tests


• When we Formulate H0 and H1, we have to decide
whether to use a one-tail or two-tail test.

• With a "two-tail" hypothesis test, α is split into two and put
in both tails. The hypotheses are then H0: μ = # and H1: μ ≠ #.
This is why the region of rejection is divided into two
tails. Note that the region of rejection always corresponds
to H1.

• With a “one-tail” hypothesis test, the α is entirely in one of


the tails.
Hypothesis Testing 225

One-Tail and Two-Tail Hypothesis Tests


• For example, if the company claims that a certain product has exactly
1 mg of aspirin, that would result in a two-tail test. Note words like
“exactly” suggest two tail tests. There are problems with too much
aspirin and too little aspirin in a drug.

• On the other hand, if a firm claims that a box of its raisin bran cereal
contains at least 100 raisins, a one-tail test has to be used. If the
sample mean is more than 100, everything is ok. The problems arise
only if the sample mean is less than 100. The question will be
whether we are looking at sampling error or perhaps the company is
lying and the true (population) mean is less than 100 raisins.
Hypothesis Testing 226

Two-Tail Tests
• A company claims that its soda vending machines deliver exactly 8 ounces of soda.
Clearly, you do not want the vending machines to deliver too much or too little soda.
How would you formulate this?

Answer:
H0: µ = 8 ounces
H1: µ ≠ 8 ounces
If you are testing at α=.01, the .01 is split into two: .005 in the left tail and .005 in the
right tail. The critical values are ±2.575.

.005 .005

-2.575 2.575
Hypothesis Testing 227

Two-Tail Tests
• A company claims that its bolts have a circumference of exactly 12.50
inches. (If the bolts are too wide or narrow, they will not fit properly):
Answer:
H0: µ = 12.50 inches
H1: µ ≠ 12.50 inches

• A company claims that a slice of its bread has exactly 2 grams of


fiber. Formulate this:
Answer:
H0: µ = 2 grams
H1: µ ≠ 2 grams
Hypothesis Testing 228

One-Tail Tests
• A company claims that its batteries have an average life of at least 500 hours. How
would you formulate this?

Answer:
H0: µ ≧ 500 hours
H1: µ < 500 hours

If you are testing at an α = .05, The entire .05 is in the left tail (hint: H1 points to where
the rejection region should be.) The critical value is -1.645.
Hypothesis Testing 229

One-Tail Tests
A company claims that its overpriced, bottled spring water has no more than 1 mcg of
benzene (poison). How would you formulate this:

Answer:
H0: µ ≦ 1 mcg. benzene
H1: µ > 1 mcg. benzene

If you are testing at an α = .05, The entire .05 is in the right tail (hint: H1 points to where
the rejection region should be.) The critical value is +1.645.

.05

1.645
Hypothesis Testing 230
Example: Two-Tail Test
A pharmaceutical company claims that each of its pills contains exactly 20.00 milligrams
of Cumidin (a blood thinner). You sample 64 pills and find that the sample mean X̅
=20.50 mg and s = .80 mg. Should the company’s claim be rejected? Test at α = 0.05.
• Formulate the hypotheses
H0: µ =20.00 mg
H1: µ  20.00 mg
• Choose the test statistic and find the critical values; draw region of rejection
Test statistic: Z
At α = 0.05, the critical values are ±1.96.

• Use the data to get the calculated value of the test statistic

Z = (20.50 − 20.00) / (.80/√64) = .50/.10 = 5   [.80/√64 = .10, the standard error of the mean]

• Come to a Conclusion: Reject H0 or Do Not Reject H0
The computed Z value of 5 is deep in the region of rejection.
Thus, Reject H0 at p < .05
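The whole test fits in a few lines of Python (standard library only):

from math import sqrt

n, xbar, s, mu0 = 64, 20.50, 0.80, 20.00

se = s / sqrt(n)           # .10, the standard error of the mean
z = (xbar - mu0) / se      # 5.0
print(z, abs(z) > 1.96)    # True -> reject H0 at alpha = .05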
Hypothesis Testing 231
Example: Two-Tail Test
• Suppose we took the above data, ignored the hypothesis, and constructed a
95% confidence interval estimator.

20.50 ± 1.96(.10)
95% CIE: 20.304 mg ↔ 20.696 mg

• We note that 20.00 mg is not in this interval.


• As you can see, hypothesis testing and CIE are virtually the same exercise;
they are merely two sides of the same coin. Both rely on the sample
evidence.
• If a claim is made about a parameter, do a hypothesis test. If no claim is
made and a company wants to use sample evidence to estimate a parameter
(perhaps to determine what claims may be made in the future about a
parameter), construct a confidence interval estimator.
Hypothesis Testing 232

Example: One-Tail Test


• A company claims that its LED bulbs will last at least 8,000 hours. You
sample 100 bulbs and find that X̅ =7,800 hours and s=800 hours.
Should the company’s claim be rejected? Test at α = 0.05.

• H0: µ ≧ 8,000 hours
• H1: µ < 8,000 hours
(One-tail test: the entire α = 5% is in the left tail; critical value −1.645.)

• Z = (7,800 − 8,000) / (800/√100) = −200/80 = −2.50

• [800/√100 = 80, the standard error of the mean]

• The computed Z value of -2.50 is in the region of rejection. Thus, reject


H0 at p < .05

• Note: When testing a hypothesis, we often have to perform a one-tail test if the claim requires it.
However, we will always use only two-sided confidence interval estimators when using sample
statistics to estimate population parameters.
Hypothesis Testing 233

Homework
• Practice, practice, practice.
• Do lots and lots of problems. You can find these in the online
lecture notes.
STUDENT’S T-
DISTRIBUTION

Each slide has its own narration in an audio file.


For the explanation of any slide click on the audio icon to start it.

Professor Friedman's Statistics Course by H & L Friedman is licensed under a


Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
The t Distribution 235

Student’s t-Distribution
• In the previous lectures on statistical inference (estimation, hypothesis
testing) about the mean, we used the Z statistic even when we did not
know σ. In that case, we used s as a point estimate of σ for “large
enough” n.

• What do we do if (a) we don’t know σ and (b) n is small? If the population


of interest is normally distributed, we can use the Student’s t-distribution in
place of the standard normal distribution, Z.

• William S. Gossett, who developed the t-distribution, wrote under the
name "Student" since, as an employee of the Guinness Brewery in Dublin,
he was required by the firm to use a pseudonym in publishing his results.

• In a nutshell, we use the t-Distribution when we have small samples. As


you recall, the central limit theorem tells us that the sample means follow a
normal distribution when n is large (some say at least 30; others say 50).
The t Distribution 236

When to use the t-Distribution


Large sample (n) Small sample (n)
σ known Z test Z test
σ unknown Z test t test – but only if population is ND**

• We will use Z for EITHER (1) known σ OR (2) large n

• Use t for (1) Small sample AND (2) Taken from N.D. population** AND (3) Unknown σ

** What can we do if the population is not normally distributed and we have a
small sample? Always use non-parametric statistical methods in this case.
(We will not study non-parametric statistics in this course.)
The t Distribution 237

The t-Distribution
• The t-distribution looks like the normal distribution, except that it has
more spread.
• It is still symmetrical about the mean; mean=median=mode.
• It also goes from -∞ to +∞.

• The Student’s t distribution is not a single distribution (like the


standardized normal (Z) distribution.
• It is a series of distributions, one for each number of degrees of freedom.
• The t distribution with 10 degrees of freedom has a slightly different shape
than a t distribution with 20 degrees of freedom.
• As n gets larger, student’s t distribution approaches the normal distribution.

• Degrees of freedom (df) = n -1. We will shortly see why we lose the degree of
freedom (hint: remember we divided by (n-1) when computing the sample
standard deviation).
The t Distribution 238

Using the t statistic

• for testing hypotheses (n−1 degrees of freedom):

t(n−1) = (X̅ − μ0) / sX̅ = (X̅ − μ0) / (s/√n)

• µH0 or μ0 denotes the claimed population mean (for example, a
company might be making a claim about this parameter).

• (1−α)% Confidence Interval Estimator:

X̅ ± t (s/√n)
The t Distribution 239

Losing a degree of freedom


• The number of degrees of freedom is equal to the sample size minus 1.

• We “lose” a degree of freedom each time a statistic computed from the


sample is used as a point estimator in place of the parameter.

• In this case, we do not know the population σ, so instead of using σ


(which comes from a census, not a sample) we use s, the sample
standard deviation.
• As you already know, in the formula to compute s, we divide by (n-1), a loss of a
degree of freedom. If we did not divide by (n-1) in computing the standard deviation,
it would be biased when used as a point estimator of the population standard
deviation, σ

• In a nutshell, we have n-1 degrees of freedom for reasons that have to do


with mathematically ensuring that our formulas all work properly without
any bias.
The t Distribution 240

Example 1
• A consulting firm claims that its consultants earn on
average exactly $260 an hour. You decide to test the
claim using a sample of 16 consultants. You find that X̅ =
$200 and s = $96. Test the claim at α = .05 level. Assume
that the population follows a normal distribution.

• We follow the same steps in hypothesis testing as before


(see the virtual handout in question).

• Step 1: Formulate the null and alternate hypotheses


H0: μ = $260
H1: μ ≠ $260
The t Distribution 241

Example 1 (cont’d)
• Step 2: =.05
• Step 3: Choose the test statistic
• We are going to use the t-distribution since n is small (only 16) and we do not know σ
t 15

2 .5 % 2 .5 %

-2.1315 2.1315

• Step 4: Establish the critical value or values of the test statistic needed to
reject H0.
• This is a t15 so we cannot use the critical values of ±1.96 which we used for a Z test.
If you go to the t-table (next slide) and examine the column that has .025 on the top
and the row for 15 degrees of freedom, you will find that the critical values are
±2.1315
• Please do not forget that you have (n-1) degrees of freedom (d.f.); n = 16, but d.f. =
15.
The t Distribution 242

Example 1 (cont'd): Upper Tail Areas of the t Distribution

Degrees of
Freedom     .25      .10      .05      .025      .01      .005
 1       1.0000   3.0777   6.3138  12.7062  31.8207  63.6574
 2       0.8165   1.8856   2.9200   4.3027   6.9646   9.9248
 3       0.7649   1.6377   2.3534   3.1824   4.5407   5.8409
 4       0.7407   1.5332   2.1318   2.7764   3.7469   4.6041
 5       0.7267   1.4759   2.0150   2.5706   3.3649   4.0322
 6       0.7176   1.4398   1.9432   2.4469   3.1427   3.7074
 7       0.7111   1.4149   1.8946   2.3646   2.9980   3.4995
 8       0.7064   1.3968   1.8595   2.3060   2.8965   3.3554
 9       0.7027   1.3830   1.8331   2.2622   2.8214   3.2498
10       0.6998   1.3722   1.8125   2.2281   2.7638   3.1693
11       0.6974   1.3634   1.7959   2.2010   2.7181   3.1058
12       0.6955   1.3562   1.7823   2.1788   2.6810   3.0545
13       0.6938   1.3502   1.7709   2.1604   2.6503   3.0123
14       0.6924   1.3450   1.7613   2.1448   2.6245   2.9768
15       0.6912   1.3406   1.7531   2.1315   2.6025   2.9467
16       0.6901   1.3368   1.7459   2.1199   2.5835   2.9208
17       0.6892   1.3334   1.7396   2.1098   2.5669   2.8982
18       0.6884   1.3304   1.7341   2.1009   2.5524   2.8784
19       0.6876   1.3277   1.7291   2.0930   2.5395   2.8609
20       0.6870   1.3253   1.7247   2.0860   2.5280   2.8453
…           …        …        …        …        …        …
∞                          1.645    1.96     2.33     2.575
The t Distribution 243

Example 1 (cont'd)
• Step 5: Determine the actual value (computed value) of the test
statistic.

t15 = (200 − 260) / (96/√16) = −60/24 = −2.50

• Step 6: Make the decision (Reject H0 or Do Not Reject H0).
• Reject H0, since −2.50 < −2.1315.
The t Distribution 244

Example 1 (cont'd)
• Now, how do we construct a confidence interval estimator
(CIE), using t?
• Suppose there had not been a claim about the population
mean and you simply wanted to construct a 95% CIE for μ.
Using the sample evidence, what should you do?

200 ± 2.1315 (96/√16)
$200 ± $51.16
$148.84 ↔ $251.16

 Interpretation: We have 95% confidence that this interval
($148.84 ↔ $251.16) really does contain the true population
mean, µ.
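A Python sketch of both the test and the interval (assuming scipy is
available; t.ppf returns the critical value, here ~2.1315 for df = 15):

from math import sqrt
from scipy.stats import t

n, xbar, s, mu0 = 16, 200, 96, 260

se = s / sqrt(n)                   # 24
t_stat = (xbar - mu0) / se         # -2.50
t_crit = t.ppf(0.975, df=n - 1)    # ~ 2.1315

print(t_stat, abs(t_stat) > t_crit)            # True -> reject H0
print(xbar - t_crit * se, xbar + t_crit * se)  # ~ 148.84 to 251.16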
The t Distribution 245

Example 2
• A company claims that its soup vending machines deliver
(on average) exactly 4.00 ounces of soup. The company
statistician finds:
n=25
X̅ =3.97 ounces
s=.04 ounces

• (a) Test the claim at α = .02.


• (b) Supposing no claim was made, construct a two-sided
98% confidence interval estimator using the sample
evidence.
The t Distribution 246
Example 2 (cont'd)
• (a) Test the claim (α=.02), using t24:

H0: µ = 4.0 oz.
H1: µ ≠ 4.0 oz.

The critical values are ±2.4922 (.01 in each tail).

t24 = (3.97 − 4) / (.04/√25) = −.03/.008 = −3.75

Reject H0, since −3.75 < −2.4922.
The t Distribution 247

Example 2 (cont’d)
•(b) 98% CIE of µ
3.97 ± 2.4922(.008)
3.97 ± .02
3.95 oz ←———→ 3.99 oz

• Note that 4.00 ounces is not in


the 98% CIE. How is this
consistent with (a)?
The t Distribution 248

Example 3
• A school claims that the average reading score of its
students is at least 70.
• A sample of 16 students is randomly selected (n=16) to test this
claim.
• X̅ = 68 and s = 9

• (a) Test claim at α = .05.

• (b) Supposing no claim was made. Use sample evidence


to construct a two-sided 95% confidence interval estimate
(CIE) of μ.
The t Distribution 249

(a) H 0 : µ ≥ 70

Example 3 (cont’d) H 1 : µ < 70

t 15

• Test the claim at =.05


H0: µ≥70 5%

H1: µ<70
-1.7531

68 − 70 -2
68  70 -2 t = = = -0 .8 8 DO NOT REJECT H 0
t15   -0.88 15
9 / 16 2 .2 5
9 / 16 2.25

Do Not Reject H0
The t Distribution 250

Example 3 (cont’d)
• (b) Two-sided 95% CIE.
• We will always construct two-sided CIEs in this course.

68  2.1315(2.25)
68  4.8

63.2 ——— 72.8


The t Distribution 251

Example 4
• The Vandelay Water Company claims that there is at most
1 ppm (part per million) of benzene in their terribly
expensive, horrid-tasting bottled water.
(a) Test the claim at α=.05
(b) Supposing no claim was made. Construct a two-tailed 95%
C.I.E of μ

• Sample Data:
n=25 randomly selected bottles of water
X̅ = 1.16 ppm and s = .20 ppm
The t Distribution 252

Example 4 (cont’d)
 Test the claim at α=.05
H0: µ≤1.0 ppm
H1: µ>1.0 ppm

1.16  1.00 .16


t 24   4
.20 / 25 .04

Reject H0
The t Distribution 253

Example 4 (cont'd)

(b) 2-sided 95% CIE of μ, using t24 with .025 in each tail (±2.0639):

1.16 ± 2.0639(0.04)
1.16 ± .083

1.077 ppm —— 1.243 ppm
The t Distribution 254

Do Your Homework
• Practice, practice, practice.
• Do lots and lots of problems. You can find these in the online
lecture notes.
TWO-SAMPLE T-TEST
Inferences About Means from Two Independent Groups
Two-Sample t Test 256

The Two-Sample t-Statistic


• Similar to one-sample hypothesis testing, we use the Z
statistic if we can.
• Use Z if σ1 and σ 2 are known OR if samples are large
enough
• What is large enough? Both samples together must be at least 32.
Some say 60, or more.
• Therefore we must use t when
• σ1 and σ 2 are unknown AND
• the samples are small AND
• the populations are normally distributed
Two-Sample t Test 257

The Two-Sample t-Statistic

To calculate the two-sample t statistic:

t(n1+n2−2) = (X̅1 − X̅2) / √( s²pooled (1/n1 + 1/n2) )

• We have to calculate s²pooled first.
• This pooled variance is the variance you would get if you combined
the two groups into one and calculated the variance.
• It is a weighted average of s1² and s2², the variances of the two
groups, weighted by degrees of freedom.
Two-Sample t Test 258

The Two-Sample t-Statistic

• The pooled variance is the variance you would get if you
combined the two groups into one and calculated the
variance:

s²pooled = [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2)

• It is a weighted average of s1² and s2², the variances of the
two groups, weighted by degrees of freedom.
Two-Sample t Test 259

About Homoskedasticity
• Technically, to use this formula one must know (or be able
to prove statistically) that the two variances are “equal” –
this property is called homoskedasticity.
• Incidentally, this is sometimes spelled with a c, “homoscedasticity.”

• An F-test may be performed to test whether σ1 and σ2 and


are statistically equivalent, i.e., to test for
homoskedasticity.
• We will learn about this F-test in other courses.

• If we do not have homoskedasticity, i.e. the variances are


not proved to be statistically equivalent, then we have to
make adjustment to the formula.
Two-Sample t Test 260

Problem 1: Reading Scores


Suppose we want to compare the reading scores of men
and women on a standardized reading test. We take a
random sample of 31 people and obtain the results below.
Note that the women outperform the men by 4 points. Of
course, this might simply be sampling error. We would like
to test whether or not this difference is significant at the
=.05 level.
           Men        Women
Mean       X̅1 = 80    X̅2 = 84
Std. dev.  s1 = 16    s2 = 20
Size       n1 = 16    n2 = 15
Two-Sample t Test 261

Problem 1: Reading Scores (cont’d)


• H0: μ1= μ2
H1: μ1≠ μ2

• Note that
• σ1,σ2 are not given
• n1+ n2 = 31

• We will use a t statistic with n1+ n2 -2 = 29 degrees of


freedom
Two-Sample t Test 262

Problem 1: Reading Scores (cont'd)

• To calculate the t29 test statistic, first get the pooled variance:

s²pooled = [15(256) + 14(400)] / 29 = 9440/29 = 325.5

• Note that, since it is a weighted average, s²pooled is between s1² (=256)
and s2² (=400).

t29 = (80 − 84) / √( 325.5(1/16 + 1/15) ) = −4/6.48 = −0.62

• Conclusion: Do not reject H0. There is no statistically significant
difference between men and women on test scores.
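A sketch of the pooled two-sample calculation in Python (assuming scipy
is available for the critical value):

from math import sqrt
from scipy.stats import t

n1, x1, s1 = 16, 80, 16    # men
n2, x2, s2 = 15, 84, 20    # women

df = n1 + n2 - 2
s2_pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # ~ 325.5
t_stat = (x1 - x2) / sqrt(s2_pooled * (1/n1 + 1/n2))     # ~ -0.62

t_crit = t.ppf(0.975, df)             # ~ 2.045 for df = 29
print(t_stat, abs(t_stat) > t_crit)   # False -> do not reject H0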
Two-Sample t Test 263

Problem 2: Company Pay


We would like to determine whether the difference in daily
pay between two companies is statistically significant. α
= .01

Company 1 Company 2
Sample average X̅ 1= $210 X̅ 2 = $175
Standard deviation s1 = $25 s2 = $20
Sample size n1 = 10 n2 = 20
Two-Sample t Test 264
Problem 2: Company Pay (cont'd)

H0: μ1 = μ2
H1: μ1 ≠ μ2

Test statistic: t28 (df = 10 + 20 − 2). At α = .01 (.005 in each tail),
the critical values are ±2.7633.

s²pooled = [9(625) + 19(400)] / 28 = 13,225/28 = 472.3

t28 = (210 − 175) / √( 472.3(1/10 + 1/20) ) = 35/8.42 = 4.16

• Conclusion: Reject H0.
• Since the calculated value of 4.16 is in the rejection region (right
tail), we conclude that the pay rates at the two companies are
indeed different. Company 1 pays more than Company 2.
Two-Sample t Test 266

Problem 3: Strength of Concrete Beams


• Two types of precast concrete beams are being
considered for sale. The only difference between the two
beams is in the type of material used. Strength is
measured in terms of pounds per square inch (psi) of
pressure. Is there a significant difference in beams
supplied by Supplier A and Supplier B? Or, is the
difference of 25 psi explainable as sampling error? Test
at α = 0.05
Supplier A Supplier B
Sample average X̅ 1= 5000 psi X̅ 2 = 4975 psi
Standard deviation s1 = 50 psi s2 = 60 psi
Sample size n1 = 12 batches n2 = 10 batches
Two-Sample t Test 267

Problem 3: Concrete Beams (cont'd)

H0: μ1 = μ2;  H1: μ1 ≠ μ2.  With df = 12 + 10 − 2 = 20, the critical
values at α = .05 are ±2.086.

s²pooled = [11(2500) + 9(3600)] / 20 = 59,900/20 = 2995

t20 = (5000 − 4975) / √( 2995(1/12 + 1/10) ) = 25/23.4 ≈ 1.07

Since 1.07 < 2.086, do not reject H0; there is no statistically
significant difference between the beams made by Supplier A and
Supplier B.
Two-Sample t Test 269

Spending on Wine
• A marketer wants to determine whether men and women
spend different amounts on wine. (It is well known that
men spend considerably more on beer.)
• The researcher randomly samples 34 people (17 women
and 17 men) and finds that the average amount spent on
wine (in a year) by women is $437.47. The average
amount spent by men is $552.94.
• Given the Excel printout (next slide), is the difference
statistically significant?
Two-Sample t Test 270

Spending on Wine (cont’d)


Output from MS Excel:

Note: Variable 1 represents women; Variable 2, men. [The Excel output
itself is not reproduced here; the key numbers from it are quoted below.]


Two-Sample t Test 271
A comparison of men and women and how much they spend on wine per year.

Spending on Wine (cont'd)

• Digression – what was the input? This is the data input into MS
Excel as two columns of numbers showing how much money 17
women and 17 men spent on wine over the year:

Women      Men
$100.00    $107.00
$250.00    $240.00
$890.00    $880.00
$765.00    $770.00
$456.00    $409.00
$356.00    $500.00
$876.00    $800.00
$740.00    $900.00
$231.00    $1,000.00
$222.00    $489.00
$555.00    $800.00
$666.00    $890.00
$876.00    $770.00
$10.00     $509.00
$290.00    $100.00
$98.00     $102.00
$56.00     $134.00
Two-Sample t Test 272

Spending on Wine (cont’d)

Given the printout, is the difference statistically significant?


Answer: If a two-tail test was done, the probability of getting the sample evidence (or
sample evidence showing an even larger difference), given that there is no difference in the
population mean amounts spent on wine by men and women, is .30 (rounded
from .297399288). In other words, if men and women (in the population) spend the
same on wine, there is a 30% chance of getting the sample evidence (or sample evidence
indicating a larger difference between men and women) we obtained. Statisticians usually
test at an alpha of .05, so we do not have evidence to reject the null hypothesis.
Conclusion: There is no statistically significant difference between men and women in how
much they spend on wine.
Two-Sample t Test 273

Spending on Wine (cont’d)


In the printout, the calculated t-statistic is
-1.059287941. Why is it negative?
Answer: The amount spent on wine by women is less than that
spent by men (although the difference is not statistically significant).
If you make men the first variable the t-value will be positive but the
results will be exactly the same (the t-distribution is symmetric).

What would the calculated t-value have to be for us to


reject it?
Answer: If a two-tail test is being done, the critical value of t is
2.036931619. To reject the null hypothesis, we would need a
calculated t-value of more than 2.036931619 or less than -
2.036931619.
Two-Sample t Test 274

Job Satisfaction
• Comparing men and women on job satisfaction
• 10 is the highest job satisfaction score; 0 the lowest
MEN (two columns)    WOMEN (two columns)
7  1                 1   4
8  7                 10  3
6  2                 3   5
5  4                 4   6
6  6                 1   4
5  7                 1   2
6  8                 2   5
9  9                 3   1
8  7                 5   4
Two-Sample t Test 275

Job Satisfaction (cont’d)


The output:

Variable 1 represents Men (n1 = 18)


Variable 2 represents Women (n2 = 18)
Two-Sample t Test 276

Job Satisfaction (cont’d)


• Question: It appears that men have more job satisfaction than
women at this firm. The company claims that a sample of 36 is
quite small given the fact that 5,000 people work at the company
and they are asserting that the difference is entirely due to
sampling error. Given the Excel printout above, what do you
think?

Answer: If a two-tail test was done, the probability of getting the
sample evidence, given that there is no difference in the
population mean job satisfaction scores for men and women,
is .0012. In other words, if men and women feel the same
about working in this firm, there is only a 12 out of 10,000
chance of getting the sample evidence we obtained (or worse -
one showing an even larger difference). Statisticians usually test
at an alpha of .05, so we are going to reject the null hypothesis.
Two-Sample t Test 277

Job Satisfaction (cont’d)


• The average job satisfaction score for men is 6.17 (rounded)
and 3.56 for women. This is a difference of about 2.61 in
satisfaction on the 0 to 10 scale.

• If men and women in the firm actually have the same job
satisfaction (i.e., the difference between the population means is
actually 0), the likelihood of getting a difference between two
sample means of men and women of 2.61 or greater is .0012.
This is why we will reject H0, the hypothesis that the two population
means are the same.

• Conclusion: Reject H0. There is a statistically significant
difference between the average job satisfaction scores of men
and women at this firm.
Two-Sample t Test 278

Homework
• Practice, practice, practice.
• Do lots and lots of problems. You can find these in the online
lecture notes and homework assignments. Solve using both the
formulas (with your calculator) and with MS Excel.
UNDERSTANDING
HYPOTHESIS TESTING
Going Deeper…

Each slide has its own narration in an audio file.


For the explanation of any slide click on the audio icon to start it.

Professor Friedman's Statistics Course by H & L Friedman is licensed under a


Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Understanding Hypothesis Testing 280

Understanding Hypothesis Testing


• In previous lectures, we were content to learn how to perform a
statistical test of hypothesis, following prescribed steps. This is more-
or-less a cookbook type of approach and will certainly get the job
done.

• In this lecture what we really want to do is understand the process of


hypothesis testing and the conclusion(s) we might reach thereby.

 First we will take a careful look at how we set up the null and
alternate hypotheses, and the difference between tests that
approach a claim to refute it and those that substantiate the claim.
Understanding Hypothesis Testing 281

Setting up H0 and H1

•Question: Is it easier to prove that something is false or that something


is true?
 Answer: It is much easier to prove something false than true.

o Suppose someone states that all geese are white. To refute this
statement, you only need to produce one black goose. To prove it is
true, you will need to check the color of every goose in the world.

 Our objective in hypothesis testing is to determine whether there is


strong evidence from the sample data against the null hypothesis.

 This is why it is very important to know how to set up the null and
alternate hypotheses.
Understanding Hypothesis Testing 282

Setting up H0 and H1
Sometimes, getting H0 and H1 right will depend on who is doing the
testing

•The testing might be done by the company itself to substantiate its


own claim. Substantiation of a claim may be necessary, e.g., in
order to demonstrate compliance with government standards, for
advertising claims or for claims of product purity.
•On the other hand, some testing might be done by a government
agency (acting on behalf of consumers) or even by a competitor.
In this case, the testing is done to see whether the claim can be
refuted.

It is interesting to see how the direction of the hypotheses changes


depending on whether we are trying to substantiate or refute a claim.
Understanding Hypothesis Testing 283

Substantiating a claim
• When it comes to substantiating a claim, a company wants to set
up the hypotheses, H0 and H1, in such a way that there is strong
evidence that its claim is accurate. The cost of being mistaken is
very high. The company wants to demonstrate that its claim is true.
It does not want the sample evidence to be inconclusive (i.e.,
having no evidence to reject the claim). This is why the company’s
claim belongs in the alternative hypothesis, H 1.

• In addition to ensuring compliance with government standards


against making false or misleading claims, this approach is very
important in quality control. The cost of being wrong when it comes
to quality is very high. A company will lose its reputation and even
get sued. Imagine how angry consumers get when they find that a
product they just purchased does not work as expected.
Understanding Hypothesis Testing 284

Substantiating a claim […2]

• Example: Your company, Company A, produces cables for
a particular task that requires them to have a breaking
strength of at least 200 lbs. Since customers will not
purchase cables with a lower breaking strength, your firm
works hard to make sure that its cables meet customers'
needs. Failure can result in mass returns of unused cables
or, worse, litigation. Your firm uses the following statistical
test to substantiate its claim (to customers and, perhaps, to
government agencies):

Sample evidence:
n = 64
X̅ = 220 lbs
s = 48 lbs
Understanding Hypothesis Testing 285

Substantiating a claim […3]

H0: µ ≤ 200 lbs
H1: µ > 200 lbs

With α = 0.05, this test requires a critical value for Z of +1.645.

Computing the value of Z using the sample data above, we get:

Z = (220 − 200) / (48/√64) = 20/6 = 3.33

Conclusion: Reject H0. The company's claim is substantiated.
Understanding Hypothesis Testing 286

Substantiating a claim […4]

• Note that to substantiate the claim, the test is more stringent than
what is typical. Instead of substantiating the claim that the true
population breaking strength is at least 200 lbs, as in the original
problem statement, we are actually trying to reject the hypothesis that
the breaking strength is not greater than 200 lbs. The claim being
substantiated by this test is that the cables have a breaking strength
of greater than 200 lbs. For this purpose, the company's claim is in
H1. The company wants to reject the null hypothesis and thus show
that its cables have a mean breaking strength greater than 200 pounds.
This is also important if a government agency demands proof that the
company has substantiated its claim.
Understanding Hypothesis Testing 287

Refuting a claim

• Example: Same as above, but this time the research is done by
a competing firm, which wishes to refute Company A's claim.

• Company B is trying to use the sample evidence to refute
Company A's claim that its cables have a mean breaking
strength of at least 200 lbs. Thus the hypothesis test is set up
so that the claim is contained in the null hypothesis. If the null
hypothesis is rejected, then the alternative hypothesis is taken
to be true and Company A's claim is refuted. For the competitor
to win its case, it will need to produce a sample mean breaking
strength that is significantly less than 200 pounds. In fact, any
sample evidence that results in a mean that is 200 pounds or
more automatically makes it impossible to reject H0. It is not
even necessary to do any statistical test.
Understanding Hypothesis Testing 288

Refuting a claim […2]

• Since the company is claiming that the population mean breaking
strength is at least 200 pounds, the null and alternative hypotheses
for this test would be:

H0: µ ≥ 200 lbs
H1: µ < 200 lbs

With α = 0.05, this test requires a critical value for Z of -1.645.

If we are using the same sample evidence as above, with an X̅ of 220
lbs, there is no reason to continue. There is no way H0 will be rejected
unless the sample mean is shown to be significantly less than 200 lbs.
Clearly, the sample evidence supports the claim that µ ≥ 200 lbs.
Understanding Hypothesis Testing 289

A Deeper Understanding of Hypothesis Testing

• We note again that the objective in hypothesis testing is
to determine whether there is strong evidence from the
sample data against the null hypothesis.

• In the following slides, rather than simply following
prescribed steps (a good-enough approach), we attempt
to understand the process of hypothesis testing and the
conclusion(s) we might reach thereby.
Understanding Hypothesis Testing 290

Example 3: Testing a claim

• A company claims that its tablets have a life of at least 8 years. Before making a
large purchase, you would like to test this claim. You sample 100 tablets and
find that X̅ = 7.6 years and s = 1.2 years. Should the company's claim be
rejected? Test at α = 0.05.

Here's the way we've been doing hypothesis testing.

• H0: µ ≥ 8 years
  H1: µ < 8 years

• Computed Z = (7.6 – 8.0) / (1.2/√100) = -0.4/0.12 = -3.33

• The computed Z value of -3.33 is in the region of rejection since it is < -1.645.
Thus, we Reject H0 at p < .05.
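To see this arithmetic end to end, here is a minimal Python sketch of the one-sample Z-test above (ours, not part of the original slides); it uses only the standard library, building the normal CDF from math.erf.

```python
import math

def normal_cdf(z: float) -> float:
    """Cumulative probability P(Z <= z) for the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Sample evidence from the tablet example
n, xbar, s = 100, 7.6, 1.2
mu0 = 8.0    # hypothesized mean under H0: mu >= 8

z = (xbar - mu0) / (s / math.sqrt(n))   # test statistic
p_value = normal_cdf(z)                 # left-tail test (H1: mu < 8)

print(f"Z = {z:.2f}, p-value = {p_value:.4f}")
# Z = -3.33, p-value = 0.0004  ->  p < .05, so Reject H0
```

The printed p-value of .0004 is exactly the tail area the next slides read off the cumulative Z table.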
Understanding Hypothesis Testing 291

Ex 3: Testing a claim […2]

• When a company makes a claim about a parameter – in
this case that the lifetime of its tablets (µ) is 8 or more
years – we start the testing by temporarily accepting the
claim.

• Note that this claim is really just a "straw man": We have
set it up only because we wish to knock it down. We are
trying to reject this hypothesis.

• We try to do this using the sample evidence.
Understanding Hypothesis Testing 292

Ex 3: Testing a claim […3]

• We want to determine whether the sample evidence is
likely to have occurred given the company's claim about
the parameter of the population from which the sample
was supposedly drawn.

• We draw the distribution (in this case Z) assuming that the
claim is true, and that the true average life really is 8
years.
Understanding Hypothesis Testing 293

Ex 3: Testing a claim […4]

• In this case, the sample evidence resulted in a Z value of -3.33.
Now let's use the cumulative Z table to see how much area is
between –∞ and -3.33.

• We find this area is .0004 or, in other words, only 4 chances
in 10,000.

• This represents the likelihood of getting the sample evidence
of 7.6 years or worse (something more extreme such as, say,
7 years).
Understanding Hypothesis Testing 294

Ex 3: Testing a claim […5]

• So what we have found is that the sample evidence of 7.6
years, which resulted in a computed Z of -3.33, is not a
likely outcome if the population mean is truly 8.0 years.

• Note that the probability (the p-value) for this computed Z
value is .0004. This is a lot less than .05.
Understanding Hypothesis Testing 295

Example 3b: When we can't Reject

• Let us examine the same problem when the sample
evidence leads us to a different conclusion.

• The company's claim is the same, so

H0: µ ≥ 8 years
H1: µ < 8 years
Understanding Hypothesis Testing 296

Ex 3b: When we can't Reject […2]

• Now here is the sample evidence:

X̅ = 7.9 years
s = 1.2 years
n = 100 tablets

Computed Z = (7.9 – 8.0) / (1.2/√100) = -0.1/0.12 = -0.83

p > .05, so Do Not Reject H0.
Understanding Hypothesis Testing 297

Ex 3b: When we can't Reject […3]

• In this case the sample evidence resulted in a computed Z
value of -0.83. We use the cumulative Z table to see how
much area is between –∞ and -0.83.

• We find this area is .2033. Or, in other words, the
likelihood of this outcome is a bit more than 20 chances in
100. This represents the likelihood of getting the sample
evidence of 7.9 or worse if we assume the claim is true.
Understanding Hypothesis Testing 298

Ex 3b: When we can't Reject […4]

• Note that this is not very unlikely. In fact, our p-value of .20 is
more than the .05 we were willing to work with as the
(maximum) probability of rejecting H0 when the claim really is
true.

• We cannot say that the sample evidence is unlikely to occur
given the company's claim. In other words, we do not have
enough evidence to reject H0.

• We set up the "straw man", H0, but could not knock it down.
The company's claim could be true. The sample evidence
was not able to reject the company's claim.
Understanding Hypothesis Testing 299

More about p-value vs. α

• We see that if you have the p-value – the probability of
getting the sample evidence from the distribution set forth
in H0 – you do not need to first establish the critical value(s)
in order to conduct these hypothesis tests.

• There are many computer programs that give you the p-
value so that you can compare that probability with the
significance level α at which you are working.
Understanding Hypothesis Testing 300

Do Your Homework
• Practice, practice, practice.
• Do lots and lots of problems. You can find these in the online
lecture notes.
TWO-SAMPLE Z-TEST
Inferences About Means from Two Groups
Two Sample Z Test 302

Two-Sample Hypothesis Test

• Researchers often want to compare two populations with
regard to a particular parameter (say, µ). For example, a
researcher might wish to determine whether or not men
and women are different when it comes to overall GPA in
college.
Two Sample Z Test 303

Two-Sample Hypothesis Test

• We test the hypothesis that there is really no difference
between the two population means.
H0: μ1 = μ2
H1: μ1 ≠ μ2

• Here H0 is implying that the two populations that the
samples were taken from are, in effect, a single
population, since there is no difference between them.
• As we know, the null hypothesis, H0, is a "straw man." We
set it up and then see whether or not we can knock it
down based on the sample evidence.
Two Sample Z Test 304

Two-Sample Hypothesis Test

• The null hypothesis that μ1 = μ2 is equivalent to stating that
the difference between the two population means is 0. So
H0 could be stated as: (μ1 - μ2) = 0.
• Note that H0 is always about population parameters, in
this case the difference between the two population
means.
• The random variable for this test is (X̅1 - X̅2), and, as
always, it takes its value from the sample data.
Two Sample Z Test 305

Sampling Distribution of (X̅1 - X̅2)

• If the samples are large, random, and independent, then
(X̅1 - X̅2), which is a random variable, has approximately a
normal distribution, with

E(X̅1 - X̅2) = µ1 - µ2

and σ(X̅1 - X̅2) = √(σ1²/n1 + σ2²/n2)
Two Sample Z Test 306

Two-Sample Z-test

• If the sample sizes are large enough, or if the
population standard deviations are known, we
use a Two-Sample Z-test.
Two Sample Z Test 307

To Calculate Z

Z = [(X̅1 - X̅2) - (µ1 - µ2)] / √(σ1²/n1 + σ2²/n2)

But, of course, under H0, µ1 = µ2, so

Z = (X̅1 - X̅2) / √(σ1²/n1 + σ2²/n2)    for known σ1 and σ2
Two Sample Z Test 308

To Calculate Z (cont'd)

• If σ1 and σ2 are unknown, we can use

Z = (X̅1 - X̅2) / √(s1²/n1 + s2²/n2)

as long as n1 + n2 is large enough.
Two Sample Z Test 309

CIE for Difference of Means

• The confidence interval estimator (CIE) for a two-sample test is used
to estimate the difference between the two population means (μ1 – μ2).
As always, sample data can be expected to have some sampling
error (also known as the margin of error). To construct a (1-α)% CIE
for the difference, use this formula:

(X̅1 - X̅2) ± Z √(σ1²/n1 + σ2²/n2)

• The Z value in the formula comes from the Z table. We will be doing
two-sided confidence intervals. Thus, for a 95% CIE, use a Z of 1.96;
for a 90% CIE, use a Z of 1.645; for a 99% CIE, use a Z of 2.575; and so
on.
Two Sample Z Test 310

CIE for Difference of Means (cont'd)

• If the CIE includes 0, this means that we have to be willing
to accept that there is a difference of 0 (a fancy way of
saying no difference) between the two population means.
• There will always be a 0 in the CIE if we find that the confidence interval
goes from a negative number to a positive number or a positive number
to a negative number.
Two Sample Z Test 311

Problem 1: A New Drug

• Suppose a medical researcher wants to test a new drug
that is supposed to protect against the common cold.
• The researcher randomly assigns subjects to one of two
groups – patients in one group are given the drug; the
others (the control group) take a placebo.
• The measure of interest is the number of colds subjects
get in a year. (Typically, most people get about 4 to 6
colds a year.)
• In a "double blind" study, neither the subjects nor the
researchers know who is taking the drug and who is
taking the placebo.
Two Sample Z Test 312

Problem 1: A New Drug

• Group 1 (Drug Group)
Sample size: n1 = 81
Sample mean: X̅1 = 4.4 colds per year
Sample standard deviation: s1 = 0.7 colds per year

• Group 2 (Placebo Group)
Sample size: n2 = 64
Sample mean: X̅2 = 4.8 colds per year
Sample standard deviation: s2 = 0.8 colds per year

• The difference between the two sample means is -0.4 colds (4.4 – 4.8).
This difference could be a statistically significant difference or could just
be chance. A chance difference means that another researcher
comparing two other groups might find that the placebo group has fewer
colds, on average. This is why we need a statistical test.
Two Sample Z Test 313

Problem 1: A New Drug

• (a) Test at α = .05

H0: µ1 = µ2
H1: µ1 ≠ µ2

Z = (4.4 - 4.8) / √(.7²/81 + .8²/64) = -.4/√.01605 = -.4/.127 = -3.15

• Therefore, REJECT H0, p < .05. The two groups are indeed
statistically different. The Drug group has fewer colds per year.
Two Sample Z Test 314

Problem 1: A New Drug

• (b) Construct a 95% Confidence Interval Estimator (CIE)
for the difference between the two population means.
• The CIE in a two-sample situation is an estimator for the difference.
We observed a difference in the data of -.4 colds. This was the
difference between the two sample means. We want to construct a
95% CIE for the difference between the two population means.
• If we see a 0 in the confidence interval, this means that we have to
be willing to accept a difference of 0, i.e., that the two means are
the same.
Two Sample Z Test 315

Problem 1: A New Drug

95% CIE for the difference between the two population
means. The Z-value for a 95% confidence level is 1.96.

-.40 ± 1.96(.127)
-.40 ± .25  Thus, the margin of sampling error is .25 colds.

The 95% CIE is: -.65 ←—→ -.15

Since 0 is not in the above interval, we see that there is a
difference between the two groups (the difference is not 0).
We are 95% sure that the true population mean difference
between the drug group and the placebo (control) group is
between .15 and .65 fewer colds.
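For readers who want to check these numbers, here is a minimal Python sketch (ours) of the two-sample Z-test and the 95% CIE for Problem 1, using only the standard library.

```python
import math

def normal_cdf(z: float) -> float:
    """P(Z <= z) for the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Problem 1 sample evidence: drug group (1) vs. placebo group (2)
n1, xbar1, s1 = 81, 4.4, 0.7
n2, xbar2, s2 = 64, 4.8, 0.8

se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # standard error of (X̅1 - X̅2)
z = (xbar1 - xbar2) / se                  # under H0: mu1 = mu2
p_value = 2 * normal_cdf(-abs(z))         # two-tail test

diff = xbar1 - xbar2
lo, hi = diff - 1.96 * se, diff + 1.96 * se   # 95% CIE for (mu1 - mu2)

print(f"Z = {z:.2f}, p-value = {p_value:.4f}")
print(f"95% CIE: ({lo:.2f}, {hi:.2f})")
# Z = -3.15 -> Reject H0; CIE ≈ (-.65, -.15), which excludes 0
```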
Two Sample Z Test 316

Problem 2: Science Test Scores

• Scores on a standardized science test. Is there a
significant difference between men and women?
• (a) Test at α = .05.

       Men    Women
X̅     80.0    76.5
s       10      16
n      100      64

Problem 2: Science Test Scores


 0 : 1  2
1 : 1  2

80.0  76.5 3 . 5 3 .5
Z   1.56
(10) 2 16  5 2.24
2

100 64

Do not reject H0 (p > .05). There is no statistically significant


difference between men and women on test scores in science.
Two Sample Z Test 318

Problem 2: Science Test Scores

(b) Construct a 95% Confidence Interval Estimate for the
difference between the two population means. The Z-
value we use for 95% confidence is 1.96.

3.5 ± 1.96(2.24)
3.5 ± 4.4  The margin of sampling error is 4.4.

• The 95% CIE is: -.9 ←—→ +7.9

Since 0 is in the above interval, we conclude that there is
no statistically significant difference between men and
women on the science test.
Two Sample Z Test 319

Homework
• Practice, practice, practice.
• Do lots and lots of problems. You can find these in the online
lecture notes.
UNDERSTANDING HOW
HYPOTHESIS TESTING
WORKS
Two-Sample Z Tests
321

Understanding Hypothesis Testing

• In previous lectures, we were content to learn how to
perform a statistical test of hypothesis, following
prescribed steps. This is more-or-less a cookbook type of
approach and will certainly get the job done.

• In this lecture what we really want to do is understand the
process of hypothesis testing and the conclusion(s) we
might reach thereby.

Understanding Two-sample Hypothesis Testing
322

Example: Two-sample test

• A researcher compares men and women with regard
to leadership aptitude. Test scores range from 0
(absolutely no aptitude) to 100.

• Results are as follows:

       Women   Men
X̅      83.7   74.3
s       16.0   18.0
n         64     54

• Is there a difference between men and women? Test at α = 0.05.

Understanding Two-sample Hypothesis Testing
323
Example [continued]

Here's what we've been doing.

• H0: µ1 = µ2
  H1: µ1 ≠ µ2

• Computed Z = (83.7 - 74.3) / √(16²/64 + 18²/54) = 9.4/√10 = 9.4/3.16 = 2.97

• The computed Z value of 2.97 is in the region of rejection since it is
> 1.96. Thus, we Reject H0 at p < .05. Men and women are
different with regard to leadership aptitude.

Understanding Two-sample Hypothesis Testing
324
Understanding Two-sample Hypothesis Testing

• The null hypothesis is a "straw man": We have set it up
only because we wish to knock it down. We are trying to
reject H0.

• The "claim" in this case is that there is NO difference
between men and women when it comes to leadership
aptitude. Just as with a one-sample test, we start by
assuming that the claim is correct.

• We try to Reject H0, the "straw man," using the sample
evidence.

Understanding Two-sample Hypothesis Testing
325
Understanding Hypothesis Testing [continued]

• We want to determine whether the sample evidence of a
9.4 difference in leadership aptitude scores is likely to
have occurred given the claim that men and women have
the same aptitude.

• There are two possibilities:
• 1) Men and women in the population are the same when it comes
to leadership aptitude. The difference of 9.4 that we observed in
the samples was just sampling error, i.e., chance.
• 2) The 9.4 difference between the two samples is too great to be
explained by chance. Rather, the populations of men and women
are different with regard to leadership aptitude.

Understanding Two-sample Hypothesis Testing
326
Understanding Hypothesis Testing [continued]

• In this case, the sample evidence resulted in a Z value of
2.97. Since this is a two-tail test, we have to double the
probability to cover both sides of the distribution.

• Using the cumulative Z distribution table, we look up the
area from –∞ to -2.97 and then the area from 2.97 to +∞.
For the right side of the distribution we compute 1.00 – p.

Understanding Two-sample Hypothesis Testing
327
Understanding Hypothesis Testing [continued]

[Z distribution with an area of .0015 in each tail, beyond -2.97 and +2.97]

• This represents the likelihood of getting the sample
evidence - a difference of 9.4 or greater.

• We find this combined area is a total of .003 or, in
other words, only 3 chances in 1,000.

Understanding Two-sample Hypothesis Testing
328
Understanding Hypothesis Testing [continued]

• So what we have found is that the sample evidence of a
difference of 9.4 in aptitude scores, which resulted in a
computed Z of 2.97, is not a likely outcome if the two
population means are the same.

• Note that the probability (the p-value) for this computed Z
value is .003 (.0015 + .0015). This is a lot less than .05.

Understanding Two-sample Hypothesis Testing
329
p-value vs. α

• We see that if you have the p-value – the probability of
getting the sample evidence from the distribution set forth
in H0 – you do not need to first establish the critical value(s)
in order to conduct these hypothesis tests.

• There are many computer programs that give you the p-
value so that you can compare that probability with the
significance level α at which you are working.

Understanding Two-sample Hypothesis Testing
330
Do Your Homework
• Practice, practice, practice.
• Do lots and lots of problems. You can find these in the online
lecture notes.

Understanding Two-sample Hypothesis Testing
INFERENCES ABOUT P
Inferences About the Population Proportion
Z Test for Proportions 332

Sampling Distribution of a Proportion

When n*P and n*(1-P) are both ≥ 5, we can use the Z
distribution as an approximation to the sampling
distribution of the proportion:

Z = (Ps - P) / √(P(1-P)/n)

where Ps = sample proportion and P = population proportion.

(1-α)% Confidence Interval Estimator of P:

Ps ± Z √(Ps(1-Ps)/n)

Q: Why does the formula for a CIE use Ps and the formula for Zcalc
use P?
Z Test for Proportions 333

Sampling Distribution of a Proportion [Aside]

This formula comes from the fact that the normal
distribution can be used to approximate the binomial
distribution if np and n(1-p) ≥ 5.

For a binomial, µ = np and σ = √(np(1-p)). Dividing numerator and
denominator by n:

Z = (x - np) / √(np(1-p)) = (x/n - p) / √(p(1-p)/n) = (Ps - P) / √(P(1-P)/n)

As n gets large and p→.5, then the binomial→normal.
Use when np and n(1-p) ≥ 5.
Z Test for Proportions 334

EXAMPLE: Did the Politician Lie?

A politician claims that 70% of the people in her district are
Democrats. A researcher takes a sample of 100 people
and finds that only 50 are Democrats. Is the politician a
liar?

(a) Test at α = .05 (2-tail test)

H0: P = .70
H1: P ≠ .70

With α = .05, the critical values are ±1.96.

Z = (.50 - .70) / √((.70)(.30)/100) = -.20/.0458 = -4.36

Reject H0, p < .05.
Z Test for Proportions 335

EXAMPLE: Did the Politician Lie? (Continued)

(b) Construct a 2-sided 95% CIE of P:

.50 ± 1.96 √((.50)(.50)/100)
.50 ± .10

.40 ============ .60

We are 95% confident that the interval .40 to .60 contains
the true proportion P of Democrats in the district.
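A minimal Python sketch (ours) of this one-sample Z-test for a proportion and its 95% CIE; note how the test statistic uses the claimed P while the CIE uses Ps, which answers the question posed two slides back.

```python
import math

def normal_cdf(z: float) -> float:
    """P(Z <= z) for the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Politician example: claim P = .70, sample of n = 100 with 50 Democrats
n, x, p0 = 100, 50, 0.70
ps = x / n                                     # sample proportion

z = (ps - p0) / math.sqrt(p0 * (1 - p0) / n)   # Z uses the claimed P
p_value = 2 * normal_cdf(-abs(z))              # two-tail test

se_ci = math.sqrt(ps * (1 - ps) / n)           # the CIE uses Ps
lo, hi = ps - 1.96 * se_ci, ps + 1.96 * se_ci

print(f"Z = {z:.2f}, p-value = {p_value:.6f}")
print(f"95% CIE: ({lo:.2f}, {hi:.2f})")
# Z = -4.36 -> Reject H0; 95% CIE = (.40, .60)
```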
Z Test for Proportions 336

Example: Legalizing Drugs

A politician claims that exactly 90% of the American public
favors the legalization of drugs. A survey of 100 people
shows that only 79 are in favor of drug legalization.

(a) Test at α = .05

H0: P = .90
H1: P ≠ .90

With α = .05, the critical values are ±1.96 (Z distribution).

Z = (.79 - .90) / √((.90)(.10)/100) = -.11/.03 = -3.67

Reject H0, p < .05.
Z Test for Proportions 337

Example: Legalizing Drugs (Continued)

(b) Construct a 2-sided 95% CIE of P:

.79 ± 1.96 √((.79)(.21)/100) = .79 ± .08

.71 ========= .87
Z Test for Proportions 338

Example: Defective Widgets

A company claims that no more than 8% of its widgets are
defective.
Sample: n = 100; 10 defectives.
Ps = 10/100 = .10

a) Test at α = .05
b) Construct a 2-sided 95% Confidence Interval Estimator
of P.
Z Test for Proportions 339

Example: Defective Widgets

A company claims that no more than 8% of its widgets are defective.
Sample: n = 100; 10 defectives.
Ps = 10/100 = .10

a) Test at α = .05

H0: P ≤ .08
H1: P > .08

This is a one-tail test; with α = .05, the critical value is +1.645.

Z = (.10 - .08) / √((.08)(.92)/100) = .020/.027 = .74

Do not reject H0, p > .05.
Z Test for Proportions 340

Example: Defective Widgets (Continued)

(b) Construct a 2-sided 95% CIE of P:

.10 ± 1.96 √((.10)(.90)/100)
.10 ± .06

.04 ======== .16

Z Test for Proportions 341
Homework
• Practice, practice, practice.
• Do lots and lots of problems. You can find these in the online
lecture notes and homework assignments.
TWO-SAMPLE Z TEST
FOR P
Inferences About Proportions of Two Groups
Two Sample Z Test for Proportions 343

Testing the Difference Between Two Proportions

• Examples:
• You wish to compare the defective rates (a proportion) of two
companies that supply the computer chips needed for your tablet
computer.
• You want to compare the death rates for heart transplants at two
hospitals.
• You want to compare the graduation rates of two high schools in
the same area.
Two Sample Z Test for Proportions 344

Testing the Difference Between Two Proportions

• We are trying to determine whether the observed
difference between two proportions is statistically
significant or just sampling error.
• We use a two-sample Z test to compare the proportions
(P) of two groups.
• To use this test, n1 and n2 should be large. Specifically:
• n1p1 ≥ 5
• n1(1-p1) ≥ 5
• n2p2 ≥ 5
• n2(1-p2) ≥ 5
Two Sample Z Test for Proportions 345

The Two-Sample Z Test for P

• This Z test uses the normal approximation:

Z = (Ps1 - Ps2) / √(p̄(1-p̄)(1/n1 + 1/n2))
Two Sample Z Test for Proportions 346

The Two-Sample Z Test for P

Ps1 = X1/n1 = the sample proportion in sample 1
Ps2 = X2/n2 = the sample proportion in sample 2

Where
• X1 represents the # of "successes" in sample 1
• X2 represents the # of "successes" in sample 2
• A "success" is the outcome you are interested in, e.g., a defective part.
• n1 = sample size for group 1
• n2 = sample size for group 2

p̄ = (X1 + X2) / (n1 + n2)

is the pooled estimate of the population proportion.
Two Sample Z Test for Proportions 347

Problem 1: Death Rates at Two Hospitals

• We would like to compare the death rates from liver
transplants at two hospitals in similar areas.
Hospital A: 77/100 died within 6 months
Hospital B: 120/200 died within 6 months

• Are the death rates for the two hospitals statistically
different? Test at α = .05.
Two Sample Z Test for Proportions 348

Problem 1: Death Rates at Two Hospitals (cont'd)

H0: P1 = P2
H1: P1 ≠ P2

Ps1 = 77/100 = .77; Ps2 = 120/200 = .60

p̄ = (77 + 120)/300 = 197/300 = .657

Z = (.77 - .60) / √((.657)(.343)(1/100 + 1/200)) = .17/.058 = 2.93

Reject H0, p < .05.
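A minimal Python sketch (ours) of the pooled two-sample Z-test for proportions, checked against the hospital numbers above.

```python
import math

def normal_cdf(z: float) -> float:
    """P(Z <= z) for the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_prop_z(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-sample Z test of H0: P1 = P2; returns (z, two-tail p)."""
    ps1, ps2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)          # pooled proportion
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    z = (ps1 - ps2) / se
    return z, 2 * normal_cdf(-abs(z))

# Hospital A: 77/100 died; Hospital B: 120/200 died
z, p = two_prop_z(77, 100, 120, 200)
print(f"Z = {z:.2f}, p-value = {p:.4f}")   # Z = 2.92 -> Reject H0
```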
Two Sample Z Test for Proportions 349

Problem 2: Two Unemployment Rates

• Unemployment rates in two counties. Are they different?
• Two random samples taken:
• County A: 100/400 unemployed
• County B: 44/200 unemployed
• Test at α = .05

Pooled p̄ = (100 + 44)/(400 + 200) = 144/600 = .24
Two Sample Z Test for Proportions 350

Problem 2: Two Unemployment Rates (cont'd)

H0: P1 = P2
H1: P1 ≠ P2

Ps1 = 100/400 = .25; Ps2 = 44/200 = .22

p̄ = 144/600 = .24

Z = (.25 - .22) / √((.24)(.76)(1/400 + 1/200)) = .03/.037 = .81

• Conclusion: Do not reject H0, p > .05.
Two Sample Z Test for Proportions 351

Problem 3: The Donner Party

• This is an actual study published by Donald Grayson. Dr.
Grayson was interested in knowing whether the survival
rate under conditions of starvation is different for men and
women.
• The Donner party were traveling from Illinois to California
and were caught in a huge blizzard. They were stranded
for 6 months and had no food; they resorted to
cannibalism to survive.
• The death rate for women was 10/34 and for men 30/53.
• We will conduct a test at the α = .05 significance level to
determine whether the death rates for men and women
are statistically different from each other.
Two Sample Z Test for Proportions 352

Problem 3: The Donner Party

H0: P1 = P2
H1: P1 ≠ P2

Death rates:
Ps1 = 10/34 = .294 (women)
Ps2 = 30/53 = .566 (men)

p̄ = (10 + 30)/(34 + 53) = 40/87 = .46

Z = (.294 - .566) / √((.46)(.54)(1/34 + 1/53)) = -.272/.110 = -2.48

Conclusion: Reject H0. -2.48 is in the rejection region. The two population
proportions are not the same.
• Women have a statistically higher survival rate under adverse conditions than do men. A
possible reason given is that women have an additional layer of fat tissue which provides a
fetus (and, of course, women) with sufficient nourishment in case of a famine.
Two Sample Z Test 353

Homework
• Practice, practice, practice.
• Do lots and lots of problems. You can find these in the online
lecture notes and homework assignments.
INTRO TO
REGRESSION
Simple Linear Regression
Simple Regression 355

Regression

• Using regression analysis, we can derive an equation by
which the dependent variable (Y) is expressed (and
estimated) in terms of its relationship with the
independent variable (X).
• In simple regression, there is only one independent variable (X) and one dependent
variable (Y). The dependent variable is the outcome we are trying to predict.
• In multiple regression, there are several independent variables (X1, X2, …), and still
only one dependent variable, Y. We are trying to use the X variables to predict the Y
variable.

• We will be studying Simple Linear Regression, in which
we investigate whether X and Y have a linear relationship.
Simple Regression 356

A Simple Regression Example

• A researcher wishes to determine the relationship
between X and Y. It is important to know which is the
dependent variable, the one you are trying to predict.
Presumably, hours studied affects grades on a quiz, not
the other way around. The data:

X (hours)   Y (Grade on quiz)
1           40
2           50
3           60
4           70
5           80
Simple Regression 357

A Simple Regression Example

• If X and Y have a linear relationship, and you would like to plot the
line, what would it look like?
• Take a moment and try it. Note that the data points fall on a straight line.
• Now, add another point: If X = 6, then Y = ?
• We see that as X changes by 1, Y changes by 10. That is, the slope,
called b1, is equal to 10.
• b1 = ∆Y / ∆X = 10.
• The Y-intercept, or the value of Y when X = 0, called b0, is equal to 30.
• When X = 0, then Y = 30. Someone who studies 0 hours for the quiz should expect
a grade of 30.
• Then, every additional hour studied adds 10 points to the score on the quiz.
• The equation for the plot of this data is: Ŷ = 30 + 10X
Simple Regression 358

A Simple Regression Example

• If you plot the above X and Y, you will find that all the
points are on a straight line. This, of course, means that
we have a perfect relationship between X and Y and all
the points are on the line.
• From studying correlation you know that for this data r = +1
and R² = 100%.

[Scatter plot of the five data points, all falling on the line Ŷ = 30 + 10X]
Simple Regression 359

Simple Linear Regression

• In general, the simple linear regression equation is:
Ŷi = b0 + b1Xi

• Why do we need regression in addition to correlation?
• To predict a Y for a new value of X.
• To answer questions regarding the slope. For example,
• With additional shelf space (X), what effect will there be on sales (Y)?
• If we raise prices by a particular amount or percentage, will it cause sales to
drop? (This measures elasticity.)
• It makes the scatter plot a better display (graph) of the data if we can plot a
line through it. It presents much more information on the diagram.

• In correlation, on the other hand, we just want to know if two
variables are related. This is used a lot in social science
research.
Simple Regression 360

Simple Linear Regression

• The regression equation Ŷi = b0 + b1Xi is a sample estimator of the
true population regression equation, which we could build were we
to take a census:

• Yi = β0 + β1Xi + εi
where
β0 = true Y intercept for the population
β1 = true slope for the population
εi = random error in Y for observation i

• In regression analysis, we hypothesize that there is a true
regression line for the population. The b0 and b1 coefficients are
estimates of the true population coefficients, β0 and β1.
Simple Regression 361

Simple Linear Regression

Ŷi is a point on the regression line
Yi is an individual data value

• The deviations of the individual observations (the points) from the
regression line, (Yi - Ŷi), the residuals, are denoted by ei, where ei = (Yi - Ŷi).
• Some deviations are positive (for the points above the line); some are negative (for
the points below the line). If a point is on the line, its deviation = 0. Note that
Σei = 0.
Simple Regression 362

Simple Linear Regression

• Mathematically, the regression line minimizes the sum of
the squared errors (SSE):
Σei² = Σ(Yi - Ŷi)² = Σ[Yi – (b0 + b1Xi)]²

• Regression is called a "least squares" method.
• Taking partial derivatives, we get the "normal equations" that are
used to solve for b0 and b1.
• In regression, the levels of X are considered to be fixed. Y is the
random variable.
Simple Regression 363

Simple Linear Regression

• This is why the regression line is called the least squares
line. It is the line that minimizes the sum of squared
residuals (SSE).
• In the example below, we see that most of the points are
either above the line or below the line. Only about 5 points
are actually on the line or touching it.
Simple Regression 364

Steps in Regression

1- For Xi (independent variable) and Yi (dependent variable),
calculate:
ΣYi
ΣXi
ΣXiYi
ΣXi²
ΣYi²

2- Calculate the correlation coefficient, r:

r = [nΣXiYi - (ΣXi)(ΣYi)] / √{[nΣXi² - (ΣXi)²][nΣYi² - (ΣYi)²]}

-1 ≤ r ≤ 1
[This can be tested for significance. H0: ρ = 0. If the correlation is not significant,
then X and Y are not related. You really should not be doing this regression!]
Simple Regression 365

Steps in Regression

3- Calculate the coefficient of determination: r² = (r)²
0 ≤ r² ≤ 1
This is the proportion of the variation in the dependent variable (Yi) explained by
the independent variable (Xi).

4- Calculate the regression coefficient b1 (the slope):

b1 = [nΣXiYi - (ΣXi)(ΣYi)] / [nΣXi² - (ΣXi)²]

Note that you have already calculated the numerator and the denominator as parts
of r. Other than a single division operation, no new calculations are required.
BTW, r and b1 are related. If a correlation is negative, the slope term must be
negative; a positive slope means a positive correlation.

5- Calculate the regression coefficient b0 (the Y-intercept, or constant):

b0 = Ȳ - b1X̄

The Y-intercept (b0) is the predicted value of Y when X = 0.
Simple Regression 366

Steps in Regression

6- The regression equation (a straight line) is:
Ŷi = b0 + b1Xi

7- [OPTIONAL] Then we can test the regression for statistical significance.
There are 3 ways to do this in simple regression:

(a) t-test for correlation:
H0: ρ = 0
H1: ρ ≠ 0

t(n-2) = r√(n-2) / √(1-r²)

(b) t-test for slope term:
H0: β1 = 0
H1: β1 ≠ 0
Simple Regression 367

Steps in Regression

(c) F-test – we can do it in MS Excel:

F = MS Explained / MS Unexplained = MS Regression / MS Residual

where the numerator is the Mean Square (variation) Explained by the regression
equation, and the denominator is the Mean Square (variation) Unexplained by the
regression.

• This is part of the output you see when you do regression
in MS Excel.
Simple Regression 368

Example: Water and Tomato Yield

• n = 5 pairs of X,Y observations
• Independent variable (X) is amount of water (in gallons)
used on crop
• Dependent variable (Y) is yield (bushels of tomatoes).

Yi   Xi   XiYi   Xi²   Yi²
2    1    2      1     4
5    2    10     4     25
8    3    24     9     64
10   4    40     16    100
15   5    75     25    225
40   15   151    55    418
Simple Regression 369

Example: Water and Tomato Yield

Step 1-
ΣYi = 40
ΣXi = 15
ΣXiYi = 151
ΣXi² = 55
ΣYi² = 418

Step 2- r = [(5)(151) - (15)(40)] / √{[(5)(55) - (15)²][(5)(418) - (40)²]}
          = 155 / √((50)(490)) = .9903
Simple Regression 370

Example: Water and Tomato Yield

Step 3- r² = (.9903)² = 98.06%

Step 4- b1 = 155/50 = 3.1. The slope is positive. There is a positive relationship
between water and crop yield.

Step 5- b0 = (40/5) - 3.1(15/5) = 8 - 9.3 = -1.3

Step 6- Thus, Ŷi = -1.3 + 3.1Xi
Simple Regression 371

Example: Water and Tomato Yield

Ŷi = -1.3 + 3.1 Xi

where Ŷi is the # of bushels of tomatoes and Xi is the # of gallons of water.
The intercept raises a question: does no water result in a negative yield?
Every gallon of water adds 3.1 bushels of tomatoes.
Simple Regression 372

Example: Water and Tomato Yield

Yi   Xi   Ŷi     ei      ei²
2    1    1.8    .2      .04
5    2    4.9    .1      .01
8    3    8.0    0       0
10   4    11.1   -1.1    1.21
15   5    14.2   .8      .64
               Σei = 0   Σei² = 1.90

Σei² = 1.90. This is a minimum, since regression minimizes Σei² (SSE).

Now we can answer a question like: How many bushels of
tomatoes can we expect if we use 3.5 gallons of water?
-1.3 + 3.1(3.5) = 9.55 bushels.

Notice the danger of predicting outside the range of X. The more
water, the greater the yield? No. Too much water can ruin the crop.
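Here is a minimal Python sketch (ours) that reproduces the hand calculations for this example (r, b1, b0, and the prediction at X = 3.5) from the raw data.

```python
import math

# Water (X, gallons) and tomato yield (Y, bushels)
x = [1, 2, 3, 4, 5]
y = [2, 5, 8, 10, 15]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi**2 for xi in x)
syy = sum(yi**2 for yi in y)

num = n * sxy - sx * sy                  # shared numerator for r and b1
r = num / math.sqrt((n * sxx - sx**2) * (n * syy - sy**2))
b1 = num / (n * sxx - sx**2)             # slope
b0 = sy / n - b1 * (sx / n)              # intercept: Ȳ - b1·X̄

print(f"r = {r:.4f}, r² = {r*r:.4f}")              # r = 0.9903
print(f"Ŷ = {b0:.1f} + {b1:.1f}X")                 # Ŷ = -1.3 + 3.1X
print(f"Prediction at X = 3.5: {b0 + b1*3.5:.2f}") # 9.55 bushels
```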
Simple Regression 373

Same Example Using MS Excel

r = ?  r² = ?  Ŷi = ?

Is this regression significant?
H0: No regression
H1: Yes regression
F-statistic = 151.7368
P(sample evidence | H0 is true) = .00115
Simple Regression 374

Sources of Variation in Regression

Measures of Variation in Regression
If we did not have a significant regression (i.e., X does not
predict Y), we would use Ȳ as our regression equation.

(Yi - Ȳ) = (Ŷi - Ȳ) + (Yi - Ŷi)

Σ(Yi - Ȳ)² = Σ(Ŷi - Ȳ)² + Σ(Yi - Ŷi)²

Total Variation in Y = Explained Variation + Unexplained Variation
Simple Regression 375

Sources of Variation in Regression

Total Variation: Σ(Yi - Ȳ)² = ΣYi² - (ΣYi)²/n

Explained Variation: Σ(Ŷi - Ȳ)² = b0ΣYi + b1ΣXiYi - (ΣYi)²/n

Unexplained Variation: Σ(Yi - Ŷi)² = ΣYi² - b0ΣYi - b1ΣXiYi

From our previous problem,
Total variation in Y = 418 – (40)²/5 = 98
Explained variation (explained by X) = -1.3(40) + 3.1(151) – (40)²/5 = 96.10
Unexplained variation = 418 - [-1.3(40)] - 3.1(151) = 1.90

r² = Explained Variation / Total Variation = 96.10/98 = .98

The coefficient of determination, r², is the proportion of the variation in Y explained by X.
In other words, 98% of the total variation in crop yield is explained by the linear
relationship of yield with amount of water used on the crop.
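A short Python sketch (ours, continuing the snippet above) that verifies this partition of the variation from the summations of the tomato example.

```python
# Summations from the water/tomato example, with b0 = -1.3 and b1 = 3.1
n, sy, sxy, syy = 5, 40, 151, 418
b0, b1 = -1.3, 3.1

sst = syy - sy**2 / n                  # total variation: 98.0
ssr = b0 * sy + b1 * sxy - sy**2 / n   # explained variation: 96.1
sse = syy - b0 * sy - b1 * sxy         # unexplained variation: 1.9

print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, SSE = {sse:.1f}")  # SST = SSR + SSE
print(f"r² = {ssr / sst:.4f}")                               # 0.9806
```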
Simple Regression 376

Example: Hours Studied & Grades

Let's examine the relationship between hours studied and
grade on a quiz. n = 7 pairs of data. Highest grade on quiz
is a 15.
X ≡ hours studied; Y ≡ grade on quiz.

Xi   Yi   XiYi   Xi²   Yi²
1    5    5      1     25
2    8    16     4     64
3    9    27     9     81
4    10   40     16    100
5    11   55     25    121
6    12   72     36    144
7    14   98     49    196
ΣX = 28   ΣY = 69   ΣXY = 313   ΣX² = 140   ΣY² = 731
Simple Regression 377

Example: Hours Studied & Grades

∑X = 28
∑Y = 69
∑XY = 313
∑X² = 140
∑Y² = 731

Calculate the correlation coefficient, r:

r = [7(313) - (28)(69)] / √{[7(140) - (28)²][7(731) - (69)²]}
  = 259 / √((196)(356)) = 259/264.2 = 0.98

[The slope will be positive.]
Simple Regression 378

Example: Hours Studied & Grades

Calculate the coefficient of determination, R²:
R² = (0.98)² = .9604

Calculate the regression coefficient b1 (the slope):

b1 = [7(313) - (28)(69)] / [7(140) - (28)²] = 259/196 = 1.32

Calculate the regression coefficient b0 (the Y-intercept, or constant):

b0 = Ȳ - b1X̄ = 69/7 - 1.3214(28/7) = 9.857 - 5.286 = 4.571
Simple Regression 379

Example: Hours Studied & Grades

• The regression equation:
Ŷi = 4.58 + 1.32Xi

• Q: Explain the meaning of the regression coefficients.
• Q: If someone studies 3.5 hours, what would we predict
his/her quiz score to be?

• Try these yourself before advancing the slide.
Simple Regression 380

Example: Hours Studied & Grades

Answers to … ?

• Q: Explain the meaning of the regression coefficients.
• Q: If someone studies 3.5 hours, what would we predict
his/her quiz score to be?

4.58 + 1.32(3.5) = 9.2
Simple Regression 381

Hours Studied & Grades
SUMMARY OUTPUT

Regression Statistics
Multiple R 0.980498039
R Square 0.961376404
Adjusted R Square 0.953651685
Standard Error 0.626783171
Observations 7

ANOVA
df SS MS F Significance F
Regression 1 48.89285714 48.89285714 124.4545455 0.000100948
Residual 5 1.964285714 0.392857143
Total 6 50.85714286

Coefficients Standard Error t Stat P-value Lower 95%
Intercept 4.571428571 0.529728463 8.62975824 0.000344965 3.209718206
X Variable 1 1.321428571 0.118450885 11.15591975 0.000100948 1.016940877
Simple Regression 382

More About Output

Before using MS Excel, you should know the following:
df is degrees of freedom
SS is sum of squares
MS is mean square (SS divided by its degrees of freedom)
SST = SSR + SSE
Sum of Squares Total (SST) = Sum of Squares Regression (SSR)
+ Sum of Squares Error (SSE)
Total variation in Y = Explained Variation (explained by the X-
variable) + Unexplained Variation
SSE is the sum of the squared residuals.
Note that some textbooks use the term Residuals and others use Error. They are the same thing and
deal with the unexplained variation, i.e., the deviations. This is the number that is minimized by the
least squares (regression) line.
SSR/SST is the proportion of the variation in the Y-variable explained by
the X-variable. This is the R-Square, r², the coefficient of determination.

ANOVA
                   df     SS    MS    F
Regression         1      SSR   MSR   MSR/MSE
Residual (Error)   n-2    SSE   MSE
Total              n-1    SST
Simple Regression 383

More About Outputs

The F-ratio is (SS Regression / its degrees of freedom) divided by
(SS Residual / its degrees of freedom), i.e., MS Regression / MS Residual.

In simple regression, the degrees of freedom of the SS Regression is 1 (the
number of independent variables). The number of degrees of freedom for the SS
Residual is (n – 2). Please note that SS Residual is the SSE.

If X is not related to Y, you should get an F-ratio of around 1. In fact, if the
explained (regression) variation is 0, then the F-ratio is 0. F-ratios between 0 and
1 will not be statistically significant.

On the other hand, if all the points are on a line, then the unexplained variation
(residual variation) is 0. This results in an F-ratio of infinity.
An F-value of, say, 30 means that the explained variation is 30 times greater than
the unexplained variation. This is not likely to be chance and the F-value will be
significant.
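As a quick check, this small Python sketch (ours) computes the F-ratio for the tomato example from its sums of squares; it matches the F-statistic of 151.74 that Excel reported earlier.

```python
# Tomato example: SSR = 96.10 (explained), SSE = 1.90 (unexplained), n = 5
ssr, sse, n = 96.10, 1.90, 5

msr = ssr / 1        # df for regression = 1 (one independent variable)
mse = sse / (n - 2)  # df for residual = n - 2 = 3

print(f"F = {msr / mse:.2f}")   # F = 151.74
```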
Simple Regression 384

Example: Education & Income
A researcher is interested in determining whether there is a
relationship between years of education and income.

Education (X) 9 10 11 11 12 14 14 16 17 19 20 20
Income (‘000s) (Y) 20 22 24 23 30 35 30 29 50 45 43 70

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.860811139
R Square 0.740995817
Adjusted R Square 0.715095399
Standard Error 7.816452413
Observations 12

ANOVA
df SS MS F Significance F
Regression 1 1747.947383 1747.947383 28.60941509 0.000324168
Residual 10 610.9692833 61.09692833
Total 11 2358.916667

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept -11.02047782 8.909954058 -1.236872575 0.244393811 -30.87309606 8.832140427
X Variable 1 3.197952218 0.597884757 5.348776972 0.000324168 1.865781732 4.530122704
Simple Regression 385

Education & Income

This regression is very significant; the F-value is 28.61. If
the X-variable explains very little of the Y-variable, you
should get an F-value that is 1 or less. In this case, the
explained variation (due to regression = explained by the
X-variable) is 28.61 times greater than the unexplained
(residual) variation. The probability of getting the sample
evidence or even a stronger relationship if X and Y are
unrelated (H0 is that X does not predict Y) is .000324168.
In other words, it is almost impossible to get this kind of
data as a result of chance.

The regression equation is:
Income = -11.02 + 3.20 (years of education).
Simple Regression 386

Education & Income

In theory, an individual with 0 years of education would make a negative income
of $11,020 (i.e., public assistance). Every year of education will increase income
by $3,200.

The correlation coefficient is .86, which is quite strong.
The coefficient of determination, r², is 74%. This indicates that the unexplained
variation is 26%.
One way to calculate r² is to take the ratio of the sum of squares regression /
sum of squares total:
SSREG/SST = 1747.947383 / 2358.916667 = .741

The Mean Square Error (or, using Excel terminology, MS Residual) is 61.0969.
The square root of this number, 7.8165, is the standard error of estimate and is
used for confidence intervals.

The mean square (MS) is the sum of squares (SS) divided by its degrees of
freedom.
Simple Regression 387

Example: Education & Income

• Another way to test the regression for significance is to test the b1
term (the slope term, which shows the effect of X on Y). This is done via a
t-test. The t-value is 5.348776972 and this is very, very significant.
The probability of getting a b1 of this magnitude if H0 is true (the null
hypothesis for this test is that β1 = 0, i.e., the X variable has no effect
on Y), or one indicating an even stronger relationship, is
0.000324168. Note that this is the same sig. level we got before for
the F-test. Indeed, the two tests give exactly the same results.
Testing the b1 term in simple regression is equivalent to testing the
entire regression. After all, there is only one X variable in simple
regression. In multiple regression we will see tests for the individual
bi terms and an F-test for the overall regression.
Simple Regression 388

Example: Education & Income

Prediction: According to the regression equation, how
much income would you predict for an individual with 18
years of education?

Income = -11.02 + 3.20(18)

Answer = 46.58 in thousands, which is $46,580.

Please note that there is sampling error, so the answer has a margin of
error. This is beyond the scope of this course, so we will not learn it.
CORRELATION
Correlation 390

Linear Correlation

• The topic of this lecture involves measuring the strength of the linear
relationship between two random variables (each with at least an interval
scale level of measurement).

• Researchers often wish to determine whether or not two
variables are related. For example, a researcher might be
interested in knowing whether or not there is a relationship
between how long people live (longevity) and the number of
calories consumed per day. Or, between hours spent on the
Internet and high school average; or hours spent studying and
grades on a statistics final. These are situations where
correlation might be appropriate.

• In this course we will be looking at linear correlation.
Correlation 391

Linear Correlation

• We will use a simple formula to compute r, the correlation coefficient,
from sample data. This correlation coefficient, r, ranges from -1 to +1.
• An r of +1 indicates a perfect positive linear relationship.
• An r of -1 indicates a perfect negative linear relationship.
• An r of 0 indicates absolutely no linear relationship.

• r, the sample correlation coefficient, is an estimate of the population
correlation coefficient, ρ (rho). We can compute ρ only if we take a
census of the entire population.
Correlation 392

A Positive Relationship

• A correlation coefficient, r, of +1 indicates a perfect positive linear
relationship between the two variables. In fact, if we draw a scatter plot placing
all the paired sample data on a graph, all the points would lie on a straight line.
Of course, in real life, one almost never encounters perfect relationships
between variables. For instance, it is certainly true that there is a very strong
positive relationship between hours studied and grades. However, there are
other variables that affect grades as well. Two students can each spend 20
hours studying for an exam and one will get a 100 on the exam and the other
will get an 80. This indicates that there is also random variation and/or other
variables that explain performance on a test (e.g., IQ, previous knowledge, test
taking ability, etc.).
Correlation 393

A Negative Relationship
• A correlation of -1 indicates a perfect negative
linear relationship (i.e., an inverse relationship).
In fact, in a scatter plot, all the points would lie on
a line with a downward slope.
Correlation 394

No Linear Correlation (r = 0)

• A correlation of 0 indicates absolutely no relationship between X and Y. In
real life, correlations of 0 are very rare. You might, rather, get a
correlation of .10 and it will not be significant, i.e., it is not statistically
different from 0. (There are ways to test correlations for significance.)

[Two scatter plots: one showing no relationship at all, one showing no linear relationship]
Correlation 395

Three Extreme Situations: r = +1, r = -1, and r = 0

[Three scatter plots illustrating r = +1, r = -1, and r = 0]
Correlation 396

Correlation and Causality

• Correlation does NOT imply causality. Here are four possible explanations for a
significant correlation:
• X causes Y
• Y causes X
• Z causes both X and Y
• Spurious correlation (a fluke)

• Examples:
• Poverty and crime are correlated. Which is the cause?
• 3% of older singles suffer from chronic depression; does being single cause depression?
Perhaps being depressed results in one being single. People do not want to marry unhappy
people.
• Cities with more cops also have more murders. Does 'more cops' cause 'more murders'? If
so, get rid of the cops!
• There is a strong inverse correlation between the amount of clothing people wear and the
weather; people wear more clothing when the temperature is low and less clothing when it is
high. Therefore, a good way to make the temperature go up during a winter cold spell is for
everyone to wear very little clothing and go outside.
• There is a strong correlation between the number of umbrellas people are carrying and the
amount of rain. Thus, the way to make it rain is for all of us to go outside carrying umbrellas!
Correlation 397

Coefficient of Determination, R2
• The coefficient of determination, R2 (in Excel, it is called R-squared) is also an
important measure. It ranges from 0 to 1.0 (or, 0% to 100%) and measures
the proportion of the variation in Y explained by X. R2 is actually equal to (r)2,
or in other words, the square of the correlation coefficient.

• When you do correlation, you generally do not worry about which is the X and
which is the Y variable since you have no interest in predicting Y from X. In
regression, where you want to see an equation relating X to Y, you must
specify which is the Y-variable (dependent variable) and which is the X-
variable (independent variable). The correlation coefficient is the same
regardless of which variable is X and which one is Y. The regression equation
will be different if you reverse the X and Y.
Correlation 398

Coefficient of Determination, R2
• If all the points are on the line, r = 1 (or -1 if there is an inverse relationship),
then R2 is 100%. This means that all of the variation in Y is explained by
(variations in) X. This indicates that X does a perfect job in explaining Y and
there is no unexplained variation.

• Thus, if r = .30 (or -.30), then R² = .09. Only 9% of the variation in Y is
explained by X and 91% is unexplained. This is why a correlation coefficient
of .30 is considered weak—even if it is significant.
• If r = .50 (or -.50), then R2 = .25. 25% of the variation in Y is explained by X
and 75% is unexplained. This is why a correlation coefficient of .50 is
considered moderate.
• If r = .90 (or -.90), then R2 = .81. 81% of the variation in Y is explained by X
and 19% is unexplained. This is why a correlation coefficient of .90 is
considered very strong.
Correlation 399

Computing the Correlation Coefficient

r = [n∑XY − (∑X)(∑Y)] / √{[n∑X² − (∑X)²][n∑Y² − (∑Y)²]}

• The input data consists of pairs of numbers, Xi and Yi. It does not
matter which variable you call X and which variable you call Y when
you are doing correlation. (That will become important when we study
regression.)

• To compute r, you need n (number of pairs of observations) and the
following summations:
• ∑Xi, ∑Yi, ∑XiYi, ∑Xi², and ∑Yi²

• This is very easy to compute in a spreadsheet. For simplicity, we
have removed the subscripts. Note that ∑Xi² is not equal to (∑Xi)².
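A minimal Python sketch (ours) of this formula; run on the data of Example 2 below (hours studied vs. grades) it reproduces r ≈ .97.

```python
import math

def pearson_r(x: list, y: list) -> float:
    """Correlation coefficient r from the summation formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx**2) * (n * syy - sy**2)
    )

# Example 2 below: hours studied (X) vs. grade on quiz (Y)
hours  = [10, 8, 9, 8, 7, 6, 7, 4, 2, 1]
grades = [100, 95, 90, 80, 70, 65, 60, 40, 30, 20]

r = pearson_r(hours, grades)
print(f"r = {r:.2f}, R² = {r*r:.2%}")   # r = 0.97, R² ≈ 94%
```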
Correlation 400

Example 1: Grade and Height

Y (Grade)    100  95  90  80  70  65  60  40  30  20
X (Height)    73  79  62  69  74  77  81  63  68  74

∑Xi = 720
∑Yi = 650
∑XiYi = 46,990
∑Xi² = 52,210
∑Yi² = 49,150
Correlation 401

Example 1: Grade and Height (cont'd)

• r = [10(46,990) − (720)(650)] / √{[10(52,210) − (720)²][10(49,150) − (650)²]}
    = 1,900 / √((3,700)(69,000)) = .1189

• R² = 1.4%

• The correlation coefficient is not significant (for now, you have to trust me on this;
there is a t-test we can learn later to test for significance). The correlation
coefficient, r, of .1189 is not significantly different from 0. Thus, there is no
relationship between height and grades. Correlation coefficients of less than .30 are
generally considered very weak and of little practical importance even if they turn
out to be significant. If you go back to the scatter plot, you will note that the X and Y
do not seem to be related.
• Keep in mind that r is based on a sample of 10 students. The population consists of millions
of students. There is sampling error in measuring r. Therefore, we need a statistical test to
determine whether the sample correlation coefficient is significantly different from 0.
Correlation 402

Example 2: Grades and Hours Studied

Y (Grade)           100  95  90  80  70  65  60  40  30  20
X (Hours Studied)    10   8   9   8   7   6   7   4   2   1

∑Xi = 62
∑Yi = 650
∑XiYi = 4,750
∑Xi² = 464
∑Yi² = 49,150
Correlation 403

Example 2: Grades and Hours Studied (cont'd)

• The scatter plot indicated a very strong positive linear relationship.

• r = [10(4,750) − (62)(650)] / √{[10(464) − (62)²][10(49,150) − (650)²]}
    = 7,200 / √((796)(69,000)) = .97

• R² = 94.09%

• The correlation coefficient is significant. (Again, you have to trust
me on this. To test the significance of the correlation coefficient, a t-
test can be done.) A correlation coefficient of .97 is almost perfect.
Thus, there is a significant relationship between hours studied and
grades. Correlation coefficients of more than .80 are generally
considered very strong and of great practical importance.
Correlation 404

Example 3: Price and Quantity

X (Price)   Y (Quantity Demanded)
$2          95
3           90
4           84
5           80
6           74
7           69
8           62
9           60
10          63
11          50
12          44

∑Xi = 77
∑Yi = 771
∑XiYi = 4,867
∑Xi² = 649
∑Yi² = 56,667
Correlation 405

Example 3: Price and Quantity (cont'd)

• r = [11(4,867) − (77)(771)] / √{[11(649) − (77)²][11(56,667) − (771)²]}
    = −5,830 / √((1,210)(28,896))

• r = -.99; R² = 98.01%

• To test the significance of the correlation coefficient, a t-
test can be done. The correlation coefficient is significant
(again, you have to trust me on this). A correlation
coefficient of -.99 is almost perfect. Thus, there is a
significant and strong inverse relationship between price
and quantity demanded.
Correlation 406

Example 4: Attractiveness and Salary

The higher the score, the more attractive the employee.


Attractiveness Starting Salary
Score (in thousands)
0 20
1 24
2 25
3 26
4 20
∑Xi = 45 5 30
∑Yi = 289 6 32
7 38
∑XiYi = 1,472 8 34
∑Xi2 = 285 9 40
∑Yi2 = 8,801
Correlation 407

Example 4: Attractiveness and Salary (cont'd)

• r = [10(1,472) − (45)(289)] / √{[10(285) − (45)²][10(8,801) − (289)²]}
    = 1,715 / √((825)(4,489)) = .891

• R² = 79.39%

• To test the significance of the correlation coefficient, a
t-test can be done. We will learn how to use Excel to
test for significance. The correlation coefficient is
significant (again, you have to trust me on this). A
correlation coefficient of .891 is strong. Thus, there is
a significant and strong relationship between
attractiveness and starting salary.
Correlation 408

Do Your Correlation Homework

• As always – practice, practice, practice!

Multiple Regression

Multiple regression analysis is a straightforward extension
of simple regression analysis which allows more than one
independent variable.
410

Multiple Linear Regression Models

Introduction
• Many applications of regression
analysis involve situations in which
there is more than one regressor
variable.

• A regression model that contains more
than one regressor variable is called a
multiple regression model.
411

Multiple Linear Regression Models

1.1 Introduction
• For example, suppose that the effective life of a cutting
tool depends on the cutting speed and the tool angle. A
possible multiple regression model could be

Y = β0 + β1x1 + β2x2 + ε

where
Y – tool life
x1 – cutting speed
x2 – tool angle
412

Multiple Linear Regression Models

1.1 Introduction

Figure 12-1 (a) The regression plane for the model E(Y) = 50 +
10x1 + 7x2. (b) The contour plot
413

1: Multiple Linear Regression Models

1.1 Introduction
414

1: Multiple Linear Regression Models

1.1 Introduction

Figure 2 (a) Three-dimensional plot of the regression model
E(Y) = 50 + 10x1 + 7x2 + 5x1x2. (b) The contour plot
415

1: Multiple Linear Regression Models

1.1 Introduction

Figure 3 (a) Three-dimensional plot of the regression model
E(Y) = 800 + 10x1 + 7x2 – 8.5x1² – 5x2² + 4x1x2. (b) The contour plot
416

Multiple Linear Regression Models


Least Squares Estimation of the Parameters
417

Multiple Linear Regression Models

1.2 Least Squares Estimation of the Parameters

• The least squares function is given by

L = Σ εi² = Σ (yi − β0 − β1xi1 − β2xi2 − … − βkxik)²

• The least squares estimates must satisfy ∂L/∂βj = 0 for j = 0, 1, …, k.
418

Multiple Linear Regression Models

1.2 Least Squares Estimation of the Parameters

• Setting these partial derivatives to zero yields the least squares
normal equations, one equation for each coefficient.

• The solution to the normal equations gives the least
squares estimators of the regression coefficients.
419

Multiple Linear Regression Models

Example 1: The following data give the pull strength of a wire bond in a
semiconductor manufacturing process, along with wire length and die height,
to illustrate building an empirical model.
420

Multiple Linear Regression Models

Figure 4 Matrix of scatter plots (from Minitab) for the wire bond
pull strength data in Table 12-2.
421

Multiple Linear Regression Models

Example 1
422

Multiple Linear Regression Models


Example 1
423

Multiple Linear Regression Models

Example 1
424

Multiple Linear Regression Models

1.3 Matrix Approach to Multiple Linear Regression

Suppose the model relating the regressors to the
response is

yi = β0 + β1xi1 + β2xi2 + … + βkxik + εi,   i = 1, 2, …, n

In matrix notation this model can be written as

y = Xβ + ε

425

Multiple Linear Regression Models

Matrix Approach to Multiple Linear Regression

where y is an (n × 1) vector of observations, X is an (n × p) matrix of the
levels of the regressor variables (with a leading column of 1s), β is a
(p × 1) vector of regression coefficients, and ε is an (n × 1) vector of
random errors.
426

Multiple Linear Regression Models

Matrix Approach to Multiple Linear Regression

We wish to find the vector of least squares
estimators that minimizes:

L = ε′ε = (y − Xβ)′(y − Xβ)

The resulting least squares estimate is

β̂ = (X′X)⁻¹ X′y
427

Multiple Linear Regression Models

Matrix Approach to Multiple Linear Regression


428

Multiple Linear Regression Models
Example 2 (worked matrix computations shown as figures on the original slides 429–432)
2. Analysis of Variance
(ANOVA)
One-Way Analysis of Variance

• Evaluate the difference among the means of three or more groups
  Examples: Average production for the 1st, 2nd, and 3rd shifts
  Expected mileage for five brands of tires

• Assumptions
  • Populations are normally distributed
  • Populations have equal variances
  • Samples are randomly and independently drawn
Hypotheses of One-Way ANOVA
• H0: μ1 = μ2 = μ3 = ⋯ = μK
  • All population means are equal
  • i.e., no variation in means between groups

• H1: μi ≠ μj for at least one i, j pair
  • At least one population mean is different
  • i.e., there is variation between groups
  • Does not mean that all population means are different (some pairs may be the same)
One-Way ANOVA
H0: μ1 = μ2 = μ3 = ⋯ = μK
H1: Not all μi are the same

All means are the same:
The null hypothesis is true (no variation between groups)
μ1 = μ2 = μ3

One-Way ANOVA (continued)
H0: μ1 = μ2 = μ3 = ⋯ = μK
H1: Not all μi are the same

At least one mean is different:
The null hypothesis is NOT true (variation is present between groups)
μ1 = μ2 ≠ μ3   or   μ1 ≠ μ2 ≠ μ3
Variability
• The variability of the data is a key factor in testing the equality of means
• In each case below, the means may look different, but the large variation within groups in case B makes the evidence that the means are different weak

[Figure: dot plots of groups A, B, C in two cases — case A: small variation within groups; case B: large variation within groups]
Partitioning the Variation
• Total variation can be split into two parts:

SST = SSW + SSG

SST = Total Sum of Squares
  Total Variation = the aggregate dispersion of the individual data values across the various groups
SSW = Sum of Squares Within Groups
  Within-Group Variation = dispersion that exists among the data values within a particular group
SSG = Sum of Squares Between Groups
  Between-Group Variation = dispersion between the group sample means
Partition of Total Variation

Total Sum of Squares (SST) = Variation due to random sampling (SSW) + Variation due to differences between groups (SSG)
Total Sum of Squares
SST = SSW + SSG

SST = Σi=1..K Σj=1..ni (xij − x̄)²

Where:
SST = total sum of squares
K = number of groups (levels or treatments)
ni = number of observations in group i
xij = jth observation from group i
x̄ = overall sample mean
Total Variation (continued)

SST = (x11 − x̄)² + (x12 − x̄)² + … + (xKnK − x̄)²

[Figure: response X plotted for Groups 1–3, with deviations measured from the overall mean x̄]
Within-Group Variation
SST = SSW + SSG

SSW = Σi=1..K Σj=1..ni (xij − x̄i)²

Where:
SSW = sum of squares within groups
K = number of groups
ni = sample size from group i
x̄i = sample mean from group i
xij = jth observation in group i
Within-Group Variation (continued)

SSW = Σi=1..K Σj=1..ni (xij − x̄i)²

Summing the variation within each group and then adding over all groups gives SSW.

Mean Square Within = SSW/degrees of freedom:
MSW = SSW / (n − K)
Within-Group Variation (continued)

SSW = (x11 − x̄1)² + (x12 − x̄1)² + … + (xKnK − x̄K)²

[Figure: response X for Groups 1–3, with deviations measured from each group mean x̄1, x̄2, x̄3]
Between-Group Variation
SST = SSW + SSG

SSG = Σi=1..K ni (x̄i − x̄)²

Where:
SSG = sum of squares between groups
K = number of groups
ni = sample size from group i
x̄i = sample mean from group i
x̄ = grand mean (mean of all data values)
Between-Group Variation (continued)

SSG = Σi=1..K ni (x̄i − x̄)²

Variation due to differences between groups.

Mean Square Between Groups = SSG/degrees of freedom:
MSG = SSG / (K − 1)
Between-Group Variation (continued)

SSG = n1(x̄1 − x̄)² + n2(x̄2 − x̄)² + … + nK(x̄K − x̄)²

[Figure: response X for Groups 1–3, with each group mean x̄1, x̄2, x̄3 measured against the grand mean x̄]
Obtaining the Mean Squares

MST = SST / (n − 1)
MSW = SSW / (n − K)
MSG = SSG / (K − 1)

(A computational sketch of this partition follows below.)
One-Way ANOVA Table

Source of Variation | SS              | df    | MS (Variance)     | F ratio
Between Groups      | SSG             | K − 1 | MSG = SSG/(K − 1) | F = MSG/MSW
Within Groups       | SSW             | n − K | MSW = SSW/(n − K) |
Total               | SST = SSG + SSW | n − 1 |                   |

K = number of groups
n = sum of the sample sizes from all groups
df = degrees of freedom
One-Factor ANOVA
F Test Statistic

H0: μ1 = μ2 = … = μK
H1: At least two population means are different

• Test statistic: F = MSG / MSW
  MSG is the mean square between groups
  MSW is the mean square within groups
• Degrees of freedom
  • df1 = K − 1 (K = number of groups)
  • df2 = n − K (n = sum of sample sizes from all groups)
Interpreting the F Statistic

• The F statistic is the ratio of the between-groups estimate of variance to the within-groups estimate of variance
• The ratio must always be positive
• df1 = K − 1 will typically be small
• df2 = n − K will typically be large

Decision Rule: Reject H0 if F > FK−1,n−K,α

[Figure: F distribution with rejection region of area α to the right of the critical value FK−1,n−K,α]
One-Factor ANOVA
F Test Example

You want to see whether three different beer production lines produce different volumes. You randomly select five volume measurements from each line. At the .05 significance level, is there a difference in mean volume?

Line 1  Line 2  Line 3
254     234     200
263     218     222
241     235     197
237     227     206
251     216     204
One-Factor ANOVA Example: Scatter Diagram

[Figure: scatter diagram of the volumes for Lines 1–3, showing each group mean and the grand mean]

x̄1 = 249.2   x̄2 = 226.0   x̄3 = 205.8   x̄ = 227.0
One-Factor ANOVA Example: Computations

x̄1 = 249.2, n1 = 5
x̄2 = 226.0, n2 = 5
x̄3 = 205.8, n3 = 5
n = 15, K = 3, x̄ = 227.0

SSG = 5(249.2 − 227)² + 5(226 − 227)² + 5(205.8 − 227)² = 4716.4
SSW = (254 − 249.2)² + (263 − 249.2)² + … + (204 − 205.8)² = 1119.6

MSG = 4716.4 / (3 − 1) = 2358.2
MSW = 1119.6 / (15 − 3) = 93.3

F = 2358.2 / 93.3 = 25.275
One-Factor ANOVA Example: Solution

H0: μ1 = μ2 = μ3
H1: μi not all equal
α = .05
df1 = 2, df2 = 12

Test Statistic: F = MSG/MSW = 2358.2/93.3 = 25.275
Critical Value: F2,12,.05 = 3.89

Decision: Since F = 25.275 > 3.89, reject H0 at α = .05
Conclusion: There is evidence that at least one μi differs from the rest
ANOVA -- Single Factor:

EXCEL: tools | data analysis | ANOVA: single factor

Source of Variation | SS     | df | MS     | F      | P-value  | F crit
Between Groups      | 4716.4 | 2  | 2358.2 | 25.275 | 4.99E-05 | 3.89
Within Groups       | 1119.6 | 12 | 93.3   |        |          |
Total               | 5836.0 | 14 |        |        |          |
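For readers working outside Excel, here is a minimal Python sketch (added to these notes) that reproduces the same F statistic and p-value with scipy; the three groups are exactly the fifteen measurements from the example:

```python
from scipy import stats

line1 = [254, 263, 241, 237, 251]
line2 = [234, 218, 235, 227, 216]
line3 = [200, 222, 197, 206, 204]

# One-way ANOVA: F = MSG/MSW with df1 = K - 1 = 2, df2 = n - K = 12
f_stat, p_value = stats.f_oneway(line1, line2, line3)
print(f"F = {f_stat:.3f}, p = {p_value:.2e}")   # F ≈ 25.275, p ≈ 4.99e-05
```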
Two-Way Analysis of Variance

• Examines the effect of
  • Two factors of interest on the dependent variable
    • e.g., percent carbonation and line speed on a soft drink bottling process
  • Interaction between the different levels of these two factors
    • e.g., does the effect of one particular carbonation level depend on the level at which the line speed is set?
Two-Way ANOVA (continued)

• Assumptions
  • Populations are normally distributed
  • Populations have equal variances
  • Independent random samples are drawn
Randomized Block Design

Two factors of interest: A and B
K = number of groups (levels of factor A)
H = number of blocks (levels of factor B, sometimes called a blocking variable)

Block | Group 1 | Group 2 | … | Group K
1     | x11     | x21     | … | xK1
2     | x12     | x22     | … | xK2
⋮     | ⋮       | ⋮       |   | ⋮
H     | x1H     | x2H     | … | xKH
Two-Way Notation
• Let xji denote the observation in the jth group and ith block
• Suppose that there are K groups and H blocks, for a total of n = KH observations
• Let the overall mean be x̄
• Denote the group sample means by x̄j• (j = 1, 2, …, K)
• Denote the block sample means by x̄•i (i = 1, 2, …, H)
Partition of Total Variation
• SST = SSG + SSB + SSE

Total Sum of Squares (SST) = Variation due to differences between groups (SSG) + Variation due to differences between blocks (SSB) + Variation due to random sampling / unexplained error (SSE)

The error terms are assumed to be independent, normally distributed, and to have the same variance.
Two-Way Sums of Squares

• The sums of squares are (with degrees of freedom):

Total:          SST = Σj=1..K Σi=1..H (xji − x̄)²                   df = n − 1
Between groups: SSG = H Σj=1..K (x̄j• − x̄)²                        df = K − 1
Between blocks: SSB = K Σi=1..H (x̄•i − x̄)²                        df = H − 1
Error:          SSE = Σj=1..K Σi=1..H (xji − x̄j• − x̄•i + x̄)²     df = (K − 1)(H − 1)
Two-Way Mean Squares
• The mean squares are

MST = SST / (n − 1)
MSG = SSG / (K − 1)
MSB = SSB / (H − 1)
MSE = SSE / [(K − 1)(H − 1)]
Two-Way ANOVA: The F Test Statistics

F Test for Groups
H0: the K population group means are all the same
F = MSG / MSE;  reject H0 if F > FK−1,(K−1)(H−1),α

F Test for Blocks
H0: the H population block means are the same
F = MSB / MSE;  reject H0 if F > FH−1,(K−1)(H−1),α
General Two-Way Table Format

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Squares               | F Ratio
Between groups      | SSG            | K − 1              | MSG = SSG/(K − 1)          | MSG/MSE
Between blocks      | SSB            | H − 1              | MSB = SSB/(H − 1)          | MSB/MSE
Error               | SSE            | (K − 1)(H − 1)     | MSE = SSE/[(K − 1)(H − 1)] |
Total               | SST            | n − 1              |                            |
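As a concrete check of these formulas, here is a minimal Python sketch (added to these notes; the tiny K = 3 by H = 4 data grid is hypothetical) that computes SSG, SSB, and SSE for a randomized block design directly from the definitions:

```python
import numpy as np

# Hypothetical responses: H = 4 blocks (rows) x K = 3 groups (columns)
x = np.array([[31., 35., 30.],
              [33., 36., 32.],
              [29., 33., 28.],
              [34., 38., 33.]])
H, K = x.shape

grand = x.mean()
group_means = x.mean(axis=0)   # x̄_j. for each group (column)
block_means = x.mean(axis=1)   # x̄_.i for each block (row)

sst = ((x - grand) ** 2).sum()
ssg = H * ((group_means - grand) ** 2).sum()
ssb = K * ((block_means - grand) ** 2).sum()
sse = sst - ssg - ssb          # equals the double-sum definition of SSE

msg, msb = ssg / (K - 1), ssb / (H - 1)
mse = sse / ((K - 1) * (H - 1))
print(f"F groups = {msg / mse:.2f}, F blocks = {msb / mse:.2f}")
```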
Estimators and Estimates
• Typically, we can't observe the full population, so we must make inferences based on estimates from a random sample

• An estimator is just a mathematical formula for estimating a population parameter from sample data

• An estimate is the actual number the formula produces from the sample data
Examples of Estimators

• Suppose we want to estimate the population mean
• Suppose we use the formula for E(Y), but substitute 1/n for f(yi) as the probability weight, since each point has an equal chance of being included in the sample
• We can then calculate the sample average for our sample:

Ȳ = (1/n) Σi=1..n Yi
What Makes a Good Estimator?
• Unbiasedness
• Efficiency
• Mean Square Error (MSE)
• Asymptotic properties (for large samples):
• Consistency
Unbiasedness of Estimator

• Want your estimator to be right, on average
• We say an estimator, W, of a population parameter, θ, is unbiased if E(W) = θ
• For our example, that means we want

E(Ȳ) = μY
Proof: Sample Mean is Unbiased

E(Ȳ) = E[(1/n) Σi=1..n Yi] = (1/n) Σi=1..n E(Yi) = (1/n) Σi=1..n μY = (1/n)·nμY = μY
Efficiency of Estimator

• Want your estimator to be closer to the truth, on average, than any other estimator
• We say an unbiased estimator, W, is efficient if Var(W) ≤ Var(any other unbiased estimator)
• Note, for our example:

Var(Ȳ) = Var[(1/n) Σi=1..n Yi] = (1/n²) Σi=1..n σ² = σ²/n
MSE of Estimator

• What if we can't find an unbiased estimator?
• Define the mean square error as MSE = E[(W − θ)²]
• We get a trade-off between unbiasedness and efficiency, since expanding the square gives MSE = variance + bias²
• For our example, that means minimizing

E[(Ȳ − μY)²] = Var(Ȳ) + [E(Ȳ) − μY]²
Consistency of Estimator

• Asymptotic properties: what happens as the sample size goes to infinity?
• Want the distribution of W to converge to θ, i.e., plim(W) = θ
• For our example, that means we want

P(|Ȳ − μY| > ε) → 0 as n → ∞
More on Consistency
• An unbiased estimator is not necessarily consistent – suppose we choose Y1 as the estimate of μY; since E(Y1) = μY it is unbiased, but plim(Y1) ≠ μY
• An unbiased estimator, W, is consistent if Var(W) → 0 as n → ∞
• The Law of Large Numbers refers to the consistency of the sample average as an estimator for μ, that is, to the fact that:

plim(Ȳ) = μY
476
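A short simulation makes the Ȳ-versus-Y1 contrast vivid; this sketch is an addition to the notes, not from the original slides. Both estimators are unbiased, but only Ȳ's variance shrinks with n:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, reps = 5.0, 2.0, 10_000

for n in (10, 100, 1000):
    samples = rng.normal(mu, sigma, size=(reps, n))
    y_bar = samples.mean(axis=1)   # sample mean: unbiased AND consistent
    y1 = samples[:, 0]             # first observation: unbiased but NOT consistent
    print(f"n={n:5d}  mean(Ybar)={y_bar.mean():.3f}  "
          f"var(Ybar)={y_bar.var():.4f}  var(Y1)={y1.var():.3f}")
# var(Ybar) shrinks like sigma^2/n, while var(Y1) stays near sigma^2 = 4
```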
Central Limit Theorem

• Asymptotic normality implies that P(Z ≤ z) → Φ(z) as n → ∞, i.e., P(Z ≤ z) ≈ Φ(z) for large n
• The central limit theorem states that the standardized average of any population with mean μ and variance σ² is asymptotically N(0,1), or

Z = (Ȳ − μY) / (σ/√n)  ~ᵃ  N(0, 1)
477
Estimate of Population Variance

• We have a good estimate of μY; we would like a good estimate of σ²Y
• We can use the sample variance given below – note the division by n − 1, not n, since the mean is estimated too – if μ were known, we could divide by n

S² = [1/(n − 1)] Σi=1..n (Yi − Ȳ)²
478
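The n − 1 divisor can be checked by simulation; this sketch (an addition, not from the slides) shows that dividing by n systematically underestimates σ² while dividing by n − 1 does not:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, n, reps = 4.0, 10, 100_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2_n   = samples.var(axis=1, ddof=0)   # divide by n   (biased)
s2_nm1 = samples.var(axis=1, ddof=1)   # divide by n-1 (unbiased)

print(f"true sigma^2 = {sigma2}")
print(f"mean of /n   estimator: {s2_n.mean():.3f}")    # ≈ sigma^2 (n-1)/n = 3.6
print(f"mean of /n-1 estimator: {s2_nm1.mean():.3f}")  # ≈ 4.0
```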
Estimators as Random Variables

• Each of our sample statistics (e.g., the sample mean, sample variance, etc.) is a random variable – why?
• Each time we pull a random sample, we'll get different sample statistics
• If we pull lots and lots of samples, we'll get a distribution of sample statistics (a sketch of this sampling-distribution idea follows below)
479
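The following simulation (added here for illustration; not part of the original deck) draws many samples, computes the mean of each, and shows the resulting sampling distribution behaving as the central limit theorem predicts, even for a skewed population:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
reps, n = 10_000, 50

# Draw from a skewed (exponential) population with mean 1 and variance 1
samples = rng.exponential(scale=1.0, size=(reps, n))
means = samples.mean(axis=1)   # 10,000 realizations of the statistic Ybar

# Standardize each sample mean: Z = (Ybar - mu) / (sigma / sqrt(n))
z = (means - 1.0) / (1.0 / np.sqrt(n))

# Compare the simulated distribution of Z with N(0,1)
print(f"mean(Z) = {z.mean():+.3f}, sd(Z) = {z.std():.3f}")          # ≈ 0 and 1
print(f"P(Z <= 1.96) simulated = {(z <= 1.96).mean():.3f}, "
      f"normal = {stats.norm.cdf(1.96):.3f}")
```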
Categorical Data Analysis

1. Qualitative random variables yield responses that can be put in categories. Example: gender (male, female)
2. Measurements or counts reflect the number in each category
3. Nominal scale (no order) or ordinal scale (order)
4. Data can be collected as continuous but recoded to categorical data. Example: systolic blood pressure recoded as hypotension, normotension, hypertension

χ² TEST OF INDEPENDENCE BETWEEN 2 CATEGORICAL VARIABLES
481

Hypothesis Tests for Qualitative Data

• 1 population → proportion → Z test
• 2 populations → proportion → Z test or χ² test
• 2 or more populations → independence → χ² test
482
χ² Test of Independence
1. Shows whether a relationship exists between 2 qualitative variables, but does not show causality
2. Assumptions
   • Multinomial experiment
   • All expected counts ≥ 5
3. Uses a two-way contingency table
483
χ² Test of Independence: Contingency Table

1. Shows the number of observations from one sample jointly classified by 2 qualitative variables

Levels of variable 1 (disease status) in the rows; levels of variable 2 (residence) in the columns:

Disease Status | Urban | Rural | Total
Disease        | 63    | 49    | 112
No disease     | 15    | 33    | 48
Total          | 78    | 82    | 160
485
χ² Test of Independence: Hypotheses & Statistic

1. Hypotheses
   • H0: variables are independent
   • Ha: variables are related (dependent)

2. Test statistic

χ² = Σ over all cells (Oij − Eij)² / Eij

where Oij = observed count and Eij = expected count in cell (i, j)
487
Degrees of freedom: (r − 1)(c − 1), where r = number of rows and c = number of columns
488
χ² Test of Independence: Expected Counts

1. Statistical independence means the joint probability equals the product of the marginal probabilities
2. Compute the marginal probabilities and multiply for the joint probability
3. The expected count is the sample size times the joint probability
489
Expected Count Example

Disease Status | Urban Obs. | Rural Obs. | Total
Disease        | 63         | 49         | 112
No disease     | 15         | 33         | 48
Total          | 78         | 82         | 160

Marginal probability (disease) = 112/160
Marginal probability (urban) = 78/160
Joint probability (disease and urban) = (112/160)(78/160)
Expected count = 160 · (112/160)(78/160) = 54.6
493
Expected Count Calculation

Expected count = (Row total × Column total) / Sample size

Disease Status | Urban Obs. | Urban Exp.         | Rural Obs. | Rural Exp.         | Total
Disease        | 63         | 112×78/160 = 54.6  | 49         | 112×82/160 = 57.4  | 112
No disease     | 15         | 48×78/160 = 23.4   | 33         | 48×82/160 = 24.6   | 48
Total          | 78         | 78                 | 82         | 82                 | 160
494
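The whole table of expected counts is one outer product of the margins; a minimal Python sketch (added for illustration):

```python
import numpy as np

observed = np.array([[63, 49],
                     [15, 33]])            # disease x residence table

row_tot = observed.sum(axis=1)             # [112, 48]
col_tot = observed.sum(axis=0)             # [78, 82]
n = observed.sum()                         # 160

expected = np.outer(row_tot, col_tot) / n  # (row total x column total) / sample size
print(expected)                            # [[54.6 57.4] [23.4 24.6]]
```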
χ² Test of Independence: Example on HIV

• You randomly sample 286 sexually active individuals and collect information on their HIV status and history of STDs. At the .05 level, is there evidence of a relationship?

STDs Hx | HIV No | HIV Yes | Total
No      | 84     | 32      | 116
Yes     | 48     | 122     | 170
Total   | 132    | 154     | 286
495
χ² Test of Independence: Solution

H0: no relationship (variables are independent)
Ha: relationship (variables are dependent)
α = .05
df = (2 − 1)(2 − 1) = 1
Critical value: χ²1,.05 = 3.841; reject H0 if χ² > 3.841
499
500

Check the assumption that E(nij) ≥ 5 in all cells:

STDs Hx | HIV No Obs. | HIV No Exp.         | HIV Yes Obs. | HIV Yes Exp.        | Total
No      | 84          | 116×132/286 = 53.5  | 32           | 116×154/286 = 62.5  | 116
Yes     | 48          | 170×132/286 = 78.5  | 122          | 170×154/286 = 91.5  | 170
Total   | 132         | 132                 | 154          | 154                 | 286
501

χ² Test of Independence: Solution

χ² = Σ over all cells (Oij − Eij)² / Eij
   = (84 − 53.5)²/53.5 + (32 − 62.5)²/62.5 + (48 − 78.5)²/78.5 + (122 − 91.5)²/91.5
   = 54.29
χ² Test of Independence: Solution (cont'd)

H0: no relationship          Test statistic: χ² = 54.29
Ha: relationship             Critical value: χ²1,.05 = 3.841
α = .05
df = (2 − 1)(2 − 1) = 1

Decision: Since χ² = 54.29 > 3.841, reject H0 at α = .05
Conclusion: There is evidence of a relationship between history of STDs and HIV status
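The same test takes a few lines in Python; this sketch (added, not from the slides) uses scipy's chi2_contingency on the HIV table. Passing correction=False turns off Yates' continuity correction, which scipy would otherwise apply to a 2×2 table, so the result matches the uncorrected hand computation:

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[84, 32],
                     [48, 122]])   # STDs Hx (No/Yes) x HIV status (No/Yes)

# correction=False matches the uncorrected hand computation above
chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.1e}")
# chi2 ≈ 54.15 (the slides' 54.29 reflects rounding the expected counts
# to one decimal); p << .05, so H0 is rejected here as well
print(expected)   # approximately [[53.54 62.46] [78.46 91.54]]
```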
504