Preliminary Concepts On Statistical Inference
Statistics plays an important role in applied scientific research. To obtain data
or information about a population, a researcher may use only a portion of it, which
at least reduces the cost and time involved. Statistics offers varied tools and
techniques that help the researcher draw reliable and valid inferences, or
generalizations, about the population using the sample as the basis.
Statistical Estimation
Inferential statistics make this work easier. Imagine, for instance, that the
population of the department cited in the earlier example was 1,000 students
distributed among seven programs. It would be extremely laborious for the
researchers to involve all the students. They can make their work easier by drawing
a representative sample from each program, proportional to that program's
population size. Applying Slovin's formula with a 5% margin of error, the total
sample size is only 286 students. Even so, they can report the results as
reflective of the adjustment of all the freshman students.
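The 286 can be checked with Slovin's formula, n = N / (1 + Ne²). The sketch below also splits the sample proportionally across the programs. The helper names `slovin_sample_size` and `proportional_allocation` are hypothetical, the seven program sizes are invented for illustration (the actual sizes are not given here), and rounding each share to the nearest student is an assumption; in general the rounded shares may not add up to exactly n.

```python
import math

def slovin_sample_size(population: int, margin_of_error: float) -> int:
    """Slovin's formula n = N / (1 + N * e^2), rounded up to a whole respondent."""
    return math.ceil(population / (1 + population * margin_of_error ** 2))

def proportional_allocation(sizes: list[int], n: int) -> list[int]:
    """Give each program a share of n proportional to its size (rounded)."""
    total = sum(sizes)
    return [round(n * size / total) for size in sizes]

# Hypothetical sizes of the seven programs (they sum to 1,000 students).
programs = [200, 150, 150, 150, 120, 130, 100]
n = slovin_sample_size(1000, 0.05)
allocation = proportional_allocation(programs, n)
print(n)           # 286
print(allocation)  # [57, 43, 43, 43, 34, 37, 29]
```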
Statistical inference also deals with the concepts of parameters and statistics,
which are involved in estimation and, further, in hypothesis testing.
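Statistics are the computed measures of a sample. The sample mean is symbolized
as x̄ and the sample standard deviation as s. Parameters are the corresponding
measures of the population, such as the population mean μ and the population
standard deviation σ. The term statistics is synonymous with estimates.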
The second property is independence. This means that the chance of one
member being drawn does not affect the chances of the other members getting
chosen. For example, in a population where there are father and son members not
necessarily paired, when the father is drawn, this does not mean that the son will
automatically be included in the sample. Hence, the selection of the father is
independent of the inclusion of the son in this sample selection.
We know from Chapter 2 that sampling can be broken down into two types. The
first is nonprobability sampling, in which there is no way of estimating the
probability that each individual or element will be included in the sample. Examples
of this type are accidental sampling, quota sampling and purposive sampling. Its
advantages are convenience and economy.
The second type is probability sampling, in which every individual has a known,
nonzero chance (in simple random sampling, an equal chance) of becoming part of the
sample. Examples of this type include simple random sampling, stratified random
sampling, cluster sampling, systematic sampling and multistage sampling. Note also
that whenever sampling is not carried out as in probability sampling, the result
can be a biased sample.
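To make probability sampling concrete, here is a minimal sketch of two of the designs listed, simple random sampling and (circular) systematic sampling, using Python's standard `random` module. The population of 1,000 numbered students, the seed and the function names are hypothetical, chosen only for illustration.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely; drawn without replacement."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Circular systematic sampling: random start, then every k-th element, k = N // n."""
    N = len(population)
    k = N // n
    start = rng.randrange(N)
    # Because n * k <= N, the n positions are distinct modulo N.
    return [population[(start + i * k) % N] for i in range(n)]

rng = random.Random(42)           # fixed seed so the draw is reproducible
students = list(range(1, 1001))   # hypothetical student ID numbers 1..1000
srs = simple_random_sample(students, 286, rng)
sys_sample = systematic_sample(students, 286, rng)
print(len(srs), len(set(srs)))    # 286 286
print(len(sys_sample))            # 286
```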
The sampling error is the difference between a parameter value and its
corresponding statistic. Suppose that someone administered an intelligence test
to a random sample of incoming freshmen in a certain college and computed the
mean. This statistic, the sample mean x̄, is an estimate of the parameter μ, the
population mean. The sampling error, denoted e, is the difference between the
population mean and the sample mean; thus μ − x̄ = e.
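The definition e = μ − x̄ can be illustrated with a simulated population. In this sketch the 500 "intelligence test scores" are randomly generated (mean 100, standard deviation 15 are arbitrary assumptions), so μ is known and the sampling error of one random sample of 40 can be computed directly. In practice μ is unknown, which is exactly why e cannot normally be observed.

```python
import random
import statistics

# Hypothetical population: intelligence-test scores of 500 incoming freshmen.
rng = random.Random(7)
population = [rng.gauss(100, 15) for _ in range(500)]

mu = statistics.mean(population)      # parameter (normally unknown)
sample = rng.sample(population, 40)   # simple random sample of 40 students
x_bar = statistics.mean(sample)       # statistic that estimates mu
e = mu - x_bar                        # sampling error, e = mu - x_bar
print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}, e = {e:.2f}")
```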
There are two types of estimator: the point estimator and the interval estimator.
A point estimator is a rule or formula that uses the gathered data to give a single
value as the estimate of a particular parameter. An example of a point estimator is
the sample mean x̄, which estimates the value of the population mean μ. The
second type, the interval estimator, is a rule or formula that uses the gathered
data to compute a range of values within which the parameter being estimated is
expected to lie.
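The two kinds of estimator can be contrasted in a few lines. The interval rule used here, x̄ ± z(α/2)·σ/√n for a known population standard deviation, is one common interval estimator and is an assumption, since this section does not state a formula. The numbers are borrowed from Example 2 later in the chapter (x̄ = 71.8, σ = 8.9, n = 100); the function names are hypothetical.

```python
import math
from statistics import NormalDist

def point_estimate(sample_mean: float) -> float:
    """Point estimator: the sample mean itself is the single estimate of mu."""
    return sample_mean

def z_interval(sample_mean: float, sigma: float, n: int, confidence: float = 0.95):
    """Interval estimator: x_bar +/- z_(alpha/2) * sigma / sqrt(n), sigma known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95% confidence
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

print(point_estimate(71.8))
low, high = z_interval(71.8, 8.9, 100)
print(f"({low:.2f}, {high:.2f})")   # (70.06, 73.54)
```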
Standard Error
Hypothesis Testing
The goal of hypothesis testing is not to question the computed value of the sample
statistic but to make a judgment about the difference between the sample statistic
and a hypothesized population parameter.
Hypothesis testing enables a researcher to generalize about a population from
relatively small samples. In many instances, a researcher can rely only on the
information provided by a part of the population.
A hypothesis is a tentative explanation for certain events, phenomena or
behaviors. It is a statement that predicts the relationship between or among
variables. It is also the most specific statement of a problem: the variables are
considered measurable, and the statement specifies how they are related.
Furthermore, the statement is testable, which means that the relationship between
the variables can be put to the test by applying an appropriate statistical test to
the data gathered about the variables.
Example 1 The Mathematics ability test scores of the control group do not differ
from those of the experimental group.
Example 2 The job performance of a group of employees working in a class A
hotel is independent of their working condition.
Example 3 Scholastic competition among freshman students has no
relationship with their academic achievement.
Example 4 There is no difference in the college entrance examination scores
obtained by students in public and private schools.
Example 5 The Mathematics ability test scores of the control group differ from
those of the experimental group.
Example 6 The job performance of a group of employees working in a class A
hotel is related to their working condition.
Example 7 Scholastic competition among freshman students has a
relationship with their academic achievement.
Example 9 The Mathematics ability test scores of the control group are lower than
those of the experimental group.
Example 10 High scores on the mental ability test correspond to high scores on the
self-concept test.
Example 11 Exposure to time pressure has a negative effect on students' reading
comprehension skills.
Example 12 The brand of cellular phone used by college students in XYZ University
has a positive effect on developing one's self-image.
The critical region is a set of values of the test statistic (computed from the
gathered data set) that is chosen before the experiment to define the conditions
under which the null hypothesis will be rejected.
The critical value or values separate the critical region from the values of the
test statistic that would not lead to rejection of the null hypothesis. The critical
values depend on the nature of the null hypothesis, the relevant sampling
distribution and the level of significance.
In two-tailed tests, the level of significance is divided equally between the two
tails that constitute the critical region. For example, in a two-tailed test with a
significance level of α = 5%, there is an area of 2.5% in each of the two tails, as
shown in Figure 9.1. On the other hand, in one-tailed tests, the entire level of
significance makes up a critical region located in either the left or the right
tail of the distribution.
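The placement of α in the tails can be checked numerically with the standard normal distribution from Python's `statistics` module. This is a minimal sketch for z-based tests only; the function name `z_critical_values` is hypothetical.

```python
from statistics import NormalDist

def z_critical_values(alpha: float, tail: str = "two"):
    """Critical z-value(s) bounding the critical region of a standard normal test."""
    z = NormalDist()
    if tail == "two":
        c = z.inv_cdf(1 - alpha / 2)    # alpha/2 in each tail
        return -c, c
    if tail == "right":
        return z.inv_cdf(1 - alpha)     # all of alpha in the right tail
    if tail == "left":
        return z.inv_cdf(alpha)         # all of alpha in the left tail
    raise ValueError("tail must be 'two', 'left' or 'right'")

print(z_critical_values(0.05))            # about (-1.96, 1.96)
print(z_critical_values(0.05, "right"))   # about 1.645
```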
Confidence Interval
4. Write the decision rule stating how we accept or reject the null hypothesis.
5. Compute the test statistic and compare it with the critical value. The value of
the test statistic determines whether the null hypothesis will be rejected or
accepted.
6. State the decision based on how the computed value compares with the critical
value.
Following the procedure for testing a hypothesis about a single mean, we have a
hypothesized mean μ0. Symbolically, the null hypothesis is written as:
$H_0: \mu = \mu_0$   (3)
This states that the mean is equal to the hypothesized mean. On the other hand,
the alternative hypothesis is written symbolically as:
$H_a: \mu \neq \mu_0$   (4)
The decision rule is stated as follows: Reject the null hypothesis if the
absolute value of the test statistic exceeds the critical value. Otherwise, accept the
null hypothesis.
$Z_c = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$   (7)
The critical value is obtained from the table of Z values in Appendix 2. For a
two-sided test we look up the value 1 − α/2, and the critical value is written
symbolically as $Z_{\alpha/2}$. Otherwise, for a one-sided test, we look up the value
1 − α, and the critical value is written as $Z_{\alpha}$.
Solution:
$Z_c = \dfrac{0.85 - 0.80}{0.16/\sqrt{100}} = \dfrac{0.05}{0.016} = 3.125$
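Equation (7) and the two-sided decision rule translate directly into code. This sketch reuses the numbers in the solution above (x̄ = 0.85, μ0 = 0.80, σ = 0.16, n = 100); because the problem statement for that solution is not reproduced here, the level α = 0.05 and the two-sided alternative are assumptions. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """Test statistic (7): Z_c = (x_bar - mu0) / (sigma / sqrt(n))."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))

def reject_two_sided(z_c: float, alpha: float) -> bool:
    """Decision rule: reject H0 if |Z_c| exceeds Z_(alpha/2)."""
    return abs(z_c) > NormalDist().inv_cdf(1 - alpha / 2)

z_c = one_sample_z(0.85, 0.80, 0.16, 100)
print(round(z_c, 3))                 # 3.125
print(reject_two_sided(z_c, 0.05))   # True
```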
Example 2 A random sample of 100 recorded deaths in the U.S. during the past year
showed an average life span of 71.8 years. Assuming a population standard deviation
of 8.9 years, does this seem to indicate that the mean life span today is greater
than 70 years? Use a 0.05 level of significance.
1.) H0: μ = 70
HA: μ > 70
2.) Z-Test (right-tailed)
3.) α = 0.05
1 − 0.05 = 0.95
Ztab = 1.645
4.) Reject H0 if Zc > Ztab
5.) $Z_c = \dfrac{(71.8 - 70)\sqrt{100}}{8.9} = 2.02$
P(z > 2.02) = 1 − P(z < 2.02)
= 1 − 0.9783
= 0.0217
Note: The P-value is the lowest level of significance at which the observed value of
the test statistic is significant.
6.) Zc > Ztab; reject H0
7.) The mean life span today is greater than 70 years.
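The right-tailed test in Example 2, including its P-value, can be reproduced with the standard library's `NormalDist`. This is a sketch only; because the code does not round Zc to 2.02 before looking up the probability, its P-value comes out near 0.0216 rather than the table-based 0.0217.

```python
import math
from statistics import NormalDist

# Example 2: n = 100, x_bar = 71.8, sigma = 8.9; H0: mu = 70 vs HA: mu > 70
z_c = (71.8 - 70) / (8.9 / math.sqrt(100))
p_value = 1 - NormalDist().cdf(z_c)       # right-tailed: P(Z > z_c)
z_tab = NormalDist().inv_cdf(1 - 0.05)    # about 1.645

print(f"Zc = {z_c:.2f}, P = {p_value:.4f}")
print("Reject H0" if z_c > z_tab else "Accept H0")   # Reject H0
```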
Example 3 A manufacturer of sports equipment has developed a new synthetic
fishing line that he claims has a mean breaking strength of 8 kg with σ = 0.5 kg.
Test the hypothesis that μ = 8 kg against μ ≠ 8 kg if a random sample of 50 lines is
tested and found to have a mean breaking strength of 7.8 kg. Use a 0.01 level of
significance.
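1.) H0: μ = 8
HA: μ ≠ 8
2.) Z-Test (two-tailed)
3.) α/2 = 0.01/2 = 0.005
Ztab1 = −2.575
Ztab2 = 2.575
4.) Reject H0 if Zc < Ztab1 or Zc > Ztab2
5.) $Z_c = \dfrac{(7.8 - 8)\sqrt{50}}{0.5} = -2.828$
P = 2P(z < −2.828) = 2(0.0023)
= 0.0046
6.) Zc < Ztab1; reject H0
7.) The mean breaking strength is not equal to 8 kg.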
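$Z_c = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
Where:
X̄1, X̄2 = means of samples 1 and 2
σ1², σ2² = variances of populations 1 and 2
n1, n2 = sizes of samples 1 and 2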
Example 1 An admission test was administered to incoming freshmen in two colleges.
Two independent samples of 150 students each are randomly selected, and the mean
scores of the samples are X̄1 = 88 and X̄2 = 85. Assume that the variances of
the test scores are 40 and 35, respectively. Is the difference between the mean
scores significant, or can it be attributed to chance? Use a 0.01 level of significance.
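1.) H0: μ1 = μ2
HA: μ1 ≠ μ2
2.) Z-Test (two-tailed)
3.) α = 0.01
Ztab1 = −2.575
Ztab2 = 2.575
4.) Reject H0 if Zc < Ztab1 or Zc > Ztab2
5.) $Z_c = \dfrac{88 - 85}{\sqrt{\dfrac{40}{150} + \dfrac{35}{150}}} = 4.2426$
6.) Zc > Ztab2; reject H0
7.) The difference between the mean scores is significant and cannot be attributed
to chance.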
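The two-sample z statistic with known population variances can be sketched as follows, using the admission-test figures from Example 1. The function name is hypothetical, and the critical value is computed exactly (about 2.576) rather than read from the table.

```python
import math
from statistics import NormalDist

def two_sample_z(x1, x2, var1, var2, n1, n2):
    """Z_c = (x1 - x2) / sqrt(var1/n1 + var2/n2), population variances known."""
    return (x1 - x2) / math.sqrt(var1 / n1 + var2 / n2)

z_c = two_sample_z(88, 85, 40, 35, 150, 150)
z_tab = NormalDist().inv_cdf(1 - 0.01 / 2)    # about 2.576
print(round(z_c, 4))                          # 4.2426
print("Reject H0" if abs(z_c) > z_tab else "Accept H0")   # Reject H0
```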
mean, s is the sample standard deviation which is known or given and n is the
sample size.
The critical value is obtained from the t-table in Appendix 5. For a two-sided test
we look in the column for α/2 and find the row for df, the degrees of freedom; the
value is written symbolically as $t_{\alpha/2}(n-1)$. Otherwise, for a one-sided test,
we look in the column for α and find the row for df; the value is written as
$t_{\alpha}(n-1)$.
Solution:
$t_c = \dfrac{0.85 - 0.80}{0.25/\sqrt{100}} = \dfrac{0.05}{0.025} = 2.000$
So tc = 2.000 is compared against the t-tabular value t0.025(99) = 1.960. Notice
that the t-table lists df only up to 30, followed by INF (infinity), which is used for
df greater than 30. The exact value for 99 degrees of freedom is about 1.984, which
tc = 2.000 still exceeds.
Example 3 The manager of a car rental agency claims that the average mileage of
cars rented is less than 8000 miles. A sample of 5 automobiles has an average mileage
of 7723, with a standard deviation of 500 miles. At α = 0.01, is there enough evidence
to support the manager's claim?
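1.) H0: μ = 8000
HA: μ < 8000
2.) T-Test (left-tailed)
3.) α = 0.01
ν = n − 1 = 5 − 1 = 4
ttab = −3.747
4.) Reject H0 if tc < ttab
5.) $t_c = \dfrac{7723 - 8000}{500/\sqrt{5}} = -1.239$
6.) Since tc > ttab, accept H0
7.) There is not enough evidence to support the manager's claim that the average
mileage is less than 8000.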
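The one-sample t statistic for the car-rental example can be computed the same way. Python's standard library has no Student t distribution, so this sketch takes the critical value −3.747 from the t-table value quoted in the text instead of computing it; the function name is hypothetical.

```python
import math

def one_sample_t(x_bar: float, mu0: float, s: float, n: int) -> float:
    """t_c = (x_bar - mu0) / (s / sqrt(n)), with df = n - 1."""
    return (x_bar - mu0) / (s / math.sqrt(n))

t_c = one_sample_t(7723, 8000, 500, 5)
t_tab = -3.747   # t_0.01(4), left tail, read from the t-table as in the text
print(round(t_c, 3))                                 # -1.239
print("Reject H0" if t_c < t_tab else "Accept H0")   # Accept H0
```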
$t_c = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2 - 2}\left(\dfrac{n_1 + n_2}{n_1 n_2}\right)}}$
Example 4 Two samples are randomly selected from two groups of students
who have been taught using different teaching methods. An examination is
given, and the results are shown below.
            Group 1      Group 2
Size        n1 = 8       n2 = 10
Mean        X̄1 = 85      X̄2 = 87
Variance    S1² = 46     S2² = 66
Using a 0.05 level of significance, can we conclude that the two teaching
methods are equally effective?
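1.) H0: μ1 = μ2
HA: μ1 ≠ μ2
2.) T-Test (two-tailed test)
3.) α = 0.05
ν = n1 + n2 − 2 = 8 + 10 − 2 = 16
ttab1 = −2.120
ttab2 = 2.120
4.) Reject H0 if tc < ttab1 or tc > ttab2
5.) $t_c = \dfrac{85 - 87}{\sqrt{\dfrac{8(46) + 10(66)}{16}\left(\dfrac{18}{80}\right)}} = -0.526$
6.) Since ttab1 < tc < ttab2, accept H0
7.) There is no significant difference between the two teaching methods; they can
be considered equally effective.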
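A sketch of the pooled two-sample t statistic for Example 4. It follows the pooled-variance form with n1·S1² + n2·S2² in the numerator; if S1² and S2² are unbiased sample variances (divided by n − 1), the more common form uses (n1 − 1)S1² + (n2 − 1)S2² instead, which gives about −0.557 here. Either way |tc| is far below the table value 2.120 quoted in the text. The function name is hypothetical.

```python
import math

def pooled_t(x1, x2, s1_sq, s2_sq, n1, n2):
    """t_c = (x1 - x2) / sqrt((n1*S1^2 + n2*S2^2)/(n1+n2-2) * (n1+n2)/(n1*n2))."""
    pooled = (n1 * s1_sq + n2 * s2_sq) / (n1 + n2 - 2)   # df = n1 + n2 - 2
    return (x1 - x2) / math.sqrt(pooled * (n1 + n2) / (n1 * n2))

t_c = pooled_t(85, 87, 46, 66, 8, 10)
t_tab = 2.120    # t_0.025(16), read from the t-table as in the text
print(round(t_c, 3))                                      # -0.526
print("Reject H0" if abs(t_c) > t_tab else "Accept H0")   # Accept H0
```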
EXERCISES
1. Suppose that an allergist wishes to test the hypothesis that at least 30% of the
public is allergic to some cheese products. Explain how the allergist could commit
(a) a type I error;
(b) a type II error.
Answer: ((a) Conclude that fewer than 30% of the public are allergic to some cheese
products when, in fact, 30% or more are allergic.
(b) Conclude that at least 30% of the public are allergic to some cheese
products when, in fact, fewer than 30% are allergic.)
2. A sociologist is concerned about the effectiveness of a training course designed
to get more drivers to use seat belts in automobiles.
(a) What hypothesis is she testing if she commits a type I error by erroneously
concluding that the training course is ineffective?
(b) What hypothesis is she testing if she commits a type II error by erroneously
concluding that the training course is effective?
Answer: ((a) The training course is effective.
(b) The training course is effective.)
3. A large manufacturing firm is being charged with discrimination in its hiring
practices.
(a) What hypothesis is being tested if a jury commits a type I error by finding the
firm guilty?
(b) What hypothesis is being tested if a jury commits a type II error by finding the
firm guilty?
Answer: ((a) The firm is not guilty.
(b) The firm is guilty.)
4. The proportion of adults living in a small town who are college graduates is
estimated to be p = 0.6.
To test this hypothesis, a random sample of 15 adults is selected. If the number of
college graduates in our sample is anywhere from 6 to 12, we shall not reject the
null hypothesis that p = 0.6; otherwise, we shall conclude that p ≠ 0.6.
(a) Evaluate α assuming that p = 0.6. Use the binomial distribution.
(b) Evaluate β for the alternatives p = 0.5 and p = 0.7.
(c) Is this a good test procedure?
Answer: ((a) α = P(X ≤ 5 | p = 0.6) + P(X ≥ 13 | p = 0.6) = 0.0338 + (1 − 0.9729) =
0.0609.
(b) β = P(6 ≤ X ≤ 12 | p = 0.5) = 0.9963 − 0.1509 = 0.8454.
β = P(6 ≤ X ≤ 12 | p = 0.7) = 0.8732 − 0.0037 = 0.8695.
(c) This test procedure is not good for detecting differences of 0.1 in p.)
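The α and β values in Exercise 4 can be verified by summing binomial probabilities exactly with `math.comb`. This is a sketch; the helper name `binom_cdf` is hypothetical.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n = 15
# Fail-to-reject region: 6 <= X <= 12
alpha = binom_cdf(5, n, 0.6) + (1 - binom_cdf(12, n, 0.6))
beta = {p: binom_cdf(12, n, p) - binom_cdf(5, n, p) for p in (0.5, 0.7)}
print(round(alpha, 4))                             # 0.0609
print({p: round(b, 4) for p, b in beta.items()})
```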
5. Repeat Exercise 4 when 200 adults are selected and the fail-to-reject region is
defined to be 110 ≤ x ≤ 130, where x is the number of college graduates in our
sample. Use the normal approximation.
Answer: ((a) α = P(X < 110 | p = 0.6) + P(X > 130 | p = 0.6) = P(Z < −1.52) +
P(Z > 1.52) = 2(0.0643) = 0.1286.
(b) β = P(110 ≤ X ≤ 130 | p = 0.5) = P(1.34 < Z < 4.31) = 0.0901.
β = P(110 ≤ X ≤ 130 | p = 0.7) = P(−4.71 < Z < −1.47) = 0.0708.
(c) The probability of a type I error is somewhat high for this procedure, although
type II errors are reduced dramatically.)
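The normal approximation used in Exercise 5 treats X ~ Binomial(n, p) as normal with mean np and standard deviation √(np(1 − p)), with a ±0.5 continuity correction at the boundaries. The sketch below (helper name hypothetical) does not round z to two decimals, so its results differ from the table-based answers in the third or fourth decimal place.

```python
import math
from statistics import NormalDist

def normal_approx_prob_between(a: int, b: int, n: int, p: float) -> float:
    """P(a <= X <= b) for X ~ Binomial(n, p), normal approximation with continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    z = NormalDist()
    return z.cdf((b + 0.5 - mu) / sigma) - z.cdf((a - 0.5 - mu) / sigma)

n = 200
alpha = 1 - normal_approx_prob_between(110, 130, n, 0.6)
beta_05 = normal_approx_prob_between(110, 130, n, 0.5)
beta_07 = normal_approx_prob_between(110, 130, n, 0.7)
print(f"alpha = {alpha:.4f}, beta(0.5) = {beta_05:.4f}, beta(0.7) = {beta_07:.4f}")
```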
6. A fabric manufacturer believes that the proportion of orders for raw material
arriving late is p = 0.6.
If a random sample of 10 orders shows that 3 or fewer arrived late, the hypothesis
that p = 0.6 should be rejected in favor of the alternative p < 0.6. Use the binomial
distribution.
(a) Find the probability of committing a type I error if the true proportion is p = 0.6.
(b) Find the probability of committing a type II error for the alternatives p = 0.3,
p = 0.4, and p = 0.5.
Answer: ((a) α = P(X ≤ 3 | p = 0.6) = 0.0548.
(b) β = P(X > 3 | p = 0.3) = 1 − 0.6496 = 0.3504.
β = P(X > 3 | p = 0.4) = 1 − 0.3823 = 0.6177.
β = P(X > 3 | p = 0.5) = 1 − 0.1719 = 0.8281.)
7. Repeat Exercise 6 when 50 orders are selected and the critical region is
defined to be x ≤ 24, where x is the number of orders in our sample that arrived
late. Use the normal approximation.
Answer: ((a) α = P(X ≤ 24 | p = 0.6) = P(Z < −1.59) = 0.0559.
(b) β = P(X > 24 | p = 0.3) = P(Z > 2.93) = 1 − 0.9983 = 0.0017.
β = P(X > 24 | p = 0.4) = P(Z > 1.30) = 1 − 0.9032 = 0.0968.
β = P(X > 24 | p = 0.5) = P(Z > −0.14) = 1 − 0.4443 = 0.5557.)
8. An electrical firm manufactures light bulbs that have a lifetime that is
approximately normally distributed with a mean of 800 hours and a standard
deviation of 40 hours. Test the hypothesis that μ = 800 hours against the alternative
μ ≠ 800 hours if a random sample of 30 bulbs has an average life of 788 hours. Use
a P-value in your answer.
Answer: (The hypotheses are
H0: μ = 800,
H1: μ ≠ 800.
Now, $z = \dfrac{788 - 800}{40/\sqrt{30}} = -1.64$, so P = 2P(Z < −1.64) = 2(0.0505) = 0.1010.)
NONPARAMETRIC TESTS
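There are some situations in which it is clear that the outcome does not follow a
normal distribution. These include situations in which the outcome is an ordinal
variable or a rank, in which there are definite outliers, or in which the outcome has
clear limits of detection.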
Consider a clinical trial where study participants are asked to rate their
symptom severity following 6 weeks on the assigned treatment. Symptom
severity might be measured on a 5-point ordinal scale with response options:
Symptoms got much worse, slightly worse, no change, slightly improved, or
much improved. Suppose there are a total of n=20 participants in the trial,
randomized to an experimental treatment or placebo, and the outcome data
are distributed as shown in the figure below.
Note that 75% of the participants stay at most 16 days in the hospital
following transplant, while at least one stays 35 days, which would be
considered an outlier. Recall from page 8 in the module on Summarizing Data
that we used Q1 − 1.5(Q3 − Q1) as a lower limit and Q3 + 1.5(Q3 − Q1) as an upper
limit to detect outliers. In the box-whisker plot above, Q1 = 12 and Q3 = 16; thus
outliers are values below 12 − 1.5(16 − 12) = 6 or above 16 + 1.5(16 − 12) = 22.
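The outlier limits can be computed with a small helper (name hypothetical). The quartiles Q1 = 12 and Q3 = 16 come from the text; the stay lengths checked against the limits, other than the 35-day stay mentioned above, are made-up values for illustration.

```python
def outlier_fences(q1: float, q3: float, k: float = 1.5):
    """Lower and upper limits Q1 - k*IQR and Q3 + k*IQR for flagging outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = outlier_fences(12, 16)
print(lower, upper)                                         # 6.0 22.0
print([x for x in (5, 14, 35) if x < lower or x > upper])   # [5, 35]
```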
Limits of Detection
This module will describe some popular nonparametric tests for continuous
outcomes. Interested readers should see Conover³ for more comprehensive
coverage of nonparametric tests.
Key Concept:
The techniques described here apply to outcomes that are ordinal, ranked, or
continuous outcome variables that are not normally distributed. Recall that
continuous outcomes are quantitative measures based on a specific
measurement scale (e.g., weight in pounds, height in inches). Some
investigators make the distinction between continuous, interval and ordinal
scaled data. Interval data are like continuous data in that they are
measured on a constant scale (i.e., there exists the same difference between
adjacent scale scores across the entire spectrum of scores). Differences
between interval scores are interpretable, but ratios are not. Temperature in
Celsius or Fahrenheit is an example of an interval scale outcome. The
difference between 30° and 40° is the same as the difference between 70°
and 80°, yet 80° is not twice as warm as 40°. Ordinal outcomes can be less
specific as the ordered categories need not be equally spaced. Symptom
severity is an example of an ordinal outcome and it is not clear whether the
difference between much worse and slightly worse is the same as the
difference between no change and slightly improved. Some studies use
visual scales to assess participants' self-reported signs and symptoms. Pain
is often measured in this way, from 0 to 10 with 0 representing no pain and
10 representing agonizing pain. Participants are sometimes shown a visual
scale such as that shown in the upper portion of the figure below and asked
to choose the number that best represents their pain state. Sometimes pain
scales use visual anchors as shown in the lower portion of the figure below.
Assigning Ranks
Consider the following data observed in a sample of n=6:
Observed Data: 7 5 9 3 0 2
The ranks, which are used to perform a nonparametric test, are assigned as
follows: First, the data are ordered from smallest to largest. The lowest value
is then assigned a rank of 1, the next lowest a rank of 2 and so on. The
largest value is assigned a rank of n (in this example, n=6). The observed
data and corresponding ranks are shown below:
Ordered Observed Data: 0 2 3 5 7 9
Ranks:                 1 2 3 4 5 6
A complicating issue that arises when assigning ranks occurs when there are
ties in the sample (i.e., the same values are measured in two or more
participants). For example, suppose that the following data are observed in
our sample of n=6:
Observed Data: 7 7 9 3 0 2
The 4th and 5th ordered values are both equal to 7. When assigning ranks, the
recommended procedure is to assign the mean rank of 4.5 to each (i.e. the
mean of 4 and 5), as follows:
Ordered Observed Data: 0  2  3  7    7    9
Ranks:                 1  2  3  4.5  4.5  6
Suppose that there are three values of 7. In this case, we assign a rank of 5
(the mean of 4, 5 and 6) to the 4th, 5th and 6th values, as follows:
Ordered Observed Data: 0 2 3 7 7 7
Ranks:                 1 2 3 5 5 5
Using this approach of assigning the mean rank when there are ties ensures
that the sum of the ranks is the same in each sample (for example,
1+2+3+4+5+6=21, 1+2+3+4.5+4.5+6=21 and 1+2+3+5+5+5=21).
Using this approach, the sum of the ranks will always equal n(n+1)/2. When
conducting nonparametric tests, it is useful to check the sum of the ranks
before proceeding with the analysis.
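The ranking procedure, including mean ranks for ties and the n(n+1)/2 check, can be written as a short function. This is a minimal sketch (the name `average_ranks` is hypothetical); it returns the ranks in the original order of the observations.

```python
def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + 1 + j + 1) / 2   # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

data = [7, 7, 9, 3, 0, 2]
r = average_ranks(data)
print(r)                             # [4.5, 4.5, 6.0, 3.0, 1.0, 2.0]
n = len(data)
assert sum(r) == n * (n + 1) / 2     # rank-sum check: 21
```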
3. Set up the decision rule. The decision rule is a statement that tells under
what circumstances to reject the null hypothesis. Note that in some
nonparametric tests we reject H0 if the test statistic is large, while in
others we reject H0 if the test statistic is small. We make the distinction
as we describe the different tests.
$\chi^2 = \sum \dfrac{(O - E)^2}{E}$
Where,
χ² = chi-square statistic, O = observed frequency, E = expected frequency
The chi-square test of independence is applied when we have two categorical
variables from a single population. It is used to determine whether there is a
significant association between the two variables. This test is applicable when the
observations are independent (random). The chi-square test of independence is also
called a contingency table chi-square test.
For a given population, we may consider two attributes and test whether they are
dependent. For example, we take a set of workers in a factory and classify them as
smokers and non-smokers. The same workers are classified again as 'men' and
'women'. Here, we may find that there are more smokers among men than among
women. We then say that the attributes 'smoking' and 'sex' are dependent
(associated).
Chi Square Goodness of Fit Test
This test is applicable when the observations are independent (random) and
the total frequency is large. It is used to test whether the observed frequencies
agree with the frequencies expected under a hypothesized distribution. An
attractive feature of the chi-square goodness-of-fit test is that it can be applied to
any univariate distribution for which you can calculate the cumulative distribution
function. The chi-square goodness-of-fit test can be applied to discrete distributions
such as the binomial and the Poisson.
The number of degrees of freedom for the chi-square goodness-of-fit test is the
number of categories being compared minus one.
Estimating the revised model in which a new path has been added.
Calculating the difference between the two resulting chi-square values.
The resulting chi-square difference statistic also has a chi-square distribution.
The degrees of freedom for the chi-square difference test equal the difference
between the degrees of freedom associated with the two models. When the
chi-square difference is significant, the model with the smaller chi-square is
considered to fit the data better than the model with the larger chi-square.
Chi Square Test of Homogeneity
The chi-square test of homogeneity is used to test whether two or more populations
are homogeneous with respect to some characteristic. In this test, the categories are
assumed to be mutually exclusive and exhaustive. The test statistic for the
chi-square test of homogeneity is the same as that for the chi-square test of
association.
$\chi^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$
$X_c^2 = \sum \dfrac{(f_o - f_e)^2}{f_e}$
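The statistic $X_c^2 = \sum (f_o - f_e)^2 / f_e$ is a one-line sum. This sketch (function name hypothetical) checks it against the data of Question 3 in the exercises at the end of this section.

```python
def chi_square_statistic(observed, expected):
    """X_c^2 = sum((f_o - f_e)^2 / f_e) over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Question 3: colors blue, black, brown, yellow
print(round(chi_square_statistic([23, 24, 32, 23], [12, 32, 25, 21]), 4))  # 14.2338
```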
Example:
The distributor of air conditioners in the city of Manila has divided the
area into four sub-areas. A prospective buyer of the distributorship was told
that the installations of the equipment are equally distributed. The
prospective buyer took a random sample of 40 installations performed during the
past years from the corporation's files and found the following:
Based on this information, can we say that the units are equally distributed?
Use α = 0.05.
3.) α = 0.05
df = c − 1 = 4 − 1 = 3
Xtab² = 7.815
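$f_e = \dfrac{(\text{row total})(\text{column total})}{n}$
Where: row total and column total are the totals of the cell's row and column, and
n = grand total
df = (r − 1)(c − 1), where r = number of rows and c = number of columns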
Example: A survey was conducted to determine whether gender and age are
related among stereo shop customers. A total of 200 respondents were surveyed,
and the results are presented in the table.
                       Gender
Age            Male   fe    Female   fe    TOTAL
Under 30        60    77      50     33     110
30 and over     80    63      10     27      90
TOTAL          140            60            200
Conduct a test of whether gender and age of stereo shop customers are
independent at the 1% level of significance.
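1.) H0: Gender and age of stereo shop customers are independent
Ha: Gender and age of stereo shop customers are dependent
3.) α = 0.01
df = (2 − 1)(2 − 1) = 1
Xtab² = 6.635
5.) $X_c^2 = \sum \dfrac{(f_o - f_e)^2}{f_e} = \dfrac{(60-77)^2}{77} + \dfrac{(80-63)^2}{63} + \dfrac{(50-33)^2}{33} + \dfrac{(10-27)^2}{27}$
Xc² = 27.80
6.) Since Xc² > Xtab², reject H0
7.) Gender and age of stereo shop customers are dependent.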
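The whole contingency-table computation (expected frequencies, statistic and degrees of freedom) can be sketched for the stereo-shop data as follows. The function names are hypothetical; the table value 6.635 would still be read from the chi-square table.

```python
def expected_frequencies(table):
    """f_e for each cell = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return (X_c^2, df) for an r x c contingency table of observed counts."""
    expected = expected_frequencies(table)
    stat = sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(table, expected)
               for fo, fe in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Stereo-shop survey: rows = age (under 30, 30 and over); columns = (male, female)
observed = [[60, 50],
            [80, 10]]
stat, df = chi_square_independence(observed)
print(expected_frequencies(observed))   # [[77.0, 33.0], [63.0, 27.0]]
print(round(stat, 2), df)               # 27.8 1
```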
                                   Intelligence
Economic Level   Dull   fe       Average   fe       Intelligent   fe       TOTAL
Rich              81    128.67    322      347.31     273         160.01    636
Middle class     141    151.94    457      410.11     153         188.95    751
Poor             127     68.38    163      184.58     148          85.04    338
TOTAL            349              742                 438                 1725
Using these results, can we conclude that intelligence is related to
economic level? Use a 1% level of significance.
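3.) α = 0.01
df = (3 − 1)(3 − 1) = 4
Xtab² = 13.277
5.) $X_c^2 = \sum \dfrac{(f_o - f_e)^2}{f_e} = \dfrac{(81-128.67)^2}{128.67} + \dfrac{(322-347.31)^2}{347.31} + \dfrac{(273-160.01)^2}{160.01} + \dfrac{(141-151.94)^2}{151.94} + \dfrac{(457-410.11)^2}{410.11} + \dfrac{(153-188.95)^2}{188.95} + \cdots$
Xc² = 137.70
6.) Since Xc² > Xtab², reject H0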
Exercises
Question 1:
Find the chi-square statistic for the following data.
Observed frequency:   5   15   10   20
Expected frequency:  10   20    5   30
Answer:
(12.0833)
Question 2:
Find the chi-square statistic for the following data.
Observed frequency:  10    5   25   35
Expected frequency:  15   30   30   25
Answer:
(27.3333)
Question 3:
Find the chi-square statistic for the following data.
Color                Blue   Black   Brown   Yellow
Observed frequency    23     24      32      23
Expected frequency    12     32      25      21
Answer:
(14.2338)
Question 4:
Determine whether gender and shoe size are dependent among the students
of section ChE 4102 and the instructors of the Chemical Engineering Department of
Batangas State University. Use a 0.01 level of significance.
Data
                 Shoe Size
Gender     below 8   8 and above   TOTAL
Male          2          13          15
Female       23          11          34
TOTAL        25          24          49
Answer:
(Gender and shoe size are dependent.)
Question 5:
A sample of 49 people is selected from the male and female students of section ChE
4102 of the Chemical Engineering Department of Batangas State University and their
instructors. Determine whether gender and height are independent among the students
and their instructors. Use a 0.01 level of significance. The data are given below:
                 HEIGHT
GENDER     below 160 cm   160 cm and above   TOTAL
Male            1               14             15
Female         16               18             34
TOTAL          17               32             49
Answer:
Question 6: