Eda PDF
CONCEPTS AND
VARIABLES
=================
Statistics Defined
Statistical tests are widely used in quantitative research. The purpose
of this chapter is to present a brief introduction to the reasons for employing
statistics, the divisions of statistics, levels of measurement, concepts, and variables.
Statistics is the science of collecting, organizing, analyzing, and interpreting
numerical facts, which we call data. Statistics is a set of tools used to organize and
analyze data. Data must either be numeric in origin or transformed by researchers
into numbers. For instance, statistics could be used to analyze the percentage scores
students receive on a test. The percentage scores ranging from 0 to 100 are already
in numeric form. Statistics could also be used to analyze grades on an essay by
assigning numeric values to the letter grades, for example, A=1, B=2, C=3, and D=5.
Importance of Statistics
Data bombards us in everyday life. Most of us associate statistics with the
bits of data that appear in news reports: baseball batting averages, imported car
sales, the latest poll of the president's popularity, average pulse rate in beats per
minute, and the average high temperature for today. Advertisements often claim
that data show the superiority of the advertiser's product. All sides in public debates
about economics, education, and social policy argue from data. Yet the usefulness of
statistics goes far beyond these everyday examples. The study and collection of data
are important in the work of many professions so that training in the science of
statistics is valuable preparation for a variety of careers. For example, government
statistical offices release the latest numerical information on unemployment and
inflation. Economists and financial advisors as well as policy makers in
government and business study these data to make informed decisions. Doctors
must understand the origin and trustworthiness of the data that appear in medical
journals if they are to offer their patients the most effective treatment. Politicians
rely on data from polls of public opinion. Market research data that reveal
consumer tastes influence business decisions. Farmers study data from field trials
of new crop varieties. Engineers gather data on the quality and reliability of
manufactured products. A teacher can also make use of statistical concepts in
analyzing the reliability and validity of the tests given to his students. He can also
make necessary predictions about the success of his students in various board
examinations. Most areas of academic study make use of numbers, and therefore
also make use of the method of statistics. We can no more escape data than we can
avoid the use of words. Just as words on a page are meaningless to the illiterate or
confusing to the partially educated, so data do not interpret themselves but must be
read with understanding. A writer can arrange words into convincing arguments or
incoherent nonsense. Similarly, you can manipulate data to be compelling,
misleading, or simply irrelevant. Numerical literacy, the ability to follow and
understand numerical arguments, is important for everyone. The ability to express
oneself numerically, to be an author rather than just a reader, is a vital skill in many
professions and areas of study. The study of statistics is therefore essential to a
sound education. We must learn how to read data critically and with
comprehension; we must learn how to produce data that provide clear answers to
important questions; and we must learn sound methods for drawing trustworthy
conclusions from data, as well as how to communicate valid conclusions
effectively. Statistics teaches you how to gather, organize, and analyze data,
and then infer the underlying reality from these data. It is a powerful intellectual
method that is applied in many contexts and most disciplines. Persons in industry
and government make decisions that are increasingly dependent upon the
collection and interpretation of data, and employers demand greater quantitative
sophistication from their employees (or prospective employees). Statistics students
learn to define problems, to think critically, and to analyze and synthesize, which
prepares them to explore widely throughout their professional lives and to be
creative and productive citizens, regardless of the precise nature of their careers. They
also learn to assess the integrity of data and the uncertainty of measurements and,
through these, develop an understanding of the powers and limitations of
science. Moreover, students discover that there are things scientists can do and other
things they cannot do and that experimental results are not exact, but scientists can
usually evaluate the range of uncertainty within specified confidence limits. The
development of an understanding of the powers and limitations of science is
essential to rational participation in the resolution of societal issues. The language of
statistics is a foreign language to most students. Students must understand the
language to understand the concepts and the statistical procedures. Learning the
language of statistics provides students with insights and an awareness of ideas and
thoughts beyond the realm of previous experience. Statistical language requires
precision and careful attention to exactly communicate valid conclusions and
interpretations, which result from data analysis.
groups. These characteristics are referred to as variables. Data is gathered and
recorded for each variable. Descriptive statistics can then be used to reveal the
distribution of the data in each variable. Statistics is also frequently used for
purposes of prediction. Prediction is based on the concept of generalizability: if
enough data is compiled about a particular context (students studying Mathematics
in a specific set of classrooms), the patterns revealed through analysis of the data
collected about that context can be generalized (or predicted to occur in) similar
contexts. The prediction of what will happen in a similar context is probabilistic.
That is, the researcher is not certain that the same things will happen in other
contexts; instead, the researcher can only reasonably expect that the same things will
happen. Prediction is a method employed by individuals throughout daily life. For
instance, if writing students begin class every day for the first half of the semester
with a five-minute free writing exercise, then they will likely come to class the first
day of the second half of the semester prepared to again free write for the first five
minutes of class. The students will have predicted the class content based on their
previous experiences in the class: Because they began all previous class sessions with
free writing, it would be probable that their next class session will begin the same
way. Statistics is used to perform the same function; the difference is that precise
probabilities are determined in terms of the percentage chance that an outcome will
occur, complete with a range of error. Prediction is a primary goal of inferential
statistics.
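[Figure: classification of variables. VARIABLE divides into QUALITATIVE and QUANTITATIVE; the remaining labels in the diagram are Nominal, Ordinal, Discrete, Continuous, Interval, and Ratio.]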
Variables (Levels of Measurement)
Ordinal variables order (or rank) data in terms of degree. Ordinal variables
do not establish the numeric difference between data points. They indicate only that
one data point is ranked higher or lower than another. For instance, a researcher
might want to analyze the numeric or letter grades written on a student’s transcript
of record. A grade of A=1.0 would be ranked higher than B=2.0 and B=2.0 higher
than C=3.0. However, the difference between these data points, the precise distance
between 1.0 and 2.0, is not defined. Numeric or letter grades are an example of an
ordinal variable. In other words, the values of an ordinal variable can be ordered
or ranked. Examples are the ranking of professors in a college (Prof. I, Prof. II,
Prof. III, etc.) and the classification of fruits or vegetables. Categories are ordered
by the degree to which they possess a particular characteristic, without
saying how much of that characteristic they possess. Another example is "greatly
dislike, moderately dislike, indifferent, moderately like, greatly like". Numbers
assigned to such variables indicate rank order only; the "distance" between the
numbers has no meaning.
3. Interval Variables
Interval variables score data. Thus, the order of data is known as well as the
precise numeric distance between data points. A researcher might analyze the actual
percentage scores, assuming that percentage scores are given by the instructor. A
score of 99 (A) ranks higher than a score of 85 (B), which ranks higher than a score
of 75 (C). Not only is the order of these three data points known, but so is the exact
distance between them -- 14 percentage points between the first two, 10 percentage
points between the second two, and 24 percentage points between the first and last
data points. Interval variables are said to be equally spaced, e.g. temperature. The difference
between a temperature of 66 degrees and 67 degrees is taken to be the same as the
difference between 76 degrees and 77 degrees.
4. Ratio Scales
A ratio scale is like an interval scale but with an absolute (true) zero. Equal intervals are
equivalent and ratios are meaningful: 44 cm is twice as long as 22 cm, and the increase in
weight from 12 kg to 18 kg is the same as the increase from 155 kg to 161 kg.
Dependent Variables. In an experimental study, a variable whose score
depends on (or is determined or caused by) another variable is called a dependent
variable. For instance, an experiment might explore the extent to which the quality
of students is affected by the method of instruction they received. In this case, the
dependent variable would be the quality of the students.
Characteristics of a Hypothesis
Examples of hypotheses:
Independent Variables          Dependent Variables
Diet                           Intelligence
Newspaper                      Voting Patterns
Attendance at Lectures         Exam Scores
Health Education Programs      Number of People who Smoke
Worksheet # 1
1) Define the Following Terms:
1.1) Statistics
1.2) Descriptive Statistics
1.3) Prediction
1.4) Nominal Variable
1.5) Ordinal Variable
1.6) Interval Variable
1.7) Ratio Scale
1.8) Inferential statistics
1.9) Dependent Variable
1.10) Independent Variable
Worksheet # 2
Indicate whether the variables concerned are Nominal, Ordinal, Interval,
or Ratio:
Variables Classification
1. Gender: Male or Female
2. Skin Color
3. Temperature of the Room
4. Your Height (in centimeters)
5. Population of Mango Tree
6. Social Class
7. Your Blood Type
8. Number of Students in this Room
9. Enrollees from Elementary to College
10.Level of Aggressiveness
11.Salary Grade
12.Academic Ranks
13.Marital Status
14.Academic Performance
15.Political Affiliation
16.Attitude Toward Nuclear Plant
17.Family Size
18.Courses Offered in the College
19.Perception of Leadership Style
20.Weight of the Baby
21.Business Classification
22.Community of Cow
23.Your phone number
24.Residents in Olongapo City
25.Volume of Water
Worksheet # 3
Indicate whether the following variables are Qualitative or Quantitative.
If they are quantitative, indicate whether they are Discrete or Continuous.
Variables Classification
1. The height of a plant between 1995 and 1996
2. The color of your professor’s shoes
3. Volume of Water
4. Your Height (in centimeters)
5. The marks of a class of students in a course
6. Social Class
7. Your Blood Type
8. The shoe sizes of that child between 5 and 10
9. Community of Carabao
10. Weight of the Baby
11.Salary Grade
12.Academic Ranks
13.Marital Status
14.Academic Performance
15.Political Affiliation
16.Attitude Toward Nuclear Plant
17. Residents in Olongapo City
18.Courses Offered in the College
19.Perception of Leadership Style
20. Level of Aggressiveness
21.The scent of the flowers of that plant
22. Enrollees from Elementary to College
23.Your phone number
24. Family Size
25. Temperature of the Room
Lesson 2
DATA PRESENTATION
TOOLS
=================
Descriptive Statistics enable us to understand data through summary
values and graphical presentations. It is important to look at summary statistics
along with the data set to understand the entire picture, as the same summary
statistics may describe very different data sets. Descriptive statistics can be
illustrated understandably by presenting them graphically using statistical and data
presentation tools.
Table 1
Profile of the Respondents in Terms of Age and Gender
                     Male               Female
Age Bracket        F       %          F       %
21 - 25            9     8.18         4     3.64
26 - 30            8     7.27        14    12.73
31 - 35            7     6.36         6     5.45
36 - 40            6     5.45         7     6.36
41 - 45            9     8.18        22    20.00
46 - 50            5     4.55         8     7.27
51 - Above         2     1.82         3     2.73
Total             46    41.82        64    58.18
Frequency Distribution
The easiest method of organizing data is a frequency distribution, which
converts raw data into a meaningful pattern for statistical analysis.
2. When all intervals are to be the same width, the following rule may be
used to find the required class interval width:
W = (L - S) / N
Where:
W= class width,
L= the largest data,
S= the smallest data,
N= number of classes
Example: Suppose the ages of a sample of 10 students are: 20.9, 18.1, 18.5,
21.3, 19.4, 25.3, 22.0, 23.1, 23.9, and 22.5. We select N=4 and W=(25.3 -
18.1)/4 = 1.8 which is rounded-up to 2. The frequency table is as follows:
Age Class      Frequency      Relative Frequency
18--20             3                30%
20--22             2                20%
22--24             4                40%
24--26             1                10%
Note: The sum of all the relative frequencies must always be equal to 1.00
or 100%. In the above example, we see that 40% of all students are younger than 24
years old but older than 22 years old. The relative frequency may be determined for
both quantitative and qualitative data and is a convenient basis for the comparison
of similar groups of different sizes.
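The class-width rule and the relative frequencies above can be checked with a short Python sketch. It assumes two conventions that the text does not state explicitly: the first class starts at the smallest value rounded down (18), and a value that falls exactly on a boundary (such as 22.0) is counted in the higher class, which is what the table above implies.

```python
import math

ages = [20.9, 18.1, 18.5, 21.3, 19.4, 25.3, 22.0, 23.1, 23.9, 22.5]
num_classes = 4

# W = (L - S) / N, rounded up to the next whole number
width = math.ceil((max(ages) - min(ages)) / num_classes)

# Assumption: the first class starts at the smallest value rounded down,
# and each class includes its lower bound but not its upper bound.
start = math.floor(min(ages))
table = []
for k in range(num_classes):
    lower = start + k * width
    upper = lower + width
    freq = sum(1 for a in ages if lower <= a < upper)
    table.append((lower, upper, freq, 100 * freq / len(ages)))

for lower, upper, freq, pct in table:
    print(f"{lower}--{upper}  {freq}  {pct:.0f}%")
```

The relative frequencies add up to 100%, as the note above requires.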
What does Frequency Distribution tell Us?
Illustration: Consider the following set of data, which are the scores
recorded for 30 participants. We wish to summarize this data by creating a
frequency distribution of the scores.
1. Identify the highest and lowest values in the data set. For our scores, the
highest score is 51 and the lowest is 43.
2. Create a column with the title of the variable we are using, in this case, the
score. Enter the highest score at the top and include all values within the
range from the highest score to the lowest score.
3. Create a tally column to keep track of the scores as you enter them into the
frequency distribution. Once the frequency distribution is completed you
can omit this column. Most printed frequency distributions do not retain the
tally column in their final form.
4. Create a frequency column, with the frequency of each value, as shown in the
tally column, recorded.
5. At the bottom of the frequency column record the total frequency for the
distribution preceded by N =
6. Enter the name of the frequency distribution at the top of the table.
If we applied these steps to the data, we would have the following frequency
distribution.
Cumulative Frequency Distribution for Scores Recorded for 30
Participants: N = 30
Scores Recorded     Tally       Frequency     Cumulative Frequency
51                  ////            4                 30
50                  ////            4                 26
49                  //////          6                 22
48                                  0                 16
47                  ///             3                 16
46                  ///             3                 13
45                  ////            4                 10
44                  ///             3                  6
43                  ///             3                  3
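The cumulative frequency column can be reproduced from the frequency column alone. The sketch below is a minimal Python illustration, not part of the original module. It accumulates from the lowest score upward and then prints the rows from highest to lowest, matching the layout of the table.

```python
from itertools import accumulate

# Scores and frequencies from the table, listed from lowest to highest
scores = [43, 44, 45, 46, 47, 48, 49, 50, 51]
freqs = [3, 3, 4, 3, 3, 0, 6, 4, 4]

# Running total of the frequencies, starting from the lowest score
cumulative = list(accumulate(freqs))

# Print from the highest score down, as in the table
for score, f, cf in reversed(list(zip(scores, freqs, cumulative))):
    print(f"{score}  {f:>2}  {cf:>2}")
```

The last cumulative value equals N, the total number of participants.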
6) To find the upper limit of the first class, subtract one from the lower limit of the
second class. Then continue to add the class width to this upper limit to find the rest
of the upper limits.
7) Find the boundaries by subtracting 0.5 units from the lower limits and adding 0.5
units to the upper limits. The boundaries are also halfway between the upper limit
of one class and the lower limit of the next class.
8) Tally the data.
9) Find the frequencies.
10) Find the cumulative frequencies. Depending on what you're trying to accomplish, it
may not be necessary to find the cumulative frequencies.
11) If necessary, find the relative frequencies and/or relative cumulative frequencies.
Illustration: Look at the following data set. The highest score is 59 and the
lowest is 39. If we were to create a simple frequency distribution of this data we
would have 21 values. This is greater than 20 values so we should create a grouped
frequency distribution.
If we use this data and follow the suggestions for the creation of a grouped frequency
distribution, we will create the following grouped frequency distribution.
Cumulative Grouped Frequency Distribution
similar to pie charts, show proportional relationships between data within each bar.
In addition, divided bar graphs can show changes over time.
Step 1. Choose the type of bar chart that stresses the results to be focused
on. Grouped and stacked bar charts will require at least two classification variables.
For a stacked bar chart, tally the data within each category into combined totals
before drawing the chart.
Step 2. Draw the vertical axis to represent the values of the variable of
comparison (e.g., number, cost, time). Establish the range for the data by subtracting
the smallest value from the largest. Determine the scale for the vertical axis at
approximately 1.5 times the range and label the axis with the scale and unit of
measure.
Step 3. Determine the number of bars needed. The number of bars will
equal the number of categories for simple or stacked bar charts. For a grouped bar
chart, the number of bars will equal the number of categories multiplied by the
number of groups. This number is important for determining the length of the
horizontal axis.
Step 4. Draw bars of equal width for each item and label the categories and
the groups. Provide a title for the graph that indicates the sample and the period
covered by the data; label each bar.
2. Pie Chart or Circle Graph. Pie charts show proportions of a whole,
with each wedge representing a percentage of the total. Pie charts are useful for
displaying:
How to Use a Pie Chart
Step 2. Draw a circle. Using the percentages, determine what portion of the
circle will be represented by each category. By eye, divide the circle into four
quadrants, each representing 25 percent.
Step 4. Provide a title for the pie chart that indicates the sample and the
period covered by the data.
Caution:
a. Be careful not to use too many notations on the charts. Keep them as
simple as possible and include only the information necessary to
interpret the chart.
c. Whenever possible, use bar or pie charts to support data interpretation.
Do not assume that results or points are so clear and obvious that a
chart is not needed for clarity.
3. Line Graph (Polygon Method). Line graphs show sets of data points
plotted over a period of time and connected by straight lines. Line graphs are useful for
displaying:
Worksheet # 4
Number   Gender   Major   Distance to School (Km)   Number of Siblings
1 F BSED 5 3
2 F BSBA 9 5
3 M BSED 11 6
4 F AB 14 3
5 M BSN 3 3
6 F BSED 5 6
7 M AB 20 3
8 F AB 7 1
9 F BSBA 9 6
10 F AB 6 3
11 M BSN 10 4
12 F BSBA 10 2
13 M BSED 5 1
14 F BSBA 9 3
15 M ECE 10 1
16 F AB 3 2
17 F Educ 15 4
18 M AB 6 5
19 F ECE 30 3
20 M AB 8 5
Worksheet # 5
Given the following set of data as the actual scores of fifty BSA students in
their major examination, prepare a table showing the following:
1) Class Interval
2) Tally
3) Interval Midpoint
4) Frequency
5) Cumulative Frequency
Worksheet # 6
1) Gender
2) Major
3) Distance to School
4) Number of Siblings in the family
Using the data available presented in Table (Worksheet #4) for 20 college
students, prepare a line graph to illustrate the following:
1) Gender
2) Major
3) Distance to School
4) Number of Siblings in the family
1) Gender
2) Major
3) Distance to School
4) Number of Siblings in the family
Lesson 3
MEASURES OF
CENTRAL TENDENCY
AND VARIATIONS
=================
A. Central Tendency. Central tendency is measured in three
ways: Mean or Arithmetic Mean, Median, and Mode.
1. Arithmetic Mean ($\bar{X}$)
Arithmetic Mean is defined as the sum of all of the items, divided by the
number of them:
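Formula:
$$\bar{X} = \frac{\sum X}{N}$$
Where:
$\bar{X}$ = Mean
$\Sigma$ = Summation Symbol
$X$ = Individual score
$N$ = Number of cases
Example: for the five scores 9, 4, 7, 7, and 4,
$$\bar{X} = \frac{\sum X}{N} = \frac{9 + 4 + 7 + 7 + 4}{5} = \frac{31}{5} = 6.2$$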
Note that we have added a column called fX or the score times the
frequency of that score.
Also, note that the sum of the fX column is recorded at the bottom of the
column.
Frequency Distribution of Ages for Respondents in
the Research Conducted
Age (X)     Frequency (f)     fX
11               2            22
10               4            40
9                8            72
8                7            56
7                3            21
6                2            12
5                1             5
            N = 27     ∑fX = 228
The formula for the population mean for a set of scores in a frequency
distribution is:
$$\mu = \frac{\sum fX}{N}$$
$$\mu = \frac{228}{27} = 8.44$$
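As a check on the arithmetic, the same calculation can be sketched in Python using the ages and frequencies from the table. The variable names are illustrative only.

```python
# Ages (X) and their frequencies (f) from the frequency distribution
ages = [11, 10, 9, 8, 7, 6, 5]
freqs = [2, 4, 8, 7, 3, 2, 1]

n = sum(freqs)                                     # N, the number of cases
sum_fx = sum(f * x for x, f in zip(ages, freqs))   # sum of the fX column
mu = sum_fx / n                                    # population mean

print(n, sum_fx, round(mu, 2))
```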
Consider the following working formula for the mean of the grouped data.
Formula:
$$\bar{X} = \frac{\sum FM}{N}$$
Where:
$\bar{X}$ = Mean
$\Sigma$ = Summation Symbol
$F$ = Frequency
$M$ = Midpoint
$N$ = Number of cases
Illustration: Now let us consider the data presented in the table. We can
compute the mean age of the two groups of respondents, the males, and the
females, by determining the (M) midpoint of the age bracket.
                     Male               Female
Age Bracket        F       %          F       %
21 - 25            9     8.18         4     3.64
26 - 30            8     7.27        14    12.73
31 - 35            7     6.36         6     5.45
36 - 40            6     5.45         7     6.36
41 - 45            9     8.18        22    20.00
46 - 50            5     4.55         8     7.27
51 - Above         2     1.82         3     2.73
Total             46    41.82        64    58.18
Complete the Tables below to compute the mean age of the male and
female respondents using the equation above:
Weighted Mean
STATEMENTS                                          Choices                  WM     DR
                                              5     4     3     2     1
1) I prefer science to other subjects.        47    57    60    62    64    2.87   Sometimes
2) I consider science learning an
   interesting and exciting experience.      102    85    43    46    14    3.74   Oftentimes
3) I feel that science concepts and skills
   answer a lot of problems I encounter.      52   110    52    38    38    3.34   Sometimes
4) I work cooperatively with other
   members of the group.                      83    83    61    43    20    3.57   Oftentimes
5) I like to work on difficult concepts
   and accept…                                44    62    46    53    85    2.75   Sometimes
Formula:
$$\bar{X} = \frac{\sum FX}{N}$$
Where:
$\bar{X}$ = Weighted Mean
$\Sigma$ = Summation Symbol
$F$ = Frequency
$X$ = Weight of each item
$N$ = Number of cases
For statement 1:
$$\bar{X} = \frac{(47)(5) + (57)(4) + (60)(3) + (62)(2) + (64)(1)}{290} = \frac{235 + 228 + 180 + 124 + 64}{290} = \frac{831}{290} = 2.8655 \text{ or } 2.87$$
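The weighted means in the WM column can be reproduced with a short Python sketch. The function name `weighted_mean` and the dictionary keys are hypothetical labels introduced here for illustration. The frequencies are those shown in the table for the five statements.

```python
weights = [5, 4, 3, 2, 1]  # weight attached to each choice

# Frequency of each choice (5 down to 1) per statement, from the table
responses = {
    "statement 1": [47, 57, 60, 62, 64],
    "statement 2": [102, 85, 43, 46, 14],
    "statement 3": [52, 110, 52, 38, 38],
    "statement 4": [83, 83, 61, 43, 20],
    "statement 5": [44, 62, 46, 53, 85],
}

def weighted_mean(freqs, weights):
    """Sum of frequency times weight, divided by the number of cases."""
    return sum(f * w for f, w in zip(freqs, weights)) / sum(freqs)

results = {name: round(weighted_mean(f, weights), 2)
           for name, f in responses.items()}
print(results)
```

Each statement has 290 respondents, so N is the same in every row.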
2. Median (Mdn). The median is the middle score of a distribution whose scores are
arranged in order of size.
The formula for finding the median position of ungrouped data is:
Median position = (n+1) / 2
If there is an even number of items, the two middle items are added together and
divided by two.
For example, if 6 people earn 60 pesos, 70 pesos, 100 pesos, 110 pesos, 115
pesos, 320 pesos per week, the median position would be (n+1) / 2, which equals
(6+1) / 2, which is 3.5. The median is therefore halfway between the third and
fourth values: (100 + 110) / 2 = 105 pesos.
For the following set of scores (already arranged in order of size): 10, 14,
16, 16, 19, 23, 27
The median would be the fourth score as there are seven scores. The value of
the fourth score is 16 so the median of the scores is 16.
For the following set of scores (already arranged in order of size): 10, 14,
16, 19, 23, 27
The median would be halfway between the third and fourth scores as there
are six scores. Halfway between 16 (the third score) and 19 (the fourth score) is (16 +
19)/2 which is 17.5, so the median is 17.5
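The median-position rule can be sketched in Python as follows. The helper `median_by_position` is a hypothetical name used only for this illustration. It follows the (n+1)/2 rule described above and is compared against the standard library's `statistics.median`.

```python
import math
import statistics

def median_by_position(values):
    """Median found through the (n + 1) / 2 position rule."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2          # 1-based median position
    lower = ordered[math.floor(position) - 1]  # score at or just below the position
    upper = ordered[math.ceil(position) - 1]   # score at or just above the position
    return (lower + upper) / 2

odd_scores = [10, 14, 16, 16, 19, 23, 27]
even_scores = [10, 14, 16, 19, 23, 27]
weekly_pay = [60, 70, 100, 110, 115, 320]

for data in (odd_scores, even_scores, weekly_pay):
    print(median_by_position(data), statistics.median(data))
```

When the position is a whole number, the lower and upper scores are the same score, so the average returns that score unchanged.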
Median of Grouped Data. The Formula for finding the Median of
grouped data is:
$$Mdn = L + \left(\frac{\frac{N}{2} - F}{f_m}\right) i$$
Where:
Mdn = Median
$L$ = Exact lower limit of the interval containing the median
$f_m$ = Frequency of the interval containing the median
$F$ = Cumulative frequency below $L$
$N$ = Number of cases
$i$ = Class interval width
$$Mdn = 40.5 + \left(\frac{32 - 31}{22}\right)(5) = 40.5 + \left(\frac{1}{22}\right)(5) = 40.5 + \frac{5}{22} = 40.5 + 0.227 = 40.73$$
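The grouped-median formula can be written as a small Python function. This is a sketch: the function name and parameter names are illustrative, and N = 64 is an assumption inferred from the value N/2 = 32 used in the computation above.

```python
def grouped_median(lower_limit, n, cum_below, class_freq, width):
    """Mdn = L + ((N/2 - F) / fm) * i for grouped data."""
    return lower_limit + (n / 2 - cum_below) / class_freq * width

# Values from the worked computation: L = 40.5, N/2 = 32 (so N = 64 is assumed),
# F = 31, fm = 22, i = 5
mdn = grouped_median(lower_limit=40.5, n=64, cum_below=31, class_freq=22, width=5)
print(round(mdn, 2))
```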
3. Mode (Mo). The mode is the most frequently occurring value in a distribution.
Mode of Ungrouped Data. The mode for ungrouped data is the
specific score that has the highest frequency. We can identify this value by
inspection.
Illustrations:
2. Sometimes two figures tie for "most popular", as with the ages 17; 17; 17;
17; 18; 18; 18; 18; 19; 19; 20; 21. Here 17 and 18 each occur four times, so the
data have two modes.
3. For the following set of 10 scores: 14, 18, 16, 17, 20, 15, 18, 17, 16, 17,
the mode is 17, which occurs three times.
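Finding the mode by inspection amounts to counting how often each value occurs. A minimal Python sketch, assuming Python 3.8 or later for `statistics.multimode`, which returns every value tied for the highest frequency:

```python
from statistics import multimode

ages = [17, 17, 17, 17, 18, 18, 18, 18, 19, 19, 20, 21]
scores = [14, 18, 16, 17, 20, 15, 18, 17, 16, 17]

# multimode returns a list, so ties are reported instead of hidden
age_modes = multimode(ages)
score_modes = multimode(scores)
print(age_modes, score_modes)
```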
Mode of Grouped Data. The mode for grouped data is identified as
the midpoint of the interval containing the largest frequency.
Age Bracket F
21 - 25 4
26 – 30 3
31 – 35 6
36 – 40 5
41 – 45 10
46 – 50 8
51 - 55 5
56 - 60 6
61 - 65 2
Total 49
The mode ( Mo) is 43 because the class interval 41 – 45 has the largest
frequency and 43 is the midpoint of the class interval.
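The same rule can be sketched in Python: pick the class with the largest frequency and take its midpoint. The tuple layout (lower limit, upper limit, frequency) is a choice made for this illustration.

```python
# (lower limit, upper limit, frequency) for each age bracket in the table
classes = [
    (21, 25, 4), (26, 30, 3), (31, 35, 6), (36, 40, 5), (41, 45, 10),
    (46, 50, 8), (51, 55, 5), (56, 60, 6), (61, 65, 2),
]

# The modal class is the one with the largest frequency
lower, upper, freq = max(classes, key=lambda c: c[2])
mode = (lower + upper) / 2   # midpoint of the modal class
print(lower, upper, freq, mode)
```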
B. Measures of Variation. The measures of variability we will
consider are the range, the variance, and the standard deviation.
Range. The range is the distance between the lowest data point and the
highest data point. Deviation scores are the distances between each data point and
the mean.
The Range of a Set of Data. The range of a set of data is the difference
between the highest and lowest values in the set. To find the range, first, order the
data from least to greatest. Then subtract the smallest value from the largest value in
the set.
Illustrations:
1. John Erick took 7 math tests in one marking period. What is the range of
his test scores?
Highest - lowest = 94 - 73 = 21
In the problem above, the set of data consists of 7 test scores. We ordered
the data from least to greatest before finding the range. It is recommended that you
do this, too. This is especially important with large sets of data.
3. A marathon race monitored by Mr. Redondo is completed by 5
participants. What is the range of times given in hours below?
Set A: 4, 5, 6, 7, 8
Set B: 2, 4, 6, 8, 10
We can see that the means for these two sets of scores are the same: both equal 30 / 5 = 6.
Although these two sets of scores have the same Mean, they differ in how
spread out the scores are.
The scores in Set A vary over a smaller set of values (4 through 8) than does
Set B which varies over score values from 2 through 10.
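A short Python sketch makes the comparison concrete: the two sets share a mean but not a range. The helper name `describe` is hypothetical.

```python
set_a = [4, 5, 6, 7, 8]
set_b = [2, 4, 6, 8, 10]

def describe(scores):
    """Return (mean, range) for a list of scores."""
    mean = sum(scores) / len(scores)
    data_range = max(scores) - min(scores)   # highest minus lowest
    return mean, data_range

mean_a, range_a = describe(set_a)
mean_b, range_b = describe(set_b)
print(mean_a, range_a)
print(mean_b, range_b)
```

Both means are 6, while the range of Set B is twice that of Set A.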
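The deviation score of each data point is $X - \bar{X}$.
To measure how far a dataset deviates from the mean, calculate the
mean of all the squared deviation scores, i.e. the variance. For a population:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
Where:
$\sigma^2$ = Population variance
$\mu$ = Population mean
$N$ = Population size
For a sample:
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
Where:
$S^2$ = Sample variance
$\bar{X}$ = Sample mean
$n$ = Sample size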
Calculating Variance by the Deviation Score Method. Let us first
consider the deviation score method for calculating the variance for a population of
scores. Although in practice the deviation score method is rarely used, it does help us
understand how variance is calculated. It follows our previous definition of variance
and could thus be referred to as the definitional method of calculating variance. The
formula for the variance for a population using the deviation score method is as
follows:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
We can see that the formula truly does say what the definition of variance
says - variance is the average squared deviation of the scores from the mean. We will
use this formula to find the variance for the seven scores found in the following
worksheet. In the first column, we have the seven scores. At the bottom of the
column is the sum of the scores and this can be used to find the mean. The mean =
the sum of the scores divided by the number of scores or 28/7 = 4.
In the second column is recorded the value for the score minus the mean
and in the third column is recorded this value squared, that is the square of the
score minus the mean. At the bottom of the third column is the sum of the
squared deviations of the scores from the mean and this is the quantity we need
to calculate the variance.
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{4}{7} \approx 0.57$$
If we were to consider the scores above as a sample rather than a
population, we would use the formula for the variance for a sample using the
deviation score method. That formula is:
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
There are three differences between the formula for the sample variance and
the formula for the population variance. Thus, we can see that for the sample
variance for our problem above we simply divide by n-1 rather than N.
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{4}{7 - 1} \approx 0.667$$
For our problem then, the sample variance is 0.667 (slightly larger than the
population variance).
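The deviation score method can be traced step by step in Python. The seven scores below are an assumption: the worksheet itself is not reproduced in the text, so this hypothetical set was chosen only because it matches the totals quoted (sum of scores 28, mean 4, sum of squared deviations 4).

```python
from statistics import pvariance, variance

# Assumed scores, chosen to match the totals quoted in the text
scores = [3, 3, 4, 4, 4, 5, 5]

n = len(scores)
mean = sum(scores) / n                        # 28 / 7 = 4
deviations = [x - mean for x in scores]       # score minus the mean
sum_sq_dev = sum(d ** 2 for d in deviations)  # sum of squared deviations

population_variance = sum_sq_dev / n          # divide by N
sample_variance = sum_sq_dev / (n - 1)        # divide by n - 1

print(round(population_variance, 2), round(sample_variance, 3))
# Cross-check against the standard library
print(pvariance(scores), variance(scores))
```

Dividing by n - 1 instead of N is the only change between the two results, which is why the sample variance comes out slightly larger.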
We can see from this formula that to find the population variance we sum up
the squared individual scores, $\sum X^2$, and subtract from them the square of the
sum of the scores divided by the number of scores:
$$\frac{(\sum X)^2}{N}$$
The variance then is this quantity divided by the number of scores. We can
see that the only data we need to make this calculation are the sum of the scores, the
sum of the squared scores, and the number of scores.
The quantity recorded at the bottom of the first column is the sum of the
scores $\sum X$ and the quantity at the bottom of the second column is the sum of the
squared scores $\sum X^2$. The number of scores N can be obtained by counting the
number of scores in the first column. We can put these quantities into the formula
for population variance to obtain 0.57, the same answer as we got with the deviation
score method.
$$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N} = \frac{116 - \frac{(28)^2}{7}}{7} = \frac{116 - 112}{7} = \frac{4}{7} \approx 0.57$$
There are three differences between the formula for the sample variance and
the formula for the population variance
2. The denominator of the equation uses the sample size -1 (n -1) rather
than N.
Thus, we can see that for the sample variance for our problem above we
simply divide by n-1 rather than N. For our problem then, the sample variance is
0.667 (slightly larger than the population variance).
$$S^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1} = \frac{116 - \frac{(28)^2}{7}}{7 - 1} = \frac{116 - 112}{7 - 1} = \frac{4}{6} \approx 0.667$$
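The raw score method needs only three totals, so it can be sketched as a function of those totals. The function name `raw_score_variance` and its `sample` flag are hypothetical choices for this illustration. The inputs are the totals quoted in the text.

```python
def raw_score_variance(sum_x, sum_x2, n, sample=False):
    """Variance from the sum of scores, the sum of squared scores, and the count."""
    sum_of_squares = sum_x2 - sum_x ** 2 / n   # sum X^2 minus (sum X)^2 / N
    divisor = n - 1 if sample else n           # n - 1 for a sample, N for a population
    return sum_of_squares / divisor

population_variance = raw_score_variance(28, 116, 7)
sample_variance = raw_score_variance(28, 116, 7, sample=True)
print(round(population_variance, 2), round(sample_variance, 3))
```

Both results agree with the deviation score method.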
Formulas:
For a Sample:
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
For a Population:
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
The population formula tells us to take a score, subtract the mean from it, and square
the difference. We then sum up all of these squared differences and divide by the
number of scores. Finally, we take the square root of the quotient and this is the
standard deviation.
Let's do a problem in finding the standard deviation using the same data we
used earlier to find the variance.
The sum of squared deviation scores is found at the bottom of the third
column and this is the quantity we need (as well as N) to find the standard deviation.
So using the formula we find the standard deviation to be 0.755:
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{4}{7}} \approx \sqrt{0.57} \approx 0.755$$
So, for our example, the sample standard deviation by the deviation score
method is 0.816.
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{4}{7 - 1}} = \sqrt{\frac{4}{6}} \approx \sqrt{0.667} \approx 0.816$$
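The standard deviation is the square root of the variance, which the Python sketch below shows for both cases. The seven scores are an assumption: they are a hypothetical set chosen to match the totals used in the text (sum 28, sum of squared deviations 4), since the worksheet itself is not reproduced.

```python
import math
from statistics import pstdev, stdev

# Assumed scores, chosen to match the totals quoted in the text
scores = [3, 3, 4, 4, 4, 5, 5]

n = len(scores)
mean = sum(scores) / n
sum_sq_dev = sum((x - mean) ** 2 for x in scores)

population_sd = math.sqrt(sum_sq_dev / n)        # square root of 4 / 7
sample_sd = math.sqrt(sum_sq_dev / (n - 1))      # square root of 4 / 6

print(population_sd, sample_sd)
# Cross-check against the standard library
print(pstdev(scores), stdev(scores))
```

Computed without intermediate rounding, the population value is about 0.756. The figure 0.755 in the text comes from rounding the variance to 0.57 before taking the square root.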
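Standard Deviation by the Raw Score Method. We can calculate the
standard deviation for a population of scores using the raw score method by using
the following formula:
$$\sigma = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}}$$
The corresponding formula for a sample is:
$$S = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}}$$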
The two formulas only differ by the term in the denominator. With the
population, the denominator consists of N, while for the sample we use n-1 in the
denominator.
Let’s use the scores in the following worksheet to find the standard deviation
for a population and a sample by the raw score method.
At the bottom of the first column is the sum of the scores (30) and at the
bottom of the second column is the sum of the squared scores (220).
We need these quantities and the number of scores (5) to calculate the
standard deviation.
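For a population:
$$\sigma = \sqrt{\frac{220 - \frac{(30)^2}{5}}{5}} = \sqrt{\frac{220 - 180}{5}} = \sqrt{8} \approx 2.828$$
For a sample:
$$S = \sqrt{\frac{220 - \frac{(30)^2}{5}}{5 - 1}} = \sqrt{\frac{220 - 180}{4}} = \sqrt{10} \approx 3.162$$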
So, for our five scores, the standard deviation considering the scores as a
population is 2.828 while the standard deviation for the same five scores if we
consider them a sample is 3.162
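The raw score computation can be sketched in Python from the three totals given in the text (sum of scores 30, sum of squared scores 220, and 5 scores). The function name `raw_score_sd` and its `sample` flag are hypothetical.

```python
import math

def raw_score_sd(sum_x, sum_x2, n, sample=False):
    """Standard deviation from the raw totals of a set of scores."""
    sum_of_squares = sum_x2 - sum_x ** 2 / n   # 220 - 900 / 5 = 40
    divisor = n - 1 if sample else n           # n - 1 for a sample, N for a population
    return math.sqrt(sum_of_squares / divisor)

population_sd = raw_score_sd(30, 220, 5)
sample_sd = raw_score_sd(30, 220, 5, sample=True)
print(round(population_sd, 3), round(sample_sd, 3))
```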
Worksheet # 7
In the above study, compute the Mean, Median, and Mode for the data
set for the BSBA students’ self-efficacy scale.
Mean = ___________________
Median= __________________
Mode = __________________
Worksheet # 8
How would you work the Mean, Median, and the Mode of the following set
of numbers?
Try to think this through without looking back to the previous pages.
13, 11, 17, 14, 20, 6, 15, 15, 12, 6, 13, 24, 22, 19
Mean = ___________________
Median= __________________
Mode = __________________
Worksheet # 9
The 25 junior students were asked to write a 1 next to their name on the roll
sheet if their family income is less than P15,000; a 2 if their income is P16,000 to
P20,000; a 3 if their income is P21,000 to P25,000; a 4 if their income is P26,000 to
P30,000; a 5 if their income is P31,000 to P35,000; and a 6 if their income is P36,000 or more.
Worksheet # 10
Compute the weighted mean (wx) of each item and supply the
descriptive rating based on the computed mean:
5 - Always
4 - Oftentimes
3 - Sometimes
2 - Seldom
1 - Never

STATEMENTS                                        5    4    3    2    1    WX   DR
1) I tolerate criticisms.                         79   62   54   50   45
2) I like to create and suggest.                  83   58   51   45   53
3) I like to use instructional materials.         79   69   68   29   45
4) I cooperate with my teachers in doing.         92   64   53   46   35
5) I study and perform assignments in science.    76   52   59   41   62
Worksheet # 11
The manager of Bank XXX is confused about the variation in the salaries
given to the 10 rank-and-file employees. The following are the individual gross salaries
received by the employees per month. Calculate the Variance by the Deviation
Method:
Worksheet # 12
The Head Nurse is trying to make necessary computations of the variance
regarding volume (liter) of dextrose consumed by 10 patients every three days of
confinement. Calculate the Variance by Raw Method:
Worksheet # 13
Worksheet # 14
Lesson 4
SAMPLING DESIGN
AND TECHNIQUE
=================
When undertaking any survey, it is essential that you obtain data from
people that are as representative as possible of the group that you are studying. Even
with the perfect questionnaire, your survey data will only be regarded as useful if it is
considered that your respondents are typical of the population as a whole.
• Methodology selected
• Degree of accuracy required for the study (how much error can be
tolerated)
• The extent to which there is variation in the population about key
characteristics of the study
• Likely response rate (which itself will depend on sampling method
selected)
• Time and money available
Probability and Non-Probability Sampling
A probability sample is one in which each member of the population has
a known, non-zero chance of being selected; in simple random sampling, that chance
is equal for every member.
1. Probability Samples. There are five main types of probability sample. The
choice among them depends on the nature of the research problem, the availability of a
good sampling frame, money, time, desired level of accuracy in the sample, and data
collection methods. Each has its advantages and its disadvantages. They are:
• Simple random
• Systematic
• Random route
• Stratified
• Multi-stage cluster sampling
1.1. Simple random sampling.
Characteristics:
• Each person has the same chance as any other of being selected
• Standard against which other methods are sometimes evaluated
• Suitable where the population is relatively small and where the sampling
frame is complete and up-to-date
Procedure:
Decide on a pattern of movement through the table and stick to it, e.g.
numbers from every second column and every row. If a number comes up twice or a
number is selected which is larger than the population number, discard it.
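Where a random number table is not at hand, the same procedure can be sketched with a pseudo-random number generator. The sketch below is an assumption-laden illustration, not the table method itself: the function name simple_random_sample, the frame of 50 numbered people, and the sample size of 10 are all hypothetical. Python's random.sample draws without replacement, which plays the role of discarding repeated or out-of-range numbers.

```python
import random

def simple_random_sample(frame, sample_size, seed=None):
    """Draw a simple random sample without replacement from a sampling frame."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, sample_size)

# Hypothetical sampling frame: 50 people numbered 1 to 50
frame = list(range(1, 51))
chosen = simple_random_sample(frame, 10, seed=1)
print(sorted(chosen))
```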
1.2. Systematic sampling. Similar to simple random sampling, but
instead of selecting random numbers from tables, you move through the list (sample
frame) picking every nth name.
You must first work out the sampling fraction by dividing the required sample size by
the population size. For example, for a population of 500 and a sample of 100, the
sampling fraction is 100/500 = 1/5. Thus, you will select one person out of every five in the
population. A random number is needed only to decide on the starting point.
With a sampling fraction of 1/5, the starting point must be within the first 5 people
on your list.
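The systematic procedure can be sketched as follows, using the population of 500 and sample of 100 from the example. The function name systematic_sample and the placeholder names in the frame are hypothetical; the sketch also assumes that the population size is an exact multiple of the sample size.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Pick every k-th entry of the frame after a random start within the first k."""
    interval = len(frame) // sample_size   # 500 // 100 = 5, i.e. one in every five
    rng = random.Random(seed)
    start = rng.randrange(interval)        # random starting point among the first k entries
    return frame[start::interval][:sample_size]

# Hypothetical sampling frame of 500 names
frame = [f"person_{i}" for i in range(1, 501)]
sample = systematic_sample(frame, 100, seed=42)
print(len(sample), sample[:3])
```

Only the starting point is random; every later selection is fixed by the interval.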
Advantages:
Problems:
1.4. Stratified Sampling. All people in the sampling frame are divided
into "strata" (groups or categories). Within each stratum, a simple random sample or
systematic sample is selected.
females. In this case, there are 22 male students and 28 females. To work out the
number of males and females in the sample, we have:
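A proportional allocation of this kind can be sketched as below. The stratum sizes (22 males, 28 females) come from the example; the total sample size of 10 is a hypothetical value used only for illustration, and the function name proportional_allocation is our own.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Share the sample among strata in proportion to each stratum's size."""
    total = sum(strata_sizes.values())
    return {name: sample_size * size / total for name, size in strata_sizes.items()}

strata = {"male": 22, "female": 28}          # from the example: 50 students in all
exact = proportional_allocation(strata, 10)  # sample size of 10 is an assumption
rounded = {name: round(share) for name, share in exact.items()}
print(exact)    # exact shares: 4.4 males and 5.6 females
print(rounded)  # whole people: 4 males and 6 females
```

Because people cannot be split, the exact shares are rounded to whole numbers; it is worth checking that the rounded figures still add up to the intended sample size.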
• Cheaper
• Used when sampling frame is not available
• Useful when the population is so widely dispersed that cluster sampling
would not be efficient
• Often used in exploratory studies, e.g. for hypothesis generation
• Some research is not interested in working out what proportion of the
population gives a particular response but rather in obtaining an idea of
the range of responses on ideas that people have.
to him/her to be representative of the population and will usually try to ensure that a
range from one extreme to the other is included.
Often used in political polling - districts chosen because their pattern has in
the past provided a good idea of outcomes for the whole electorate.
2.2. Quota Sampling. Have you ever been ambling along your local High
Street, noticed a Market Researcher with a clipboard, and thought "I don't mind
being asked some questions - it might be interesting", only to find that the
researcher looks straight through you? No? Well, for those people who have had that
happen, there is no need to take it personally. It is all due to quota sampling.
Stages:
2.4. Snowball sampling. With this approach, you initially contact a few
potential respondents and then ask them whether they know of anybody with the
same characteristics that you are looking for in your research. For example, if you
wanted to interview a sample of vegetarians/cyclists/people with a particular
disability/people who support a particular political party etc., your initial contacts
may well know (through e.g. support group) of others.
Worksheet # 15
Worksheet # 16
1) Can you think of some different "strata" (other than marital status) that could
be used if you were undertaking a stratified sample of the adult population of
a certain city? Give examples and discuss.
2) Company ZANDREX employs 150 part-time staff and 550 full-time staff, and
you want to carry out depth interviews with 20 of these. If you were to take a
random sample, you might find that all of the names you selected were the
part-time staff. For this reason, you have decided to undertake a stratified
sample. Work out how many part-time and how many full-time employees you
should interview to accurately reflect the proportions of the two groups in the
whole workforce.
Worksheet # 17
You are employed by a DENR research department and have been asked to
undertake a study across the whole country to determine attitudes towards
the Nuclear Power Plant.
You have been asked to interview 10,000 people in total. Consider what
would happen if you undertook a simple random survey. Of these 10,000
you might, for example, have to interview 2 people in NCR, 3 in R-III, a dozen
scattered across R-IV, and so on. As this is uneconomic, both in terms of time
and money, it would be considered more efficient to undertake a Multi-stage
Cluster Sample. Think of some ways in which you could divide the country up into
progressively smaller sections to undertake this type of sample.
Worksheet # 18
Lesson 5
HYPOTHESIS TESTING
=================
To answer a statistical question, the question is translated into a
hypothesis - a statement that can be subjected to test. Depending on the result of
the test, the hypothesis is accepted or rejected.
The hypothesis tested is known as the null hypothesis (H0). This must be
a true/false statement. For every null hypothesis, there is an alternative
hypothesis (HA).
Formulation of Hypotheses
Inferential statistics is all about hypothesis testing. The research
hypothesis will typically be that there is a relationship between the independent and
dependent variable, or that treatment has an effect, which generalizes to the
population. On the other hand, the null hypothesis, upon which the probabilities are
based, will typically be that there is no relationship between the independent and
dependent variable, or that the treatment has no real effect. In other words,
apparent differences in the samples are flukes. To examine these hypotheses many
different kinds of statistical tests are used. These tests are chosen based on the kinds
of variables being examined and the kinds of samples.
Constructing and testing hypotheses is an important skill, but the best way
to construct a hypothesis is not necessarily obvious:
• The null hypothesis has priority and is not rejected unless there is strong
evidence against it.
• If one of the two hypotheses is 'simpler' it is given priority so that a more
'complicated' theory is not adopted unless there is sufficient evidence
against the simpler one.
• In general, it is 'simpler' to propose that there is no difference between two
sets of results than to say that there is a difference.
The outcome of a hypothesis test is "reject H0" or "do not reject H0". If we
conclude "do not reject H0", this does not necessarily mean that the null hypothesis
is true, only that there is insufficient evidence against H0 in favor of HA. Rejecting
the null hypothesis suggests that the alternative hypothesis may be true.
Types of Error
Let us look first at the upper left-hand corner of the table. Suppose
you sampled ten people and, from the study, you concluded that the drug had an effect: it
appeared to reduce their diastolic blood pressure. If your conclusion is not the result
of a fluke, a product of sampling error or measurement error or something like that,
then your conclusion is correct, and you have the truth. You are in the upper left
corner of the table. On the other hand, if you conclude there is no effect, that is, you
tried drug A to reduce diastolic blood pressure and you conclude it did not work, and
that is the case, you've made another correct decision. You have the truth and you're
in the lower right-hand corner of the table. So, if the test showed no effect and there
is no effect, then you're in the lower right corner.
                                          Reality
Study's Conclusion                        True Effect              No Effect
True Effect (Reject null hypothesis)      Truth                    Type I or Alpha Error
No Effect (Don't reject null hypothesis)  Type II or Beta Error    Truth
Let us rephrase it. If there is an effect and you conclude there is an effect,
you have made the correct decision in the upper left corner. If there is no effect and
you conclude there is no effect, you've made the correct decision in the lower right
corner. In both these cases, you've made the correct decision.
Looking at the table it should be apparent that you could also make incorrect
decisions or errors. If there is no effect, but you conclude there is an effect, then
you've made an error, the error in the upper right corner, a Type I Error or alpha
error. Alpha is the probability of a Type I Error.
If there is an effect, but you conclude there is no effect, then you've made the
error in the lower-left corner, a Type II Error or beta error. Beta is the
probability of a Type II Error.
Sometimes people have a hard time remembering whether alpha and beta go
with Type I or Type II. Just remember both alpha and "I" are first and both beta and
"II" are second. They go together.
Let's be more concrete now. The first kind of error, Type I or alpha
errors are like false alarms. You think you're on to something; there's an effect. But,
in reality, there isn't. Type I Error occurs when you do an experiment and an
apparent effect isn't a real one. Maybe the groups were a little unusual, so that they
responded better or worse to the drug than the average person might. So, you get an
experiment in which you seem to see an effect when in reality there is no effect. That's
called a Type I or alpha error. It's a false positive.
Now you can also have an experiment where you find no effect. A lot of
experiments are like this, where you have no effect that you can distinguish but the
treatment, in reality, was producing or could produce in the population a good
result. In this case, you've missed a good treatment, because our little experiment
didn't show an effect. That sort of error is beta error or Type II Error. It's a false
negative. So, Type I Error is a false positive, and Type II error is a false negative.
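A small simulation can make the two error rates more tangible. The sketch below is an illustration under stated assumptions, not a procedure from this lesson: it uses a simple two-tailed z test with a known standard deviation of 1 (so that only the standard library is needed), samples of ten people as in the drug example, and a hypothetical true effect of 0.5 for the Type II case. The function name rejection_rate is our own.

```python
import math
import random

def rejection_rate(true_effect, n=10, trials=10_000, seed=0):
    """Fraction of simulated experiments in which H0 (no effect) is rejected.

    Each experiment draws n scores from a normal distribution with mean
    true_effect and standard deviation 1, then applies a two-tailed z test
    at alpha = .05 (critical z = 1.96).
    """
    rng = random.Random(seed)
    critical_z = 1.96
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_effect, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * math.sqrt(n)
        if abs(z) > critical_z:
            rejections += 1
    return rejections / trials

# No real effect: every rejection is a false positive (Type I error)
type_i_rate = rejection_rate(true_effect=0.0)
# A real effect exists: every failure to reject is a false negative (Type II error)
type_ii_rate = 1 - rejection_rate(true_effect=0.5, seed=1)
print(type_i_rate, type_ii_rate)
```

Under these assumptions the Type I rate should come out close to alpha (.05), while the Type II rate depends on the size of the true effect and on the sample size.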
Worksheet # 19
H0 :
HA :
H0 :
HA :
3. Culture and Climate at Company XYZ in Olongapo City as Assessed by the Managers
and Rank-and-File as a Basis for More Efficient Production.
H0 :
HA :
4. Philosophical Practices of College Educators and Administrators in a certain
University as a Baseline Study to Enhance the Teaching-Learning Process.
H0 :
HA :
H0 :
HA :
H0 :
HA :
Lesson 6
STATISTICAL
RELATIONSHIPS
AMONG VARIABLES
================
Statistical relationships between variables rely on the notion of correlation.
This concept describes how variables relate to one another.
Correlation
Correlation tests are used to determine how strongly the scores of two
variables are associated – or correlated – with each other. A researcher might want
to know, for instance, whether a correlation exists between Accounting students'
pre-board examination scores and their scores on the CPA board examination.
Examples:
from different parties. In many elections, Party XXX is unlikely to vote for Party
ZZZ, and vice versa.
Statistical Significance of a Correlation
Table (r) is used to determine the significance of r. The left column df
represents the degrees of freedom: df = Npairs - 2 (the number of pairs of XY scores
minus 2). df represents the number of values that are free to vary when the sum of
the variable is set; df compensates for small values of N by requiring higher absolute
values of r before being considered significant.
Step 1. Find the degrees of freedom in the left column: df = Npairs of data
minus 2
Step 2. Read across the df row and compare the obtained r value with the
value listed in one of the columns. The heading at the top of each column indicates
the odds of a chance occurrence, (the probability of error when declaring r to be
significant.) p=.10 is the 10% probability; p=.05 is the 5% probability level, and
p=.01 is the 1% probability level.
If r locates between values in any two columns, use the left of the two
columns (greater odds for chance). If r does not equal or exceed the value in the
p=.10 column, it is said to be non-significant (NS). Negative r’s are read using the
absolute value of r.
Values of the Correlation Coefficient (r)
df p = .10 p = .05 p = .01
1 .9877 .9969 .9999
2 .900 .950 .990
3 .805 .878 .959
4 .729 .811 .917
5 .669 .754 .875
6 .621 .707 .834
7 .582 .666 .798
8 .549 .632 .765
9 .521 .602 .735
10 .497 .576 .708
11 .476 .553 .684
12 .457 .532 .661
13 .441 .514 .641
14 .426 .497 .623
15 .412 .482 .606
16 .400 .468 .590
17 .389 .456 .575
18 .378 .444 .561
19 .369 .433 .549
20 .360 .423 .537
25 .323 .381 .487
30 .296 .349 .449
35 .275 .325 .418
40 .257 .304 .393
45 .243 .288 .372
50 .231 .273 .354
60 .211 .250 .325
70 .195 .232 .302
80 .183 .217 .283
90 .173 .205 .267
100 .164 .195 .254
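The two-step lookup described above can be sketched in Python. Only three rows of the table are copied here to keep the example short; the dictionary CRITICAL_R and the function significance_level are our own illustrative names, and the r values passed in are ones that appear in the worked examples of this lesson.

```python
# A few rows copied from the table of critical values of r (df -> p level -> critical r)
CRITICAL_R = {
    3: {0.10: 0.805, 0.05: 0.878, 0.01: 0.959},
    8: {0.10: 0.549, 0.05: 0.632, 0.01: 0.765},
    10: {0.10: 0.497, 0.05: 0.576, 0.01: 0.708},
}

def significance_level(r, df):
    """Return the smallest tabled p level that |r| reaches, or None if non-significant."""
    row = CRITICAL_R[df]
    for p in sorted(row):          # check .01 first, then .05, then .10
        if abs(r) >= row[p]:       # negative r's are read using the absolute value
            return p
    return None                    # r does not reach the p = .10 column: NS

print(significance_level(0.845, 3))   # 0.1  (between the .10 and .05 columns)
print(significance_level(0.93, 8))    # 0.01
print(significance_level(-0.36, 8))   # None (non-significant)
```

When r falls between two columns the function reports the left-hand column, which is the rule stated above.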
Pearson Product-Moment Correlation
Coefficient
The most widely used type of correlation coefficient is Pearson r also called
linear or product-moment correlation. Using non-technical language, one can
say that the correlation coefficient determines the extent to which values of X and Y
variables are "proportional" to each other. The value of the correlation does not
depend on the specific measurement units used; for example, the correlation
between height and weight will be identical regardless of whether inches and pounds
or centimeters and kilograms are used as measurement units. Proportional means
linearly related; that is, the correlation is high if it can be approximated by a straight
line (sloped upwards or downwards).
Formula:
or
Where:
Illustrations:
By looking at the formula we can see that we need the following items to
calculate r using the raw score formula:
Computations:
\[
r = \frac{(10)(506) - (55)(103)}{(28.723)(58.318)} = \frac{5060 - 5665}{1675.0679} = \frac{-605}{1675.0679} = -0.36
\]
2. Relationship of Scores on 20-point English and Solving Problem quizzes
of five (5) students.
\[
r = \frac{(5)(937) - (70)(65)}{(13.038)(12.247)} = \frac{4685 - 4550}{159.68} = \frac{135}{159.68} = 0.845
\]
There is indeed a high positive correlation between the English and
Problem-Solving scores.
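The raw score computation of r can also be checked by machine. The sketch below assumes the usual raw score form of the Pearson coefficient, r = (NΣXY - ΣXΣY) / sqrt([NΣX² - (ΣX)²][NΣY² - (ΣY)²]), which agrees with the numerators in the computations above. The function name pearson_r and the five pairs of scores are hypothetical; they are not the data from the illustrations.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation by the raw score formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired scores for five students
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # 0.775
```

For these made-up scores the numerator is (5)(66) - (15)(20) = 30 and the denominator is the square root of (50)(30), so r is about 0.775.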
H0: r = 0
H1: r > 0
Note: Our null hypothesis, for the Pearson r, states that r is 0. The alternative
hypothesis states that r has a significant positive value.
α = .05
Note: As usual, we set our alpha level at .05; that is, we accept 5 chances in 100 of making
a Type I error.
r = 0.845
df = N - 2 = 5 - 2 = 3
Note: Since our calculated r of .845 is less than the critical value of .878 (df = 3, p = .05),
we fail to reject the null hypothesis. The correlation is high, but with only five pairs of
scores it is not statistically significant.
Spearman Rho
The Spearman Rho correlation tells you the magnitude and direction of
the association between two variables that are measured on at least an ordinal scale,
that is, variables whose values are ranks or can be converted to ranks.
Hypotheses:
Correlation coefficients can vary from -1.0 to +1.0. The Rho here of
.382 is positive, which means that a higher score on the commitment scale is
correlated with a higher score on the achievement scale. Thus we can reject the null
hypothesis that these variables are not related. A Rho of 0.0 would mean that there
is no correlation between the two variables. A Rho of .382 means there is a
moderate correlation between the two scales. It appears that individuals who score
high on intellectual commitment also score high on intellectual achievement.
The magnitude is the strength of the correlation. The closer the correlation
is to either +1 or -1, the stronger the correlation. If the correlation is 0 or very close
to 0, there is no association between the two variables. Here, we have a moderate
correlation (r = -.392).
The direction of the correlation tells us how the two variables are related.
If the correlation is positive, the two variables have a positive relationship (as one
increases, the other also increases). If the correlation is negative, the two variables
have a negative relationship (as one increases, the other decreases). Here, we have a
negative correlation (r = -.392). As self-esteem increases, anxiety decreases.
X Y
7 4
5 7
8 9
9 8
X Y
2 1
1 2
3 4
4 3
\[
r_s = 1 - \frac{6 \sum D^2}{N^3 - N}
\]
Where:
Illustrations:
Student English (X) Solving Problem (Y) D D2
A 3 3 0 0
B 1 2 -1 1
C 2 1 1 1
D 5 4 1 1
E 4 5 -1 1
N =5 0 4
Computations:
\[
r_s = 1 - \frac{6 \sum D^2}{N^3 - N} = 1 - \frac{6(4)}{5^3 - 5} = 1 - \frac{24}{125 - 5} = 1 - \frac{24}{120} = 1 - .2 = .80
\]
2. We have seven students whose art projects are being ranked by two judges, and
we wish to know if the two judges’ rankings are related to one another. In other
words, do the two judges tend to rank the seven art show contestants in the same
way?
Computations:
\[
r_s = 1 - \frac{6 \sum D^2}{N^3 - N} = 1 - \frac{6(10)}{7^3 - 7} = 1 - \frac{60}{343 - 7} = 1 - \frac{60}{336} = 1 - 0.1786 = 0.8214
\]
Thus, we can see that there is a high positive correlation, but not a perfect
correlation between the two judges’ ratings.
To do this we first convert the reading test scores to ranks by assigning the
highest score a rank of 1, the next highest a rank of 2, etc.
Now we are looking at rankings on two variables and can use the
Spearman Rank-Difference Correlation Coefficient to test the significance
of the relationship. The two sets of ranks, as well as the difference between the pairs
of ranks (D) and the differences squared (D2), are shown in the following table.
1 3 -2 4
2 2 0 0
3 1 2 4
4 4 0 0
5 5 0 0
6 6 0 0
7 8 -1 1
8 7 1 1
9 10 -1 1
10 9 1 1
Total 12
Computations:
\[
r_s = 1 - \frac{6(12)}{10^3 - 10} = 1 - \frac{72}{1000 - 10} = 1 - \frac{72}{990} = 1 - 0.073 = 0.927 \approx .93
\]
df = N - 2 = 10 - 2 = 8
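The rank-difference formula is easy to check with a short function. In the sketch below the function name spearman_rho is our own; the first pair of lists holds the ranks of students A to E from the first illustration, and the second pair holds the ten pairs of ranks from the table above. The sketch assumes the data are already ranks with no ties.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman rank-difference coefficient: 1 - 6 * sum(D^2) / (N^3 - N)."""
    n = len(ranks_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Ranks of students A to E on English (X) and Solving Problem (Y)
english = [3, 1, 2, 5, 4]
solving = [3, 2, 1, 4, 5]
rho_students = spearman_rho(english, solving)

# The ten pairs of ranks from the table above (sum of D^2 = 12)
first = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
second = [3, 2, 1, 4, 5, 6, 8, 7, 10, 9]
rho_ten = spearman_rho(first, second)

print(round(rho_students, 2), round(rho_ten, 2))  # 0.8 0.93
```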
H0: rS = 0
H1: rS > 0
Note: Our null hypothesis states that there is no significant relationship between the
two variables. The alternative hypothesis states that there is a significant positive correlation
between the two variables.
α = .05
Note: As usual, we set our alpha level at .05; that is, we accept 5 chances in 100 of
making a Type I error.
rS = .93
df = N - 2 = 10 - 2 = 8
Note: To write the decision rule we need to know the critical value for rS
with an alpha level of .05 and 8 degrees of freedom. We can find it in the
Appendix Table, the same table we used for the Pearson r, by noting the
tabled value in the column for the .05 level and the row for 8 df (.632).
Note: Since our calculated value of rS (.93) is greater than .632, we reject
the null hypothesis and accept the alternative hypothesis.
Worksheet # 20
State the null hypothesis, the alternative hypothesis, set the alpha
level, the calculated value of the statistic, the decision rule, the decision
statement, and the statement of result.
Time (minutes)   Examination Score
34 78
46 92
60 56
37 82
41 96
54 66
56 79
48 85
39 92
50 80
Worksheet # 21
State the null hypothesis, the alternative hypothesis, set the alpha level, the
calculated value of the statistic, the decision rule, the decision statement, and the
statement of result. Problem:
Rank 1 Rank 2
1 2
2 5
3 4
4 1
5 6
6.5 3
6.5 7.5
8 10
9 9
10 12
11.5 7.5
11.5 11
Worksheet # 22
State the null hypothesis, the alternative hypothesis, set the alpha
level, the calculated value of the statistic, the decision rule, the decision
statement, and the statement of result.
Problem: Below are the mean evaluations given by the evaluators and the number of
minutes devoted to doing the project by the eleven contestants. Is there a
significant positive correlation between these two variables?
Worksheet # 23
State the null hypothesis, the alternative hypothesis, set the alpha
level, the calculated value of the statistic, the decision rule, the decision
statement, and the statement of result.
Problem: Two different medical doctors evaluated ten brands of milk. Based upon
each judge's evaluation, the milk is ranked from 1 to 10. Is there a significant
positive correlation between the rankings of the two doctors?
Worksheet # 24
State the null hypothesis, the alternative hypothesis, set the alpha
level, the calculated value of the statistic, the decision rule, the decision
statement, and the statement of result.
Problem: Find the Spearman Rank Order Correlation Coefficient between the two
rankings.
Lesson 7
CHI-SQUARE TEST
=================
Chi-square (X2) Test
The Chi-square test is similar to tests of correlation in that it measures the strength of
associations between variables. The Chi-square test can be used to test associations
in one or more groups, and it does this by comparing actual (observed) numbers in
each group with those that would be expected according to theory or simply by
chance. The Chi-square test requires that the data be expressed as frequencies, i.e.
numbers in each category; this is the nominal level of measurement. It should be
noted that almost any data can be reduced to categorical or frequency
data, but it is not always wise to do this because information is invariably lost in
the process.
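We will now consider a widely used non-parametric test, Chi-square,
which we can use with data at the nominal level, that is, classificatory data.
\[
\chi^2 = \sum \frac{(O - E)^2}{E}
\]
Where:
O = the observed frequency in a category
E = the expected frequency in that category
df = C - 1
Where:
C = the number of categories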
One-Variable Chi-Square (Goodness-of-
Fit Test)
We can use the Chi-square statistic to test the distribution of measures over
levels of a variable to indicate if the distribution of measures is the same for all
levels. This is the first use of the one-variable chi-square test. This test is also
referred to as the goodness-of-fit test.
Consider the example we already mentioned: the frequency with which entering
freshmen, when required to purchase a computer for college use, select Macintosh
computers, IBM computers, or some other brand of computer. We want to know if
there is a significant difference among the frequencies with which these
three brands of computers are selected or if the students select equally
among the three brands.
The data for 100 students are recorded in the table below (the observed
frequencies).
We have also indicated the expected frequency for each category. Since there
are 100 measures or observations and there are three categories (Macintosh, IBM,
and Other) we would indicate the expected frequency for each category to be 100/3
or 33.333. In the third column of the table, we have calculated the square of the
observed frequency minus the expected frequency divided by the expected
frequency. The sum of the third column would be the value of the chi-square
statistic.
\[
\chi^2 = \sum \frac{(O - E)^2}{E}
\]
df = C - 1 = 3 - 1 = 2
We can compare the obtained value of chi-square with the critical value for
the .05 level with 2 degrees of freedom, obtained from the Appendix Table
(Distribution of Chi-Square). Looking under the column for .05 and the row for df =
2, we see that the critical value for chi-square is 5.991.
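The goodness-of-fit computation can be sketched in a few lines of Python. The table of observed frequencies is not reproduced in this text, so the counts below are hypothetical: they were chosen to total 100 and to give a chi-square of about 13.82, and should not be read as the original data. The function name chi_square_gof is also our own.

```python
def chi_square_gof(observed, expected=None):
    """One-variable chi-square: sum of (O - E)^2 / E, with df = C - 1.

    If no expected frequencies are given, equal frequencies are assumed.
    """
    n = sum(observed)
    if expected is None:
        expected = [n / len(observed)] * len(observed)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi2, df

# Hypothetical observed counts for Macintosh, IBM, and Other (total 100)
observed = [47, 36, 17]
chi2, df = chi_square_gof(observed)
reject_null = chi2 > 5.991   # critical value at the .05 level for df = 2
print(round(chi2, 2), df, reject_null)  # 13.82 2 True
```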
Application of the Statistical Test
1. State the null hypothesis and the alternative hypothesis based on your
research question.
Ho : O = E
H1: O ≠ E
Note: Our null hypothesis, for the chi-square test, states that there are no
differences between the observed and the expected frequencies. The alternate hypothesis
states that there are significant differences between the observed and expected frequencies.
α = .05
Note: As usual, we set our alpha level at .05; that is, we accept 5 chances in 100 of making a
Type I error.
3. Calculate the value of the appropriate statistic. Also indicate the degrees
of freedom for the statistical test if necessary.
X2 = 13.820
df = C - 1 = 2
Note: To write the decision rule we had to know the critical value for chi-square, with an
alpha level of .05 and 2 degrees of freedom. We can do this by looking at Appendix Table and
noting the tabled value for the column for the .05 level and the row for 2 df.
Note: Since our calculated value of X2 (13.820) is greater than 5.991, we reject the null
hypothesis and accept the alternative hypothesis.
Two-Variable Chi-Square (Test of
Independence)
Now let us consider the case of the two-variable chi-square test, also known
as the test of independence.
The two variables we are considering here are hometown size (small,
medium, or large) and gender (male or female). Another way of putting our research
question is: Is gender independent of the size of hometown? The data for 30 females
and 6 males is in the following table.
The frequency with which Males and Females come from Small,
Medium, and Large cities
          Small   Medium   Large   Total
Female     10      14        6      30
Male        4       1        1       6
Total      14      15        7      36
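\[
\chi^2 = \sum \frac{(O - E)^2}{E}
\]
Where:
O = the observed frequency in a cell
E = the expected frequency in that cell
df = (C - 1)(R - 1)
Where:
C = the number of columns
R = the number of rows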
In the table above we have the observed frequencies (six of them). Now we
must calculate the expected frequency for each of the six cells. For two-variable
chi-square we find the expected frequencies with the formula:
\[
E = \frac{(\text{Row Total})(\text{Column Total})}{\text{Grand Total}}
\]
In the table above we can see that the Column Totals are 14 (small), 15
(medium), and 7 (large), while the Row Totals are 30 (female) and 6 (male). The
total is 36.
Using the formula we can thus find the expected frequency ( E ) for each
cell.
We can put these expected frequencies in our table and also include the
values for (O - E)2/E. The sum of all these is the value of Chi-square.
\[
\chi^2 = \sum \frac{(O - E)^2}{E}
\]
X2 = 0.238 + 0.180 + 0.005 + 1.191 + 0.900 + 0.024 = 2.538
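The same result can be reproduced with a short Python sketch that takes the observed table of hometown size by gender, computes each expected frequency as (row total)(column total) / grand total, and sums (O - E)^2 / E over the six cells. The function name chi_square_independence is our own.

```python
def chi_square_independence(table):
    """Two-variable chi-square for a table of observed frequencies (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table[0]) - 1) * (len(table) - 1)   # (C - 1)(R - 1)
    return chi2, df

# Rows: Female, Male; columns: Small, Medium, Large
observed = [
    [10, 14, 6],
    [4, 1, 1],
]
chi2, df = chi_square_independence(observed)
print(round(chi2, 3), df)
```

Carried out without rounding the individual terms, the sum is about 2.537, which differs from the hand total of 2.538 only because the six terms above were rounded before adding.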
Application of the Statistical Test
We now have the information we need to complete the six-step process for
testing statistical hypotheses for our research problem.
H1: O ≠ E
α = .05
X2 = 2.538
df = (C - 1)(R - 1) = (2)(1) = 2
Note: To write the decision rule we had to know the critical value for chi-
square, with an alpha level of .05 and 2 degrees of freedom. We can do this by
looking at Appendix Table and noting the tabled value for the column for the .05
level and the row for 2 df.
Fail to reject H0
Note: Since our calculated value of X2 (2.538) is not greater than 5.991,
we fail to reject the null hypothesis and are unable to accept the alternative
hypothesis.
State the null hypothesis, the alternative hypothesis, set the alpha
level, the calculated value of the statistic, the decision rule, the decision
statement, and the statement of result.
Problem: Students from the various colleges are classified according to their school
organization membership and their department. Is membership in a school
organization independent of the department to which the students
belong?
Worksheet # 26
State the null hypothesis, the alternative hypothesis, set the alpha
level, the calculated value of the statistic, the decision rule, the decision
statement, and the statement of result.
Problem: A consumer research group asked a group of women to use each kind of
lotion for one month. After the trial period, each woman indicated the lotion she
preferred. Using the results below, determine whether there is a significant
preference for any of the brands of lotion.
Worksheet # 27
State the null hypothesis, the alternative hypothesis, set the alpha
level, the calculated value of the statistic, the decision rule, the decision
statement, and the statement of result.
Worksheet # 28
State the null hypothesis, the alternative hypothesis, set the alpha
level, the calculated value of the statistic, the decision rule, the decision
statement, and the statement of result.
Problem: The researcher surveys 100 men and 100 women and determines the
numbers who are deemed healthy or unhealthy. The data is shown in the 2x2
contingency table below.
Lesson 8
STATISTICAL
DIFFERENCES
BETWEEN GROUPS
=================
Statistical tests can be used to analyze differences in the scores of two or more
groups. The following statistical tests are commonly used to analyze differences
between groups.
t-Tests
A t-test is used to determine if the scores of two groups differ on a single
variable. A t-test is designed to test for the differences in mean scores. For instance,
you could use a t-test to determine whether performance differs among students in
two classrooms.
Two-Sample Tests
A very common problem is comparing the means of, for example, an
experimental group and a control group on some dependent variable.
matched) and for pre-test/post-test comparisons where the pre-test and post-test
are taken on the same group of subjects.
For example, the t-test can be used to test for a difference in test scores
between a group of students who were given a traditional method of teaching and a
control group who received a modular approach. Theoretically, the t-test can be used
even if the sample sizes are very small (as small as 10; some researchers claim that
even smaller n's are possible), as long as the variables are normally distributed
within each group and the variation of scores in the two groups is not reliably
different.
The p-level reported with a t-test represents the probability of error involved
in accepting our research hypothesis about the existence of a difference. Technically
speaking, this is the probability of error associated with rejecting the hypothesis of
no difference between the two categories of observations (corresponding to the
groups) in the population when, in fact, the hypothesis is true. Some researchers
suggest that if the difference is in the predicted direction, you can consider only one
half (one "tail") of the probability distribution and thus divide the standard p-level
reported with a t-test (a "two-tailed" probability) by two. Others, however, suggest
that you should always report the standard, two-tailed t-test probability.
Where:
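SS1 = the sum of squares for group 1,
\[
SS_1 = \sum X_1^2 - \frac{(\sum X_1)^2}{n_1}
\]
and SS2 = the sum of squares for group 2,
\[
SS_2 = \sum X_2^2 - \frac{(\sum X_2)^2}{n_2}
\]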
We can see that each sum of squares is the sum of the squared scores in the
sample minus the sum of the scores quantity squared divided by the size of the
sample (n).
We also need to know the degrees of freedom for the independent t-test
which is:
df = n1 + n2 - 2
Steps in Looking for the t-critical value
1. Determine Alpha. Take alpha from the problem and (a) if the test is one-tailed,
leave it alone, or (b) if it is a two-tailed test, then divide alpha by 2 (alpha/2).
3. Locate df in Table. The first column of the t-table lists df. Locate the df
from step 2, if that number of df is not in the table - round DOWN to the next
nearest number.
4. Find t-Critical Value. Using the row associated with the df identified in
step 3, go over to the number under the column representing your alpha level from
step 1 (i.e., if alpha = .05 then the critical value would be in the column labeled
"t.050"). This number is then your critical value.
We can calculate the quantities we need to solve this problem as follows:
We can use the totals from this worksheet and the number of subjects in
each group to calculate the sum of squares for group 1, the sum of squares for group
2, the mean for group 1, the mean for group 2, and the value for the independent t.
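\[
SS_1 = \sum X_1^2 - \frac{\left(\sum X_1\right)^2}{n_1}
     = 55837 - \frac{600625}{11}
     = 55837 - 54602.27
     = 1234.73
\]
\[
SS_2 = \sum X_2^2 - \frac{\left(\sum X_2\right)^2}{n_2}
     = 40982 - \frac{438244}{11}
     = 40982 - 39840.36
     = 1141.64
\]
\[
\bar{X}_1 = \frac{775}{11} = 70.45
\qquad
\bar{X}_2 = \frac{662}{11} = 60.18
\]
\[
t = \frac{70.45 - 60.18}{\sqrt{\left(\dfrac{1234.73 + 1141.64}{11 + 11 - 2}\right)\left(\dfrac{1}{11} + \dfrac{1}{11}\right)}}
  = \frac{10.27}{\sqrt{\left(\dfrac{2376.37}{20}\right)(.182)}}
  = \frac{10.27}{\sqrt{(118.82)(.182)}}
  = \frac{10.27}{4.650}
  = 2.209
\]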
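As a check on the hand calculation, the same steps can be sketched in Python from the worksheet totals. The function name `independent_t` and its argument names are hypothetical. Because the code does not round intermediate results, it gives a value of about 2.210 rather than the hand-rounded 2.209.

```python
import math


def independent_t(sum_x1, sum_sq_x1, n1, sum_x2, sum_sq_x2, n2):
    """Independent-samples t from the sums and sums of squared scores."""
    ss1 = sum_sq_x1 - sum_x1 ** 2 / n1      # sum of squares, group 1
    ss2 = sum_sq_x2 - sum_x2 ** 2 / n2      # sum of squares, group 2
    df = n1 + n2 - 2
    pooled_variance = (ss1 + ss2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t = (sum_x1 / n1 - sum_x2 / n2) / standard_error
    return t, df


# Totals taken from the worked example above.
t, df = independent_t(775, 55837, 11, 662, 40982, 11)
print(round(t, 2), df)  # 2.21 20
```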
Application of the Statistical Test
1. State the null hypothesis and the alternative hypothesis based on
your research question.
H0: μ1 = μ2
H1: μ1 ≠ μ2
Note: Our problem did not state which direction of significance we will be looking for;
therefore, we will be looking for a significant difference between the two means in either direction.
t = 2.209
df = n1 + n2 - 2 = 11 + 11 - 2 = 20
Note: We have calculated the t-value and will also need to know the degrees of freedom when we go to
look up the critical values of t.
Note: To write the decision rule we need to know the critical value for t, with an alpha level
of .05 and a two-tailed test. We can do this by looking at Appendix (Distribution of t). Look for the
column of the table under .05 for the Level of significance for two-tailed tests, read down the column
until you are level with 20 in the df column, and you will find the critical value of t which is 2.086. That
means our result is significant if the calculated t value is less than or equal to -2.086 or is greater than or
equal to 2.086.
Note: Since our calculated value of t (2.209) is greater than the critical value of 2.086, we reject the
null hypothesis and accept the alternative hypothesis.
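The decision rule described in the notes above can be sketched as a small Python function. The name `reject_null` and the `direction` parameter are assumptions introduced for illustration. The one-tailed branches anticipate directional tests such as the dependent-samples example.

```python
def reject_null(t_calculated, t_critical, tails=2, direction="greater"):
    """Return True when the calculated t falls in the rejection region.

    t_critical is the positive tabled value; direction matters only
    for a one-tailed test ("greater" or "less").
    """
    if tails == 2:
        # Significant if t <= -critical or t >= +critical.
        return abs(t_calculated) >= t_critical
    if direction == "greater":
        return t_calculated >= t_critical
    return t_calculated <= -t_critical


print(reject_null(2.209, 2.086))  # True
```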
t-Test for Dependent Samples
Within-group Variation. As explained, the size of a relation between two
variables, such as the one measured by a difference in means between two groups,
depends to a large extent on the differentiation of values within each group.
Depending on how differentiated the values are in each group, a given "raw
difference" in group means will indicate either a stronger or weaker relationship
between the independent (grouping) and dependent variable.
Note that, in a sense, this fact is not much different than in cases when the
two groups are entirely independent, where individual differences also contribute to
the error variance; but in the case of independent samples, we cannot do anything
about it because we cannot identify (or "subtract") the variation due to individual
differences in subjects. However, if the same sample was tested twice, then we can
easily identify (or "subtract") this variation. Specifically, instead of treating each
group separately, and analyzing raw scores, we can look only at the differences
between the two measures ("pre-test" and "posttest") in each subject. By subtracting
the first score from the second for each subject and then analyzing only those "pure
(paired) differences," we will exclude the entire part of the variation in our data set
that results from unequal base levels of individual subjects. This is precisely what is
being done in the t-test for dependent samples, and as compared to the t-test for
independent samples, it usually produces "better" (more sensitive) results.
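For example, if you compare the average grade in a sample of students
before and after a grading period, but use a different counting method or different
units in the second measurement, then a highly significant t-test value could be
obtained due to an artifact; that is, to the change of units of measurement. The
formula for the dependent t-test is:
\[
t = \frac{\sum D}{\sqrt{\dfrac{n \sum D^2 - \left(\sum D\right)^2}{n - 1}}}
\]
Where:
D = the difference between each subject's two scores (X2 - X1), and
n = the number of paired scores.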
The pre-test and post-test scores, as well as D and D², are shown in the
following table.
Pre- and Post-Test Scores for 10 BSBA Students on the Review Sessions
Pre-Test (X1)   Post-Test (X2)   D (X2 - X1)   D²
14              0                -14           196
6               0                -6            36
4               3                -1            1
15              20               5             25
3               0                -3            9
3               0                -3            9
6               1                -5            25
5               1                -4            16
6               1                -5            25
3               0                -3            9
Sum                              -39           351
Solution:
\[
t = \frac{-39}{\sqrt{\dfrac{10(351) - (-39)^2}{10 - 1}}}
  = \frac{-39}{\sqrt{\dfrac{3510 - 1521}{9}}}
  = \frac{-39}{\sqrt{221}}
  = \frac{-39}{14.866}
  = -2.623
\]
\[
df = n - 1 = 10 - 1 = 9
\]
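The same computation can be sketched in Python directly from the pre-test and post-test scores in the table. The function name `dependent_t` is hypothetical. The code follows the difference-score approach used in the solution above.

```python
import math


def dependent_t(pre, post):
    """Dependent-samples t computed from paired difference scores."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain the same number of scores")
    diffs = [x2 - x1 for x1, x2 in zip(pre, post)]   # D = X2 - X1
    n = len(diffs)
    sum_d = sum(diffs)
    sum_d_squared = sum(d * d for d in diffs)
    t = sum_d / math.sqrt((n * sum_d_squared - sum_d ** 2) / (n - 1))
    return t, n - 1


pre_test = [14, 6, 4, 15, 3, 3, 6, 5, 6, 3]
post_test = [0, 0, 3, 20, 0, 0, 1, 1, 1, 0]
t, df = dependent_t(pre_test, post_test)
print(round(t, 3), df)  # -2.623 9
```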
Application of the Statistical Test
1. State the null hypothesis and the alternative hypothesis based on
your research question.
H0: μ1 = μ2
H1: μ1 ≠ μ2
α = .05
Note: As usual, we set our alpha level at .05; that is, we accept 5 chances in 100 of making a Type I error.
t = -2.623
df = n - 1 = 10 - 1 = 9
Note: We have calculated the t-value and will also need to know the degrees of freedom when
we go to look up the critical value of t.
Note: To write the decision rule we need to know the critical value for t,
with an alpha level of .05 and a one-tailed test. We can do this by looking at
Appendix (Distribution of t). Look for the column of the table under .05 for the Level
of significance for one-tailed tests, read down the column until you are level with 9
in the df column, and you will find the critical value of t which is 1.833. That means
our result is significant if the calculated t value is less than or equal to -1.833.
Note: Since our calculated value of t (-2.623) is less than the critical value of -1.833, we reject the null
hypothesis and accept the alternative hypothesis.
Formula:
\[
Z = \frac{M - \mu}{\sigma_M}
\]
Degrees of freedom:
\[
df = n - 1
\]
\[
Z = \frac{X - \mu}{\sigma}
\]
If we wish to compare a sample mean with the population mean when the
population standard deviation is not known we use the one-sample t-test.
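For reference, the one-sample t statistic is commonly written as shown below. The symbols M (sample mean), s (sample standard deviation), and n (sample size) are notation assumed here to match the Z formula above. This is a sketch of the standard formula, not necessarily the presentation the author intended.

```latex
t = \frac{M - \mu}{s / \sqrt{n}}, \qquad df = n - 1
```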
Illustration:
We give the subjects intensive "Get Smart" training and then administer an
IQ test. The sample mean IQ is 113 and the sample standard deviation is 10. Did the
training result in a significant increase in IQ score?
In this problem, we see that we have a single sample and we wish to
compare the sample mean with a population mean. We know what the population
standard deviation is. From what we have said we can see that the inferential
statistic we need to use here is the Z-test.
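\[
Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}
\]
Where:
\[
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{9}} = \frac{15}{3} = 5
\]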
We are now ready to calculate the value of Z for our problem. We have all
the information we need:
\[
Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}
\]
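A minimal Python sketch of this calculation follows. The population mean is not stated in this passage, so the value 100 used below is an assumption (the conventional IQ mean). The function names are likewise hypothetical. The sample mean of 113, the population standard deviation of 15, and n = 9 come from the text.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def z_statistic(sample_mean, population_mean, sigma, n):
    """Z for a single sample mean when the population sigma is known."""
    return (sample_mean - population_mean) / standard_error(sigma, n)


ASSUMED_POPULATION_MEAN = 100  # assumption: not given in this passage

print(standard_error(15, 9))                              # 5.0
print(z_statistic(113, ASSUMED_POPULATION_MEAN, 15, 9))   # 2.6
```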
3. Locate # in Table. Refer to the body of the Z-table (inside cover of the
book) and find the number that is closest to the number you calculated in step 2.
4. Identify Z-Critical Value. Starting from the number you located in step
3, follow its row to the left (the first column) and record the number
there (it falls between 0.0 and 6.0). Then, starting again at the number you located in
step 3, follow its column up to the top (the first row) and record that number (it falls
between 0.00 and 0.09). Combine these two numbers and you have your Z-critical
value.
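Steps 3 and 4 can be sketched in Python using a three-row excerpt of the Appendix B table. Steps 1 and 2 are not shown in this passage, so the sketch assumes the number to locate is the area between the mean and Z, that is, .5 minus a one-tailed alpha. The names `Z_TABLE`, `COLUMNS`, and `z_critical` are hypothetical.

```python
# Excerpt of Appendix B: row label -> areas for columns 0.00 to 0.05.
Z_TABLE = {
    2.2: [0.4861, 0.4864, 0.4868, 0.4871, 0.4875, 0.4878],
    2.3: [0.4893, 0.4896, 0.4898, 0.4901, 0.4904, 0.4906],
    2.4: [0.4918, 0.4920, 0.4922, 0.4925, 0.4927, 0.4929],
}
COLUMNS = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05]


def z_critical(target_area, table=Z_TABLE, columns=COLUMNS):
    """Find the table entry closest to target_area and return its Z."""
    best = None
    for row_label, areas in table.items():
        for col_label, area in zip(columns, areas):
            distance = abs(area - target_area)
            if best is None or distance < best[0]:
                best = (distance, row_label, col_label)
    _, row_label, col_label = best
    # Combine the row label (first column) with the column label (top row).
    return round(row_label + col_label, 2)


alpha = 0.01                      # one-tailed alpha, chosen for illustration
print(z_critical(0.5 - alpha))    # 2.33
```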
F-Test (ANOVA)
The ANOVA is a statistical test that makes a single, overall decision as to
whether a significant difference is present among three or more sample means.
ANOVA is similar to a t-test. However, the ANOVA can also test multiple
groups to see if they differ on one or more variables. The ANOVA can be used to test
between-groups and within-groups differences.
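Formula:
\[
F = \frac{MS_B}{MS_W}
\]
Where:
\[
df_B = K - 1
\]
\[
df_W = N_{total} - K
\]
K = the number of groups, and N_total = the total number of subjects in all groups.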
Steps
1. Calculate SSB, the sum of squares between groups, where X1 is a score
from Group 1, X2 is a score from Group 2, X3 is a score from Group 3, n1 is the
number of subjects in group 1, n2 is the number of subjects in group 2, n3 is the
number of subjects in group 3, XT is a score from any subject in the total group of
subjects, and NT is the total number of subjects in all groups.
dfB = K - 1
dfW = NT - K
4. Calculate MSB, the mean square between groups, MSW, the mean
square within groups, and F, the F ratio.
6. Present the results of the analysis in an analysis of variance table.
SS df MS F
The three groups of students received the following scores on the Test
Anxiety Index (TAI) at the end of treatment.
TAI Scores for Three Groups of Students
Group 1 (5 hours)   Group 2 (10 hours)   Group 3 (15 hours)
48                  55                   51
50                  52                   52
53                  53                   50
52                  55                   53
50                  53                   50
The following table contains the quantities we need to calculate the means
for the three groups, the sum of squares, and the degrees of freedom:
Steps
1. Calculate SSB, the sum of squares between groups, where X1 is a score
from Group 1, X2 is a score from Group 2, X3 is a score from Group 3, n1 is the
number of subjects in group 1, n2 is the number of subjects in group 2, n3 is the
number of subjects in group 3, XT is a score from any subject in the total group of
subjects, and NT is the total number of subjects in all groups. Where: ∑X1 = 253; ∑X2 =
268; ∑X3 = 256; and ∑XT = 777
dfB = K - 1 = 3 - 1 = 2
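The sum of squares within groups is:
\[
SS_W = 29.2
\]
The degrees of freedom within groups is:
\[
df_W = N_T - K = 15 - 3 = 12
\]
The total sum of squares is:
\[
SS_T = \sum X_T^2 - \frac{\left(\sum X_T\right)^2}{N_T}
     = 40303 - \frac{603729}{15}
     = 40303 - 40248.6
     = 54.4
\]
These values are consistent with SS_B = SS_T - SS_W = 54.4 - 29.2 = 25.2.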
4. Calculate MSB, the mean square between groups, MSW, the mean
square within groups, and F, the F ratio.
\[
F = \frac{MS_B}{MS_W} = \frac{12.60}{2.4333} = 5.178
\]
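The whole analysis of variance for the TAI data can be sketched in Python as a check on the hand computations. The function name `one_way_anova` is hypothetical. The between-groups sum of squares uses the usual computational formula (each group total squared and divided by its n, minus the correction term), which is assumed here because the formula itself is not reproduced in this passage.

```python
def one_way_anova(groups):
    """One-way ANOVA from raw scores; returns the main quantities."""
    all_scores = [x for group in groups for x in group]
    n_total = len(all_scores)
    k = len(groups)
    correction = sum(all_scores) ** 2 / n_total          # (sum XT)^2 / NT
    ss_total = sum(x * x for x in all_scores) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between
    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


tai_scores = [
    [48, 50, 53, 52, 50],   # Group 1, 5 hours
    [55, 52, 53, 55, 53],   # Group 2, 10 hours
    [51, 52, 50, 53, 50],   # Group 3, 15 hours
]
result = one_way_anova(tai_scores)
print(round(result["F"], 3))  # 5.178
```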
5. To test the significance of the F value we obtained, we need to compare
it with the critical F value with an alpha level of .05, 2 degrees of freedom
between groups (or degrees of freedom in the numerator of the F ratio), and 12
degrees of freedom within groups (or degrees of freedom in the denominator of the F
ratio).
Look in the table under column 2 (2 degrees of freedom for the numerator)
and row 12 (12 degrees of freedom for the denominator) and read the non-boldfaced
entry (for .05 level) of 3.88 - this is the critical value for F.
One way of indicating this critical value of F at the .05 level, with 2
degrees of freedom between groups and 12 degrees of freedom within groups is
F.05(2,12) = 3.88
H0: μ1 = μ2 = μ3
H1: not H0
Note: Our null hypothesis, for the F test, states that there are no differences among the three
means. The alternate hypothesis states that there are significant differences among some or all of the
individual means. An unequivocal way of stating this is not H0.
2. Set the alpha level.
α = .05
Note: As usual, we set our alpha level at .05; that is, we accept 5 chances in 100 of making a
Type I error.
F(2,12) = 5.178
Note: We have indicated the value of F from our analysis of the variance
table. We have also indicated by (2,12) that there are 2 degrees of freedom between
groups, and 12 degrees of freedom within groups.
Note: To write the decision rule we had to know the critical value for F, with an alpha level of
.05, 2 degrees of freedom in the numerator (df between groups), and 12 degrees of freedom in the
denominator (df within groups). We can do this by looking at Appendix Table and noting the tabled value
for the .05 level in the column for 2 df and the row for 12 df.
Note: Since our calculated value of F (5.178) is greater than 3.88, we reject
the null hypothesis and accept the alternative hypothesis.
In the problem above, we rejected the null hypothesis and found that
there is indeed a significant difference among the three cell means. We know that
Group 1 had the lowest mean (50.6), group 3 had a higher mean (51.2), and
group 2 had the highest mean of all (53.6).
We only do these post hoc comparisons when there is a significant F ratio. It will
make no sense to look for differences with a post hoc test if no differences exist.
The formulas for these tests and their application to the ANOVA
problem are:
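\[
F_{1\ \mathrm{and}\ 2} = \frac{(\bar{X}_1 - \bar{X}_2)^2}{MS_W \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)(k - 1)}
  = \frac{(-3)^2}{2.43(.4)(2)} = \frac{9}{1.944} = 4.630
\]
\[
F_{1\ \mathrm{and}\ 3} = \frac{(\bar{X}_1 - \bar{X}_3)^2}{MS_W \left(\dfrac{1}{n_1} + \dfrac{1}{n_3}\right)(k - 1)}
  = \frac{(-0.6)^2}{2.43(.4)(2)} = \frac{0.36}{1.944} = 0.185
\]
\[
F_{2\ \mathrm{and}\ 3} = \frac{(\bar{X}_2 - \bar{X}_3)^2}{MS_W \left(\dfrac{1}{n_2} + \dfrac{1}{n_3}\right)(k - 1)}
  = \frac{(2.4)^2}{2.43(.4)(2)} = \frac{5.76}{1.944} = 2.963
\]
We compare these values with the critical value for F.05(2,12) = 3.88 and
note that the only significant difference is between group one and group two (4.630 is
greater than 3.88).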
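The three pairwise comparisons can be sketched in Python as follows. The function name `pairwise_f` is hypothetical. The sketch uses the rounded MSW of 2.43 from the hand calculation so that the results agree with the figures above.

```python
from itertools import combinations


def pairwise_f(mean_a, mean_b, n_a, n_b, ms_within, k):
    """Post hoc F for one pair of group means, as in the formulas above."""
    return (mean_a - mean_b) ** 2 / (ms_within * (1 / n_a + 1 / n_b) * (k - 1))


group_means = {1: 50.6, 2: 53.6, 3: 51.2}
GROUP_SIZE = 5        # subjects per group
MS_WITHIN = 2.43      # rounded value used in the hand calculation
F_CRITICAL = 3.88     # F.05(2,12)

results = {}
for a, b in combinations(group_means, 2):
    f_value = pairwise_f(group_means[a], group_means[b],
                         GROUP_SIZE, GROUP_SIZE, MS_WITHIN, k=3)
    results[(a, b)] = f_value
    # Print the pair, its F value, and whether it exceeds the critical value.
    print(a, b, round(f_value, 3), f_value > F_CRITICAL)
```

Only the comparison of groups 1 and 2 exceeds the critical value, in agreement with the conclusion above.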
Now that we know how to conduct a post hoc analysis of the
significance of the differences between pairs of group means, we should
modify steps 3 and 6 of our hypothesis testing strategy to include the results of the
post hoc analysis.
The amended steps 3 and 6 are as follows:
Group 1 (the five-hour therapy group) has a significantly lower score on the
TAI than does Group 2 (the ten-hour therapy group).
Worksheet # 29
1) Computations: Compute and have all the information you need to complete the
six-step statistical inference process:
2.1) State the null hypothesis and the alternative hypothesis based on your
research question.
2.2) Set the alpha level.
2.3) Calculate the value of the appropriate statistic. Also indicate the degrees of
freedom for the statistical test if necessary.
2.4) Write the decision rule for rejecting the null hypothesis.
2.5) Write a summary statement based on the decision.
2.6) Write a statement of results.
Worksheet # 30
Two fourth-grade teachers in a school want to know if the gender of the main
character in a story makes a difference in their male students' interest in the story.
Each class had 10 boys. One group of boys read a story that had a female
as the main character, while the other group of boys read a story that had a male as
the main character. After reading the stories each boy completed an interest
inventory to indicate how interested he was in the story. Was there a significant
difference in the interest inventory scores for the two groups?
1) Computations: Compute and have all the information you need to complete the
six-step statistical inference process:
2.1) State the null hypothesis and the alternative hypothesis based on your
research question.
2.2) Set the alpha level.
2.3) Calculate the value of the appropriate statistic. Also indicate the degrees of
freedom for the statistical test if necessary.
2.4) Write the decision rule for rejecting the null hypothesis.
2.5) Write a summary statement based on the decision.
2.6) Write a statement of results.
Worksheet # 31
1) Computations: Compute and have all the information you need to complete the
six-step statistical inference process:
2.1) State the null hypothesis and the alternative hypothesis based on your
research question.
2.2) Set the alpha level.
2.3) Calculate the value of the appropriate statistic. Also indicate the degrees of
freedom for the statistical test if necessary.
2.4) Write the decision rule for rejecting the null hypothesis.
2.5) Write a summary statement based on the decision.
2.6) Write a statement of results.
Worksheet # 32
College students were given a Science test. Based on the scores on this test,
two groups of matched subjects were formed. Students in group 1 were given the
lecture method of instruction, while students in group 2 were given a new,
laboratory method of instruction. At the end of the semester, the students were given
a Science achievement test. Analyze the results below to determine whether the
new laboratory method of instruction leads to significantly higher
Science achievement scores for group 2.
1) Computations: Compute and have all the information you need to complete the
six-step statistical inference process:
2.1) State the null hypothesis and the alternative hypothesis based on your
research question.
2.2) Set the alpha level.
2.3) Calculate the value of the appropriate statistic. Also indicate the degrees of
freedom for the statistical test if necessary.
2.4) Write the decision rule for rejecting the null hypothesis.
2.5) Write a summary statement based on the decision.
2.6) Write a statement of results.
Worksheet # 34
1) Computations: Compute and have all the information you need to complete the
six-step statistical inference process:
2.1) State the null hypothesis and the alternative hypothesis based on your
research question.
2.2) Set the alpha level.
2.3) Calculate the value of the appropriate statistic. Also indicate the degrees of
freedom for the statistical test if necessary.
2.4) Write the decision rule for rejecting the null hypothesis.
2.5) Write a summary statement based on the decision.
2.6) Write a statement of results.
Worksheet # 35
1) Computations: Compute and have all the information you need to complete the
six-step statistical inference process:
2.1) State the null hypothesis and the alternative hypothesis based on your
research question.
2.2) Set the alpha level.
2.3) Calculate the value of the appropriate statistic. Also indicate the degrees of
freedom for the statistical test if necessary.
2.4) Write the decision rule for rejecting the null hypothesis.
2.5) Write a summary statement based on the decision.
2.6) Write a statement of results.
Worksheet # 36
The Omnibus Personality Inventory was used to measure the intellectual
disposition of three groups of students. Group 1 was composed of student
government leaders, group 2 was composed of athletes, and group 3 was composed
of part-time students. Analyze the data below to determine whether there is a
significant difference between the three groups.
1) Computations: Compute and have all the information you need to complete the
six-step statistical inference process:
2.1) State the null hypothesis and the alternative hypothesis based on your
research question.
2.2) Set the alpha level.
2.3) Calculate the value of the appropriate statistic. Also indicate the degrees of
freedom for the statistical test if necessary.
2.4) Write the decision rule for rejecting the null hypothesis.
2.5) Write a summary statement based on the decision.
2.6) Write a statement of results.
Worksheet # 37
Given the following set of data presented in the table below, determine if
there is a significant difference between the assessments of the students
on the Registrar’s services when grouped according to course.
1) Computations: Compute and have all the information you need to complete the
six-step statistical inference process:
2.1) State the null hypothesis and the alternative hypothesis based on your
research question.
2.2) Set the alpha level.
2.3) Calculate the value of the appropriate statistic. Also indicate the degrees of
freedom for the statistical test if necessary.
2.4) Write the decision rule for rejecting the null hypothesis.
2.5) Write a summary statement based on the decision.
2.6) Write a statement of results.
Appendix A
Critical T-Values Table
If your degrees of freedom are not on this table, use the value for the next
lower degrees of freedom.
df   t.250   t.100   t.050   t.025   t.010   t.005
17   0.689   1.333   1.740   2.110   2.567   2.898
Appendix B
Critical Z-Values Table
z 0.00 0.01 0.02 0.03 0.04 0.05
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989
Appendix C
Critical Chi-Square-Values Table
df\area .050 .025 .010
Appendix C
Critical F-Values Table (F Table for alpha=.01)
df2/df1 1 2 3 4 5 6 7 8 9
28 7.636 5.453 4.568 4.074 3.754 3.528 3.358 3.226 3.120
120 6.851 4.787 3.949 3.480 3.174 2.956 2.792 2.663 2.559
inf 6.635 4.605 3.782 3.319 3.017 2.802 2.639 2.511 2.407
df2/df1 10 12 15 20 24 30 40 60 120 inf
1 6055.84 6106.32 6157.28 6208.73 6234.63 6260.64 6286.78 6313.03 6339.39 6365.86
2 99.399 99.416 99.433 99.449 99.458 99.466 99.474 99.482 99.491 99.499
3 27.229 27.052 26.872 26.690 26.598 26.505 26.411 26.316 26.221 26.125
4 14.546 14.374 14.198 14.020 13.929 13.838 13.745 13.652 13.558 13.463
5 10.051 9.888 9.722 9.553 9.466 9.379 9.291 9.202 9.112 9.020
6 7.874 7.718 7.559 7.396 7.313 7.229 7.143 7.057 6.969 6.880
7 6.620 6.469 6.314 6.155 6.074 5.992 5.908 5.824 5.737 5.650
8 5.814 5.667 5.515 5.359 5.279 5.198 5.116 5.032 4.946 4.859
9 5.257 5.111 4.962 4.808 4.729 4.649 4.567 4.483 4.398 4.311
10 4.849 4.706 4.558 4.405 4.327 4.247 4.165 4.082 3.996 3.909
11 4.539 4.397 4.251 4.099 4.021 3.941 3.860 3.776 3.690 3.602
12 4.296 4.155 4.010 3.858 3.780 3.701 3.619 3.535 3.449 3.361
13 4.100 3.960 3.815 3.665 3.587 3.507 3.425 3.341 3.255 3.165
14 3.939 3.800 3.656 3.505 3.427 3.348 3.266 3.181 3.094 3.004
15 3.805 3.666 3.522 3.372 3.294 3.214 3.132 3.047 2.959 2.868
16 3.691 3.553 3.409 3.259 3.181 3.101 3.018 2.933 2.845 2.753
17 3.593 3.455 3.312 3.162 3.084 3.003 2.920 2.835 2.746 2.653
18 3.508 3.371 3.227 3.077 2.999 2.919 2.835 2.749 2.660 2.566
19 3.434 3.297 3.153 3.003 2.925 2.844 2.761 2.674 2.584 2.489
20 3.368 3.231 3.088 2.938 2.859 2.778 2.695 2.608 2.517 2.421
21 3.310 3.173 3.030 2.880 2.801 2.720 2.636 2.548 2.457 2.360
22 3.258 3.121 2.978 2.827 2.749 2.667 2.583 2.495 2.403 2.305
23 3.211 3.074 2.931 2.781 2.702 2.620 2.535 2.447 2.354 2.256
24 3.168 3.032 2.889 2.738 2.659 2.577 2.492 2.403 2.310 2.211
25 3.129 2.993 2.850 2.699 2.620 2.538 2.453 2.364 2.270 2.169
26 3.094 2.958 2.815 2.664 2.585 2.503 2.417 2.327 2.233 2.131
27 3.062 2.926 2.783 2.632 2.552 2.470 2.384 2.294 2.198 2.097
28 3.032 2.896 2.753 2.602 2.522 2.440 2.354 2.263 2.167 2.064
29 3.005 2.868 2.726 2.574 2.495 2.412 2.325 2.234 2.138 2.034
30 2.979 2.843 2.700 2.549 2.469 2.386 2.299 2.208 2.111 2.006
40 2.801 2.665 2.522 2.369 2.288 2.203 2.114 2.019 1.917 1.805
60 2.632 2.496 2.352 2.198 2.115 2.028 1.936 1.836 1.726 1.601
120 2.472 2.336 2.192 2.035 1.950 1.860 1.763 1.656 1.533 1.381
Appendix D
Critical F-Values Table (F Table for alpha=.05)
df2/df1 1 2 3 4 5 6 7 8 9
27 4.2100 3.3541 2.9604 2.7278 2.5719 2.4591 2.3732 2.3053 2.2501
120 3.9201 3.0718 2.6802 2.4472 2.2899 2.1750 2.0868 2.0164 1.9588
inf 3.8415 2.9957 2.6049 2.3719 2.2141 2.0986 2.0096 1.9384 1.8799
df2/df1 10 12 15 20 24 30 40 60 120 inf
1 241.8817 243.9060 245.9499 248.0131 249.0518 250.0951 251.1432 252.1957 253.2529 254.3144
2 19.3959 19.4125 19.4291 19.4458 19.4541 19.4624 19.4707 19.4791 19.4874 19.4957
3 8.7855 8.7446 8.7029 8.6602 8.6385 8.6166 8.5944 8.5720 8.5494 8.5264
4 5.9644 5.9117 5.8578 5.8025 5.7744 5.7459 5.7170 5.6877 5.6581 5.6281
5 4.7351 4.6777 4.6188 4.5581 4.5272 4.4957 4.4638 4.4314 4.3985 4.3650
6 4.0600 3.9999 3.9381 3.8742 3.8415 3.8082 3.7743 3.7398 3.7047 3.6689
7 3.6365 3.5747 3.5107 3.4445 3.4105 3.3758 3.3404 3.3043 3.2674 3.2298
8 3.3472 3.2839 3.2184 3.1503 3.1152 3.0794 3.0428 3.0053 2.9669 2.9276
9 3.1373 3.0729 3.0061 2.9365 2.9005 2.8637 2.8259 2.7872 2.7475 2.7067
10 2.9782 2.9130 2.8450 2.7740 2.7372 2.6996 2.6609 2.6211 2.5801 2.5379
11 2.8536 2.7876 2.7186 2.6464 2.6090 2.5705 2.5309 2.4901 2.4480 2.4045
12 2.7534 2.6866 2.6169 2.5436 2.5055 2.4663 2.4259 2.3842 2.3410 2.2962
13 2.6710 2.6037 2.5331 2.4589 2.4202 2.3803 2.3392 2.2966 2.2524 2.2064
14 2.6022 2.5342 2.4630 2.3879 2.3487 2.3082 2.2664 2.2229 2.1778 2.1307
15 2.5437 2.4753 2.4034 2.3275 2.2878 2.2468 2.2043 2.1601 2.1141 2.0658
16 2.4935 2.4247 2.3522 2.2756 2.2354 2.1938 2.1507 2.1058 2.0589 2.0096
17 2.4499 2.3807 2.3077 2.2304 2.1898 2.1477 2.1040 2.0584 2.0107 1.9604
18 2.4117 2.3421 2.2686 2.1906 2.1497 2.1071 2.0629 2.0166 1.9681 1.9168
19 2.3779 2.3080 2.2341 2.1555 2.1141 2.0712 2.0264 1.9795 1.9302 1.8780
20 2.3479 2.2776 2.2033 2.1242 2.0825 2.0391 1.9938 1.9464 1.8963 1.8432
21 2.3210 2.2504 2.1757 2.0960 2.0540 2.0102 1.9645 1.9165 1.8657 1.8117
22 2.2967 2.2258 2.1508 2.0707 2.0283 1.9842 1.9380 1.8894 1.8380 1.7831
23 2.2747 2.2036 2.1282 2.0476 2.0050 1.9605 1.9139 1.8648 1.8128 1.7570
24 2.2547 2.1834 2.1077 2.0267 1.9838 1.9390 1.8920 1.8424 1.7896 1.7330
25 2.2365 2.1649 2.0889 2.0075 1.9643 1.9192 1.8718 1.8217 1.7684 1.7110
26 2.2197 2.1479 2.0716 1.9898 1.9464 1.9010 1.8533 1.8027 1.7488 1.6906
27 2.2043 2.1323 2.0558 1.9736 1.9299 1.8842 1.8361 1.7851 1.7306 1.6717
28 2.1900 2.1179 2.0411 1.9586 1.9147 1.8687 1.8203 1.7689 1.7138 1.6541
29 2.1768 2.1045 2.0275 1.9446 1.9005 1.8543 1.8055 1.7537 1.6981 1.6376
30 2.1646 2.0921 2.0148 1.9317 1.8874 1.8409 1.7918 1.7396 1.6835 1.6223
40 2.0772 2.0035 1.9245 1.8389 1.7929 1.7444 1.6928 1.6373 1.5766 1.5089
60 1.9926 1.9174 1.8364 1.7480 1.7001 1.6491 1.5943 1.5343 1.4673 1.3893
120 1.9105 1.8337 1.7505 1.6587 1.6084 1.5543 1.4952 1.4290 1.3519 1.2539