Eda PDF

Lesson 1
CONCEPTS AND
VARIABLES
=================
Statistics Defined
Statistical tests are widely used in quantitative research. This chapter presents a
brief introduction to the purposes of employing statistics, the divisions of statistics,
levels of measurement, and key concepts and variables.
Statistics is the science of collecting, organizing, analyzing, and interpreting
numerical facts, which we call data. Statistics is a set of tools used to organize and
analyze data. Data must either be numeric in origin or transformed by researchers
into numbers. For instance, statistics could be used to analyze the percentage scores
students receive on a test. The percentage scores ranging from 0 to 100 are already
in numeric form. Statistics could also be used to analyze grades on an essay by
assigning numeric values to the letter grades, for example, A=1, B=2, C=3, and D=4.
Importance of Statistics
Data bombards us in everyday life. Most of us associate statistics with the
bits of data that appear in news reports: baseball batting averages, imported car
sales, the latest poll of the president's popularity, average pulse rate in beats per
minute, and the average high temperature for today. Advertisements often claim
that data show the superiority of the advertiser's product. All sides in public debates
about economics, education, and social policy argue from data. Yet the usefulness of
statistics goes far beyond these everyday examples. The study and collection of data
are important in the work of many professions, so training in the science of
statistics is valuable preparation for a variety of careers. For example, government
statistical offices release the latest numerical information on unemployment and
inflation. Economists and financial advisors as well as policy makers in
government and business study these data to make informed decisions. Doctors
must understand the origin and trustworthiness of the data that appear in medical
journals if they are to offer their patients the most effective treatment. Politicians
rely on data from polls of public opinion. Market research data that reveal
consumer tastes influence business decisions. Farmers study data from field trials
of new crop varieties. Engineers gather data on the quality and reliability of
manufactured products. A teacher can also make use of statistical concepts in
analyzing the reliability and validity of the tests given to his students. He can also
make necessary predictions about the success of his students in various board
examinations. Most areas of academic study make use of numbers, and therefore
also make use of the method of statistics. We can no more escape data than we can
avoid the use of words. Just as words on a page are meaningless to the illiterate or
confusing to the partially educated, so data do not interpret themselves but must be
read with understanding. A writer can arrange words into convincing arguments or
incoherent nonsense. Similarly, you can manipulate data to be compelling,
misleading, or simply irrelevant. Numerical literacy, the ability to follow and
understand numerical arguments, is important for everyone. The ability to express
oneself numerically, to be an author rather than just a reader, is a vital skill in many
professions and areas of study. The study of statistics is therefore essential to a
sound education. We must learn how to read data, critically and with
comprehension; we must learn how to produce data that provide clear answers to
important questions, and we must learn sound methods for drawing trustworthy
conclusions based on data as well as acquire the ability to effectively communicate
valid conclusions. Statistics teaches you how to gather, organize, and analyze data,
and then infer the underlying reality from these data. It is a powerful intellectual
method that is applied in many contexts and most disciplines. Persons in industry
and government make decisions that are increasingly dependent upon the
collection and interpretation of data, and employers demand greater quantitative
sophistication from their employees (or prospective employees). Statistics students
learn to define problems, to think critically, and to analyze and synthesize, which
prepares them to explore widely throughout their professional lives and to be
creative and productive citizens, regardless of the precise nature of a career. They
also learn to assess the integrity of data and the uncertainty of measurements and,
through these, to develop an understanding of the powers and limitations of
science. Moreover, students discover that there are things scientists can do and other
things they cannot do and that experimental results are not exact, but scientists can
usually evaluate the range of uncertainty within specified confidence limits. The
development of an understanding of the powers and limitations of science is
essential to rational participation in the resolution of societal issues. The language of
statistics is a foreign language to most students. Students must understand the
language to understand the concepts and the statistical procedures. Learning the
language of statistics provides students with insights and an awareness of ideas and
thoughts beyond the realm of previous experience. Statistical language requires
precision and careful attention in order to communicate exactly the valid conclusions
and interpretations that result from data analysis.
Purposes of Employing Statistics

Employing statistics serves two purposes: description and prediction, the latter
using inferential statistics. Statistics are used to describe the characteristics of
groups. These characteristics are referred to as variables. Data is gathered and
recorded for each variable. Descriptive statistics can then be used to reveal the
distribution of the data in each variable. Statistics is also frequently used for
purposes of prediction. Prediction is based on the concept of generalizability: if
enough data is compiled about a particular context (students studying Mathematics
in a specific set of classrooms), the patterns revealed through analysis of the data
collected about that context can be generalized to (or predicted to occur in) similar
contexts. The prediction of what will happen in a similar context is probabilistic.
That is, the researcher is not certain that the same things will happen in other
contexts; instead, the researcher can only reasonably expect that the same things will
happen. Prediction is a method employed by individuals throughout daily life. For
instance, if writing students begin class every day for the first half of the semester
with a five-minute free writing exercise, then they will likely come to class the first
day of the second half of the semester prepared to again free write for the first five
minutes of class. The students will have predicted the class content based on their
previous experiences in the class: Because they began all previous class sessions with
free writing, it would be probable that their next class session will begin the same
way. Statistics is used to perform the same function; the difference is that precise
probabilities are determined in terms of the percentage chance that an outcome will
occur, complete with a range of error. Prediction is a primary goal of inferential
statistics.
Divisions of Statistics and Levels of Measurement
A. Descriptive statistics is used to reveal patterns through the analysis of
numeric data. Descriptive statistics, not surprisingly, "describe" data that have been
collected. Commonly used descriptive statistics include frequency counts, ranges
(high and low scores or values), means, modes, median scores, and standard
deviations. Two concepts are essential to understanding descriptive statistics:
variables (levels of measurement) and distributions.
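As an illustration, the descriptive statistics named above can be computed with Python's standard-library `statistics` module. This is a minimal sketch using the scores recorded for 30 participants in Lesson 2; the helper name `describe` is illustrative only, and `stdev` here is the sample standard deviation.

```python
import statistics

# Scores recorded for 30 participants (the data set used in Lesson 2).
scores = [
    50, 45, 49, 50, 43,
    49, 50, 49, 45, 49,
    47, 47, 44, 51, 51,
    44, 47, 46, 50, 44,
    51, 49, 43, 43, 49,
    45, 46, 45, 51, 46,
]

def describe(data):
    """Return common descriptive statistics for a list of numbers."""
    return {
        "n": len(data),                      # frequency count
        "low": min(data),                    # lowest value
        "high": max(data),                   # highest value
        "range": max(data) - min(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),       # most frequent value
        "stdev": statistics.stdev(data),     # sample standard deviation
    }

summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```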
Figure. Classification of variables: Qualitative (Nominal, Ordinal) and Quantitative (Discrete, Continuous; Interval, Ratio)
Variables (Levels of Measurement)
Statistics are used to explore numerical data. Numerical data are
observations recorded in the form of numbers. These values vary according to
certain factors. For example, when analyzing the grades on student essays, scores
will vary for reasons such as the writing ability of the student, the student's
knowledge of the subject, and so on. In statistics, these varying factors are called
variables. Variables are divided into four basic categories according to their level of
measurement:
1. Nominal Variables (Categorical)
Nominal variables classify data. This process involves labeling categories
and then counting frequencies of occurrence. A researcher might wish to compare
essay grades between male and female students. Tabulations would be compiled
using the categories "male" and "female." Gender would be a nominal variable.
Nominal variables are sometimes called "categorical." Numbers may be used to
classify things, with no implication that one number is better than another, e.g.,
hotel room numbers, footballers' jersey numbers, and house numbers. Nominal
variables don't have to involve numbers at all, for example, blood groups (A, B, O,
etc.). Note: The categories themselves are not quantified. Maleness or femaleness is
not numerical in nature; rather, the frequencies of each category result in data that
are quantified, e.g., 25 males and 30 females.
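Counting frequencies of occurrence is the main operation on a nominal variable. A minimal Python sketch, assuming a hypothetical list of 25 male and 30 female students to match the note above, using `collections.Counter` from the standard library:

```python
from collections import Counter

# Hypothetical nominal data: the gender recorded for each student.
# The labels are categories only; no order or arithmetic applies to them.
genders = ["male"] * 25 + ["female"] * 30

# Label the categories and count how often each occurs.
frequencies = Counter(genders)
for category, count in frequencies.items():
    print(category, count)

# The most frequent category (the mode) is a meaningful summary of nominal data.
modal_category, modal_count = frequencies.most_common(1)[0]
print("mode:", modal_category)
```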
2. Ordinal Variables
Ordinal variables order (or rank) data in terms of degree. Ordinal variables
do not establish the numeric difference between data points. They indicate only that
one data point is ranked higher or lower than another. For instance, a researcher
might want to analyze the numeric or letter grades written on a student's transcript
of record. A grade of A=1.0 would be ranked higher than B=2.0 and B=2.0 higher
than C=3.0. However, the difference between these data points, the precise distance
between 1.0 and 2.0, is not defined. Numeric or letter grades are an example of an
ordinal variable. In other words, an ordinal variable involves ordering or ranking the
variable under consideration. Examples are the ranking of professors in the college
(Prof. I, Prof. II, Prof. III, etc.) and classifications of fruits or vegetables. Categories
are ordered according to the degree to which they possess a particular characteristic,
without being able to say how much of the characteristic they possess. Another
example is "greatly dislike, moderately dislike, indifferent, moderately like, greatly
like." Numbers assigned to such variables indicate rank order only; the "distance"
between the numbers has no meaning.
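Because only the order of ordinal categories is meaningful, order-based summaries such as the median suit them, while averaging rank numbers would treat the "distances" as meaningful. A minimal Python sketch, assuming hypothetical responses from seven students on the five-point dislike/like scale above:

```python
import statistics

# The five ordered response categories from the example, lowest to highest.
LIKERT = [
    "greatly dislike",
    "moderately dislike",
    "indifferent",
    "moderately like",
    "greatly like",
]
# Rank numbers express order only (1 = lowest); their differences mean nothing.
RANK = {label: position for position, label in enumerate(LIKERT, start=1)}

# Hypothetical responses from seven students.
responses = [
    "moderately like", "indifferent", "greatly like", "moderately like",
    "greatly dislike", "moderately like", "indifferent",
]

ranks = sorted(RANK[r] for r in responses)
# median_low always returns an actual rank, never an average of two ranks.
median_rank = statistics.median_low(ranks)
print("median response:", LIKERT[median_rank - 1])
```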
3. Interval Variables
Interval variables score data. Thus, the order of data is known as well as the
precise numeric distance between data points. A researcher might analyze the actual
percentage scores, assuming that percentage scores are given by the instructor. A
score of 99 (A) ranks higher than a score of 85 (B), which ranks higher than a score
of 75 (C). Not only is the order of these three data points known, but so is the exact
distance between them: 14 percentage points between the first two, 10 percentage
points between the second two, and 24 percentage points between the first and last
data points. Interval variables are said to be equally spaced; temperature in degrees
Celsius or Fahrenheit is an example. The difference
between a temperature of 66 degrees and 67 degrees is taken to be the same as the
difference between 76 degrees and 77 degrees.
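On an interval scale, differences are meaningful but ratios are not, because the zero point is arbitrary. A minimal Python sketch, assuming the temperatures in the example are in degrees Celsius: converting to Fahrenheit keeps equal differences equal but changes the ratio between two readings.

```python
import math

def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit (a linear change of unit)."""
    return c * 9 / 5 + 32

# Equal differences remain equal after the change of unit (1 degree C = 1.8 degrees F).
d_low = celsius_to_fahrenheit(67) - celsius_to_fahrenheit(66)
d_high = celsius_to_fahrenheit(77) - celsius_to_fahrenheit(76)
print(math.isclose(d_low, d_high), d_low, d_high)

# Ratios do not survive, because 0 degrees C is not an absolute zero.
ratio_c = 20 / 10                                                 # on the Celsius scale
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)   # 68 / 50 in Fahrenheit
print(ratio_c, ratio_f)
```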
4. Ratio Scales
A ratio scale is like an interval scale but has an absolute (true) zero, so ratios as
well as intervals are meaningful: 44 cm is twice as long as 22 cm, and the increase in
weight from 12 kg to 18 kg is the same as the increase from 155 kg to 161 kg.
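With a true zero, a ratio holds regardless of the unit of measure. A minimal Python sketch, assuming lengths in centimeters converted to inches at 2.54 cm per inch: the ratio 44 cm / 22 cm stays 2 after conversion, and the two 6 kg weight increases from the example remain equal.

```python
import math

CM_PER_INCH = 2.54   # exact by definition

def cm_to_inches(cm):
    """Convert centimeters to inches (a change of unit that keeps zero fixed)."""
    return cm / CM_PER_INCH

# A true zero makes ratios meaningful: "twice as long" holds in any unit.
ratio_cm = 44 / 22
ratio_in = cm_to_inches(44) / cm_to_inches(22)
print(ratio_cm, math.isclose(ratio_in, 2.0))

# Equal intervals: both weight increases are 6 kg.
print(18 - 12 == 161 - 155)
```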
Qualitative and Quantitative Variables
A variable is anything that can take on differing or varying values, e.g., the height
of students, the sugar content of a soft drink, or the age of customers in a store.
Variables can be grouped in different ways, for example as qualitative or
quantitative. A qualitative variable is one that describes a characteristic, for
example: small, wide, attractive, etc. A quantitative variable is one that can be given
a numerical value. There are two main types of quantitative variables, discrete and
continuous. A discrete variable is one that can only have certain definite values,
often whole numbers (but shoe sizes 5½, 6, 6½, 7, 7½, for example, are also discrete).
A continuous variable is one that can take any value within a certain range and is
usually obtained by measuring, e.g., the foot size of a child: aged 10 = 20 cm; aged
12 = 25 cm. There is no length between 20 and 25 cm that the foot did not measure
at some time.
B. Inferential statistics are used to draw conclusions and make
predictions based on the analysis and description of numeric data. This section
explores inferential statistics through an extended example of experimental studies.
The key concepts used are probability, populations, and sampling.
1. Experiments. A typical experimental study involves collecting data on
the behaviors, attitudes, or actions of two or more groups and attempting to
answer a research question (often framed as a hypothesis). Based on the analysis of
the data, a researcher might then attempt to develop a causal model that can be
generalized to populations. One question that might be addressed through
experimental research is "Does computer-based instruction in the college sciences
produce better professionals than traditional instruction?" Because it would be
impossible and impractical to observe, interview, survey, etc. all first-year students
and instructors in classes using one or the other of these instructional approaches, a
researcher would study a sample, or subset, of the population. Studying a sample of
a population is common among researchers who want to make sense of some
phenomenon. To analyze differences in the abilities of students taught in each type
of classroom, the researcher would compare the performance of the two groups of
students. Two key concepts used to conduct the comparison are the dependent
variable and the independent variable.
Dependent Variables. In an experimental study, a variable whose score
depends on (or is determined or caused by) another variable is called a dependent
variable. For instance, an experiment might explore the extent to which the quality
of graduates is affected by the approach of instruction they received. In this case,
the dependent variable would be the quality of graduates.
Independent Variables. In an experimental study, a variable that
determines (or causes) the score of a dependent variable is called an independent
variable. For instance, an experiment might explore the extent to which the quality
of graduates is affected by the approach of instruction (computer-based or
traditional) they received. In this case, the independent variable would be the
approach of instruction students received.
Identification of Variables Based on Hypothesis. A hypothesis can
be described as "a tentative answer to a research question" or a "provisional
prediction."
Characteristics of a Hypothesis
• Stated clearly, using appropriate terminology
• Testable
• Statement of relationships between variables
• Limited in scope
Examples of hypotheses:
• Health education programs influence the number of people who smoke
• Newspapers affect people's voting patterns
• Attendance at lectures influences exam marks
• Diet influences intelligence
In the above examples, "something" (diet, lecture attendance) affects
"something else" (intelligence, exam marks). These are variables. A variable is
anything that is free to vary; to describe variables quantitatively, they have to be
expressed in appropriate units (IQ scores, exam percentages). The two variables in
each pair above have separate names. The variable we manipulate is called the
independent variable (IV). The variable we are hypothesizing will alter as a
result of our manipulations is called the dependent variable (DV). The
dependent variable alters as a consequence of the value of the independent
variable; its value is dependent on it. The value of the independent variable is free
to vary according to the whims of the experimenters.
Note: Many variables can be either dependent or independent within the
context of a particular study. For example, it could be argued that "Intelligence
influences diet" or "Exam scores influence attendance at lectures."
Independent Variable | Dependent Variable
Diet | Intelligence
Newspapers | Voting Patterns
Attendance at Lectures | Exam Scores
Health Education Programs | Number of People Who Smoke
2. Probability. Beginning researchers most often use the word probability
to express a subjective judgment about the likelihood, or degree of certainty, that a
particular event will occur. People say such things as: "It will probably rain
tomorrow." "It is unlikely that we will win the ball game." It is possible to assign a
number to the event being predicted, a number between 0 and 1, which represents
the degree of confidence that the event will occur. For example, a student might say
that the likelihood an instructor will give an exam next week is about 90 percent, or
.9. Where 100 percent, or 1.00, represents certainty, .9 would mean the student is
almost certain the instructor will give an exam. If the student assigned the number
.6, the likelihood of an exam would be only somewhat greater than the likelihood of
no exam (.4). A rating of 0 would indicate complete certainty that no exam would be
given.
The probability of a particular outcome or set of outcomes is denoted by p. In our
discussion, a probability will be symbolized by a p followed by parentheses enclosing
a symbol of the outcome or set of outcomes. For example, p(X) should be read, "the
probability of a given X score." Thus p(exam) should be read, "the probability an
instructor will give an exam next week."
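The comparison between an exam and no exam in the paragraph above rests on the complement rule, p(no exam) = 1 - p(exam). A minimal Python sketch of that rule; the helper name `p_not` is chosen only for illustration:

```python
def p_not(p_event):
    """Complement rule: the probability that the event does NOT occur."""
    if not 0.0 <= p_event <= 1.0:
        raise ValueError("a probability must lie between 0 and 1")
    return 1.0 - p_event

# The student's judgments of p(exam) from the paragraph above.
for p_exam in (0.9, 0.6, 0.0):
    print(f"p(exam) = {p_exam:.1f}, p(no exam) = {p_not(p_exam):.1f}")
```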
3. Population. A population is the entire group that is studied. In educational
research, a population is usually a group of people. Researchers seldom can study
every member of a population. Usually, they instead study a representative sample
or subset of a population. Researchers then generalize their findings from the sample
to the population as a whole.
4. Sampling. Sampling is performed so that a population under study can
be reduced to a manageable size.
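The text does not name a particular sampling method. As one illustration, here is a minimal Python sketch assuming simple random sampling (every member equally likely to be chosen, without replacement) from a hypothetical population of 1,200 student ID numbers, using the standard library's `random.Random.sample`; the function name `simple_random_sample` is illustrative.

```python
import random

# Hypothetical population: ID numbers for 1,200 first-year students.
population = list(range(1, 1201))

def simple_random_sample(units, n, seed=None):
    """Draw n units without replacement, each unit equally likely to be chosen."""
    rng = random.Random(seed)   # a fixed seed makes the draw reproducible
    return rng.sample(units, n)

# Reduce the population to a manageable sample of 50 students.
sample = simple_random_sample(population, 50, seed=2024)
print(len(sample), sorted(sample)[:5])
```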
Worksheet #1
1) Define the Following Terms:
1.1) Statistics
1.2) Descriptive Statistics
1.3) Prediction
1.4) Nominal Variable
1.5) Ordinal Variable
1.6) Interval Variable
1.7) Ratio Scale
1.8) Inferential statistics
1.9) Dependent Variable
1.10) Independent Variable
2) Discuss the importance and uses of statistics in your field of specialization.
3) When can a variable be considered dependent or independent? Illustrate
your answer by giving concrete examples.
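4) A hypothesis can be described as "a tentative solution or answer to a research
question." Give five examples of hypotheses based on the identified
characteristics of a good hypothesis.
5) From the formulated hypotheses in Q #4, identify the dependent and
independent variables.
5.1) Statement of Hypothesis #1:
Dependent Variable:
Independent Variable:
5.2) Statement of Hypothesis #2:
Dependent Variable:
Independent Variable:
5.3) Statement of Hypothesis #3:
Dependent Variable:
Independent Variable:
5.4) Statement of Hypothesis #4:
Dependent Variable:
Independent Variable:
5.5) Statement of Hypothesis #5:
Dependent Variable:
Independent Variable: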
Worksheet # 2
Indicate whether the variables concerned are Nominal, Ordinal, Interval,
or Ratio:
Variables Classification
1. Gender: Male or Female
2. Skin Color
3. Temperature of the Room
4. Your Height (in centimeters)
5. Population of Mango Tree
6. Social Class
7. Your Blood Type
8. Number of Students in this Room
9. Enrollees from Elementary to College
10. Level of Aggressiveness
11. Salary Grade
12. Academic Ranks
13. Marital Status
14. Academic Performance
15. Political Affiliation
16. Attitude Toward Nuclear Plant
17. Family Size
18. Courses Offered in the College
19. Perception of Leadership Style
20. Weight of the Baby
21. Business Classification
22. Community of Cow
23. Your phone number
24. Residents in Olongapo City
25. Volume of Water
Worksheet # 3
Indicate whether the following variables are Qualitative or Quantitative.
If they are quantitative, indicate whether they are Discrete or Continuous.
Variables Classification
1. The height of a plant between 1995 and 1996
2. The color of your professor’s shoes
3. Volume of Water
4. Your Height (in centimeters)
5. The marks of a class of students in a course
6. Social Class
7. Your Blood Type
8. The shoe sizes of that child between 5 and 10
9. Community of Carabao
10. Weight of the Baby
11. Salary Grade
12. Academic Ranks
13. Marital Status
14. Academic Performance
15. Political Affiliation
16. Attitude Toward Nuclear Plant
17. Residents in Olongapo City
18. Courses Offered in the College
19. Perception of Leadership Style
20. Level of Aggressiveness
21. The scent of the flowers of that plant
22. Enrollees from Elementary to College
23. Your phone number
24. Family Size
25. Temperature of the Room
Lesson 2
DATA PRESENTATION
TOOLS
=================
Descriptive Statistics enable us to understand data through summary
values and graphical presentations. It is important to look at summary statistics
along with the data set to understand the entire picture, as the same summary
statistics may describe very different data sets. Descriptive statistics can be
made easier to understand by presenting them graphically using statistical and
data presentation tools.
A. Tabular Presentation of Data
Tables display numbers or words arranged in a grid. The tabular method can
be used to represent data sets with two variables. Tables are good for situations
where exact numbers need to be presented. A tabular presentation is illustrated in
Table 1.
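Table 1
Profile of the Respondents in Terms of Age and Gender (N = 110)
Age Bracket | Male F | Male % | Female F | Female %
21-25 | 9 | 8.18 | 4 | 3.64
26-30 | 8 | 7.27 | 14 | 12.73
31-35 | 7 | 6.36 | 6 | 5.45
36-40 | 6 | 5.45 | 7 | 6.36
41-45 | 9 | 8.18 | 22 | 20.00
46-50 | 5 | 4.55 | 8 | 7.27
51 and above | 2 | 1.82 | 3 | 2.73
Total | 46 | 41.82 | 64 | 58.18
Note: F = frequency; % = percentage of all 110 respondents.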
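Each percentage in Table 1 is a cell frequency divided by the grand total of 110 respondents (46 male plus 64 female), times 100. A minimal Python sketch that reproduces the table's percentages; the helper name `percent` is illustrative.

```python
# Frequencies from Table 1, in age-bracket order.
brackets = ["21-25", "26-30", "31-35", "36-40", "41-45", "46-50", "51 and above"]
male = [9, 8, 7, 6, 9, 5, 2]
female = [4, 14, 6, 7, 22, 8, 3]

n_total = sum(male) + sum(female)   # all respondents: 46 + 64

def percent(count, total):
    """Share of the grand total as a percentage, rounded to 2 decimals as in Table 1."""
    return round(100 * count / total, 2)

for label, m, f in zip(brackets, male, female):
    print(f"{label:<12} | {m:>2} | {percent(m, n_total):6.2f} | "
          f"{f:>2} | {percent(f, n_total):6.2f}")
print(f"{'Total':<12} | {sum(male):>2} | {percent(sum(male), n_total):6.2f} | "
      f"{sum(female):>2} | {percent(sum(female), n_total):6.2f}")
```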
Frequency Distribution
The easiest method of organizing data is a frequency distribution, which
converts raw data into a meaningful pattern for statistical analysis.
The following are the steps of constructing a frequency distribution:
1. Specify the number of class intervals. A class is a group (category) of
interest. No accepted rule tells us how many intervals are to be used. Between 5 and
15 class intervals are generally recommended. Note that the classes must be both
mutually exclusive and all-inclusive. Mutually exclusive means that classes must be
selected such that an item can't fall into two classes, and all-inclusive classes are
classes that together contain all the data.
2. When all intervals are to be the same width, the following rule may be
used to find the required class interval width:
W = (L - S) / N
Where:
W = class width,
L = the largest data value,
S = the smallest data value,
N = number of classes
Example: Suppose the ages of a sample of 10 students are: 20.9, 18.1, 18.5,
21.3, 19.4, 25.3, 22.0, 23.1, 23.9, and 22.5. We select N = 4 and W = (25.3 -
18.1)/4 = 1.8, which is rounded up to 2. The frequency table is as follows (each class
includes its lower limit but not its upper limit):
Class Interval | Class Frequency | Relative Frequency
18-20 | 3 | 30%
20-22 | 2 | 20%
22-24 | 4 | 40%
24-26 | 1 | 10%
Note: The sum of all the relative frequencies must always be equal to 1.00
or 100%. In the above example, we see that 40% of all students are at least 22 years
old but younger than 24 years old. The relative frequency may be determined for
both quantitative and qualitative data and is a convenient basis for the comparison
of similar groups of different sizes.
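The width rule and the class counts for the ages example can be sketched in Python. This minimal version assumes, as the frequency table implies, that each class includes its lower limit but not its upper limit and that the first class starts at 18; the function name `frequency_table` is illustrative.

```python
import math

ages = [20.9, 18.1, 18.5, 21.3, 19.4, 25.3, 22.0, 23.1, 23.9, 22.5]

def frequency_table(data, n_classes, start):
    """Equal-width classes, each including its lower limit but not its upper limit."""
    width = math.ceil((max(data) - min(data)) / n_classes)   # W = (L - S) / N, rounded up
    rows = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        count = sum(lower <= x < upper for x in data)
        rows.append((lower, upper, count, count / len(data)))   # relative frequency
    return width, rows

width, rows = frequency_table(ages, n_classes=4, start=18)
for lower, upper, count, rel in rows:
    print(f"{lower}-{upper}: {count} ({rel:.0%})")
```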
What does Frequency Distribution tell Us?
1. It shows how the observations cluster around a central value; and
2. It shows the degree of difference between observations.
For example, in the above problem, we know that no student is younger
than 18 and that ages below 24 are most typical. The most common age is between
22 and 24, which from general information we know to be higher than usual for
students who enter college right after high school and graduate at about age 22. The
students in the sample are generally older. The population may be made up of night
students who work on their degrees on a part-time basis while holding full-time jobs.
This descriptive analysis provides us with an image of the student sample, which is
not available from raw data. As we will see in lecture number 3, the frequency
distribution is the basis for probability theory.
Illustration: Consider the following set of data, which are the scores
recorded for 30 participants. We wish to summarize this data by creating a
frequency distribution of the scores.
Data Set - Scores Recorded for 30 Participants

50 45 49 50 43
49 50 49 45 49
47 47 44 51 51
44 47 46 50 44
51 49 43 43 49
45 46 45 51 46
To create a frequency distribution from this data we proceed as follows:
1. Identify the highest and lowest values in the data set. For our scores, the
highest score is 51 and the lowest is 43.
2. Create a column with the title of the variable we are using, in this case, the
score. Enter the highest score at the top and include all values within the
range from the highest score to the lowest score.
3. Create a tally column to keep track of the scores as you enter them into the
frequency distribution. Once the frequency distribution is completed you
can omit this column. Most printed frequency distributions do not retain the
tally column in their final form.
4. Create a frequency column, with the frequency of each value, as shown in the
tally column, recorded.
5. At the bottom of the frequency column record the total frequency for the
distribution, preceded by "N =".
6. Enter the name of the frequency distribution at the top of the table.
If we applied these steps to the data, we would have the following frequency
distribution.
Frequency Distribution of Scores Recorded for 30 Participants: N = 30
Scores Recorded | Tally | Frequency
51 | //// | 4
50 | //// | 4
49 | ////// | 6
48 | | 0
47 | /// | 3
46 | /// | 3
45 | //// | 4
44 | /// | 3
43 | /// | 3
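Steps 1 to 5 can be sketched in Python with `collections.Counter`. This minimal version lists every value from the highest score down to the lowest (including 48, which never occurs) and prints the tally as slash marks to mirror the table; it is one way to do the counting, not the only one.

```python
from collections import Counter

scores = [
    50, 45, 49, 50, 43,
    49, 50, 49, 45, 49,
    47, 47, 44, 51, 51,
    44, 47, 46, 50, 44,
    51, 49, 43, 43, 49,
    45, 46, 45, 51, 46,
]

counts = Counter(scores)      # frequency of each score
n = sum(counts.values())      # total frequency, N

# Walk from the highest score down to the lowest; Counter returns 0 for
# values that never occur, so 48 still gets a row.
print(f"Frequency Distribution of Scores Recorded for 30 Participants: N = {n}")
for value in range(max(scores), min(scores) - 1, -1):
    tally = "/" * counts[value]
    print(f"{value} | {tally:<6} | {counts[value]}")
```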
Cumulative Frequency Distribution
A cumulative frequency distribution can be created from a frequency
distribution by adding a column called "Cumulative Frequency." For each score
value, the cumulative frequency is the sum of the frequencies of all values up to and
including that value. In the cumulative frequency distribution for the score data
below, notice that the cumulative frequency for the lowest score (43) is 3 and that
the cumulative frequency for the score 44 is 3+3 or 6. The cumulative frequency for
a given value can also be obtained by adding the frequency for the value to the
cumulative frequency for the value below it.
In summary then, to create a cumulative frequency distribution:
1) Create a frequency distribution
2) Add a column entitled cumulative frequency
3) The cumulative frequency for each score is the frequency up to and including
the frequency for that score
4) The highest cumulative frequency should equal N (the total of the frequency
column)
Example: The cumulative frequency for 45 is 10, which is the cumulative
frequency for 44 (6) plus the frequency for 45 (4). Finally, notice that the cumulative
frequency for the highest value (51 in the current case) should be the same as the
total of the frequency column (30 in the case of the score data).
Cumulative Frequency Distribution for Scores Recorded for 30 Participants: N = 30
Scores Recorded | Tally | Frequency | Cumulative Frequency
51 | //// | 4 | 30
50 | //// | 4 | 26
49 | ////// | 6 | 22
48 | | 0 | 16
47 | /// | 3 | 16
46 | /// | 3 | 13
45 | //// | 4 | 10
44 | /// | 3 | 6
43 | /// | 3 | 3
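The running-total rule (the cumulative frequency of a value equals its frequency plus the cumulative frequency of the value below it) is exactly what `itertools.accumulate` computes. A minimal Python sketch using the frequencies from the table:

```python
from itertools import accumulate

# Frequencies from the table, listed from the LOWEST score upward, because each
# cumulative frequency counts everything at or below a given score.
values = [43, 44, 45, 46, 47, 48, 49, 50, 51]
frequencies = [3, 3, 4, 3, 3, 0, 6, 4, 4]

cumulative = list(accumulate(frequencies))   # running total

# Print highest score first, as in the table.
for value, f, cf in sorted(zip(values, frequencies, cumulative), reverse=True):
    print(f"{value} | {f} | {cf}")

# Check from the text: the highest cumulative frequency equals N.
assert cumulative[-1] == sum(frequencies) == 30
```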
Grouped Frequency Distribution
In some cases, it is necessary to group the values of the data to summarize the data
properly.
Guidelines for Classes
1) There should be between 5 and 20 classes.
2) The class width should be an odd number, so that the class midpoints are whole
numbers.
3) The classes must be mutually exclusive. This means that no data value can fall into
two different classes.
4) The classes must be all-inclusive or exhaustive. This means that all data values must
be included.
5) The classes must be continuous.
6) The classes must be equal in width.
Creating a Grouped Frequency Distribution
1) Find the largest and smallest values.
2) Compute the Range = Highest Score - Lowest Score.
3) Select the number of classes desired. This is usually between 5 and 20.
4) Find the class width by dividing the range by the number of classes and rounding it
up.
5) Pick a suitable starting point less than or equal to the minimum value. You will be able
to cover "the class width times the number of classes" values. You need to cover one
more value than the range. Follow this rule and you'll be okay: the starting point plus
the number of classes times the class width must be greater than the maximum
value. Your starting point is the lower limit of the first class. Continue to add the class
width to this lower limit to get the rest of the lower limits.
6) To find the upper limit of the first class, subtract one from the lower limit of the
second class. Then continue to add the class width to this upper limit to find the rest
of the upper limits.
7) Find the boundaries by subtracting 0.5 units from the lower limits and adding 0.5
units to the upper limits. The boundaries are also halfway between the upper limit
of one class and the lower limit of the next class.
8) Tally the data.
9) Find the frequencies.
10) Find the cumulative frequencies. Depending on what you're trying to accomplish, it
may not be necessary to find the cumulative frequencies.
11) If necessary, find the relative frequencies and/or relative cumulative frequencies.
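Steps 2 to 7 can be sketched in Python for whole-number data. This minimal version assumes the first class starts at the minimum value unless another starting point is given, and simply raises an error when the step 5 coverage rule fails rather than choosing a new width; the function name `grouped_classes` is illustrative. With the examination scores that follow (lowest 39, highest 59, 7 classes), it gives a width of 3 and the classes 39-41 through 57-59.

```python
import math

def grouped_classes(lowest, highest, n_classes, start=None):
    """Class limits, boundaries, and midpoints for whole-number data (steps 2-7)."""
    data_range = highest - lowest                      # step 2
    width = math.ceil(data_range / n_classes)          # step 4: divide and round up
    if start is None:
        start = lowest                                 # step 5: start at the minimum
    # Step 5 rule: start + classes * width must be greater than the maximum value.
    if start + n_classes * width <= highest:
        raise ValueError("classes do not cover the maximum value; adjust width or start")
    classes = []
    for i in range(n_classes):
        lower = start + i * width                      # lower limits
        upper = lower + width - 1                      # step 6: next lower limit minus 1
        classes.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),  # step 7
            "midpoint": (lower + upper) / 2,
        })
    return width, classes

width, classes = grouped_classes(lowest=39, highest=59, n_classes=7)
for c in classes:
    print(c["limits"], c["boundaries"], c["midpoint"])
```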
Illustration: Look at the following data set. The highest score is 59 and the
lowest is 39. If we were to create a simple frequency distribution of this data, we
would have 21 values (59 - 39 + 1). This is more than 20 values, so we should create a
grouped frequency distribution.
Data Set – Scores in the Major Examination (Statistics)

57 39 52 52 43
50 53 42 58 55
58 50 53 50 49
45 49 51 44 54
49 57 55 59 45
50 45 51 54 58
53 49 52 51 41
52 40 44 49 45
43 47 47 43 51
55 55 46 54 41
If we use this data and follow the suggestions for the creation of a grouped frequency
distribution, we will create the following grouped frequency distribution.
Grouped Frequency Distribution for Scores in the Major Examination (Statistics): N = 50
Class Interval | Tally | Interval Midpoint | Frequency
57-59 | ////// | 58 | 6
54-56 | /////// | 55 | 7
51-53 | /////////// | 52 | 11
48-50 | ///////// | 49 | 9
45-47 | /////// | 46 | 7
42-44 | ////// | 43 | 6
39-41 | //// | 40 | 4
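A minimal Python sketch that tallies the 50 examination scores into the seven classes above, assuming (as in the table) a width of 3 starting at 39 with inclusive whole-number limits:

```python
scores = [
    57, 39, 52, 52, 43,
    50, 53, 42, 58, 55,
    58, 50, 53, 50, 49,
    45, 49, 51, 44, 54,
    49, 57, 55, 59, 45,
    50, 45, 51, 54, 58,
    53, 49, 52, 51, 41,
    52, 40, 44, 49, 45,
    43, 47, 47, 43, 51,
    55, 55, 46, 54, 41,
]

WIDTH, START, N_CLASSES = 3, 39, 7   # class width, first lower limit, number of classes

rows = []
for i in range(N_CLASSES):
    lower = START + i * WIDTH
    upper = lower + WIDTH - 1
    frequency = sum(lower <= s <= upper for s in scores)   # limits are inclusive
    midpoint = (lower + upper) // 2                         # whole number: width is odd
    rows.append((lower, upper, midpoint, frequency))

# Print the highest class first, as in the table.
print(f"Scores in the Major Examination (Statistics): N = {len(scores)}")
for lower, upper, midpoint, frequency in reversed(rows):
    print(f"{lower}-{upper} | {'/' * frequency:<11} | {midpoint} | {frequency}")
```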
Cumulative Grouped Frequency Distribution
It is a simple matter to create a cumulative grouped frequency distribution.
We just add a cumulative frequency column to the grouped frequency distribution,
and we have a cumulative grouped frequency distribution. Adding a cumulative
frequency column created the cumulative grouped frequency distribution below.
Cumulative Grouped Frequency Distribution for Scores in the Major Examination (Statistics)
Class Interval | Tally | Interval Midpoint | Frequency | Cumulative Frequency
57-59 | ////// | 58 | 6 | 50
54-56 | /////// | 55 | 7 | 44
51-53 | /////////// | 52 | 11 | 37
48-50 | ///////// | 49 | 9 | 26
45-47 | /////// | 46 | 7 | 17
42-44 | ////// | 43 | 6 | 10
39-41 | //// | 40 | 4 | 4
N = 50
B. Graphical Presentation of Data
Several types of statistical/data presentation tools exist, including (a) charts
displaying frequencies (bar and pie charts), (b) charts displaying trends (run and line
charts), (c) charts displaying distributions (histograms), and (d) charts displaying
associations (scatter diagrams). Different types of data require different kinds of
statistical tools. There are two types of data. Attribute data are countable data or
data that can be put into categories, for example, the number of people willing to
pay, the number of complaints, or the percentage who want blue, red, or yellow.
Variable data are measurement data, based on some continuous scale (length, time,
cost).
1. Bar Graph (Histogram). Bar graphs show quantities represented by
horizontal or vertical bars and are useful for displaying:
• The activity of one thing through time.
• Several categories of results at once.
• Data sets with few observations.
Standard deviations may be displayed on bar graphs by using a deviation bar
that extends beyond the top of the data bar. Divided bar graphs are a variation that,
similar to pie charts, show proportional relationships between data within each bar.
In addition, divided bar graphs can show changes over time.
Bar Graph Example
Steps in the Preparation of Bar Graph
Step 1. Choose the type of bar chart that best emphasizes the results you want
to highlight. Grouped and stacked bar charts require at least two classification
variables. For a stacked bar chart, tally the data within each category into
combined totals before drawing the chart.
Step 2. Draw the vertical axis to represent the values of the variable of
comparison (e.g., number, cost, time). Establish the range for the data by subtracting
the smallest value from the largest. Determine the scale for the vertical axis at
approximately 1.5 times the range and label the axis with the scale and unit of
measure.
Step 3. Determine the number of bars needed. The number of bars will
equal the number of categories for simple or stacked bar charts. For a grouped bar
chart, the number of bars will equal the number of categories multiplied by the
number of groups. This number is important for determining the length of the
horizontal axis.
Step 4. Draw bars of equal width for each item and label the categories and
the groups. Provide a title for the graph that indicates the sample and the period
covered by the data; label each bar.
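Steps 2 and 3 involve simple arithmetic, sketched below in Python. The function name and the sample values are hypothetical. The rule of about 1.5 times the range and the bar counts follow the steps as written.

```python
def bar_chart_plan(values, n_categories, n_groups=1):
    """Return (data range, suggested vertical-axis length, number of bars).

    Step 2: range = largest value - smallest value; axis length is about 1.5 x range.
    Step 3: simple or stacked charts need one bar per category (n_groups=1);
            grouped charts need categories x groups bars.
    """
    data_range = max(values) - min(values)
    axis_length = 1.5 * data_range
    n_bars = n_categories * n_groups
    return data_range, axis_length, n_bars

# Hypothetical costs for 4 categories, each shown for 2 groups
print(bar_chart_plan([12, 30, 18, 45], n_categories=4, n_groups=2))
```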
2. Pie Chart or Circle Graph. Pie charts show proportions of a whole,
with each wedge representing a percentage of the total. Pie charts are useful for
displaying:
• The parts of a whole in percentages.
• Budget, geographic, or population analysis.
How to Use a Pie Chart
Step 1. For the data to be charted, calculate the percentage contribution
of each category. First, total all the values. Next, divide the value of each
category by the total. Then, multiply the quotient by 100 to obtain a
percentage for each value.
Step 2. Draw a circle. Using the percentages, determine what portion of the
circle will be represented by each category. By eye, divide the circle into four
quadrants, each representing 25 percent.
Step 3. Draw in the segments by estimating how much larger or smaller
each category is. Alternatively, calculate the number of degrees for each segment
by multiplying its percentage by 3.6 (a circle has 360 degrees) and then use a
protractor to draw the portions.
Step 4. Provide a title for the pie chart that indicates the sample and the
period covered by the data.
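Steps 1 and 3 can be checked with a small Python sketch that turns raw counts into percentages and degrees. The category names and counts are hypothetical.

```python
def pie_chart_slices(counts):
    """Return {category: (percentage, degrees)} for a pie chart.

    percentage = value / total x 100   (Step 1)
    degrees    = percentage x 3.6      (Step 3; a circle has 360 degrees)
    """
    total = sum(counts.values())
    slices = {}
    for category, value in counts.items():
        percentage = value / total * 100
        slices[category] = (round(percentage, 2), round(percentage * 3.6, 1))
    return slices

# Hypothetical counts for three categories
print(pie_chart_slices({"A": 30, "B": 50, "C": 20}))
```

The degrees always add up to 360 (apart from rounding), which is a quick check before drawing.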
Pie Chart Example
Caution:
a. Be careful not to use too many notations on the charts. Keep them as
simple as possible and include only the information necessary to
interpret the chart.
b. Do not draw conclusions that are not justified by the data. For example,
determining whether a trend exists may require further statistical tests
and probably cannot be determined from the chart alone. Differences
among groups may also require further statistical testing to determine whether
they are significant.
c. Whenever possible, use bar or pie charts to support data interpretation.
Do not assume that results or points are so clear and obvious that a
chart is not needed for clarity.
d. To keep charts clear and not misleading, follow these guidelines:
• Scales must be in regular intervals.
• Charts that are to be compared must have the same scale and
symbols.
• Charts should be easy to read.
3. Line Graph (Polygon Method). Line graphs show sets of data points
plotted over a period of time and connected by straight lines. Line graphs are
useful for displaying:
• Any set of figures that needs to be shown over time.
• Results from two or more groups compared over time.
• Data trends over time.
Worksheet # 4
The table below shows selected data for 20 college students.
No.   Gender   Major   Distance to School (km)   Number of Siblings
1     F        BSED    5                         3
2     F        BSBA    9                         5
3     M        BSED    11                        6
4     F        AB      14                        3
5     M        BSN     3                         3
6     F        BSED    5                         6
7     M        AB      20                        3
8     F        AB      7                         1
9     F        BSBA    9                         6
10    F        AB      6                         3
11    M        BSN     10                        4
12    F        BSBA    10                        2
13    M        BSED    5                         1
14    F        BSBA    9                         3
15    M        ECE     10                        1
16    F        AB      3                         2
17    F        Educ    15                        4
18    M        AB      6                         5
19    F        ECE     30                        3
20    M        AB      8                         5
Using the data in the table above:
1. Prepare a frequency distribution for gender and major.
2. Prepare a cumulative frequency distribution for the number of siblings.
3. Prepare a grouped frequency distribution for distance to school.
Worksheet # 5
Given the following set of data as the actual scores of fifty BSA students in
their major examination, prepare a table showing the following:
1) Class Interval
2) Tally
3) Interval Midpoint
4) Frequency
5) Cumulative Frequency
Data Set – Scores of 50 BSA Students in Major Examination

40 35 58 33 44
63 66 44 25 33
44 40 46 26 55
34 55 78 45 65
41 35 44 27 37
35 57 48 44 48
25 29 56 33 39
44 37 33 55 40
67 28 56 66 50
59 25 37 45 30
Worksheet # 6
Using the data presented in the table in Worksheet #4 for 20 college
students, prepare a bar graph to illustrate the following:
1) Gender
2) Major
3) Distance to School
4) Number of Siblings in the family
Using the data presented in the table in Worksheet #4 for 20 college
students, prepare a line graph to illustrate the following:
1) Gender
2) Major
Using the data presented in the table in Worksheet #4 for 20 college
students, prepare a pie chart to illustrate the following:
1) Gender
2) Major
Lesson 3
MEASURES OF
CENTRAL TENDENCY
AND VARIATIONS
=================
A. Central Tendency. Central tendency is measured in three
ways: Mean or Arithmetic Mean, Median, and Mode.
The Mean is simply the average score of a distribution.
The Median is the center or middle score within a distribution.
The Mode is the most frequent score within a distribution.
In a normal distribution, the mean, median, and mode are identical.
1. Arithmetic Mean (X̄)
The arithmetic mean is defined as the sum of all the items divided by the
number of items:
Formulas:
$$\bar{X} = \frac{\sum X}{n} \ \text{(sample)} \qquad \mu = \frac{\sum X}{N} \ \text{(population)}$$
Where:
X = The raw data
Σ ("sigma") = The sum of items
N or n = The number of scores
Mean of Ungrouped Data
$$\bar{X} = \frac{\sum X}{N}$$
Where:
X̄ = Mean
Σ = Summation symbol
X = Each score
N = Number of cases
Illustration: Find the mean of 9, 4, 7, 7, 4.
ΣX = 9 + 4 + 7 + 7 + 4 = 31
N = 5
$$\bar{X} = \frac{31}{5} = 6.2$$
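A minimal Python sketch of the ungrouped-data formula, applied to the illustration above (the function name is illustrative):

```python
def mean(scores):
    """Arithmetic mean of ungrouped data: the sum of the scores divided by N."""
    if not scores:
        raise ValueError("mean requires at least one score")
    return sum(scores) / len(scores)

print(mean([9, 4, 7, 7, 4]))  # 31 / 5
```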
Mean Age Computation
Consider the following frequency distribution of the ages of respondents in
a particular study.
Note that we have added a column called fX, which is the score times the
frequency of that score.
Illustration: There are two 11-year-olds, so fX for age 11 is 22; there are
four 10-year-olds, so fX for age 10 is 40; and so on.
Also note that the sum of the fX column is recorded at the bottom of the
column.
Frequency Distribution of Ages for Respondents in the Research Conducted
Age (X)   Frequency (f)   fX
11        2               22
10        4               40
9         8               72
8         7               56
7         3               21
6         2               12
5         1               5
Total     N = 27          ΣfX = 228
The formula for the population mean for a set of scores in a frequency
distribution is:
$$\mu = \frac{\sum fX}{N}$$
Thus, the mean age of the population of respondents is:
$$\mu = \frac{228}{27} = 8.44$$
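The ΣfX computation can be sketched in Python as follows. The dictionary maps each age X to its frequency f, taken from the table above; the function name is illustrative.

```python
def mean_from_frequencies(freq_table):
    """Mean of scores given as {score X: frequency f}: sum(fX) / N."""
    n = sum(freq_table.values())                          # N = sum of frequencies
    sum_fx = sum(x * f for x, f in freq_table.items())    # the fX column total
    return sum_fx / n

ages = {11: 2, 10: 4, 9: 8, 8: 7, 7: 3, 6: 2, 5: 1}
print(round(mean_from_frequencies(ages), 2))  # 228 / 27
```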
Mean of Grouped Data
Consider the following working formula for the mean of grouped data.
Formula:
$$\bar{X} = \frac{\sum FM}{N}$$
Where:
X̄ = Mean
F = Frequency
M = Midpoint
N = Number of cases
Illustration: Now let us consider the data presented in the table below. We can
compute the mean age of each of the two groups of respondents, the males and the
females, by first determining the midpoint (M) of each age bracket.
                  Male              Female
Age Bracket    F     %          F     %
21-25          9     8.18       4     3.64
26-30          8     7.27       14    12.73
31-35          7     6.36       6     5.45
36-40          6     5.45       7     6.36
41-45          9     8.18       22    20.00
46-50          5     4.55       8     7.27
51-Above       2     1.82       3     2.73
Total          46    41.82      64    58.18
(Percentages are based on all 110 respondents.)
Complete the tables below to compute the mean age of the male and
female respondents using the equation above:
Illustration 1: (For Male Respondents)
Age Bracket   Midpoint (M)   F     FM
21-25         23             9
26-30         28             8
31-35         33             7
36-40         38             6
41-45         43             9
46-50         48             5
51-Above      53             2
Total                        46
Mean Age: ____________________
Illustration 2: (For Female Respondents)
Age Bracket   Midpoint (M)   F     FM
21-25         23             4
26-30         28             14
31-35         33             6
36-40         38             7
41-45         43             22
46-50         48             8
51-Above      53             3
Total                        64
Mean Age: ____________________
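A sketch of the grouped-data formula in Python. To avoid filling in the worksheet answers, it uses the grouped frequency distribution of the 50 examination scores from earlier in this workbook (midpoints 58 down to 40). For comparison, those 50 raw scores sum to 2485, so their ungrouped mean is 49.7; the grouped mean, 2486/50 = 49.72, is a close approximation.

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum(F x M) / N, using each class midpoint M."""
    n = sum(frequencies)
    sum_fm = sum(f * m for f, m in zip(frequencies, midpoints))  # the FM column total
    return sum_fm / n

# Grouped frequency distribution of the 50 examination scores
midpoints = [58, 55, 52, 49, 46, 43, 40]
frequencies = [6, 7, 11, 9, 7, 6, 4]
print(grouped_mean(midpoints, frequencies))  # 2486 / 50
```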
Weighted Mean
Computation of the Weighted Mean (WM) when the researcher uses a
5, 4, 3, 2, 1 scale (weights).
Data on Pupils’ Attitudes Toward Science
                                                   Choices
STATEMENTS                                    5     4     3     2     1     WM     DR
1) I prefer science to other subjects.        47    57    60    62    64    2.87   Sometimes
2) I consider science learning an             102   85    43    46    14    3.74   Oftentimes
   interesting and exciting experience.
3) I feel that science concepts and skills    52    110   52    38    38    3.34   Sometimes
   answer a lot of problems I encounter.
4) I work cooperatively with other            83    83    61    43    20    3.57   Oftentimes
   members of the group.
5) I like to work on difficult concepts       44    62    46    53    85    2.75   Sometimes
   and accept…
Formula:
$$\bar{X} = \frac{\sum FX}{N}$$
Where:
X̄ = Weighted mean
F = Frequency
X = Weight of each choice
N = Number of cases
Illustration: Computing the weighted mean of item #1, where N = 47 + 57 + 60 + 62 + 64 = 290
respondents:
$$\bar{X} = \frac{(47)(5) + (57)(4) + (60)(3) + (62)(2) + (64)(1)}{290}$$
$$\bar{X} = \frac{235 + 228 + 180 + 124 + 64}{290} = \frac{831}{290} = 2.8655 \approx 2.87$$
Note: Follow the same process to compute the weighted mean of items 2 to 5.
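The weighted mean for every item in the attitude table can be computed the same way. This Python sketch assumes the 5-4-3-2-1 weights used above and reproduces the WM column (2.87, 3.74, 3.34, 3.57, 2.75). It does not assign descriptive ratings, since the rating ranges behind the DR column are not given here.

```python
def weighted_mean(frequencies, weights=(5, 4, 3, 2, 1)):
    """Weighted mean sum(F x X) / N for one item of a 5-point scale.

    frequencies[i] is the number of respondents who chose weights[i].
    """
    n = sum(frequencies)
    return sum(f * x for f, x in zip(frequencies, weights)) / n

items = {
    1: [47, 57, 60, 62, 64],
    2: [102, 85, 43, 46, 14],
    3: [52, 110, 52, 38, 38],
    4: [83, 83, 61, 43, 20],
    5: [44, 62, 46, 53, 85],
}
for item, freqs in items.items():
    print(item, round(weighted_mean(freqs), 2))
```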
2. Median (Mdn). The median is the middle score, or the middlemost score, of a
distribution.
Median of Ungrouped Data. To calculate the median, arrange the
scores from smallest to largest (or from largest to smallest) and then count up to
the middle score. If there is an even number of scores, the median is one-half of
the sum of the two middle scores (the average of the two middle scores).
Illustrations: If 5 people earn 60 pesos, 70 pesos, 100 pesos, 115 pesos,
and 320 pesos, the median is the value of the middle item, which is 100 pesos.
The formula for finding the position of the median in ungrouped data is:
$$\text{Median position} = \frac{n + 1}{2}$$
If there is an even number of items, the two middle items are added together and
divided by two.
For example, if 6 people earn 60 pesos, 70 pesos, 100 pesos, 110 pesos, 115
pesos, and 320 pesos per week, the median position is (n + 1) / 2 = (6 + 1) / 2 = 3.5,
that is, halfway between the third and fourth items.
Median = (100 + 110) / 2 = 105 pesos
For the following set of scores (already arranged in order of size): 10, 14,
16, 16, 19, 23, 27
The median would be the fourth score as there are seven scores. The value of
the fourth score is 16 so the median of the scores is 16.
For the following set of scores (already arranged in order of size): 10, 14,
16, 19, 23, 27
The median would be halfway between the third and fourth scores as there
are six scores. Halfway between 16 (the third score) and 19 (the fourth score) is (16 +
19)/2 which is 17.5, so the median is 17.5
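The median rule for ungrouped data, including the even-count case, as a short Python sketch (Python's own `statistics.median` follows the same rule):

```python
def median(scores):
    """Median of ungrouped data.

    Sort the scores; for an odd count take the middle score, for an even
    count average the two middle scores.
    """
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one score")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([60, 70, 100, 115, 320]))        # odd count
print(median([60, 70, 100, 110, 115, 320]))   # even count
```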
Median of Grouped Data. The formula for finding the median of
grouped data is:
$$\text{Mdn} = L + \left(\frac{N/2 - F}{f_m}\right) i$$
Where:
Mdn = Median
L = Exact lower limit of the interval containing the median (the median class)
fm = Frequency of the median class
F = Cumulative frequency below L
N = Number of cases
i = Size (width) of the class interval
Illustration: Determine the Mdn of the data below:
Age Bracket   F    Cum f (<)
21-25         4    4
26-30         14   18
31-35         6    24
36-40         7    31
41-45         22   53
46-50         8    61
51-Above      3    64
Total         64   --
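Here N/2 = 64/2 = 32. The cumulative frequency first reaches 32 in the 41-45
bracket, so this is the median class: L = 40.5, F = 31, fm = 22, and i = 5.
$$\text{Mdn} = 40.5 + \left(\frac{64/2 - 31}{22}\right)(5)$$
$$= 40.5 + \left(\frac{32 - 31}{22}\right)(5) = 40.5 + \frac{5}{22}$$
$$= 40.5 + 0.227 = 40.73$$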
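The grouped-median formula can be sketched in Python as below. The sketch assumes integer data, so the exact lower limit is the stated lower limit minus 0.5, and equal class widths. The open-ended "51-Above" class is entered by its lower limit only, which does not matter here because the median falls in the 41-45 class.

```python
def grouped_median(lower_limits, frequencies, width):
    """Median of grouped data: Mdn = L + ((N/2 - F) / fm) * i.

    lower_limits: stated lower limit of each class, lowest class first
    frequencies:  frequency of each class
    width:        class interval size i (assumed equal for all classes)
    """
    n = sum(frequencies)
    if n == 0:
        raise ValueError("no data")
    half = n / 2
    cum_below = 0                      # F: cumulative frequency below the class
    for lower, fm in zip(lower_limits, frequencies):
        if cum_below + fm >= half:     # first class whose cumulative f reaches N/2
            exact_lower = lower - 0.5  # L: exact lower limit, assuming integer data
            return exact_lower + (half - cum_below) / fm * width
        cum_below += fm
    raise ValueError("frequencies do not reach N/2")

lower_limits = [21, 26, 31, 36, 41, 46, 51]   # 51 stands for the "51-Above" class
frequencies = [4, 14, 6, 7, 22, 8, 3]
print(round(grouped_median(lower_limits, frequencies, width=5), 2))
```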
3. Mode (Mo). The mode is the most frequently occurring value in a distribution.
Mode of Ungrouped Data. The mode of ungrouped data is the
score with the highest frequency. We can identify this value by
inspection.
Illustrations:
1. The ages of a group of students at Columban College High School
Department were identified as 17; 17; 17; 18; 18; 18; 18; 18; 18; 18; 19; 19; 20; 21.
By inspection, 18 is the most frequently occurring age. Therefore, 18 is
the mode.
2. Sometimes two values tie for "most frequent", as with the ages 17; 17; 17;
17; 18; 18; 18; 18; 19; 19; 20; 21.
Both 17 and 18 are modes, and such a distribution is described as bimodal.
3. For the following set of 10 scores: 14, 18, 16, 17, 20, 15, 18, 17, 16, 17,
the mode is 17, as it is the most frequently occurring score.
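For ungrouped data, Python's standard library already provides `statistics.multimode` (Python 3.8 and later), which returns every value tied for the highest frequency, so it also reports bimodal cases. The three lists are the illustrations above.

```python
from statistics import multimode

ages = [17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 19, 19, 20, 21]
tied = [17, 17, 17, 17, 18, 18, 18, 18, 19, 19, 20, 21]
scores = [14, 18, 16, 17, 20, 15, 18, 17, 16, 17]

print(multimode(ages))    # a single mode
print(multimode(tied))    # two modes: a bimodal distribution
print(multimode(scores))
```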
Mode of Grouped Data. The mode of grouped data is identified as
the midpoint of the interval with the largest frequency (the largest number of cases).
Find the mode of the data in the table below:
Age Bracket   F
21-25         4
26-30         3
31-35         6
36-40         5
41-45         10
46-50         8
51-55         5
56-60         6
61-65         2
Total         49
The mode (Mo) is 43 because the class interval 41-45 has the largest
frequency (10), and 43 is the midpoint of that class interval.
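A Python sketch of this midpoint mode for grouped data. When two intervals tie for the largest frequency it simply takes the first one listed, a simplifying assumption not stated in the text.

```python
def crude_mode(intervals, frequencies):
    """Midpoint of the class interval with the largest frequency.

    intervals: list of (lower, upper) stated limits. On a tie, max() returns
    the first interval listed.
    """
    best = max(range(len(frequencies)), key=lambda i: frequencies[i])
    lower, upper = intervals[best]
    return (lower + upper) / 2

intervals = [(21, 25), (26, 30), (31, 35), (36, 40), (41, 45),
             (46, 50), (51, 55), (56, 60), (61, 65)]
frequencies = [4, 3, 6, 5, 10, 8, 5, 6, 2]
print(crude_mode(intervals, frequencies))
```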
B. Measures of Variation. The measures of variability we will
consider are the range, the variance, and the standard deviation.
Range. The range is the distance between the lowest data point and the
highest data point. Deviation scores are the differences between each data point and
the mean.
The Range of a Set of Data. The range of a set of data is the difference
between the highest and lowest values in the set. To find the range, first, order the
data from least to greatest. Then subtract the smallest value from the largest value in
the set.
Illustrations:
1. John Erick took 7 math tests in one marking period. What is the range of
his test scores?
89, 73, 84, 91, 87, 77, 94
Solution: Ordering the test scores from least to greatest, we get:
73, 77, 84, 87, 89, 91, 94
Highest - lowest = 94 - 73 = 21
The range of these test scores is 21 points.
In the problem above, the set of data consists of 7 test scores. We ordered
the data from least to greatest before finding the range. It is recommended that you
do this, too. This is especially important with large sets of data.
2. The Ramos family drove through 6 cities on their summer vacation.
Gasoline prices varied from city to city. What is the range of gasoline prices?
$1.79, $1.61, $1.96, $2.09, $1.84, $1.75
Solution: Ordering the data from least to greatest, we get:
$1.61, $1.75, $1.79, $1.84, $1.96, $2.09
Highest - lowest = $2.09 - $1.61 = $0.48
The range of gasoline prices is $0.48.
3. A marathon race monitored by Mr. Redondo is completed by 5
participants. What is the range of times given in hours below?
2.7 hr, 8.3 hr, 3.5 hr, 5.1 hr, 4.9 hr
Solution: Ordering the data from least to greatest, we get:
2.7, 3.5, 4.9, 5.1, 8.3
Highest - lowest = 8.3 hr - 2.7 hr = 5.6 hr
The range of the race times is 5.6 hr.
4. Consider the following two sets of scores:
Set A: 4, 5, 6, 7, 8
Set B: 2, 4, 6, 8, 10
We can see that the means for these two sets of scores are the same:
Mean of set A is (4 + 5 + 6 + 7 + 8)/5 = 6
Mean for set B is (2 + 4 + 6 + 8 + 10)/5 = 6
Although these two sets of scores have the same Mean, they differ in how
spread out the scores are.
The scores in Set A vary over a smaller set of values (4 through 8) than does
Set B which varies over score values from 2 through 10.
We can represent this variability by a numerical index of variability called
the range.
Range = Highest Score - Lowest Score
Range for set A = 8 - 4= 4
Range for set B = 10 - 2= 8
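The range calculation as a Python sketch, checked against the illustrations above. The gasoline and race-time results are rounded because binary floating-point subtraction (for example 2.09 - 1.61) is not exact.

```python
def data_range(values):
    """Range = highest value - lowest value."""
    ordered = sorted(values)          # order the data first, as recommended above
    return ordered[-1] - ordered[0]

print(data_range([89, 73, 84, 91, 87, 77, 94]))                    # test scores
print(round(data_range([1.79, 1.61, 1.96, 2.09, 1.84, 1.75]), 2))  # gasoline prices
print(round(data_range([2.7, 8.3, 3.5, 5.1, 4.9]), 1))             # race times (hr)
print(data_range([4, 5, 6, 7, 8]), data_range([2, 4, 6, 8, 10]))   # Sets A and B
```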
Mean deviation is the average of the absolute values of the deviation
scores; that is, the mean deviation is the average distance between the mean and the
data points. Closely related to the measure of mean deviation is the measure of
variance.
Variance also indicates a relationship between the mean of a distribution
and the data points; it is determined by averaging the squared deviations.
Squaring the differences instead of taking the absolute values allows for greater
flexibility in further algebraic manipulation of the data. A deviation
score is a measure of how far each point in a frequency distribution lies above
or below the mean for the entire dataset:
$$X - \bar{X}$$
Where: X = The raw score
X̄ = The mean
To describe how much a dataset deviates from its mean, we cannot simply average the
deviation scores, because they always sum to zero. Instead, we calculate the mean of
the squared deviation scores, that is, the variance:
For Population Variance:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
Where:
σ² (sigma squared) = Population variance
(X - μ)² = Square of each difference
Σ(X - μ)² = Sum of squared differences
N = The number of scores
For Sample Variance:
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
Where:
s² = Sample variance (read "s squared")
X̄ = Sample mean (read "X bar")
n = Sample size
Calculating Variance by the Deviation Score Method. Let us first
consider the deviation score method for calculating the variance for a population of
scores. Although the deviation score method is rarely used in practice, it does help us
understand how variance is calculated. It follows our previous definition of variance
and could thus be referred to as the definitional method of calculating variance. The
formula for the variance for a population using the deviation score method is as
follows:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
We can see that the formula truly does say what the definition of variance
says - variance is the average squared deviation of the scores from the mean. We will
use this formula to find the variance for the seven scores found in the following
worksheet. In the first column, we have the seven scores. At the bottom of the
column is the sum of the scores and this can be used to find the mean. The mean =
the sum of the scores divided by the number of scores or 28/7 = 4.
In the second column is recorded the value for the score minus the mean
and in the third column is recorded this value squared, that is the square of the
score minus the mean. At the bottom of the third column is the sum of the
squared deviations of the scores from the mean and this is the quantity we need
to calculate the variance.
Worksheet for Calculating the Variance for 7 Scores
X           X - μ     (X - μ)²
5           1         1
3           -1        1
4           0         0
4           0         0
3           -1        1
4           0         0
5           1         1
ΣX = 28               Σ(X - μ)² = 4
Mean = 28 / 7 = 4
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{4}{7} = 0.57$$
If we were to consider the scores above as a sample rather than a
population, we would use the formula for the variance for a sample using the
deviation score method. That formula is:
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
The sample formula differs from the population formula in three ways: the sample
variance is written s² rather than σ², the number of scores is written n rather than
N, and the denominator is n - 1 rather than N. Thus, for the sample variance in our
problem above we simply divide by n - 1 rather than N:
$$s^2 = \frac{4}{7 - 1} = 0.667$$
For our problem, then, the sample variance is 0.667 (slightly larger than the
population variance).
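The deviation score method for both formulas, sketched in Python with the seven scores from the worksheet. The `sample` flag (an illustrative name) switches the denominator from N to n - 1.

```python
def variance_deviation(scores, sample=False):
    """Variance by the deviation score method.

    Population: sum((X - mean)^2) / N.   Sample: sum((X - mean)^2) / (n - 1).
    """
    n = len(scores)
    mean = sum(scores) / n
    squared_deviations = [(x - mean) ** 2 for x in scores]
    denominator = n - 1 if sample else n
    return sum(squared_deviations) / denominator

scores = [5, 3, 4, 4, 3, 4, 5]
print(round(variance_deviation(scores), 2))               # population: 4 / 7
print(round(variance_deviation(scores, sample=True), 3))  # sample: 4 / 6
```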
Calculating Variance by the Raw Score Method. When we calculate
the variance for a set of scores, we generally use the raw score method. This is also
the method often used by calculators and computers to find the variance. The formula for
the variance by the raw score method is mathematically equivalent to the deviation
score method. The raw score formula for the population variance is:
$$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$$
We can see from this formula that to find the population variance we sum
the squared individual scores, ΣX², and subtract from this the square of the sum of
the scores divided by the number of scores, (ΣX)²/N.
The variance then is this quantity divided by the number of scores. We can
see that the only data we need to make this calculation are the sum of the scores, the
sum of the squared scores, and the number of scores.
The worksheet below shows these quantities.
X          X²
5          25
3          9
4          16
4          16
3          9
4          16
5          25
ΣX = 28    ΣX² = 116
The quantity recorded at the bottom of the first column is the sum of the
scores ΣX, and the quantity at the bottom of the second column is the sum of the
squared scores ΣX². The number of scores N can be obtained by counting the
number of scores in the first column. We can put these quantities into the formula
for population variance to obtain 0.57, the same answer as we got with the deviation
score method.
$$\sigma^2 = \frac{116 - \frac{(28)^2}{7}}{7} = \frac{116 - 112}{7} = \frac{4}{7} = 0.57$$
If we were to consider the scores above as a sample rather than a
population, we would use the formula for the variance for a sample using the raw
score method. That formula is:
$$s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$$
There are three differences between the formula for the sample variance and
the formula for the population variance:
1. The symbol for the sample variance is s squared (s²).
2. The symbol for the number of scores is n rather than N.
3. The denominator of the equation uses the sample size minus 1 (n - 1) rather
than N.
Thus, for the sample variance in our problem above we simply divide by n - 1
rather than N. For our problem, then, the sample variance is 0.667 (slightly larger
than the population variance).
$$s^2 = \frac{116 - \frac{(28)^2}{7}}{7 - 1} = \frac{116 - 112}{7 - 1} = \frac{4}{6} = 0.667$$
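The raw score method needs only ΣX, ΣX², and n. This Python sketch uses the same seven scores and gives the same values as the deviation score method. One design note: with very large scores, subtracting two large sums can lose floating-point precision, which is why some software computes variance from deviations instead.

```python
def variance_raw(scores, sample=False):
    """Variance by the raw score method, using only sum(X), sum(X^2) and n.

    Population: (sum(X^2) - sum(X)^2 / N) / N
    Sample:     (sum(X^2) - sum(X)^2 / n) / (n - 1)
    """
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    sum_of_squares = sum_x2 - sum_x ** 2 / n   # 116 - 784/7 = 4 for the worksheet
    return sum_of_squares / (n - 1 if sample else n)

scores = [5, 3, 4, 4, 3, 4, 5]
print(round(variance_raw(scores), 2), round(variance_raw(scores, sample=True), 3))
```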
Standard Deviation (SD) is the square root of the variance. This
calculation is useful because it allows for the same flexibility as variance regarding
further calculations and yet also expresses variation in the same units as the original
measurements.
To determine the standard deviation of a dataset:
1. Calculate the mean of all the scores.
2. Find the deviation of each score from the mean.
3. Square each deviation.
4. Calculate the average of the squared deviations (for a sample, divide their sum
by n - 1 instead of N); this is the variance.
5. Take the square root of the variance.
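The five steps map directly onto code. In this Python sketch the `sample` flag (an illustrative name) divides by n - 1 in step 4. For the seven worksheet scores it gives about 0.756 for the population (the square root of the unrounded variance 4/7) and about 0.816 for the sample.

```python
import math

def standard_deviation(scores, sample=False):
    """Standard deviation following the five steps above."""
    n = len(scores)
    mean = sum(scores) / n                                # step 1
    deviations = [x - mean for x in scores]               # step 2
    squared = [d ** 2 for d in deviations]                # step 3
    variance = sum(squared) / (n - 1 if sample else n)    # step 4
    return math.sqrt(variance)                            # step 5

scores = [5, 3, 4, 4, 3, 4, 5]
print(round(standard_deviation(scores), 3))               # population
print(round(standard_deviation(scores, sample=True), 3))  # sample
```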
Formulas:
For a Sample:
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
For a Population:
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
Standard Deviation by the Deviation Score Method. If we look at the
formula for the standard deviation for a population using the deviation score
method, we can see that the standard deviation is indeed the square root of the variance:
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
The formula tells us to take a score, subtract the mean from it, and square
the difference. We then sum all of these squared differences and divide by the
number of scores. Finally, we take the square root of the quotient, and this is the
standard deviation.
Let's do a problem in finding the standard deviation using the same data we
used earlier to find the variance.
Worksheet for Calculating the Standard Deviation for 7 Scores
X           X - μ     (X - μ)²
5           1         1
3           -1        1
4           0         0
4           0         0
3           -1        1
4           0         0
5           1         1
ΣX = 28               Σ(X - μ)² = 4
The sum of squared deviation scores is found at the bottom of the third
column, and this is the quantity we need (as well as N) to find the standard deviation.
Using the formula, we find the standard deviation to be about 0.756:
$$\sigma = \sqrt{\frac{4}{7}} = \sqrt{0.571} = 0.756$$
If our data are considered a sample instead of a population, we must use the
formula for the standard deviation of a sample using the deviation score method.
This differs from the population formula by substituting n - 1 for N in the formula.
So, for our example, the sample standard deviation by the deviation score
method is 0.816:
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{4}{7 - 1}} = \sqrt{\frac{4}{6}} = \sqrt{0.667} = 0.816$$
Standard Deviation by the Raw Score Method. We can calculate the
standard deviation for a population of scores using the raw score method by using
the following formula:
$$\sigma = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}}$$
We can calculate the standard deviation for a sample of scores using
the raw score method by using the following formula:
$$s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}}$$
The two formulas differ only in the denominator. With the
population, the denominator is N, while for the sample we use n - 1 in the
denominator.
Let’s use the scores in the following worksheet to find the standard deviation
for a population and a sample by the raw score method.
X          X²
2          4
4          16
6          36
8          64
10         100
ΣX = 30    ΣX² = 220
At the bottom of the first column is the sum of the scores (30) and at the
bottom of the second column is the sum of the squared scores (220).
We need these quantities and the number of scores (5) to calculate the
standard deviation.
The standard deviation for a population is:
$$\sigma = \sqrt{\frac{220 - \frac{(30)^2}{5}}{5}} = \sqrt{\frac{220 - 180}{5}} = \sqrt{8} = 2.828$$
The standard deviation for a sample is:
$$s = \sqrt{\frac{220 - \frac{(30)^2}{5}}{5 - 1}} = \sqrt{\frac{220 - 180}{4}} = \sqrt{10} = 3.162$$
So, for our five scores, the standard deviation considering the scores as a
population is 2.828, while the standard deviation for the same five scores if we
consider them a sample is 3.162.
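The raw score formulas for the standard deviation as a Python sketch, cross-checked against the standard library's `statistics.pstdev` (population) and `statistics.stdev` (sample). The function name `sd_raw` is illustrative.

```python
import math
import statistics

def sd_raw(scores, sample=False):
    """Standard deviation by the raw score method."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    sum_of_squares = sum_x2 - sum_x ** 2 / n     # 220 - 900/5 = 40 for the worksheet
    return math.sqrt(sum_of_squares / (n - 1 if sample else n))

scores = [2, 4, 6, 8, 10]
print(round(sd_raw(scores), 3), round(sd_raw(scores, sample=True), 3))
# Cross-check with the standard library
print(round(statistics.pstdev(scores), 3), round(statistics.stdev(scores), 3))
```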
Worksheet # 7
Assume that you are interested in measuring the influence of self-efficacy
(belief in oneself) on grades attained in school. The data set below represents your
initial data collection from a single class of 20 BSBA students in their junior year.
The self-efficacy measurement is on a 100-point scale. Each score represents one
student’s overall rating.
Data Set – Scores of 20 BSBA Students

80 40 76 59 71
76 59 78 81 39
46 72 66 40 72
69 50 46 52 33
In the above study, compute the Mean, Median, and Mode for the data
set for the BSBA students’ self-efficacy scale.
Mean = ___________________
Median= __________________
Mode = __________________
Worksheet # 8
How would you work out the Mean, Median, and Mode of the following set
of numbers?
Try to think this through without looking back to the previous pages.
13, 11, 17, 14, 20, 6, 15, 15, 12, 6, 13, 24, 22, 19
Mean = ___________________
Median= __________________
Mode = __________________
Worksheet # 9
In a study of senior college students, the researchers considered Socio-Economic
Status (SES) a strong predictor of academic success.
The 25 junior students were asked to write a 1 next to their name on the roll
sheet if their family income is less than P15,000; a 2 if their income is P16,000 to
P20,000; a 3 if their income is P21,000 to P25,000; a 4 if their income is P26,000 to
P30,000; a 5 if their income is P31,000 to P35,000; and a 6 if their income is P36,000
or more. The following data set shows the results.
Data Set for 25 Junior Students for their SES

1 1 1 4 4
2 2 3 3 3
6 6 4 4 5
1 5 2 5 2
3 4 3 1 2
Translate the above SES data set into a Grouped Frequency
Distribution and compute the following: Mean, Median, and Mode.
Worksheet # 10
Compute the weighted mean (wx) of each item and supply the
descriptive rating based on the computed mean:
5 – Always
4 – Oftentimes
3 - Sometimes
2 - Seldom
1 – Never
                                          Choices
STATEMENTS                           5     4     3     2     1     WX    DR
1) I tolerate criticisms.            79    62    54    50    45
2) I like to create and suggest.     83    58    51    45    53
3) I like to use instructional       79    69    68    29    45
   materials.
4) I cooperate with my               92    64    53    46    35
   teachers in doing.
5) I study and perform               76    52    59    41    62
   assignments in science.
Worksheet # 11
The manager of Bank XXX is puzzled by the variation in the salaries
given to the 10 rank-and-file employees. The following are the individual gross
monthly salaries received by the employees. Calculate the variance by the deviation
score method:
X           X - μ     (X - μ)²
P 6,500
P 7,445
P 5,445
P 8,330
P 9,029
P 6,944
P 7,056
P 4,899
P 6,923
P 5,589
Worksheet # 12
The head nurse needs to compute the variance of the volume (in liters) of
dextrose consumed by each of 10 patients every three days of
confinement. Calculate the variance by the raw score method:
Worksheet for Calculating the Variance
X          X²
9
7
4
8
6
3
5
4
7
5
Worksheet # 13
Calculate the Standard Deviation by the Deviation Score Method
X           X - μ     (X - μ)²
45
50
55
44
47
59
33
43
29
48
56
53
41
40
38
34
22
52
Worksheet # 14
Calculate the Standard Deviation by the Raw Score Method
Worksheet for Calculating the Standard Deviation
X          X²
44
23
23
22
17
11
22
29
18
44
29
26
54
16
48
33
27
37
53
Lesson 4
SAMPLING DESIGN
AND TECHNIQUE
=================
When undertaking any survey, it is essential that you obtain data from
people who are as representative as possible of the group that you are studying. Even
with the perfect questionnaire, your survey data will be regarded as useful only if
your respondents are considered typical of the population as a whole.
For this reason, an awareness of the principles of sampling is essential to the
implementation of most methods of research, both quantitative and qualitative.
• Population refers to the group of people, items, or units under
investigation.
• A census is obtained by collecting information about each member of a
population.
• A sample is obtained by collecting information about only some
members of a "population".
• Sampling Frame refers to the list of people from which the sample is
taken. It should be comprehensive, complete, and up-to-date.
Examples: School Register; Department File; Birth Rate Book.
Sample Size. The sample size depends on:
• Methodology selected
• Degree of accuracy required for the study (how much error can be
tolerated)
• The extent to which there is variation in the population with respect to key
characteristics of the study
• Likely response rate (which itself will depend on the sampling method
selected)
• Time and money available
Probability and Non-Probability Sampling
A probability sample is one in which each member of the population has
a known, non-zero chance of being selected; in simple random sampling, this chance
is the same for every member.
In a non-probability sample, some people have a greater, but unknown,
chance of selection than others.
1. Probability Samples. There are five main types of probability sample. The
choice among them depends on the nature of the research problem, the availability of a
good sampling frame, money, time, the desired level of accuracy in the sample, and data
collection methods. Each has its own advantages and disadvantages. They are:
• Simple random
• Systematic
• Random route
• Stratified
• Multi-stage cluster sampling
1.1 Simple random sample. This is perhaps an unfortunate term because
it isn't that simple and it isn't done at random in the sense of "haphazardly".
Characteristics:
• Each person has the same chance as any other of being selected
• Standard against which other methods are sometimes evaluated
• Suitable where the population is relatively small and where the sampling
frame is complete and up-to-date
Procedure:
1. Obtain a complete sampling frame.
2. Give each case a unique number, starting at one.
3. Decide on the required sample size.
4. Select that many numbers from a table of random numbers or by using a
computer.
Decide on a pattern of movement through the table and stick to it, e.g.
numbers from every second column and every row. If a number comes up twice or a
number is selected that is larger than the population size, discard it.
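The table-of-random-numbers procedure can be imitated in Python. This sketch assumes the population is numbered 1 to N and draws random numbers with as many digits as N, discarding zeros, numbers above N, and repeats, as the procedure instructs. The function name and the seed are illustrative; in practice, `random.sample(range(1, N + 1), n)` does the same job in one call.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Select case numbers 1..population_size without replacement."""
    if sample_size > population_size:
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    digits = len(str(population_size))
    upper = 10 ** digits - 1          # e.g. 999 for a population of 500
    chosen, seen = [], set()
    while len(chosen) < sample_size:
        number = rng.randint(0, upper)   # one "entry" read from the table
        if number == 0 or number > population_size or number in seen:
            continue                     # discard and move on
        seen.add(number)
        chosen.append(number)
    return chosen

sample = simple_random_sample(500, 100, seed=1)
print(sorted(sample)[:10])
```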
1.2. Systematic sampling. Similar to simple random sampling, but
instead of selecting random numbers from tables, you move through the list (sample
frame) picking every nth name.
You must first work out the sampling fraction by dividing the required sample size by
the population size. For example, for a population of 500 and a sample of 100, the
sampling fraction is 100/500 = 1/5. Thus, you will select one person out of every five
in the population. A random number needs to be used only to decide on the starting
point. With a sampling fraction of 1/5, the starting point must be within the first 5
people on your list.
Disadvantage: Effect of periodicity (bias caused by particular
characteristics arising in the sampling frame at regular intervals). An example of this
would occur if you used a sampling frame of adult residents in an area composed
predominantly of couples or young families. If this list was arranged Husband / Wife /
Husband / Wife, etc., and every tenth person was to be interviewed, every person
selected would be of the same sex (all husbands or all wives, depending on the
starting point).
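A Python sketch of systematic sampling: the sampling interval k is the reciprocal of the sampling fraction (population size divided by sample size), and a random start is chosen within the first k entries. The second example uses a hypothetical alternating Husband/Wife list to show the periodicity problem: with every tenth person selected, everyone in the sample is the same sex. The function name and seed are illustrative.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Pick every k-th entry of the sampling frame after a random start."""
    k = len(frame) // sample_size          # sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed the size of the frame")
    start = random.Random(seed).randrange(k)   # random start in the first k entries
    return frame[start::k][:sample_size]

# Population of 500, sample of 100: sampling fraction 1/5, so k = 5
people = list(range(1, 501))
sample = systematic_sample(people, 100, seed=3)
print(len(sample), sample[:4])

# Periodicity: an alternating Husband/Wife list sampled every 10th entry
couples = ["Husband", "Wife"] * 50
print(set(systematic_sample(couples, 10, seed=3)))
```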
1.3. Random Route Sampling. Used in market research surveys, mainly
for sampling households, shops, garages, and other premises in urban areas. An
address is selected at random from the sampling frame (usually the electoral register)
as a starting point. The interviewer is given instructions to identify further addresses
by taking alternate left- and right-hand turns at road junctions and calling at every nth
address (shop, garage, etc.).
Advantages:
• May save time
• Bias may be reduced because the interviewer has to call at clearly
defined addresses and is not able to choose
Problems:
• Characteristics of particular areas (upper class/middle class) may mean
that the sample is not representative
• Open to abuse by the interviewer because it is difficult to check that the
instructions were fully carried out
1.4. Stratified Sampling. All people in the sampling frame are divided
into "strata" (groups or categories). Within each stratum, a simple random sample or
systematic sample is selected.
For example, suppose we want to ensure that a sample of 5
students from a group of 50 contains both male and female students in the same
proportions as in the full population. We first divide that population into males and
females. In this case, there are 22 male students and 28 female students. To work out the
number of males and females in the sample, we have:
No. of males in sample = (5 / 50) x 22 = 2.2
No. of females in sample = (5 / 50) x 28 = 2.8
We obviously cannot interview 0.2 or 0.8 of a person and have to
"round" the numbers. Therefore, we choose 2 males and 3 females for the sample.
These would be selected using simple random or systematic sampling methods.
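A minimal Python sketch of proportional allocation for the example above. The text simply rounds each share; to guarantee that the rounded numbers still add up to the intended sample size, this sketch assumes the largest-remainder rule (floor every share, then give the leftover places to the largest fractional parts), which yields the same 2 males and 3 females here. The student IDs and function names are hypothetical.

```python
import math
import random

def proportional_allocation(strata_sizes, n):
    """Split a total sample of n across strata in proportion to stratum size.

    Exact shares n * N_h / N are rounded with the largest-remainder rule
    (an assumption; the text only says "round"), so the parts still add up to n.
    """
    total = sum(strata_sizes.values())
    exact = {h: n * size / total for h, size in strata_sizes.items()}
    alloc = {h: math.floor(share) for h, share in exact.items()}
    leftover = n - sum(alloc.values())
    # Hand out the remaining places to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return exact, alloc

def stratified_sample(frame_by_stratum, n, seed=None):
    """Simple random sample within each stratum, sized by proportional allocation."""
    rng = random.Random(seed)
    sizes = {h: len(units) for h, units in frame_by_stratum.items()}
    _, alloc = proportional_allocation(sizes, n)
    return {h: rng.sample(units, alloc[h]) for h, units in frame_by_stratum.items()}

exact, alloc = proportional_allocation({"male": 22, "female": 28}, 5)
print(exact)   # {'male': 2.2, 'female': 2.8}
print(alloc)   # {'male': 2, 'female': 3}

students = {"male": [f"M{i}" for i in range(1, 23)],
            "female": [f"F{i}" for i in range(1, 29)]}
print(stratified_sample(students, 5, seed=0))
```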
1.5. Multi-stage cluster sampling. As the name implies, this involves
drawing several different samples. It does so in such a way that the cost of final
interviewing is minimized.
Basic procedure: First draw a sample of areas. Initially, large areas are
selected; then progressively smaller areas within the larger areas are sampled.
Eventually, you end up with a sample of households and use a method of selecting
individuals from these selected households.
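The procedure can be sketched as nested random draws in Python. Everything here is hypothetical and for illustration only: the frame of 8 regions with 5 towns each, 20 households per town and 3 adults per household, the numbers drawn at each stage, and the choice of one adult per household picked at random (the text leaves the within-household selection method open).

```python
import random

# Hypothetical frame: 8 regions, 5 towns each, 20 households per town, 3 adults each.
frame = {
    f"Region {r}": {
        f"Town {r}.{t}": {
            f"Household {r}.{t}.{h}": [f"Adult {r}.{t}.{h}.{a}" for a in range(1, 4)]
            for h in range(1, 21)
        }
        for t in range(1, 6)
    }
    for r in range(1, 9)
}

def multistage_sample(frame, n_regions, n_towns, n_households, seed=None):
    """Simple random sample at each stage, then one adult per selected household."""
    rng = random.Random(seed)
    selected = []
    for region in rng.sample(sorted(frame), n_regions):             # stage 1: large areas
        towns = frame[region]
        for town in rng.sample(sorted(towns), n_towns):              # stage 2: smaller areas
            households = towns[town]
            for hh in rng.sample(sorted(households), n_households):  # stage 3: households
                selected.append(rng.choice(households[hh]))          # stage 4: one adult
    return selected

sample = multistage_sample(frame, n_regions=2, n_towns=2, n_households=5, seed=42)
print(len(sample))   # 20 interviews (2 regions x 2 towns x 5 households)
```

All 20 interviews fall within just 4 towns, which is where the saving in interviewing cost comes from.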
2. Non-Probability Samples. It isn't always possible to use a probability
method of sampling, such as random sampling. For example, there is no
complete sampling frame available for certain groups of the population, e.g. the
elderly, people who are attending a football match, or people who shop in a particular
part of town. Another factor to bear in mind is that many of the probability sampling
methods described above may mean that researchers have to conduct a
postal or telephone survey or go from house to house.
We will discuss some of the problems of low response rates later in this workbook,
but you might find that a probability sample with a poor response rate doesn't in the
end give you a particularly good representation of the population being examined.
Advantages of non-probability methods:
• Cheaper
• Used when sampling frame is not available
• Useful when the population is so widely dispersed that cluster sampling
would not be efficient
• Often used in exploratory studies, e.g. for hypothesis generation
• Some research is not interested in working out what proportion of the
population gives a particular response but rather in obtaining an idea of
the range of responses or ideas that people have.
2.1. Purposive Sampling. A purposive sample is one that is selected by
the researcher subjectively. The researcher attempts to obtain a sample that appears
to him/her to be representative of the population and will usually try to ensure that a
range from one extreme to the other is included.
It is often used in political polling: districts are chosen because their voting pattern has in
the past provided a good indication of outcomes for the whole electorate.
2.2. Quota Sampling. Have you ever been ambling along your local High
Street, noticed a Market Researcher with a clipboard, and thought "I don't mind
being asked some questions - it might be interesting", only to find that the
researcher looks straight through you? No? Well, for those people who have had that
happen, there is no need to take it personally. It is all due to quota sampling.
Quota sampling is often used in market research. Interviewers are required
to find cases with particular characteristics. They are given a quota of particular
types of people to interview and the quota is organized so that the final sample
should be representative of the population.
Stages:
• Decide on the characteristic for which the sample is to be representative,
e.g. age
• Find out the distribution of this variable in the population and set quotas
accordingly. For example, if 20% of the population is between 20 and
30, and the sample is to be 1,000, then 200 of the sample (20%) will be
in this age group.
Complex quotas can be developed so that several characteristics (age, sex,
marital status) are used simultaneously. By the end of the day, the researcher may be
looking for a widowed man in his nineties who looks as though he might buy a
particular brand of detergent.
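Setting and filling quotas can be sketched in Python as follows. Apart from the 20% for ages 20 to 30 (taken from the stages above), the population shares and the combined quota cells are hypothetical, as are the function and class names. The tracker shows why an interviewer may look straight through you: once your cell is full, you are simply not needed.

```python
def set_quotas(population_shares, sample_size):
    """Quota for each category = its population share x the sample size."""
    return {cat: round(share * sample_size) for cat, share in population_shares.items()}

class QuotaTracker:
    """Counts how many interviews are still needed in each quota cell."""

    def __init__(self, quotas):
        self.remaining = dict(quotas)

    def accept(self, cell):
        """Interview this person only if their cell still has places left."""
        if self.remaining.get(cell, 0) > 0:
            self.remaining[cell] -= 1
            return True
        return False          # quota full (or not wanted): the interviewer passes this person by

    def is_complete(self):
        return all(left == 0 for left in self.remaining.values())

# Hypothetical age distribution; only the 20% for ages 20-30 comes from the text.
shares = {"under 20": 0.10, "20-30": 0.20, "31-50": 0.40, "over 50": 0.30}
print(set_quotas(shares, 1000))   # {'under 20': 100, '20-30': 200, '31-50': 400, 'over 50': 300}

# Complex quotas: each cell combines several characteristics (age group, sex, marital status).
tracker = QuotaTracker({("90+", "male", "widowed"): 1, ("20-30", "female", "single"): 2})
print(tracker.accept(("20-30", "female", "single")))   # True
print(tracker.accept(("90+", "male", "widowed")))      # True
print(tracker.accept(("90+", "male", "widowed")))      # False: that cell is already full
```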
The disadvantage of quota sampling: interviewers choose whom they
like (within the above criteria) and may therefore select those who are easiest to
interview, so bias can result. Also, it is impossible to estimate accuracy (because it is
not a random sample).
2.3. Convenience sampling. A convenience sample is used when you
simply stop anybody in the street who is prepared to stop, or when you wander
round a business, a shop, a restaurant, a theatre, or whatever, asking people you
meet whether they will answer your questions. In other words, the sample comprises
subjects who are simply available in a convenient way to the researcher. There is no
randomness, and the likelihood of bias is high. You can't draw any meaningful
conclusions about the wider population from the results you obtain.
2.4. Snowball sampling. With this approach, you initially contact a few
potential respondents and then ask them whether they know of anybody with the
same characteristics that you are looking for in your research. For example, if you
wanted to interview a sample of vegetarians/cyclists/people with a particular
disability/people who support a particular political party etc., your initial contacts
may well know of others (e.g. through a support group).
2.5. Self-selection. Self-selection is perhaps self-explanatory.
Respondents themselves decide that they would like to take part in your survey.
Worksheet # 15
For which of the examples below do you feel simple random sampling would be an
appropriate sampling method? Are there any limitations? In which would random
sampling be inappropriate? In each of the examples below, identify the sampling
frame you would use if you were to undertake a simple random sample.
1) A survey of disco goers on a Friday or Saturday night.
2) A study of employees’ attitudes towards family planning in JAMES GORDON
MEMORIAL HOSPITAL.
3) A survey of Olongapo City residents’ attitudes to the Nuclear Power Plant.
4) A survey of a theatre audience following a play.
5) College students’ acceptance of school policies on discipline.
6) Feedback from hotel clients on perceptions of their last visit to that hotel.
7) Nutritional Status of Elementary School Pupils in the Division of Olongapo
City.
Worksheet # 16
1) Can you think of some different "strata" (other than marital status) that could
be used if you were undertaking a stratified sample of the adult population of
a certain city? Give examples and discuss.
2) Company ZANDREX employs 150 part-time staff and 550 full-time staff, and
you want to carry out depth interviews with 20 of these. If you were to take a
random sample, you might find that all of the names you selected were the
part-time staff. For this reason, you have decided to undertake a stratified
sample. Work out how many part-time and how many full-time employees you
should interview to accurately reflect the proportions of the two groups in the
whole workforce.
Answer = ________ Part-time and ________ Full-time
Worksheet # 17
You are employed by a DENR research department and have been asked to
undertake a study across the whole country to determine attitudes towards
the Nuclear Power Plant.
What are the possible sampling frames for this study?
You have been asked to interview 10,000 people in total. Consider what
would happen if you undertook a simple random sample. Of these 10,000,
you might, for example, have to interview 2 people in NCR, 3 in R-III, a dozen
scattered across R-IV, and so on. As this is uneconomic, both in terms of time
and money, it would be considered more efficient to undertake a multi-stage
cluster sample. Think of some ways in which you could divide the country into
progressively smaller sections to undertake this type of sample.
Worksheet # 18
1) How might you go about undertaking a purposive sample of people aged 60
and over who live in an isolated area of Zambales, to be sure of being able to
speak to both genders, the very old as well as the newly retired, and to include a
range of socio-economic groupings?
2) If you were to undertake a convenience sample of people who were walking
past you on the main street of Boracay Island on a Sunday morning, which
groups of the population would be over-represented and which groups would
be under-represented?
3) If you were undertaking a questionnaire on college students' attitudes to banking
services at BANK ZZZ and decided to deliver it through a convenience
sample in the bar of the Students Organization, which groups of students
would be over-represented and which under-represented?
4) Can you think of some ways in which you might draw attention to your project
when looking for a self-selected sample? What do you suppose is the main
disadvantage of self-selection?
Lesson 5
HYPOTHESIS TESTING
=================
To answer a statistical question, the question is translated into a
hypothesis, a statement that can be subjected to a test. Depending on the result of
the test, the hypothesis is either rejected or not rejected.
The hypothesis tested is known as the null hypothesis (H0). This must be
a true/false statement. For every null hypothesis, there is an alternative
hypothesis (HA).
Formulation of Hypotheses
Inferential statistics is all about hypothesis testing. The research
hypothesis will typically be that there is a relationship between the independent and
dependent variables, or that the treatment has an effect that generalizes to the
population. On the other hand, the null hypothesis, upon which the probabilities are
based, will typically be that there is no relationship between the independent and
dependent variables, or that the treatment has no real effect. In other words,
apparent differences in the samples are flukes. To examine these hypotheses, many
different kinds of statistical tests are used. These tests are chosen based on the kinds
of variables being examined and the kinds of samples.
Constructing and testing hypotheses is an important skill, but the best way
to construct a hypothesis is not necessarily obvious:
• The null hypothesis has priority and is not rejected unless there is strong
evidence against it.
• If one of the two hypotheses is 'simpler' it is given priority so that a more
'complicated' theory is not adopted unless there is sufficient evidence
against the simpler one.
• In general, it is 'simpler' to propose that there is no difference between two
sets of results than to say that there is a difference.
The outcome of a hypothesis test is "reject H0" or "do not reject H0". If we
conclude "do not reject H0", this does not necessarily mean that the null hypothesis
is true, only that there is insufficient evidence against H0 in favor of HA. Rejecting
the null hypothesis suggests that the alternative hypothesis may be true.
To decide whether or not to reject the null hypothesis, the result is compared
against a chosen level of significance (commonly 0.05 or 0.01).
This allows us to state whether or not there is a "significant difference"
between populations, i.e. whether an observed difference between populations is likely
to be a matter of chance, e.g. due to sampling or experimental error.
Steps for Hypothesis Testing

1. Define H0 and HA, based on the guidelines given above.
2. Choose a value for the level of significance.
3. Calculate the value of the test statistic.
4. Compare the calculated value with a table of the critical values of the test
statistic.
5. Decide to reject or fail to reject the null hypothesis.
6. State the result and conclusion.
Types of Error
Let us look first at the upper left-hand corner of the table below. Suppose
you sampled ten people and concluded from the study that the drug had an effect: it
appeared to reduce their diastolic blood pressure. If your conclusion is not the result
of a fluke, a product of sampling error or measurement error or something like that,
then your conclusion is correct, and you have the truth. You are in the upper left
corner of the table. On the other hand, if you conclude there is no effect (you
tried drug A to reduce diastolic blood pressure and concluded it did not work), and
that is indeed the case, you have made another correct decision. You have the truth
and you're in the lower right-hand corner of the table. So, if the test showed no effect
and there is no effect, then you're in the lower right corner.
                                              Reality
Study's Conclusion                            True Effect               No Effect
True Effect (reject the null hypothesis)      Truth                     Type I or Alpha Error
No Effect (don't reject the null hypothesis)  Type II or Beta Error     Truth
Let us rephrase it. If there is an effect and you conclude there is an effect,
you have made the correct decision in the upper left corner. If there is no effect and
you conclude there is no effect, you've made the correct decision in the lower right
corner. In both these cases, you've made the correct decision.
Looking at the table it should be apparent that you could also make incorrect
decisions or errors. If there is no effect, but you conclude there is an effect, then
you've made an error, the error in the upper right corner, a Type I Error or alpha
error. Alpha is the probability of a Type I Error.
If there is an effect, but you conclude there is no effect, then you've made the
error in the lower-left corner, a Type II Error or beta error. Beta is the
probability of a Type II Error.
Sometimes people have a hard time remembering whether alpha and beta go
with Type I or Type II. Just remember both alpha and "I" are first and both beta and
"II" are second. They go together.
Let's be more concrete now. The first kind of error, a Type I or alpha
error, is like a false alarm. You think you're on to something; there's an effect. But,
in reality, there isn't. A Type I Error occurs when you do an experiment and an
apparent effect isn't a real one. Perhaps the groups were a little unusual, so that they
responded better or worse to the drug than the average person might. So, you get an
experiment in which you seem to find an effect when in reality there is no effect. That's
called a Type I or alpha error. It's a false positive.
Now you can also have an experiment where you find no effect. A lot of
experiments are like this: you find no effect that you can distinguish, but the
treatment, in reality, was producing, or could produce in the population, a good
result. In this case, you've missed a good treatment, because your small experiment
didn't show an effect. That sort of error is a beta error or Type II Error. It's a false
negative. So, a Type I Error is a false positive, and a Type II Error is a false negative.
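The two kinds of error can be made concrete with a small simulation. This Python sketch is an illustration under stated assumptions, not a method from the text: it uses a two-group z-test with a known standard deviation of 1, 30 people per group and α = .05 (critical value 1.96). When the treatment has no effect, every rejection is a Type I error, so the rejection rate should come out close to α; when there is a true effect (here a hypothetical 0.3 standard deviations), every non-rejection is a Type II error.

```python
import math
import random

def rejection_rate(effect, trials=10000, n=30, z_crit=1.96, seed=0):
    """Share of simulated two-group experiments in which H0 (no effect) is rejected.

    Each experiment draws n treated and n control scores from normal populations
    with sd = 1; the treated mean is shifted by `effect`. A two-tailed z-test with
    known sd rejects H0 when |z| > z_crit (1.96 corresponds to alpha = .05).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        treated = [rng.gauss(effect, 1) for _ in range(n)]
        control = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(treated) / n - sum(control) / n) / math.sqrt(2 / n)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

alpha_hat = rejection_rate(effect=0.0)   # H0 true: every rejection is a Type I error
power_hat = rejection_rate(effect=0.3)   # H0 false: every non-rejection is a Type II error
print(f"Type I error rate  ~ {alpha_hat:.3f}  (close to alpha = .05)")
print(f"Type II error rate ~ {1 - power_hat:.3f}  (beta, for a true effect of 0.3 sd)")
```

With groups this small, a real but modest effect is missed most of the time, which is exactly the small-experiment situation described above.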
Worksheet # 19
Given the following research problems, formulate null and alternative
hypotheses and identify possible sources of errors.
1. Nutritional Status and Academic Performance of Grade School Students in
Olongapo City.
H0 :
HA :
2. School-related Practices of Parents of Grade VI Pupils at San Marcelino
Elementary School, Zambales as Perceived by the Teachers and Parents Themselves
during the Academic Year 2015-2016.
H0 :
HA :
3. Culture and Climate at Company XYZ in Olongapo City as Assessed by the Managers
and Rank-and-File as Basis for More Efficient Production.
H0 :
HA :
4. Philosophical Practices of College Educators and Administrators in a certain
University as a Baseline Study to Enhance the Teaching-Learning Process.
H0 :
HA :
5. Environmental Performance Based on the Green Audit of Public Elementary
Schools as Evaluated by the School Administrators and Teachers in the District of
San Marcelino, Zambales during the Academic Year 2015-2016.
H0 :
HA :
6. Relationship of the Perceptual Strengths of College Students and their
Mathematics Performance during the Academic Year 2015-2016.
H0 :
HA :
Lesson 6
STATISTICAL
RELATIONSHIPS
AMONG VARIABLES
================
Statistical relationships between variables rely on notions of correlation.
This concept aims to describe how variables relate to one another.
Correlation
Correlation tests are used to determine how strongly the scores of two
variables are associated – or correlated – with each other. A researcher might want
to know, for instance, whether a correlation exists between Accounting students'
pre-board examination scores and their scores on the CPA board examination.
Correlation is measured using values between +1.0 and -1.0. Correlations
close to 0 indicate little or no relationship between two variables, while correlations
close to +1.0 (or -1.0) indicate strong positive (or negative) relationships.
Correlation denotes positive or negative association between variables in
a study. Two variables are positively associated when larger values of one tend to
be accompanied by larger values of the other. The variables are negatively
associated when larger values of one tend to be accompanied by smaller values of
the other.
Examples:
1. An example of a strong positive correlation would be the correlation
between age and job experience. Typically, the longer people are alive, the more job
experience they are likely to have.
2. An example of a strong negative relationship might occur between
the strength of people's party affiliations and their willingness to vote for a candidate
from a different party. In many elections, strong supporters of Party XXX are unlikely
to vote for Party ZZZ, and vice versa.
How to Evaluate a Correlation
A correlation coefficient expresses the strength and direction of the relationship
between two variables X and Y. Its values range from -1 to +1. The closer the
value is to +1, the stronger the positive correlation; the closer the value is to -1,
the stronger the negative correlation. Monroe suggests:
Strength of Correlation     Negative           Positive
Negligible Correlation      -0.20 to 0.00      0.00 to +0.20
Low Correlation             -0.21 to -0.40     +0.21 to +0.40
Moderate Correlation        -0.41 to -0.70     +0.41 to +0.70
High Correlation            -0.71 to -0.90     +0.71 to +0.90
Very High Correlation       -0.91 to -0.99     +0.91 to +0.99
Perfect Correlation         -1.00              +1.00
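Monroe's scale can be applied mechanically with a small Python function. The function name is hypothetical, and comparing |r| after rounding to two decimals is an assumption made to close the small gaps in the table (for example between 0.20 and 0.21); only an r of exactly ±1 is labelled perfect.

```python
def describe_correlation(r):
    """Label r with Monroe's strength categories and its direction."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient lies between -1 and +1")
    size = round(abs(r), 2)   # assumption: compare at two decimals, like the table
    if abs(r) == 1.0:
        strength = "Perfect"
    elif size >= 0.91:
        strength = "Very High"
    elif size >= 0.71:
        strength = "High"
    elif size >= 0.41:
        strength = "Moderate"
    elif size >= 0.21:
        strength = "Low"
    else:
        return "Negligible Correlation"   # -0.20 to +0.20: direction not reported
    direction = "Positive" if r > 0 else "Negative"
    return f"{strength} {direction} Correlation"

for r in (-0.36, 0.845, 0.80, 0.93):
    print(r, describe_correlation(r))
```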
The Pearson Product-Moment Correlation (r) is the most common
correlation coefficient. Interval or ratio data are required for both variables to
calculate Pearson's r.
The strength is measured from 0 to 1 by the absolute value of r, and the direction
by its sign, "+" for positive or "-" for negative.
Correlation Does Not Imply Causation
Just because one variable relates to another variable does not mean that
changes in one cause changes in the other.
Other variables may be acting on one or both of the related variables and
affect them in the same direction. Cause-and-effect may be present, but correlation
does not prove causation.
For example, the length of a person’s pants and the length of their legs are
positively correlated (people with longer legs wear longer pants), but increasing one’s
pant length will not lengthen one’s legs.
Statistical Significance of a Correlation
Table (r), the table of critical values of r below, is used to determine the significance
of r. The left column, df, represents the degrees of freedom: df = Npairs - 2 (the number
of pairs of XY scores minus 2). df represents the number of values that are free to vary
when the sum of the variable is set; df compensates for small values of N by requiring
higher absolute values of r before r is considered significant.
Step 1. Find the degrees of freedom in the left column: df = number of pairs of data
minus 2.
Step 2. Read across the df row and compare the obtained r value with the
values listed in the columns. The heading at the top of each column indicates
the odds of a chance occurrence (the two-tailed probability of error when declaring r
to be significant): p = .10 is the 10% probability level, p = .05 is the 5% probability
level, and p = .01 is the 1% probability level.
If r falls between the values in two columns, use the left of the two
columns (greater odds for chance). If r does not equal or exceed the value in the
p = .10 column, it is said to be non-significant (NS). Negative r values are read using
the absolute value of r.
Values of the Correlation Coefficient (r)
df p = .10 p = .05 p = .01
1 .9877 .9969 .9999
2 .900 .950 .990
3 .805 .878 .959
4 .729 .811 .917
5 .669 .754 .875
6 .621 .707 .834
7 .582 .666 .798
8 .549 .632 .765
9 .521 .602 .735
10 .497 .576 .708
11 .476 .553 .684
12 .457 .532 .661
13 .441 .514 .641
14 .426 .497 .623
15 .412 .482 .606
16 .400 .468 .590
17 .389 .456 .575
18 .378 .444 .561
19 .369 .433 .549
20 .360 .423 .537
25 .323 .381 .487
30 .296 .349 .449
35 .275 .325 .418
40 .257 .304 .393
45 .243 .288 .372
50 .231 .273 .354
60 .211 .250 .325
70 .195 .232 .302
80 .183 .217 .283
90 .173 .205 .267
100 .164 .195 .254
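Steps 1 and 2 can be sketched as a table lookup in Python. Only a few rows of the table are copied here, and the function name is hypothetical. The rule of falling back to the next lower listed df when the exact df is missing is an assumption, not something stated in the text; it is the conservative choice, because critical values get larger as df gets smaller.

```python
# A few rows copied from the table above: df -> {p: critical value of r}.
CRITICAL_R = {
    3:  {0.10: 0.805, 0.05: 0.878, 0.01: 0.959},
    5:  {0.10: 0.669, 0.05: 0.754, 0.01: 0.875},
    8:  {0.10: 0.549, 0.05: 0.632, 0.01: 0.765},
    10: {0.10: 0.497, 0.05: 0.576, 0.01: 0.708},
    20: {0.10: 0.360, 0.05: 0.423, 0.01: 0.537},
    30: {0.10: 0.296, 0.05: 0.349, 0.01: 0.449},
}

def significance(r, n_pairs):
    """Return the smallest p column that |r| equals or exceeds, or "NS"."""
    df = n_pairs - 2                                   # Step 1
    listed = [d for d in CRITICAL_R if d <= df]
    if not listed:
        raise ValueError(f"df = {df} is below the rows in this excerpt")
    # Assumption: a df missing from the excerpt uses the next lower listed df.
    row = CRITICAL_R[max(listed)]
    reached = [p for p, critical in row.items() if abs(r) >= critical]   # Step 2
    return min(reached) if reached else "NS"

print(significance(-0.36, 10))   # NS
print(significance(0.845, 5))    # 0.1  (|r| passes .805 but not .878 at df = 3)
print(significance(0.93, 10))    # 0.01 (df = 8)
```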
Pearson Product-Moment Correlation
Coefficient
The most widely used type of correlation coefficient is Pearson r, also called
linear or product-moment correlation. Using non-technical language, one can
say that the correlation coefficient determines the extent to which values of the X and Y
variables are "proportional" to each other. The value of the correlation does not
depend on the specific measurement units used; for example, the correlation
between height and weight will be identical regardless of whether inches and pounds
or centimeters and kilograms are used as measurement units. Proportional means
linearly related; that is, the correlation is high if the relationship can be approximated
by a straight line (sloped upwards or downwards).
This line is called the regression line or least-squares line because it is
determined such that the sum of the squared distances of all the data points from the
line is the lowest possible. Pearson correlation assumes that the two variables are
measured on at least interval scales.
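Formula:
$$ r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^{2} - (\sum X)^{2}\right]\left[N\sum Y^{2} - (\sum Y)^{2}\right]}} $$
or
$$ r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sqrt{\left[\sum X^{2} - \frac{(\sum X)^{2}}{N}\right]\left[\sum Y^{2} - \frac{(\sum Y)^{2}}{N}\right]}} $$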
Where:
N = Represents the number of pairs of data
∑ = Denotes the summation of the items indicated
∑X = Denotes the sum of all X scores
∑X² = Indicates that each X score should be squared and then those
squares summed
(∑X)² = Indicates that the X scores should be summed and the total
squared [avoid confusing ∑X² (the sum of the squared X scores) and
(∑X)² (the square of the sum of the X scores)]
∑Y = Denotes the sum of all Y scores
∑Y² = Indicates that each Y score should be squared and then those
squares summed
(∑Y)² = Indicates that the Y scores should be summed and the total
squared
∑XY = Indicates that each X score should be first multiplied by its
corresponding Y score and the products (XY) summed
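The raw-score formula translates directly into Python. This is a sketch using only the standard library (the function name `pearson_r` is hypothetical); each running total corresponds to one of the symbols defined above. It is checked against the Mathematics and Physics scores and the English and Problem Solving scores used in the illustrations that follow.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from the raw-score formula, using N, ΣX, ΣY, ΣX², ΣY² and ΣXY."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists of paired scores")
    n = len(xs)                                    # N, the number of pairs
    sum_x, sum_y = sum(xs), sum(ys)                # ΣX and ΣY
    sum_x2 = sum(x * x for x in xs)                # ΣX²: square each score, then add
    sum_y2 = sum(y * y for y in ys)                # ΣY²
    sum_xy = sum(x * y for x, y in zip(xs, ys))    # ΣXY: multiply pairs, then add
    numerator = n * sum_xy - sum_x * sum_y
    # (ΣX)² squares the total, which is not the same as ΣX².
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when all X or all Y scores are equal")
    return numerator / denominator

math_scores = [3, 7, 2, 9, 8, 4, 1, 10, 6, 5]
physics_scores = [11, 1, 19, 5, 17, 3, 15, 9, 15, 8]
print(round(pearson_r(math_scores, physics_scores), 2))   # -0.36

english = [11, 13, 18, 12, 16]
problem_solving = [11, 10, 17, 13, 14]
print(round(pearson_r(english, problem_solving), 3))      # 0.845
```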
Illustrations:
1. Calculate the correlation between Mathematics (X) and Physics
(Y) scores for the 10 students shown in the table below.
By looking at the formula, we can see that we need the following items to
calculate r using the raw-score formula:
1. The number of subjects, N
2. The sum of each subject's X score times Y score, ∑XY
3. The sum of the X scores, ∑X
4. The sum of the Y scores, ∑Y
5. The sum of the squared X scores, ∑X²
6. The sum of the squared Y scores, ∑Y²
Each of these quantities can be found as shown in the computation table
below:
Correlation Between Math and Physics
Student Math (X) Physics (Y) X² Y² XY
1 3 11 9 121 33
2 7 1 49 1 7
3 2 19 4 361 38
4 9 5 81 25 45
5 8 17 64 289 136
6 4 3 16 9 12
7 1 15 1 225 15
8 10 9 100 81 90
9 6 15 36 225 90
10 5 8 25 64 40
Sum 55 103 385 1401 506
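Computations:
$$
\begin{aligned}
r &= \frac{(10)(506) - (55)(103)}{\sqrt{\left[(10)(385) - (55)^{2}\right]\left[(10)(1401) - (103)^{2}\right]}} \\
  &= \frac{5060 - 5665}{\sqrt{(3850 - 3025)(14010 - 10609)}} \\
  &= \frac{-605}{(28.723)(58.318)} = \frac{-605}{1675.0679}
\end{aligned}
$$
r = -0.36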
The correlation we obtained was -.36, showing us that there is a low
negative correlation between Mathematics and Physics. The correlation
coefficient is a number that can range from -1 (perfect negative correlation) through
0 (no correlation) to 1 (perfect positive correlation).
2. Relationship of Scores on 20-point English and Problem Solving quizzes
of five (5) students.
Student English (X) Problem Solving (Y)
A 11 11
B 13 10
C 18 17
D 12 13
E 16 14
Sum (N = 5) 70 65
Student English (X) Problem Solving (Y) XY X² Y²
A 11 11 121 121 121
B 13 10 130 169 100
C 18 17 306 324 289
D 12 13 156 144 169
E 16 14 224 256 196
Sum (N = 5) 70 65 937 1014 875
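Computations:
$$
\begin{aligned}
r &= \frac{(5)(937) - (70)(65)}{\sqrt{\left[(5)(1014) - (70)^{2}\right]\left[(5)(875) - (65)^{2}\right]}} \\
  &= \frac{4685 - 4550}{\sqrt{(5070 - 4900)(4375 - 4225)}} \\
  &= \frac{135}{(13.038)(12.247)} = \frac{135}{159.68}
\end{aligned}
$$
r = 0.845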
There is indeed a high positive correlation between the English and
Problem-Solving scores.
Steps in Testing Statistical Hypothesis
1. State the null hypothesis and the alternative hypothesis based on
your research question.
H0: r = 0
H1: r > 0
Note: Our null hypothesis, for the Pearson r, states that r is 0. The alternative
hypothesis states that r has a significant positive value.
2. Set the alpha level.
α = .05
Note: As usual, we will set our alpha level at .05; that is, we accept 5 chances in 100
of making a Type I error.
3. Calculate the value of the appropriate statistic. Also indicate the
degrees of freedom for the statistical test if necessary.
r = 0.845
df = N - 2 = 5 - 2 = 3
4. Write the decision rule for rejecting the null hypothesis.
Reject H0 if r >= .878
5. Write a summary statement based on the decision.
Do not reject H0, p > .05
Note: Since our calculated value of r (.845) is less than the critical value of .878, we
fail to reject the null hypothesis. (The tabled probabilities are two-tailed, so using the
.05 column for the one-tailed H1 is a conservative choice; the one-tailed critical value
at α = .05 and df = 3 is .805, the value in the p = .10 column.)
6. Write a statement of results.
There is no significant correlation between English and Problem Solving.
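Steps 3 to 6 can be expressed as a short Python function for a directional (H1: r > 0) test. This is a sketch under stated assumptions: the critical value is passed in by hand after reading it from the table (here .878, as in the text), and the function and variable names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Raw-score Pearson r (the same formula as in the text)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def one_tailed_r_test(xs, ys, critical):
    """Steps 3 to 6 for H0: r = 0 against H1: r > 0.

    `critical` is read from the table of critical values of r for the chosen
    alpha and df = N - 2 (steps 1 and 2 happen when you choose it).
    """
    r = pearson_r(xs, ys)                                     # step 3: statistic
    df = len(xs) - 2                                          #         and df
    reject = r >= critical                                    # step 4: decision rule
    decision = "Reject H0" if reject else "Do not reject H0"  # step 5
    result = ("There is a significant positive correlation." if reject
              else "There is no significant correlation.")    # step 6
    return round(r, 3), df, decision, result

english = [11, 13, 18, 12, 16]
problem_solving = [11, 10, 17, 13, 14]
print(one_tailed_r_test(english, problem_solving, critical=0.878))
# (0.845, 3, 'Do not reject H0', 'There is no significant correlation.')
```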
Spearman Rho
The Spearman Rho correlation tells you the magnitude and direction of
the association between two variables that are measured on at least an ordinal scale
(interval or ratio data can be converted to ranks).
Hypotheses:
Null: There is no association between the two variables.
Alternate: There is an association between the two variables.
How to Describe Output from a Bi-variate Correlation (Spearman’s Rho)
A bi-variate correlation is a statistic that measures how well two variables fit
together. In this case, we are using Spearman’s Rho correlation coefficient, which is
a nonparametric statistic. This just means that the two variables compared do not
have to have a normal distribution.
Correlation coefficients can vary from -1.0 to +1.0. The Rho here of
.382 is positive, which means that a higher score on the commitment scale is
correlated with a higher score on the achievement scale. Thus we can reject the null
hypothesis that these variables are not related. A Rho of 0.0 would mean that there
is no correlation between the two variables. A Rho of .382 means there is a
low correlation between the two scales (0.21 to 0.40 on Monroe's scale). It appears
that individuals who score high on intellectual commitment also tend to score high on
intellectual achievement.
A correlation coefficient is a number between +1 and -1. This number tells us
about the magnitude and direction of the association between two variables.
The magnitude is the strength of the correlation. The closer the correlation
is to either +1 or -1, the stronger the correlation. If the correlation is 0 or very close
to 0, there is no association between the two variables. Here, we have a low
correlation (r = -.392).
The direction of the correlation tells us how the two variables are related.
If the correlation is positive, the two variables have a positive relationship (as one
increases, the other also increases). If the correlation is negative, the two variables
have a negative relationship (as one increases, the other decreases). Here, we have a
negative correlation (r = -.392). As self-esteem increases, anxiety decreases.
Spearman's rho is a measure of the monotonic relationship between two
variables (the linear relationship between their ranks). It differs from Pearson's
correlation only in that the computations are done after the numbers are converted
to ranks. When converting to ranks, the smallest value on X becomes a rank of 1, etc.
Consider the following X-Y pairs:
X Y
7 4
5 7
8 9
9 8
Converting these to ranks would result in the following:
X Y
2 1
1 2
3 4
4 3
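Ranking and then applying Pearson's formula can be sketched in Python as follows. The text does not say how to rank tied values here; this sketch assumes the common convention of giving tied values the average of the ranks they occupy (two values tied for 2nd and 3rd place both get 2.5, the same convention behind the 6.5 and 11.5 ranks in Worksheet # 21). Function names are hypothetical.

```python
import math

def to_ranks(values):
    """Rank from smallest (rank 1) to largest; tied values get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                        # extend over a run of tied values
        average_rank = (i + j) / 2 + 1    # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Raw-score Pearson r."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(to_ranks(xs), to_ranks(ys))

x = [7, 5, 8, 9]
y = [4, 7, 9, 8]
print(to_ranks(x))                    # [2.0, 1.0, 3.0, 4.0]
print(to_ranks(y))                    # [1.0, 2.0, 4.0, 3.0]
print(round(spearman_rho(x, y), 2))   # 0.6
```

Ranking from the highest value instead (as in the illustrations that follow) gives the same rho, provided both variables are ranked in the same direction.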
The first value of X (which was a 7) is converted into a 2 because 7 is the
second lowest value of X. The X value of 5 is converted into a 1 since it is the lowest.
Spearman's rho can be computed with the formula for Pearson's r using the ranked
data. For this example, Spearman's rho = 0.60. Spearman's rho is an example of a
"rank-randomization" test.
The formula is:
$$ r_s = 1 - \frac{6\sum D^{2}}{N^{3} - N} $$
Where:
D = Rank of X – Rank of Y (difference score)
N = Number of cases
∑ = Summation symbol
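The shortcut formula works directly on two lists of ranks. Below is a minimal Python sketch (the function name is hypothetical), checked against the X-Y example above. The formula gives exactly the same value as Pearson's r on the ranks only when there are no tied ranks; with ties it is an approximation.

```python
def spearman_from_ranks(rank_x, rank_y):
    """r_s = 1 - 6ΣD² / (N³ - N), where D = rank of X - rank of Y."""
    if len(rank_x) != len(rank_y) or len(rank_x) < 2:
        raise ValueError("need two equally long lists of ranks")
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))   # ΣD²
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Ranks of the X-Y pairs above: ΣD² = 4, N = 4.
print(round(spearman_from_ranks([2, 1, 3, 4], [1, 2, 4, 3]), 2))   # 0.6
```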
Illustrations:
1. Compute the relationship between the Accounting and Mathematics
performances of the five students given below:
Student Accounting Performance Mathematics Performance
A 3 3
B 1 (Highest) 2
C 2 1 (Highest)
D 5 4
E 4 5
N=5
Student Accounting (X) Mathematics (Y) D D²
A 3 3 0 0
B 1 2 -1 1
C 2 1 1 1
D 5 4 1 1
E 4 5 -1 1
Sum (N = 5) 0 4
Computations:
$$
\begin{aligned}
r_s &= 1 - \frac{6\sum D^{2}}{N^{3} - N} = 1 - \frac{6(4)}{5^{3} - 5} \\
    &= 1 - \frac{24}{125 - 5} = 1 - \frac{24}{120} \\
    &= 1 - 0.20
\end{aligned}
$$
rs = 0.80 ; High Positive Correlation
2. We have seven students whose art projects are being ranked by two judges, and
we wish to know if the two judges’ rankings are related to one another. In other
words, do the two judges tend to rank the seven art show contestants in the same
way?
The following table contains the data for this problem.
Student Judge A's Ranking Judge B's Ranking D D²
Erick 1 2 -1 1
Obeth 2 1 1 1
Jecka 3 4 -1 1
Ronald 4 3 1 1
John 5 6 -1 1
Aries 6 7 -1 1
Jeff 7 5 2 4
Sum 10
Computations:
$$
\begin{aligned}
r_s &= 1 - \frac{6\sum D^{2}}{N^{3} - N} = 1 - \frac{6(10)}{7^{3} - 7} \\
    &= 1 - \frac{60}{343 - 7} = 1 - \frac{60}{336} \\
    &= 1 - 0.1786
\end{aligned}
$$
rs = 0.82 ; High Positive Correlation
Thus, we can see that there is a high positive correlation, but not a perfect
correlation, between the two judges’ rankings.
Illustration of Statistical Test
Is there a significant positive correlation between the rankings
of 10 students on an Oral Communication test and their teacher's ranking
of their Speaking ability?
In this problem, we are relating a set of scores (interval level of
measurement) with the teacher's ranking of the students in speaking (ordinal level of
measurement).
To do this, we first convert the oral communication test scores to ranks by assigning
the highest score a rank of 1, the next highest a rank of 2, etc.
Now we are looking at rankings on two variables and can use the
Spearman Rank-Difference Correlation Coefficient to test the significance
of the relationship. The two sets of ranks, as well as the difference between each pair
of ranks (D) and the squared differences (D²), are shown in the following table.
Oral Communication Test Score Rank   Teacher's Ranking on Speaking Ability   D   D²
1 3 -2 4
2 2 0 0
3 1 2 4
4 4 0 0
5 5 0 0
6 6 0 0
7 8 -1 1
8 7 1 1
9 10 -1 1
10 9 1 1
Total 12
Computations:
$$
\begin{aligned}
r_s &= 1 - \frac{6(12)}{10^{3} - 10} = 1 - \frac{72}{1000 - 10} \\
    &= 1 - \frac{72}{990} = 1 - 0.073
\end{aligned}
$$
rs = 0.93 ; Very High Positive Correlation
df = N - 2 = 10 - 2 = 8
Steps in Testing Statistical Hypothesis
We now have the information we need to complete the six-step process for
testing statistical hypotheses for our research problem.
H0: rS = 0
H1: rS > 0
Note: Our null hypothesis states that there is no relationship between the
two variables. The alternative hypothesis states that there is a significant positive
correlation between the two variables.
α = .05
Note: As usual, we will set our alpha level at .05; that is, we accept 5 chances in 100 of
making a Type I error.
rS = .93
df = N - 2 = 10 - 2 = 8
Reject H0 if rS >= .632
Note: To write the decision rule, we had to know the critical value for rS
with an alpha level of .05 and 8 degrees of freedom. We can find it in the
Appendix Table (the same table we used for the Pearson r) by noting the
tabled value in the column for the .05 level and the row for 8 df (.632).
Note: Since our calculated value of rS (.93) is greater than .632, we reject
the null hypothesis and accept the alternative hypothesis.
There is a significant positive correlation between the students' ranks on an
oral communication test and their teacher's ranking of them on speaking ability.
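The decision for this example can be reproduced in Python. This is a minimal sketch: the ranks are those in the table above, the critical value .632 is typed in from the table as described in the note, and the names are hypothetical.

```python
def spearman_from_ranks(rank_x, rank_y):
    """r_s = 1 - 6ΣD² / (N³ - N), where D = rank of X - rank of Y."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

test_rank = list(range(1, 11))                   # oral communication test score ranks
teacher_rank = [3, 2, 1, 4, 5, 6, 8, 7, 10, 9]   # teacher's ranking on speaking ability

r_s = spearman_from_ranks(test_rank, teacher_rank)
df = len(test_rank) - 2
critical = 0.632                                 # .05 column, df = 8, read from the table
decision = "Reject H0" if r_s >= critical else "Do not reject H0"
print(round(r_s, 2), df, decision)               # 0.93 8 Reject H0
```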
Worksheet # 20
State the null hypothesis and the alternative hypothesis, set the alpha
level, and give the calculated value of the statistic, the decision rule, the decision
statement, and the statement of results.
Problem: A professor investigates the relationship between the number of minutes
needed to complete an examination and the score on the examination. Using the
following data, determine whether there is a significant negative relationship
between these two variables.
Time (minutes)   Examination Score
34 78
46 92
60 56
37 82
41 96
54 66
56 79
48 85
39 92
50 80
Worksheet # 21
State the null hypothesis and the alternative hypothesis, set the alpha level, and give
the calculated value of the statistic, the decision rule, the decision statement, and the
statement of results.
Problem: Two tasters rank twelve brands of chocolate. Is there a significant
positive correlation between these two sets of ranks?
Rank 1 Rank 2
1 2
2 5
3 4
4 1
5 6
6.5 3
6.5 7.5
8 10
9 9
10 12
11.5 7.5
11.5 11
Worksheet # 22
Problem: Below are the mean evaluations given by the evaluators and the number of minutes that each of the eleven contestants devoted to the project. Is there a significant positive correlation between these two variables?
Mean Evaluation    Minutes Devoted
4.55 25
3.40 19
3.26 12
3.73 23
4.05 20
3.88 19
4.00 27
3.00 30
4.79 27
3.55 19
3.00 20
Worksheet # 23
Problem: Two medical doctors evaluated ten brands of milk. Based on each doctor's evaluation, the brands are ranked from 1 to 10. Is there a significant positive correlation between the rankings of the two doctors?
Ranking of Milk by Two Doctors
Brand    Doctor 1    Doctor 2
A 5 6
B 8 5
C 2 1
D 10 7
E 1 3
F 6 10
G 9 8
H 3 2
I 7 9
J 4 4
Worksheet # 24
Problem: Find the Spearman Rank Order Correlation Coefficient between the two
rankings.
Ranks on Education and Leadership for Five Students
Student    Education Rank    Leadership Rank
Chanda 1 4
Edhel 2 5
Jecka 3 2
Mike 4 3
Urzula 5 1
Lesson 7
CHI-SQUARE TEST
=================
Chi-Square (χ²) Test
The chi-square test is similar to tests of correlation in that it examines the association between variables. The chi-square test can be used to test associations in one or more groups, and it does this by comparing the actual (observed) numbers in each group with those that would be expected according to theory or simply by chance. The chi-square test requires that the data be expressed as frequencies, i.e., numbers in each category; this is the nominal level of measurement. Almost any data can be reduced to categorical or frequency data, but it is not always wise to do this because information is invariably lost in the process.
For example, the weight (interval measurement) of individual members of a group may be different for each member of that group, but the individuals could be assigned to one of two categories (overweight and underweight) by use of a suitable cutoff point in the data. The data would then be categorical: you would have the number of people in the category "overweight" and the number in the category "underweight"; but in doing this the researcher has lost a lot of information about the weight of individuals in the group.
To be reliable, the chi-square statistic also requires that the expected frequency in each category be at least 5; this can cause problems when the sample size is relatively small.
Finally, the categories used must be independent of each other. This means that it must not be possible for data to fall into more than one category.
For example, if the effectiveness of two different treatments was being compared and some of the patients were receiving both treatments, then chi-square could not be used for the analysis.
We will now consider a widely used non-parametric test, chi-square, which we can use with nominal-level (that is, classificatory) data.
For example, we know the frequency with which entering freshman computer science students, when required to purchase a computer for their personal use, select Macintosh computers, IBM computers, or some other brand of computer. We want to know if there is a difference among the frequencies with which these three brands of computers are selected, or if the students choose equally among the three brands. The chi-square statistic can be used for this problem.
The chi-square statistic is used to compare the observed frequency of some observations (such as the frequency of buying different brands of computers) with an expected frequency (such as buying equal numbers of each brand of computer). The comparison of observed and expected frequencies is used to calculate the value of the chi-square statistic, which in turn can be compared with the distribution of chi-square to make an inference about a statistical problem.
The symbol for chi-square and the formula are as follows:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where:
O is the Observed frequency
E is the Expected frequency
The degrees of freedom (df) for the one-dimensional chi-square statistic is:
df = C - 1
Where:
C is the number of categories or levels of the independent variable.
One-Variable Chi-Square (Goodness-of-Fit Test)
We can use the Chi-square statistic to test the distribution of measures over
levels of a variable to indicate if the distribution of measures is the same for all
levels. This is the first use of the one-variable chi-square test. This test is also
referred to as the goodness-of-fit test.
Consider again the example already mentioned: the frequency with which entering freshmen, when required to purchase a computer for college use, select Macintosh computers, IBM computers, or some other brand of computer. We want to know if there is a significant difference among the frequencies with which these three brands of computers are selected, or if the students select equally among the three brands.
The data for 100 students are recorded in the table below (the observed
frequencies).
We have also indicated the expected frequency for each category. Since there are 100 measures or observations and there are three categories (Macintosh, IBM, and Other), the expected frequency for each category is 100/3, or 33.333. In the last column of the table, we have calculated the square of the observed frequency minus the expected frequency, divided by the expected frequency: (O - E)²/E. The sum of this column is the value of the chi-square statistic.
Computer    Observed Frequency (O)    Expected Frequency (E)    (O - E)²/E
IBM    47    33.333    5.604
Macintosh    36    33.333    0.213
Other    17    33.333    8.003
Total    100    100    13.820 (chi-square)
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 5.604 + 0.213 + 8.003 = 13.820$$
df = C - 1 = 3 - 1 = 2
We can compare the obtained value of chi-square with the critical value for the .05 level and 2 degrees of freedom, obtained from the Appendix Table (Distribution of Chi-Square). Looking under the column for .05 and the row for df = 2, we see that the critical value for chi-square is 5.991.
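A minimal Python sketch of the goodness-of-fit calculation follows. It assumes equal expected frequencies (the "students choose equally" hypothesis) and hard-codes the 5.991 critical value quoted above instead of computing it; the function name is illustrative only.

```python
def chi_square_goodness_of_fit(observed: list[float], expected: list[float]) -> float:
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    observed = [47, 36, 17]                          # IBM, Macintosh, Other
    n = sum(observed)                                # 100 students
    expected = [n / len(observed)] * len(observed)   # equal preference: 33.333 each
    chi2 = chi_square_goodness_of_fit(observed, expected)
    df = len(observed) - 1                           # df = C - 1
    critical = 5.991                                 # alpha = .05, df = 2, from the text
    print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {chi2 >= critical}")
```

Passing the expected frequencies in explicitly means the same function also works when theory predicts unequal proportions.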
Application of the Statistical Test
1. State the null hypothesis and the alternative hypothesis based on your
research question.
Ho : O = E
H1: O ≠ E
Note: Our null hypothesis, for the chi-square test, states that there are no differences between the observed and the expected frequencies. The alternative hypothesis states that there are significant differences between the observed and expected frequencies.
2. Set the alpha level.
α = .05
Note: As usual, we set our alpha level at .05; that is, we accept 5 chances in 100 of making a Type I error.
3. Calculate the value of the appropriate statistic. Also indicate the degrees
of freedom for the statistical test if necessary.
χ² = 13.820
df = C - 1 = 2
4. Write the decision rule.
Reject H0 if χ² >= 5.991.
Note: To write the decision rule we had to know the critical value for chi-square, with an
alpha level of .05 and 2 degrees of freedom. We can do this by looking at Appendix Table and
noting the tabled value for the column for the .05 level and the row for 2 df.
5. Write the decision statement.
Reject H0, p < .05
Note: Since our calculated value of χ² (13.820) is greater than 5.991, we reject the null hypothesis and accept the alternative hypothesis.
6. Write a statement of results
There is a significant difference among the frequencies with which students purchased three different brands of computers.
Two-Variable Chi-Square (Test of Independence)
Now let us consider the case of the two-variable chi-square test, also known
as the test of independence.
For example, we may wish to know if there is a significant difference in the frequencies with which males come from small, medium, or large cities as contrasted with females.
The two variables we are considering here are hometown size (small, medium, or large) and gender (male or female). Another way of putting our research question is: Is gender independent of the size of hometown? The data for 30 females and 6 males are shown in the following table.
The Frequency with Which Males and Females Come from Small, Medium, and Large Cities
Small (S) Medium (M) Large (L) Total
Female 10 14 6 30
Male 4 1 1 6
Total 14 15 7 36
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where:
O is the Observed frequency, and
E is the Expected frequency.
The degrees of freedom (df) for the two-dimensional chi-square statistic is:
df = (C - 1)(R - 1)
Where:
C is the number of columns or levels of the first variable
R is the number of rows or levels of the second variable.
In the table above we have the observed frequencies (six of them). Now we
must calculate the expected frequency for each of the six cells. For two-variable chi-
square we find the expected frequencies with the formula:
$$E = \frac{\text{Column Total} \times \text{Row Total}}{\text{Grand Total}}$$
In the table above we can see that the column totals are 14 (small), 15 (medium), and 7 (large), while the row totals are 30 (female) and 6 (male). The grand total is 36.
Using the formula we can thus find the expected frequency ( E ) for each
cell.
1. E for the S female cell is 14 × 30/36 = 11.667
2. E for the M female cell is 15 × 30/36 = 12.500
3. E for the L female cell is 7 × 30/36 = 5.833
4. E for the S male cell is 14 × 6/36 = 2.333
5. E for the M male cell is 15 × 6/36 = 2.500
6. E for the L male cell is 7 × 6/36 = 1.167
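The expected-frequency rule can be applied to every cell at once. This is a sketch only: it assumes the table is given as a list of rows (here Female, Male) and columns (Small, Medium, Large), and the function name is hypothetical.

```python
def expected_frequencies(table: list[list[float]]) -> list[list[float]]:
    """Expected count for each cell: (column total x row total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Outer loop walks the rows, inner loop walks the columns
    return [[c * r / grand_total for c in col_totals] for r in row_totals]


if __name__ == "__main__":
    # Rows: Female, Male; columns: Small, Medium, Large
    observed = [[10, 14, 6], [4, 1, 1]]
    for label, row in zip(["Female", "Male"], expected_frequencies(observed)):
        print(label, [round(e, 3) for e in row])
```

The expected counts always add up to the grand total (36 here), which is a quick check on hand calculations.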
We can put these expected frequencies in our table and also include the values of (O - E)²/E. The sum of all these is the value of chi-square.
                 Small (S)                  Medium (M)                 Large (L)
         O     E        (O-E)²/E    O     E        (O-E)²/E    O     E        (O-E)²/E    Total
Female   10    11.667   0.238       14    12.500   0.180       6     5.833    0.005       30
Male     4     2.333    1.191       1     2.500    0.900       1     1.167    0.024       6
Total    14                         15                         7                          36
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 0.238 + 0.180 + 0.005 + 1.191 + 0.900 + 0.024 = 2.538$$
df = (C - 1)(R - 1) = (3 - 1)(2 - 1) = (2)(1) = 2
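The whole test of independence can be sketched in a few lines of Python. This assumes the same row/column layout as the table above and uses a hypothetical function name. Computed without intermediate rounding, the statistic comes out at about 2.537 rather than the hand-rounded 2.538, which does not change the decision. The sketch also counts cells with expected frequency below 5: three cells in the male row fall below that threshold, the condition this lesson warns can make chi-square unreliable.

```python
def chi_square_independence(table: list[list[float]]) -> tuple[float, int, list[list[float]]]:
    """Return (chi-square, df, expected counts) for an R x C contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[c * r / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(col_totals) - 1) * (len(row_totals) - 1)   # df = (C - 1)(R - 1)
    return chi2, df, expected


if __name__ == "__main__":
    observed = [[10, 14, 6], [4, 1, 1]]   # Female / Male by Small, Medium, Large
    chi2, df, expected = chi_square_independence(observed)
    small_cells = sum(e < 5 for row in expected for e in row)
    print(f"chi-square = {chi2:.3f}, df = {df}")
    print(f"cells with expected frequency below 5: {small_cells}")
```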
We now have the information we need to complete the six-step process for
testing statistical hypotheses for our research problem.

Ho : O = E
H1: O ≠ E
α = .05

χ² = 2.538
df = (C - 1)(R - 1) = (2)(1) = 2
Reject H0 if χ² >= 5.991.
Note: To write the decision rule we had to know the critical value for chi-
square, with an alpha level of .05 and 2 degrees of freedom. We can do this by
looking at Appendix Table and noting the tabled value for the column for the .05
level and the row for 2 df.
Fail to reject H0
Note: Since our calculated value of χ² (2.538) is less than 5.991, we fail to reject the null hypothesis and are unable to accept the alternative hypothesis.
There is not a significant difference in the frequencies with which males come from small, medium, or large cities as compared with females. The data give no evidence that hometown size depends on gender; that is, we cannot reject the hypothesis that hometown size is independent of gender.
Worksheet # 25
Problem: Students from various colleges are classified by school organization membership and by department. Is membership in a school organization independent of the student's department?
Department    Belong to School Organization    Do Not Belong to School Organization
BSBA 19 16
AS 11 14
BSN 16 9
BSCE 19 6
EDUC 14 20
ITC 10 5
CS 17 12
Worksheet # 26
Problem: A consumer research group asked a group of women to use each brand of lotion for one month. After the trial period, each woman indicated the lotion she preferred. Using the results below, determine whether there is a significant preference for any of the brands of lotion.
Lotion Brand    Number of Women Preferring
A 42
B 36
C 22
D 33
E 57
F 42
G 19
H 24
I 34
Worksheet # 27
Problem: A major airline is considering eliminating the first-class section on its airplanes. To determine how this move would be accepted, the airline asked people across the country their opinion about eliminating the first-class section. All people sampled had previously booked a flight with the airline. They were also asked how many times they fly during the year. Using the data below, determine whether a person's opinion is independent of how frequently the person flies.
                 Flying Frequency
Opinion    Frequent    Moderate    Little
Yes 206 167 181
No 198 151 230
Worksheet # 28
Problem: A researcher surveys 100 men and 100 women and determines the numbers who are deemed healthy or unhealthy. The data are shown in the 2 × 2 contingency table below.
Gender    Healthy    Unhealthy    Total
Men 55 45 100
Women 42 58 100
Total 97 103 200
Lesson 8
STATISTICAL
DIFFERENCES
BETWEEN GROUPS
=================
Statistical tests can be used to analyze differences in the scores of two or more
groups. The following statistical tests are commonly used to analyze differences
between groups.
t-Tests
A t-test is used to determine whether the scores of two groups differ on a single variable; specifically, it tests for a difference in mean scores. For instance, you could use a t-test to determine whether performance differs between students in two classrooms.
Note: A t-test is appropriate only when comparing two sets of scores. It is useful in analyzing the scores of two groups of participants on a particular variable, or the scores of a single group of participants on two variables.
Two-Sample Tests
A very common problem is to compare the means of, for example, the experimental group and the control group on some dependent variable.
The two-sample t-tests we will consider are:
1. The independent t-test, which is used to compare two sample means when the two samples are independent of one another.
2. The non-independent or dependent t-test, which is used for matched samples (where the two samples are not independent of one another because they are matched) and for pre-test/post-test comparisons where the pre-test and post-test are taken on the same group of subjects.
t-Test for independent samples

Purpose, Assumptions. The t-test is the most commonly used method to
evaluate the differences in means between two groups.
For example, the t-test can be used to test for a difference in test scores between a control group of students taught by a traditional method and an experimental group taught with a modular approach. Theoretically, the t-test can be used even if the sample sizes are very small (as small as 10; some researchers claim that even smaller n's are possible), as long as the variables are normally distributed within each group and the variation of scores in the two groups is not reliably different.
The p-level reported with a t-test represents the probability of error involved
in accepting our research hypothesis about the existence of a difference. Technically
speaking, this is the probability of error associated with rejecting the hypothesis of
no difference between the two categories of observations (corresponding to the
groups) in the population when, in fact, the hypothesis is true. Some researchers
suggest that if the difference is in the predicted direction, you can consider only one
half (one "tail") of the probability distribution and thus divide the standard p-level
reported with a t-test (a "two-tailed" probability) by two. Others, however, suggest
that you should always report the standard, two-tailed t-test probability.
Arrangement of Data. To perform the t-test for independent samples, one independent (grouping) variable (Gender: male/female) and at least one dependent variable (a test score) are required. The means of the dependent variable will be compared between selected groups based on the specified values (male and female) of the independent variable.
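The formula for the independent t-test is:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\dfrac{SS_1 + SS_2}{n_1 + n_2 - 2}\right)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Where:
X̄1 = Mean for group 1,
X̄2 = Mean for group 2,
SS1 = The sum of squares for group 1,
SS2 = The sum of squares for group 2,
n1 = Number of subjects in group 1, and
n2 = Number of subjects in group 2.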
The sum of squares is a new way of looking at variance. It gives us an indication of how spread out the scores in a sample are. The t-value we are finding is the difference between the two means divided by the standard error of that difference, which is computed from the two sums of squares, the degrees of freedom, and the two sample sizes.
$$SS_1 = \sum X_1^2 - \frac{(\sum X_1)^2}{n_1}$$
and
$$SS_2 = \sum X_2^2 - \frac{(\sum X_2)^2}{n_2}$$
We can see that each sum of squares is the sum of the squared scores in the sample minus the square of the sum of the scores divided by the size of the sample (n).
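To see that the computational formula really measures spread, the sketch below compares it with the definitional form Σ(X - X̄)², the sum of squared deviations from the mean. It borrows the Night shift scores from the job-satisfaction illustration in this lesson; the function names are illustrative assumptions.

```python
def sum_of_squares(scores: list[float]) -> float:
    """Computational formula: SS = sum(X^2) - (sum(X))^2 / n."""
    n = len(scores)
    return sum(x * x for x in scores) - sum(scores) ** 2 / n


def sum_of_squares_deviation(scores: list[float]) -> float:
    """Definitional formula: SS = sum((X - mean)^2)."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


if __name__ == "__main__":
    night_shift = [79, 83, 68, 59, 81, 76, 80, 74, 58, 49, 68]
    print(round(sum_of_squares(night_shift), 2))
    print(round(sum_of_squares_deviation(night_shift), 2))
```

Both forms give 1234.73 for these scores. The computational form suits hand work because it needs only the column totals ΣX and ΣX².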
So, to calculate the independent t-value we need to know:
1. The mean for sample or group 1
2. The mean for sample or group 2
3. ΣX and ΣX² for group 1
4. ΣX and ΣX² for group 2
5. The sample size for group 1 (n1)
6. The sample size for group 2 (n2)
We also need to know the degrees of freedom for the independent t-test
which is:
df = n1 + n2 - 2
Steps in Looking Up the t-Critical Value
1. Determine Alpha. Take alpha from the problem and (a) if the test is one-tailed, leave it alone, or (b) if it is a two-tailed test, divide alpha by 2 (alpha/2).
2. Determine Degrees of Freedom (df). Use the associated df from the test you have chosen for your problem. The basic rule is that for each sample there will be n - 1 degrees of freedom. For instance, in a test of two means where 2 samples are used, the df becomes n1 + n2 - 2 (which is the same as (n1 - 1) + (n2 - 1)).
3. Locate df in Table. The first column of the t-table lists df. Locate the df
from step 2, if that number of df is not in the table - round DOWN to the next
nearest number.
4. Find t-Critical Value. Using the row associated with the df identified in step 3, go across to the number under the column representing your alpha level from step 1 (i.e., if alpha = .05, the critical value is in the column labeled "t.050"). This number is your critical value.
Illustration: Research Problem: Job satisfaction as a function of work schedule was investigated in two different factories. In the first factory, the employees are on a Night shift system, while in the second factory the workers are on a Day shift system. Under the Night shift system, a worker always works the same shift, while under the Day shift system, a worker works the whole shift. Using the scores below, determine if there is a significant difference in job satisfaction between the two groups of workers.
Work Satisfaction Scores for Two Groups of Workers
Night Shift (X1)    Day Shift (X2)
79 63
83 71
68 46
59 57
81 53
76 46
80 57
74 76
58 52
49 68
68 73
We can calculate the quantities we need to solve this problem as follows:
Worksheet to Calculate the Independent t-Test Value
X1    X1²    X2    X2²
79 6241 63 3969
83 6889 71 5041
68 4624 46 2116
59 3481 57 3249
81 6561 53 2809
76 5776 46 2116
80 6400 57 3249
74 5476 76 5776
58 3364 52 2704
49 2401 68 4624
68 4624 73 5329
ΣX1 = 775    ΣX1² = 55837    ΣX2 = 662    ΣX2² = 40982
We can use the totals from this worksheet and the number of subjects in
each group to calculate the sum of squares for group 1, the sum of squares for group
2, the mean for group 1, the mean for group 2, and the value for the independent t.
$$SS_1 = \sum X_1^2 - \frac{(\sum X_1)^2}{n_1} = 55837 - \frac{(775)^2}{11} = 55837 - \frac{600625}{11} = 55837 - 54602.27 = 1234.73$$
$$SS_2 = \sum X_2^2 - \frac{(\sum X_2)^2}{n_2} = 40982 - \frac{(662)^2}{11} = 40982 - \frac{438244}{11} = 40982 - 39840.36 = 1141.64$$
$$\bar{X}_1 = \frac{775}{11} = 70.45 \qquad \bar{X}_2 = \frac{662}{11} = 60.18$$
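$$t = \frac{70.45 - 60.18}{\sqrt{\left(\dfrac{1234.73 + 1141.64}{11 + 11 - 2}\right)\left(\dfrac{1}{11} + \dfrac{1}{11}\right)}} = \frac{70.45 - 60.18}{\sqrt{\left(\dfrac{2376.37}{20}\right)(.182)}} = \frac{10.27}{\sqrt{(118.82)(.182)}} = \frac{10.27}{4.650} = 2.209$$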
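The same pooled-variance calculation can be written as a short Python sketch that follows the lesson's formula step by step. It assumes equal population variances, as the formula does, and hard-codes the 2.086 critical value used below; the function name is illustrative.

```python
import math


def independent_t(group1: list[float], group2: list[float]) -> tuple[float, int]:
    """Pooled-variance independent-samples t, following the lesson's formula."""
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = sum(group1) / n1, sum(group2) / n2
    # Sums of squares via the computational formula
    ss1 = sum(x * x for x in group1) - sum(group1) ** 2 / n1
    ss2 = sum(x * x for x in group2) - sum(group2) ** 2 / n2
    df = n1 + n2 - 2
    # Standard error of the difference between the two means
    se = math.sqrt(((ss1 + ss2) / df) * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


if __name__ == "__main__":
    night = [79, 83, 68, 59, 81, 76, 80, 74, 58, 49, 68]
    day = [63, 71, 46, 57, 53, 46, 57, 76, 52, 68, 73]
    t, df = independent_t(night, day)
    critical = 2.086   # two-tailed, alpha = .05, df = 20, from the text
    print(f"t = {t:.3f}, df = {df}, reject H0: {abs(t) >= critical}")
```

Without intermediate rounding the sketch gives t ≈ 2.210; the hand calculation reports 2.209 because it rounds 1/11 + 1/11 to .182 and the square root to 4.650. The decision is the same either way.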
H0: μ1 = μ2
H1: μ1 ≠ μ2
Note: Our problem did not state which direction of significance we will be looking for;
therefore, we will be looking for a significant difference between the two means in either direction.

α = .05
Note: As usual, we set our alpha level at .05; that is, we accept 5 chances in 100 of making a Type I error.

t = 2.209
df = n1 + n2 - 2 = 11 + 11 - 2 = 20
Note: We have calculated the t-value and will also need to know the degrees of freedom when we go to
look up the critical values of t.
Reject H0 if t is >= 2.086 or if t <= -2.086
Note: To write the decision rule we need to know the critical value for t, with an alpha level
of .05 and a two-tailed test. We can do this by looking at Appendix (Distribution of t). Look for the
column of the table under .05 for the Level of significance for two-tailed tests, read down the column
until you are level with 20 in the df column, and you will find the critical value of t which is 2.086. That
means our result is significant if the calculated t value is less than or equal to -2.086 or is greater than or
equal to 2.086.
Reject H0, p < .05, two-tailed
Note: Since our calculated value of t (2.209) is greater than or equal to 2.086, we reject the
null hypothesis and accept the alternative hypothesis.
There is a significant difference in job satisfaction between the two groups of workers.
t-Test for Dependent Samples
Within-group Variation. As explained, the size of a relation between two
variables, such as the one measured by a difference in means between two groups,
depends to a large extent on the differentiation of values within the group.
Depending on how differentiated the values are in each group, a given "raw difference" in group means will indicate either a stronger or a weaker relationship between the independent (grouping) variable and the dependent variable.
Purpose. The t-test for dependent samples helps us to take advantage of one specific type of design in which an important source of within-group variation (or so-called error) can be easily identified and excluded from the analysis. Specifically, if two groups of observations (that are to be compared) are based on the same sample of subjects who were tested twice (before and after treatment), then a considerable part of the within-group variation in both groups of scores can be attributed to the initial individual differences between subjects.
Note that, in a sense, this fact is not much different than in cases when the
two groups are entirely independent, where individual differences also contribute to
the error variance; but in the case of independent samples, we cannot do anything
about it because we cannot identify (or "subtract") the variation due to individual
differences in subjects. However, if the same sample was tested twice, then we can
easily identify (or "subtract") this variation. Specifically, instead of treating each
group separately, and analyzing raw scores, we can look only at the differences
between the two measures ("pre-test" and "posttest") in each subject. By subtracting
the first score from the second for each subject and then analyzing only those "pure
(paired) differences," we will exclude the entire part of the variation in our data set
that results from unequal base levels of individual subjects. This is precisely what is being done in the t-test for dependent samples, and, as compared to the t-test for independent samples, it usually produces "better" (more sensitive) results, as long as the paired scores are positively correlated.
For example, suppose we want to study the effect of a new teaching method on statistics scores. We administer a statistics test to the group (the pre-test), apply the experimental teaching method to the group of students, and then administer a statistics test to the students again (the post-test). We then see if there is a significant difference between the pre-test scores and the post-test scores.
Assumptions. The theoretical assumptions of the t-test for independent samples also apply to the dependent samples test; that is, the paired differences should be normally distributed. If these assumptions are not met, then one of the nonparametric alternative tests should be used.
Arrangement of Data. Technically, we can apply the t-test for dependent samples to any two variables in our data set. However, applying this test will make very little sense if the values of the two variables in the data set are not logically and methodologically comparable.
For example, if you compare the average grade in a sample of students before and after a grading period, but use a different counting method or different units in the second measurement, then a highly significant t-test value could be obtained due to an artifact, that is, to the change of units of measurement. The formula for the dependent t-test is:
$$t = \frac{\sum D}{\sqrt{\dfrac{n\sum D^2 - (\sum D)^2}{n - 1}}}$$
Where:
D is the difference between pairs of scores,
D = X2 - X1
The degrees of freedom for the dependent t-test is
df = n - 1
where n is the number of pairs of subjects in the study.
Research Problem: A pre-test was administered to ten BSBA students preparing for the management qualifying examination. After four weeks of review sessions, the same test was administered again (post-test). Do the management review sessions significantly increase the scores on the post-test?
The pre-test and post-test scores, as well as D and D², are shown in the following table.
Pre- and Post-Test Scores for 10 BSBA Students in the Review Sessions
Pre-Test (X1)    Post-Test (X2)    D (X2 - X1)    D²
14 0 -14 196
6 0 -6 36
4 3 -1 1
15 20 5 25
3 0 -3 9
3 0 -3 9
6 1 -5 25
5 1 -4 16
6 1 -5 25
3 0 -3 9
Total    ΣD = -39    ΣD² = 351
Solution:
$$t = \frac{-39}{\sqrt{\dfrac{10(351) - (-39)^2}{10 - 1}}} = \frac{-39}{\sqrt{\dfrac{3510 - 1521}{9}}} = \frac{-39}{\sqrt{221}} = \frac{-39}{14.866} = -2.623$$
The degrees of freedom is:
df = n - 1
df = 10 – 1 = 9
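A minimal Python sketch of the dependent t-test is shown below, using D = X2 - X1 as in the table. The formula is algebraically the same as the mean difference divided by its standard error (s_D/√n). The function name is hypothetical, and the sketch only computes t and df, leaving the table lookup to the reader.

```python
import math


def dependent_t(pre: list[float], post: list[float]) -> tuple[float, int]:
    """Dependent-samples t from paired scores, with D = post - pre (D = X2 - X1)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain the same number of pairs")
    n = len(pre)
    d = [x2 - x1 for x1, x2 in zip(pre, post)]
    sum_d = sum(d)
    sum_d_sq = sum(v * v for v in d)
    # t = sum(D) / sqrt((n * sum(D^2) - (sum(D))^2) / (n - 1))
    t = sum_d / math.sqrt((n * sum_d_sq - sum_d ** 2) / (n - 1))
    return t, n - 1


if __name__ == "__main__":
    pre = [14, 6, 4, 15, 3, 3, 6, 5, 6, 3]
    post = [0, 0, 3, 20, 0, 0, 1, 1, 1, 0]
    t, df = dependent_t(pre, post)
    print(f"t = {t:.3f}, df = {df}")
```

The sign of t follows the direction of D: a negative t means the post-test scores (X2) are lower on average than the pre-test scores (X1).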
H0: μ1 = μ2
H1: μ1 > μ2
α = .05
Note: As usual, we set our alpha level at .05; that is, we accept 5 chances in 100 of making a Type I error.

t = -2.623
df = n - 1 = 10 - 1 = 9
Note: We have calculated the t-value and will also need to know the degrees of freedom when
we go to look up the critical value of t.
Reject H0 if t is <= -1.833
Note: To write the decision rule we need to know the critical value for t,
with an alpha level of .05 and a one-tailed test. We can do this by looking at
Appendix (Distribution of t). Look for the column of the table under .05 for the Level
of significance for one-tailed tests, read down the column until you are level with 9
in the df column, and you will find the critical value of t which is 1.833. That means
our result is significant if the calculated t value is less than or equal to -1.833.
Note: Since our calculated value of t (-2.623) is less than or equal to -1.833, we reject the null
hypothesis and accept the alternative hypothesis.
Since D = X2 - X1 is negative, the post-test scores are significantly lower than the pre-test scores; the data do not show that the management review sessions increased the scores.
Z-Test
Most statistical tests have been designed to determine whether a treatment (the independent variable, IV) affects a dependent variable (DV). They can be used when the IV is a manipulated variable (true experiment) and when the IV represents pre-existing groups, such as gender. They all work essentially like the z-test.
Formula:
$$Z = \frac{M - \mu}{\sigma_M}$$
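Degrees of freedom: the z-test itself does not use degrees of freedom; df = n - 1 applies to the one-sample t-test, which estimates σM from the sample.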
Two roles for the t-test:
A substitute for z when some of the population information needed for z is unknown
A test for comparing two samples when no information about the population is available
Single Sample Tests
To test statistical hypotheses involving a single score, we calculated the score's Z-score. We referred to this as the Z-score test. As a reminder, the formula for the Z-score (or the Z-score test) was:
$$Z = \frac{X - \mu}{\sigma}$$
If we wish to compare a sample mean with the population mean when the
population standard deviation is not known we use the one-sample t-test.
Illustration:
Research Problem: We randomly select a group of 9 subjects from a population with a mean IQ of 100 and a standard deviation of 15 (μ = 100; σ = 15).
We give the subjects intensive "Get Smart" training and then administer an
IQ test. The sample mean IQ is 113 and the sample standard deviation is 10. Did the
training result in a significant increase in IQ score?
In this problem, we have a single sample and we wish to compare the sample mean with a population mean. Because we know the population standard deviation, the inferential statistic we need to use here is the Z-test.
The formula for the Z-test is:
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$$
Where:
X̄ = The sample mean
μX̄ = The mean of the sampling distribution of the mean.
The mean of the sampling distribution of the mean is the mean of many sample means taken from a population. It is the mean of all the means.
The mean of the sampling distribution of the mean is equal to the population mean, so we can use μ instead of μX̄. In our problem, the population mean is 100.
σX̄ is the standard error of the mean. It is the standard deviation of many sample means. Unfortunately for us, the standard error of the mean does not equal the population standard deviation; instead, it is equal to the population standard deviation (σ) divided by the square root of the sample size (n). So for our problem:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{9}} = \frac{15}{3} = 5$$
We are now ready to calculate the value of Z for our problem. We have all the information we need:
1. The sample mean = 113
2. The mean of the sampling distribution of the mean, which equals the population mean = 100
3. The standard error of the mean, which is the population standard deviation divided by the square root of the sample size = 5
So, the value of Z for our problem is:
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{113 - 100}{5} = \frac{13}{5} = 2.6$$
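As a quick check, the sketch below computes the one-sample Z from the same three ingredients. It assumes the population standard deviation is known, which is the condition for using Z instead of t. The function name is illustrative.

```python
import math


def z_test(sample_mean: float, pop_mean: float, pop_sd: float, n: int) -> float:
    """One-sample Z: (sample mean - population mean) / standard error of the mean."""
    standard_error = pop_sd / math.sqrt(n)   # sigma / sqrt(n)
    return (sample_mean - pop_mean) / standard_error


if __name__ == "__main__":
    z = z_test(sample_mean=113, pop_mean=100, pop_sd=15, n=9)
    print(f"z = {z:.2f}")   # standard error = 15 / 3 = 5
```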
Four Steps in Looking Up the Z-Critical Value
1. Determine Alpha. Take the alpha given in the problem. Then consider
whether the test is a one or two-tailed test. If it is a one-tailed test (< or > signs in the
hypotheses) then alpha remains unchanged. If it is a two-tailed test (no < or > signs
in the hypotheses) then you divide alpha by 2. (alpha/2).
2. Determine Probability. Use the following formula to calculate the probability figure you will look up in the body of the Z-table: .5 - alpha from step 1.
3. Locate # in Table. Refer to the body of the Z-table (inside cover of the
book) and find the number that is closest to the number you calculated in step 2.
4. Identify Z-Critical Value. Starting from the number you located in step 3, follow its row to the left (the first column) and record the number there (it falls between 0.0 and 6.0). Then, starting again at the number you located in step 3, follow its column up to the top (the first row) and record that number (it falls between 0.00 and 0.09). Add these two numbers together and you have your Z-critical value.
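The four lookup steps can be imitated in Python. The sketch below computes the table body (the area between the mean and z) with `math.erf` instead of reading a printed table, and only scans rows 0.0 to 3.9, an assumption about table size. For one-tailed alpha = .05, the target .4500 falls almost exactly halfway between the entries for 1.64 and 1.65, which is why many texts quote 1.645. The function names are hypothetical.

```python
import math


def area_mean_to_z(z: float) -> float:
    """Area under the standard normal curve between the mean (0) and z."""
    return 0.5 * math.erf(z / math.sqrt(2))


def z_critical(alpha: float, two_tailed: bool) -> float:
    """Find the z (to 2 decimals) whose table-body value is closest to .5 - alpha."""
    if two_tailed:
        alpha = alpha / 2                     # step 1: split alpha between two tails
    target = 0.5 - alpha                      # step 2: probability to look up
    # Steps 3-4: scan rows 0.0-3.9 and columns 0.00-0.09 of a Z-table body
    candidates = [k / 100 for k in range(400)]
    return min(candidates, key=lambda z: abs(area_mean_to_z(z) - target))


if __name__ == "__main__":
    print(z_critical(0.05, two_tailed=True))
    print(z_critical(0.01, two_tailed=True))
```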
F-Test (ANOVA)
The ANOVA is a statistical test that makes a single, overall decision as to
whether a significant difference is present among three or more sample means.
ANOVA is similar to a t-test. However, the ANOVA can also test multiple
groups to see if they differ on one or more variables. The ANOVA can be used to test
between-groups and within-groups differences.
One-Way ANOVA: This tests a group or groups to determine if there are differences in a single set of scores.
For instance, a one-way ANOVA could determine whether freshmen, sophomores, juniors, and seniors differed in their statistics performance.
Formula:
$$F = \frac{MS_B}{MS_W}$$
Where:
MSB = Mean Square Between
MSW = Mean Square Within
$$MS_B = \frac{SS_B \text{ (Sum of Squares Between)}}{df_B \text{ (Degrees of Freedom Between)}} \qquad MS_W = \frac{SS_W \text{ (Sum of Squares Within)}}{df_W \text{ (Degrees of Freedom Within)}}$$
dfB = K - 1
dfW = Ntotal - K
Where:
K = The Number of Groups
Ntotal = The Total Number of Scores in All Groups
Steps
1. Calculate SSB, the sum of squares between groups, where X1 is a score
from Group 1, X2 is a score from Group 2, X3 is a score from Group 3, n1 is the
number of subjects in group 1, n2 is the number of subjects in group 2, n3 is the
number of subjects in group 3, XT is a score from any subject in the total group of
subjects, and NT is the total number of subjects in all groups.
$$SS_B = \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} - \frac{(\sum X_T)^2}{N_T}$$
The degrees of freedom between groups is:
dfB = K - 1
Where: K is the number of groups.
2. Calculate SSW, the sum of squares within groups.
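$$SS_W = \left[\sum X_1^2 - \frac{(\sum X_1)^2}{n_1}\right] + \left[\sum X_2^2 - \frac{(\sum X_2)^2}{n_2}\right] + \left[\sum X_3^2 - \frac{(\sum X_3)^2}{n_3}\right]$$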
The degrees of freedom within groups is:
dfW = NT - K
Where: NT is the total number of scores in all groups
3. Calculate SST, the total sum of squares.
SST = SSB + SSW
4. Calculate MSB, the mean square between groups, MSW, the mean
square within groups, and F, the F ratio.
5. To test the significance of the F value we obtained, we need to compare it with the critical F value for our alpha level, the degrees of freedom between groups (the degrees of freedom in the numerator of the F ratio), and the degrees of freedom within groups (the degrees of freedom in the denominator of the F ratio).
6. Present the results of the analysis in an analysis of variance table.
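Steps 1 to 4 above can be sketched in Python. This is a minimal illustration we have added, not part of the original text: the function name one_way_anova and its dictionary output are our own choices, and it assumes each group is supplied as a list of raw scores. It applies the same raw-score formulas as steps 1 to 4, but for any number of groups rather than exactly three.

```python
def one_way_anova(groups):
    """One-way ANOVA by the raw-score formulas of steps 1-4.

    groups: a list of lists, one inner list of scores per group.
    Returns a dict with SSB, SSW, SST, dfB, dfW, MSB, MSW and F.
    """
    k = len(groups)                                  # K, the number of groups
    all_scores = [x for g in groups for x in g]
    n_total = len(all_scores)                        # N_T, total number of scores
    correction = sum(all_scores) ** 2 / n_total      # (sum X_T)^2 / N_T

    # Step 1: SSB = sum over groups of (sum X_j)^2 / n_j, minus the correction term
    ssb = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    # Step 2: SSW = sum over groups of [sum X_j^2 - (sum X_j)^2 / n_j]
    ssw = sum(sum(x * x for x in g) - sum(g) ** 2 / len(g) for g in groups)
    # Step 3: SST = SSB + SSW
    sst = ssb + ssw

    df_b, df_w = k - 1, n_total - k
    # Step 4: mean squares and the F ratio
    msb, msw = ssb / df_b, ssw / df_w
    return {"SSB": ssb, "SSW": ssw, "SST": sst, "dfB": df_b, "dfW": df_w,
            "MSB": msb, "MSW": msw, "F": msb / msw}


# Usage with the TAI scores from the illustration that follows
tai = [[48, 50, 53, 52, 50], [55, 52, 53, 55, 53], [51, 52, 50, 53, 50]]
result = one_way_anova(tai)
print(round(result["SSB"], 1), round(result["SSW"], 1), round(result["F"], 3))  # 25.2 29.2 5.178
```

Floating-point arithmetic is used here for simplicity, so intermediate values may differ from the hand calculation in the last decimal places.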
Summary Table of the Formulas
Source     SS               df      MS               F
Between    SS(B)            k-1     SS(B)/(k-1)      MS(B)/MS(W)
Within     SS(W)            N-k     SS(W)/(N-k)
Total      SS(W) + SS(B)    N-1
Steps for Looking Up an F-Critical Value
1. Determine alpha. Take alpha from the problem; it determines which F table you will use, because each F table represents a single alpha value. The ANOVA F test is always upper-tailed, so use alpha itself. (Use alpha/2 only for a two-tailed F test comparing two sample variances.)
2. Determine the degrees of freedom (df). There are two df in an F test. (1) Numerator df: in ANOVA, this is dfB = K - 1. (When comparing two sample variances, take sample 1, whose variance is the numerator of the F ratio, and use n1 - 1.) (2) Denominator df: in ANOVA, this is dfW = NT - K. (When comparing two sample variances, take sample 2, the denominator of the F ratio, and use n2 - 1.)
3. Locate both df in the table. The numerator df are labeled across the top row, and the denominator df are labeled down the first column. Find the values from step 2.
4. Find the F-critical value. Follow the column for the numerator df and the row for the denominator df into the body of the table; where they intersect is your F-critical value. If the exact df is not listed in the table, always use the next smaller df listed in the table.
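The lookup rule in step 4 can be sketched as follows. This is an illustration under stated assumptions: F_05 is a hypothetical excerpt holding only numerator df 1 to 3 and denominator df 10 to 30, 40, 60, and 120, with values copied from the alpha = .05 table in Appendix D, and f_critical is a made-up helper name. Appendix D lists 3.8853 for 2 and 12 df, which the worked example below quotes as 3.88.

```python
# Hypothetical excerpt of the alpha = .05 F table (Appendix D):
# key = denominator df, value = critical F for numerator df 1, 2, 3.
F_05 = {
    10: (4.9646, 4.1028, 3.7083), 11: (4.8443, 3.9823, 3.5874),
    12: (4.7472, 3.8853, 3.4903), 13: (4.6672, 3.8056, 3.4105),
    14: (4.6001, 3.7389, 3.3439), 15: (4.5431, 3.6823, 3.2874),
    16: (4.4940, 3.6337, 3.2389), 17: (4.4513, 3.5915, 3.1968),
    18: (4.4139, 3.5546, 3.1599), 19: (4.3807, 3.5219, 3.1274),
    20: (4.3512, 3.4928, 3.0984), 21: (4.3248, 3.4668, 3.0725),
    22: (4.3009, 3.4434, 3.0491), 23: (4.2793, 3.4221, 3.0280),
    24: (4.2597, 3.4028, 3.0088), 25: (4.2417, 3.3852, 2.9912),
    26: (4.2252, 3.3690, 2.9752), 27: (4.2100, 3.3541, 2.9604),
    28: (4.1960, 3.3404, 2.9467), 29: (4.1830, 3.3277, 2.9340),
    30: (4.1709, 3.3158, 2.9223), 40: (4.0847, 3.2317, 2.8387),
    60: (4.0012, 3.1504, 2.7581), 120: (3.9201, 3.0718, 2.6802),
}


def f_critical(df_num, df_den, table=F_05):
    """Return the critical F where column df_num meets row df_den.

    If df_den is not listed, the next smaller listed df is used (step 4).
    """
    if not 1 <= df_num <= 3:
        raise ValueError("this excerpt only covers numerator df 1 to 3")
    listed = [d for d in table if d <= df_den]
    if not listed:
        raise ValueError("denominator df is smaller than any row in this excerpt")
    return table[max(listed)][df_num - 1]


print(f_critical(2, 12))   # 3.8853 (row 12 is listed)
print(f_critical(2, 50))   # 3.2317 (50 is not listed, so row 40 is used)
```

Falling back to the smaller df gives a slightly larger critical value, which makes the test a little more conservative.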
Illustration: Three groups of students, 5 in each group, were receiving therapy for severe test anxiety. Group 1 received 5 hours of therapy, Group 2 received 10 hours, and Group 3 received 15 hours. At the end of therapy, each subject completed an evaluation of test anxiety (the dependent variable in the study). Did the amount of therapy affect the level of test anxiety?
The three groups of students received the following scores on the Test Anxiety Index (TAI) at the end of treatment.
TAI Scores for Three Groups of Students
Group 1 (5 hours)    Group 2 (10 hours)    Group 3 (15 hours)
48                   55                    51
50                   52                    52
53                   53                    50
52                   55                    53
50                   53                    50
The following table contains the quantities we need to calculate the group means, the sums of squares, and the degrees of freedom:
Worksheet for Test Anxiety Study
       Group 1 (5 hours)     Group 2 (10 hours)    Group 3 (15 hours)
       X1      X1²           X2      X2²           X3      X3²
       48      2304          55      3025          51      2601
       50      2500          52      2704          52      2704
       53      2809          53      2809          50      2500
       52      2704          55      3025          53      2809
       50      2500          53      2809          50      2500
Sum    253     12817         268     14372         256     13114
ΣXT = 777    ΣXT² = 40303
Mean of Each Group:
The mean for group 1 is 253/5 = 50.6
The mean for group 2 is 268/5 = 53.6
The mean for group 3 is 256/5 = 51.2
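The worksheet totals and group means can be checked with a few lines of Python. This is a minimal sketch we have added; the helper name worksheet_row is hypothetical, and the scores are the TAI scores from the table above.

```python
def worksheet_row(scores):
    """Return (sum of X, sum of X squared, mean) for one group of scores."""
    total = sum(scores)
    total_sq = sum(x * x for x in scores)
    return total, total_sq, total / len(scores)


# TAI scores from the table above
groups = {
    "Group 1 (5 hours)": [48, 50, 53, 52, 50],
    "Group 2 (10 hours)": [55, 52, 53, 55, 53],
    "Group 3 (15 hours)": [51, 52, 50, 53, 50],
}

for name, scores in groups.items():
    total, total_sq, mean = worksheet_row(scores)
    print(f"{name}: sum X = {total}, sum X^2 = {total_sq}, mean = {mean:.1f}")

# Pool every score to get the totals for the whole study
all_scores = [x for scores in groups.values() for x in scores]
grand_sum, grand_sum_sq, _ = worksheet_row(all_scores)
print(f"sum X_T = {grand_sum}, sum X_T^2 = {grand_sum_sq}")   # 777 and 40303
```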
Are the differences among these three means significant? We can use analysis of variance to answer that question. Since we have only one independent variable, the amount of therapy, we will use a one-way analysis of variance.
If we were concerned with the effect of two independent variables on the dependent variable, we would use a two-way analysis of variance, which is not presented in this book.
Steps
1. Calculate SSB, the sum of squares between groups, where X1 is a score
from Group 1, X2 is a score from Group 2, X3 is a score from Group 3, n1 is the
number of subjects in group 1, n2 is the number of subjects in group 2, n3 is the
number of subjects in group 3, XT is a score from any subject in the total group of
subjects, and NT is the total number of subjects in all groups. Here, ΣX1 = 253, ΣX2 = 268, ΣX3 = 256, and ΣXT = 777.
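$$SS_B = \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} - \frac{(\sum X_T)^2}{N_T}$$
$$SS_B = \frac{(253)^2}{5} + \frac{(268)^2}{5} + \frac{(256)^2}{5} - \frac{(777)^2}{15} = \frac{64009}{5} + \frac{71824}{5} + \frac{65536}{5} - \frac{603729}{15}$$
$$= 12801.8 + 14364.8 + 13107.2 - 40248.6 = 25.2$$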
The degrees of freedom between groups is:
dfB = K - 1 = 3 - 1 = 2
Where: K is the number of groups.
2. Calculate SSW, the sum of squares within groups.
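$$SS_W = \left[\sum X_1^2 - \frac{(\sum X_1)^2}{n_1}\right] + \left[\sum X_2^2 - \frac{(\sum X_2)^2}{n_2}\right] + \left[\sum X_3^2 - \frac{(\sum X_3)^2}{n_3}\right]$$
$$= \left[12817 - \frac{(253)^2}{5}\right] + \left[14372 - \frac{(268)^2}{5}\right] + \left[13114 - \frac{(256)^2}{5}\right]$$
$$= (12817 - 12801.8) + (14372 - 14364.8) + (13114 - 13107.2)$$
$$= 15.2 + 7.2 + 6.8 = 29.2$$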
The degrees of freedom within groups is:
dfW = NT - K = 15 - 3 = 12
Where: NT is the total number of subjects.
3. Calculate SST, the total sum of squares.
$$SS_T = \sum X_T^2 - \frac{(\sum X_T)^2}{N_T}$$
$$SS_T = 40303 - \frac{(777)^2}{15} = 40303 - \frac{603729}{15} = 40303 - 40248.6 = 54.4$$
$$SS_T = SS_B + SS_W: \quad 54.4 = 25.2 + 29.2$$
4. Calculate MSB, the mean square between groups, MSW, the mean
square within groups, and F, the F ratio.
$$MS_B = \frac{SS_B}{df_B} = \frac{25.2}{2} = 12.60$$
$$MS_W = \frac{SS_W}{df_W} = \frac{29.2}{12} = 2.43$$
$$F = \frac{MS_B}{MS_W} = \frac{12.60}{2.4333} = 5.178$$
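To check the hand arithmetic in steps 1 to 4, the same quantities can be recomputed exactly with Python's standard fractions module, starting from the worksheet totals. This is only a checking aid we have added; exact fractions postpone all rounding to the final print, which is why the printed results agree with 25.2, 29.2, 54.4, and 5.178 above.

```python
from fractions import Fraction

# Totals from the worksheet for the test anxiety study
sums = [253, 268, 256]            # sum of X for groups 1, 2, 3
sums_sq = [12817, 14372, 13114]   # sum of X^2 for groups 1, 2, 3
n = [5, 5, 5]                     # group sizes
N_T, K = 15, 3                    # total number of scores, number of groups
sum_T, sum_T_sq = 777, 40303      # sum of X_T and sum of X_T^2

correction = Fraction(sum_T ** 2, N_T)                                   # 603729/15
ssb = sum(Fraction(s ** 2, m) for s, m in zip(sums, n)) - correction      # step 1
ssw = sum(q - Fraction(s ** 2, m) for q, s, m in zip(sums_sq, sums, n))   # step 2
sst = sum_T_sq - correction                                              # step 3, computed directly

msb = ssb / (K - 1)               # step 4: mean squares and F
msw = ssw / (N_T - K)
F = msb / msw

print(float(ssb), float(ssw), float(sst), sst == ssb + ssw)   # 25.2 29.2 54.4 True
print(float(msb), round(float(msw), 4), round(float(F), 3))  # 12.6 2.4333 5.178
```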
5. To test the significance of the F value we obtained, we need to compare
it with the critical F value with an alpha level of .05, 2 degrees of freedom
between groups (or degrees of freedom in the numerator of the F ratio), and 12
degrees of freedom within groups (or degrees of freedom in the denominator of the F
ratio).
We can look up the critical value of F in the Appendix table "The 5 Percent (Lightface Type) and 1 Percent (Boldface Type) Points for the Distribution of F."
Look in column 2 (2 degrees of freedom for the numerator) and row 12 (12 degrees of freedom for the denominator) and read the lightface entry for the .05 level, 3.88; this is the critical value of F.
One way of indicating this critical value of F at the .05 level, with 2
degrees of freedom between groups and 12 degrees of freedom within groups is
F.05(2,12) = 3.88
6. When using analysis of variance, it is common practice to present the results of the analysis in an analysis of variance table.
The Analysis of Variance Table for our problem would appear as follows:
Analysis of Variance Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio   p
Between Groups        25.20            2                    12.60         5.178     < .05
Within Groups         29.20            12                   2.43
Total                 54.40            14
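A small formatter can lay out the same analysis of variance table from the sums of squares and degrees of freedom. This is a hypothetical helper we have added (the name anova_table and the column widths are our own choices); it omits the p column, which would require an F-distribution function or a table lookup.

```python
def anova_table(ssb, ssw, df_b, df_w):
    """Return the rows of a one-way ANOVA summary table as formatted strings."""
    msb, msw = ssb / df_b, ssw / df_w     # mean squares
    f_ratio = msb / msw
    header = f"{'Source':<16}{'SS':>8}{'df':>5}{'MS':>8}{'F':>8}"
    rows = [
        f"{'Between Groups':<16}{ssb:>8.2f}{df_b:>5}{msb:>8.2f}{f_ratio:>8.3f}",
        f"{'Within Groups':<16}{ssw:>8.2f}{df_w:>5}{msw:>8.2f}",
        # Total SS = SSB + SSW and total df = (k - 1) + (N - k) = N - 1
        f"{'Total':<16}{ssb + ssw:>8.2f}{df_b + df_w:>5}",
    ]
    return [header] + rows


for line in anova_table(25.2, 29.2, 2, 12):
    print(line)
```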
Application of Statistical Test
H0: μ1 = μ2 = μ3
H1: not H0
Note: Our null hypothesis for the F test states that there are no differences among the three population means. The alternative hypothesis states that at least one of the population means differs from the others. An unequivocal way of stating this is "not H0."
α = .05
Note: As usual, we set our alpha level at .05; that is, we accept 5 chances in 100 of making a Type I error (rejecting the null hypothesis when it is true).

F(2,12) = 5.178
Note: We have taken the value of F from our analysis of variance table. We have also indicated by (2,12) that there are 2 degrees of freedom between groups and 12 degrees of freedom within groups.
Reject H0 if F >= 3.88
Note: To write the decision rule, we had to know the critical value of F for an alpha level of .05, 2 degrees of freedom in the numerator (df between groups), and 12 degrees of freedom in the denominator (df within groups). We find it in the Appendix F table by reading the tabled value for the .05 level in the column for 2 df and the row for 12 df.
Reject H0, p < .05
Note: Since our calculated value of F (5.178) is greater than 3.88, we reject the null hypothesis and accept the alternative hypothesis.
There is a significant difference among the scores the three groups of students received on the Test Anxiety Index.
In the problem above, we rejected the null hypothesis and found that there is indeed a significant difference among the three cell means. Group 1 had the lowest mean (50.6), Group 3 had a higher mean (51.2), and Group 2 had the highest mean of all (53.6).
We would like to know which of these differences between means are significant. We can analyze the significance of the difference between each pair of means by the use of post hoc (after-the-fact) comparisons.
We do these post hoc comparisons only when the F ratio is significant. It makes no sense to look for differences with a post hoc test if the overall F test indicates that no differences exist.
Next, we will make these comparisons using a method known as the Scheffe test.
Making Post Hoc Comparisons Among Pairs of Means with the Scheffe Test
In an analysis of variance, if F is significant, we can use the Scheffe test to see which specific cell mean differs from which other specific cell mean. To do this, we calculate an F ratio for the difference between the means of two cells and then test the significance of this F value.
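Consider the Following Steps:
1. Calculate F1 and 2 to see if there is a significant difference between the means of groups 1 and 2.
2. Calculate F1 and 3 to see if there is a significant difference between the means of groups 1 and 3.
3. Calculate F2 and 3 to see if there is a significant difference between the means of groups 2 and 3.
The formulas for these tests and their application to the ANOVA problem are given below, where X̄1, X̄2, and X̄3 are the group means, n1, n2, and n3 are the group sizes, MSW is the mean square within groups from the ANOVA, and k is the number of groups:
$$F_{1 \text{ and } 2} = \frac{(\bar{X}_1 - \bar{X}_2)^2}{MS_W\left(\frac{1}{n_1} + \frac{1}{n_2}\right)(k-1)}$$
$$F_{1 \text{ and } 2} = \frac{(50.6 - 53.6)^2}{2.43\left(\frac{1}{5} + \frac{1}{5}\right)(3-1)} = \frac{(-3)^2}{2.43(.4)(2)} = \frac{9}{1.944} = 4.630$$
$$F_{1 \text{ and } 3} = \frac{(\bar{X}_1 - \bar{X}_3)^2}{MS_W\left(\frac{1}{n_1} + \frac{1}{n_3}\right)(k-1)}$$
$$F_{1 \text{ and } 3} = \frac{(50.6 - 51.2)^2}{2.43\left(\frac{1}{5} + \frac{1}{5}\right)(3-1)} = \frac{(-0.6)^2}{2.43(.4)(2)} = \frac{0.36}{1.944} = 0.185$$
$$F_{2 \text{ and } 3} = \frac{(\bar{X}_2 - \bar{X}_3)^2}{MS_W\left(\frac{1}{n_2} + \frac{1}{n_3}\right)(k-1)}$$
$$F_{2 \text{ and } 3} = \frac{(53.6 - 51.2)^2}{2.43\left(\frac{1}{5} + \frac{1}{5}\right)(3-1)} = \frac{(2.4)^2}{2.43(.4)(2)} = \frac{5.76}{1.944} = 2.963$$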
Summary of Scheffe Test Results
Group One versus Group Two = 4.630
Group One versus Group Three = 0.185
Group Two versus Group Three = 2.963
We compare these values with the critical value F.05(2,12) = 3.88 and note that the only significant difference is between Group One and Group Two (4.630 is greater than 3.88).
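The three Scheffe comparisons can be reproduced with a short function. This is a minimal sketch we have added: scheffe_f is a hypothetical name, and, like the hand calculation, it uses MSW rounded to 2.43 and the critical value F.05(2,12) = 3.88 quoted in the text.

```python
def scheffe_f(mean_i, mean_j, n_i, n_j, ms_within, k):
    """Scheffe F ratio for the difference between two group means."""
    return (mean_i - mean_j) ** 2 / (ms_within * (1 / n_i + 1 / n_j) * (k - 1))


means = {1: 50.6, 2: 53.6, 3: 51.2}   # group means from the worked example
n = 5                                 # subjects per group
ms_within = 2.43                      # MS_W, rounded as in the hand calculation
k = 3                                 # number of groups
f_crit = 3.88                         # F.05(2, 12) as quoted in the text

for i, j in [(1, 2), (1, 3), (2, 3)]:
    f = scheffe_f(means[i], means[j], n, n, ms_within, k)
    verdict = "significant" if f >= f_crit else "not significant"
    print(f"Groups {i} and {j}: F = {f:.3f} ({verdict})")
```

Each pairwise F is compared with the same critical value used for the overall test, F.05(k - 1, N - k), because the (k - 1) factor is already built into the denominator of the Scheffe ratio.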
Now that we know how to conduct a post hoc analysis of the significance of the differences between pairs of group means, we should modify steps 3 and 6 of our hypothesis testing strategy to include the results of the post hoc analysis.
The amended steps 3 and 6 are as follows:
Step 3: Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the statistical test if necessary.
F(2,12) = 5.178, value of the F ratio
F.05(2,12) = 3.88, critical value of F
F12 = 4.630, Scheffe test value for comparing means 1 and 2
Step 6: Write a statement of results.
There is a significant difference among the scores the three groups of students received on the Test Anxiety Index.
Group 1 (the five-hour therapy group) has a significantly lower score on the
TAI than does Group 2 (the ten-hour therapy group).
Worksheet # 29
Given the data presented in the table, determine whether there is a significant difference between the assessments of the student-respondents and the personnel-respondents of the Registrar’s services in school XXX.
Personnel (X1)    Students (X2)
5.0               3.36
5.0               3.20
5.0               3.52
5.0               3.34
4.33              3.84
4.88              3.22
4.88              3.43
4.55              3.29
4.00              3.52
5.00              3.30
1. Computations: Compute all the information you need to complete the six-step statistical inference process.
2. Application: Carry out the six-step statistical inference process.
2.1) State the null hypothesis and the alternative hypothesis based on your
research question.
2.2) Set the alpha level.
2.3) Calculate the value of the appropriate statistic. Also indicate the degrees of
freedom for the statistical test if necessary.
2.4) Write the decision rule for rejecting the null hypothesis.
2.5) Write a summary statement based on the decision.
2.6) Write a statement of results.
Worksheet # 30
Two fourth-grade teachers in School want to know if the gender of the main
character in a story makes a difference in their male students' interest in the story.
Each class had 10 boys. One group of boys read a story that had a female
as the main character, while the other group of boys read a story that had a male as
the main character. After reading the stories each boy completed an interest
inventory to indicate how interested he was in the story. Was there a significant
difference in the interest inventory scores for the two groups?
Interest Inventory Scores for Two Groups
Female as the Main Character    Male as the Main Character
17 18
10 20
16 17
15 16
15 20
16 19
14 18
Complete the computations and the six-step statistical inference process as listed in Worksheet # 29.
Worksheet # 31
A special program, developed to enhance children's self-concept, is implemented with seven- and nine-year-old children in Sunny Days school. The intervention lasts ten weeks and involves various activities in class and at home. All the children in the program are pretested and post-tested using the Purdue Self-Concept Scale for Preschool Children (PSCS). The test consists of a series of 40 pairs of pictures, and scores can range from 0 to 40. The dependent t-test will be used to test the hypothesis that the children will score significantly higher on the posttest than on the pretest.
Pretest and Posttest Scores of Ten Children

Pretest Posttest
31 34
30 31
33 33
35 40
32 36
34 39
32 40
29 42
37 44
36 39
Complete the computations and the six-step statistical inference process as listed in Worksheet # 29.
Worksheet # 32
College students were given a Science test. Based on the scores on this test,
two groups of matched subjects were formed. Students in group 1 were given the
lecture method of instruction, while students in group 2 were given a new,
laboratory method of instruction. At the end of the semester, the students were given
a Science achievement test. Analyze the results below to determine whether the new laboratory method of instruction leads to significantly higher Science achievement scores for group 2.
Science Achievement Scores for Two Groups of Students

Group 1 Group 2
Lecture Method Laboratory Method
79 85
83 92
93 89
86 88
56 64
65 71
71 70
66 69
50 63
Complete the computations and the six-step statistical inference process as listed in Worksheet # 29.
Worksheet # 34
An industrial psychologist is interested in evaluating the effect of four different types of training on worker productivity. Using a standard measure of productivity, the psychologist measures the productivity of a set of workers who have been trained using each of the four procedures. Using the data below, determine whether there is a significant difference among the training methods. Larger numbers on the dependent variable indicate higher productivity.
Productivity Scores for Four Groups of Workers Trained by Different Methods
Group 1       Group 2              Group 3    Group 4
On the Job    Computer-Assisted    Lecture    Videotape
67 68 46 37
68 62 39 46
61 59 38 49
62 71 47 48
60 60 46 49
56 66 49 53
Complete the computations and the six-step statistical inference process as listed in Worksheet # 29.
Worksheet # 35
A school guidance counselor investigates the influence of different motivational devices on the academic achievement of students. The counselor arranges for one group of students to receive immediate feedback upon the completion of an English assignment. The second group of students receives feedback at the end of the day, while a third group receives feedback at the end of the week. Using the students' grades on a standardized English test, determine whether there is a significant difference among the groups. If necessary, perform Scheffe tests.
English Test Results for Groups of Students Receiving Various Types of Feedback
Group 1               Group 2               Group 3
Immediate Feedback    Day's End Feedback    Week's End Feedback
49 40 36
40 37 32
41 42 31
46 39 39
42 45 40
50 39 39
Complete the computations and the six-step statistical inference process as listed in Worksheet # 29.
Worksheet # 36
The Omnibus Personality Inventory was used to measure the intellectual
disposition of three groups of students. Group 1 was composed of student
government leaders, group 2 was composed of athletes, and group 3 was composed
of part-time students. Analyze the data below to determine whether there is a significant difference among the three groups.
Omnibus Personality Inventory Scores for Three Groups of Students
Group 1 Group 2 Group 3
3 5 7
8 2 3
3 7 8
1 3 6
6 8 9
2 10
9
3
Complete the computations and the six-step statistical inference process as listed in Worksheet # 29.
Worksheet # 37
Given the set of data presented in the table below, determine whether there is a significant difference among the students' assessments of the Registrar’s services when grouped according to course.
ITC       CS        AS/EDUC.   ENG       BSBA
(N=12)    (N=25)    (N=49)     (N=22)    (N=40)    TOTAL
3.58 3.00 3.73 3.31 3.17 16.79
3.91 2.84 3.13 3.13 3.00 16.01
4.25 3.00 3.72 3.72 2.90 17.59
4.16 3.26 3.22 3.22 2.85 16.71
4.08 3.82 3.90 3.90 3.42 19.12
2.83 3.12 3.35 3.40 3.40 16.10
3.16 3.12 3.70 3.70 3.45 17.13
4.08 2.80 2.96 2.96 3.63 16.43
3.25 3.12 3.67 3.67 3.90 17.61
3.33 2.90 3.39 3.39 3.50 16.51
36.63 30.99 34.77 34.40 33.22 170
Complete the computations and the six-step statistical inference process as listed in Worksheet # 29.
Appendix A
Critical T-Values Table
If your degrees of freedom are not listed in this table, use the value for the next lower degrees of freedom.
Conf. Level 50% 80% 90% 95% 98% 99%
One Tail 0.250 0.100 0.050 0.025 0.010 0.005
Two Tail 0.500 0.200 0.100 0.050 0.020 0.010
df
1 1.000 3.078 6.314 12.706 31.821 63.657
2 0.816 1.886 2.920 4.303 6.965 9.925
3 0.765 1.638 2.353 3.182 4.541 5.841
4 0.741 1.533 2.132 2.776 3.747 4.604
5 0.727 1.476 2.015 2.571 3.365 4.032
6 0.718 1.440 1.943 2.447 3.143 3.707
7 0.711 1.415 1.895 2.365 2.998 3.499
8 0.706 1.397 1.860 2.306 2.896 3.355
9 0.703 1.383 1.833 2.262 2.821 3.250
10 0.700 1.372 1.812 2.228 2.764 3.169
11 0.697 1.363 1.796 2.201 2.718 3.106
12 0.695 1.356 1.782 2.179 2.681 3.055
13 0.694 1.350 1.771 2.160 2.650 3.012
14 0.692 1.345 1.761 2.145 2.624 2.977
15 0.691 1.341 1.753 2.131 2.602 2.947
16 0.690 1.337 1.746 2.120 2.583 2.921
17 0.689 1.333 1.740 2.110 2.567 2.898
18 0.688 1.330 1.734 2.101 2.552 2.878
19 0.688 1.328 1.729 2.093 2.539 2.861
20 0.687 1.325 1.725 2.086 2.528 2.845
21 0.686 1.323 1.721 2.080 2.518 2.831
22 0.686 1.321 1.717 2.074 2.508 2.819
23 0.685 1.319 1.714 2.069 2.500 2.807
24 0.685 1.318 1.711 2.064 2.492 2.797
25 0.684 1.316 1.708 2.060 2.485 2.787
26 0.684 1.315 1.706 2.056 2.479 2.779
27 0.684 1.314 1.703 2.052 2.473 2.771
28 0.683 1.313 1.701 2.048 2.467 2.763
29 0.683 1.311 1.699 2.045 2.462 2.756
30 0.683 1.310 1.697 2.042 2.457 2.750
40 0.681 1.303 1.684 2.021 2.423 2.704
50 0.679 1.299 1.676 2.009 2.403 2.678
60 0.679 1.296 1.671 2.000 2.390 2.660
70 0.678 1.294 1.667 1.994 2.381 2.648
80 0.678 1.292 1.664 1.990 2.374 2.639
90 0.677 1.291 1.662 1.987 2.368 2.632
100 0.677 1.290 1.660 1.984 2.364 2.626
z 0.674 1.282 1.645 1.960 2.326 2.576
Appendix B
Critical Z-Values Table (Areas Under the Standard Normal Curve Between the Mean and z)
z     0.00 0.01 0.02 0.03 0.04 0.05
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989
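Each entry in this table is the area under the standard normal curve between the mean (z = 0) and the given z, with the row giving z to one decimal place and the column giving the second decimal. As a cross-check, these areas can be computed from the error function in Python's standard library; the helper name area_0_to_z is ours, and it relies on the standard identity area = erf(z / √2) / 2.

```python
import math


def area_0_to_z(z):
    """Area under the standard normal curve between the mean (0) and z.

    The curve is symmetric, so the area for -z equals the area for z;
    abs(z) is used for that reason.
    """
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


for z in (1.00, 1.64, 2.00, 3.00):
    print(f"z = {z:.2f}: area = {area_0_to_z(z):.4f}")
```

Rounded to four decimals, these match the table entries for 1.00 (0.3413), 1.64 (0.4495), 2.00 (0.4772), and 3.00 (0.4987).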
Appendix C
Critical Chi-Square-Values Table
df\area .050 .025 .010
1 3.84146 5.02389 6.63490
2 5.99146 7.37776 9.21034
3 7.81473 9.34840 11.34487
4 9.48773 11.14329 13.27670
5 11.07050 12.83250 15.08627
6 12.59159 14.44938 16.81189
7 14.06714 16.01276 18.47531
8 15.50731 17.53455 20.09024
9 16.91898 19.02277 21.66599
10 18.30704 20.48318 23.20925
11 19.67514 21.92005 24.72497
12 21.02607 23.33666 26.21697
13 22.36203 24.73560 27.68825
14 23.68479 26.11895 29.14124
15 24.99579 27.48839 30.57791
16 26.29623 28.84535 31.99993
17 27.58711 30.19101 33.40866
18 28.86930 31.52638 34.80531
19 30.14353 32.85233 36.19087
20 31.41043 34.16961 37.56623
21 32.67057 35.47888 38.93217
22 33.92444 36.78071 40.28936
23 35.17246 38.07563 41.63840
24 36.41503 39.36408 42.97982
25 37.65248 40.64647 44.31410
26 38.88514 41.92317 45.64168
27 40.11327 43.19451 46.96294
28 41.33714 44.46079 48.27824
29 42.55697 45.72229 49.58788
30 43.77297 46.97924 50.89218
Appendix C
Critical F-Values Table (F Table for alpha=.01)
df2/df1 1 2 3 4 5 6 7 8 9
1 4052.18 4999.50 5403.352 5624.58 5763.65 5858.98 5928.35 5981.07 6022.47
2 98.503 99.000 99.166 99.249 99.299 99.333 99.356 99.374 99.388
3 34.116 30.817 29.457 28.710 28.237 27.911 27.672 27.489 27.345
4 21.198 18.000 16.694 15.977 15.522 15.207 14.976 14.799 14.659
5 16.258 13.274 12.060 11.392 10.967 10.672 10.456 10.289 10.158
6 13.745 10.925 9.780 9.148 8.746 8.466 8.260 8.102 7.976
7 12.246 9.547 8.451 7.847 7.460 7.191 6.993 6.840 6.719
8 11.259 8.649 7.591 7.006 6.632 6.371 6.178 6.029 5.911
9 10.561 8.022 6.992 6.422 6.057 5.802 5.613 5.467 5.351
10 10.044 7.559 6.552 5.994 5.636 5.386 5.200 5.057 4.942
11 9.646 7.206 6.217 5.668 5.316 5.069 4.886 4.744 4.632
12 9.330 6.927 5.953 5.412 5.064 4.821 4.640 4.499 4.388
13 9.074 6.701 5.739 5.205 4.862 4.620 4.441 4.302 4.191
14 8.862 6.515 5.564 5.035 4.695 4.456 4.278 4.140 4.030
15 8.683 6.359 5.417 4.893 4.556 4.318 4.142 4.004 3.895
16 8.531 6.226 5.292 4.773 4.437 4.202 4.026 3.890 3.780
17 8.400 6.112 5.185 4.669 4.336 4.102 3.927 3.791 3.682
18 8.285 6.013 5.092 4.579 4.248 4.015 3.841 3.705 3.597
19 8.185 5.926 5.010 4.500 4.171 3.939 3.765 3.631 3.523
20 8.096 5.849 4.938 4.431 4.103 3.871 3.699 3.564 3.457
21 8.017 5.780 4.874 4.369 4.042 3.812 3.640 3.506 3.398
22 7.945 5.719 4.817 4.313 3.988 3.758 3.587 3.453 3.346
23 7.881 5.664 4.765 4.264 3.939 3.710 3.539 3.406 3.299
24 7.823 5.614 4.718 4.218 3.895 3.667 3.496 3.363 3.256
25 7.770 5.568 4.675 4.177 3.855 3.627 3.457 3.324 3.217
26 7.721 5.526 4.637 4.140 3.818 3.591 3.421 3.288 3.182
27 7.677 5.488 4.601 4.106 3.785 3.558 3.388 3.256 3.149
28 7.636 5.453 4.568 4.074 3.754 3.528 3.358 3.226 3.120
29 7.598 5.420 4.538 4.045 3.725 3.499 3.330 3.198 3.092
30 7.562 5.390 4.510 4.018 3.699 3.473 3.304 3.173 3.067
40 7.314 5.179 4.313 3.828 3.514 3.291 3.124 2.993 2.888
60 7.077 4.977 4.126 3.649 3.339 3.119 2.953 2.823 2.718
120 6.851 4.787 3.949 3.480 3.174 2.956 2.792 2.663 2.559
inf 6.635 4.605 3.782 3.319 3.017 2.802 2.639 2.511 2.407

Cont….
df2/df1 10 12 15 20 24 30 40 60 120 INF
1 6055.84 6106.32 6157.28 6208.73 6234.63 6260.64 6286.78 6313.03 6339.39 6365.86
2 99.399 99.416 99.433 99.449 99.458 99.466 99.474 99.482 99.491 99.499
3 27.229 27.052 26.872 26.690 26.598 26.505 26.411 26.316 26.221 26.125
4 14.546 14.374 14.198 14.020 13.929 13.838 13.745 13.652 13.558 13.463
5 10.051 9.888 9.722 9.553 9.466 9.379 9.291 9.202 9.112 9.020
6 7.874 7.718 7.559 7.396 7.313 7.229 7.143 7.057 6.969 6.880
7 6.620 6.469 6.314 6.155 6.074 5.992 5.908 5.824 5.737 5.650
8 5.814 5.667 5.515 5.359 5.279 5.198 5.116 5.032 4.946 4.859
9 5.257 5.111 4.962 4.808 4.729 4.649 4.567 4.483 4.398 4.311
10 4.849 4.706 4.558 4.405 4.327 4.247 4.165 4.082 3.996 3.909
11 4.539 4.397 4.251 4.099 4.021 3.941 3.860 3.776 3.690 3.602
12 4.296 4.155 4.010 3.858 3.780 3.701 3.619 3.535 3.449 3.361
13 4.100 3.960 3.815 3.665 3.587 3.507 3.425 3.341 3.255 3.165
14 3.939 3.800 3.656 3.505 3.427 3.348 3.266 3.181 3.094 3.004
15 3.805 3.666 3.522 3.372 3.294 3.214 3.132 3.047 2.959 2.868
16 3.691 3.553 3.409 3.259 3.181 3.101 3.018 2.933 2.845 2.753
17 3.593 3.455 3.312 3.162 3.084 3.003 2.920 2.835 2.746 2.653
18 3.508 3.371 3.227 3.077 2.999 2.919 2.835 2.749 2.660 2.566
19 3.434 3.297 3.153 3.003 2.925 2.844 2.761 2.674 2.584 2.489
20 3.368 3.231 3.088 2.938 2.859 2.778 2.695 2.608 2.517 2.421
21 3.310 3.173 3.030 2.880 2.801 2.720 2.636 2.548 2.457 2.360
22 3.258 3.121 2.978 2.827 2.749 2.667 2.583 2.495 2.403 2.305
23 3.211 3.074 2.931 2.781 2.702 2.620 2.535 2.447 2.354 2.256
24 3.168 3.032 2.889 2.738 2.659 2.577 2.492 2.403 2.310 2.211
25 3.129 2.993 2.850 2.699 2.620 2.538 2.453 2.364 2.270 2.169
26 3.094 2.958 2.815 2.664 2.585 2.503 2.417 2.327 2.233 2.131
27 3.062 2.926 2.783 2.632 2.552 2.470 2.384 2.294 2.198 2.097
28 3.032 2.896 2.753 2.602 2.522 2.440 2.354 2.263 2.167 2.064
29 3.005 2.868 2.726 2.574 2.495 2.412 2.325 2.234 2.138 2.034
30 2.979 2.843 2.700 2.549 2.469 2.386 2.299 2.208 2.111 2.006
40 2.801 2.665 2.522 2.369 2.288 2.203 2.114 2.019 1.917 1.805
60 2.632 2.496 2.352 2.198 2.115 2.028 1.936 1.836 1.726 1.601
120 2.472 2.336 2.192 2.035 1.950 1.860 1.763 1.656 1.533 1.381
inf 2.321 2.185 2.039 1.878 1.791 1.696 1.592 1.473 1.325 1.000
Appendix D
Critical F-Values Table (F Table for alpha=.05)
df2/df1 1 2 3 4 5 6 7 8 9
1 161.447 199.500 215.707 224.583 230.161 233.986 236.768 238.882 240.543
2 18.5128 19.0000 19.1643 19.2468 19.2964 19.3295 19.3532 19.3710 19.3848
3 10.1280 9.5521 9.2766 9.1172 9.0135 8.9406 8.8867 8.8452 8.8123
4 7.7086 6.9443 6.5914 6.3882 6.2561 6.1631 6.0942 6.0410 5.9988
5 6.6079 5.7861 5.4095 5.1922 5.0503 4.9503 4.8759 4.8183 4.7725
6 5.9874 5.1433 4.7571 4.5337 4.3874 4.2839 4.2067 4.1468 4.0990
7 5.5914 4.7374 4.3468 4.1203 3.9715 3.8660 3.7870 3.7257 3.6767
8 5.3177 4.4590 4.0662 3.8379 3.6875 3.5806 3.5005 3.4381 3.3881
9 5.1174 4.2565 3.8625 3.6331 3.4817 3.3738 3.2927 3.2296 3.1789
10 4.9646 4.1028 3.7083 3.4780 3.3258 3.2172 3.1355 3.0717 3.0204
11 4.8443 3.9823 3.5874 3.3567 3.2039 3.0946 3.0123 2.9480 2.8962
12 4.7472 3.8853 3.4903 3.2592 3.1059 2.9961 2.9134 2.8486 2.7964
13 4.6672 3.8056 3.4105 3.1791 3.0254 2.9153 2.8321 2.7669 2.7144
14 4.6001 3.7389 3.3439 3.1122 2.9582 2.8477 2.7642 2.6987 2.6458
15 4.5431 3.6823 3.2874 3.0556 2.9013 2.7905 2.7066 2.6408 2.5876
16 4.4940 3.6337 3.2389 3.0069 2.8524 2.7413 2.6572 2.5911 2.5377
17 4.4513 3.5915 3.1968 2.9647 2.8100 2.6987 2.6143 2.5480 2.4943
18 4.4139 3.5546 3.1599 2.9277 2.7729 2.6613 2.5767 2.5102 2.4563
19 4.3807 3.5219 3.1274 2.8951 2.7401 2.6283 2.5435 2.4768 2.4227
20 4.3512 3.4928 3.0984 2.8661 2.7109 2.5990 2.5140 2.4471 2.3928
21 4.3248 3.4668 3.0725 2.8401 2.6848 2.5727 2.4876 2.4205 2.3660
22 4.3009 3.4434 3.0491 2.8167 2.6613 2.5491 2.4638 2.3965 2.3419
23 4.2793 3.4221 3.0280 2.7955 2.6400 2.5277 2.4422 2.3748 2.3201
24 4.2597 3.4028 3.0088 2.7763 2.6207 2.5082 2.4226 2.3551 2.3002
25 4.2417 3.3852 2.9912 2.7587 2.6030 2.4904 2.4047 2.3371 2.2821
26 4.2252 3.3690 2.9752 2.7426 2.5868 2.4741 2.3883 2.3205 2.2655
27 4.2100 3.3541 2.9604 2.7278 2.5719 2.4591 2.3732 2.3053 2.2501
28 4.1960 3.3404 2.9467 2.7141 2.5581 2.4453 2.3593 2.2913 2.2360
29 4.1830 3.3277 2.9340 2.7014 2.5454 2.4324 2.3463 2.2783 2.2229
30 4.1709 3.3158 2.9223 2.6896 2.5336 2.4205 2.3343 2.2662 2.2107
40 4.0847 3.2317 2.8387 2.6060 2.4495 2.3359 2.2490 2.1802 2.1240
60 4.0012 3.1504 2.7581 2.5252 2.3683 2.2541 2.1665 2.0970 2.0401
120 3.9201 3.0718 2.6802 2.4472 2.2899 2.1750 2.0868 2.0164 1.9588
inf 3.8415 2.9957 2.6049 2.3719 2.2141 2.0986 2.0096 1.9384 1.8799

Cont…..
df2/df1 10 12 15 20 24 30 40 60 120 INF
1 241.8817 243.9060 245.9499 248.0131 249.0518 250.0951 251.1432 252.1957 253.2529 254.3144
2 19.3959 19.4125 19.4291 19.4458 19.4541 19.4624 19.4707 19.4791 19.4874 19.4957
3 8.7855 8.7446 8.7029 8.6602 8.6385 8.6166 8.5944 8.5720 8.5494 8.5264
4 5.9644 5.9117 5.8578 5.8025 5.7744 5.7459 5.7170 5.6877 5.6581 5.6281
5 4.7351 4.6777 4.6188 4.5581 4.5272 4.4957 4.4638 4.4314 4.3985 4.3650
6 4.0600 3.9999 3.9381 3.8742 3.8415 3.8082 3.7743 3.7398 3.7047 3.6689
7 3.6365 3.5747 3.5107 3.4445 3.4105 3.3758 3.3404 3.3043 3.2674 3.2298
8 3.3472 3.2839 3.2184 3.1503 3.1152 3.0794 3.0428 3.0053 2.9669 2.9276
9 3.1373 3.0729 3.0061 2.9365 2.9005 2.8637 2.8259 2.7872 2.7475 2.7067
10 2.9782 2.9130 2.8450 2.7740 2.7372 2.6996 2.6609 2.6211 2.5801 2.5379
11 2.8536 2.7876 2.7186 2.6464 2.6090 2.5705 2.5309 2.4901 2.4480 2.4045
12 2.7534 2.6866 2.6169 2.5436 2.5055 2.4663 2.4259 2.3842 2.3410 2.2962
13 2.6710 2.6037 2.5331 2.4589 2.4202 2.3803 2.3392 2.2966 2.2524 2.2064
14 2.6022 2.5342 2.4630 2.3879 2.3487 2.3082 2.2664 2.2229 2.1778 2.1307
15 2.5437 2.4753 2.4034 2.3275 2.2878 2.2468 2.2043 2.1601 2.1141 2.0658
16 2.4935 2.4247 2.3522 2.2756 2.2354 2.1938 2.1507 2.1058 2.0589 2.0096
17 2.4499 2.3807 2.3077 2.2304 2.1898 2.1477 2.1040 2.0584 2.0107 1.9604
18 2.4117 2.3421 2.2686 2.1906 2.1497 2.1071 2.0629 2.0166 1.9681 1.9168
19 2.3779 2.3080 2.2341 2.1555 2.1141 2.0712 2.0264 1.9795 1.9302 1.8780
20 2.3479 2.2776 2.2033 2.1242 2.0825 2.0391 1.9938 1.9464 1.8963 1.8432
21 2.3210 2.2504 2.1757 2.0960 2.0540 2.0102 1.9645 1.9165 1.8657 1.8117
22 2.2967 2.2258 2.1508 2.0707 2.0283 1.9842 1.9380 1.8894 1.8380 1.7831
23 2.2747 2.2036 2.1282 2.0476 2.0050 1.9605 1.9139 1.8648 1.8128 1.7570
24 2.2547 2.1834 2.1077 2.0267 1.9838 1.9390 1.8920 1.8424 1.7896 1.7330
25 2.2365 2.1649 2.0889 2.0075 1.9643 1.9192 1.8718 1.8217 1.7684 1.7110
26 2.2197 2.1479 2.0716 1.9898 1.9464 1.9010 1.8533 1.8027 1.7488 1.6906
27 2.2043 2.1323 2.0558 1.9736 1.9299 1.8842 1.8361 1.7851 1.7306 1.6717
28 2.1900 2.1179 2.0411 1.9586 1.9147 1.8687 1.8203 1.7689 1.7138 1.6541
29 2.1768 2.1045 2.0275 1.9446 1.9005 1.8543 1.8055 1.7537 1.6981 1.6376
30 2.1646 2.0921 2.0148 1.9317 1.8874 1.8409 1.7918 1.7396 1.6835 1.6223
40 2.0772 2.0035 1.9245 1.8389 1.7929 1.7444 1.6928 1.6373 1.5766 1.5089
60 1.9926 1.9174 1.8364 1.7480 1.7001 1.6491 1.5943 1.5343 1.4673 1.3893
120 1.9105 1.8337 1.7505 1.6587 1.6084 1.5543 1.4952 1.4290 1.3519 1.2539