
Presentation Ed 52


Ed 51:

Social Science Statistics

PROF: RYAN T. CAPANGYARIHAN


 HISTORY & DEFINITION
• The word statistics means different things to different people. To a college student, statistics are the scores
on all quizzes, seatwork, assignments and recitations in his subject. To a biological researcher
investigating the effects of pollution on our environment, statistics are evidence of the success of research efforts.
To a school president, statistics are information on faculty and employee salaries, tardiness and absenteeism, and the
increase or decrease in enrollment. To the manager of a food chain, statistics may be the kinds of food most frequently
served to customers, and to the president of a country, statistics are information on jobs created, housing
projects, improvement or decline in the economic situation, etc.
• All of them use statistics correctly, yet they use it in different ways and for different purposes.
• The word statistik comes from the Italian word statista, which means “statesman”. The word was first used by
Gottfried Achenwall (1719-1772), a professor at Marburg and Göttingen, while Dr. E.A.W. Zimmerman
introduced it in England. Its use was popularized by Sir John Sinclair in his work, Statistical Account of
Scotland (1791-1799). However, people had been recording and using data long before the 18th century.
• Today, statistics is defined as the branch of scientific methodology that deals with the collection,
classification, description and interpretation of data obtained through surveys or experiments.

STATISTICS IS A SCIENTIFIC BODY OF KNOWLEDGE THAT
DEALS WITH THE COLLECTION, ORGANIZATION OR
PRESENTATION, ANALYSIS AND INTERPRETATION OF DATA.

COLLECTION REFERS TO THE GATHERING OF INFORMATION OR DATA.
ORGANIZATION OR PRESENTATION INVOLVES SUMMARIZING
DATA OR INFORMATION IN TEXTUAL, GRAPHICAL OR TABULAR
FORMS.
ANALYSIS INVOLVES DESCRIBING THE DATA BY USING
STATISTICAL METHODS AND PROCEDURES.
INTERPRETATION REFERS TO THE PROCESS OF MAKING
CONCLUSIONS BASED ON THE ANALYZED DATA.
Functions of Statistics
• To provide investigators a means of scientifically measuring the conditions that may be involved
in a given problem and of assessing how they are related.
• To show the laws underlying facts and events that cannot be determined by individual
observations.
• To show relations of cause and effect that otherwise may remain unknown.
• To find the trends and behavior in related conditions which otherwise may remain ambiguous.

Importance of Statistics to Research


• It gives the most exact kind of description.
• It provides the most definite and exact procedures for analyzing data.
• It summarizes results in a meaningful and convenient form.
• It makes it possible to draw general conclusions.
• It predicts possible outcomes under certain conditions.
ORIGIN AND DEVELOPMENT OF STATISTICS

The history of statistics can be traced back at least to Biblical times in Ancient Egypt, Babylon and
Rome. As early as 3,500 years before the birth of Christ, statistics had been used in Egypt in the form of
recording the number of sheep or cattle owned, the amount of grain produced, and the number of people living
in a particular city. In 3800 B.C., the Babylonian government used statistics to measure the number of men under
the king’s rule and the vast territory that he occupied. It was the king’s belief that the more men under his command
and the more lands he conquered, the more powerful his kingdom would become. In 700 B.C., the Romans
used statistics by conducting registrations to record the population for the purpose of collecting taxes.

In modern times, statistical methods have been used to record and predict such things as birth and
death rates, employment and inflation rates, sports achievements, and other economic and social trends. They
have even been used to assess opinions through polls and to unlock the secrets of games of chance.

Modern statistics is said to have begun with John Graunt (1620-1674), an English tradesman. Graunt
collected published records called “bills of mortality” that included information about the numbers and causes
of deaths in the city of London. Graunt analyzed more than fifty years of data and created the first mortality
table, a table that shows how long a person may be expected to live after reaching a certain age.
Many other great men made important contributions to statistics. One of them was
Karl Friedrich Gauss (1777-1855), the brilliant German mathematician who used statistical methods in making
predictions about the positions of the planets in our solar system. Adolphe Quetelet (1796-1874), a Belgian
astronomer, developed the idea of the “average man” from his studies of the Belgian census. He is also known
as the “Father of Modern Statistics”. Karl Pearson (1857-1936), an English mathematician, made important
links between probability and statistics. In the 20th century, the British statistician Sir Ronald Aylmer Fisher
developed the F-test in inferential statistics (named after him); this tool has been very useful in testing
improvements of production in agricultural experiments and in improving the precision of results from
medical, biological and industrial experimentation. The American George Gallup (1901-1984) was instrumental
in making statistical polling a common tool in political campaigns.

In this age of information technology, many computer programs such as Microstat, Soritec Sampler,
SPSS, and others are available on diskette or from websites and can do far more than manual calculations in
statistics. People working in government agencies, in laboratories, in media, and in business generally use
these programs to access data easily, improve graphics, and obtain ready-made analyses and interpretations
of the data.
APPLICATION OF STATISTICS
In Education
Using statistical tools, a teacher can determine the effectiveness of a particular teaching method by analyzing the test
scores obtained by the students. The results of such a study may be used to improve teaching-learning activities.
In Business
A business firm collects data or information from its everyday operations. Statistics is used to
summarize and describe these data, such as the amount of sales, expenditures, and production, to enable the management to
understand and determine the status of the firm. Data that have been organized and analyzed provide the management a
baseline for making wise decisions about the operation of the business.

In Psychology
Psychologists are able to meaningfully interpret aptitude tests, IQ tests and other psychological tests using statistical
procedures or tools.
In Politics & Government
Public opinion and election polls are commonly used to assess the opinions or preferences of the public on issues
or candidates of interest. Statistics plays an important role in conducting surveys or interviews for that purpose.
In Medicine
Statistics is also used in determining the effectiveness of new drug products in treating a particular type of disease. To
illustrate, a drug company wants to test the effectiveness of its new drug product in treating tuberculosis. An experiment or
clinical trial is conducted. Ten tuberculosis patients are treated using the new drug product and another ten are treated using the
existing drug. The results are analyzed statistically to find out if the new product is more effective in treating tuberculosis.
In Agriculture
Through statistical tools, an agriculturist can determine the effectiveness of a new fertilizer in the growth of plants or
crops. Moreover, crop production and yield can be better analyzed through the use of statistical methods.
In Industry
The most popular actresses and actors can be determined by using surveys. Ratings given by the members of the board of
judges in a beauty contest are statistically analyzed. Interviews are used to determine the most widely viewed television show.
The top-grossing movies of the year are reported based on the statistical records of movie houses. All these activities involve the
use of statistics.
In everyday life
The number of cars passing through streets or a highway is recorded to enable traffic enforcers to manage traffic efficiently.
Even the number of pedestrians crossing the street, the number of people entering a warehouse or a department store, and the
number of people engaged in video games involve the use of statistics. In short, statistics is found and used in everyday life.
BRANCHES OF STATISTICS

Descriptive Statistics – is a statistical procedure concerned with describing the characteristics
and properties of a group of persons, places or things.

For example, we may describe a collection of persons by stating how many are poor and how
many are rich, how many are literate and how many are illiterate, how many fall into various
categories of age, height, civil status, IQ, and many more. We may also describe a particular
barangay in terms of the number of families it has, the number of grade-schoolers, the number
of professionals, the number of households with certain kinds of appliances, the number of
siblings in each household, or the rate of unemployment.

Generally, descriptive statistics involve gathering, organizing, presenting and describing data.
Inferential Statistics – is a statistical procedure that is used to draw inferences or
information about the properties or characteristics of a large group of people,
places, or things on the basis of the information obtained from a small portion of that
large group.

Suppose we want to know the favorite brand of toothpaste in a certain
barangay and we do not have enough time and money to interview all the
residents of that barangay; we may just ask selected residents. With the data
obtained from the interviews, we shall draw conclusions as to the barangay’s
favorite brand of toothpaste. This example involves the use of inferential
statistics.
TERMINOLOGIES IN STATISTICS
Some important terms are commonly used in the study of Statistics. These terms should be understood fully in
order to facilitate the study of statistics.
1. Population refers to the entire (usually large) collection of persons, objects, places or things under study. To illustrate this, suppose a researcher wants
to determine the average income of the residents of a certain barangay and there are 1500 residents in the barangay.
Then all of these residents comprise the population. The size of a population is usually denoted by N. Hence,
in this case, N = 1500.
2. Sample is a small portion or part of a population. It could also be defined as a sub-group, subset, or representative
of a population. For instance, suppose the above-mentioned researcher does not have enough time and money to
conduct the study using the whole population and he wants to use only 200 residents. These 200 residents comprise
the sample. The size of a sample is usually denoted by n; thus, n = 200.
3. Parameter is any numerical or nominal characteristic of a population. It is a value or measurement obtained
from a population. It is usually referred to as the true or actual value. If, in the preceding illustration, the researcher
uses the whole population (N = 1500), then the average income obtained is called a parameter.
4. Statistic is an estimate of a parameter. It is a value or measurement obtained from the sample. If the researcher
in the preceding illustration makes use of the sample (n = 200), then the average income obtained is called a statistic.
5. Data (singular form: datum) are facts, or a set of information or observations under study. More
specifically, data are gathered by the researcher from a population or from a sample. Data may be classified into
two categories: qualitative or quantitative.
Qualitative data are data whose values express attributes or categories. These are sometimes
called categorical data. Data falling in this category cannot be subjected to meaningful arithmetic. They cannot
be added, subtracted or divided. Gender and nationality are qualitative data.

Gender is a qualitative dichotomous variable since an individual may take one of two values, “male” or “female”.
In an opinion poll, the response of an individual towards an issue, whether “for” it, “against” it or
“undecided”, is an example of a qualitative trichotomous variable. Smoking habits of an individual in different
situations may be classified as “Always/Very Often”, “Often”, “Seldom”, “Very Seldom”, or “Never”. A
qualitative variable with more than three possible values, such as this one, is called a multinomous variable.

Quantitative Data are data which are numerical in nature. These are data obtained from counting or measuring.
In addition, meaningful arithmetic operations can be done with this type of data. Test scores and height are
quantitative data.
6. A Variable is a characteristic or property of a population or sample which makes the members different from
each other. If a class consists of boys and girls, then gender is a variable in this class. Height is also a variable
because different people have different heights. Variables may be classified on the basis of whether they are
discrete or continuous and whether they are dependent or independent.
Discrete Variable
A discrete variable is one that can assume a finite (or countable) number of values. In other words, it can assume specific values
only. The values of a discrete variable are obtained through the process of counting. The number of students in a
class is a discrete variable. If there are 40 students in a class, it cannot be reported that there are 40.2 or 40.5
students, because it is impossible for a fractional part of a student to be in the class.
Continuous Variable
A continuous variable is one that can assume infinitely many values within a specified interval. The values of a continuous
variable are obtained through measuring. Height is a continuous variable. If one reports that the height of a
building is 15 m, it is also possible that another person reports that the height of the same building is 15.1 m or
15.12 m, depending on the precision of the measuring device used. In other words, the measured height of the building can
assume many values within an interval.
Dependent Variable
A dependent variable is a variable which is affected or influenced
by another variable.
Independent Variable
An independent variable is one that affects or influences the
dependent variable. To illustrate independent and dependent
variables, consider the problem entitled, The Effect of
Computer-Assisted Instruction on the Students’ Achievement
in Mathematics. Here the independent variable is the computer-
assisted instruction while the dependent variable is the
achievement of students in mathematics.
A constant refers to a fundamental quantity that does not change
in value; fixed costs and the acceleration due to gravity are examples
of such.
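To make the difference between a parameter and a statistic concrete, here is a minimal Python sketch based on the barangay illustration in items 1 to 4. Only N = 1500 and n = 200 come from the text; the incomes are randomly generated, hypothetical numbers (not real data), and the income range, the currency and the seed are arbitrary assumptions.

```python
import random
import statistics

# Hypothetical data: monthly incomes (assumed to be in pesos) of all N = 1500 residents.
# Randomly generated for illustration only; the range 5,000-50,000 is an assumption.
random.seed(1)
population = [random.randint(5_000, 50_000) for _ in range(1500)]

# Parameter: a value computed from the whole population.
parameter = statistics.mean(population)

# Statistic: the same value computed from a simple random sample of n = 200 residents.
sample = random.sample(population, 200)
statistic = statistics.mean(sample)

print(f"N = {len(population)}, population mean (parameter) = {parameter:.2f}")
print(f"n = {len(sample)}, sample mean (statistic) = {statistic:.2f}")
print(f"sampling error = {statistic - parameter:.2f}")
```

The statistic is usually close to, but not exactly equal to, the parameter; that gap is the error that comes with using a sample instead of the whole population.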
SCALES OF MEASUREMENT

Nominal Scale- This is the most primitive level of measurement. The
nominal level of measurement is used when we want to distinguish one
object from another for identification purposes. At this level, we can
only say that one object is different from another; the amount of
difference between them cannot be determined. We cannot tell whether
one is better or worse than the other. Gender, nationality and civil
status are of nominal scale.

Ordinal scale – in the ordinal level of measurement, data are arranged
in some specified order or rank. When objects are measured at this
level, we can say that one is better or greater than the other, but we
cannot tell how much more or how much less of the characteristic one
object has than the other. The ranking of contestants in a beauty contest,
of siblings in the family, or of honor students in the class is of
ordinal scale.
Interval Scale- If data are measured in the interval level, we
can say not only that one object is greater or less than another,
but we can also specify the amount of difference. The scores in
an examination are of interval scale of measurement. To
illustrate, suppose Kensly Kyle got 50 in a Math examination
while Kwenn Anne got 40. We can say that Kensly Kyle got a
score 10 points higher than Kwenn Anne.
Ratio Scale- The ratio level of measurement is like the interval
level. The only difference is that the ratio level always starts
from an absolute or true zero point. In addition, in the ratio
level, there is always the presence of units of measure. If data
are measured in this level, we can say that one object is so many
times as large or as small as the other. For example, suppose
Mrs. Reyes weighs 50 kg, while her daughter weighs 25 kg. We
can say that Mrs. Reyes is twice as heavy as her daughter. Thus,
weight is an example of data measured on the ratio scale.
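The four scales can be summarized by the comparisons that are meaningful at each level, each level allowing everything the level below it allows plus one more kind of comparison. The sketch below is an illustration only; the names `SCALES`, `ADDS` and `meaningful` are hypothetical, and it reuses the score and weight examples above.

```python
# Scales of measurement, from lowest to highest level.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# The comparison each level adds to those allowed by the levels below it.
ADDS = {
    "nominal": "same or different",
    "ordinal": "greater or less than",
    "interval": "amount of difference",
    "ratio": "how many times as large",
}

def meaningful(scale):
    """Return every comparison that makes sense for data measured at `scale`."""
    level = SCALES.index(scale)
    return [ADDS[s] for s in SCALES[: level + 1]]

# Interval scale (exam scores): differences are meaningful.
kensly_kyle, kwenn_anne = 50, 40
print("Score difference:", kensly_kyle - kwenn_anne)  # Score difference: 10

# Ratio scale (weight in kg, true zero): ratios are meaningful too.
mrs_reyes, daughter = 50, 25
print("Weight ratio:", mrs_reyes / daughter)  # Weight ratio: 2.0

for s in SCALES:
    print(f"{s}: {meaningful(s)}")
```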
DATA GATHERING TECHNIQUES

SOURCES OF DATA

There are two sources of data. One is called the primary source, from
which first-hand information is obtained, usually by means of personal
interview and actual observation. On the other hand, the secondary source of
information is taken from others’ works, news reports, readings, journals,
magazines, and records kept by the National Statistics Office, Securities
and Exchange Commission, Social Security System and other government and
private agencies.

Data are said to be an asset of a company if they are accurate, updated and
available when needed. Hence, any institution or business organization must
have a database called a Management Information System, where all information
about the business is made available in order to facilitate the verification of
claims and to arrive at wise management decisions.

METHODS OF COLLECTING DATA: Their Advantages and Disadvantages

Direct or Interview Method – is a person-to-person interaction between an interviewer and an
interviewee. A tape-recorded or written interview will help the researcher obtain exact information
from the interviewee.

Advantages: Precise and consistent answers can be obtained by modifying or rephrasing the
questions, especially for illiterate respondents or for children under study.
Disadvantages: It consumes time, money and effort, and it is applicable only to small
populations, except when conducting a census.

Indirect or Questionnaire Method- is an alternative to the interview method. Written
responses are obtained by distributing questionnaires (a list of questions intended to elicit answers
to a given problem, which must be given in a logical order and must not be too personal) to the respondents
by mail or by hand.

Advantages: Less time, money, and effort are consumed.
Disadvantages: Many responses may not be consistent due to poor construction of the
questionnaire. The questions may be interpreted differently by each respondent. Inconsistent
responses can no longer be modified; thus, the number of valid respondents is reduced.
Registration Method – is enforced by private organizations or government
agencies for recording purposes.
Advantages: Organized data from an institution can serve as ready
references for future study or for personal claims to people’s records.
Disadvantages: Problems arise only when an agency does not have a
Management Information System or when the system or process of
registration is not implemented well.

Observation Method – is a scientific method of investigation that makes
use of all the senses to measure or obtain outcomes/responses from
the object of study.
Advantages: The observation method is usually applied to respondents who
cannot be asked or need not speak, especially when the behaviors of
persons, the culture of an organization, or the performance outcomes of
employees/students are to be considered.
Disadvantages: Subjectivity of the information sought cannot be avoided.
Experimentation- is used when the objective is to determine the
cause-and-effect relationship of a certain phenomenon under some controlled
conditions.
Advantages: There is objectivity of information since a scientific
method of inquiry is used. An equal number of respondents with
relatively similar characteristics are examined to obtain the
different effects of something applied to the experimental group.
Disadvantages: It is very difficult to find respondents with almost similar
characteristics. The whole method must be repeated if the desired
outcome is not reached.

Data that are collected by these methods are usually referred to as raw
data. Responses from taped interviews, answered questionnaires,
furnished registration forms, recorded observations, and results from
an experiment are considered raw data since they are not yet
organized and presented in a form ready for interpretation.
CLASSIFICATION OF VARIABLES AND DATA

VARIABLE
   Qualitative
      *Dichotomous
      *Trichotomous
      *Multinomous
   Quantitative
      *Discrete
      *Continuous
   Dependent
   Independent

DATA
   Sources
      *Primary
      *Secondary
   Methods
      *Interview
      *Questionnaire
      *Registration
      *Observation
      *Experimentation
   Scales of Measurement
      *Nominal
      *Ordinal
      *Interval
      *Ratio
   Presentation
      *Textual
      *Tabular
      *Graphical/Chart
         -Line Graph
         -Bar Graph
         -Pie Graph
         -Pictograph
         -Map/Cartogram
         -Scatter Point Diagram

SLOVIN’S FORMULA IN DETERMINING THE SAMPLE SIZE

In research, we seldom use the entire population because of the cost and time involved. In fact,
most researchers do not use the population in their study. Instead, a sample, which is a small
representative part of the population, is used. The characteristics of the whole population are
described using the characteristics observed in the sample.

The sample size can be obtained by the formula

$$ n = \frac{N}{1 + Ne^2} $$

where n is the sample size, N is the population size, and e is the margin of error.

Observe that there is a margin of error. When we use a sample, we do not get the actual value
but just an estimate of the parameter. Hence, there is an error associated with using the sample.

To illustrate, suppose we want to find out the average age of the students in Manila. However,
due to insufficient time, only the students in three particular schools were used to estimate the
average age. Obviously, the result is not the actual average age but just an estimate and thus,
there is really an error when we use the sample instead of the population.
Study the examples below in finding the sample size.
Example 1. A group of researchers will conduct a survey to find out the opinion of the residents of a particular community regarding the oil price
hike. If there are 10,000 residents in the community and the researchers plan to use a sample with a 10% margin of error, what should the
sample size be?

Solution: N = 10,000, e = 10% or 0.10

$$ n = \frac{N}{1 + Ne^2} = \frac{10{,}000}{1 + 10{,}000(0.10)^2} = \frac{10{,}000}{1 + 10{,}000(0.01)} = \frac{10{,}000}{1 + 100} = \frac{10{,}000}{101} \approx 99.01 \approx 99 $$

Hence, the researchers will just conduct the survey using 99 residents. A 10% margin of error means that the researchers allow the result
obtained using the sample to differ from the result they would have obtained using the whole population by as much as about 10%.
Example 2. Suppose that in example 1, the researcher would like to use a 5% margin of error. What
should be the size of the sample?

Solution: N = 10,000, e = 5% or 0.05

$$ n = \frac{N}{1 + Ne^2} = \frac{10{,}000}{1 + 10{,}000(0.05)^2} = \frac{10{,}000}{1 + 10{,}000(0.0025)} = \frac{10{,}000}{1 + 25} = \frac{10{,}000}{26} \approx 384.62 \approx 385 $$

Observe from Examples 1 and 2 that as we reduce the margin of error, the sample size gets larger. Hence, if
we want a more accurate result, we have to use a larger sample.
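Slovin's formula is easy to compute. The sketch below (the function name `slovin` is our own, not from the notes) reproduces Examples 1 and 2 and shows the sample size growing as the margin of error shrinks. It rounds to the nearest whole number to match the worked examples; many texts round up instead, to stay on the safe side.

```python
import math

def slovin(N, e):
    """Sample size by Slovin's formula n = N / (1 + N e^2),
    rounded to the nearest whole number as in the worked examples."""
    n = N / (1 + N * e ** 2)
    return math.floor(n + 0.5)  # conventional rounding (halves go up)

print(slovin(10_000, 0.10))  # 99   (Example 1: 10,000 / 101 = 99.01)
print(slovin(10_000, 0.05))  # 385  (Example 2: 10,000 / 26 = 384.62)

# Smaller margins of error give larger samples.
for e in (0.10, 0.05, 0.03, 0.01):
    print(f"e = {e:.2f}: n = {slovin(10_000, e)}")
```

The same function can be used to check answers to the exercise below by passing N = 18,000 and each margin of error.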

3. A researcher plans to conduct a survey. If the population size is 18,000, find the sample size if the
desired margin of error is
a. 10% b. 5% c. 1% d. 3%
SAMPLING TECHNIQUES
Sampling Technique- is a procedure used to determine the
individuals or members of a sample.

A – PROBABILITY OR RANDOM SAMPLING TECHNIQUE is
a sampling technique wherein each member or element of the
population has an equal chance of being selected as a member of the
sample.
1.) Simple Random Sampling

a.) Lottery Method


Suppose Mrs. Cruz wants to send five students to attend a 2-day training or seminar in
basic computer programming. To avoid bias in selecting these five students from her
40 students, she can use lottery sampling. This is done by assigning a number to
each student and then writing these numbers on pieces of paper. Then, these
pieces of paper are rolled or folded and placed in a box called the lottery box. The
lottery box should be thoroughly shaken, and then five pieces of paper are picked or
drawn from the box. The students who were assigned the numbers chosen will be
sent to the training. In this case, the selection of the students is done without bias.
Note that we can simply assign 1 to the first student, 2 to the second student, and so on.
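The lottery method can be simulated in a few lines of Python. This is a sketch only: shuffling a list stands in for shaking the lottery box, and taking the first five items stands in for drawing five papers without putting any back.

```python
import random

# Number the students 1 to 40 and put one "paper" per student in the box.
lottery_box = list(range(1, 41))

# Shake the box thoroughly, then draw five papers without putting any back.
random.shuffle(lottery_box)
chosen = lottery_box[:5]
print("Students sent to the training:", sorted(chosen))

# random.sample does the shuffle-and-draw in one step.
chosen_again = random.sample(range(1, 41), 5)
print("Another draw:", sorted(chosen_again))
```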
b.) Sampling with the Use of a Table of Random Numbers
Below is a portion of a table of random numbers.
Let us illustrate how these random numbers are used to select the members of the
sample. Let us consider the preceding example wherein Mrs. Cruz wants to select 5
students from her 40 students. Again, we will assign a number to each student, say
from 1 to 40.
31871 60770 59235 41702
87134 32839 17850 37359
06728 16314 81076 42172
95646 67486 05167 07819
44085 87246 47378 98338


Since there are 40 students, we will use two-digit numbers from the table of random
numbers when selecting the members of the sample. This is because the students have been
assigned the numbers 01, 02, 03, . . . up to 40. Looking at the first column of the table of
random numbers above, we see that the number formed by the first two digits is 31; hence,
the student assigned number 31 is chosen as a member of the sample. If we proceed
down the column, we see that the next number is 87, which cannot be used because we
have only 40 members. In a similar manner, the third number is 06, so the student
assigned number 6 is chosen. Notice that the next two numbers from the table are 95 and
44, numbers we cannot use for the same reason as before. When we get to the bottom of the
column, we go back to the top of the column and shift one digit to the right for the next random
number. Thus, we will have 18 as our next number. This is one of many alternatives;
numbers that are out of range or already chosen are skipped, and we continue in this way
until we complete the 5 students.
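The reading rule just described can be written as a short Python function. It is a sketch under one assumption the notes leave open: after every two-digit position in the first column has been used, it moves on to the next column. The name `select_from_table` is hypothetical. With the table above it selects 31, 6, 18, 40 and then 13.

```python
# The portion of the table of random numbers shown above (rows of 5-digit groups).
TABLE = [
    ["31871", "60770", "59235", "41702"],
    ["87134", "32839", "17850", "37359"],
    ["06728", "16314", "81076", "42172"],
    ["95646", "67486", "05167", "07819"],
    ["44085", "87246", "47378", "98338"],
]

def select_from_table(table, population_size, sample_size, width=2):
    """Read `width`-digit numbers down a column; at the bottom, return to the top
    and shift one digit to the right. Skip 00, numbers above the population size,
    and numbers already chosen. (Moving on to the next column is an assumption.)"""
    chosen = []
    group_len = len(table[0][0])
    for col in range(len(table[0])):
        for offset in range(group_len - width + 1):
            for row in table:
                number = int(row[col][offset:offset + width])
                if 1 <= number <= population_size and number not in chosen:
                    chosen.append(number)
                    if len(chosen) == sample_size:
                        return chosen
    return chosen  # the table was too short to complete the sample

print(select_from_table(TABLE, population_size=40, sample_size=5))
# [31, 6, 18, 40, 13]
```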
2.) Systematic Sampling
Let us use the example wherein Mrs. Cruz wants to select 5
students from her 40 students. First, we find the sampling interval i
by dividing the number of members in the population by the
number of members in the sample. Hence, in our case, i = 40/5 = 8.
Next, we select a random starting point: we write the numbers 1, 2, 3, 4, 5, 6, 7, and 8 on
pieces of paper and draw one number by lottery. If we get 5,
the 5th student is the first member of the sample, and every 8th student
after that is also selected. Therefore, the 5th, 13th, 21st, 29th, and 37th students
shall be the members of the sample. If, for instance, we
obtain the number 6, then the members of the sample will be the 6th, 14th,
22nd, 30th and 38th students.
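A minimal sketch of systematic sampling, in which the interval is i = N // n, the random start is drawn from 1 to i, and every i-th member after the start is selected. The function name `systematic_sample` and its `start` parameter are hypothetical; passing `start` lets us reproduce a particular lottery draw.

```python
import random

def systematic_sample(N, n, start=None):
    """Select n of N members: interval i = N // n, random start in 1..i,
    then every i-th member after the start."""
    i = N // n
    if start is None:
        start = random.randint(1, i)  # the lottery draw among 1, 2, ..., i
    return [start + k * i for k in range(n)]

print(systematic_sample(40, 5, start=5))  # [5, 13, 21, 29, 37]
print(systematic_sample(40, 5, start=6))  # [6, 14, 22, 30, 38]
print(systematic_sample(40, 5))           # random starting point
```

Because the step equals the interval, the selected members are spread over the whole list, and the last one never goes past the 40th student.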
3.) Stratified Random Sampling
There are some instances whereby the members of the population do not belong to
the same category, class, or group. To illustrate this, let us suppose that we want to
determine the average income of the families in a certain community or barangay. In a
typical barangay, different families belong to different income brackets. If we draw or
select members of the sample using simple random sampling, there is a possibility
that none of the families, or a disproportionate number of the families, from the low-income,
average-income, or high-income group will be included in the sample. In this case, the result
of the study would turn out to be biased. For example, if the sample comes only from the
high-income families, then we will conclude that the average income of the families living
in this barangay is high. This suggests that the sample should be drawn
proportionally from each group or category: the high-, the
average-, and the low-income families.
To do this, we will use stratified random sampling. The word stratified
comes from the root word strata, which means groups or categories (singular form:
stratum). When we use this method, we divide the elements of the
population into different categories or subpopulations, and then the members of the
sample are drawn or selected proportionally from each subpopulation.
Example. Suppose a community consists of 5000 families belonging to different
income brackets. We will draw 200 families as our random sample using stratified
random sampling. Below are the subpopulations and corresponding number of
families belonging to each subpopulation or stratum.
Strata                     Number of Families
High-Income Families       1000
Average-Income Families    2500
Low-Income Families        1500
                           N = 5000
Solution: The first step is to find the percentage of each stratum. This is done
by dividing the number of families in each stratum by the total number of families. Then, we
multiply each percentage by the desired number of families in the sample.
Strata     Number of Families   Percentage                 Number of Families in the Sample
High       1000                 1000/5000 = 0.2 or 20%     0.2 x 200 = 40
Average    2500                 2500/5000 = 0.5 or 50%     0.5 x 200 = 100
Low        1500                 1500/5000 = 0.3 or 30%     0.3 x 200 = 60
           N = 5000                                        n = 200

From the above table, we see that if we are going to draw 200 members from the
population of 5000, we should draw 40 families belonging to the high-income, 100
from the average-income, and 60 from the low-income groups. Observe that the number of
families drawn as sample in each stratum is proportional to the number of families
in the population.
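The proportional allocation in this example can be checked with a short Python sketch. The stratum sizes and n = 200 come from the example; identifying families by numbers 1 to the stratum size is a hypothetical convenience so that each stratum can then be sampled with simple random sampling.

```python
import random

# Strata sizes from the example (number of families in each income bracket).
strata = {"High": 1000, "Average": 2500, "Low": 1500}
N = sum(strata.values())  # 5000
n = 200                   # desired sample size

# Proportional allocation: (stratum size / N) x n, rounded to whole families.
allocation = {name: round(size / N * n) for name, size in strata.items()}
print(allocation)  # {'High': 40, 'Average': 100, 'Low': 60}

# Within each stratum, draw that many families by simple random sampling.
# Families are identified by hypothetical numbers 1..size within their stratum.
sample = {
    name: random.sample(range(1, size + 1), allocation[name])
    for name, size in strata.items()
}
print({name: len(ids) for name, ids in sample.items()})
```

Here the shares are exact, so the rounded counts add up to 200. With other stratum sizes, rounding can leave the total one or two off from n, and the counts must then be adjusted by hand.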
4.) Cluster Sampling
Cluster sampling is sampling wherein groups or clusters instead of individuals are randomly
chosen. Recall that in simple random sampling we select members of the sample
individually. In cluster sampling, we first select or draw groups at random,
and then we randomly select a sample of elements from each chosen cluster or group. Cluster
sampling is sometimes called area sampling because it is usually applied when the population is
large and spread over a wide area.

To illustrate the use of this sampling method, let us suppose that we want to determine the
average income of the families in Manila. Let us assume there are 250 barangays in Manila.
We can draw a random sample of 20 barangays using simple random sampling, and then a
certain number of families from each of the 20 barangays may be chosen.
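The Manila illustration can be sketched in Python as two stages: choose 20 of the 250 barangays at random, then choose families at random within each chosen barangay. The number of families in each barangay and the figure of 10 families per barangay are made-up assumptions, since the notes leave them open.

```python
import random

random.seed(7)
# Hypothetical: 250 barangays, each with a made-up number of families (not real data).
barangays = {f"Barangay {b}": random.randint(200, 2000) for b in range(1, 251)}

# Stage 1: randomly choose 20 whole clusters (barangays).
chosen_barangays = random.sample(list(barangays), 20)

# Stage 2: randomly choose families from each chosen barangay
# (10 per barangay is an arbitrary illustrative number).
families_per_barangay = 10
sample = {
    b: random.sample(range(1, barangays[b] + 1), families_per_barangay)
    for b in chosen_barangays
}
print(len(sample), "barangays,", sum(len(f) for f in sample.values()), "families")
```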
5.) Multi-Stage Sampling
Multi-stage sampling is a combination of several sampling techniques.
This method is usually used by researchers who are interested in studying
a very large population, say the whole island of Luzon or even the
Philippines. This is done by starting the selection of the members of the
sample using cluster sampling and then dividing each chosen cluster or group into
strata. Then, from each stratum, individuals are drawn using simple random
sampling.
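The three stages just described (cluster sampling, then stratification within each chosen cluster, then simple random sampling within each stratum) can be combined in one sketch. Everything numeric here is a made-up assumption: 30 clusters, income strata with invented household counts, 5 clusters chosen, and 5 households drawn from every stratum.

```python
import random

random.seed(3)
# Hypothetical population: 30 clusters (e.g. municipalities), each split into
# income strata with made-up numbers of households. Not real data.
clusters = {
    f"Cluster {c}": {
        "High": random.randint(50, 200),
        "Average": random.randint(200, 800),
        "Low": random.randint(100, 500),
    }
    for c in range(1, 31)
}

# Stage 1 (cluster sampling): randomly choose 5 clusters.
stage1 = random.sample(list(clusters), 5)

# Stages 2 and 3 (strata + simple random sampling): within each chosen cluster,
# draw 5 households at random from every stratum.
per_stratum = 5
sample = {
    c: {s: random.sample(range(1, size + 1), per_stratum)
        for s, size in clusters[c].items()}
    for c in stage1
}

for c, strata in sample.items():
    print(c, strata)
```

Drawing an equal number from every stratum keeps the sketch short; proportional allocation, as in stratified random sampling, could be used at that stage instead.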
B. Non-Probability or Non-Random Sampling Techniques
Non-probability sampling is a sampling technique wherein members
of the sample are drawn from the population based on the judgment of the
researchers. The results of a study using this sampling technique are relatively
biased. This technique lacks objectivity of selection; hence, it is sometimes
called subjective sampling. Inferences made from a sample obtained using
this technique are not very reliable.
Non-probability sampling techniques are used because they are
convenient and economical: they are inexpensive and easy to conduct.
1.) Convenience Sampling
As the name implies, convenience sampling is used because of
the convenience it offers to the researcher. For example, a researcher
who wishes to investigate the most popular noontime show may just
interview the respondents by telephone. The result of this
interview will be biased because the opinions of those without a
telephone will not be included. Although convenience sampling may be
used occasionally, we cannot depend on it in making inferences about a
population.
2.) Quota Sampling
In this type of sampling, the proportions of the various subgroups in the
population are determined, and the sample is drawn so that it has the same
percentages. This is very similar to stratified random sampling; the only difference
is that in quota sampling the members of the sample are not selected
randomly. To illustrate, suppose that we want to determine the
teenagers’ favorite brand of T-shirt. If there are 1000 female and 1000 male
teenagers in the population and we want to draw 150 members for our sample,
we can select 75 female and 75 male teenagers from the population without
using randomization. This is quota sampling.
3.) Judgment or Purposive Sampling
Another method of drawing the members of the sample without
randomization is purposive sampling, in which the researcher deliberately
chooses members who suit the purpose of the study. Suppose that the
target is to find out the effectiveness of a certain kind of shampoo. Of
course, bald people will not be included in the sample.
4.) Incidental Sampling
This design is applied to those samples which are taken
because they are the most available. The investigator simply takes
the nearest individuals as subjects of the study until the sample reaches the
desired size. In an interview, for instance, an interviewer can
simply choose to ask those people around him or in a coffee shop
where he is taking a break.
Exercise 1
1.) A researcher would like to investigate the perception of the students in Mathematics. He
divided the population into subpopulations as shown below. Use stratified random
sampling if the sample to be drawn consists of 500 students.

Strata          Number of Students
First Year      2000
Second Year     1000
Third Year      1500
Fourth Year     2000


2.) A TV journalist would like to know the favorite noontime show
for this month. He decided to conduct a survey in 5 barangays. The table
below shows the list of barangays and the number of residents in each
barangay. Use stratified random sampling to draw 1000 residents who
will be included in the survey.
Barangay        Number of Residents
Maayapa         2500
Maganda         1000
Makisig         1500
Malinis         3000
Mahangin        2000
ORGANIZATION & PRESENTATION OF DATA

Ungrouped data are data that are not organized or, if arranged, are only
sorted from highest to lowest or from lowest to highest.

Grouped data are data that are organized and arranged into different
classes or categories.
Forms of Presentation of Data
Textual – this form of presentation combines text and numerical facts in a statistical
report.

Example 1. Below are the test scores of 50 students in Calculus:


25 30 18 17 50 12 43 35 40 9
33 37 41 21 20 31 35 46 10 36
28 19 18 13 28 16 42 27 28 31
40 48 40 39 32 32 26 13 3 50
26 15 14 10 38 35 34 29 30 20
Arranging the scores from the lowest to highest will facilitate the enumeration of important
characteristics of the data. The test scores of the 50 students in Calculus arranged from lowest
to highest are shown below:

3 13 17 20 27 30 32 35 40 43
9 13 18 21 28 30 33 36 40 46
10 14 18 25 28 31 34 37 40 48
10 15 19 26 28 31 35 38 41 50
12 16 20 26 29 32 35 39 42 50

The highest score obtained is 50 and the lowest is 3. Ten students got a score of 40 and above,
while only 4 got 10 and below. Generally, the students performed well in the test, with 33
students, or 66%, getting a score of 25 and above.
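The counts quoted above can be reproduced in a few lines of Python. This is only a checking sketch; the thresholds 40, 10, and 25 are the ones used in the text.

```python
scores = [
    25, 30, 18, 17, 50, 12, 43, 35, 40, 9,
    33, 37, 41, 21, 20, 31, 35, 46, 10, 36,
    28, 19, 18, 13, 28, 16, 42, 27, 28, 31,
    40, 48, 40, 39, 32, 32, 26, 13, 3, 50,
    26, 15, 14, 10, 38, 35, 34, 29, 30, 20,
]

ordered = sorted(scores)                       # arrange from lowest to highest
high, low = ordered[-1], ordered[0]
forty_up = sum(1 for s in scores if s >= 40)   # scores of 40 and above
ten_down = sum(1 for s in scores if s <= 10)   # scores of 10 and below
twenty_five_up = sum(1 for s in scores if s >= 25)

print(f"highest = {high}, lowest = {low}")
print(f"40 and above: {forty_up}; 10 and below: {ten_down}")
print(f"25 and above: {twenty_five_up} ({twenty_five_up / len(scores):.0%})")
```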
Stem-and-leaf plot – a display that sorts data according to a certain pattern. It involves
separating each number into two parts. In a two-digit number, the stem consists of the
first digit and the leaf consists of the second digit. In a three-digit number,
the stem consists of the first two digits and the leaf consists of the last digit. In a
one-digit number, the stem is zero.
Table 1.1
Stem-and-Leaf Plot of the Arranged Test Scores of 50 Students in Calculus
Stem Leaves

0 3,9
1 0,0,2,3,3,4,5,6,7,8,8,9
2 0,0,1,5,6,6,7,8,8,8,9
3 0,0,1,1,2,2,3,4,5,5,5,6,7,8,9
4 0,0,0,1,2,3,6,8
5 0,0
By looking at the stem-and-leaf plot, we can easily
rank the data or put them in order. Thus, the ten
lowest scores are 3, 9, 10, 10, 12, 13, 13, 14, 15, and 16,
while the ten highest scores are
40, 40, 40, 41, 42, 43, 46, 48, 50, and 50.
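A stem-and-leaf plot like Table 1.1 can be generated with a short Python sketch. It assumes non-negative whole numbers and takes the last digit as the leaf and everything before it as the stem, which matches the one-, two-, and three-digit rules given above. The function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

scores = [
    25, 30, 18, 17, 50, 12, 43, 35, 40, 9,
    33, 37, 41, 21, 20, 31, 35, 46, 10, 36,
    28, 19, 18, 13, 28, 16, 42, 27, 28, 31,
    40, 48, 40, 39, 32, 32, 26, 13, 3, 50,
    26, 15, 14, 10, 38, 35, 34, 29, 30, 20,
]

def stem_and_leaf(values):
    """Return {stem: [leaves]} where the leaf is the last digit and the stem is the rest."""
    plot = defaultdict(list)
    for v in sorted(values):           # sorting first keeps stems and leaves in order
        stem, leaf = divmod(v, 10)     # 43 -> (4, 3); 9 -> (0, 9); 125 -> (12, 5)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(stem, ",".join(str(leaf) for leaf in leaves))
```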
Tabular – this form of presentation is better than the textual form
because it provides numerical facts in a more concise and
systematic manner. Statistical tables are constructed to facilitate
the analysis of relationships. Each class/subclass is assigned to a
particular row or column and figures for various classifications are
noted in appropriate cells.
Advantages of Tabular Presentation
• It is brief; it reduces the material to a minimum.
• It gives the reader a good grasp of the meaning of the quantitative relationships indicated in the report.
• It tells the whole story without the need to mix textual matter with figures.
• The systematic arrangement of columns and rows makes it easy to read and readily understood.
• The columns and rows make comparisons easier.
The table has the following parts:
• Table number: This is for easy reference to the table.
• Table title: It briefly explains the content of the table.
• Column header: It describes the data in each column.
• Row classifier: It shows the classes and categories.
• Body: This is the main part of the table.
• Source note: This is placed below the table when the data are not original.
FREQUENCY DISTRIBUTION
A frequency distribution table is a table that shows the data arranged into different classes and the number of
cases that fall into each class.

Parts of a Frequency Table

1. CLASS LIMITS/CLASS INTERVAL – groupings or categories defined by lower and upper limits.
Example: 16-20
         21-25
         26-30
2. CLASS SIZE – the width of each class interval.
Example: Lower limit (L.L.) - Upper limit (U.L.)
         16 - 20
         21 - 25    } class size = 5
The class size is the difference between the lower limits of two consecutive classes (21 – 16 = 5).
3. CLASS BOUNDARIES, or REAL or EXACT CLASS LIMITS, are the
numbers used to separate the classes without the gaps created by the class limits. The
number to be added or subtracted is half the difference between the lower limit of
one class and the upper limit of the preceding class (for example, (21 – 20)/2 = 0.5).

Example
Class interval Class boundaries
L.L – U.L L.C.B – U.C.B
16 – 20 15.5 - 20.5
21 – 25 20.5 - 25.5
26 - 30 25.5 - 30.5
4. CLASS MARKS are the midpoints of the classes. They are obtained by
adding the lower and upper limits of a class and then dividing by 2.
Example:
Class interval class mark/midpoint (X)
16-20 18
21-25 23
26-30 28
STEPS IN CONSTRUCTING A FREQUENCY DISTRIBUTION TABLE
1. Decide on the number of class intervals, k. There should not be too many, to avoid many empty classes, and there
should not be too few, to avoid losing important details. Use the formula suggested by Sturges (log is base 10):
k = 1 + 3.3 log N
2. Compute the range. The range, R, is defined as the difference between the highest score and the lowest score.
3. Divide the range R by the number of class intervals k to obtain the size of the class interval, i = R/k
(also written c = R/k). Round the result up to a convenient whole number.
4. Starting from the largest integer less than or equal to the minimum score, construct class intervals of size c until
the maximum score is reached.
5. Set up the class boundaries.
6. Tally the scores in the appropriate classes and then add the tallies for each class to obtain the frequency.
7. Solve for the class mark or midpoint of each class. This is obtained by adding the lower class limit and the upper
class limit, then dividing by 2.
Example 1. The following are the entrance examination scores of 60 students.
19 31 36 26 34 32
44 33 37 39 45 21
24 38 40 42 39 32
43 18 24 32 49 33
33 33 40 24 46 22
29 33 37 30 43 43
26 39 57 30 40 33
25 33 48 39 34 29
29 37 39 35 41 29
23 32 48 28 45 19
Range = 57 – 18 = 39

$$k = 1 + 3.3\log N = 1 + 3.3\log 60 = 1 + 3.3(1.77815) = 6.868 \approx 7$$

$$c = \frac{\text{Range}}{k} = \frac{39}{7} = 5.57 \approx 6$$
Table 1.2
Grouped Frequency Distribution for the Entrance Examination Scores of 60 Students
Class Limits   Class Boundaries   Tally               Frequency   Class Mark/Midpoint
18-23          17.5 - 23.5        1111-1              6           20.5
24-29          23.5 - 29.5        1111-1111-1         11          26.5
30-35          29.5 - 35.5        1111-1111-1111-11   17          32.5
36-41          35.5 - 41.5        1111-1111-1111      14          38.5
42-47          41.5 - 47.5        1111-111            8           44.5
48-53          47.5 - 53.5        111                 3           50.5
54-59          53.5 - 59.5        1                   1           56.5
                                                      N = 60
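The seven steps can be sketched in Python and checked against Table 1.2. Two choices here are assumptions rather than rules from the text: the class size from Step 3 is rounded up with `math.ceil` (which turns 5.57 into 6, as in the example), and boundaries are taken as the limits minus/plus 0.5 because the scores are whole numbers. The loop keeps adding classes until the highest score is covered, so it can produce more than k classes when needed. The function name `frequency_table` is hypothetical.

```python
import math

scores = [
    19, 31, 36, 26, 34, 32,
    44, 33, 37, 39, 45, 21,
    24, 38, 40, 42, 39, 32,
    43, 18, 24, 32, 49, 33,
    33, 33, 40, 24, 46, 22,
    29, 33, 37, 30, 43, 43,
    26, 39, 57, 30, 40, 33,
    25, 33, 48, 39, 34, 29,
    29, 37, 39, 35, 41, 29,
    23, 32, 48, 28, 45, 19,
]

def frequency_table(data):
    """Rows of (limits, boundaries, frequency, class mark) following Steps 1-7."""
    k = round(1 + 3.3 * math.log10(len(data)))   # Step 1: Sturges' rule (6.868 -> 7)
    r = max(data) - min(data)                    # Step 2: range (57 - 18 = 39)
    c = math.ceil(r / k)                         # Step 3: class size (5.57 -> 6, rounded up)
    rows = []
    lower = min(data)                            # Step 4: whole-number data, so start at the minimum
    while lower <= max(data):                    # add classes until the maximum is covered
        upper = lower + c - 1
        boundaries = (lower - 0.5, upper + 0.5)  # Step 5: class boundaries
        f = sum(1 for x in data if lower <= x <= upper)  # Step 6: tally
        mark = (lower + upper) / 2               # Step 7: class mark
        rows.append(((lower, upper), boundaries, f, mark))
        lower += c
    return rows

for (lo, hi), (lb, ub), f, mark in frequency_table(scores):
    print(f"{lo}-{hi}   {lb} - {ub}   {f:>2}   {mark}")
```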
Quiz # 3
3 13 17 20 27 30 32 35 40 43
9 13 18 21 28 30 33 36 40 46
10 14 18 25 28 31 34 37 40 48
10 15 19 26 28 31 35 38 41 50
12 16 20 26 29 32 35 39 42 50
Construct a frequency distribution table for the data above using the suggested steps.
(50 points)
Example 2. The following are the entrance examination scores of 60 students
19 31 36 26 34 32
44 33 37 39 45 21
24 38 40 42 39 32
43 18 24 32 49 33
33 33 40 24 46 22
29 33 37 30 43 43
26 39 57 30 40 33
25 33 48 39 34 29
29 37 39 35 41 29
23 32 48 28 45 19
Construct a frequency distribution table with 13 classes. In this problem, c = 39/13 = 3.
Table 1.3
Grouped Frequency Distribution for the Entrance Examination Scores of 60 Students
Class Limits   Class Boundaries   Tally        Frequency   Class Mark
18-20          17.5 - 20.5        111          3           19
21-23          20.5 - 23.5        111          3           22
24-26          23.5 - 26.5        1111-1       6           25
27-29          26.5 - 29.5        1111-        5           28
30-32          29.5 - 32.5        1111-11      7           31
33-35          32.5 - 35.5        1111-1111-   10          34
36-38          35.5 - 38.5        1111-        5           37
39-41          38.5 - 41.5        1111-1111    9           40
42-44          41.5 - 44.5        1111-        5           43
45-47          44.5 - 47.5        111          3           46
48-50          47.5 - 50.5        111          3           49
51-53          50.5 - 53.5                     0           52
54-56          53.5 - 56.5                     0           55
57-59          56.5 - 59.5        1            1           58
                                               N = 60
Notice that the class interval 54-56 is already the 13th class interval, but the highest value, 57, has not
been reached. In cases like this, adding one more class is allowed so that all the values in the set are accommodated.
Cumulative Frequency Distribution
The “less than” cumulative frequency distribution (<cf) is obtained by adding the frequencies
successively from the lowest to the highest class interval, while the “greater than” cumulative
frequency distribution (>cf) is obtained by adding the frequencies from the highest class
interval to the lowest class interval.
Class limits frequency <cf >cf
18-23 6 6 60
24-29 11 17 54
30-35 17 34 43
36-41 14 48 26
42-47 8 56 12
48-53 3 59 4
54-59 1 60 1
Class limits   frequency   <cf   >cf
2-4            3           3     60
5-7            9           12    57
8-10           14          26    48
11-13          18          44    34
14-16          10          54    16
17-19          6           60    6
N = 60
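Both cumulative columns of the first table (frequencies from Table 1.2) can be computed with `itertools.accumulate` from Python's standard library; this sketch simply reverses the list for the “greater than” column.

```python
from itertools import accumulate

freqs = [6, 11, 17, 14, 8, 3, 1]     # classes 18-23, 24-29, ..., 54-59 (Table 1.2)

less_than = list(accumulate(freqs))                      # add from the lowest class upward
greater_than = list(accumulate(reversed(freqs)))[::-1]   # add from the highest class downward

print(less_than)      # [6, 17, 34, 48, 56, 59, 60]
print(greater_than)   # [60, 54, 43, 26, 12, 4, 1]
```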
Relative Frequency Distribution
The relative frequency of a class is its frequency divided by the total frequency of all classes; it is
generally expressed as a percentage.

$$\text{Relative Frequency} = \frac{\text{Frequency of each class interval}}{\text{Total number of observations}}$$

Class limits   frequency   Relative Frequency   rf (%)
18-23          6           0.10                 10
24-29          11          0.183                18.3
30-35          17          0.283                28.3
36-41          14          0.233                23.3
42-47          8           0.133                13.3
48-53          3           0.05                 5
54-59          1           0.017                1.7
N = 60
Example 2.
Class limits   frequency   Relative Frequency   rf (%)
2-4            3           0.05                 5
5-7            9           0.15                 15
8-10           14          0.233                23.3
11-13          18          0.30                 30
14-16          10          0.167                16.7
17-19          6           0.10                 10
N = 60
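A short Python sketch of the relative frequency formula, applied to the frequencies of Table 1.2:

```python
freqs = {"18-23": 6, "24-29": 11, "30-35": 17, "36-41": 14, "42-47": 8, "48-53": 3, "54-59": 1}
total = sum(freqs.values())                                    # total number of observations
relative = {limits: f / total for limits, f in freqs.items()}  # frequency / total

for limits, rf in relative.items():
    print(f"{limits}   {freqs[limits]:>2}   {rf:.3f}   {rf * 100:.1f}%")
```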
Graphical Presentation – this form is the most effective means of organizing and presenting statistical data because
the important relationships are brought out more clearly and creatively in visual, colorful figures.

GRAPHICAL REPRESENTATION OF THE FREQUENCY DISTRIBUTION

The following graphs can be constructed to represent a frequency distribution:
A histogram is a graph of vertical or horizontal rectangles whose bases span the class boundaries (centered on the
class marks) and whose heights are the frequencies.
A frequency polygon is a line graph formed by plotting the frequencies against the class marks and connecting the
points.
An ogive is the graph of a cumulative frequency distribution. It is obtained by plotting the “less than” or
“greater than” cumulative frequencies against the class boundaries and connecting the points.
A pie chart is a circle graph showing the proportion of each class through either the relative or the percentage
frequency.
A bar chart is a graph of either vertical or horizontal rectangles whose bases represent the class
intervals and whose heights represent the frequencies.
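For the ogive, this Python sketch only computes the points to plot for Table 1.2; drawing them is left to whatever charting tool is at hand. It assumes the common convention of plotting “less than” cumulative frequencies at the upper class boundaries and “greater than” cumulative frequencies at the lower class boundaries, with an extra zero point to close each curve.

```python
from itertools import accumulate

c = 6                                         # class size in Table 1.2
lower_limits = [18, 24, 30, 36, 42, 48, 54]
freqs = [6, 11, 17, 14, 8, 3, 1]

lower_bounds = [L - 0.5 for L in lower_limits]       # 17.5, 23.5, ..., 53.5
upper_bounds = [L + c - 0.5 for L in lower_limits]   # 23.5, 29.5, ..., 59.5
less_than = list(accumulate(freqs))
greater_than = list(accumulate(reversed(freqs)))[::-1]

# "Less than" ogive: starts at 0 on the first lower boundary, then (upper boundary, <cf).
less_than_points = [(lower_bounds[0], 0)] + list(zip(upper_bounds, less_than))
# "Greater than" ogive: (lower boundary, >cf), ending at 0 on the last upper boundary.
greater_than_points = list(zip(lower_bounds, greater_than)) + [(upper_bounds[-1], 0)]

print(less_than_points)
print(greater_than_points)
```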
Quiz # 3:
A group of second-year Business Administration students took the qualifying examination for
admission to the Bachelor of Science in Accounting program. The results are as follows:

64 50 59 73 88 75 78 52 75 53
54 95 68 58 71 57 76 37 49 90
50 49 90 87 42 84 68 56 70 60
44 61 71 66 74 71 39 34 68 65
60 75 74 85 65 46 64 77 77 78
Prepare a frequency distribution using 7 classes starting with 34.
Include the columns for % relative frequency, less than cumulative frequency, and greater than
cumulative frequency.
A. The table below shows the frequency distribution of the Intelligence Quotient (IQ) of 200 students
taking up Behavioral Psychology and Business Statistics in a certain university.

Classes frequency (f)


131-135 7
126-130 10
121-125 14
116-120 35
111-115 42
106-110 21
101-105 18
96-100 29
91-95 10
86-90 14
N=200

Find the following:


1. Class interval size
2. Lower class boundary of the highest class interval.
3. Number of students whose IQ is greater than 115.5
4. Number of students whose IQ is less than 96
5. Lowest upper class limit
6. Upper class boundary of the 3rd class interval
7. Upper class limit of the 5th class interval
8. Number of students whose IQ is between 126-130
9. Highest upper class limit
10. Class mark of the 4th class of the highest frequency
Measures of Central Tendency
AUGUST 14, 2021, SATURDAY

• Measures of central tendency are numerical descriptive measures which indicate or locate the
center of the distribution or data set.
• In layman’s terms, a measure of central tendency is the average.
MEAN
The mean of the set of values or measurements is the sum of all the
measurements divided by the number of measurements in the set.
SAMPLE MEAN FOR UNGROUPED DATA

$$\bar{x} = \frac{\sum x}{n}$$

where: x̄ = mean
∑x = sum of the measurements or values
n = number of measurements
Example: Below are the travel times, in minutes, spent by
Kenneth in going to school last week.

DAY           TIME SPENT IN TRAVELLING
Monday        60 minutes
Tuesday       45 minutes
Wednesday     50 minutes
Thursday      53 minutes
Friday        47 minutes

$$\bar{x} = \frac{\sum x}{n} = \frac{60 + 45 + 50 + 53 + 47}{5} = \frac{255}{5} = 51 \text{ minutes}$$
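The same computation in Python, as a quick check of the arithmetic:

```python
times = {"Monday": 60, "Tuesday": 45, "Wednesday": 50, "Thursday": 53, "Friday": 47}

total = sum(times.values())      # Σx = 255
mean = total / len(times)        # divide by n = 5
print(mean)                      # 51.0
```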
Weighted Average Mean
Example: Below are Dona’s subjects and the corresponding number of units and grades she got for the first
grading period. Compute her grade point average (GPA).

SUBJECT             UNIT   GRADE
Math                1      80
English             1      82
Filipino            1      83
Science             2      81
Social Studies      1      80
PEHM                1.5    85
Technology & HE     2      82

$$\bar{x} = \frac{\sum xW}{\sum W} = \frac{80(1) + 82(1) + 83(1) + 81(2) + 80(1) + 85(1.5) + 82(2)}{1 + 1 + 1 + 2 + 1 + 1.5 + 2} = \frac{778.5}{9.5} = 81.95$$

Therefore, Dona has a GPA of 81.95 for the first grading period.
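A Python sketch of the weighted mean for Dona’s grades; the unit values act as the weights W.

```python
# (subject, units W, grade x) from Dona's first grading period
subjects = [
    ("Math", 1, 80), ("English", 1, 82), ("Filipino", 1, 83), ("Science", 2, 81),
    ("Social Studies", 1, 80), ("PEHM", 1.5, 85), ("Technology & HE", 2, 82),
]

sum_xw = sum(units * grade for _subject, units, grade in subjects)   # Σxw = 778.5
sum_w = sum(units for _subject, units, _grade in subjects)           # Σw = 9.5
gpa = sum_xw / sum_w
print(round(gpa, 2))   # 81.95
```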
MEAN FOR GROUPED DATA

To compute the mean for grouped data, we can use two formulas, namely:
1. the class mark formula and
2. the coded formula.
THE CLASSMARK FORMULA

$$\bar{x} = \frac{\sum fX}{N}$$

where: x̄ = mean
f = frequency
X = class mark
N = total frequency
Example: Below is the frequency distribution
of the scores of 40 students in Mathematics.

Class Interval F X fX
16-23 1 19.5 19.5
24-31 3 27.5 82.5
32-39 6 35.5 213
40-47 12 43.5 522
48-55 10 51.5 515
56-63 8 59.5 476
N=40 ΣfX= 1828
Solution:

$$\bar{x} = \frac{\sum fX}{N} = \frac{1828}{40} = 45.7$$
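Coded Formula
X = Xam + (Σfd/n)i
where:
Xam = assumed mean (the class mark of the class chosen as the origin, where d = 0)
f = frequency
d = coded deviation (the number of class intervals a class lies above or below the assumed-mean class)
n = total frequency
i = class size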
Example:
Class Interval   F   d   fd
16-23 1 -3 -3
24-31 3 -2 -6
32-39 6 -1 -6
40-47 12 0 0
48-55 10 1 10
56-63 8 2 16
N=40 Σfd= 11
Solution:

X = Xam + (Σfd/n)i
= 43.5+ (11/40) 8
= 43.5 + 2.2
= 45.7
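Both grouped-data formulas can be checked in Python on the 40-score distribution. In this sketch the 40-47 class is taken as the assumed-mean class, as in the worked example, so the coded deviations d run from -3 to 2.

```python
# (lower limit, upper limit, f) for the 40 Mathematics scores
classes = [(16, 23, 1), (24, 31, 3), (32, 39, 6), (40, 47, 12), (48, 55, 10), (56, 63, 8)]
n = sum(f for _, _, f in classes)              # total frequency, 40
i = classes[1][0] - classes[0][0]              # class size, 24 - 16 = 8

# Class mark formula: mean = ΣfX / N, where X = (lower + upper) / 2
sum_fX = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
mean_classmark = sum_fX / n

# Coded formula: the 40-47 class (index 3) is the origin, so d runs -3, -2, ..., 2
origin = 3
x_am = (classes[origin][0] + classes[origin][1]) / 2                    # assumed mean, 43.5
sum_fd = sum(f * (k - origin) for k, (_, _, f) in enumerate(classes))   # Σfd = 11
mean_coded = x_am + (sum_fd / n) * i

print(sum_fX, round(mean_classmark, 2), round(mean_coded, 2))
```

Both formulas give the same mean; the coded formula simply avoids multiplying large class marks by hand.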
CHARACTERISTICS OF THE MEAN
1.) The mean is the most appropriate measure of central tendency
when the data are in the interval or ratio scale.
2.) The mean lies between the largest and the smallest values or
measurements.
3.) There is only one value for the mean for a given set of values or
measurements.
4.) The mean is easily influenced by extreme values because all values
contribute to the average. If there are extremely high values, the mean tends to
be high also. If there are extremely low values, the mean tends to be
low also.
Seatwork:
The following are the test scores obtained by III-1 students in
Statistics. Compute the mean using a. the class mark formula and
b. the coded formula. What is the
average score obtained by the students?

Class Interval f
20-24 4
25-29 6
30-34 7
35-39 10
40-44 5
45-49 8
MEDIAN
Median is the middle value of a given set
of measurements, provided that the values or
measurements are arranged in an array. An
array is an arrangement of values in
increasing or decreasing order.
MEDIAN FOR UNGROUPED DATA

Example 1. The following are the ages of the Mathematics teachers in Pontevedra North Elementary
School: 21, 23, 32, 28, 25, 50, 48. Compute the median.

Arrange the data in an array: 21, 23, 25, 28, 32, 48, 50.
The median is 28.
Example 2. In an English test, eight students obtained the
following scores: 10, 15, 12, 18, 16, 20, 12, 14. Find
the median.

Arrange the scores in an array, that is,
10, 12, 12, 14, 15, 16, 18, 20.
Since there is an even number of scores, the median is the average of the two middle values:
(14 + 15)/2 = 14.5.
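The two cases above (odd and even n) fit in a small Python function, cross-checked here against `statistics.median` from the standard library; the function name `median` is just for illustration.

```python
import statistics

def median(values):
    """Middle value of the array; the average of the two middle values when n is even."""
    s = sorted(values)                   # arrange the data in an array
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

ages = [21, 23, 32, 28, 25, 50, 48]
english = [10, 15, 12, 18, 16, 20, 12, 14]
print(median(ages), median(english))     # 28 14.5
assert median(ages) == statistics.median(ages)
assert median(english) == statistics.median(english)
```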
MEDIAN FOR GROUPED DATA

For grouped data, we have the following formula for finding the median:

$$\text{Median} = l + \left(\frac{\frac{n}{2} - {<}cf}{f}\right) i$$

where l = lower class boundary of the median class
n = total frequency
<cf = less than cumulative frequency of the class preceding the median class
i = size of the class interval
f = frequency of the median class
The median class is the first class whose <cf reaches or exceeds n/2.
Let us illustrate how to compute the median of grouped data using the distribution of the test
scores of 40 students in Mathematics given in the example of the preceding section.
Example:

Class Interval   f    <cf
16-23            1    1
24-31            3    4
32-39            6    10
40-47            12   22
48-55            10   32
56-63            8    40
n = 40

Since n/2 = 20, the median class is 40-47, the first class whose <cf (22) reaches 20.

$$\text{Median} = l + \left(\frac{\frac{n}{2} - {<}cf}{f}\right) i = 39.5 + \left(\frac{20 - 10}{12}\right) 8 = 46.17$$
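A Python sketch of the grouped-median formula. It finds the median class as the first class whose <cf reaches n/2 and assumes whole-number class limits, so l = lower limit - 0.5. The function name `grouped_median` is hypothetical.

```python
def grouped_median(classes, i):
    """classes: (lower limit, upper limit, f) listed from the lowest to the highest class."""
    n = sum(f for _, _, f in classes)
    cf = 0                                   # <cf of the classes before the current one
    for lower, _, f in classes:
        if cf + f >= n / 2:                  # first class whose <cf reaches n/2
            l = lower - 0.5                  # lower class boundary of the median class
            return l + (n / 2 - cf) / f * i
        cf += f
    raise ValueError("empty distribution")

classes = [(16, 23, 1), (24, 31, 3), (32, 39, 6), (40, 47, 12), (48, 55, 10), (56, 63, 8)]
print(round(grouped_median(classes, 8), 2))  # 46.17
```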
CHARACTERISTICS OF THE MEDIAN
1.) The median is the most appropriate measure of central tendency for
ordinal data.
2.) The median lies between the highest and lowest measurements.
3.) There is only one value for the median in a given set of
measurements.
4.) The median is not influenced by extreme values.
5.) The median is used when the middle value is desired. It is the value
where 50% or half of the distribution lies above it and 50% lies below
it.
MODE
Mode is the value which occurs most frequently in a
set of measurements or values.

MODE FOR UNGROUPED DATA

The mode for ungrouped data is fairly easy to find. It
is just the value or measurement which occurs the most
number of times. In other words, it is the most popular
value.
A distribution may have only one mode. In this case,
the distribution is said to be unimodal. Data that have two
values for the mode are said to be bimodal. It is also
possible that the set of data is multimodal if there are
more than two values for the mode. If all the scores in a
set of data occur only once, then the set of data has no
mode.
Example 1: The data on the number of times 10 mothers go to
market every week are shown below.

Mother                               A  B  C  D  E  F  G  H  I  J
No. of times mother goes to market   2  1  3  3  1  3  2  3  3  2

Find the mode.

Solution: The mode is 3, which occurs five times. This means that going to market three times a week
is the most common among the mothers.
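The rules for unimodal, bimodal, multimodal, and no-mode data can be sketched in Python with `collections.Counter`. The helper names `modes` and `modality` are hypothetical, and the “no mode” check follows the text’s rule (every value occurs only once).

```python
from collections import Counter

def modes(values):
    """All values tied for the highest count; an empty list when every value occurs only once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                        # no mode
    return sorted(v for v, c in counts.items() if c == top)

def modality(values):
    m = len(modes(values))
    return {0: "no mode", 1: "unimodal", 2: "bimodal"}.get(m, "multimodal")

trips = [2, 1, 3, 3, 1, 3, 2, 3, 3, 2]   # mothers A to J
print(modes(trips), modality(trips))      # [3] unimodal
```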
MODE FOR GROUPED DATA
For grouped data, we have the formula to find the mode:

$$\text{Mode} = l_m + \left(\frac{f_m - f_a}{2f_m - f_a - f_b}\right) i$$

where lm = lower class boundary of the modal class
fm = frequency of the modal class
fa = frequency of the class just below the modal class
fb = frequency of the class just above the modal class
i = size of the class interval
Below are the steps in finding the mode of grouped data:
1.)Find the modal class. This is the class interval with the highest frequency.
2.)Use the formula to find the mode.
It is important to note that the formula for the mode given above holds only
for unimodal distributions. For multimodal distributions, a rough mode is
given by the formula
Mode = 3(Median) – 2(Mean)
Let us use the distribution of scores of 40 students in Mathematics to
illustrate how to compute the mode for grouped data.
Example: Find the mode of the data whose frequency
distribution is given below.
Class Interval f
16-23 1
24-31 3
32-39 6
40-47 12
48-55 10
56-63 8
n=40
Notice that the class intervals are arranged from lowest to highest group. The modal
class is the class interval 40-47. The lower class boundary of the modal class is 39.5,
the frequency of the modal class is 12, the frequency below the modal class is 6, the
frequency above the modal class is 10, and the size of the class interval is 8.
Substituting these values in the formula, we have

$$\text{Mode} = 39.5 + \left(\frac{12 - 6}{2(12) - 10 - 6}\right) 8 = 39.5 + 6 = 45.5$$
In case a distribution has at least 2 modes, a rough
mode can be computed as follows:

$$\text{Mode}_{\text{rough}} = 3\,\text{Mdn} - 2\,\text{Mn}$$

where Mdn = median
Mn = mean
CHARACTERISTICS OF THE MODE
1.) The mode is the most appropriate measure of central tendency
when the data are nominal in scale.
2.) The mode is the least reliable among the three measures of
central tendency because its value is undefined in some distributions.
3.) The mode is used when we want to find the value which occurs
most often.
4.) The mode is a quick approximation of the average. The mode is
sometimes referred to as an inspection average.

SEATWORK:
The following data give the time (in minutes) taken to commute
from home to school for 20 students of LCCC.
10 50 64 33 48 5 11 23 37 26
26 32 17 7 13 19 29 43 21 22
Construct a grouped data frequency distribution table with 5 classes.

Find the mean, median & mode


OTHER MEASURES OF LOCATION
The measure of central tendency is a measure of location. It indicates the
center of a given data. Other descriptive measures which are used to locate the
position of values or scores in the distribution are quartiles, deciles, and
percentiles. In this book, we shall not discuss these measures comprehensively, but
instead we shall just define them.
Recall that the median is the value where 50% of the distribution falls or lies
above it while 50% of the distribution lies below it. If the distribution is pictured as a line
segment, the median is its midpoint.
In other words, the median is the value which divides the distribution into
two equal parts. We define the quartiles, deciles, and percentiles in a similar
manner. These descriptive measures – quartiles, deciles, and percentiles – are
called fractiles.
Quartiles are values which divide the distribution into four equal
parts.
Deciles are values which divide the distribution into ten equal parts.
Percentiles are values which divide the distribution into 100 equal
parts.
The first quartile (Q1) is the value where 25% of the distribution
lies below it while 75% of the distribution lies above it. The third
quartile (Q3) is the value where 75% of the distribution lies
below it while 25% of the distribution lies above it. Observe that
the median is equal to the second quartile (Q2).
Computation of the Quantiles for Ungrouped Data
To determine any quantile, first express it as a percentile (for example, D4 = P40 and Q3 = P75), then
follow the steps below.
Step 1. Arrange the scores according to magnitude or size.
Step 2. Find the position of the given percentile/decile/quartile
in the distribution using the formula P(n + 1)/100.
Step 3. Locate the score corresponding to the obtained position
in the distribution, starting from the lowest score.
Step 4. Interpolate to get the score if the position obtained in
Step 2 is not a whole number.
Formulas for the position:
Position of Px = x(n + 1)/100
Position of Dx = x(n + 1)/10
Position of Qx = x(n + 1)/4
Example 1. Find the 20th percentile or P20 of the following scores:
25, 22, 20, 16, 17, 12, 8, 6, 5
Steps:
1. Locate the position of the score corresponding to the 20th
percentile using Px = x(n + 1)/100.
Solution: P20 = 20(9 + 1)/100 = 2
2. Arrange the scores from lowest to highest (5, 6, 8, 12, 16, 17, 20, 22, 25) and locate the 2nd score
from the lowest. The answer is 6.
3. Hence, the 20th percentile or P20 is 6.

This means that 20% of the cases scored below 6.


Example 2. Find the 60th percentile or P60 of the following
scores: 99, 95, 80, 75, 70, 60, 40
Steps: 1. Compute the position of the 60th percentile using Px = x(n + 1)/100.
Solution: P60 = 60(7 + 1)/100 = 4.8
2. Since 4.8 is between the 4th and the 5th score from the bottom, we have to
interpolate to find the answer.
Take the 4th score from the bottom which is 75 and the 5th score which is 80.
3. Solve the difference between these two scores, 80-75, which is 5.
4. Multiply the difference 5 by the decimal part obtained in step 1 which is 0.8.
The product is 4.
5. Finally, add this product to the lower score, 75. Thus, P60 = 75 + 4 = 79.
This means that 60 % of the cases fall below the score 79.
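The four steps can be sketched in Python. Deciles and quartiles are converted to percentiles first, as the text suggests (Dx = P10x, Qx = P25x). Returning the lowest or highest score when the position falls outside 1 to n is an added assumption; the text does not cover that case. The function names are hypothetical.

```python
def percentile(scores, p):
    """p-th percentile using the position p(n + 1)/100 and linear interpolation."""
    s = sorted(scores)                 # Step 1: arrange from lowest to highest
    n = len(s)
    pos = p * (n + 1) / 100            # Step 2: position counted from the lowest score
    k = int(pos)                       # whole part of the position
    frac = pos - k                     # decimal part, used in Step 4
    if k < 1:                          # position falls before the first score
        return s[0]
    if k >= n:                         # position falls at or beyond the last score
        return s[-1]
    return s[k - 1] + frac * (s[k] - s[k - 1])   # Steps 3-4: locate, then interpolate

def decile(scores, x):
    return percentile(scores, 10 * x)  # Dx is the same as P(10x)

def quartile(scores, x):
    return percentile(scores, 25 * x)  # Qx is the same as P(25x)

example1 = [25, 22, 20, 16, 17, 12, 8, 6, 5]
example2 = [99, 95, 80, 75, 70, 60, 40]
print(percentile(example1, 20), round(percentile(example2, 60), 2))
```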
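Computations of the Quantiles for Grouped Data
The computation of any quantile for grouped data is similar to that of the
median. The formula is

$$P_x = lb + \left(\frac{\frac{xn}{100} - cf}{f}\right) i$$

where: Px = the desired percentile
lb = exact lower limit of the class interval containing Px
n = number of cases
cf = cumulative frequency immediately below the class interval containing Px
f = frequency of the class interval containing Px
i = class size
For deciles and quartiles, xn/100 is replaced by xn/10 and xn/4, respectively:

$$D_x = lb + \left(\frac{\frac{xn}{10} - cf}{f}\right) i \qquad Q_x = lb + \left(\frac{\frac{xn}{4} - cf}{f}\right) i$$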
Computations of Some Quantiles
Class Interval f <cf
64-67 7 90
60-63 6 83
56-59 7 77
52-55 5 70
48-51 10 65
44-47 14 55
40-43 10 41
36-39 5 31
32-35 9 26
28-31 7 17
24-27 6 10
20-23 4 4
n=90
Compute for Q3 or P75
Solution:
Step 1: Compute for xn/4
3(90)/4 = 67.5
Step 2: Locate 67.5 under the <cf column. The value 67.5 falls within the <cf of 70, which corresponds
to the class interval 52-55. Therefore, Q3 or P75 lies within this class interval.
Step 3: Find the exact lower limit of 52 which is 51.5. So, lb = 51.5
Step 4: Find the <cf value immediately below the class interval, 52-55. Thus <cf = 65.
Step 5: Find f value or frequency of the class interval, 52-55. So, f = 5.
Step 6: Get the class size which is 4. So, i = 4.
Step 7: Compute for Q3 or P75 by substituting all the needed values in the formula.
$$P_{75} = lb + \left(\frac{\frac{xn}{100} - cf}{f}\right) i = 51.5 + \left(\frac{67.5 - 65}{5}\right) 4 = 51.5 + 2 = 53.5$$
This means that 75% of the cases lie below the score 53.5.
Solve for D4 or P40
Solution:
xn/10 = 4(90) / 10 = 36
<cf = 31
f = 10
i =4
lb = 39.5
Substitute the values in the formula:
$$D_4 = lb + \left(\frac{\frac{xn}{10} - cf}{f}\right) i = 39.5 + \left(\frac{36 - 31}{10}\right) 4 = 39.5 + 2 = 41.5$$
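A Python sketch of the grouped quantile formula that covers Px, Dx, and Qx through the divisor (100, 10, or 4). It expects the classes from lowest to highest, so the 90-case table above is listed in reverse, and it assumes whole-number limits so that lb = lower limit - 0.5. The function name `grouped_quantile` is hypothetical.

```python
def grouped_quantile(classes, i, x, parts):
    """x-th quantile of grouped data: parts = 100 for Px, 10 for Dx, 4 for Qx.

    classes holds (lower limit, f) pairs from the lowest to the highest class.
    """
    n = sum(f for _, f in classes)
    target = x * n / parts                 # xn/100, xn/10 or xn/4
    cf = 0                                 # <cf of all classes below the current one
    for lower, f in classes:
        if f > 0 and cf + f >= target:     # first class whose <cf reaches the target
            lb = lower - 0.5               # exact lower limit
            return lb + (target - cf) / f * i
        cf += f
    raise ValueError("x must not exceed parts")

# The 90-case table above, from the lowest class (20-23) to the highest (64-67).
table = [(20, 4), (24, 6), (28, 7), (32, 9), (36, 5), (40, 10),
         (44, 14), (48, 10), (52, 5), (56, 7), (60, 6), (64, 7)]
print(grouped_quantile(table, 4, 3, 4))    # Q3 = 53.5
print(grouped_quantile(table, 4, 4, 10))   # D4 = 41.5
```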
Find the Q1, P36, & D8 of the data whose frequency distribution is
given below.

Class Interval f <cf


16-23 1 1
24-31 3 4
32-39 6 10
40-47 12 22
48-55 10 32
56-63 8 40
n=40
1.) Q1
Solution:
xn/4 = 1(40) / 4 = 10
<cf = 4
f =6
i =8
lb = 31.5

Substitute the values in the formula:


$$Q_1 = lb + \left(\frac{\frac{xn}{4} - cf}{f}\right) i = 31.5 + \left(\frac{10 - 4}{6}\right) 8 = 31.5 + 8 = 39.5$$
2.) P36
Solution:
xn/100 = 36(40) / 100 = 14.4
<cf = 10
f = 12
i =8
lb = 39.5
Substitute the values in the formula:
$$P_{36} = lb + \left(\frac{\frac{xn}{100} - cf}{f}\right) i = 39.5 + \left(\frac{14.4 - 10}{12}\right) 8 = 39.5 + 2.9 = 42.4$$
3.) D8
Solution:
xn/10 = 8(40) / 10 = 32
<cf = 22
f = 10
i =8
lb = 47.5
Substitute the values in the formula:
$$D_8 = lb + \left(\frac{\frac{xn}{10} - cf}{f}\right) i = 47.5 + \left(\frac{32 - 22}{10}\right) 8 = 47.5 + 8 = 55.5$$
CHARACTERISTICS AND USES OF THE QUARTILE
There are actually three quartiles: the first, the second, and
the third quartiles. The second quartile is the median. The
quartiles are computed and used when a group is to be divided
into four equal subgroups according to some traits or
characteristics such as ability. The quartiles are also used in the
computation of the quartile deviation, a measure of variability.
They are also used to determine which members of a group belong
to the lower quartile, the middle 50%, or the upper quartile.
USES OF PERCENTILE
The percentile is computed and used when:
1. a scaled group of scores is to be divided into 100 equal
subgroups.
2. percentile bands are needed, that is, when a group of scores is to be
divided into a number of subgroups with equal or unequal number of
scores in the subgroups. This is used especially in the transmutation of
raw scores into school marks or grades.
3. percentile ranks of scores are desired. When there is a large number
of scores, percentile ranks are more useful than ordinal ranks.

MEASURES OF VARIABILITY OR DISPERSION
Measures of Variability or Dispersion are measures of the average distance
of each observation from the center of the distribution. They measure the
homogeneity or heterogeneity of a particular group.
A small measure of variability would indicate that the data are:
• clustered closely around the mean;
• more homogeneous;
• less variable;
• more consistent; and
• more uniformly distributed.
Consider the following sets of grades in
Mathematics of two groups of 5 students each:
Male Group             Female Group
Vincent Troy: 70       Marianne Joy: 82
Kensly Kyle: 95        Kwenn Anne: 80
Patrick: 60            Patricia: 83
Gabriel: 80            Olivia: 81
Jerod: 100             Lurice: 79
Mean: 81               Mean: 81
The mean grade of both groups is 81. By just looking at their mean grade, we can only conclude that both
groups performed equally well in the said test, but this does not explain how far apart the grades are from one
another. Let us picture the position of each grade in a number line.

[Figure: number lines showing the positions of the males' grades and the females' grades, from 60 to 100]
Notice that the grades of the males are far apart from each other, while the grades of the females are more
compressed or clustered together. Thus the measure of the center of the distribution is of little help in
describing and comparing these two sets of data. By getting the average distance of each item from the center
of the distribution, the group can be described more completely and, likewise, similarities and differences can
be easily identified.
DIFFERENT MEASURES OF VARIABILITY
1. RANGE
2. AVERAGE DEVIATION or MEAN ABSOLUTE DEVIATION
3. STANDARD DEVIATION
4. VARIANCE
Other Measures of Dispersion
1. Interquartile Range
2. Quartile Deviation or Semi-Interquartile Range
3. The 10-90 Percentile Range
THE RANGE
The range refers to the difference between the highest and the lowest score. If the highest
score is 25 and the lowest score in a distribution is 10, then the range is equal to 25-10 = 15.
This range is specifically called the exclusive range. If the difference between the exact
upper limit of the highest score and the exact lower limit of the lowest score is computed, the result
is called the inclusive range. Using the same example, we get 25.5 – 9.5 = 16. The inclusive
range is simply determined by adding 1 to the exclusive range. Generally, the exclusive range is
used for ungrouped data while the inclusive range is used for grouped data.
The range is the easiest and simplest to determine among the measures of variability
because it depends only on the pair of extreme values. However, it is also the most unstable
because its value easily fluctuates with the change in either of the highest or lowest scores. It is
also considered the most unreliable because it does not give the dispersion or spread of the
scores in between the two extreme values. A more reliable measure should involve all the values
in a distribution to provide us an adequate spread of all the scores from the average.
Example 1. Find the range of the grades in Math of the two groups of students in the preceding
example.
Male: 100-60 = 40
Female: 83-79=4
The range of grades of the male group is 40 while that of female group is 4. This shows that the grades
of the males are scattered while the grades of the female group are close to each other. It shows further
that females are more homogeneous than the males in their math ability.
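Both kinds of range in a short Python sketch; the inclusive version assumes whole-number scores so that the exact limits are the score minus or plus 0.5.

```python
male = [70, 95, 60, 80, 100]
female = [82, 80, 83, 81, 79]

def exclusive_range(scores):
    return max(scores) - min(scores)                    # highest minus lowest

def inclusive_range(scores):
    # exact upper limit of the highest minus exact lower limit of the lowest
    return (max(scores) + 0.5) - (min(scores) - 0.5)

print(exclusive_range(male), exclusive_range(female))   # 40 4
print(inclusive_range([25, 10]))                        # 16.0
```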
Uses of the Range
The range may be computed and used when:
• the spread or scatter of the scores is all that is wanted;
• the scores are very few;
• there are extremely low and extremely high scores, so that there are big gaps between the scores; and
• a rough comparison of the variability of two or more groups is desired. A group whose range is
smaller is more homogeneous than a group with a bigger range.
THE INTERQUARTILE RANGE AND QUARTILE
DEVIATION/SEMI-INTERQUARTILE RANGE

The interquartile range is the distance between the quartiles
Q1 and Q3. It includes the middle 50% of the scores. But it is the
semi-interquartile range, called the quartile deviation, which is more
commonly used as a measure of dispersion or variability.

$$IQR = Q_3 - Q_1 \qquad SIQR = \frac{Q_3 - Q_1}{2}$$

COMPUTATION OF THE IQR & SIQR (QUARTILE DEVIATION)

Below is the class frequency distribution of test scores in Math 2.


Class Interval    f    <cf
95 - 99           2    50
90 - 94           4    48
85 - 89           7    44
80 - 84           9    37
75 - 79          10    28
70 - 74           8    18
65 - 69           6    10
60 - 64           3     4
55 - 59           1     1
n = 50
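Quartile formula for grouped data, where k = 1 for Q1 and k = 3 for Q3, lb is the lower boundary of the
quartile class, <cf is the cumulative frequency below that class, f is the frequency of the quartile class,
and i is the class size:

$$Q_k = lb + \left(\frac{\frac{kn}{4} - {<}cf}{f}\right) i$$

Solve for Q1:
kn/4 = 1(50)/4 = 12.5, so the Q1 class is 70 - 74
<cf = 10, f = 8, i = 5, lb = 69.5

$$Q_1 = 69.5 + \left(\frac{12.5 - 10}{8}\right)5 = 69.5 + 1.56 = 71.06$$

Solve for Q3:
kn/4 = 3(50)/4 = 37.5, so the Q3 class is 85 - 89
<cf = 37, f = 7, i = 5, lb = 84.5

$$Q_3 = 84.5 + \left(\frac{37.5 - 37}{7}\right)5 = 84.5 + 0.36 = 84.86$$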
Solve for IQR:
IQR = Q3 – Q1 = 84.86 – 71.06 = 13.8

Solve for SIQR:

$$\mathrm{SIQR} = \frac{Q_3 - Q_1}{2} = \frac{13.8}{2} = 6.9$$
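The quartile formula for grouped data can be sketched in Python as follows. This is a minimal illustration, assuming whole-number class limits (so the lower boundary is the lower limit minus 0.5 and the class size is upper minus lower plus 1). Note that the classes are listed from the lowest class up, the reverse of the table, so that the running total is the <cf of the next class. The function name `grouped_quartile` is hypothetical.

```python
# Math 2 scores as (lower limit, upper limit, f), listed from the LOWEST class up
# so that the running total is the "<cf" of the following class.
math2 = [
    (55, 59, 1), (60, 64, 3), (65, 69, 6), (70, 74, 8), (75, 79, 10),
    (80, 84, 9), (85, 89, 7), (90, 94, 4), (95, 99, 2),
]


def grouped_quartile(classes, k):
    """Q_k = lb + ((k*n/4 - <cf) / f) * i for grouped data with whole-number limits."""
    n = sum(f for _, _, f in classes)
    position = k * n / 4
    cf_below = 0  # <cf: total frequency of the classes below the current class
    for lower, upper, f in classes:
        if cf_below + f >= position:      # this class contains the (k*n/4)-th score
            lb = lower - 0.5              # lower class boundary
            i = upper - lower + 1         # class size
            return lb + (position - cf_below) / f * i
        cf_below += f
    raise ValueError("k must satisfy 0 <= k <= 4")


q1 = grouped_quartile(math2, 1)
q3 = grouped_quartile(math2, 3)
iqr = q3 - q1
siqr = iqr / 2
print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}, SIQR = {siqr:.2f}")
```

Without intermediate rounding the IQR comes out near 13.79 instead of 13.8, because the hand solution rounds Q1 and Q3 to two decimals before subtracting.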
USES OF THE QUARTILE DEVIATION
The quartile deviation/semi-interquartile range may be computed and used under the following conditions:
• When the measure of central tendency is the median. The quartile deviation is used with the median
when classifying students.
• When the homogeneity or heterogeneity of a group is to be determined. When the quartile deviation is
small, the group is more or less homogeneous; when it is large, the group is more or less heterogeneous.
• When there are extremely high and low scores, especially when there are big gaps between scores. This
is because other measures of variability, especially the standard deviation, are seriously distorted under
this condition.
• When the main concern is the concentration of the middle 50% of the scores around the median.
THE AVERAGE OR MEAN ABSOLUTE DEVIATION (MAD)
The mean absolute deviation is the average of the absolute deviations of the observations from the mean:

$$\mathrm{MAD} = \frac{\sum |x - \bar{x}|}{N}$$

where x is a value or score from the raw data, x̄ is the mean, and N is the total number of scores.
Example 1. A. Find the mean absolute deviation of the male group in Example 1.
Solution: The mean of the male group is 81.
Score (x)   Mean (x̄)   |x - x̄|
70          81          11
95          81          14
60          81          21
80          81           1
100         81          19
                        Σ|x - x̄| = 66

$$\mathrm{MAD} = \frac{\sum |x - \bar{x}|}{n} = \frac{66}{5} = 13.2$$
B. Find the mean absolute deviation of the female group.
Solution: The mean of the female group is 81.
Score (x)   Mean (x̄)   |x - x̄|
82          81          1
80          81          1
83          81          2
81          81          0
79          81          2
                        Σ|x - x̄| = 6

$$\mathrm{MAD} = \frac{\sum |x - \bar{x}|}{n} = \frac{6}{5} = 1.2$$

The male group has a MAD of 13.2 while the female group has 1.2. These results confirm our earlier findings
using the range that the female group is more homogeneous than the male group.
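The two MAD computations above can be checked with a short Python sketch. It divides by the number of scores, matching the formula in this section; the function name is illustrative.

```python
def mean_absolute_deviation(scores):
    """MAD = sum of |x - mean| divided by the number of scores."""
    mean = sum(scores) / len(scores)
    return sum(abs(x - mean) for x in scores) / len(scores)


male = [70, 95, 60, 80, 100]
female = [82, 80, 83, 81, 79]
print(f"Male MAD = {mean_absolute_deviation(male):.1f}")
print(f"Female MAD = {mean_absolute_deviation(female):.1f}")
```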
VARIANCE
Variance is the average of the squared deviations from the mean. The formulas for the variance of
ungrouped data are:

$$\text{Population variance: } V_p = \frac{\sum (x - \bar{x})^2}{N} \qquad \text{Sample variance: } V_s = \frac{\sum (x - \bar{x})^2}{n - 1}$$

STANDARD DEVIATION
Standard deviation is the square root of the average squared deviation from the mean, or simply the square
root of the variance.

$$\text{Population SD: } SD_p = \sqrt{\frac{\sum (x - \bar{x})^2}{N}} \qquad \text{Sample SD: } SD_s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Example 4. Find the variance and the standard deviation of the male group in Example 1.

Male group:
x      x̄     x - x̄    (x - x̄)²
70     81     -11       121
95     81      14       196
60     81     -21       441
80     81      -1         1
100    81      19       361
                        Σ(x - x̄)² = 1120

a. Treating the data as a population, the variance and the standard deviation are:

$$V_p = \frac{\sum (x - \bar{x})^2}{N} = \frac{1120}{5} = 224 \text{ square units}, \qquad SD_p = \sqrt{224} = 14.97 \text{ units}$$

b. Treating the data as a sample, the variance and the standard deviation are:

$$V_s = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{1120}{5 - 1} = \frac{1120}{4} = 280 \text{ square units}, \qquad SD_s = \sqrt{280} = 16.73 \text{ units}$$
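Both cases of Example 4 can be reproduced with a small Python sketch; the only difference between them is the divisor, N for a population and n - 1 for a sample. The `variance` helper is illustrative; Python's standard `statistics` module also provides `pvariance`/`pstdev` (population) and `variance`/`stdev` (sample), which follow the same definitions and can serve as a cross-check.

```python
import math


def variance(scores, sample=False):
    """Sum of squared deviations from the mean, divided by N (population) or n - 1 (sample)."""
    n = len(scores)
    mean = sum(scores) / n
    squared_deviations = sum((x - mean) ** 2 for x in scores)
    return squared_deviations / (n - 1 if sample else n)


male = [70, 95, 60, 80, 100]
vp = variance(male)                 # population variance
vs = variance(male, sample=True)    # sample variance
print(f"Vp = {vp:.0f}, SDp = {math.sqrt(vp):.2f}")
print(f"Vs = {vs:.0f}, SDs = {math.sqrt(vs):.2f}")
```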
• Find the variance and the standard deviation of the female group.
An equivalent formula gives another way of computing the standard deviation. It does away with
computing the deviation of each score from the mean, since it uses only the sum of the scores and the
sum of their squares:

$$SD_s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$$

Male Group:
x      x²
70     4900
95     9025
60     3600
80     6400
100    10000
Σx = 405    Σx² = 33925

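$$SD_s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\frac{33925 - \frac{(405)^2}{5}}{5 - 1}} = \sqrt{\frac{33925 - \frac{164025}{5}}{4}} = \sqrt{\frac{33925 - 32805}{4}} = \sqrt{\frac{1120}{4}} = \sqrt{280} = 16.73 \text{ units}$$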

variance = 280, standard deviation = 16.73
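The computational formula can be sketched in Python to confirm that it gives the same sample variance as the deviation method. The function name is hypothetical.

```python
import math


def sample_variance_shortcut(scores):
    """Computational formula: (sum(x^2) - (sum(x))^2 / n) / (n - 1); no deviations needed."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


male = [70, 95, 60, 80, 100]
vs = sample_variance_shortcut(male)
print(f"sum(x) = {sum(male)}, sum(x^2) = {sum(x * x for x in male)}")
print(f"variance = {vs:.0f}, standard deviation = {math.sqrt(vs):.2f}")
```

This form is convenient by hand, but when the scores are large and close together, subtracting two nearly equal large sums can lose floating-point precision, which is why software tends to prefer the deviation-based formula.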


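STANDARD DEVIATION AND VARIANCE OF GROUPED DATA

Using coded deviations, where d' is the number of class intervals a class lies above (+) or below (-) a
chosen class (where d' = 0), i is the class size and n = Σf:

$$SD = i\sqrt{\frac{\sum f(d')^2}{n} - \left(\frac{\sum fd'}{n}\right)^2}$$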

Example: The grouped data below gives the daily wages of the laborers in a certain construction project.
Wages    No. of Laborers (f)    d'    fd'    f(d')² or d'(fd')
80-84     8                      2     16     32
75-79    12                      1     12     12
70-74    16                      0      0      0
65-69    13                     -1    -13     13
60-64     9                     -2    -18     36
Σf = 58                               Σfd' = -3    Σf(d')² = 93
$$SD = i\sqrt{\frac{\sum f(d')^2}{n} - \left(\frac{\sum fd'}{n}\right)^2} = 5\sqrt{\frac{93}{58} - \left(\frac{-3}{58}\right)^2} = 5\sqrt{1.60345 - (-0.05172)^2} = 5\sqrt{1.60345 - 0.00268} = 5\sqrt{1.60077} = 5(1.26522) = 6.326$$
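The coded-deviation method can be sketched in Python as below. It assumes equal class widths and classes listed highest first, as in the table; `grouped_sd_coded` and its `zero_row` parameter are hypothetical names. Which class is given d' = 0 does not change the result, because shifting every d' by the same amount leaves the spread unchanged.

```python
import math

# Wage classes as (lower limit, upper limit, number of laborers), highest class first as in the table.
wages = [(80, 84, 8), (75, 79, 12), (70, 74, 16), (65, 69, 13), (60, 64, 9)]


def grouped_sd_coded(classes, zero_row):
    """SD = i * sqrt(sum f*d'^2 / n - (sum f*d' / n)^2).

    d' counts class intervals above (+) or below (-) the class in row `zero_row`,
    where d' = 0. Classes must be listed highest first and have equal widths.
    """
    i = classes[0][1] - classes[0][0] + 1   # class size
    n = sum(f for _, _, f in classes)
    sum_fd = 0
    sum_fd2 = 0
    for row, (_, _, f) in enumerate(classes):
        d = zero_row - row                  # rows above zero_row get positive d'
        sum_fd += f * d
        sum_fd2 += f * d * d
    return i * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)


sd = grouped_sd_coded(wages, zero_row=2)    # d' = 0 at the 70-74 class, as in the table
print(f"SD = {sd:.3f}")
```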

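Another formula, using the class midpoints x:

$$V_p = \frac{\sum f(x - \bar{x})^2}{N}, \qquad SD_p = \sqrt{\frac{\sum f(x - \bar{x})^2}{N}}$$

$$V_s = \frac{\sum f(x - \bar{x})^2}{n - 1}, \qquad SD_s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}}$$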
Example: The grouped data below gives the daily wages of the laborers in a certain construction project
(X = class midpoint, x̄ = mean).
Wages   f    X    fX      x̄       X - x̄    (X - x̄)²    f(X - x̄)²
80-84   8    82   656     71.74   10.26    105.2676    842.1408
75-79   12   77   924     71.74    5.26     27.6676    332.0112
70-74   16   72   1152    71.74    0.26      0.0676      1.0816
65-69   13   67   871     71.74   -4.74     22.4676    292.0788
60-64   9    62   558     71.74   -9.74     94.8676    853.8084
Σf = 58       Σfx = 4161                              Σf(X - x̄)² = 2321.1208
Mean = Σfx / N = 4161 / 58 = 71.74

Population variance and population standard deviation:

$$V_p = \frac{\sum f(x - \bar{x})^2}{N} = \frac{2321.1208}{58} = 40.02, \qquad SD_p = \sqrt{40.02} = 6.326$$

Sample variance and sample standard deviation:

$$V_s = \frac{\sum f(x - \bar{x})^2}{n - 1} = \frac{2321.1208}{57} = 40.72, \qquad SD_s = \sqrt{40.72} = 6.38$$
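The midpoint method can be sketched in Python as follows, assuming each class midpoint is (lower limit + upper limit) / 2. This sketch keeps the mean unrounded (about 71.7414), so the intermediate sum of squares may differ from the table's 2321.1208 in the last decimal places; the rounded results still match. The population SD also agrees with the coded-deviation result. The function name is illustrative.

```python
import math

wages = [(80, 84, 8), (75, 79, 12), (70, 74, 16), (65, 69, 13), (60, 64, 9)]


def grouped_mean_variance(classes, sample=False):
    """Mean = sum(f*x)/n; variance = sum(f*(x - mean)^2) / N (or n - 1), x = class midpoint."""
    n = sum(f for _, _, f in classes)
    midpoints = [((lower + upper) / 2, f) for lower, upper, f in classes]
    mean = sum(f * x for x, f in midpoints) / n
    squared = sum(f * (x - mean) ** 2 for x, f in midpoints)
    return mean, squared / (n - 1 if sample else n)


mean, vp = grouped_mean_variance(wages)
_, vs = grouped_mean_variance(wages, sample=True)
print(f"mean = {mean:.2f}")
print(f"Vp = {vp:.2f}, SDp = {math.sqrt(vp):.3f}")
print(f"Vs = {vs:.2f}, SDs = {math.sqrt(vs):.2f}")
```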
Exercises:

1. The IQs of the 5 members of a family are 108, 112, 127, 118 and 113.
Find the
a. Range
b. Variance
c. Standard Deviation

2. Below is a frequency distribution showing the result of an IQ test of a group of students in a
certain college.
Classes f
80-85 2
86-91 8
92-97 19
98-103 21
104-109 25
110-115 52
116-121 12
122-127 11

Find: a. Range    b. Mean Absolute Deviation    c. Population Variance    d. Population Standard Deviation
3. For 108 randomly selected high school students, the following IQ frequency distribution was
obtained.
Class limits frequency
90-98 8
99-107 22
108-116 43
117-125 28
126-134 9
N=

Find the population variance and population standard deviation.


COEFFICIENT OF VARIATION
• To relate a measure of dispersion to its average and express it in percent form, the standard
deviation is divided by the arithmetic mean. Stating the measure as a percentage solves the problem of
comparing groups measured in different units. The resulting measure, developed by Pearson, is known as
the coefficient of variation:

$$CV = \frac{SD}{\bar{x}} \times 100\%$$
Other comparative coefficients of dispersion may be computed from the other measures of dispersion:
Quartile Coefficient of Dispersion

$$CV_q = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100\%$$
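Both coefficients can be sketched in Python and applied to results already worked out in this section: the male group (mean 81, population variance 224) and the Math 2 quartiles (Q1 = 71.06, Q3 = 84.86). Using the population SD for the CV is an assumption here, since the section does not say which SD to use. The function names are illustrative.

```python
import math


def coefficient_of_variation(sd, mean):
    """CV = (SD / mean) x 100%."""
    return sd / mean * 100


def quartile_coefficient(q1, q3):
    """CVq = (Q3 - Q1) / (Q3 + Q1) x 100%."""
    return (q3 - q1) / (q3 + q1) * 100


# Male group: mean 81, population variance 224 (assumption: population SD is used).
cv_male = coefficient_of_variation(math.sqrt(224), 81)
# Math 2 grouped scores: Q1 = 71.06 and Q3 = 84.86.
cvq_math2 = quartile_coefficient(71.06, 84.86)
print(f"CV (male group) = {cv_male:.2f}%")
print(f"CVq (Math 2) = {cvq_math2:.2f}%")
```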
Example 1. Below is a summary of the sales of 100 sales representatives. Compute the coefficient of
variation.
Amount (in thousands)    Number of Sales Representatives
70-74                    12
75-79                    18
80-84                    25
85-89                    29
90-94                     9
95-99                     4
100-104                   3
Total                   100
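One way to work this example is sketched below in Python. It uses the class midpoints for the mean and, as an assumption, the population standard deviation (dividing by N); dividing by n - 1 instead would give a slightly larger SD and CV.

```python
import math

# Sales (in thousands) as (lower limit, upper limit, number of sales representatives).
sales = [(70, 74, 12), (75, 79, 18), (80, 84, 25), (85, 89, 29),
         (90, 94, 9), (95, 99, 4), (100, 104, 3)]

n = sum(f for _, _, f in sales)
midpoints = [((lower + upper) / 2, f) for lower, upper, f in sales]
mean = sum(f * x for x, f in midpoints) / n
# Assumption: population standard deviation (divide by N); use n - 1 for a sample.
sd = math.sqrt(sum(f * (x - mean) ** 2 for x, f in midpoints) / n)
cv = sd / mean * 100
print(f"n = {n}, mean = {mean:.2f}, SD = {sd:.2f}, CV = {cv:.2f}%")
```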
Answer the following items. Show your solution.
1. Below is the frequency distribution showing the result of an IQ test of a group of students in a certain
college.
Classes frequency(f)
80-85 2
86-91 8
92-97 19
98-103 21
104-109 25
110-115 52
116-121 12
122-127 11

   1. Determine the following using the frequency distribution above:
      a. size of the class interval
      b. median class
      c. lower boundary of the median class
      d. midpoint of the median class
      e. upper limit of the median class
      f. modal class
      g. lower limit of the modal class
      h. number of students with an IQ greater than 115.5
      i. midpoint of the modal class
   2. Compute the value of the following:
      a. Mean    b. Mode    c. Q2    d. P75    e. D3
2. The NSAT scores of 12 students in a certain college were taken and are shown below.
86, 95, 84, 87, 91, 90, 99, 84, 83, 88, 96, 92
Determine:
a. Mean    b. Median    c. Q3    d. P4
3. Solve for the range, quartile deviation, mean absolute deviation, standard deviation and
coefficient of variation of the data below.

The following are the heights of 50 students in LCCC (expressed in inches):

x (heights) f
60-64 6
65-69 13
70-74 15
75-79 12
80-84 4
End of Session

Thank you!
