Lesson 4
Definition of terms:
Statistics is a branch of mathematics that deals with the collection, organization or presentation, analysis,
and interpretation of data. Its fundamental purpose is to describe and draw inferences about the numerical
properties of a population.
Luke 2:1-4
In those days a decree went out from Caesar Augustus that all the world should be registered. 2 This was
the first registration when Quirinius was governor of Syria. 3 And all went to be registered, each to his own
town. 4 And Joseph also went up from Galilee, from the town of Nazareth, to Judea, to the city of David, which is
called Bethlehem, because he was of the house and lineage of David…
principles of descriptive statistics.
Among the measurements falling under descriptive statistics are the measures of central tendency, measures of
variability, skewness, kurtosis, minimum, maximum, summation and other items which help in describing a data
set.
Descriptive statistics answers questions such as:
1. How many students are interested in taking online classes?
2. Which months have the highest and the lowest number of COVID-19 positive cases?
3. What are the most likable Netflix series according to students?
4. Who performed better in the entrance examination?
5. What proportion of the ULS college students like online classes?
Inferential Statistics is a statistical procedure used to draw inferences about the population on the
basis of the information obtained from the sample. With inferential statistics, you try to arrive
at conclusions that extend beyond the data alone. You may use it to judge whether
an observed difference between groups or data sets is dependable or merely happened by chance. It is a
matter of deciding between reality and coincidence.
Sample – is a small portion or part of a population; a representative of the population in a research study.
Statistic – is any numerical value which describes a sample.
Example: Out of the 7,592 students enrolled in a Marian institution, 3,568 are female.
n = 3,568 (statistic)
Scales of Measurement
1. Nominal level of measurement classifies data into mutually exclusive categories in which no order
or ranking can be imposed on the data. Nominal numbers are just labels. e.g. SSS number
2. Ordinal level of measurement classifies data into categories that can be ranked; however, precise
differences between the ranks do not exist. e.g. size of t-shirt.
3. Interval level of measurement ranks data, and precise differences between units of measure do
exist; however, there is no meaningful zero. e.g. temperature in degrees Celsius.
4. Ratio level of measurement possesses all the characteristics of interval measurement, and there
exists a true zero. In addition, true ratios exist when the same variable is measured on two different
members of the population. e.g. height
Sampling Techniques
In doing research, if the population is too large to study in full, a properly computed sample is acceptable. One way
of determining the sample size is the Raosoft survey tool. You can use the Raosoft calculator
online to compute an adequate sample size: http://www.raosoft.com/samplesize.html
Note that “e” is called the margin of error. It is a value that quantifies possible sampling error.
Commonly used margins of error are 0.01 (1%), 0.05 (5%) and 0.10 (10%). Sampling error means that
the results in the sample differ from those of the target population because of the “luck of the draw”.
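For readers who want to see the arithmetic behind a sample-size calculator, here is a minimal sketch in Python. It assumes the usual finite-population formula for estimating a proportion, a 95% confidence level (z = 1.96) and a 50% response distribution. Whether the online calculator uses exactly this formula is an assumption, although the sketch reproduces n = 357 for N = 5 000 at a 5% margin of error, the value used in the stratified sampling example in this lesson. The function name `sample_size` is hypothetical.

```python
import math

def sample_size(population, margin=0.05, proportion=0.5, z=1.96):
    """Sample size for estimating a proportion in a finite population.

    Assumptions (not stated in the lesson): 95% confidence (z = 1.96) and
    the formula n = N*x / ((N - 1)*e**2 + x), where x = z**2 * p * (1 - p).
    """
    x = z ** 2 * proportion * (1 - proportion)
    n = population * x / ((population - 1) * margin ** 2 + x)
    return math.ceil(n)  # round up to the next whole respondent

print(sample_size(5000))  # N = 5 000, e = 5%
```

A smaller margin of error gives a larger required sample, which is why 5% is a common compromise.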
Note this:
Sampling is the process of selecting samples from a given population. There are two types of sampling techniques:
(1) Probability sampling: samples are chosen in such a way that each member of the population has an equal
chance of being selected in the sample.
(2) Non-probability sampling: each member of the population does not have a known chance of being
included in the sample. Hence, personal judgment plays an important role in the selection.
Since you already know how to compute the appropriate sample size, the next step is to select the samples
from the population. This is referred to as sampling.
We will consider and discuss only the probability sampling techniques. These are:
Simple random sampling. This is a procedure where a sample is selected in such a way
that every element is as likely to be selected as any other element in the population.
Example:
Lottery: this needs a complete list of the population. You write the names or
codes of each member and place them in a container, then randomly
draw the desired number of samples. This is easy if the population is
small.
Systematic random sampling. This method is a sampling procedure with a random start.
Samples are chosen using the rules set by the researchers. This involves
choosing every $k$th member of the population, with $k = \frac{N}{n}$, but there should be a random
start.
Example: Choose a sample of size 10 from N = 500.
1. Choose a random start, say 10.
2. Determine the sampling interval: $k = \frac{500}{10} = 50$, so every 50th member will be chosen starting from member 10.
3. So the respondents will be member numbers 10, 60, 110, 160, 210, 260, 310, 360, 410, 460.
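The lottery method and the systematic procedure above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, members are assumed to be numbered 1 to N, and the random start is assumed to fall within the first interval when none is given.

```python
import random

def lottery_sample(members, n, seed=None):
    """Simple random sampling: draw n members without replacement."""
    return random.Random(seed).sample(list(members), n)

def systematic_sample(population_size, n, start=None, seed=None):
    """Systematic sampling: every k-th member after a random start."""
    k = population_size // n                 # sampling interval k = N / n
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start in the first interval
    return [start + j * k for j in range(n)]

# Reproduces the worked example: N = 500, n = 10, random start 10.
print(systematic_sample(500, 10, start=10))
# A lottery draw of 10 members from the same numbered list.
print(lottery_sample(range(1, 501), 10, seed=1))
```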
Stratified random sampling. This is used when the population can be naturally classified
into groups or strata.
Example: A survey will be conducted to find out whether families living in a certain municipality are in favor of
charter change. To ensure that all income groups are represented, respondents will
be divided into high-income (Class A), middle-income (Class B) and low-income (Class C) groups.
Below is the distribution of income groups.
Strata      Number of Families
Class A     1 000
Class B     2 500
Class C     1 500
N           5 000
1. Using the Raosoft calculator to find the sample size (n), with a 5% margin of error and a 50% response
distribution, n = 357.
2. Using proportional allocation, how many from each group should be taken as sample?
Class A: $\frac{1000}{5000} \times 357 = 71.4 \approx 71$
Class B: $\frac{2500}{5000} \times 357 = 178.5 \approx 179$
Class C: $\frac{1500}{5000} \times 357 = 107.1 \approx 107$
So, 71 families should be taken as respondents from Class A, 179 from Class B and 107 from
Class C, for a total of 357.
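Proportional allocation can be checked with a short Python sketch. The function name is hypothetical, and rounding halves upward (so that 178.5 becomes 179) is an assumption that matches the figures above. Python's built-in `round` would give 178 here because it rounds halves to the nearest even number.

```python
from decimal import Decimal, ROUND_HALF_UP

def proportional_allocation(strata, n):
    """Allocate a sample of size n to strata in proportion to their sizes."""
    total = sum(strata.values())
    allocation = {}
    for name, size in strata.items():
        share = Decimal(size) * n / total          # (stratum size / N) * n
        allocation[name] = int(share.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    return allocation

families = {"Class A": 1000, "Class B": 2500, "Class C": 1500}
print(proportional_allocation(families, 357))
```

With other data the rounded shares may not add up to exactly n, in which case one stratum has to be adjusted by hand.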
Collecting and Organizing Data in a Table
The study of statistics begins with the collection of data or measurements. Data collected should be
organized systematically for easier and faster interpretation. They may be presented in any of the
following forms:
The textual form can be used if the data to be presented are few.
The tabular and graphical forms are used when more detailed information about the data is to be
presented.
A table is used when you want to present data in a systematic and organized manner so that
reading and interpretation will be simpler and easier.
A frequency table is constructed by listing the measurements from highest to lowest, then making
tally marks to record how often each number occurs. After tallying, count the marks and record them in
the proper column.
Example 2: The following are the ages of 45 patients confined in a certain hospital suffering from mild
pneumonia:
17 20 15 18 19 16 11 10 15 16
12 12 13 14 11 10 14 13 12 11
13 15 14 10 15 16 17 17 18 20
20 18 19 19 18 17 16 15 12 12
13 14 15 19 20
Solution: To prepare a frequency table for the given set of ages, the ages are listed from
highest to lowest, tally marks are made and counted. The counted tally marks will then
be recorded under the column frequency. Notice that every 5th tally crosses the first four
tallies. This is done to make counting of marks easier, especially if the number of cases
is rather big.
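The tallying described in the solution can be reproduced with a short Python sketch that counts how often each age occurs and lists the ages from highest to lowest. Plain text cannot draw the crossing fifth tally, so the marks are printed as simple strokes.

```python
from collections import Counter

ages = [
    17, 20, 15, 18, 19, 16, 11, 10, 15, 16,
    12, 12, 13, 14, 11, 10, 14, 13, 12, 11,
    13, 15, 14, 10, 15, 16, 17, 17, 18, 20,
    20, 18, 19, 19, 18, 17, 16, 15, 12, 12,
    13, 14, 15, 19, 20,
]

frequency = Counter(ages)  # maps each age to the number of times it occurs

print("Age  Tally   Frequency")
for age in sorted(frequency, reverse=True):      # highest to lowest
    count = frequency[age]
    print(f"{age:>3}  {'|' * count:<6}  {count}")
print("Total:", sum(frequency.values()))
```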
If the number of measures in consideration is rather big, the presentation of data is further
simplified by grouping the measures into class intervals; the resulting table is called a frequency distribution.
A frequency distribution is a distribution of the total number of measures or frequencies over
arbitrarily defined categories or classes. The number of measures falling under a class is called class
frequency.
Example 1.
The frequency distribution below shows the scores obtained by 300 students in a
Biostatistics test of 50 items.
Score     Number of Students
45-49     15
40-44     32
35-39     42
30-34     108
25-29     67
20-24     21
15-19     10
10-14     5
Total     300
In the example above, the symbol 45-49 and the other symbols which follow up to 10-14 are
called class intervals. The end numbers are called class limits. For instance, in the class interval 45-49,
45 is called the lower limit while 49 is called the upper limit.
Each class interval also has a lower boundary and an upper boundary. For the class interval 45-49,
the lower boundary is 44.5 while the upper boundary is
49.5. Hence, for the class interval 45-49, 44.5 – 49.5 are called the class boundaries.
The size of the class interval, also called the class size, is the difference between the upper boundary
and the lower boundary. Hence, the class size in the given example is 49.5 - 44.5 = 5.
A class interval also has a midpoint or class mark. It is obtained by taking half the sum of the lower and
upper class limits. For instance, the midpoint of the class interval 45-49 is $\frac{45 + 49}{2} = 47$.
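The class boundaries, class size and class mark of an interval can be computed as in the sketch below. The function name is hypothetical, and the half-unit adjustment assumes the scores are recorded to the nearest whole number.

```python
def describe_class(lower_limit, upper_limit, unit=1):
    """Boundaries, size and midpoint of a class interval.

    Assumes data recorded to the nearest `unit`, so the boundaries lie
    half a unit beyond the class limits.
    """
    lower_boundary = lower_limit - unit / 2
    upper_boundary = upper_limit + unit / 2
    return {
        "boundaries": (lower_boundary, upper_boundary),
        "size": upper_boundary - lower_boundary,
        "midpoint": (lower_limit + upper_limit) / 2,
    }

print(describe_class(45, 49))
```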
Range (R) is the difference between the highest score (H) and the lowest score (L) in the given data set.
The following are the suggested steps on how to make class intervals:
1. Determine the desired number of classes (n) (number of rows).
2. Solve for the class width (i):
$i = \frac{\text{Range}}{n}$
3. Start the lowest class interval with the lowest value or score in the given data set; each succeeding
class starts at the previous lower limit plus i. Continue until the highest value in the distribution is reached.
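The steps above can be sketched in Python. Several choices here are assumptions rather than part of the lesson: the data are whole numbers, the class width is rounded up to a whole number, and each class covers i consecutive whole numbers. The lowest score of 10 and highest score of 49 are hypothetical values chosen so that the result matches the classes of the Biostatistics distribution shown earlier.

```python
import math

def build_class_intervals(lowest, highest, n_classes):
    """Return (lower limit, upper limit) pairs covering lowest..highest.

    Assumptions: whole-number data; the width i = Range / n is rounded up;
    each class spans i consecutive whole numbers.
    """
    width = max(1, math.ceil((highest - lowest) / n_classes))  # i = Range / n
    intervals = []
    lower = lowest
    while lower <= highest:            # continue until the highest value is covered
        intervals.append((lower, lower + width - 1))
        lower += width                 # the next class starts i units higher
    return intervals

print(build_class_intervals(10, 49, 8))
```

When the range is an exact multiple of the width, this procedure produces one class more than the desired number, because the highest value starts a class of its own.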
The mean (also known as the arithmetic mean) is the most commonly used measure of central
position. It is the sum of the measures divided by the number of measures in a variable. It is symbolized
as $\bar{x}$ (read as x bar).
The mean is used to describe a set of data where the measures cluster or concentrate at a point. As
the measures cluster around each other, a single value appears to represent distinctively the total
measures. It is, however, affected by extreme measures, that is, very high or very low measures can easily
change the value of the mean.
$\bar{x} = \frac{\sum x}{n}$
where $\sum x$ = the summation of x (sum of the measures) and n = number of values of x
Example 1: The grades in Chemistry of 10 students are 87, 84, 85, 85, 86, 90, 79,
82, 78, 76. What is the average grade of the 10 students?
Solution:
$\bar{x} = \frac{87 + 84 + 85 + 85 + 86 + 90 + 79 + 82 + 78 + 76}{10} = \frac{832}{10} = 83.2$
Note: Mean of a frequency distribution
$\bar{x} = \frac{\sum fx}{n}$
Where: $\bar{x}$ = mean
f = frequency
x = score
n = number of observations
Example 2: Find the mean salary for a small company that pays monthly salaries to its employees as shown in the
frequency distribution.
Salary (x)       Number of Employees (f)    fx
Php7 000.00      8                          56 000
Php8 000.00      11                         88 000
Php9 250.00      14                         129 500
Php10 500.00     9                          94 500
Php17 000.00     2                          34 000
Php25 000.00     1                          25 000
Total            45                         427 000
$\bar{x} = \frac{\sum fx}{n} = \frac{427\,000}{45} = 9\,488.89$
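Both mean computations can be verified with a few lines of Python. This is a minimal sketch; the variable and function names are illustrative only.

```python
def mean(values):
    """Arithmetic mean: sum of the measures divided by their number."""
    return sum(values) / len(values)

def frequency_mean(table):
    """Mean of a frequency distribution given (x, f) pairs: sum(f*x) / sum(f)."""
    total_fx = sum(x * f for x, f in table)
    n = sum(f for _, f in table)
    return total_fx / n

grades = [87, 84, 85, 85, 86, 90, 79, 82, 78, 76]
salaries = [(7000, 8), (8000, 11), (9250, 14), (10500, 9), (17000, 2), (25000, 1)]

print(mean(grades))                         # Example 1
print(round(frequency_mean(salaries), 2))   # Example 2
```

Note how the single salary of Php25 000.00 pulls the mean above what 33 of the 45 employees earn, which illustrates the sensitivity of the mean to extreme measures.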
The median is the middle entry or term in a set of data arranged in either increasing or
decreasing order.
The median is a positional measure. Thus the values of the individual measures in a set of
data do not affect it. It is affected by the number of measures and not by the size of the extreme
values.
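To find the median of a given set of data, arrange the measures in increasing or decreasing order. If the number
of measures is odd, the median is the middle measure; if it is even, the median is the average of the two
middle measures.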
Example 1: The number of books borrowed in the library from Monday to Friday last
week were 58, 60, 54, 35, and 97 respectively. Find the median.
Solution: Arrange the number of books borrowed in increasing order.
35, 54, 58, 60, 97
The median is 58.
Example 2: Cora’s quizzes for the second quarter are 8, 7, 6, 10, 9, 5, 9, 6, 10, and 7.
Find the median.
Solution: Arrange the scores in increasing order.
5, 6, 6, 7, 7, 8, 9, 9, 10, 10
Since the number of measures is even, the median is the average of the two
middle scores: $\tilde{x} = \frac{7 + 8}{2} = 7.5$
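The two cases, an odd and an even number of measures, can be written as a short Python function. This is a sketch for illustration; Python's standard `statistics.median` does the same job.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

books = [58, 60, 54, 35, 97]
quizzes = [8, 7, 6, 10, 9, 5, 9, 6, 10, 7]
print(median(books), median(quizzes))
```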
The mode is another measure of position. The mode is the measure or value which occurs
most frequently in a set of data. It is the value with the greatest frequency. To find the mode for a
set of data:
- select the measure that appears most often in the set;
- if two or more measures appear the same number of times, and the frequency they
appear is greater than any other measures, then each of these values is a mode;
- if every measure appears the same number of times, then the set of data has no mode.
Answer: The mode is 6 since it is the shoe size that occurred the most number of times.
Example 2: The sizes of 9 classes in a certain school are 50, 52, 55, 50, 51, 54, 55, 53 and
54.
Answer: The modes are 50, 54 and 55 since each of these measures occurred twice, more often than
any other measure. The distribution is multimodal (trimodal).
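The three rules for finding the mode can be sketched in Python as follows. The function name is hypothetical, and it returns an empty list when the data have no mode.

```python
from collections import Counter

def modes(values):
    """Return all modes in increasing order, or [] if there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []   # every measure appears the same number of times: no mode
    return sorted(v for v, c in counts.items() if c == highest)

class_sizes = [50, 52, 55, 50, 51, 54, 55, 53, 54]
print(modes(class_sizes))
```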
When the number of items in a set of data is too big, items are grouped for convenience. The
manner of computing for the mean of grouped data is given by the formula:
$\bar{x} = \frac{\sum (f x_m)}{N}$
where: $\bar{x}$ is the mean
f is the frequency of each class
$x_m$ is the class mark
$\sum (f x_m)$ is the summation of the products of the frequency and the class mark
N is the sum of all the frequencies
Examples:
Compute the mean of the scores of the students in a Mathematics test.
Class Frequency
46 – 50 1
41 – 45 5
36 – 40 11
31 – 35 12
26 – 30 11
21 – 25 5
16 – 20 2
11 – 15 1
The frequency distribution for the data is given below. The columns for $x_m$ and $f x_m$ are added.
Class      f     $x_m$   $f x_m$
46 – 50    1     48      48
41 – 45    5     43      215
36 – 40    11    38      418
31 – 35    12    33      396
26 – 30    11    28      308
21 – 25    5     23      115
16 – 20    2     18      36
11 – 15    1     13      13
Total      48            1 549
$\bar{x} = \frac{\sum (f x_m)}{N} = \frac{1\,549}{48} = 32.27$
The mean score is 32.27.
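The grouped-data mean can be checked with the Python sketch below, which computes each class mark from the class limits and then applies the formula. The data layout, a list of limits paired with frequencies, is just one convenient choice.

```python
# ((lower limit, upper limit), frequency) for each class
classes = [
    ((46, 50), 1), ((41, 45), 5), ((36, 40), 11), ((31, 35), 12),
    ((26, 30), 11), ((21, 25), 5), ((16, 20), 2), ((11, 15), 1),
]

def grouped_mean(classes):
    """Mean of grouped data: sum of f * class mark divided by total frequency."""
    total_f = sum(f for _, f in classes)
    total_fx = sum(f * (lo + hi) / 2 for (lo, hi), f in classes)  # class mark = (lo + hi) / 2
    return total_fx / total_f

print(round(grouped_mean(classes), 2))
```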
The Median of Grouped Data
The median is the middle value in a set of quantities. It separates an ordered set of
data into two equal parts. Half of the quantities are found above the median and the other half
are found below it.
In computing for the median of grouped data, the following formula is used:
$\tilde{x} = x_{lb} + \left[ \frac{\frac{N}{2} - cf_b}{f_m} \right] i$
where: $\tilde{x}$ = median
$x_{lb}$ = the lower boundary of the median class
N = total frequency
$cf_b$ = the cumulative frequency of the class just below the
median class
$f_m$ = frequency of the median class
$i$ = size of the class interval
Examples:
1. Compute the median of the scores of the students in a Mathematics test.
Class Frequency
46 – 50 1
41 – 45 5
36 – 40 11
31 – 35 12
26 – 30 11
21 – 25 5
16 – 20 2
11 – 15 1
The frequency distribution for the data is given below. The columns for lb and “less than”
cumulative frequency are added.
Class      f     lb      “<” cf
46 – 50    1     45.5    48
41 – 45    5     40.5    47
36 – 40    11    35.5    42
31 – 35    12    30.5    31
26 – 30    11    25.5    19
21 – 25    5     20.5    8
16 – 20    2     15.5    3
11 – 15    1     10.5    1
Compute $\frac{N}{2}$: since $\frac{N}{2} = \frac{48}{2} = 24$, the class interval containing 24 in the less than
cumulative frequency will be the median class. In this case the median class is the 31 – 35 class interval.
$\tilde{x} = x_{lb} + \left[ \frac{\frac{N}{2} - cf_b}{f_m} \right] i = 30.5 + \left[ \frac{24 - 19}{12} \right] 5 = 32.58$
The median score is 32.58.
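The same computation can be sketched in Python. The function below assumes whole-number class limits (so that boundaries lie 0.5 beyond the limits) and expects the classes listed from lowest to highest so that the cumulative frequency can be accumulated in order.

```python
def grouped_median(classes):
    """classes: list of ((lower limit, upper limit), frequency), lowest class first."""
    n = sum(f for _, f in classes)
    half = n / 2
    cumulative = 0                           # cf_b: cumulative frequency below the current class
    for (lo, hi), f in classes:
        if cumulative + f >= half:           # this class contains the N/2-th measure
            lower_boundary = lo - 0.5        # assumes whole-number class limits
            size = hi - lo + 1
            return lower_boundary + (half - cumulative) / f * size
        cumulative += f

scores = [
    ((11, 15), 1), ((16, 20), 2), ((21, 25), 5), ((26, 30), 11),
    ((31, 35), 12), ((36, 40), 11), ((41, 45), 5), ((46, 50), 1),
]
print(round(grouped_median(scores), 2))
```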
The Mode of Grouped Data
The mode of grouped data can be approximated using the following formula:
$\hat{x} = x_{lb} + \left[ \frac{\Delta_1}{\Delta_1 + \Delta_2} \right] i$
where: $x_{lb}$ is the lower boundary of the modal class.
$\Delta_1$ is the difference between the frequencies of the modal class and the next
lower class.
$\Delta_2$ is the difference between the frequencies of the modal class and the next
upper class.
$i$ is the size of the class interval.
The modal class is the class interval with the highest frequency.
Example:
The frequency distribution for the data is given below. The column for lb is added.
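As an illustration of the formula, the sketch below applies it to the frequency distribution used in the grouped mean and median examples. Using that distribution here is an assumption made for illustration, as are the function name and the whole-number class limits. The modal class is 31 – 35 with frequency 12, and both neighboring classes have frequency 11.

```python
def grouped_mode(classes):
    """classes: list of ((lower limit, upper limit), frequency), lowest class first."""
    freqs = [f for _, f in classes]
    k = freqs.index(max(freqs))                  # modal class: highest frequency
    (lo, hi), f_modal = classes[k]
    f_lower = freqs[k - 1] if k > 0 else 0       # next lower class (0 if none)
    f_upper = freqs[k + 1] if k + 1 < len(freqs) else 0  # next upper class (0 if none)
    d1 = f_modal - f_lower
    d2 = f_modal - f_upper
    if d1 + d2 == 0:                             # flat neighborhood: fall back to the midpoint
        return (lo + hi) / 2
    return (lo - 0.5) + d1 / (d1 + d2) * (hi - lo + 1)

scores = [
    ((11, 15), 1), ((16, 20), 2), ((21, 25), 5), ((26, 30), 11),
    ((31, 35), 12), ((36, 40), 11), ((41, 45), 5), ((46, 50), 1),
]
print(grouped_mode(scores))
```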
The table below is the summary and interpretation of the mean responses to the Likert-type
statements 1 - 3.
Statement   5     4     3     2    1    $\bar{x}$    Interpretation of $\bar{x}$
1.          36    51    18    0    1    4.14         Agree
2.          18    44    37    8    1    3.65         Agree
3.          18    48    28    0    1    3.86         Agree
Total       72    143   83    8    3    3.88         Agree
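Each mean in the table is a weighted mean of the scale values 5 to 1, weighted by the number of respondents who chose each value. A minimal Python sketch is given below; the names are illustrative, and the cut-off values behind the "Agree" interpretation are not given in the table, so they are not reproduced.

```python
SCALE = [5, 4, 3, 2, 1]   # Likert scale values, in the column order of the table

responses = {
    "1": [36, 51, 18, 0, 1],
    "2": [18, 44, 37, 8, 1],
    "3": [18, 48, 28, 0, 1],
}

def likert_mean(counts):
    """Weighted mean: sum(scale value * count) / total respondents."""
    return sum(w * c for w, c in zip(SCALE, counts)) / sum(counts)

totals = [sum(column) for column in zip(*responses.values())]  # column totals (row T)

for statement, counts in responses.items():
    print(statement, round(likert_mean(counts), 2))
print("Total", round(likert_mean(totals), 2))
```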
MEASURES OF VARIABILITY
Definition
MEASURES OF VARIABILITY OR DISPERSION
These are measures of the average distance of each observation from the center of the distribution. They
measure the homogeneity or heterogeneity of a particular group.
The first part of this module dealt with the concepts of measurements that
describe the middle or the center of the distribution. But the measures of central
tendency do not describe how the observations spread out from the center of the
distribution. In this lesson, we will deal with the measures of variability or spread
of a distribution. For the measures of skewness and kurtosis, it will be your task to
read about them; a link will be provided for your reading.
Consider the following measurements, in liters, for two samples of apple juice in
tetra packs packed by companies A and B.
Sample A            Sample B
0.95                1.06
1.00                1.01
0.92                0.88
1.03                0.91
1.10                1.14
$\bar{x} = 1.00$    $\bar{x} = 1.00$
[Figure: dot plots of the Company A and Company B samples on a common scale from 0.8 to 1.2 liters]
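Range
The range is the difference between the highest measure (H) and the lowest measure (L): R = H - L
The main disadvantage of the range is that it does not consider every measure in the data.
Examples:
1. The IQs of 5 members of a family are 108, 112, 127, 118 and 113. Find the
range.
Solution: R = 127 - 108 = 19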
2. The range of each of the sets of scores of the three students is as follows:
Student A: H = 98, L = 92, R = 98 - 92 = 6
Student B: H = 97, L = 90, R = 97 - 90 = 7
Student C: H = 97, L = 90, R = 97 - 90 = 7
Observe that two students are tied. This indicates that the range is not a reliable
measure of dispersion. It is a poor measure of dispersion, particularly if the size of the sample
or population is large. It considers only the extreme values and tells us nothing about the
distribution of numbers in between.
Variance
Since the range considers only two scores in the data set, it is an
unreliable measure of variability: it cannot be used to directly compare two sets
of data, and it is unstable, especially for a very large
sample.
Definition
Variance is the average of the squared deviations from the mean. The formulas for
finding the variance for ungrouped data are:
Population Variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Sample Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Where:
x = individual value/score from the raw data
$\bar{x}$ = the sample mean
N = total population
n = sample size
$\mu$ = population mean
$\sigma^2$ = the population variance
$s^2$ = the sample variance
Consider the following:
For male group
Sample A    $(x - \bar{x})$    $(x - \bar{x})^2$
65          -19                361
75          -9                 81
85          1                  1
95          11                 121
100         16                 256
                               $\Sigma = 820$
a. Treating the data as a population, the variance is
$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{820}{5} = 164$ square units
b. Treating the data as a sample, the variance is
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{820}{5 - 1} = \frac{820}{4} = 205$ square units
Note this
Using the variance as a measure of variability for the two sets of grades, the males showed more variability in
performance. Note that the higher the variance, the more variable or far apart the values are from each other.
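The two variance formulas differ only in the divisor, which the following Python sketch makes explicit. It uses the male group data from the worked example; the function name is illustrative.

```python
def variance(values, sample=False):
    """Average squared deviation from the mean.

    Divides by N for a population and by n - 1 for a sample.
    """
    mean = sum(values) / len(values)
    squared_deviations = sum((x - mean) ** 2 for x in values)
    divisor = len(values) - 1 if sample else len(values)
    return squared_deviations / divisor

male_group = [65, 75, 85, 95, 100]
print(variance(male_group))               # population variance
print(variance(male_group, sample=True))  # sample variance
```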
Standard Deviation
Since the variance is computed from squared deviations, it is expressed in squared units,
which makes it hard to picture the true spread of the data set.
Hence, extract the square root of the variance, defining another measure of
variability called the standard deviation.
Definition
Standard deviation is the square root of the variance.
Population Standard Deviation: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
For Male Group
a. Treating the data as a population, the standard deviation is
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{\frac{820}{5}} = \sqrt{164} = 12.81$
Note this:
The computation of the variance and the standard deviation is easy if there are only a few cases; however, if
you have a larger set of data, your calculator or your computer can do the computation for you. All you have to do is
input the data and interpret the result. To aid you in the computation of the variance and standard
deviation, click the links below; be sure you have your calculator.
For casio fx 570ms and fx 991 ms https://www.youtube.com/watch?v=WTUYZZXU1Ws
For casio 50f ms https://www.youtube.com/watch?v=3m2bFJ43pWs
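If a computer is at hand instead of a calculator, Python's standard `statistics` module offers the same four quantities. The sketch below applies them to the male group data; treating the data as a sample as well as a population is shown only for comparison.

```python
import statistics

male_group = [65, 75, 85, 95, 100]

population_variance = statistics.pvariance(male_group)  # divides by N
sample_variance = statistics.variance(male_group)       # divides by n - 1
population_sd = statistics.pstdev(male_group)           # square root of pvariance
sample_sd = statistics.stdev(male_group)                # square root of variance

print(population_variance, sample_variance)
print(round(population_sd, 2), round(sample_sd, 2))
```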
Coefficient of Variation
When comparing the variability of two sets of data with different units, you cannot use the variance or the
standard deviation. You have to use the coefficient of variation.
Definition
Coefficient of variation is the ratio of the standard deviation to the mean, expressed as a percentage. It is
used to compare the variability of two or more sets of data even when they are expressed in different units of
measurement.
$cv = \frac{sd}{\bar{x}} \cdot 100\%$
Where:
$cv$ = coefficient of variation
$sd$ = standard deviation
$\bar{x}$ = mean
Example: The following are the times taken (in minutes) to complete a homework assignment by two
groups of students in a day:
Group A: 24, 26, 33, 37, 29, 31
Group B: 26, 28, 33, 35, 37, 29, 27, 25
a. Find the range.
b. Find the coefficient of variation.
Solution:
For Group A, the sample standard deviation is
$sd = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{112}{6 - 1}} = 4.73$
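The remaining parts of the example can be worked out with the Python sketch below, which computes the range, the sample standard deviation and the coefficient of variation of both groups. Treating both groups as samples (divisor n - 1) follows the Group A computation above; the function names are illustrative.

```python
import math

def sample_sd(values):
    """Sample standard deviation (divisor n - 1)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (len(values) - 1))

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    mean = sum(values) / len(values)
    return sample_sd(values) / mean * 100

group_a = [24, 26, 33, 37, 29, 31]
group_b = [26, 28, 33, 35, 37, 29, 27, 25]

for name, group in (("A", group_a), ("B", group_b)):
    data_range = max(group) - min(group)          # R = H - L
    print(name, data_range,
          round(sample_sd(group), 2),
          round(coefficient_of_variation(group), 2))
```

Both groups have a mean of 30 minutes, so here the group with the larger standard deviation also has the larger coefficient of variation.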
Frequency Distribution is a distribution of the total number of measures or frequencies over arbitrarily
defined categories or classes. The number of measures falling under a class is called class frequency.
There are three measures of central tendency, namely, mean, median and mode.
Measures of Variability or Dispersion
These are measures of the average distance of each observation from the center of the distribution. They measure
the homogeneity or heterogeneity of a particular group.
1. Range: R = H – L
2. Standard Deviation ($\sigma$ for population and $s$ for sample)
3. Variance ($\sigma^2$ for population and $s^2$ for sample): the square of the standard deviation.
4. Coefficient of Variation (cv): It is used to compare the variability of two or more sets of data
having different units.