Chapter 4
Statistics is very useful nowadays. It enables researchers to find solutions
to problems, whether personal or societal, and to interpret the implications
of these solutions in our everyday lives. Furthermore, statistics has led to many
improvements and inventions that have made an impact on society.
Statistics may be used in education, politics, economics, and the like. It
also gives us information on trends in society and helps us discover
problems that need urgent solutions.
STATISTICS AND ITS IMPORTANCE
Key Concepts
• Statistics is a science that deals with the collection, organization,
analysis, and interpretation of data.
- Collection means gathering relevant information or data from the
population through survey, test, interview, experiment, etc.
- Organization or presentation refers to the systematic arrangement
of data into textual form, table, graph, or chart.
- Analysis is the careful examination of data, possibly with the use
of statistical tools.
- Interpretation of data is making a generalization or conclusion from
the data that have been analyzed.
2. Inferential Statistics:
Inferential statistics deals with techniques used for analyzing data,
making predictions and comparisons, and drawing conclusions about a
population using information gathered from a representative portion, or
sample, of that population.
Importance of Statistics
Statistics plays a vital role in every field of human activity. Statistical
methods are useful tools in research and studies in different fields such
as education, economics, the social sciences, business, health, and many others.
Statistics helps provide more critical analyses of information. Examples: (1) In
Economics: Economics largely depends on statistics. National income accounts are
multipurpose indicators for economists and administrators, and statistical
methods are used to prepare these accounts. (2) In the Natural and Social
Sciences: Statistical methods are commonly used to analyze experimental
results and test their significance in Biology, Physics, Chemistry, Mathematics,
Meteorology, research chambers of commerce, Sociology, Business, Public
Administration, Communication and Information Technology, etc.
Properties of Mean
1. A set of data has only one mean; that is, the mean is unique.
2. All values in the data set are included in computing the mean.
3. It is very useful in comparing two or more data sets.
4. It is affected by extremely small or large values (outliers) in a data set.
5. It is appropriate for symmetrical data.
Illustrative Examples:
1. Jean has been working part-time at a fast-food company. The following
numbers represent the number of hours Jean has worked at this fast-food
company for each of the past 8 months: 30, 45, 43, 60, 71, 82, 71, 83. What
is the mean (average) number of hours that Jean worked at this company?
Solution:
Step 1: Add the numbers to determine the total number of hours Jean
worked.
30 + 45 + 43 + 60 + 71 + 82 + 71 + 83 = 485
Step 2: Divide the total by the number of months.
$\bar{x} = \frac{485}{8} = 60.63$ hours/month
Solution:
Step 1: Add the numbers to determine the total age of the workers.
26 + 23 + 30 + 25 + 29 + 33 + 38 + 35 = 239
Step 2: Divide the total by the number of workers.
$\bar{x} = \frac{239}{8} = 29.875 \approx 30$ years
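Both solutions above follow the same recipe: add the values, then divide by how many there are. A minimal Python sketch of that recipe, using Jean's hours and the eight ages summed in the second solution (the helper name `mean` is ours, not the worktext's):

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


# Data copied from the two worked solutions above.
jean_hours = [30, 45, 43, 60, 71, 82, 71, 83]
worker_ages = [26, 23, 30, 25, 29, 33, 38, 35]

print(mean(jean_hours))   # 60.625, reported as 60.63 hours/month
print(mean(worker_ages))  # 29.875, reported as about 30 years
```

The sketch prints the exact quotients and leaves rounding to the reader. Python's built-in `round()` rounds exact halves to the nearest even digit, so `round(60.625, 2)` gives 60.62 rather than the hand-rounded 60.63.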
Weighted Mean/Average
The weighted mean/average is particularly useful when various classes or
groups contribute differently to the total.
Illustrative Example:
Rena, a fourth-year student majoring in mathematics, took the following
courses with the corresponding units and grades during the first semester of the
school year. What is her average grade?
Solution:
$\bar{x} = \frac{3(1.4) + 1(1.3) + 1(1.5) + 1(1.8) + 3(1.7) + 3(1.8) + 3(1.2) + 3(1.8) + 3(1.7)}{21}$
$\bar{x} = \frac{33.4}{21}$
$\bar{x} = 1.59$
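The weighted mean multiplies each grade by its units, adds the products, and divides by the total number of units. A small sketch, with the units and grades read off the numerator of the solution, since the course table itself is not reproduced here (`weighted_mean` is a hypothetical helper name):

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Units and grades in the same order as the terms 3(1.4) + 1(1.3) + ...
units = [3, 1, 1, 1, 3, 3, 3, 3, 3]
grades = [1.4, 1.3, 1.5, 1.8, 1.7, 1.8, 1.2, 1.8, 1.7]

average_grade = weighted_mean(grades, units)
print(f"{average_grade:.2f}")  # 1.59
```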
MEDIAN (𝑥̃)
The median is the number that falls in the middle position after the data
have been arranged in ascending or descending order (an array).
To determine the value of the median for ungrouped data, consider two rules.
1. If n is odd, the median is the middle-ranked value.
2. If n is even, the median is the average of the two middle-ranked
values.
Position of the median: $\frac{n+1}{2}$
Illustrative Examples:
1. Find the median of the following data:12, 3, 17, 8, 14, 10, 6
Solution:
Step 1: Organize the data in an array.
3, 6, 8, 10, 12, 14, 17
Step 2: Since the number of data values is odd, the median is the value in the
middlemost position. In this case, the median is the value found in the
fourth position, $\frac{7+1}{2} = 4$, of the data in the array.
3, 6, 8, 10, 12, 14, 17
The median is 10.
Step 2: Since the number of data values is even, the median will be
the mean of the values found just before and just after the $\frac{n+1}{2}$
position.
$\frac{n+1}{2} = \frac{10+1}{2} = \frac{11}{2} = 5.5$
Step 3: In the array below, the number before the 5.5 position is 4 and the
number after the 5.5 position is 6. Now, find their mean.
2, 2, 3, 4, 4, 6, 7, 8, 9, 15
$\frac{4+6}{2} = 5$
The median is 5.
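The two rules for the median translate directly into code: sort, then take the middle value for odd n, or average the two middle values for even n. A minimal sketch (the function name is ours; Python's standard `statistics.median` gives the same results):

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median() needs at least one value")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the value at position (n + 1) / 2, counting from 1.
        return data[mid]
    # Even n: average the values on either side of position (n + 1) / 2.
    return (data[mid - 1] + data[mid]) / 2


print(median([12, 3, 17, 8, 14, 10, 6]))        # 10
print(median([2, 2, 3, 4, 4, 6, 7, 8, 9, 15]))  # 5.0
```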
Illustrative Examples:
1. Find the mode of the following data: 76, 81, 76, 80, 76, 83, 77, 79, 82, 76
Solution:
There is no need to organize the data in an array, unless you think
that it would be easier to locate the mode if the numbers are in an array. In
the above data set, the number 76 appears four times, but all the other numbers
appear only once. Since 76 appears with the greatest frequency, it is the
mode of the data set.
Solution:
The above data set has two values, 21 and 24, that each occur with a
frequency of 3. Since the data set has two modes, it is called bimodal. All other
values occur only once.
3. The coach of a sports team began to observe the colors of the T-shirts his
athletes wear. His goal is to find out which color is worn most frequently so that
he can offer shirts in a common color, or uniform shirts, to his athletes.
Monday: Green, Blue, Pink, White, Blue, and Blue
Tuesday: Blue, Red, Black, Pink, Green, and Blue
Wednesday: Orange, White, White, Blue, Blue, and Red
Thursday: Brown, Black, Brown, Blue, White, and Blue
Friday: Black, Blue, Red, Blue, Red, and Pink
What is the mode of the colors above?
Solution:
The color blue was worn 11 times during the week. All other colors
were worn much less frequently. Therefore, blue is the mode.
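Counting frequencies is all the mode needs, and the same counting works for numbers and for categories such as shirt colors. A short sketch using `collections.Counter`; it returns a list so that a bimodal set (such as the one with modes 21 and 24) yields both values. The helper name `modes` is ours; Python 3.8+ also ships `statistics.multimode`, which behaves similarly.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the greatest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


scores = [76, 81, 76, 80, 76, 83, 77, 79, 82, 76]
shirts = (
    ["Green", "Blue", "Pink", "White", "Blue", "Blue"]       # Monday
    + ["Blue", "Red", "Black", "Pink", "Green", "Blue"]      # Tuesday
    + ["Orange", "White", "White", "Blue", "Blue", "Red"]    # Wednesday
    + ["Brown", "Black", "Brown", "Blue", "White", "Blue"]   # Thursday
    + ["Black", "Blue", "Red", "Blue", "Red", "Pink"]        # Friday
)

print(modes(scores))            # [76]
print(modes(shirts))            # ['Blue']
print(Counter(shirts)["Blue"])  # 11
```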
Try this!
A. Solve for the mean, median, and mode of the following data set and
interpret the results.
B. Ben and his friends are comparing the number of times they have been to
the movies in the past year. The table below illustrates how many times
each person went to the movie theatre in each month.
Jan Feb Mar Apr May June July Aug Sept Oct Nov Dec
Ben 3 3 2 5 2 3 2 4 2 3 2 2
John 3 2 1 1 1 3 3 3 2 4 1 2
Matthew 1 3 3 2 1 4 5 3 2 2 2 3
Rose 2 2 2 1 3 2 4 1 3 2 3 3
1. By comparing modes, which person went to the movies the least per
month?
2. By comparing medians, which person went to the movies the most per
month?
3. Rank the friends in order from most movies seen to least movies seen
by comparing their means.
6. What is the mean of the medians for each month (the arithmetic
average of the medians of the number of movies seen in each
month)?
Range is the difference between the highest value and the lowest value in the
distribution.
Class Interval/width/size (i) is the difference between the upper and lower class
boundaries of a class; equivalently, it is the difference between two successive
lower class limits.
Class Limit is the smallest and largest observation (data, events, etc.) in each
class. Hence, each class has two limits: a lower and an upper limit.
Class Boundary or True Class Limit. The upper class boundary is 0.5 more than the
upper class limit, and the lower class boundary is 0.5 less than the lower class
limit. Therefore, each class has an upper and a lower class boundary, or true upper
and lower class limits.
Midpoint or Class Mark (X) is found by adding the upper and lower class
limits of any class and dividing the sum by 2.
Solution:
The following steps are involved in the construction of a frequency
distribution.
2. Find the range of the data. The range is the difference between the
largest and the smallest value.
$R = 154 - 118 = 36$
4. Decide the starting point. The lower class limit of the first class should cover
the smallest value in the raw data. Write down the lowest value as the lower
limit of the first class. The lowest value is 118.
5. Determine the remaining class limits. When the lowest class limit has
been determined, find the next lower class limit by adding the class width/size
to it (118 + 8 = 126). The remaining lower class limits may be determined by
adding the class size repeatedly until the largest value of the data falls within a
class. You can compute the upper class limit of a class by subtracting one from
the class width and adding the result to its lower class limit. For example:
$118 + (8 - 1) = 125$
The classes may be listed in either ascending or descending order:
Ascending          Descending
118 – 125          150 – 157
126 – 133          142 – 149
134 – 141          134 – 141
142 – 149          126 – 133
150 – 157          118 – 125
b) What are the true class limits/class boundaries of the first two
classes?
For the first class, 118 – 125, the lower class boundary is
118 – 0.5 = 117.5, and for the second class, 126 – 133, it is
126 – 0.5 = 125.5.
For the upper class boundaries (true upper limits), the first class,
118 – 125, has 125 + 0.5 = 125.5, and the second class, 126 – 133,
has 133 + 0.5 = 133.5.
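Steps 4 and 5 and part (b) are mechanical once the lowest value, the highest value, and the class width are known. A sketch under two stated assumptions: the data are whole numbers (so boundaries sit 0.5 beyond the limits), and the width of 8 is simply passed in, because the step that chooses it is not shown here. All function names are hypothetical.

```python
def build_classes(lowest, highest, width):
    """Lower/upper class limits for whole-number data, starting at the lowest value."""
    classes = []
    lower = lowest
    while lower <= highest:  # keep adding classes until the largest value is covered
        upper = lower + (width - 1)
        classes.append((lower, upper))
        lower += width
    return classes


def class_boundaries(lower, upper):
    """True class limits: 0.5 below the lower limit and 0.5 above the upper limit."""
    return lower - 0.5, upper + 0.5


def class_mark(lower, upper):
    """Midpoint of a class."""
    return (lower + upper) / 2


classes = build_classes(118, 154, 8)
print(classes)
print(class_boundaries(*classes[0]), class_boundaries(*classes[1]))
```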
Illustrative Example:
Nowadays, many people spend their leisure time on Facebook. Fifty-eight
students in a class recorded the time they spent on Facebook during their free
time. The frequency distribution table below shows the number of minutes
they spent on Facebook.
Classes 𝑓 𝑋 𝑓𝑋
5–9 1 7 7
10 – 14 2 12 24
15 – 19 2 17 34
20 – 24 6 22 132
25 – 29 7 27 189
30 – 34 10 32 320
35 – 39 30 37 1 110
𝛴𝑓 = n = 58 𝛴𝑓𝑋 = 1 816
Solution: Applying the formula,
$\bar{x} = \frac{\sum fX}{n} = \frac{1\,816}{58} = 31.31$
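For grouped data, each class is represented by its class mark X, and the mean is ΣfX / n. A minimal sketch that rebuilds the X column from the class limits (the function name is hypothetical):

```python
def grouped_mean(classes, freqs):
    """Mean of grouped data: sum of f * X over n, with X the class mark."""
    if len(classes) != len(freqs):
        raise ValueError("need one frequency per class")
    n = sum(freqs)
    marks = [(lower + upper) / 2 for lower, upper in classes]
    return sum(f * x for f, x in zip(freqs, marks)) / n


minutes = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
students = [1, 2, 2, 6, 7, 10, 30]

print(sum(students))                             # 58
print(f"{grouped_mean(minutes, students):.2f}")  # 31.31
```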
(Using the same data as in the previous lesson.) The data show the time
spent by 43 students studying during examinations in their math course.
Find the median.
Classes f <cf
25 – 29 3 3
30 – 34 2 5
35 – 39 5 10
40 – 44 8 18
45 – 49 8 26 (median class)
50 – 54 8 34
55 – 59 9 43
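Solution: Follow the steps in determining the median for grouped data.
a) $\frac{n}{2} = \frac{43}{2} = 21.5$; the first <cf that reaches 21.5 is 26, so the
median class is 45 – 49.
b) $\tilde{x} = L + \left(\frac{\frac{n}{2} - {<}cf}{f_m}\right)i$, where L = 44.5 is the lower class boundary
of the median class, <cf = 18 is the cumulative frequency of the class before it,
$f_m$ = 8 is the frequency of the median class, and i = 5 is the class width.
$\tilde{x} = 44.5 + \left(\frac{\frac{43}{2} - 18}{8}\right)5$
$\tilde{x} = 44.5 + \left(\frac{21.5 - 18}{8}\right)5$
$\tilde{x} = 44.5 + 2.1875$
$\tilde{x} = 46.69$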
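The grouped median needs the <cf column, the median class, and one substitution into L + ((n/2 − <cf)/f_m)i. A sketch that does all three, assuming whole-number class limits so that L is the lower limit minus 0.5 and the width i is upper − lower + 1 (`grouped_median` is a hypothetical helper):

```python
from itertools import accumulate


def grouped_median(classes, freqs):
    """Median of grouped whole-number data: L + ((n/2 - <cf) / f_m) * i."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequencies must not all be zero")
    cumulative = list(accumulate(freqs))  # the <cf column
    half = n / 2
    for k, (lower, upper) in enumerate(classes):
        if cumulative[k] >= half:  # first class whose <cf reaches n/2
            L = lower - 0.5  # lower class boundary
            i = upper - lower + 1  # class width
            cf_before = cumulative[k - 1] if k > 0 else 0
            return L + (half - cf_before) / freqs[k] * i
    raise ValueError("n/2 was never reached; check the frequencies")


classes = [(25, 29), (30, 34), (35, 39), (40, 44), (45, 49), (50, 54), (55, 59)]
freqs = [3, 2, 5, 8, 8, 8, 9]
print(grouped_median(classes, freqs))  # 46.6875, reported as 46.69
```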
The median divides the ordered data into two equal parts, while quartiles divide a
data set into four equal parts. There are three quartiles: Quartile 1 (Q1), also called
the lower quartile, is the value below which 25% of the data lie; Quartile 2 (Q2),
which is equivalent to the median, is the value below which 50% of the data lie;
and Quartile 3 (Q3), also called the upper quartile, is the value below which 75%,
or three-fourths, of the data lie.
Deciles divide the arrayed data set into ten equal parts, and there are 9
deciles, denoted by D1, D2, …, D9. Decile 1, or D1, is the value below
which 10% of the data lie.
Percentiles divide the arrayed data set into one hundred equal parts. There
are 99 percentiles, denoted by P1, P2, …, P99. Percentile 1, or P1, is the value
below which 1% of the data lie.
The figure below illustrates the relationship of the quantiles in a given distribution.
$Q_1 = P_{25}$
$Q_2 = P_{50} = D_5 = \tilde{x}$
$Q_3 = P_{75}$
Illustrative Example:
The monthly salaries, in pesos, of 16 DepEd elementary teachers are
as follows:
Teacher Salary Teacher Salary
1 30,531 9 22,216
2 32,469 10 32,072
3 36,942 11 21,038
4 20,754 12 21,038
5 23,222 13 21,038
6 21,327 14 20,754
7 37,400 15 20,754
8 45,269 16 20,754
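Find a) the lower quartile (Q1), b) the 7th decile, and c) the 30th percentile.
Arranged in ascending order, the salaries are: 20 754, 20 754, 20 754, 20 754,
21 038, 21 038, 21 038, 21 327, 22 216, 23 222, 30 531, 32 072, 32 469, 36 942,
37 400, 45 269.
a) Lower quartile
Using the formula for the position of the kth quartile, $Q_k = \frac{k(N+1)}{4}$:
$Q_1 = \frac{1(16+1)}{4} = \frac{17}{4} = 4.25$
The 4th observation in the arranged data is Php 20 754. Therefore, 25%
of the 16 DepEd elementary teachers have salaries at or below
Php 20 754.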
b) 7th decile
Using the formula for the position of the kth decile:
$D_k = \frac{k(N+1)}{10}$
$D_7 = \frac{7(16+1)}{10} = \frac{7(17)}{10} = \frac{119}{10} = 11.9$
The 11th observation is Php 30 531. This shows that 70% of the 16 DepEd
elementary teachers have salaries at or below Php 30 531.
c) 30th percentile
Using the formula for the position of the kth percentile, $P_k = \frac{k(N+1)}{100}$:
$P_{30} = \frac{30(16+1)}{100} = \frac{510}{100} = 5.1$
The 5th observation is Php 21 038. This implies that 30% of the 16 DepEd
elementary teachers have salaries at or below Php 21 038.
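The ungrouped quantiles above are located by position, k(N + 1)/q, and the examples take the whole-number part of that position (4.25 → 4th, 11.9 → 11th, 5.1 → 5th). A minimal sketch that reproduces this convention; `quantile_by_position` is a hypothetical helper. Many textbooks and libraries interpolate between neighboring observations instead, which can give different values (for D7 here, something between Php 30 531 and Php 32 072).

```python
import math


def quantile_by_position(values, k, parts):
    """Return (position, value) for the kth of `parts` quantiles.

    The position is k(N + 1)/parts; following the worked examples, the value
    is the observation at the whole-number part of that position (no
    interpolation).
    """
    data = sorted(values)
    position = k * (len(data) + 1) / parts
    index = math.floor(position)  # 1-based observation number
    index = min(max(index, 1), len(data))  # stay inside the data
    return position, data[index - 1]


salaries = [30531, 32469, 36942, 20754, 23222, 21327, 37400, 45269,
            22216, 32072, 21038, 21038, 21038, 20754, 20754, 20754]

print(quantile_by_position(salaries, 1, 4))     # (4.25, 20754)  Q1
print(quantile_by_position(salaries, 7, 10))    # (11.9, 30531)  D7
print(quantile_by_position(salaries, 30, 100))  # (5.1, 21038)   P30
```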
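a) Solving for Quartile 1:
First solve $\frac{n}{4} = \frac{43}{4} = 10.75$ and locate it in the <cf column; it falls in
the class 40 – 44, where L = 39.5, <cf = 10, and $f_m$ = 8.
$Q_1 = 39.5 + \left(\frac{10.75 - 10}{8}\right)5$
$Q_1 = 39.5 + 0.46875$
$Q_1 = 39.97$
Interpretation: 25% of the students spent 39.97 minutes (about 40 minutes) or less
using Facebook.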
Worktext in Mathematics in the Modern World 98
b) Solving for the Decile 2:
Decile 2 is the value on or below which two-tenths (or 20%) of the data fall.
First solve $0.20n$ or $\frac{2n}{10}$ and locate it in the <cf column to find the
class containing D2: $\frac{2n}{10} = \frac{2(43)}{10} = 8.6$
$D_2 = L + \left(\frac{0.20n - {<}cf}{f_m}\right)i$ or $D_2 = L + \left(\frac{\frac{2n}{10} - {<}cf}{f_m}\right)i$
$D_2 = 34.5 + \left(\frac{0.20(43) - 5}{5}\right)5$
$D_2 = 34.5 + \left(\frac{8.6 - 5}{5}\right)5$
$D_2 = 34.5 + 3.6$
$D_2 = 38.1$
Interpretation: 20% of the students spent 38.1 minutes (about 38 minutes) or less
on Facebook.
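c) Solving for the Percentile 52:
$P_{52} = L + \left(\frac{0.52n - {<}cf}{f_m}\right)i$ or $P_{52} = L + \left(\frac{\frac{52n}{100} - {<}cf}{f_m}\right)i$
$P_{52} = 44.5 + \left(\frac{0.52(43) - 18}{8}\right)5$
$P_{52} = 44.5 + \left(\frac{22.36 - 18}{8}\right)5$
$P_{52} = 44.5 + 2.73$
$P_{52} = 47.23$
Interpretation: 52% of the students spent 47.23 minutes (about 47 minutes) or less
on Facebook.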
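The median, quartile, decile, and percentile formulas for grouped data share one shape, L + ((fraction × n − <cf)/f_m)i, differing only in the fraction (1/2, 1/4, 2/10, 52/100). A sketch of that generalization, again assuming whole-number class limits; `grouped_quantile` is a hypothetical name, and the printed values are compared with the text's rounded results in the comments.

```python
from itertools import accumulate


def grouped_quantile(classes, freqs, fraction):
    """Value below which `fraction` of grouped whole-number data lie.

    Uses L + ((fraction * n - <cf) / f_m) * i.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequencies must not all be zero")
    target = fraction * n  # e.g. n/4, 2n/10, 52n/100
    cumulative = list(accumulate(freqs))  # the <cf column
    for k, (lower, upper) in enumerate(classes):
        if cumulative[k] >= target:  # class containing the quantile
            L = lower - 0.5  # lower class boundary
            i = upper - lower + 1  # class width
            cf_before = cumulative[k - 1] if k > 0 else 0
            return L + (target - cf_before) / freqs[k] * i
    raise ValueError("target not reached; check the frequencies")


classes = [(25, 29), (30, 34), (35, 39), (40, 44), (45, 49), (50, 54), (55, 59)]
freqs = [3, 2, 5, 8, 8, 8, 9]

print("Q1  =", grouped_quantile(classes, freqs, 1 / 4))     # text reports 39.97
print("D2  =", grouped_quantile(classes, freqs, 2 / 10))    # text reports 38.1
print("P52 =", grouped_quantile(classes, freqs, 52 / 100))  # text reports 47.23
```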
Try this!
C. Find the lower quartile, upper quartile, interquartile range, decile 3, and
percentile 49 of the following scores of students in a 50-item Science
summative test: 15, 42, 38, 12, 6, 22, 31, 7, 36, 14, 41, 15, 50, 27, 65
D. The data below show the age distribution of residents on 7th Street of
South Hill Subdivision. Compute the (1) mean, (2) median, (3) mode,
(4) Percentile 22, (5) Quartile 3, and (6) Decile 8. Interpret the results.
Class interval f <cf
25 – 29 3 3
30 – 34 2 5
35 – 39 5 10
40 – 44 8 18
45 – 49 8 26
50 – 54 8 34
55 – 59 9 43
When the measure of variability is large, the values are widely scattered;
when it is small they are tightly clustered.
There are four measures of variability that will be discussed in this lesson:
the range, mean absolute deviation, variance and standard deviation.
Illustrative Example:
The data below are the scores of Grade 8 boys and girls in a 20-item English
quiz. Find the range of the scores of the two groups. Solve also for the
mean.
Girls’ scores Boys’ scores
8 3
9 7
11 10
12 7
10 13
9 13
10 17
11 10
10 10
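Solution:
Range (R) = Highest value (H) – Lowest value (L)
Girls: $R_{girls} = H - L = 12 - 8 = 4$
Boys: $R_{boys} = H - L = 17 - 3 = 14$
Mean: $\bar{x}_{girls} = \frac{90}{9} = 10$ and $\bar{x}_{boys} = \frac{90}{9} = 10$
Interpretation: Both groups have the same mean, but the girls’ scores are more
clustered about the mean than the boys’ scores.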
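The range is just the highest value minus the lowest. A short sketch that checks both ranges and both means from the score lists above (`data_range` is a hypothetical helper name):

```python
def data_range(values):
    """Range: highest value minus lowest value."""
    if not values:
        raise ValueError("data_range() needs at least one value")
    return max(values) - min(values)


girls = [8, 9, 11, 12, 10, 9, 10, 11, 10]
boys = [3, 7, 10, 7, 13, 13, 17, 10, 10]

for label, scores in (("girls", girls), ("boys", boys)):
    print(label, "range:", data_range(scores), "mean:", sum(scores) / len(scores))
```

Equal means with very different ranges is exactly the situation the interpretation describes: the center alone does not reveal the spread.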
Illustrative Examples:
Find the mean absolute deviation (MAD) of the girls’ and boys’ scores
in the previous example and interpret.
a. Find the MAD of the scores of girls.
Score (𝑥) Mean (𝑥̅ ) |𝑥 − 𝑥̅ |
8 10 2
9 10 1
11 10 1
12 10 2
10 10 0
9 10 1
10 10 0
11 10 1
10 10 0
∑𝑥 = 90 ∑|𝑥 − 𝑥̅ | = 8
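The table above stops at Σ|x − x̄| = 8. Assuming the common definition MAD = Σ|x − x̄| / n (this example does not show its division step), a short sketch that finishes the calculation for both groups; `mean_absolute_deviation` is a hypothetical helper name.

```python
def mean_absolute_deviation(values):
    """Average distance of the values from their mean (dividing by n)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    m = sum(values) / n
    return sum(abs(x - m) for x in values) / n


girls = [8, 9, 11, 12, 10, 9, 10, 11, 10]
boys = [3, 7, 10, 7, 13, 13, 17, 10, 10]

print(round(mean_absolute_deviation(girls), 2))  # 0.89  (8 / 9)
print(round(mean_absolute_deviation(boys), 2))   # 2.89  (26 / 9)
```

The smaller MAD for the girls agrees with the range comparison above.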
Illustrative Examples:
Find the variance and standard deviation of the scores of the girls and boys.
a. Girls 𝑥̅ = 10
Score (𝑥) 𝑥 − 𝑥̅ (𝑥 − 𝑥̅ )2
8 −2 4
9 −1 1
11 1 1
12 2 4
10 0 0
9 −1 1
10 0 0
11 1 1
10 0 0
∑(𝑥 − 𝑥̅ )2 = 12
Solution:
$s^2 = \frac{\sum(x - \bar{x})^2}{n-1} = \frac{12}{9-1} = \frac{12}{8}$
$s^2_{girls} = 1.5$ variance of the girls’ scores
$s = \sqrt{1.5}$
$s = 1.22$ standard deviation of the girls’ scores
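b. Boys $\bar{x} = 10$, with $\sum(x - \bar{x})^2 = 134$
Solution:
$s^2 = \frac{\sum(x - \bar{x})^2}{n-1} = \frac{134}{9-1} = \frac{134}{8}$
$s^2_{boys} = 16.75$ variance of the boys’ scores
$s = \sqrt{16.75}$
$s = 4.09$ standard deviation of the boys’ scores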
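The sample variance divides the sum of squared deviations by n − 1, and the standard deviation is its square root. A minimal sketch of that definition (the helper name is ours; Python's `statistics.variance` and `statistics.stdev` compute the same sample quantities):

```python
import math


def sample_variance(values):
    """s^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample variance needs at least two values")
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)


girls = [8, 9, 11, 12, 10, 9, 10, 11, 10]
boys = [3, 7, 10, 7, 13, 13, 17, 10, 10]

for label, scores in (("girls", girls), ("boys", boys)):
    var = sample_variance(scores)
    print(label, "variance:", var, "sd:", round(math.sqrt(var), 2))
```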
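Direct-from-the-Scores Method of Solving the Variance and Standard Deviation
Formulas of Variance and Standard Deviation (Direct-from-the-Scores Method)
Population: $\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$
Sample: $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}$
The standard deviation is the square root of the variance: $\sigma = \sqrt{\sigma^2}$ and $s = \sqrt{s^2}$.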
where,
𝜎 2 (sigma-squared) –variance for population
𝑠 2 –variance for a sample
𝜎 –standard deviation for population
s –standard deviation for sample
𝑥 –value of each observation or measurement
Illustrative Examples:
Solve for the variance and standard deviation of the boys’ and girls’ scores
using the same data as in the previous example.
1. Girls’ scores (∑x = 90, ∑x² = 912)
Solution:
$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{912 - \frac{(90)^2}{9}}{9-1} = \frac{912 - \frac{8100}{9}}{8} = \frac{912 - 900}{8} = \frac{12}{8}$
$s^2 = 1.5$ variance of the girls’ scores
$s = 1.22$ standard deviation of the girls’ scores
2. Boys’ scores
Score (x) x²
3 9
7 49
10 100
7 49
13 169
13 169
17 289
10 100
10 100
∑x = 90 ∑x² = 1034
Solution:
$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{1034 - \frac{(90)^2}{9}}{9-1} = \frac{1034 - \frac{8100}{9}}{8} = \frac{1034 - 900}{8} = \frac{134}{8}$
$s^2 = 16.75$ variance of the boys’ scores
$s = 4.09$ standard deviation of the boys’ scores
Interpretation: Both methods give the same results, which show that the girls’
scores are more clustered than the boys’ scores.
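The direct-from-the-scores formula needs only Σx and Σx², which is why it is convenient by hand. A sketch of both the sample and population versions (the function name and the `sample` flag are ours):

```python
def variance_direct(values, sample=True):
    """Direct-from-the-scores formula: (sum x^2 - (sum x)^2 / n) / divisor."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    ss = sum_x2 - sum_x ** 2 / n  # sum of squared deviations from the mean
    divisor = n - 1 if sample else n  # s^2 uses n - 1, sigma^2 uses N
    return ss / divisor


girls = [8, 9, 11, 12, 10, 9, 10, 11, 10]
boys = [3, 7, 10, 7, 13, 13, 17, 10, 10]

print(variance_direct(girls), variance_direct(boys))  # 1.5 16.75
print(variance_direct(girls, sample=False))           # population version, 12 / 9
```

One design note: in floating-point arithmetic, subtracting two large, nearly equal sums can lose precision when the values are large relative to their spread, so software usually prefers the deviation-based form for real data. For small whole-number scores like these, both forms agree exactly.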
MEASURES OF VARIABILITY OF GROUPED DATA
The range (R) of grouped data in a frequency distribution may be
determined either by:
Method 1. Finding the difference between the true upper limit (UL) of the
highest class and the true lower limit (LL) of the lowest class.
$R = UL - LL$
Method 2. Finding the difference between the midpoints or class
marks (X) of the highest and lowest classes.
$R = X_H - X_L$
Illustrative Example:
The data below show the monthly electrical consumption (in kWh) of 100
households. Find the range.
Monthly electrical consumption (kWh) f
40 – 44 4
45 – 49 11
50 – 54 20
55 – 59 31
60 – 64 19
65 – 69 11
70 – 74 4
Total 100
Solution:
Method 1: $R = UL - LL = 74.5 - 39.5 = 35$
Method 2: $R = X_H - X_L = 72 - 42 = 30$
where
$X_H$ (midpoint of the highest class, 70 – 74) $= \frac{70+74}{2} = 72$
$X_L$ (midpoint of the lowest class, 40 – 44) $= \frac{40+44}{2} = 42$
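The two methods differ only in which points of the extreme classes they subtract. A sketch of both, assuming whole-number class limits listed in ascending order (the function names are hypothetical):

```python
def range_by_boundaries(classes):
    """Method 1: true upper limit of the highest class minus true lower limit of the lowest."""
    lowest, highest = classes[0], classes[-1]
    return (highest[1] + 0.5) - (lowest[0] - 0.5)


def range_by_class_marks(classes):
    """Method 2: class mark of the highest class minus class mark of the lowest."""
    lowest, highest = classes[0], classes[-1]
    return (highest[0] + highest[1]) / 2 - (lowest[0] + lowest[1]) / 2


kwh_classes = [(40, 44), (45, 49), (50, 54), (55, 59), (60, 64), (65, 69), (70, 74)]
print(range_by_boundaries(kwh_classes))   # 35.0
print(range_by_class_marks(kwh_classes))  # 30.0
```

Method 1 always exceeds Method 2 by one class width (here 5), because it reaches out to the outer boundaries rather than stopping at the class centers.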
Illustrative Example:
Solve for the MAD of the sample monthly electrical consumption (kWh) of
100 households.
Solution:
a) First, solve for the mean:
$\bar{x} = \frac{\sum fX}{n} = \frac{5\,695}{100} = 56.95$
$MAD = 5.42$
The mean absolute deviation of the monthly electrical consumption is
5.42 kWh.
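The worked solution shows only the mean and the final 5.42 kWh. The sketch below computes Σf|X − x̄| from the class marks and then tries two divisors: dividing by n − 1 = 99 reproduces the reported 5.42, while the more common divisor n = 100 gives about 5.37. Which divisor the worktext intends is an inference from the numbers, not something it states; the helper name is hypothetical.

```python
def grouped_abs_deviation_sum(classes, freqs):
    """Return (mean, sum of f * |X - mean|, n) for grouped data, X = class mark."""
    marks = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, marks)) / n
    total = sum(f * abs(x - mean) for f, x in zip(freqs, marks))
    return mean, total, n


kwh_classes = [(40, 44), (45, 49), (50, 54), (55, 59), (60, 64), (65, 69), (70, 74)]
households = [4, 11, 20, 31, 19, 11, 4]

mean, total, n = grouped_abs_deviation_sum(kwh_classes, households)
print(f"mean = {mean:.2f}")                        # 56.95
print(f"sum of f|X - mean| = {total:.2f}")         # 536.50
print(f"divided by n:     {total / n:.3f}")        # 5.365
print(f"divided by n - 1: {total / (n - 1):.2f}")  # 5.42
```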
$s = \sqrt{\frac{\sum fX^2 - \frac{(\sum fX)^2}{n}}{n-1}}$
where,
f – frequency
X – class marks or midpoint
$s = \sqrt{\frac{329\,305 - \frac{(5\,695)^2}{100}}{100 - 1}} = \sqrt{\frac{329\,305 - 324\,330.25}{99}}$
$s = \sqrt{\frac{4\,974.75}{99}}$
$s = \sqrt{50.25}$
$s = 7.0887$ or $7.09$
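The grouped standard deviation needs ΣfX and ΣfX², with X the class mark. A sketch that rebuilds both sums from the frequency table and applies the sample formula (`grouped_sample_sd` is a hypothetical helper):

```python
import math


def grouped_sample_sd(classes, freqs):
    """s = sqrt((sum f X^2 - (sum f X)^2 / n) / (n - 1)), X = class mark."""
    marks = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)
    if n < 2:
        raise ValueError("need at least two observations")
    sum_fx = sum(f * x for f, x in zip(freqs, marks))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))
    return math.sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))


kwh_classes = [(40, 44), (45, 49), (50, 54), (55, 59), (60, 64), (65, 69), (70, 74)]
households = [4, 11, 20, 31, 19, 11, 4]

sd = grouped_sample_sd(kwh_classes, households)
print(f"{sd:.2f}")  # 7.09
```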
Try this!
Number of child(ren) f
0 5
1 11
2 9
3 5
4 5
5 0
6 4
7 1
n = 40
Age f
18 – 22 15
23 – 27 33
28 – 32 45
33 – 37 26
38 – 42 13
43 – 47 8
Scatter Diagram
where,
𝑟 –coefficient of correlation
𝑥, 𝑦 –random variable on observed data
𝑛 –sample size
Illustrative Example:
A school psychologist is interested in finding the relationship between a child’s
Emotional Quotient (EQ) and creativity. To investigate this problem, the
psychologist samples some children and administers a standard Emotional
Quotient test and a test of creativity to each child.
Solution:
a) Solve for 𝑥𝑦, 𝑥 2 , 𝑦 2 and find the sum.
b) Then solve for r.
Student EQ (x) Creativity (y) xy x² y²
1 138 51 7 038 19 044 2 601
2 88 30 2 640 7 744 900
3 120 38 4 560 14 400 1 444
4 90 24 2 160 8 100 576
5 86 20 1 720 7 396 400
6 131 50 6 550 17 161 2 500
7 113 35 3 955 12 769 1 225
8 120 27 3 240 14 400 729
9 95 30 2 850 9 025 900
10 110 24 2 640 12 100 576
11 100 52 5 200 10 000 2 704
12 127 39 4 953 16 129 1 521
13 117 30 3 510 13 689 900
14 81 18 1 458 6 561 324
∑x = 1 516 ∑y = 468 ∑xy = 52 474 ∑x² = 168 518 ∑y² = 17 300
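Only the "where" list for r appears in this section, not the formula itself. Assuming it is the standard computational form of Pearson's coefficient, r = (nΣxy − ΣxΣy) / √[(nΣx² − (Σx)²)(nΣy² − (Σy)²)], which uses exactly the column totals of the table, a sketch that recomputes those totals from the raw scores and then r (`pearson_r` is a hypothetical helper):

```python
import math

eq = [138, 88, 120, 90, 86, 131, 113, 120, 95, 110, 100, 127, 117, 81]
creativity = [51, 30, 38, 24, 20, 50, 35, 27, 30, 24, 52, 39, 30, 18]


def pearson_r(x, y):
    """Computational form of Pearson's r from n, sum x, sum y, sum xy, sum x^2, sum y^2."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Column totals, to compare with the last row of the table.
print(sum(eq), sum(creativity))                     # 1516 468
print(sum(a * b for a, b in zip(eq, creativity)))   # 52474
print(sum(a * a for a in eq), sum(b * b for b in creativity))  # 168518 17300
print(round(pearson_r(eq, creativity), 2))          # 0.67
```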