Mathematics in The Modern World
Chapter 4:
Data Management
4.1 Descriptive Statistics
Statistics is a branch of mathematics that deals
with data collection, organization, analysis,
interpretation and presentation.
Continuous
height of students in class
weight of students in class
time it takes to get to school
distance traveled between classes
Types of Statistical Data
2. Ordinal – data at this level can be ordered, but differences between the
data values are not meaningful (e.g., ten cities are ranked from one to ten, but
differences between the ranks don't make much sense; letter grades can be ordered
so that A is higher than B, but they carry no other information).
3. Interval – data at this level can be ordered, and differences between the
data values do make sense. However, data at this level have no true zero
(starting) point (e.g., the Fahrenheit and Celsius temperature scales).
4. Ratio – the highest level of measurement. Data possess all of the features of
the interval level, in addition to an absolute zero. Due to the presence of a zero, it
now makes sense to compare the ratios of measurements.
4.2 Data Collection Method
Methods of Collecting Data
1. In-Person Interviews
Pros: In-depth and a high degree of confidence in the data
Cons: Time-consuming, expensive, and can be dismissed as anecdotal
2. Mail Surveys
Pros: Can reach anyone and everyone – no barrier
Cons: Expensive, data collection errors, lag time
3. Phone Surveys
Pros: High degree of confidence in the data collected, can reach almost
anyone
Cons: Expensive, cannot self-administer, need to hire an agency
4. Web/Online Surveys
Pros: Cheap, can self-administer, very low probability of data errors
Cons: Not all your customers might have an email address/be on the
internet, customers may be wary of divulging information online
Three Ways of Presenting Data
Example 1
Jack joins football practice every Wednesday morning,
Sunday morning and afternoon.
Jack’s team has scored the following numbers of goals in their games,
3, 1, 2, 1, 3, 2, 4, 2, 3, 2, 5, 4, 3, 2.
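One possible tabular presentation of these goals is a frequency table. The short Python sketch below is only an illustration: the variable names are made up for this example, and it assumes the list above is the complete record of games.

```python
from collections import Counter

# Goals scored by Jack's team, copied from the example above.
goals = [3, 1, 2, 1, 3, 2, 4, 2, 3, 2, 5, 4, 3, 2]

# Count how many games ended with each number of goals.
frequency = Counter(goals)

# Print a simple two-column frequency table, sorted by number of goals.
print("Goals  Games")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]:>5}")
```

The counts add up to 14, the number of games listed.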
The following are the 3rd year math grades of an applied math student:
1.6 1.2 1.9 1.5 1.5 1.5 1.0 1.3 1.0
Mean:
\[ \bar{X} = \frac{X_1 + X_2 + \cdots + X_9}{9} \]
Median: arrange the grades in ascending order and take the middle (5th) value.
1.0 1.0 1.2 1.3 1.5 1.5 1.5 1.6 1.9
Median = 1.5
Mode: 1.5
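The three measures for these nine grades can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are chosen for illustration only.

```python
import statistics

# Third-year math grades from the example above.
grades = [1.6, 1.2, 1.9, 1.5, 1.5, 1.5, 1.0, 1.3, 1.0]

mean = statistics.mean(grades)      # sum of the grades divided by 9
median = statistics.median(grades)  # middle value of the sorted grades
mode = statistics.mode(grades)      # most frequent grade

print(f"Mean:   {mean:.2f}")
print(f"Median: {median}")
print(f"Mode:   {mode}")
```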
Example 2 (for grouped data)
\[ Md = LCB + \left( \frac{\frac{n}{2} - cfp}{f_m} \right) i \]
\[ \text{Median} = 35.5 + \left( \frac{15 - 10}{10} \right)(8) = 39.5 \]
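The substitution above can be reproduced with a small helper function. This sketch assumes the usual reading of the symbols (LCB = lower class boundary of the median class, cfp = cumulative frequency before the median class, fm = frequency of the median class, i = class width). The value n = 30 is inferred from n/2 = 15 in the substitution, since the frequency table itself is not shown here.

```python
def grouped_median(lcb, n, cfp, fm, i):
    """Median for grouped data: LCB + ((n/2 - cfp) / fm) * i."""
    return lcb + ((n / 2 - cfp) / fm) * i

# Values taken from the substitution above; n = 30 is an inferred assumption.
median = grouped_median(lcb=35.5, n=30, cfp=10, fm=10, i=8)
print(median)
```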
The mode for grouped data is given by
\[ Mo = LCB + \left( \frac{f_m - f_1}{2f_m - f_1 - f_2} \right) i \]
\[ \text{Mode} = 35.5 + \left( \frac{10 - 7}{20 - 7 - 6} \right)(8) \approx 38.9 \]
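The same computation can be sketched in Python. The symbol meanings are assumed here (fm = frequency of the modal class, f1 and f2 = frequencies of the classes just before and just after it, i = class width), because the frequency table is not shown in this text.

```python
def grouped_mode(lcb, fm, f1, f2, i):
    """Mode for grouped data: LCB + ((fm - f1) / (2*fm - f1 - f2)) * i."""
    return lcb + ((fm - f1) / (2 * fm - f1 - f2)) * i

# Values taken from the substitution above.
mode = grouped_mode(lcb=35.5, fm=10, f1=7, f2=6, i=8)
print(round(mode, 1))
```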
4.5 Measures of Variability
Variability for Ungrouped Data
• Range
\[ R = HV - LV \]
where HV is the highest value and LV is the lowest value.
• Variance
It is defined as the average of the squared deviations from the mean.
It is the measure that considers the position of each observation
relative to the mean.
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \quad \text{or} \quad s^2 = \frac{n \sum x^2 - \left( \sum x \right)^2}{n(n - 1)} \]
• Standard Deviation (the most widely encountered) - It is
the measure of the spread or dispersion of scores from the
mean of the distribution. It is the square root of the variance.
\[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \quad \text{or} \quad s = \sqrt{\frac{n \sum x^2 - \left( \sum x \right)^2}{n(n - 1)}} \]
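To see that the definitional and computational formulas give the same result, the sketch below applies both to the nine math grades from the earlier example. The choice of data set is an assumption made for illustration, and the variable names are hypothetical.

```python
import math
import statistics

# Math grades reused from the ungrouped-data example (illustrative choice).
data = [1.6, 1.2, 1.9, 1.5, 1.5, 1.5, 1.0, 1.3, 1.0]
n = len(data)
mean = sum(data) / n

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# Sample variance, definitional form: sum of squared deviations over (n - 1).
var_def = sum((x - mean) ** 2 for x in data) / (n - 1)

# Sample variance, computational form: (n*sum(x^2) - (sum x)^2) / (n(n - 1)).
var_short = (n * sum(x ** 2 for x in data) - sum(data) ** 2) / (n * (n - 1))

# Standard deviation: square root of the variance.
sd = math.sqrt(var_def)

print(f"Range: {data_range:.1f}")
print(f"Variance: {var_def:.4f} (definitional), {var_short:.4f} (computational)")
print(f"Standard deviation: {sd:.4f}")
print(f"Library check: {statistics.variance(data):.4f}")
```

Both forms divide by n - 1, so they match `statistics.variance`, which also computes the sample variance.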
4.6 Testing a Statistical Hypothesis
Hypothesis testing is the most significant area of statistical
inference. It is a step-by-step process in making inferences
(conclusions) about a population.
Types of Error
1. A researcher wants to know if the average test score of the students taking a
particular examination is 80.
2. A study shows that the daily consumption depends on the age level of a person.
H0: The daily consumption does not depend on the age level of a person.
H1: The daily consumption depends on the age level of a person.
C. Correlation
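\[ r = \frac{n\left( \sum xy \right) - \left( \sum x \right)\left( \sum y \right)}{\sqrt{\left[ n\left( \sum x^2 \right) - \left( \sum x \right)^2 \right] \left[ n\left( \sum y^2 \right) - \left( \sum y \right)^2 \right]}} \]
where \( -1 \le r \le 1 \) and
\( n \) = number of data pairs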
If the value of r is close to positive 1, then there is a strong positive linear
relationship between the two variables. If r is close to negative 1, there is a
strong negative linear relationship between them. However, if the two
variables have a weak or no linear relationship, r is close to 0.
Example 3
1. A study is conducted to show how strong the relationship is between the sleeping habits of
employees and their level of performance at work.
H0: The sleeping habits of employees are not related to their level of performance at work.
H1: The sleeping habits of employees are related to their level of performance at work.
2. A student wants to know if his grade in Mathematics is associated with his grade in English.
Student   Hours of Study (x)   Grade (y)
A         7                    83
B         3                    63
C         2                    60
D         6                    88
E         3                    68
F         4                    75
Solution:
To solve for the correlation coefficient r, we must first find the
values of \( xy \), \( x^2 \), and \( y^2 \).
Student   x    y     xy     x^2   y^2
A         7    83    581    49    6889
B         3    63    189    9     3969
C         2    60    120    4     3600
D         6    88    528    36    7744
E         3    68    204    9     4624
F         4    75    300    16    5625
\( \sum x = 25 \)   \( \sum y = 437 \)   \( \sum xy = 1922 \)   \( \sum x^2 = 123 \)   \( \sum y^2 = 32451 \)
Substituting the values into the formula,
\[ r = \frac{(6)(1922) - (25)(437)}{\sqrt{\left[ 6(123) - (25)^2 \right] \left[ 6(32451) - (437)^2 \right]}} \]
\[ r = 0.934 \]
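The value of r can be checked by computing the sums directly from the six data pairs. The Python sketch below follows the formula term by term; the variable names are illustrative only.

```python
import math

# Hours of study (x) and grades (y) for students A to F.
hours = [7, 3, 2, 6, 3, 4]
grades = [83, 63, 60, 88, 68, 75]
n = len(hours)

# Column sums needed by the formula.
sum_x = sum(hours)
sum_y = sum(grades)
sum_xy = sum(x * y for x, y in zip(hours, grades))
sum_x2 = sum(x ** 2 for x in hours)
sum_y2 = sum(y ** 2 for y in grades)

# Pearson correlation coefficient.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 3))  # 0.934
```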
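\[ b = \frac{n\left( \sum xy \right) - \left( \sum x \right)\left( \sum y \right)}{n\left( \sum x^2 \right) - \left( \sum x \right)^2} \]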
Example 4
Let us take the example from the correlation section, since a strong linear relationship exists
between the number of hours of study and the students' test scores on an exam.
Solution:
Since \( xy \), \( x^2 \), and \( y^2 \) are necessary to solve for a and b, we must solve them first.
Student   Hours of Study (x)   Grade (y)   xy     x^2   y^2
A         7                    83          581    49    6889
B         3                    63          189    9     3969
C         2                    60          120    4     3600
D         6                    88          528    36    7744
E         3                    68          204    9     4624
F         4                    75          300    16    5625
\( \sum x = 25 \)   \( \sum y = 437 \)   \( \sum xy = 1922 \)   \( \sum x^2 = 123 \)   \( \sum y^2 = 32451 \)
Then we have,
\[ a = \frac{(437)(123) - (25)(1922)}{6(123) - (25)^2} = 50.451 \]
\[ b = \frac{(6)(1922) - (25)(437)}{6(123) - (25)^2} = 5.372 \]
Hence, the equation of the regression line is
\[ y' = 50.451 + 5.372x \]
Suppose we want to predict the grade (\( y' \)) of a student who studies for x
hours. For example, let \( x = 9 \). Then,
\[ y' = 50.451 + 5.372(9) \]
\[ y' = 98.80 \]
Let \( x = 5 \). Then,
\[ y' = 50.451 + 5.372(5) \]
\[ y' = 77.31 \]
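The intercept, slope, and both predictions can be reproduced from the raw data. In the sketch below, `predict` is a hypothetical helper written for this example. It uses the unrounded a and b, which here give the same rounded predictions as the hand computation.

```python
# Hours of study (x) and grades (y) for students A to F.
hours = [7, 3, 2, 6, 3, 4]
grades = [83, 63, 60, 88, 68, 75]
n = len(hours)

sum_x = sum(hours)
sum_y = sum(grades)
sum_xy = sum(x * y for x, y in zip(hours, grades))
sum_x2 = sum(x ** 2 for x in hours)

# Both coefficients share the denominator n*sum(x^2) - (sum x)^2.
denominator = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # intercept
b = (n * sum_xy - sum_x * sum_y) / denominator       # slope

def predict(x):
    """Predicted grade y' = a + b*x for x hours of study."""
    return a + b * x

print(f"y' = {a:.3f} + {b:.3f}x")
print(f"x = 9: y' = {predict(9):.2f}")
print(f"x = 5: y' = {predict(5):.2f}")
```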