Data Management
Mean = P101,200
Median = P36,000
Mode = P 20,000
2. 0 0 0 0 0 10 10 10 10 10
3. 4 4 4 5 5 5 6 6 6
4. 0 5 5 5 5 5 5 10
Measure of Dispersion
Properties that determine the usefulness of the standard
deviation:
➢ It is used to describe the variability of the distribution only
when the mean is used to describe the center.
➢ It is equal to zero when there is no variability. This happens
only when all observations are of the same value.
➢ It has the same units of measurement as the original
observations.
➢ Like the mean, it can be influenced by outliers.
for population: $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
for sample: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
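As a rough illustration, the sketch below uses Python's standard `statistics` module to compute both versions of the standard deviation for the three data sets numbered 2 to 4 earlier in this section. Treating each list once as a population (dividing by N) and once as a sample (dividing by n - 1) is an assumption made here only for comparison.

```python
from statistics import mean, pstdev, stdev

# Data sets 2-4 as listed earlier; all three have a mean of 5.
data_sets = {
    "set 2": [0, 0, 0, 0, 0, 10, 10, 10, 10, 10],
    "set 3": [4, 4, 4, 5, 5, 5, 6, 6, 6],
    "set 4": [0, 5, 5, 5, 5, 5, 5, 10],
}

for name, values in data_sets.items():
    sigma = pstdev(values)  # divides the sum of squared deviations by N
    s = stdev(values)       # divides the sum of squared deviations by n - 1
    print(f"{name}: mean={mean(values)}, population sd={sigma:.3f}, sample sd={s:.3f}")
```

The three sets share the same mean, so any difference in the printed standard deviations reflects spread alone.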
Example. A consumer group has tested a sample of 8 size-AAA
batteries from each of 3 companies. The results of the tests are
shown in the following table. According to these tests, which
company produces batteries for which the values representing
hours of constant use have the smallest standard deviation?
Measures of Relative Position
Percentiles and Quartiles
are useful when you want to know where a score is located
relative to the other scores.
➢ A percentile is a data value below which a specified percentage
of the data falls.
➢ The median is the 50th percentile.
➢ The 25th, 50th, and 75th percentiles are the lower quartile Q1,
middle quartile Q2, and upper quartile Q3, respectively. Together
they divide the data into four equal parts.
➢ In using quartiles, five numbers are reported altogether (the
five-number summary): min value, Q1, median, Q3, and max value.
➢ Quartiles are useful for box plots.
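The five numbers above can be sketched in a few lines of Python. Textbooks and software use slightly different quartile conventions; this sketch assumes the median-of-halves method (Q1 is the median of the values below the median, Q3 the median of the values above it). The function name and the scores are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                  # values below the median position
    upper = data[half + 1:] if n % 2 else data[half:]    # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

scores = [2, 4, 4, 5, 7, 8, 10, 12, 15]  # hypothetical scores
print(five_number_summary(scores))
```

These are the same five values a box plot draws: the box spans Q1 to Q3, the line inside marks the median, and the whiskers reach the minimum and maximum.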
z-score
The z-score for a given data value x is the number of standard
deviations that x is above or below the mean of the data.
z-score of $x_i$ in a population: $z = \dfrac{x_i - \mu}{\sigma}$
z-score of $x_i$ in a sample: $z = \dfrac{x_i - \bar{x}}{s}$
Problem. (Task: Discuss your solutions to each of the 3 problems)
1. The mean time to download a file is 12 minutes with a standard
deviation of 4 minutes. Your download time is 20 minutes.
Your friend’s download time is 6 minutes. How does your
download time compare with your friend’s?
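One possible way to approach Problem 1 is to convert both download times to z-scores, so that each is expressed in standard deviations from the mean. The helper name `z_score` is hypothetical; the numbers come from the problem statement.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

mean_time, sd_time = 12, 4                   # minutes, from the problem statement
z_you = z_score(20, mean_time, sd_time)      # your download time
z_friend = z_score(6, mean_time, sd_time)    # your friend's download time
print(z_you, z_friend)                       # prints 2.0 -1.5
```

A positive z-score means a slower-than-average download and a negative one a faster-than-average download, so the two values can be compared directly.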
a. 0.76 kg is 1 standard deviation above the mean of 0.61 kg. In a normal distribution, 34% of all data lie
between the mean and 1 standard deviation above the mean, and 50% of all data lie below the mean. Thus,
34% + 50% = 84% of the tomatoes weigh less than 0.76 kg.
b. 0.31 kg is 2 standard deviations below the mean of 0.61 kg. In a normal distribution, 47.5% of all data lie
between the mean and 2 standard deviations below the mean, and 50% of all data lie above the mean. This
gives a total of 47.5% + 50% = 97.5% of the tomatoes that weigh more than 0.31 kg. Therefore
0.975 × 6000 = 5850 of the tomatoes can be expected to weigh more than 0.31 kg.
c. 0.31 kg is 2 standard deviations below the mean of 0.61 kg and 0.91 kg is 2 standard deviations above the
mean of 0.61 kg. In a normal distribution, 95% of all data lie within 2 standard deviations of the mean.
Therefore 0.95 × 4500 = 4275 of the tomatoes can be expected to weigh from 0.31 kg to 0.91 kg.
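The three answers above can be cross-checked with `statistics.NormalDist` from the Python standard library. The standard deviation of 0.15 kg is not stated in the answers; it is inferred here from 0.76 - 0.61 and should be treated as an assumption.

```python
from statistics import NormalDist

# Mean from the answers above; sigma inferred from 0.76 - 0.61 (assumption).
weights = NormalDist(mu=0.61, sigma=0.15)

p_a = weights.cdf(0.76)                                  # share weighing less than 0.76 kg
n_b = 6000 * (1 - weights.cdf(0.31))                     # expected count above 0.31 kg
n_c = 4500 * (weights.cdf(0.91) - weights.cdf(0.31))     # expected count from 0.31 to 0.91 kg
print(f"{p_a:.4f} {n_b:.1f} {n_c:.1f}")
```

The exact normal values come out slightly different from the answers above because the 34%, 47.5%, and 95% figures are rounded.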
Normal Distribution and Probability
Standard Normal Distribution
Bivariate data are data sets in which each subject has two
observations associated with it.
Examining a Scatterplot
1. Describe the overall pattern of a scatterplot by the form,
direction, and strength of the relationship.
2. Then look for any striking deviations from the pattern. Identify
each occurrence of an outlier.
Linear Regression and Correlation
Linear Regression
– involves using data to calculate a line that best fits that data
and then using that line to predict scores.
Least-Squares Regression Line
– is the line that minimizes the sum of the squares of the
vertical deviations from each data point to the line.
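The equation of the least-squares line is $\hat{y} = ax + b$,
where $a = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$ and $b = \bar{y} - a\bar{x}$.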
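A minimal sketch of the least-squares computation, assuming the usual form ŷ = ax + b with slope a = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and intercept b = ȳ − a·x̄. The function name and the five data points are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y-hat = a*x + b that minimizes squared vertical deviations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    b = sum_y / n - a * sum_x / n                                 # intercept
    return a, b

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"y-hat = {a:.2f}x + {b:.2f}")
print(f"prediction at x = 6: {a * 6 + b:.2f}")
```

Predictions are generally more trustworthy for x values inside the range of the observed data than for values far outside it.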
Linear Correlation Coefficient
– measures the strength of a linear relationship between two
variables and is denoted by r.
c. Are the predictions in parts (a) and (b) reliable? Why or why not?
Region                       Annual Ave.          Annual Family
                             Unemployment Rate    Income (000,000)
NCR                          8.5                  4.25
Cordillera                   4.8                  2.82
I - Ilocos Region            8.4                  2.38
II - Cagayan Valley          3.2                  2.37
III - Central Luzon          7.8                  2.99
IVA - CALABARZON             8.0                  3.12
IVB - MIMAROPA               3.3                  2.22
V - Bicol Region             5.6                  1.87
VI - Western Visayas         5.4                  2.26
VII - Central Visayas        5.9                  2.39
VIII - Eastern Visayas       5.4                  1.97
IX - Zamboanga Peninsula     3.5                  1.90
X - Northern Mindanao        5.6                  2.21
XI - Davao Region            5.8                  2.47
XII - SOCCSKSARGEN           3.5                  1.88
Caraga                       5.7                  1.98
ARMM                         3.5                  1.39
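The sketch below computes the linear correlation coefficient r and the least-squares line for the 17 regions in the table. Taking the unemployment rate as x and the family income as y is an assumption made for illustration; the slides do not say which variable is meant to be the predictor.

```python
from math import sqrt

# (unemployment rate, annual family income) pairs copied from the table, NCR first.
rows = [
    (8.5, 4.25), (4.8, 2.82), (8.4, 2.38), (3.2, 2.37), (7.8, 2.99), (8.0, 3.12),
    (3.3, 2.22), (5.6, 1.87), (5.4, 2.26), (5.9, 2.39), (5.4, 1.97), (3.5, 1.90),
    (5.6, 2.21), (5.8, 2.47), (3.5, 1.88), (5.7, 1.98), (3.5, 1.39),
]

x = [rate for rate, _ in rows]
y = [income for _, income in rows]
n = len(rows)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sums of squared deviations and cross-products about the means.
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = sxy / sqrt(sxx * syy)            # linear correlation coefficient
slope = sxy / sxx                    # least-squares slope
intercept = y_bar - slope * x_bar    # least-squares intercept
print(f"r = {r:.3f}, y-hat = {slope:.3f}x + {intercept:.3f}")
```

A value of r near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates a weak one. A scatterplot of the same pairs is still worth examining for outliers before relying on the line.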