
LESSON 4 MMW Data Management


DATA MANAGEMENT
01 MEASURES OF CENTRAL TENDENCY
02 MEASURES OF DISPERSION
03 MEASURES OF RELATIVE POSITION
04 NORMAL DISTRIBUTION
05 LINEAR REGRESSION AND CORRELATION
STATISTICS
WHAT IS DATA?

• the raw information from which statistics are created.
• individual pieces of information recorded and
used for the purpose of analysis.
WHAT IS STATISTICS?

• the results of data analysis, its interpretation and presentation.
• involves the collection, organization,
summarization, presentation and
interpretation of data.
TYPES OF STATISTICS

1. DESCRIPTIVE STATISTICS – branch of statistics that involves the collection,
organization, summarization, and presentation of data.
2. INFERENTIAL STATISTICS – branch of statistics that interprets and draws
conclusions from the data.
MEASURES OF
CENTRAL TENDENCY
MEASURES OF CENTRAL TENDENCY

• One of the most basic statistical concepts of a set of numerical data.
• Single value that attempts to describe a set of data by
identifying the central position within that set of data.
• It is often helpful to find numerical values that locate,
in some sense, the center of a set of data.
• There are three types of averages (the arithmetic mean, the median, and
the mode) for a set of numerical data.
MEAN / ARITHMETIC MEAN

• Sum of all the data values divided by the number of data values.
• The most used measure of central tendency.
• Often referred to simply as Mean/Average.
• The mean is unique and not necessarily one of the data values.
MEAN / ARITHMETIC MEAN

• The mean is very sensitive to extreme values: a change in a single
extreme value can drastically change the mean.
• The mean will not be a good indicator of an average value if a data set
has a value that is very different from most of the data.
• An outlier is an extremely high or low value.
MEAN / ARITHMETIC MEAN

The mean of n numbers is the sum of the numbers divided by n.

Sample Mean: \( \bar{x} = \dfrac{\sum x}{n} \)
Population Mean: \( \mu = \dfrac{\sum x}{N} \)
MEAN / ARITHMETIC MEAN
EXAMPLE
Six friends in a biology class of 20 students received test
grades of
92, 84, 65, 76, 88, and 90
Find the mean of these test scores.
MEAN / ARITHMETIC MEAN
SOLUTION
The 6 friends are a sample of the population of 20
students. Use 𝐱̅ to represent the mean.
\[ \bar{x} = \frac{\sum x}{n} = \frac{92 + 84 + 65 + 76 + 88 + 90}{6} = \frac{495}{6} = 82.5 \]

The mean of these test scores is 82.5
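The same computation can be sketched in Python. This is only an illustration using the six grades from the example; the variable names are chosen here and are not part of the lesson.

```python
from statistics import mean

# Test grades of the six friends (a sample of the class of 20 students).
grades = [92, 84, 65, 76, 88, 90]

# Sample mean: the sum of the data values divided by the number of values.
manual_mean = sum(grades) / len(grades)

# The standard library's mean() should agree with the manual computation.
library_mean = mean(grades)

print(manual_mean, library_mean)  # 82.5 82.5
```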


MEDIAN

• The middle number in a ranked list.


• Ranked List is any list of numbers that is arranged in
numerical order from smallest to largest or largest to
smallest.
• Not affected by outliers.
MEDIAN

The median of a ranked list of n numbers is:

• ODD (n is odd): the middle number
• EVEN (n is even): the mean of the two middle numbers
MEDIAN
EXAMPLE
Find the median of the data in the lists.
a) 4, 8, 1, 14, 9, 21, 12
b) 46, 23, 92, 89, 77, 108
MEDIAN
SOLUTION
a) The list has 7 numbers (ODD). Rank the numbers from smallest to largest.
1, 4, 8, 9, 12, 14, 21
The middle number is 9. Thus 9 is the median.
MEDIAN
SOLUTION
b) The list has 6 numbers (EVEN). Rank the numbers from smallest to largest.
23, 46, 77, 89, 92, 108
The two middle numbers are 77 and 89. The mean of 77
and 89 is 83. Thus 83 is the median of the data.
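The odd/even rule can be written as a short function. The sketch below assumes a non-empty list of numbers; the name `median_of` is made up for this illustration.

```python
def median_of(values):
    """Return the median of a non-empty list using the ranked-list rule."""
    ranked = sorted(values)      # rank the numbers from smallest to largest
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        # ODD: take the middle number.
        return ranked[mid]
    # EVEN: take the mean of the two middle numbers.
    return (ranked[mid - 1] + ranked[mid]) / 2


list_a = [4, 8, 1, 14, 9, 21, 12]
list_b = [46, 23, 92, 89, 77, 108]

print(median_of(list_a))  # 9
print(median_of(list_b))  # 83.0
```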
MODE

• In a list of numbers, it is the number that occurs most frequently.
• Some lists of numbers do not have a mode. In these cases, no number
occurs more often than any other number.
• Some lists of numbers have more than one mode. In these cases, two or
more numbers share the highest frequency.
• Not changed by changing an extreme value.
MODE
EXAMPLE
Find the mode of the data in the lists.
a) 18, 15, 21, 16, 15, 14, 15, 21
b) 2, 5, 8, 9, 11, 4, 7, 23
MODE
SOLUTION
a) The number 15 occurs more often than the other numbers
in the list. Thus 15 is the mode.
b) No number occurs more often than the others, so there is
NO mode.
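A possible way to find the mode in Python is to count each value and keep those with the highest count. The helper name `modes_of` is hypothetical, and the function assumes a non-empty list. It returns an empty list when every value occurs equally often, which matches the "no mode" convention used in this lesson.

```python
from collections import Counter


def modes_of(values):
    """Return the most frequent value(s), or [] when there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # If several distinct values all occur equally often, none stands out.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


list_a = [18, 15, 21, 16, 15, 14, 15, 21]
list_b = [2, 5, 8, 9, 11, 4, 7, 23]

print(modes_of(list_a))  # [15]
print(modes_of(list_b))  # []
```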
WEIGHTED MEAN

• Used when some data values are more important than others.
• Numbers in the list are assigned different weights.
WEIGHTED MEAN

The Weighted Mean of the n numbers \( x_1, x_2, x_3, \ldots, x_n \)
with the respective assigned weights \( w_1, w_2, w_3, \ldots, w_n \) is

\[ \text{Weighted mean} = \frac{\sum (x \cdot w)}{\sum w} \]

where \( \sum (x \cdot w) \) is the sum of the products formed by multiplying
each number by its assigned weight, and \( \sum w \) is the sum
of all the weights.
WEIGHTED MEAN
EXAMPLE
The table shows Nico’s fall semester course grades. Use
the weighted mean formula to find Nico’s GPA for the fall
semester.
WEIGHTED MEAN
SOLUTION
\[ \text{Weighted mean} = \frac{1.5 \times 3 + 1.0 \times 2 + 2.5 \times 2 + 2.0 \times 3}{10} = \frac{4.5 + 2 + 5 + 6}{10} = \frac{17.5}{10} = 1.75 \]

Nico’s GPA for the fall semester is 1.75
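The weighted mean formula translates directly into code. In the sketch below the grade values and weights are read off the worked solution, since the course table itself is not reproduced here. Treating the weights as course units is an assumption, and the names `weighted_mean`, `grades`, and `units` are illustrative.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    total_weight = sum(weights)
    return sum(x * w for x, w in zip(values, weights)) / total_weight


# Values and weights taken from the worked solution (assumed to be
# course grades and course units, respectively).
grades = [1.5, 1.0, 2.5, 2.0]
units = [3, 2, 2, 3]

gpa = weighted_mean(grades, units)
print(gpa)  # 1.75
```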


FREQUENCY DISTRIBUTION
• A table that lists observed events and the frequency of
occurrence of each observed event.
• Often used to organize raw data
• Raw Data is data that have not been organized or
manipulated in any manner.
• Large collections of raw data may not provide much
readily observable information.
• The formula for the weighted mean can be used to find the mean of a
frequency distribution by replacing the weights with frequencies.
WEIGHTED MEAN
EXAMPLE
The table lists the number of laptop computers owned by
families in each of 40 homes in a subdivision.
WEIGHTED MEAN
SOLUTION
The mean for the frequency distribution is

\[ \bar{x} = \frac{\sum (x \cdot f)}{\sum f} \]

\[ \bar{x} = \frac{0 \cdot 5 + 1 \cdot 12 + 2 \cdot 14 + 3 \cdot 3 + 4 \cdot 2 + 5 \cdot 3 + 6 \cdot 0 + 7 \cdot 1}{40} = \frac{79}{40} = 1.975 \]

The mean number of laptop computers per household is 1.975.
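As a rough check, the frequency-distribution mean can be computed from a mapping of each value to its frequency. The frequencies below are read off the products in the worked solution (the original table is not reproduced here), and the dictionary name is illustrative.

```python
# number of laptops -> number of households, taken from the worked solution
laptop_frequency = {0: 5, 1: 12, 2: 14, 3: 3, 4: 2, 5: 3, 6: 0, 7: 1}

# Sum of the frequencies: the number of households surveyed.
total_households = sum(laptop_frequency.values())

# Sum of value * frequency: the total number of laptops.
total_laptops = sum(x * f for x, f in laptop_frequency.items())

mean_laptops = total_laptops / total_households
print(total_households, total_laptops, mean_laptops)  # 40 79 1.975
```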
MEASURES OF
DISPERSION
MEASURES OF DISPERSION

• Statistical values used to measure the dispersion of data.
• Some characteristics of a set of data may not be
evident from an examination of averages.
• Average values do not reflect the spread or dispersion
of data.
RANGE

• The difference between the greatest data value and the least data value.
• Since it depends only on the two most extreme values, the range is
very sensitive to outliers.
• It does not indicate whether the values are evenly distributed,
clustered in the middle, or clustered at one or both extremes.
RANGE
EXAMPLE
Find the range of the numbers of ounces dispensed by a
soda vending machine. Use the given table of data.
RANGE
SOLUTION
The greatest number of ounces dispensed is 10.07 and
the least is 5.85. The range of the number of ounces is

10.07 − 5.85 = 4.22 oz


STANDARD DEVIATION

• A measure that uses the amount by which each individual data value
deviates from the mean.
• The larger the standard deviation, the more dispersed the data. This is
useful when comparing two (or more) data sets to determine which is
more variable.
• Used to determine the consistency of a variable.
STANDARD DEVIATION

Procedure for computing a Standard Deviation

1. Determine the mean of the n numbers.
2. For each number, calculate the difference (deviation)
between the number and the mean of the numbers.
3. Calculate the square of each deviation and find the
sum of these squared deviations.
4. If the data is a population, then divide the sum by n.
If the data is a sample, then divide the sum by n − 1.
5. Find the square root of the quotient in Step 4.
STANDARD DEVIATION
EXAMPLE
The following numbers were obtained by sampling a
population
2, 4, 7, 12, 15
Find the standard deviation of the sample
STANDARD DEVIATION
SOLUTION
STEP 1: The mean of the numbers is

\[ \bar{x} = \frac{2 + 4 + 7 + 12 + 15}{5} = \frac{40}{5} = 8 \]
STANDARD DEVIATION
SOLUTION
STEP 2: For each number, calculate the deviation
between the number and the mean.
STANDARD DEVIATION
SOLUTION
STEP 3: Calculate the square of each deviation in Step 2
and find the sum of these squared deviations.
STANDARD DEVIATION
SOLUTION
STEP 4: Because we have a sample of \( n = 5 \) values,
divide the sum 118 by \( n - 1 \), which is 4.

\[ \frac{118}{4} = 29.5 \]
STANDARD DEVIATION
SOLUTION

The standard deviation of the sample is the square root of this quotient,
\( s = \sqrt{29.5} \). To the nearest hundredth, the standard deviation is
\( s = 5.43 \).
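The whole computation can be sketched as one function. The name `standard_deviation` and its `sample` flag are made up for this illustration; the function divides by n − 1 for a sample and by n for a population, then takes the square root.

```python
from math import sqrt


def standard_deviation(values, sample=True):
    """Standard deviation of a sample (default) or of a population."""
    n = len(values)
    mean = sum(values) / n                               # mean of the numbers
    deviations = [x - mean for x in values]              # deviation of each number
    sum_of_squares = sum(d ** 2 for d in deviations)     # sum of squared deviations
    divisor = n - 1 if sample else n                     # sample vs population
    return sqrt(sum_of_squares / divisor)                # square root of the quotient


data = [2, 4, 7, 12, 15]
s = standard_deviation(data)
print(round(s, 2))  # 5.43
```

Passing `sample=False` would divide by n instead, which gives a slightly smaller value for the same data.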
VARIANCE

• Is the square of the standard deviation.


• It does not have the same unit of measure as the
original data.
VARIANCE
EXAMPLE
Find the variance of the previous example.
SOLUTION

In the previous example, we found \( s = \sqrt{29.5} \). The
variance is the square of the standard deviation. Thus,
the variance is \( s^2 = (\sqrt{29.5})^2 = 29.5 \).
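Python's standard `statistics` module provides both sample and population versions of these measures, which gives a quick way to check the hand computation. This is only a cross-check on the example data, not part of the lesson's procedure.

```python
from statistics import pvariance, stdev, variance

data = [2, 4, 7, 12, 15]

sample_variance = variance(data)        # divides by n - 1
sample_std_dev = stdev(data)            # square root of the sample variance
population_variance = pvariance(data)   # divides by n

print(sample_variance)            # 29.5
print(round(sample_std_dev, 2))   # 5.43
print(population_variance)        # 23.6
```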
MEASURE OF
RELATIVE POSITION
MEASURE OF RELATIVE POSITION

• The position of a value, relative to other values in a set of data.
Z-SCORE / STANDARD SCORE

• The number of standard deviations between a data value and the mean.
• The number of standard deviations that the value is
above or below the mean.
Z-SCORE / STANDARD SCORE
EXAMPLE
Barry has taken two tests in his chemistry class. He
scored 72 on the first test, for which the mean of all
scores was 65 and the standard deviation was 8. He
received a 60 on a second test, for which the mean of all
scores was 45 and the standard deviation was 12. In
comparison to the other students, did Barry do better on
the first test or on the second test?
Z-SCORE / STANDARD SCORE
EXAMPLE

Test 1: \( x_1 = 72 \), \( \bar{x}_1 = 65 \), \( s_1 = 8 \)
Test 2: \( x_2 = 60 \), \( \bar{x}_2 = 45 \), \( s_2 = 12 \)
Z-SCORE / STANDARD SCORE
SOLUTION
Find the z-score for each test.

\[ z_1 = z_{72} = \frac{72 - 65}{8} = 0.875 \]

\[ z_2 = z_{60} = \frac{60 - 45}{12} = 1.25 \]
Z-SCORE / STANDARD SCORE
SOLUTION
Barry scored 0.875 standard deviations above the mean
on the first test and 1.25 standard deviations above the
mean on the second test.

These z-scores indicate that, in comparison to his
classmates, Barry scored better on the second test than
he did on the first test.
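The comparison can be sketched in a few lines of Python. The function name `z_score` is illustrative; it subtracts the mean from the data value and divides by the standard deviation, as in the solution.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


z_first = z_score(72, mean=65, std_dev=8)
z_second = z_score(60, mean=45, std_dev=12)

# The larger z-score marks the better result relative to the other students.
better_test = "second" if z_second > z_first else "first"

print(z_first, z_second, better_test)  # 0.875 1.25 second
```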
PERCENTILES

• the value below which a percentage of data falls.


• a number where a certain percentage of scores fall
below that number.
PERCENTILES
EXAMPLE
On a reading examination given to 900 students, Elaine’s
score of 602 was higher than the scores of 576 of the
students who took the examination. What is the
percentile for Elaine’s score?
PERCENTILES
SOLUTION
Find the percentile for Elaine’s score of 602.

\[ p_{602} = \frac{\text{number of data values lower than } 602}{900} \cdot 100 = \frac{576}{900} \cdot 100 = 64 \]

Elaine’s score of 602 places her at the 64th percentile.
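A minimal sketch of the percentile computation, assuming the count of lower scores is already known as in this example. The function name `percentile_of` is made up for the illustration.

```python
def percentile_of(count_below, total):
    """Percent of data values that are lower than the given value."""
    return 100 * count_below / total


# 576 of the 900 students scored lower than Elaine's 602.
p = percentile_of(576, 900)
print(p)  # 64.0
```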


QUARTILES

• Three numbers \( Q_1, Q_2, Q_3 \) that partition a
ranked data set into four equal groups.
QUARTILES
EXAMPLE
The following table lists the calories per 100 milliliters
of 25 popular sodas.
QUARTILES
SOLUTION
STEP 1: Rank the data as shown.
QUARTILES
SOLUTION
STEP 2: The median of these 25 data values has a rank
of 13. Thus, the median is 43. The second quartile 𝑸𝟐 is
the median of the data, so 𝑸𝟐 = 𝟒𝟑
QUARTILES
SOLUTION
STEP 3: There are 12 data values less than the median
and 12 data values greater than the median. The first
quartile is the median of the data values less than the
median. Thus 𝑸𝟏 is the mean of the data values with
ranks of 6 and 7.

\[ Q_1 = \frac{39 + 39}{2} = 39 \]
QUARTILES
SOLUTION
STEP 4: The third quartile is the median of the data
values greater than the median. Thus \( Q_3 \) is the mean of
the data values with ranks of 19 and 20.

\[ Q_3 = \frac{50 + 53}{2} = 51.5 \]
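The median-of-halves method used in these steps can be sketched as follows. The soda data table is not reproduced here, so the example instead feeds in the ranks 1 to 25 as stand-in data (an assumption made purely for illustration). The results then show which ranks each quartile is built from. The function names are hypothetical, and the sketch assumes at least two data values.

```python
def median_of_ranked(ranked):
    """Median of a list that is already ranked from smallest to largest."""
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]                                # values below the median
    # For an odd count, the median itself is left out of both halves.
    upper = ranked[half + 1:] if n % 2 == 1 else ranked[half:]
    return median_of_ranked(lower), median_of_ranked(ranked), median_of_ranked(upper)


# Stand-in data: the ranks 1..25 themselves.
ranks = list(range(1, 26))
print(quartiles(ranks))  # (6.5, 13, 19.5)
```

With 25 values, Q1 falls midway between ranks 6 and 7, Q2 at rank 13, and Q3 midway between ranks 19 and 20, which agrees with the ranks named in the solution. Other textbooks and software packages use slightly different quartile conventions, so results may differ from library functions.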
PERCENTILES & QUARTILES

[Figure: ranked data marked at Q1, Q2 (the median), and Q3]
NORMAL
DISTRIBUTION
FREQUENCY DISTRIBUTION

• Displays large sets of data.
• Shows how often/frequently certain events occur.
• Each interval is called a class.
• Each class has a lower class boundary and an upper class boundary.
Any data value that lies on a common boundary is assigned to the
higher class.
FREQUENCY DISTRIBUTION

• A graph that provides a pictorial view of how data are
distributed is called a histogram.
• A frequency distribution that lists the percent of data in
each class is called a relative frequency distribution.
FREQUENCY DISTRIBUTION
EXAMPLE
An Internet service provider has installed new computers.
To estimate the new download times its subscribers will
experience, the ISP surveyed 1000 of its subscribers to
determine the time required for each subscriber to
download a file from an internet site
NORMAL DISTRIBUTION

• is one of the most important statistical distributions of data
• forms a bell-shaped curve that is symmetric about a
vertical line through the mean of the data.
NORMAL DISTRIBUTION
EXAMPLE
• (a) A graph of a normal distribution with a mean of 5.
• (b) The area of the shaded region is 0.159 units. This
region represents the fact that 15.9% of the data
values are greater than or equal to 10.
• (b) Because the area under the curve is 1, the
unshaded region under the curve has area 1 −
0.159 = 0.841, representing the fact that 84.1% of the
data are less than 10.
NORMAL DISTRIBUTION
EXAMPLE
[Figure: graphs (a) and (b) of the normal distribution]
THE EMPIRICAL RULE

• Describes the percent of data that lie within
1, 2 and 3 standard deviations of the mean in
a normal distribution.
THE EMPIRICAL RULE
EXAMPLE
A survey of 1000 U.S. gas stations found that the price
charged for a gallon of regular gas could be closely
approximated by a normal distribution with a mean of $3.10
and a standard deviation of $0.18. How many of the stations
charge

a) between $2.74 and $3.46 for a gallon of regular gas?
b) less than $3.28 for a gallon of regular gas?
c) more than $3.46 for a gallon of regular gas?
THE EMPIRICAL RULE
SOLUTION
a) The $2.74 per gallon price is 2 standard deviations
below the mean. The $3.46 price is 2 standard
deviations above the mean.

In a normal distribution, 95% of all data lie within 2
standard deviations of the mean.

95% × 1000 = 0.95 × 1000 = 950


THE EMPIRICAL RULE
SOLUTION
a) Therefore, approximately 950 of the stations charge
between $2.74 and $3.46 for a gallon of regular gas.
THE EMPIRICAL RULE
SOLUTION
b) The $3.28 price is 1 standard deviation above
the mean.

In a normal distribution, 34% of all data lie between the
mean and 1 standard deviation above the mean.

34% × 1000 = 0.34 × 1000 = 340


THE EMPIRICAL RULE
SOLUTION
b) Thus, approximately 340 of the stations charge
between $3.10 and $3.28 for a gallon of regular
gasoline. Half of the 1000 stations, or 500 stations,
charge less than the mean.

340 + 500 = 𝟖𝟒𝟎


Therefore, about 840 of the stations charge less than
$3.28 for a gallon of regular gas.
THE EMPIRICAL RULE
SOLUTION
c) The $3.46 price is 2 standard deviations above the
mean. In a normal distribution, 95% of all data are
within 2 standard deviations of the mean.

This means that the other 5% of the data lie either
more than 2 standard deviations above the mean or more
than 2 standard deviations below the mean.
THE EMPIRICAL RULE
SOLUTION
c) We are interested only in the data that are more than
2 standard deviations above the mean, which is \( \tfrac{1}{2} \) of
5%, or 2.5%, of the data.

2.5% × 1000 = 0.025 × 1000 = 25


THE EMPIRICAL RULE
SOLUTION
c) Thus about 25 of the stations charge more than
$3.46 for a gallon of regular gas.
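The three answers can be reproduced with a short script. The percentages are the approximate Empirical Rule values quoted in the solution (95% within 2 standard deviations, 34% between the mean and 1 standard deviation above it); the constant and variable names are illustrative.

```python
# Approximate Empirical Rule percentages quoted in the solution.
WITHIN_2_SD = 0.95        # within 2 standard deviations of the mean
MEAN_TO_PLUS_1_SD = 0.34  # between the mean and 1 standard deviation above it

stations = 1000
mean_price = 3.10
std_dev = 0.18

# Confirm where each price sits relative to the mean.
low_price = round(mean_price - 2 * std_dev, 2)    # 2.74
high_price = round(mean_price + 2 * std_dev, 2)   # 3.46
one_above = round(mean_price + std_dev, 2)        # 3.28

# a) between $2.74 and $3.46: within 2 standard deviations
between = round(WITHIN_2_SD * stations)

# b) less than $3.28: the lower half plus the 34% just above the mean
below = round((0.5 + MEAN_TO_PLUS_1_SD) * stations)

# c) more than $3.46: half of the 5% outside 2 standard deviations
above = round((1 - WITHIN_2_SD) / 2 * stations)

print(between, below, above)  # 950 840 25
```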
LINEAR REGRESSION
AND CORRELATION
LINEAR REGRESSION

• Attempts to model the relationship between two
variables by fitting a linear equation to observed data.
• Used to determine whether there is a relationship between the
variables of interest.
• The independent variable is plotted on the x-axis and the
dependent variable on the y-axis.
LINEAR REGRESSION

• Does not necessarily imply that one variable causes
the other, but that there is some significant association
between the two variables.
• Data involving two variables are called bivariate data.
LINEAR REGRESSION

• Scatter Plot/Scatter Diagram is a helpful tool in
determining the strength of the relationship between
two variables.
• If there is no association between the variables (i.e.,
the scatterplot does not indicate any increasing or
decreasing trends), then fitting a linear regression
model to the data probably will not provide a useful
model.
LEAST-SQUARES LINE

• most common method for fitting a regression line


• the method calculates the best-fitting line for the
observed data by minimizing the sum of the squares
of the vertical deviations from each data point to the
line.
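To make the idea concrete, the sketch below fits a least-squares line y = ax + b using the standard closed-form expressions for the slope and intercept. These formulas do not appear in the text above, so treat them as the usual textbook form rather than the lesson's own notation. The data points are hypothetical, chosen to lie exactly on y = 2x + 1 so the result is easy to verify.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Closed-form solution that minimizes the sum of squared vertical deviations.
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


# Hypothetical bivariate data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = least_squares_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

With real, scattered data the points will not fall exactly on the line, and the fitted slope and intercept give the line with the smallest total squared vertical deviation.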
LINEAR CORRELATION COEFFICIENT

• Used to determine the strength of a linear relationship
between two variables.
• Denoted by the variable r.
• If r is positive, the variables have a positive linear relationship.
• If r is negative, the variables have a negative linear relationship.
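The sign behaviour of r can be illustrated with the usual Pearson formula. The formula is not shown in the text above, so this sketch assumes the standard computational form. Both data sets are hypothetical: one trends upward and the other falls exactly along a line.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Linear (Pearson) correlation coefficient r for paired data."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


xs = [1, 2, 3, 4, 5]
r_up = correlation_coefficient(xs, [2, 4, 5, 4, 5])      # y tends to rise with x
r_down = correlation_coefficient(xs, [10, 8, 6, 4, 2])   # y falls exactly along a line

print(round(r_up, 3), round(r_down, 3))  # 0.775 -1.0
```

Values of r close to 1 or −1 indicate a strong linear relationship, while values near 0 indicate a weak one.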
THANK YOU
Does anyone have any questions?
patrickjustin_ariado@tup.edu.ph