
Measures of Central Tendency


STATISTICS

Statistics is the discipline that concerns the collection, organization,
analysis, interpretation and presentation of data. In applying statistics to a
scientific, industrial, or social problem, it is conventional to begin with a
statistical population or a statistical model to be studied.
PROCESS of STATISTICS
Step 1: Identify the research objective
what questions are to be answered?
what groups should be studied?
Step 2: Collect the information needed
can you access the entire population?
how can you collect a good sample?
what other methods are available and appropriate?
Step 3: Organize and summarize the information
use of descriptive statistics
use of visual methods such as charts and graphs
numeric methods such as calculations
Step 4: Draw conclusions from the information
Populations and Samples
The study of statistics revolves around the study of data sets.
This lesson describes two important types of data sets - populations and
samples. Along the way, we'll introduce simple random sampling, the main
method used in this tutorial to select samples.
The main difference between a population and a sample has to do with how
observations are assigned to the data set.

A population includes all of the elements from a set of data.

A sample consists of one or more observations drawn from the population.


Depending on the sampling method, a sample can have fewer observations
than the population, the same number of observations, or more observations.
More than one sample can be derived from the same population.
Other differences have to do with nomenclature, notation, and computations.
For example:
A measurable characteristic of a population, such as a mean or
standard deviation, is called a parameter; but a measurable characteristic of a
sample is called a statistic.

We will see in future lessons that the mean of a population is denoted by the
symbol μ; but the mean of a sample is denoted by the symbol x̄.
We will also learn in future lessons that the formula for the standard deviation
of a population is different from the formula for the standard deviation of a
sample.

Slovin's formula

- is used to calculate the sample size (n) given the population size (N) and a
margin of error (e).
- it is a formula used with random sampling to estimate the sample size.
- it is computed as n = N / (1 + Ne²)
where:
n = sample size
N = total population
e = margin of error (written as a decimal)

When to use Slovin's formula?

- If a sample is taken from a population, a formula must be used to take into
account confidence levels and margins of error. When taking statistical
samples, sometimes a lot is known about a population, sometimes a little and
sometimes nothing at all. Slovin's formula is commonly used in the last case,
when nothing is known about the behavior of the population.

Example: A researcher plans to conduct a survey. If the population of High
City is 1,000,000, find the sample size if the margin of error is 2.5%.
First: convert the margin of error from a percentage to a decimal by dividing it by 100.
Given:
N = 1,000,000
e = 2.5% = 0.025

n = 1,000,000 / (1 + 1,000,000 · 0.025²)
n = 1,000,000 / (1 + 1,000,000 · 0.000625)
n = 1,000,000 / (1 + 625)
n = 1,000,000 / 626
n = 1,597.44, or approx. 1,597

Example:

Suppose that you have a group of 1,000 city government employees and you
want to survey them to find out which tools are best suited to their jobs. You
decide that you are happy with a margin of error of 0.05. Using Slovin's
formula, you would be required to survey n = N / (1 + Ne^2) people:

n = 1,000 / (1 + 1,000 · 0.05²) = 1,000 / 3.5 ≈ 286
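The two Slovin examples can be checked with a few lines of Python. This is a minimal sketch: the function name `slovin_sample_size` and its argument names are illustrative choices, not part of the lesson, and the margin of error is assumed to be passed as a decimal (0.025, 0.05).

```python
def slovin_sample_size(population: int, margin_of_error: float) -> float:
    """Return n = N / (1 + N * e**2), unrounded.

    population      -- N, the total population size
    margin_of_error -- e, written as a decimal (e.g. 0.05 for 5%)
    """
    if population <= 0 or margin_of_error <= 0:
        raise ValueError("population and margin_of_error must be positive")
    return population / (1 + population * margin_of_error ** 2)


# First example: N = 1,000,000 and e = 0.025
print(round(slovin_sample_size(1_000_000, 0.025), 2))

# Second example: N = 1,000 and e = 0.05, rounded to whole people
print(round(slovin_sample_size(1_000, 0.05)))
```

The function returns the unrounded value so that the caller can decide how to round it to a whole number of respondents.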


Types of Data
The word data refers to information that is collected and recorded.
It can be in the form of numbers, words, measurements and much more.

There are two types of data and these are qualitative data and quantitative
data. The difference between the two types of data is that quantitative data is
used to describe numerical information. For instance, the measurement of
temperature would fall under this kind of data.

On the other hand, qualitative data is used to describe information in words.


After collecting data, it needs to be organized hence the need to separate
grouped data from ungrouped data. Both are useful forms of data but the
difference between them is that ungrouped data is raw data. This means that it
has just been collected but not sorted into any group or classes. On the other
hand, grouped data is data that has been organized into groups from the raw
data.
Measures of Central Tendency
The mean, median, and mode of a data set indicate the value around
which the data are centered. That is why these numbers are known as
measures of central tendency.

Mean of Ungrouped / Raw data

The mean (or arithmetic mean) of n observations
(variates) x1, x2, x3, x4, …, xn is given by:

Mean = sum of the observations / total number of observations

Example:
A student scored 80%, 72%, 50%, 64% and 74% marks in five subjects in an
examination. Find the mean percentage of marks obtained by the student.
Solution:
Here, the observations in percentage are:
x1 = 80, x2 = 72, x3 = 50, x4 = 64, x5 = 74

Therefore: mean = ( x1 + x2 + x3 + x4 + x5 ) / 5
mean = 340 / 5
mean = 68
Therefore, mean percentage of marks obtained by the student was 68%.
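As a quick check of the example above, the same calculation can be sketched in Python. The helper name `mean` is an illustrative choice.

```python
def mean(values):
    """Arithmetic mean: sum of the observations / number of observations."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


marks = [80, 72, 50, 64, 74]  # the five percentages from the example
print(mean(marks))  # 340 / 5
```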

Mean of Arrayed Data:

If the values of the variable (observations or variates) are x1, x2, x3, …, xn and
their corresponding frequencies are f1, f2, f3, …, fn then:

Mean = ( x1f1 + x2f2 + x3f3 + … + xnfn ) / ( f1 + f2 + f3 + … + fn )
Example:
A class has 20 students whose ages (in years) are as follows.

14, 13, 14, 15, 12, 13, 13, 14, 15, 12, 15, 14, 12, 16, 13, 14, 14, 15, 16, 12
Find the mean age of the students of the class.

Solution:
In the data, only five distinct values appear.
So, we write the frequencies of the variates as below.

Age (x)    Frequency (f)
12         4
13         4
14         6
15         4
16         2

Therefore: mean = ( 12x4 + 13x4 + 14x6 + 15x4 + 16x2 ) / ( 4 + 4 + 6 + 4 + 2 )
mean = 276 / 20
mean = 13.8
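The frequency-weighted mean of the 20 ages can be reproduced as follows. This is a sketch only; it uses `collections.Counter` from the standard library to tally the frequencies, and the variable names are illustrative.

```python
from collections import Counter

ages = [14, 13, 14, 15, 12, 13, 13, 14, 15, 12,
        15, 14, 12, 16, 13, 14, 14, 15, 16, 12]

# Tally how often each age occurs: {age: frequency}
frequencies = Counter(ages)

# Mean = (x1*f1 + x2*f2 + ... + xn*fn) / (f1 + f2 + ... + fn)
sum_fx = sum(x * f for x, f in frequencies.items())
n = sum(frequencies.values())
mean_age = sum_fx / n

print(dict(sorted(frequencies.items())))
print(mean_age)
```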
Median of Ungrouped or Raw data

The median is the middle value of the arrayed data. This means that
when the data are arranged in order, the median is the value in the middle.

How to find the median?

Step 1: Given a set of data (e.g. wages), arrange the numbers in ascending
order, i.e. from smallest to largest.

Step 2: If the number of observations is odd, the number in the middle of the
list is the median. This can be found by taking the value of the (n+1)/2 -th term,
where n is the number of observations.
Else, if the number of observations is even, then the median is the simple
average of the middle two numbers. In calculation, the median is the simple
average of the n/2 -th and the (n/2 + 1) -th terms.
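The two steps above can be sketched in Python as follows. The function name `median` and the sample lists are hypothetical and only illustrate the odd and even cases; note that Python lists are indexed from 0, so the (n+1)/2 -th term sits at index n // 2.

```python
def median(values):
    """Median of raw data, following the odd/even rule described above."""
    ordered = sorted(values)          # Step 1: ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n+1)/2 -th term (1-based) is index n // 2 (0-based)
        return ordered[mid]
    # Even n: average of the n/2 -th and (n/2 + 1) -th terms
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 3, 5]))       # odd number of observations
print(median([4, 1, 3, 2]))    # even number of observations
```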
Example:
Who’s In the Middle?

To compare its employees’ age profile against that of other companies in the
industry, your company, Middle World Co. has asked you to calculate the
median age of your fellow workers.

What is the median age of these nine workers?
Solution:
Since there is an odd number of observations and the ages
are arranged from youngest to oldest (i.e. in ascending order),
the median age is the age of the (9 + 1)/2 = 5th worker.

Example:
Suppose we have the monthly wages of 10 employees of a company. How
would you find the median wage?
Solution: Notice that we have arranged their wages above from lowest to
highest. This ranking will help us to determine the median. Using the method
introduced earlier, the median is computed by taking the simple average of the
(n/2)-th = (10/2)-th = 5th and (n/2 + 1)-th = (10/2+1)-th = 6th observations.
Mode of Ungrouped / Raw Data

To derive the mode, all we have to do is to determine the number
(if the indicator is quantitative) or choice (if the indicator is qualitative)
that occurs most frequently.

Example:

Unemployed, your friend seeks your advice on ways to find a job. To
help him, you would like to propose a few ways of searching for a job.
You feel that one way you can help is to suggest that he use the most
popular job search method. Therefore, you conduct an online survey
among 80 of your friends who have found jobs. Which is the most
popular method of job hunting?
Since the method "Answered advertisements" is the most popular,
this is the mode.
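Finding the mode amounts to counting each answer and picking the most frequent one. The sketch below assumes a small, made-up list of survey answers (the survey results themselves are not reproduced here), and the function name `modes` is illustrative. It returns a list because a data set can have more than one mode.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of an empty data set is undefined")
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


# Hypothetical answers, for illustration only
answers = ["Answered advertisements", "Referral", "Answered advertisements",
           "Job fair", "Referral", "Answered advertisements"]
print(modes(answers))
```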
Frequency Distribution Table
Frequency tells you how often something happened. The frequency
of an observation tells you the number of times the observation occurs in the
data. For example, in the following list of numbers, the frequency of the
number 9 is 5 (because it occurs 5 times):
1, 2, 3, 4, 6, 9, 9, 8, 5, 1, 1, 9, 9, 0, 6, 9.
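The frequency of each number in the list above can be tallied with a short Python sketch; the variable names are illustrative.

```python
from collections import Counter

numbers = [1, 2, 3, 4, 6, 9, 9, 8, 5, 1, 1, 9, 9, 0, 6, 9]

# frequency[x] is the number of times x occurs in the data
frequency = Counter(numbers)
print(frequency[9])
```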
Tables can show either categorical variables (sometimes called qualitative
variables) or quantitative variables (sometimes called numeric variables). You
can think of categorical variables as categories (like eye color or brand of dog
food) and quantitative variables as numbers.
Class ( K ) – a quantitative or qualitative category. A class may be a
range of numerical values (that acts like a "category") or an actual
category. The number of classes can be estimated as:

K = 1 + 3.3 log ( n ) where: n – total frequency (number of data values)
and log is the base-10 logarithm

Frequency ( f ) – the number of data values contained in a specific class.

Range ( R ) – the difference between the highest and the lowest data values.

R = highest value of data – lowest value of data

Class size ( C ) – the width of a class, i.e. the span from the minimum value
at which a class starts to the maximum value at which it ends ( limits ).

C = R / K where: R – range and K – the no. of classes


Example:

The data below show the mass of 40 students in a class. The
measurement is to the nearest kg.

55 70 57 73 55 59 64 72

60 48 58 54 69 51 63 78

75 64 65 57 71 77 76 62

49 66 62 76 61 63 63 76

52 76 71 61 53 56 67 71

Construct a frequency table for the data using an appropriate scale.


Solution:
Step 1: Find the range ( R )
The range of a set of numbers is the difference between the least
number and the greatest number in the set.

In this example, the greatest mass is 78 and the smallest mass is 48. The
range of the masses is then 78 – 48 = 30. The scale of the frequency table
must contain the range of masses.

Step 2: Find the number of classes ( K )

K = 1 + 3.3 log ( n )
K = 1 + 3.3 log ( 40 )
K = 6.287
K ≈ 6

Step 3: Find the class size ( C )

C = R / K
C = 30 / 6
C = 5
Therefore: there will be 6 classes, each with a class size ( interval ) of 5.
Note that the greatest value, 78, lies one unit above the last upper limit ( 77 )
of the resulting table, which counts it in the last class.
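The three steps can be checked numerically. The sketch below recomputes the range, the number of classes and the class size directly from the 40 masses; computed from the data, the range is 78 − 48 = 30. Variable names are illustrative, and rounding the class size up to a whole number is an assumption of this sketch.

```python
import math

masses = [55, 70, 57, 73, 55, 59, 64, 72,
          60, 48, 58, 54, 69, 51, 63, 78,
          75, 64, 65, 57, 71, 77, 76, 62,
          49, 66, 62, 76, 61, 63, 63, 76,
          52, 76, 71, 61, 53, 56, 67, 71]

n = len(masses)

# Step 1: range R = highest value - lowest value
data_range = max(masses) - min(masses)

# Step 2: number of classes K = 1 + 3.3 log10(n), rounded to a whole number
k_exact = 1 + 3.3 * math.log10(n)
k = round(k_exact)

# Step 3: class size C = R / K, rounded up to a whole number (assumption)
class_size = math.ceil(data_range / k)

print(n, data_range, round(k_exact, 3), k, class_size)
```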
Frequency Distribution Table

CLASSES    FREQUENCY ( f )
48 – 52    4
53 – 57    7
58 – 62    7
63 – 67    8
68 – 72    6
73 – 77    8
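The class frequencies in the table can be tallied from the raw masses as sketched below. One assumption is made explicit: the value 78 lies above the last upper limit of 77, and this sketch follows the table by counting it in the last class. Variable names are illustrative.

```python
masses = [55, 70, 57, 73, 55, 59, 64, 72,
          60, 48, 58, 54, 69, 51, 63, 78,
          75, 64, 65, 57, 71, 77, 76, 62,
          49, 66, 62, 76, 61, 63, 63, 76,
          52, 76, 71, 61, 53, 56, 67, 71]

lowest, class_size, k = 48, 5, 6

# Class limits: (48, 52), (53, 57), ..., (73, 77)
classes = [(lowest + i * class_size, lowest + (i + 1) * class_size - 1)
           for i in range(k)]

frequencies = [0] * k
for mass in masses:
    # Assumption: values above the last upper limit (here only 78)
    # are counted in the last class, as the table does.
    index = min((mass - lowest) // class_size, k - 1)
    frequencies[index] += 1

for (lower, upper), f in zip(classes, frequencies):
    print(f"{lower} - {upper}: {f}")
```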

The values 48, 53, 58 ,63, 68, and 73 are called the lower limits while the
values 52, 57, 62, 67, 72, and 77 are called the upper limits

Class boundary - is the midpoint of the upper class limit of one class and the
lower class limit of the subsequent class. Each class thus has an upper and a
lower class boundary. It must be noted that upper class boundary of one class
and the lower class boundary of the subsequent class are the same
Upper Class Boundary = Upper limit + 0.5
Lower Class Boundary = Lower limit – 0.5
Frequency Distribution Table

CLASSES    Frequency ( f )    Class Boundary
48 – 52    4                  47.5 – 52.5
53 – 57    7                  52.5 – 57.5
58 – 62    7                  57.5 – 62.5
63 – 67    8                  62.5 – 67.5
68 – 72    6                  67.5 – 72.5
73 – 77    8                  72.5 – 77.5

For class 1:
Lower Class Boundary = 48 – 0.5 = 47.5
Upper Class Boundary = 52 + 0.5 = 52.5

The class boundary can be denoted as CB and it is also known as the true limit.
The class midpoint or class mark ( X )
is a specific point in the center of the bins (categories) in a
frequency distribution table; it is also the center of a bar in a
histogram. It is defined as the average of the upper and lower class limits.

X = ( Upper Limit + Lower Limit ) / 2
CLASSES    (f)    Class Boundary    Mid Point ( X )
48 – 52    4      47.5 – 52.5       50
53 – 57    7      52.5 – 57.5       55
58 – 62    7      57.5 – 62.5       60
63 – 67    8      62.5 – 67.5       65
68 – 72    6      67.5 – 72.5       70
73 – 77    8      72.5 – 77.5       75
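The class boundaries and midpoints in the table follow mechanically from the class limits. A minimal sketch, assuming whole-number limits so that the gap between adjacent classes is 1 and the adjustment is 0.5 (variable names are illustrative):

```python
class_limits = [(48, 52), (53, 57), (58, 62), (63, 67), (68, 72), (73, 77)]

# Class boundary: lower limit - 0.5 and upper limit + 0.5
boundaries = [(lower - 0.5, upper + 0.5) for lower, upper in class_limits]

# Midpoint X = (upper limit + lower limit) / 2
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]

for limits, cb, x in zip(class_limits, boundaries, midpoints):
    print(limits, cb, x)
```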
Relative Frequency ( %f )
It is the value assigned to each class as the proportion of the total
data set that belongs in the class.

%f = ( frequency of a class / n ) x 100 %

where: n = total number of data values
CLASSES    (f)    CB             (X)    %f
48 – 52    4      47.5 – 52.5    50     10%
53 – 57    7      52.5 – 57.5    55     17.5%
58 – 62    7      57.5 – 62.5    60     17.5%
63 – 67    8      62.5 – 67.5    65     20%
68 – 72    6      67.5 – 72.5    70     15%
73 – 77    8      72.5 – 77.5    75     20%
total      40                           100%
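The %f column can be recomputed from the frequencies with a short sketch (variable names are illustrative):

```python
frequencies = [4, 7, 7, 8, 6, 8]
n = sum(frequencies)  # total number of data values

# %f = (frequency of a class / n) x 100
relative = [100 * f / n for f in frequencies]

print(relative)
print(sum(relative))
```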
Cumulative frequency distribution – a distribution that
shows the number of observations less than or equal to a
specific value.

Less than cumulative frequency distribution:
It is obtained by successively adding the frequencies of all the previous
classes, including the class against which it is written. The accumulation
starts from the lowest class and proceeds to the highest.

More than cumulative frequency distribution:
It is obtained by finding the cumulative total of frequencies starting from the
highest class and proceeding to the lowest.
Cumulative Frequency

CLASSES    (f)    CB             (X)    %f       < cumf    > cumf
48 – 52    4      47.5 – 52.5    50     10%      4         40
53 – 57    7      52.5 – 57.5    55     17.5%    11        36
58 – 62    7      57.5 – 62.5    60     17.5%    18        29
63 – 67    8      62.5 – 67.5    65     20%      26        22
68 – 72    6      67.5 – 72.5    70     15%      32        14
73 – 77    8      72.5 – 77.5    75     20%      40        8
total      40                           100%

In the table, the less than cumulative frequency is accumulated from the top row to the bottom row.
The greater than cumulative frequency is accumulated from the bottom row to the top row.
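Both cumulative columns are running totals of the frequency column, taken in opposite directions. A minimal sketch using `itertools.accumulate` from the standard library (variable names are illustrative):

```python
from itertools import accumulate

frequencies = [4, 7, 7, 8, 6, 8]

# Less than cumulative frequency: running total from the first class down
less_than = list(accumulate(frequencies))

# Greater than cumulative frequency: running total from the last class up,
# then flipped back so it lines up with the class order of the table
more_than = list(accumulate(reversed(frequencies)))[::-1]

print(less_than)
print(more_than)
```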
Mean for Grouped Data ( x̅ )
- The mean, or average, is a single value that represents a given set of values.

x̅ = Σfx / n

"the summation of the products of the frequency and the midpoint, all over the
total frequency."
CLASSES    (f)    CB             (X)    %f       fx
48 – 52    4      47.5 – 52.5    50     10%      200
53 – 57    7      52.5 – 57.5    55     17.5%    385
58 – 62    7      57.5 – 62.5    60     17.5%    420
63 – 67    8      62.5 – 67.5    65     20%      520
68 – 72    6      67.5 – 72.5    70     15%      420
73 – 77    8      72.5 – 77.5    75     20%      600
total      40                           100%     2,545
x̅ = Σfx / n
x̅ = 2,545 / 40
x̅ = 63.625

Therefore: the mean of the grouped data is 63.625 kg.

Step 1: complete the frequency distribution table to obtain the midpoint ( x ) of each class
Step 2: multiply the frequency of each class by x to get ( fx )
Step 3: add all values of ( fx ) to get Σfx
Step 4: divide the sum Σfx by the total frequency ( n ).
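The four steps can be sketched in Python using the midpoints and frequencies from the table (variable names are illustrative):

```python
midpoints = [50, 55, 60, 65, 70, 75]     # Step 1: class midpoints ( x )
frequencies = [4, 7, 7, 8, 6, 8]

# Step 2: multiply each frequency by its midpoint to get fx
fx = [f * x for f, x in zip(frequencies, midpoints)]

# Step 3: add all fx values
sum_fx = sum(fx)

# Step 4: divide by the total frequency n
n = sum(frequencies)
grouped_mean = sum_fx / n

print(fx, sum_fx, grouped_mean)
```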
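Median for Grouped Data ( x̃ )

x̃ = CB lower + ( ( n/2 − <cumf before ) / f class ) x class size

where: CB lower – lower class boundary of the median class
<cumf before – less than cumulative frequency of the class before the median class
f class – frequency of the median class

Step 1: start with n/2 = 40/2 = 20
Step 2: locate where 20 falls in the less than cumulative frequency column.
In this case 20 is near ( 18 ), therefore first "assume" that the median is
located in the 3rd class ( 58 – 62 ).
Step 3: proceed with the formula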
Step 3: proceed with the formula
CLASSES    (f)    CB             (X)    %f       < cumf    > cumf
48 – 52    4      47.5 – 52.5    50     10%      4         40
53 – 57    7      52.5 – 57.5    55     17.5%    11        36
58 – 62    7      57.5 – 62.5    60     17.5%    18        29
63 – 67    8      62.5 – 67.5    65     20%      26        22
68 – 72    6      67.5 – 72.5    70     15%      32        14
73 – 77    8      72.5 – 77.5    75     20%      40        8
total      40                           100%
x̃ = CB lower + ( ( n/2 − <cumf before ) / f class ) x class size
x̃ = 57.5 + ( ( 20 − 11 ) / 7 ) x 5
x̃ = 63.93

Note that 63.93 is not located within the 3rd class, which has limits of
( 58 – 62 ); therefore adjust the solution and take a step higher.
Note: x̃ is read as "x-tilde".

Because n/2 = 20 lies between the less than cumulative frequencies of the
3rd class ( 18 ) and the 4th class ( 26 ), the median can be located in either
the 3rd class or the 4th class.
In the previous solution we tried to solve for the median in the 3rd class.

Median at the 4th class:

x̃ = 62.5 + ( ( 20 − 18 ) / 8 ) x 5
x̃ = 63.75

Note: 63.75 is located in the 4th class with limits ( 63 – 67 ), therefore the
median 63.75 is correct. In general, the median class is the first class whose
less than cumulative frequency is at least n/2.
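The grouped median can also be computed without trial and error by picking the first class whose less than cumulative frequency reaches n/2. The sketch below takes that approach; the function name `grouped_median` is illustrative, and the 0.5 adjustment assumes whole-number class limits as in this example.

```python
def grouped_median(lower_limits, frequencies, class_size):
    """Median for grouped data.

    Uses x~ = CB_lower + ((n/2 - cumf_before) / f_class) * class_size,
    where the median class is the first class whose less-than
    cumulative frequency is at least n/2.
    """
    n = sum(frequencies)
    if n == 0:
        raise ValueError("no observations")
    half = n / 2
    cumf_before = 0
    for lower, f in zip(lower_limits, frequencies):
        if cumf_before + f >= half:
            cb_lower = lower - 0.5  # assumes whole-number class limits
            return cb_lower + (half - cumf_before) / f * class_size
        cumf_before += f


lower_limits = [48, 53, 58, 63, 68, 73]
frequencies = [4, 7, 7, 8, 6, 8]
print(grouped_median(lower_limits, frequencies, 5))
```

For the table above the loop stops at the 4th class, since 18 + 8 = 26 is the first cumulative total that reaches 20.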
Mode for Grouped data ( X̂ )

X̂ = CB lower + ( d1 / ( d1 + d2 ) ) x class size

Where: d1 = highest frequency – frequency of the class before
d2 = highest frequency – frequency of the class after

Note: CB lower is the lower class boundary of the class with the highest
frequency in the distribution.

CLASSES    (f)    Class Boundary    Mid Point ( X )
48 – 52    4      47.5 – 52.5       50
53 – 57    7      52.5 – 57.5       55
58 – 62    7      57.5 – 62.5       60
63 – 67    8      62.5 – 67.5       65
68 – 72    6      67.5 – 72.5       70
73 – 77    8      72.5 – 77.5       75
Substituting the values for the class 63 – 67, which has the highest frequency ( 8 ):

CB lower = 62.5    d1 = 8 – 7 = 1    d2 = 8 – 6 = 2

X̂ = 62.5 + ( 1 / ( 1 + 2 ) ) x 5
X̂ = 64.17

Mode: 64.17
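The grouped mode calculation can be sketched as follows. Two assumptions are made explicit: when several classes share the highest frequency (here both 63 – 67 and 73 – 77 have 8), this sketch picks the first one, which matches the worked solution; and a missing neighbouring class is treated as having frequency 0. The function name `grouped_mode` is illustrative.

```python
def grouped_mode(lower_limits, frequencies, class_size):
    """Mode for grouped data: CB_lower + (d1 / (d1 + d2)) * class_size."""
    highest = max(frequencies)
    if highest == 0:
        raise ValueError("no observations")
    # Assumption: on a tie, use the first class with the highest frequency
    i = frequencies.index(highest)
    # Assumption: a missing neighbour counts as frequency 0
    before = frequencies[i - 1] if i > 0 else 0
    after = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    d1 = highest - before
    d2 = highest - after
    cb_lower = lower_limits[i] - 0.5  # assumes whole-number class limits
    return cb_lower + d1 / (d1 + d2) * class_size


lower_limits = [48, 53, 58, 63, 68, 73]
frequencies = [4, 7, 7, 8, 6, 8]
print(round(grouped_mode(lower_limits, frequencies, 5), 2))
```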
