Initial Data Analysis: Central Tendency
Central Tendency
Outline
What is ‘central tendency’?
Classic measures
Mean, Median, Mode
What’s an ‘average’?
Properties of statistics
Sufficiency
Efficiency
Bias
Resistance
Resistant measures
Measures of Central Tendency
While distributions provide an overall picture of
a data set, it is sometimes desirable to represent
some property of the entire data set using a single
statistic.
The first descriptive statistics we will discuss are
those used to indicate where the ‘center’ of the
distribution lies.
The expected value
It does not have to be a value that appears in the data set itself
There are different measures of central tendency,
each with its own advantages and disadvantages
The Mode
The mode is simply the value of the relevant variable that
occurs most often (i.e., has the highest frequency) in the
sample
Disadvantages
Sometimes not very informative (e.g. cigarettes smoked
in a day)
Can change dramatically from sample to sample
Might be more than one (which is more representative?)
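As a minimal sketch of the definition above, the mode can be found by counting how often each value occurs and keeping every value tied for the highest count. The function name `modes` and the sample data are hypothetical, chosen only for illustration. Returning a list makes the "more than one mode" case visible.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, in first-seen order."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

# Hypothetical sample: cigarettes smoked in a day by ten people.
cigarettes_per_day = [0, 0, 0, 0, 20, 20, 10, 0, 5, 0]
print(modes(cigarettes_per_day))  # the mode is 0, which says little about the smokers

# A sample with two modes: neither is clearly "more representative".
print(modes([1, 1, 2, 2, 3]))
```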
The Median
The median is the point corresponding to the score that lies in
the middle of the distribution (i.e., there are as many data
points above the median as there are below the median).
To find the median, the data points must first be sorted into
either ascending or descending numerical order.
The position of the median value can then be calculated using
the following formula:
Median Location = (N + 1) / 2
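The procedure can be sketched as follows: sort the data, compute the median location (N + 1) / 2 as a 1-based position, and read off the value there. The function names are hypothetical. When N is even the location falls halfway between two positions; averaging the two neighbouring values is a common convention and is an assumption here, since the slides do not state it.

```python
import math

def median_location(n):
    """1-based position of the median in sorted data of size n."""
    return (n + 1) / 2

def median(values):
    ordered = sorted(values)  # ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    loc = median_location(n)
    # For odd n, floor and ceil coincide; for even n they are the two
    # middle positions (assumption: average them).
    lower = ordered[math.floor(loc) - 1]
    upper = ordered[math.ceil(loc) - 1]
    return (lower + upper) / 2

print(median_location(11))    # position of the median among 11 sorted points
print(median([5, 1, 3]))      # odd N: the middle value
print(median([4, 1, 3, 2]))   # even N: mean of the two middle values
```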
Median
Advantage:
Resistant to outliers
Disadvantage:
May not be very informative, e.g. the median of
(1, 1, 2, 2, 2, 2, 5, 6, 9, 9, 10) is 2, which says little about the larger values
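Both points can be checked on the example data with Python's standard `statistics` module. This is only an illustrative sketch: the outlier value 1000 is a made-up replacement for the largest score, and the mean is included purely for comparison.

```python
import statistics

scores = [1, 1, 2, 2, 2, 2, 5, 6, 9, 9, 10]

# Hypothetical outlier: replace the largest score with 1000.
with_outlier = scores[:-1] + [1000]

median_before = statistics.median(scores)
median_after = statistics.median(with_outlier)
mean_before = statistics.mean(scores)
mean_after = statistics.mean(with_outlier)

# The median stays at the 6th sorted value in both cases,
# while the mean is pulled strongly toward the outlier.
print(median_before, median_after)
print(mean_before, mean_after)
```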