Lecture 3


Data display

Amine Hadji

Leiden University

February 22, 2022


Outline

• Data

• Graphical summary

• Numerical summary

• How to display data badly


Quick reminder

• Observation: the value of a single measurement.

• Variable: is a characteristic that differs from one individual to another.

• Sample size: total number of observations.

• Sample data: measurements taken from a subset of a population.

• Parameter: summary measure for an entire population.

• Statistic: summary measure computed from sample data.


Types of variables
Types:
• Categorical variable: variable that cannot be tied to a meaningful number
  • nominal variable when the categories cannot be ordered
  • ordinal variable when the categories can be ordered
• Quantitative variable: variable that can be tied to a meaningful number
  • discrete variable when the number is counted
  • continuous variable when the number is measured
Summarizing categorical variables
One variable:
Summarizing categorical variables

Two variables:
Pie chart vs Bar chart
Pie charts are the worst
Visualizing categorical variables
Only time pie charts work
Summary - Quantitative variable

The most interesting summary characteristics:


• Location: mean - median - mode

• Spread: range - interquartile range - standard deviation

• Shape: symmetry - skewness - modes

• +1: Outliers
Five summary statistics
Boxplot
Boxplot in real life
Histogram - Stem-and-leaf - Dotplots
Bin size in histograms
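As a rough illustration of why bin size matters, the sketch below counts the same small dataset with two different bin widths. The helper `bin_counts`, the data values, and the widths 5 and 20 are hypothetical choices made for this example, not taken from the lecture.

```python
from collections import Counter


def bin_counts(values, width):
    """Count how many values fall in each bin [k*width, (k+1)*width).

    Returns a dict mapping the left edge of each non-empty bin to its count.
    """
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


data = [12, 15, 21, 23, 23, 28, 34, 47]  # hypothetical sample

narrow = bin_counts(data, 5)   # many small bins: more detail, more noise
wide = bin_counts(data, 20)    # few large bins: smoother, less detail

print("width 5: ", narrow)
print("width 20:", wide)
```

With the narrow bins the counts are spread over six bins, while the wide bins collapse most of the data into a single bin, so the apparent shape of the distribution depends on the chosen width.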
When to use what?
If you are interested in
• Location: all of them (esp. boxplot)

• Shape: histogram - stem-and-leaf

• Individual values: stem-and-leaf - dotplot

If you want to
• compare groups: boxplot - histogram (two copies or side by side)

• identify outliers: all of them (esp. boxplot)

• study small datasets: stem-and-leaf - dotplot - boxplot

• study large datasets: histogram - boxplot
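For small datasets, a stem-and-leaf display keeps every individual value visible while still showing the shape. A minimal sketch, assuming non-negative integer data with the tens as stems and the units as leaves; the function `stem_and_leaf` and the sample values are hypothetical:

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf display.

    Assumption: values are non-negative integers; the stem is the value
    without its last digit and the leaf is the last digit.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)

    lines = []
    # Include empty stems between the smallest and largest so gaps stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines


data = [12, 15, 21, 23, 23, 28, 34, 47]  # hypothetical sample
lines = stem_and_leaf(data)
print("\n".join(lines))
```

Each row lists the individual values of one stem, so the length of a row plays the role of a histogram bar.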


Mean & Median

• Mean - numerical average of the data:


\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

• Median - middle value splitting a data set evenly according to the order
Influence of extreme values: the mean is more sensitive to extreme values than the median:

1,2,3,4,5 vs 1,2,3,4,10 (the mean goes from 3 to 4; the median stays at 3)
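The comparison of the two datasets can be checked with a short sketch using Python's standard `statistics` module; the variable names are chosen for this example only.

```python
from statistics import mean, median

original = [1, 2, 3, 4, 5]
with_extreme = [1, 2, 3, 4, 10]  # same data, largest value replaced by 10

for data in (original, with_extreme):
    print(data, "mean =", mean(data), "median =", median(data))

# How far each summary moves when the extreme value is introduced.
mean_shift = mean(with_extreme) - mean(original)
median_shift = median(with_extreme) - median(original)
```

The mean shifts by 1 while the median does not move, which is what makes the median the more robust measure of location here.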
Shape on (mean - median)
Mean - Median - Mode
Quartiles - Quantiles

• Lower quartile (Q1): median of the lower half of the ordered data values.

• Upper quartile (Q3): median of the upper half of the ordered data values.
• k-th percentile: value with k% of the data values at or below it and (100 − k)% at or above it.
• Interquartile range (IQR): Q3 − Q1.
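The quartile definitions can be sketched as follows. One point is not fixed by the definition: when the number of values is odd, this sketch assumes the middle value is left out of both halves, and other conventions (including those of many statistics packages) give slightly different quartiles. The function `quartiles` and the sample data are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    Assumption: for an odd number of values, the middle value belongs to
    neither half. Requires at least two values.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


data = [7, 1, 3, 9, 5, 11, 13, 15]  # hypothetical sample
q1, med, q3 = quartiles(data)
iqr = q3 - q1

# Minimum, Q1, median, Q3, maximum: the values a boxplot is drawn from.
five_number_summary = (min(data), q1, med, q3, max(data))
print(five_number_summary, "IQR =", iqr)
```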
Outliers

Possible sources:
• legitimate value; represents natural (high) variability

• mistake made during measurement or typing

• individual in a different group than the rest of the sample


Standard deviation
Formula for the sample variance and standard deviation:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
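A minimal sketch of the sample variance and standard deviation computed directly from the formula (dividing by n − 1), compared against `statistics.variance` and `statistics.stdev` from the Python standard library. The helper `sample_variance` and the data are illustrative only.

```python
import math
from statistics import stdev, variance


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [1, 2, 3, 4, 10]  # hypothetical sample

s2 = sample_variance(data)  # sample variance
s = math.sqrt(s2)           # sample standard deviation

print("variance:", s2, "library:", variance(data))
print("std dev: ", s, "library:", stdev(data))
```

The standard deviation is in the same units as the data, which is why it is usually reported rather than the variance.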
Scatter plot
Scatter plots help visualize the relationship between two quantitative variables.
Scatter plots - Nonlinearity
Scatter plots - Outliers
Scatter plots - Two categories
Hiding/Showing small data

Showing a very small amount of data:

Hiding the data:


Change of Scale
Incorrect labeling
Hiding the main effect
Jiggle Baseline
