Lecture 3
Amine Hadji
Leiden University
• Data
• Graphical summary
• Numerical summary
Two variables:
Pie chart vs Bar chart
Pie charts are the worst
Visualizing categorical variables
Only time pie charts work
Summary - Quantitative variable
• +1: Outliers
Five summary statistics
Boxplot
Boxplot in real life
Histogram - Stem-and-leaf - Dotplots
Bin size in histograms
When to use what?
If you are interested in
• Location: all of them (esp. boxplot)
If you want to
• compare groups: boxplot - histogram (two copies or side by side)
Mean & Median
• Median - the middle value of the ordered data, splitting the data set into two halves of equal size
1,2,3,4,5 vs 1,2,3,4,10
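The two data sets above differ only in their largest value, which makes them a convenient check of how the mean and the median react to a single extreme observation. A minimal sketch using Python's standard `statistics` module (the variable names are illustrative, not from the slides):

```python
from statistics import mean, median

# The two data sets from the slide; only the largest value differs.
symmetric = [1, 2, 3, 4, 5]
with_extreme = [1, 2, 3, 4, 10]

for data in (symmetric, with_extreme):
    # The mean uses every value, so it shifts with the extreme one;
    # the median depends only on the middle of the ordered data.
    print(data, "mean =", mean(data), "median =", median(data))
```

The mean moves from 3 to 4 while the median stays at 3 in both cases.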
Shape on (mean - median)
Mean - Median - Mode
Quartiles - Quantiles
• Lower quartile (Q1): median of the lower half of the ordered data values.
• Upper quartile (Q3): median of the upper half of the ordered data values.
• k-th percentile: the value with k% of the data values at or below it and (100 − k)% at
or above it.
• Interquartile range (IQR): Q3 − Q1 .
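The definitions above can be sketched in a few lines of Python. This is one possible reading, not the only convention: the sample data, the helper names `quartiles` and `share_at_or_below`, and the choice to leave the middle value out of both halves when the number of observations is odd are all assumptions made for illustration. Statistical software often uses interpolation rules that give slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: for an odd number of values, the middle value is
    excluded from both halves.
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    half = len(ordered) // 2
    lower = ordered[:half]                  # lower half of the ordered data
    upper = ordered[len(ordered) - half:]   # upper half of the ordered data
    return median(lower), median(upper)

def share_at_or_below(values, x):
    """Fraction of the data values that are at or below x."""
    return sum(v <= x for v in values) / len(values)

# Hypothetical sample data, chosen only to keep the arithmetic easy.
data = [1, 2, 3, 4, 5, 6, 7, 8]

q1, q3 = quartiles(data)
iqr = q3 - q1  # interquartile range: Q3 - Q1

print("Q1 =", q1, "Q3 =", q3, "IQR =", iqr)
print("share at or below Q1:", share_at_or_below(data, q1))
print("share at or below Q3:", share_at_or_below(data, q3))
```

For this sample, Q1 = 2.5 and Q3 = 6.5, so the IQR is 4.0, and 25% and 75% of the values lie at or below Q1 and Q3 respectively, matching the percentile definition.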
Outliers
Possible sources:
• legitimate value; represents natural (high) variability