11 Data Visualization
• The basic structure for ggplot2 starts with the ggplot function,
which takes the data as its first argument. After that, layers can
be added using the + symbol.
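The layering idea above can be sketched as follows. This is a minimal illustration, not taken from the slides: the `mpg` dataset (shipped with ggplot2) and the choice of `geom_point()` and `geom_smooth()` are assumptions made for the example.

```r
library(ggplot2)

# ggplot() takes the data first, then the aesthetic mappings.
# Each `+` adds one more layer on top of the base plot.
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +                            # layer 1: raw observations
  geom_smooth(method = "lm", se = FALSE)    # layer 2: fitted straight line

print(p)
```

Dropping either `geom_*()` line removes only that layer; the data and mappings given to `ggplot()` are shared by all layers.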
https://en.wikipedia.org/wiki/Frank_Anscombe
Anscombe's quartet
https://en.wikipedia.org/wiki/Anscombe%27s_quartet
• Dataset 4 shows that a single outlier is enough to produce a high correlation
coefficient, even though the remaining points show no relationship between the two variables.
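The quartet can be checked directly in R. The sketch below assumes the built-in `anscombe` data frame from the `datasets` package, which stores the four sets as columns `x1` to `x4` and `y1` to `y4`; the helper name `cors` is only illustrative.

```r
# Built-in data frame with columns x1..x4 and y1..y4.
data(anscombe)

# Pearson correlation for each of the four x/y pairs.
cors <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
names(cors) <- paste("dataset", 1:4)

# All four values are nearly identical, although the scatterplots differ.
print(round(cors, 2))
```

The near-identical coefficients are the reason the data should be plotted before a summary statistic is trusted.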
Line and Path plots
• Line and path plots are typically used for time series data.
• Line plots join the points from left to right, while path plots join
them in the order that they appear in the dataset.
• Line plots usually have time on the x-axis, showing how a single
variable has changed over time. Path plots show how two variables
have simultaneously changed over time, with time encoded in the
way that observations are connected.
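The difference between the two plot types can be sketched with the `economics` dataset that ships with ggplot2. The dataset and the variables chosen here (`date`, `unemploy`, `pop`, `uempmed`) are assumptions made for illustration, not part of the slides.

```r
library(ggplot2)

# Line plot: time on the x-axis, points joined from left to right.
p_line <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()

# Path plot: two variables against each other, points joined in row order.
# The rows of `economics` are chronological, so the path traces time.
p_path <- ggplot(economics, aes(x = unemploy / pop, y = uempmed)) +
  geom_path()

print(p_line)
print(p_path)
```

If the rows were not already in time order, the data would need sorting by `date` before `geom_path()` for the path to follow time.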
ggplot(diamonds, aes(x = carat)) +
  geom_freqpoly()
https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51
Boxplot
https://r4ds.had.co.nz/exploratory-data-analysis.html#missing-values-2
ggplot(data = diamonds, aes(x = carat)) +
  geom_histogram(col = "white", fill = "blue")
• Facetting creates tables of graphics by splitting the data into subsets and
displaying the same graph for each subset.
• To facet a plot you simply add a facetting specification with
facet_wrap(), which takes the name of a variable preceded by ~.
ggplot(diamonds, aes(x = carat)) +
  geom_histogram() +
  facet_wrap(~cut)