Data Visualization Using ggplot2
Sayantan Banerjee
Data Visualization
Creating visualizations or graphical representations of data is a key step in being able to communicate information
and findings to others.
But improper or misleading visualizations can cause harm.
We therefore need to produce accurate, well-designed visualizations.
Grammar of graphics
ggplot2
Grammar of Graphics
Components of a plot include
the data!
geometric objects (dots, circles, lines, etc.) appearing on the plot
a set of mappings from variables in the data to the aesthetics (appearance) of the geometric objects
statistical transformations used to calculate the data values used in the plot
position adjustments for locating each geometric object on the plot
scales (e.g., range of values) for each aesthetic mapping used
coordinate system used to organize the geometric objects
the facets or groups of data shown in different plots
These components are further organized into layers, where each layer has a single geometric object, statistical
transformation, and position adjustment.
Following this grammar, you can think of each plot as a set of layers of images, where each image’s appearance is
based on some aspect of the data set.
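As a rough sketch of how these components fit together, the snippet below spells out every piece of a single layer with ggplot2's layer() function instead of a geom_*() shortcut. The toy data frame and its column names are made up purely for illustration.

```r
library(ggplot2)

# Hypothetical toy data, invented for this illustration only.
toy <- data.frame(
  x = c(1, 2, 3, 4),
  y = c(2, 4, 3, 5),
  group = c("a", "a", "b", "b")
)

# The data and the aesthetic mappings are given once, for the whole plot.
p <- ggplot(data = toy, mapping = aes(x = x, y = y, colour = group)) +
  layer(
    geom = "point",         # geometric object
    stat = "identity",      # statistical transformation (use the values as they are)
    position = "identity"   # position adjustment (leave the points where they are)
  ) +
  scale_colour_discrete() + # scale for the colour aesthetic
  coord_cartesian() +       # coordinate system
  facet_wrap(~ group)       # facets: one panel per group

print(p)
```

In everyday use, geom_point() builds the same kind of layer with these defaults filled in, and the default scale and coordinate system are added automatically.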
Pre-requisites
You may install ggplot2 separately, but it is better to install the larger ‘tidyverse’ collection of packages, which
includes ggplot2.
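A typical setup might look like the following sketch. It assumes a working internet connection and a CRAN mirror for the one-time installation step, which is left commented out so that re-running the script does not reinstall anything.

```r
# Run once per machine (uncomment to install from CRAN).
# install.packages("tidyverse")

# Load the core tidyverse packages, ggplot2 among them, in every new session.
library(tidyverse)
```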
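The examples that follow use the mpg data frame, which ships with ggplot2. Two of its variables are used throughout:
displ: a car’s engine size, in litres.
hwy: a car’s fuel efficiency on the highway, in miles per gallon (mpg).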
A car with a low fuel efficiency consumes more fuel than a car with a high fuel efficiency when they travel the same
distance.
mpg
## # A tibble: 234 x 11
##    manufacturer model    displ  year   cyl trans   drv     cty   hwy fl    class
##    <chr>        <chr>    <dbl> <int> <int> <chr>   <chr> <int> <int> <chr> <chr>
##  1 audi         a4         1.8  1999     4 auto(l~ f        18    29 p     comp~
##  2 audi         a4         1.8  1999     4 manual~ f        21    29 p     comp~
##  3 audi         a4           2  2008     4 manual~ f        20    31 p     comp~
##  4 audi         a4           2  2008     4 auto(a~ f        21    30 p     comp~
##  5 audi         a4         2.8  1999     6 auto(l~ f        16    26 p     comp~
##  6 audi         a4         2.8  1999     6 manual~ f        18    26 p     comp~
##  7 audi         a4         3.1  2008     6 auto(a~ f        18    27 p     comp~
##  8 audi         a4 quat~   1.8  1999     4 manual~ 4        18    26 p     comp~
##  9 audi         a4 quat~   1.8  1999     4 auto(l~ 4        16    25 p     comp~
## 10 audi         a4 quat~     2  2008     4 manual~ 4        20    28 p     comp~
## # ... with 224 more rows
Basics of ggplot2
For a basic plot, you need three primary steps:
Create a blank canvas for your plot, using the ggplot() call.
Specify aesthetic mappings, which define how variables map to visual aspects of the plot.
Add layers of geometric objects.
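The three steps can be sketched as follows, using the mpg data. The intermediate names canvas and mappings are introduced here only to keep the steps apart; in practice the whole plot is usually written as one expression.

```r
library(ggplot2)

# Step 1: a blank canvas bound to the mpg data.
canvas <- ggplot(data = mpg)

# Step 2: aesthetic mappings from variables to visual properties.
mappings <- aes(x = displ, y = hwy)

# Step 3: add a layer of points to the canvas with the + operator.
p <- canvas + geom_point(mapping = mappings)

print(p)
```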
Note: The geom layer is added using the addition (+) operator. New layers are always added onto your visualization
using +.
Aesthetic mappings
An aesthetic is a visual property of the objects in your plot.
Aesthetics include things like the size, the shape, or the color of your points.
You can display a point in different ways by changing the values of its aesthetic properties.
All aesthetics for a plot are specified in the aes() function call.
Each geom layer can have its own aes() specifications.
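# Map class to colour: each class gets its own colour.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
# Map class to transparency (ggplot2 warns that alpha is not advised for a discrete variable).
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, alpha = class))
# Map class to point shape (only six shapes are used by default, so the seventh class goes unplotted).
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class))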
You can also set the aesthetic properties of your geom manually, by placing the argument outside aes(). For example, we
can make all of the points in our plot red:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "red")
Geometric Objects
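ggplot2 supports different types of geometric objects, including geom_point() (points), geom_line() (lines),
geom_smooth() (smoothed trend lines), geom_bar() (bars), and geom_boxplot() (box plots).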
The aesthetics for each geom can be different, so you could show multiple lines on the same plot (or with different
colors, styles, etc).
It is also possible to give each geom a different data argument, so that you can show multiple data sets in the same
plot.
If you place mappings in a geom function, ggplot2 will treat them as local mappings for the layer.
It will use these mappings to extend or overwrite the global mappings for that layer only. This makes it possible to
display different aesthetics in different layers.
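A minimal sketch of global versus local mappings, using the mpg data: x and y are mapped globally in ggplot(), while colour is mapped only inside the point layer.

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +  # global mappings: x and y
  geom_point(mapping = aes(color = class)) +                  # local mapping: colour, points only
  geom_smooth()                                               # inherits x and y, but not colour

print(p)
```

Because the colour mapping is local to geom_point(), the smooth layer draws a single line for all cars instead of one line per class.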
We can use the same idea to specify different data for each layer.
Suppose we want to display the smooth line for just a subset of the mpg dataset, the subcompact cars.
The local data argument in geom_smooth() overrides the global data argument in ggplot() for that layer only.
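One way this could be written is sketched below. Base R's subset() is used to pick the subcompact cars (dplyr's filter() would work just as well), and the name subcompact is chosen here only for illustration.

```r
library(ggplot2)

# Rows of mpg for subcompact cars only.
subcompact <- subset(mpg, class == "subcompact")

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +   # uses the global data: all cars
  geom_smooth(data = subcompact, se = FALSE)   # local data: subcompact cars only

print(p)
```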
Position adjustments
Each geom also has a default position adjustment which specifies a set of “rules” as to how different components
should be positioned relative to each other.
The position adjustment becomes noticeable in geom_bar() when you map a second variable to the fill (or color) aesthetic.
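The sketch below contrasts three position adjustments on the same bar chart of the mpg data. The choice of class and drv as the two variables is only for illustration.

```r
library(ggplot2)

# Count of cars per class, with bars split by drive type.
base <- ggplot(data = mpg, mapping = aes(x = class, fill = drv))

p_stack <- base + geom_bar()                    # default adjustment: position = "stack"
p_dodge <- base + geom_bar(position = "dodge")  # bars placed side by side
p_fill  <- base + geom_bar(position = "fill")   # stacked and rescaled to proportions

print(p_stack)
print(p_dodge)
print(p_fill)
```

Stacking shows totals per class, dodging makes the individual counts easier to compare, and filling makes proportions comparable across classes.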
Coordinate systems
The default coordinate system is Cartesian. There are a number of other coordinate systems that are occasionally helpful.
coord_flip(): switches the x and y axes. This is useful, for example, if you want horizontal boxplots. It’s also useful
for long labels, which are hard to fit on the x-axis without overlapping.
coord_fixed(): a Cartesian system with a “fixed” aspect ratio (e.g., 1.78 for a “widescreen” plot).
coord_polar(): a plot using polar coordinates.
coord_quickmap(): a coordinate system that approximates a good aspect ratio for maps.
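As an illustration of the first of these, the following sketch draws boxplots of highway mileage by class and then flips the axes to make them horizontal.

```r
library(ggplot2)

# Vertical boxplots: class on the x-axis, hwy on the y-axis.
p <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot()

# Same plot with the axes swapped, so the class labels run down the side.
p_flipped <- p + coord_flip()

print(p_flipped)
```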
Labels
We add labels with the labs() function, for example to give the plot a title.
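A small sketch of labs() on the mpg scatter plot follows. The wording of the title and of the axis and legend labels is made up for this example.

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  labs(
    title = "Highway fuel efficiency versus engine size",  # plot title
    x = "Engine displacement (litres)",                    # x-axis label
    y = "Highway fuel efficiency (mpg)",                   # y-axis label
    color = "Car class"                                    # legend title
  )

print(p)
```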
Scales
Scales control the mapping from data values to things that you can perceive. ggplot2 automatically adds scales for you. For
example, the following plot is given default scales for x, y, and colour even though none is specified:
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class))
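To make the automatically added scales visible, the sketch below writes them out by hand. It assumes ggplot2's default settings, under which the explicit version should produce the same plot as the implicit one.

```r
library(ggplot2)

# Scales left implicit: ggplot2 picks them.
p_implicit <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class))

# The same plot with the default scales spelled out.
p_explicit <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  scale_x_continuous() +     # continuous scale for displ
  scale_y_continuous() +     # continuous scale for hwy
  scale_colour_discrete()    # discrete colour scale for class

print(p_explicit)
```

Scale functions follow the naming scheme scale_, then the aesthetic, then the kind of scale, so overriding a default means adding the matching scale function with your own arguments.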
The breaks argument controls the position of the ticks, or the values associated with the keys.
The labels argument controls the text label associated with each tick/key. The most common use of breaks is to override
the default choice of tick positions.
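For instance, the following sketch overrides the default y-axis ticks of the mpg scatter plot. The particular tick positions and the "mpg" suffix on the labels are arbitrary choices made for illustration.

```r
library(ggplot2)

# Tick positions chosen by hand: every 5 mpg from 15 to 40.
y_breaks <- seq(15, 40, by = 5)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_y_continuous(
    breaks = y_breaks,                # where the ticks go
    labels = paste(y_breaks, "mpg")   # text shown at each tick (one label per break)
  )

print(p)
```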
Facets
One way to add additional variables is with aesthetics.
Another way, particularly useful for categorical variables, is to split your plot into facets, subplots that each display
one subset of the data.
To facet your plot by a single variable, use facet_wrap().
The first argument of facet_wrap() should be a formula, which you create with ~ followed by a variable name
(here “formula” is the name of a data structure in R, not a synonym for “equation”).
The variable that you pass to facet_wrap() should be discrete.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)
To facet your plot on the combination of two variables, add facet_grid() to your plot call.
The first argument of facet_grid() is also a formula. This time the formula should contain two variable names
separated by a ~.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)
Saving plots
Saving your plots is important, especially if you are going to use them in your analysis reports.
For plots generated using ggplot2, we can use ggsave() to save plots.
See the help page with ?ggsave for the finer details.
By default, ggsave() saves the most recent plot to disk.
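A minimal sketch of saving a plot is given below. The file name and the use of a temporary directory are assumptions made for the example, so that nothing in the working directory is overwritten; in a real report you would pick your own path.

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()
print(p)

# Hypothetical output location inside the session's temporary directory.
out_file <- file.path(tempdir(), "mpg-scatter.pdf")

# The plot argument defaults to the last plot displayed; passing it explicitly,
# together with the size, keeps the call reproducible.
ggsave(filename = out_file, plot = p, width = 7, height = 5, units = "in")
```

The output format is inferred from the file extension. If width and height are left out, ggsave() uses the size of the current graphics device and reports it in a message like the one that follows.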
## Saving 7 x 5 in image