Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
4 views

Unit 3Data Visualization With Ggplot2

The document provides a comprehensive guide on creating data visualizations using the ggplot2 package in R. It covers the basics of initiating a plot, adding layers with various geom functions, and understanding aesthetic mappings for one-dimensional and two-dimensional plotting. Additionally, it discusses geometric objects, statistical transformations, position adjustments, and coordinate systems to enhance the visual representation of data.

Uploaded by

spatnaik9
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views

Unit 3Data Visualization With Ggplot2

The document provides a comprehensive guide on creating data visualizations using the ggplot2 package in R. It covers the basics of initiating a plot, adding layers with various geom functions, and understanding aesthetic mappings for one-dimensional and two-dimensional plotting. Additionally, it discusses geometric objects, statistical transformations, position adjustments, and coordinate systems to enhance the visual representation of data.

Uploaded by

spatnaik9
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 19

Data Visualization with ggplot2

Step1: Creating ggplot:


• You begin a plot with the function ggplot().It creates a coordinate system
that you can add layers to.
• The first argument of ggplot() is the dataset to use in the graph. So
ggplot(data = iris) creates an empty graph.
• Now complete your graph by adding one or more layers to ggplot().
• The function geom_point() adds a layer of points to your plot, which creates
a scatterplot.
• ggplot2 comes with many geom functions that each add a different type of
layer to a plot.
• Each geom function in ggplot2 takes a mapping argument. This defines how
variables in your dataset are mapped to visual properties.
• The mapping argument is always paired with aes(), and the x and y
arguments of aes() specify which variables to map to the x and y-axes.
• ggplot2 looks for the mapped variable in the data argument, in this case, iris.
Step2: Graphing Template
• For making graphs with ggplot2 the following code with a dataset, a geom
function, or a collection of mappings and extend this template to make
different types of graphs:
ggplot(data = <DATA>) +
<GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
Aesthetic Mappings
• Graph plotting in R is of two types:
1. One-dimensional Plotting:
In one-dimensional plotting, we plot one variable at a time.
For example, we may plot a variable with the number of times each of its
values occurred in the entire dataset (frequency).
So, it is not compared to any other variable of the dataset. These are the 4
major types of graphs that are used for One-dimensional analysis –
• Five Point Summary
• Box Plotting
• Histograms
• Bar Plotting
2. Two-dimensional Plotting:
In two-dimensional plotting, we visualize and compare one variable
with respect to the other.
For example, in Air Quality dataset, we would like to compare how the
AQI varies with the temperature at a particular place.
Temperature and AQI are two different variables and we wish to see
how one changes with respect to the other.
These are the 3 major kinds of graphs used for such kinds of analysis
• Box Plotting
• Histograms
• Scatter plots
• You can add a third variable, like class, to a two-dimensional scatterplot by
mapping it to an aesthetic.
• An aesthetic is a visual property of the objects in your plot.
• Aesthetics include things like the size, the shape, or the color of your points.
You can display a point in different ways by changing the values of its
aesthetic properties.
• By using these aesthetic properties we change the levels of a point’s size,
shape, and color to make the point small, triangular, or blue:
Geometric Objects
• A geom is the geometrical object that a plot uses to represent data.
• Bar charts use bar geoms, line charts use line geoms, boxplots use boxplot
geoms, and so on. Scatterplots break the trend; they use the point geom.
• You can use different geoms to plot the same data. To change the geom in
your plot, change the geom function that you add to ggplot().
• ggplot() is providing two such geom’s- geom_point() and geom_smooth().
• Both plots contain the same x variable and the same y variable, and both
describe the same data. But the plots are not identical.
• Each plot uses a different visual object to represent the data. ggplot2 use
different geoms.
• geom_smooth() will draw a different line, with a different linetype, for each
unique value of the variable that you map to linetype =.
• To display multiple geoms in the same plot, add multiple geom functions to
ggplot():
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
geom_smooth(mapping = aes(x = displ, y = hwy))
Statistical Transformations
• Bar charts seem simple, but they are interesting because they reveal
something subtle about plots.
• A basic bar chart, as drawn with geom_bar().
 Bar charts, histograms, and frequency polygons bin your
data and then plot bin counts, the number of points that fall in each bin.
 Smoothers fit a model to your data and then plot prediction from the model.
 Boxplots compute a robust summary of the distribution and display a
specially formatted box.
• The default value for stat is “count,” which means that geom_bar() uses
stat_count().
 You can re-create the previous plot using stat_count() instead of
geom_bar():

• You might use stat_summary(), which summarizes the y values for each
unique x value, to draw attention.
ggplot(data = diamonds) +
stat_summary(
mapping = aes(x = cut, y = depth),
fun.ymin = min,
fun.ymax = max,
fun.y = median)
Position Adjustments
 You can color a bar chart using either the color aesthetic, or
 by fill = x axes variable.
 clarity: the bars are automatically stacked.
 The stacking is performed automatically by the position adjustment specified
by the position argument. So ggplot is providing 3 such position options.
They are: "identity", "dodge" or "fill".
position = "identity“:
It will place each object exactly where it falls in the context of the graph.
This is not very useful for bars, because it overlaps them.
To see that overlapping we either need to make the bars slightly transparent
by setting alpha to a small value, or completely transparent by setting fill =
NA
position = "fill":
It works like stacking, but makes each set of stacked bars the same height.
This makes it easier to compare proportions across groups:
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")
position = "dodge":
It places overlapping objects directly beside one another. This makes it easier
to compare individual values:
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity),
position = "dodge")
Coordinate Systems
Coordinate systems are probably the most complicated part of ggplot2.
The default coordinate system is the Cartesian coordinate system where the x
and y position act independently to find the location of each point.
There are a number of other coordinate systems that are occasionally helpful:
coord_flip(): It switches the x- and y-axes, if you want horizontal
boxplots. It’s also useful for long labels.
• coord_quickmap() sets the aspect ratio correctly for maps. This is very
important if you’re plotting spatial data with ggplot2. Example: maps

• coord_polar() uses polar coordinates. Polar coordinates reveal an interesting


connection between a bar chart and a Coxcomb chart.

You might also like