This document provides an overview of R for data visualization and graphics. It discusses the main graphics systems in R - base graphics, lattice, and ggplot2. It provides examples of common graph types like histograms, scatterplots, boxplots, etc. using each system. It also covers specialized packages for tasks like visualizing missing data, correlations, categorical data, and model effects. Finally, it discusses interactive and web-based graphics in R using packages like iplots, rggobi, googleVis, and Shiny.
R for data visualization and graphics
1. R for Data Visualization and Graphics
Rob Kabacoff, Ph.D.
Vice President of Research
Source code for presentation: http://tinyurl.com/Kabacoff-CS20
2. R is a Statistical and Graphical Platform
R Homepage - http://www.r-project.org/
CRAN Mirrors - http://cran.r-project.org/
• Free
• Open source
• State-of-the-art data analysis
• Platform for programming new methods
• Runs on Windows, Linux, Mac OS X
• Enormous user base
• Reproducible research
4. Statistical Methods
Descriptive Statistics
Experimental Design
Linear, Generalized, Nonlinear, and Hierarchical Models
Analysis of Categorical Data
Nonparametric Analysis
Survival Analysis
Latent Variable Models
Bayesian Models
Missing Values Analysis
Cluster Analysis
Decision Trees
Data Mining
Classical Test Theory
Item Response Theory
Correspondence Analysis
Multidimensional Scaling
Meta Analysis
Structural Equation Modeling
Complex Survey Design
Time Series Analysis
Longitudinal Analysis
Social Network Analysis
Study of Mediation and Moderation
Power Analysis
Clinical Trials
and …
5. Graphs!
[Figures: conditioning plot of lat vs. long, "Given : depth"; contour plot "A Topographic Map of Maunga Whau" (Meters North vs. Meters West, 10 Meter Contour Spacing); 3-D surface plot of Sinc(r) over X and Y; mosaic plot "Survival on the Titanic" (Age, Sex, Survived; shaded by Pearson residuals, p-value = <2e-16); scatterplot "University Salaries by Discipline" (Salary vs. Years Since Ph.D., discipline: Theoretical, Applied)]
6. A High Level Tour
• General Systems
– base
– lattice
– ggplot2
• Interactive
– iplots
– rggobi
– googleVis
– Shiny
• Specialized
– vcd (categorical data)
– VIM (missing data)
– likert (Likert data)
– scatterplot3d (3-D scatterplot)
– car (regression)
– corrplot (correlations)
– (decision trees)
– (dendrograms)
– effects (glm/ANOVA)
7. 3 complete graphics systems
[Figures: the same histogram of Salary (dollars) vs. Frequency drawn three times, titled "Base Graphics", "Lattice Graphics", and "ggplot2 Graphics"]
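The comparison on this slide can be sketched roughly as follows. The salary data behind the slide is not included here, so this sketch substitutes the built-in faithful dataset (an assumption), and the bin count and colors are arbitrary choices. It draws one histogram per graphics system.

```r
# A minimal sketch: the same histogram in base, lattice, and ggplot2.
# Assumption: the built-in 'faithful' data stands in for the salary data.
library(lattice)
library(ggplot2)

# Base graphics: the plot is drawn immediately as a side effect
hist(faithful$waiting, breaks = 20, col = "lightblue",
     main = "Base Graphics",
     xlab = "Waiting time (min)", ylab = "Frequency")

# lattice: the call returns a "trellis" object that is drawn when printed
p_lattice <- histogram(~ waiting, data = faithful, type = "count", nint = 20,
                       main = "Lattice Graphics",
                       xlab = "Waiting time (min)", ylab = "Frequency")
print(p_lattice)

# ggplot2: the plot is assembled from layers and drawn when printed
p_ggplot <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(bins = 20, fill = "lightblue", color = "black") +
  labs(title = "ggplot2 Graphics",
       x = "Waiting time (min)", y = "Frequency")
print(p_ggplot)
```

Note the difference in style: base draws straight to the device, while lattice and ggplot2 return objects that can be stored, modified, and printed later.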
12. line charts
[Figures: "Monthly Airline Passengers" (Passengers (K) vs. Time, 1950-1960), shown twice; "UK Lung Cancer Deaths" (Total, Male, Female by year, 1974-1980)]
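Line charts like these can be approximated with base graphics and datasets that ship with R. This is a sketch, not the presentation's own code: it assumes the figures were drawn from AirPassengers and from ldeaths, mdeaths, and fdeaths, and the line types and legend position are arbitrary.

```r
# A minimal sketch of the two line charts using built-in time series.
# Assumption: AirPassengers and ldeaths/mdeaths/fdeaths match the slide data.

# Single series: plot() on a "ts" object draws a line chart by default
plot(AirPassengers,
     main = "Monthly Airline Passengers",
     xlab = "Time", ylab = "Passengers (K)")

# Several series on one set of axes; line types distinguish the groups
ts.plot(ldeaths, mdeaths, fdeaths,
        gpars = list(main = "UK Lung Cancer Deaths",
                     xlab = "year", ylab = "deaths", lty = 1:3))
legend("topright", legend = c("Total", "Male", "Female"), lty = 1:3)
```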
13. time series
[Figure: "Season Decomposition of a Time Series" for Monthly Air Passengers, with panels for data, seasonal, trend, and remainder plotted against time (1950-1960)]
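A four-panel decomposition of this kind can be produced with stl() from the stats package. This is a sketch under assumptions: the slide does not say which function or settings were used, so the choice of stl() and of s.window = "periodic" is illustrative.

```r
# A minimal sketch of a seasonal decomposition, assuming stl() was used.
# s.window = "periodic" forces the seasonal pattern to be the same each year.
fit <- stl(AirPassengers, s.window = "periodic")

# plot() on an "stl" object stacks the data, seasonal, trend,
# and remainder panels on a shared time axis
plot(fit, main = "Seasonal Decomposition of a Time Series")

# The components are also available as columns of a multivariate series
head(fit$time.series)
```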
14. scatterplots
[Figures: "High Density Scatterplot (n=10,000)" (Y vs. X); "Iris Data" (Petal Length (cm) vs. Sepal Length (cm))]
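The two scatterplots can be sketched in base graphics. The high-density data here is simulated, since the slide's data is not included (an assumption), and smoothScatter() is one of several ways to handle overplotting, not necessarily the one used in the presentation.

```r
# A minimal sketch of the two scatterplots.
# Assumption: simulated data stands in for the high-density example.
set.seed(1234)
n <- 10000
x <- rnorm(n, mean = 0, sd = 3)
y <- x + rnorm(n, mean = 0, sd = 3)

# smoothScatter() shades by point density instead of overplotting 10,000 dots
smoothScatter(x, y, main = "High Density Scatterplot (n=10,000)",
              xlab = "X", ylab = "Y")

# Ordinary scatterplot of the built-in iris data, colored by species
plot(iris$Sepal.Length, iris$Petal.Length,
     col = as.integer(iris$Species), pch = 19,
     main = "Iris Data",
     xlab = "Sepal Length (cm)", ylab = "Petal Length (cm)")
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 19)
```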
19. lattice graphs
• expands base graphics to include trellis plots
• seeks to improve on the graph defaults (symbols, axes, labels) of base graphics
• grouping
– color, fill, line type can be mapped to variable values
• facets
– subgroups can be plotted in an array based on the levels of (usually) one or two variables
• customizable panel functions allow fine-grained control of what is plotted in each facet
• comments
– clean and fast
– high degree of customization possible
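The grouping, facet, and panel-function ideas above can be sketched in a single lattice call. The built-in mtcars dataset and the choice of variables are assumptions made for illustration only.

```r
# A minimal sketch of grouping, facets, and a custom panel function.
# Assumption: mtcars is used purely as an illustrative dataset.
library(lattice)

p <- xyplot(mpg ~ wt | factor(cyl),     # facets: one panel per cylinder count
            groups = factor(am),        # grouping: color/symbol by transmission
            data = mtcars,
            auto.key = list(title = "am", columns = 2),
            layout = c(3, 1),
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)                # default grouped points
              panel.lmline(x, y, col = "grey40")     # per-panel regression line
            },
            xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
print(p)
```

The panel function runs once per facet, which is what gives lattice its fine-grained control over panel contents.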
23. ggplot2
• Grammar of Graphics
• graphs built up in layers by plotting "geoms"
• grouping
– color, fill, shape, size can be mapped to variable values
• facets
– subgroups can be plotted in an array based on the levels of (usually) one or two variables
• comments
– allows you to create novel plots
– can be slow for large problems
– no 3D graphs
– HOT!
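The layering, grouping, and facet ideas above can be sketched as follows. The mtcars dataset and the particular geoms are assumptions chosen for illustration.

```r
# A minimal sketch of layers, aesthetic mappings, and facets in ggplot2.
# Assumption: mtcars is used purely as an illustrative dataset.
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 2) +                                # layer 1: points
  geom_smooth(method = "lm", formula = y ~ x,
              se = FALSE) +                             # layer 2: fitted lines
  facet_wrap(~ am) +                                    # one panel per level of am
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", color = "Cylinders")
print(p)
```

Each `+` adds a layer or a specification to the plot object, which is only rendered when it is printed.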
43. rggobi
• GGobi is an open source visualization program for exploring high-dimensional data
• rggobi provides an R command-line interface to GGobi
Installation
1. install GGobi: download from www.ggobi.org
2. in R: install.packages("rggobi")
see: http://www.ggobi.org/rggobi/introduction.pdf
45. googleVis
• Provides access to Google Chart Tools
– motion charts
– annotated time lines
– maps
– other (e.g. line, bar, bubble, column, area, scatter, candlestick, pie, org charts)
– https://developers.google.com/chart/
• output is HTML code containing data and references to JavaScript functions hosted by Google
• an internet connection is required to view the graphs
demo(WorldBank)
Hans Rosling in his TED talks
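The workflow described above (R generates HTML that calls Google-hosted JavaScript) can be sketched with one of the simpler chart types. This assumes the googleVis package is installed. The built-in women dataset and the chart options are illustrative choices, not taken from the presentation.

```r
# A minimal sketch of a googleVis chart, assuming googleVis is installed.
# Assumption: the built-in 'women' data (height, weight) is illustrative only.
library(googleVis)

chart <- gvisScatterChart(women,
                          options = list(title = "Women: weight vs. height",
                                         legend = "none",
                                         width = 500, height = 400))

# The result is a "gvis" object holding the generated HTML and JavaScript
class(chart)

# Viewing the chart opens a browser and needs an internet connection,
# so it is only attempted in an interactive session
if (interactive()) plot(chart)
```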
47. Shiny
• Package for building interactive web applications with R
– homepage - http://www.rstudio.com/shiny/
– examples - http://www.rstudio.com/shiny/showcase/
• Distribution
– self hosted (requires free Shiny Server on Linux server)
– RStudio hosted
– distribute as a package
pkgs <- c("Rcpp", "httpuv", "shiny")
install.packages(pkgs)
library(shiny)
runExample("06_tabsets")
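Beyond the bundled examples, a Shiny application can be as small as the sketch below. It assumes the shiny package is installed. The input name, output name, and dataset are hypothetical choices made for illustration.

```r
# A minimal sketch of a Shiny app, assuming the shiny package is installed.
# The ids "bins" and "hist" and the faithful dataset are illustrative choices.
library(shiny)

# User interface: one slider input and one plot output
ui <- fluidPage(
  titlePanel("Old Faithful waiting times"),
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

# Server logic: the plot is redrawn whenever the slider value changes
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(faithful$waiting, breaks = input$bins, col = "lightblue",
         main = "Histogram of waiting times",
         xlab = "Waiting time (min)")
  })
}

app <- shinyApp(ui = ui, server = server)

# Launching the app blocks the R session, so only do it interactively
if (interactive()) runApp(app)
```

The reactive link between input$bins and renderPlot() is what makes the graph interactive: no explicit event handling is written.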