
Scientific Data Visualization: Using Ggplot2

This document provides an introduction to scientific data visualization using ggplot2 in R. It discusses importing data, exploring the data structure, and creating basic visualizations like scatter plots, box plots, and line graphs by mapping variables to aesthetics and adding geometric objects. It demonstrates how to customize plots by adding titles, labels, themes and faceting. The goal is to quickly teach strong visualization techniques in R.

Scientific data visualization

Using ggplot2

Sacha Epskamp

University of Amsterdam
Department of Psychological Methods

11-04-2014
Hadley Wickham
Evolution of data visualization
Scientific data visualization

- Data and analysis results are best communicated through visualizations
- The leading software for statistical analyses is the statistical programming language R
- The leading R extension for data visualization is ggplot2
- This presentation will quickly teach you strong visualization techniques in R
First use of R

- We will use the RStudio environment for our work in R
- RStudio has 4 panels:
    Console     This is the actual R window. You can enter commands here and execute them by pressing Enter
    Source      This is where we can edit scripts. It is where you should always be working. Control-Enter sends the selected code to the console
    Plots/Help  This is where plots and help pages are shown
    Workspace   Shows which objects you currently have
- Anything following a # symbol is treated as a comment!
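As a small illustration of the comment rule, here is a sketch with a made-up variable name; everything after `#` on a line is ignored by R:

```r
# This whole line is a comment and is ignored by R
x <- 1 + 1  # a comment can also follow code on the same line
print(x)
```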
R workflow

- File → New File → R Script
- Write code in the R script
- Select code and press Control + Enter to execute it
Import data

File <- "http://sachaepskamp.com/files/OPdata.csv"
Data <- read.csv(File)
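The `str(Data)` output later in these slides lists the text columns as factors, which was the `read.csv()` default when the slides were written. Since R 4.0.0 the default is `stringsAsFactors = FALSE`, so a current R session would show character columns instead. A minimal sketch of requesting factors explicitly, using a small inline CSV (made-up rows, not the real OPdata file) so that it runs offline:

```r
# Hypothetical three-row stand-in for the real CSV, so no download is needed
csv_text <- "userID,Gender,Stress
1,female,0.375
1,female,0.875
2,male,1.250"

# Default in R >= 4.0.0: text columns stay character
as_character <- read.csv(text = csv_text)

# Explicitly convert text columns to factors, as in the slides' output
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(as_character$Gender)
class(as_factor$Gender)
levels(as_factor$Gender)
```

With the real file, the equivalent call would be `read.csv(File, stringsAsFactors = TRUE)`.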
Look at data
head(Data)

##   userID Measurement Gender Age Study      Work Neuroticism
## 1      1           1 female  24   yes part time         low
## 2      1           2 female  24   yes part time         low
## 3      1           3 female  24   yes part time         low
## 4      1           4 female  24   yes part time         low
## 5      1           5 female  24   yes part time         low
## 6      1           6 female  24   yes part time         low
##   Extraversion Openness Conscienciousness Agreeableness
## 1          low     high              high          high
## 2          low     high              high          high
## 3          low     high              high          high
## 4          low     high              high          high
## 5          low     high              high          high
## 6          low     high              high          high
##   Stress
## 1  0.375
## 2  0.875
## 3  1.375
## 4  1.875
## 5  1.875
## 6  0.750
Look at data

names(Data)

##  [1] "userID"            "Measurement"
##  [3] "Gender"            "Age"
##  [5] "Study"             "Work"
##  [7] "Neuroticism"       "Extraversion"
##  [9] "Openness"          "Conscienciousness"
## [11] "Agreeableness"     "Stress"
Look at data

str(Data)

## 'data.frame': 750 obs. of 12 variables:
## $ userID           : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Measurement      : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Gender           : Factor w/ 2 levels "female","male": 1 1 1 1 1
## $ Age              : int 24 24 24 24 24 24 24 24 24 24 ...
## $ Study            : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2
## $ Work             : Factor w/ 3 levels "full time","none",..: 3 3
## $ Neuroticism      : Factor w/ 2 levels "high","low": 2 2 2 2 2 2
## $ Extraversion     : Factor w/ 2 levels "high","low": 2 2 2 2 2 2
## $ Openness         : Factor w/ 2 levels "high","low": 1 1 1 1 1 1
## $ Conscienciousness: Factor w/ 2 levels "high","low": 1 1 1 1 1 1
## $ Agreeableness    : Factor w/ 2 levels "high","low": 1 1 1 1 1 1
## $ Stress           : num 0.375 0.875 1.375 1.875 1.875 ...
Look at data

View(Data)
ggplot2

- ggplot2 (Wickham, 2009) is an implementation of the Grammar of Graphics (Wilkinson, Wills, Rope, Norton, & Dubbs, 2006)
- Very different from base R plotting, but also very flexible and powerful
- Uses data frames as input
- Data must be in long format
    - This means that each row is an observation and each column a variable
    - Use reshape2 to get data in long format
    - Also check out dplyr (http://sachaepskamp.com/files/dplyrTutorial.html)
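To make the long-format requirement concrete, here is a minimal sketch that turns a small wide data frame into long format. The slides point to reshape2 for this; the sketch swaps in base R's `reshape()` so that it runs without extra packages. The data values and the `Stress.1` to `Stress.3` column names are made up for illustration.

```r
# Made-up wide data: one row per user, one column per measurement occasion
wide <- data.frame(
  userID   = c(1, 2),
  Stress.1 = c(0.375, 1.000),
  Stress.2 = c(0.875, 1.250),
  Stress.3 = c(1.375, 0.500)
)

# Base R reshape(): stack the three Stress columns into a single column
long <- reshape(
  wide,
  direction = "long",
  varying   = c("Stress.1", "Stress.2", "Stress.3"),
  v.names   = "Stress",
  timevar   = "Measurement",
  times     = 1:3,
  idvar     = "userID"
)

# Sort by user, then occasion, and drop the automatic row names
long <- long[order(long$userID, long$Measurement), ]
rownames(long) <- NULL
long
```

Each row of `long` is now one observation (one user at one measurement occasion), which is the layout ggplot2 expects.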
Basics of a plot

- A plot is a 2D representation of data, in which variables can be visualized by, e.g.:
    - Horizontal position
    - Vertical position
    - Color
    - Different lines
    - Line type
    - Size
    - Shape
    - ...
- These are called aesthetics
- In ggplot2 we first set the aesthetic mapping of our data using aes() inside ggplot()
    - Which variables will be mapped to which aesthetics?
install.packages("ggplot2")
library("ggplot2")
ggplot(Data, aes(x = Measurement, y = Stress))

## Error: No layers in plot
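The error above is what ggplot2 reported when these slides were written. Current ggplot2 versions instead draw an empty panel with axes for a plot that has no layers. Either way, the call only sets up the data and the aesthetic mapping. A minimal sketch, using a small made-up data frame in place of `Data`:

```r
library(ggplot2)

# Made-up stand-in for the Data object used in the slides
toy <- data.frame(
  Measurement = c(1, 2, 3, 1, 2, 3),
  Stress      = c(0.4, 0.9, 1.4, 1.0, 1.2, 0.5)
)

# Data and aesthetic mapping only: no geoms yet
p <- ggplot(toy, aes(x = Measurement, y = Stress))
length(p$layers)         # number of layers attached so far

# Adding a geom with + attaches one layer
p_points <- p + geom_point()
length(p_points$layers)
```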


Geometrics

- Next, we define how these aesthetics are used and what we are plotting:
    - Lines
    - Points
    - Boxplots
    - Curves
    - ...
- These are called geometric objects (geoms)
- We can add these to the plot using +
ggplot(Data, aes(x = Measurement, y = Stress)) +
geom_point()

[Figure: scatter plot of Stress (y-axis) against Measurement (x-axis)]
ggplot(Data, aes(x = Measurement, y = Stress)) +
geom_boxplot()

[Figure: box plot of Stress against Measurement, drawn without a grouping variable]
ggplot(Data, aes(x = Measurement, y = Stress, group = Measurement)) +
geom_boxplot()

[Figure: box plots of Stress, one per value of Measurement]
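The two box plot calls above differ only in `group = Measurement`. Because `Measurement` is numeric, ggplot2 treats it as continuous and needs an explicit grouping to draw one box per measurement occasion. A common alternative, sketched here on made-up data, is to convert the variable to a factor so that each level gets its own box:

```r
library(ggplot2)

# Made-up data: 3 measurement occasions with 20 stress scores each
set.seed(1)
toy <- data.frame(
  Measurement = rep(1:3, each = 20),
  Stress      = runif(60, min = 0, max = 3)
)

# Option 1 (as in the slides): keep x numeric and set the group explicitly
p_group <- ggplot(toy, aes(x = Measurement, y = Stress, group = Measurement)) +
  geom_boxplot()

# Option 2: make x discrete, so grouping follows from the factor levels
p_factor <- ggplot(toy, aes(x = factor(Measurement), y = Stress)) +
  geom_boxplot()

# Both versions compute one set of box statistics per occasion
nrow(layer_data(p_group))
nrow(layer_data(p_factor))
```

The difference is mainly in the x-axis: option 1 keeps a continuous axis, while option 2 produces a discrete axis with one tick per level.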
ggplot(Data,
aes(x = Measurement, y = Stress, group = userID)
) + geom_line()

[Figure: line plot of Stress against Measurement, one line per userID]
ggplot(Data,
aes(x = Measurement, y = Stress, group = userID,
colour = Age)) + geom_line() +
facet_grid(Gender ~ .)

[Figure: line plot of Stress against Measurement, one line per userID, coloured by Age, with separate panels for female and male]
Store elements in an object:
g <- ggplot(Data,
            aes(x = Measurement, y = Stress, group = userID,
                colour = Age))
g <- g + geom_line()
g <- g + facet_grid(Gender ~ .)
Print the object to plot:
print(g)

[Figure: line plot of Stress against Measurement, coloured by Age, with separate panels for female and male]
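Because the plot is stored in an object, it can also be written to a file instead of printed. A minimal sketch using `ggsave()` on a made-up plot object; the data, file name and size are arbitrary choices for illustration:

```r
library(ggplot2)

# Made-up stand-in for the Data object used in the slides
toy <- data.frame(
  Measurement = c(1, 2, 3, 1, 2, 3),
  Stress      = c(0.4, 0.9, 1.4, 1.0, 1.2, 0.5),
  userID      = c(1, 1, 1, 2, 2, 2)
)

g_toy <- ggplot(toy, aes(x = Measurement, y = Stress, group = userID)) +
  geom_line()

# Write the stored plot to a temporary PDF (width and height are in inches by default)
out_file <- file.path(tempdir(), "stress_lines.pdf")
ggsave(out_file, plot = g_toy, width = 6, height = 4)
file.exists(out_file)
```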
- Many more graphical options can be added to ggplot calls:
    xlab        Label of the x-axis
    ylab        Label of the y-axis
    ggtitle     Title of the plot
    theme       Many, many graphical settings
    theme_bw()  A default black-and-white theme
- Use Google!
g + xlab("Time") + ylab("Amount of Stress") +
ggtitle("A very fancy plot") + theme_bw()

[Figure: "A very fancy plot": Amount of Stress against Time, coloured by Age, with separate panels for female and male, drawn in the black-and-white theme]
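`xlab()`, `ylab()` and `ggtitle()` are shortcuts; `labs()` sets the same labels in a single call. A sketch on a made-up plot object (the data are invented, the label texts are the ones used above):

```r
library(ggplot2)

# Made-up data standing in for the plot object g from the slides
toy <- data.frame(Measurement = 1:5, Stress = c(0.4, 0.9, 1.4, 1.0, 1.2))
g_toy <- ggplot(toy, aes(x = Measurement, y = Stress)) + geom_line()

# One labs() call replaces xlab() + ylab() + ggtitle()
g_labelled <- g_toy +
  labs(x = "Time", y = "Amount of Stress", title = "A very fancy plot") +
  theme_bw()

g_labelled$labels$title
```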
str(sumData)

## 'data.frame': 66095 obs. of 5 variables:
## $ user_id      : num 1456 1713 1837 1845 21167 ...
## $ rating       : num 0.482 -5.225 -6.639 -0.417 -3.008 ...
## $ date_of_birth: Date, format: "2001-06-03" ...
## $ grade        : num 8 8 8 8 7 8 7 8 8 8 ...
## $ gender       : chr "f" "m" "m" "f" ...
ggplot(sumData, aes(x = date_of_birth, y = rating)) +
geom_point()

[Figure: scatter plot of rating against date_of_birth]
ggplot(sumData, aes(x = date_of_birth, y = rating)) +
stat_binhex()

[Figure: hexagonal-bin plot of rating against date_of_birth, with fill colour showing count]
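With tens of thousands of points a plain scatter plot overplots heavily, which binning addresses by showing counts per cell. `stat_binhex()` needs the hexbin package to be installed. A related sketch on simulated data uses rectangular bins via `geom_bin2d()` so that no extra package is needed; the data and the bin count are made up for illustration:

```r
library(ggplot2)

# Simulated stand-in for a large data set
set.seed(42)
sim <- data.frame(
  x = rnorm(20000),
  y = rnorm(20000)
)

# Count points per rectangular cell and map the count to fill colour
p_binned <- ggplot(sim, aes(x = x, y = y)) +
  geom_bin2d(bins = 40)

# Each row of the layer data is one non-empty cell with its count
cells <- layer_data(p_binned)
head(cells$count)
```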
ggplot(sumData, aes(x = date_of_birth, y = rating,
colour = factor(grade))) + geom_point()

[Figure: scatter plot of rating against date_of_birth, coloured by factor(grade) (grades 1 to 8)]
ggplot(sumData, aes(x = date_of_birth, y = rating,
colour = factor(grade), fill = factor(grade))) +
geom_point() + geom_smooth(col = "black", method = "lm")

[Figure: scatter plot of rating against date_of_birth, coloured by factor(grade) (grades 1 to 8), with linear smooths added]
ggplot(sumData, aes(x = date_of_birth, y = rating,
colour = factor(grade), fill = factor(grade))) +
geom_point() + geom_smooth(col = "black", method = "lm",
formula = y ~ poly(x, 2))

[Figure: scatter plot of rating against date_of_birth, coloured by factor(grade) (grades 1 to 8), with quadratic smooths added]
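`geom_smooth(method = "lm", formula = y ~ poly(x, 2))` fits a quadratic regression with `lm()` and draws the fitted curve with a confidence band. A sketch on simulated data (the variables and coefficients are made up) that draws the smooth and, for comparison, fits the same kind of model directly:

```r
library(ggplot2)

# Simulated data with a known quadratic trend
set.seed(7)
sim <- data.frame(x = seq(-3, 3, length.out = 200))
sim$y <- 1 + 0.5 * sim$x - 0.8 * sim$x^2 + rnorm(200, sd = 0.5)

# Quadratic smooth: inside geom_smooth the formula refers to the x and y aesthetics
p_quad <- ggplot(sim, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(colour = "black", method = "lm", formula = y ~ poly(x, 2))

# The same curve fitted directly; raw = TRUE reports coefficients for x and x^2
fit <- lm(y ~ poly(x, 2, raw = TRUE), data = sim)
round(coef(fit), 2)
```

The estimated coefficients should land close to the simulated values of 1, 0.5 and -0.8, which is a quick check that the drawn curve represents the intended model.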
ggplot(sumData, aes(x = grade)) + geom_histogram()

[Figure: histogram of grade, with count on the y-axis]
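Called without arguments, `geom_histogram()` falls back to 30 bins and prints a message suggesting a better choice. For an integer-valued variable such as `grade`, a bin width of 1 gives one bar per value. A sketch on simulated grades (made-up data):

```r
library(ggplot2)

# Simulated integer grades from 1 to 8
set.seed(3)
sim <- data.frame(grade = sample(1:8, size = 1000, replace = TRUE))

# One bar per integer grade
p_hist <- ggplot(sim, aes(x = grade)) +
  geom_histogram(binwidth = 1)

# The bar heights add up to the number of observations
bars <- layer_data(p_hist)
sum(bars$count)
```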
ggplot(sumData,
aes(x = grade, y = rating, colour = factor(grade))
) + geom_violin()

[Figure: violin plots of rating by grade, coloured by factor(grade) (grades 1 to 8)]
[Figure: centrality plot with panels Betweenness, Closeness, Strength, Zhang and Onnela, one row per node (Afl through Xss)]
[Figure: "Spel: mollenspel" (Game: mollenspel): Rating against Date (Apr 01 to Jun 01), coloured by sequence length (1 to 9)]
[Figure: "Spel: mollenspel" (Game: mollenspel): Rating against Date (Nov 03 to Dec 15), coloured by sequence length (1 to 9)]
[Figure: woord.pdf: userRating against created (Nov 15 to Feb 01)]
[Figure: woord.pdf, binned by item: Rating over time (Nov 15 to Feb 01), with fill showing the number of items made]


[Figure: woord.pdf, binned by item: Rating over time (Nov 15 to Feb 01), with fill showing score − expected]


[Figure: woord.pdf, binned by item: Rating over time (Nov 15 to Feb 01), with fill showing response time]


More ggplot2

- likert package (Bryer & Speerschneider, 2013)
- sjPlot package (Lüdecke, 2014)
ggplot2 conclusion

- ggplot2 can create very complex visualizations with minimal code
- Automates convenient things such as:
    - Margins
    - Legend
- Documentation: http://docs.ggplot2.org/
Thank you for your attention!
Exercises are on http://sachaepskamp.com/files/ggplot2_exercises.html
References

Bryer, J., & Speerschneider, K. (2013). likert: Functions to analyze and visualize likert type items [Computer software manual]. Retrieved from http://CRAN.R-project.org/package=likert (R package version 1.1)
Lüdecke, D. (2014). sjPlot: sjPlot - data visualization for statistics in social science [Computer software manual]. Retrieved from http://CRAN.R-project.org/package=sjPlot (R package version 1.3)
Wickham, H. (2009). ggplot2: Elegant graphics for data analysis. Springer New York. Retrieved from http://had.co.nz/ggplot2/book
Wilkinson, L., Wills, D., Rope, D., Norton, A., & Dubbs, R. (2006). The grammar of graphics. Springer.
