Learn R Cheatsheet

Uploaded by

llanos.luis
Regression Modelling (Australian National University)

Downloaded by Luis Ignacio Llanos (llanos.luis@gmail.com)
RStudio IDE : : CHEAT SHEET


Documents and Apps
Open Shiny, R Markdown, knitr, Sweave, LaTeX, .Rd files and more in Source Pane.
• Toolbar: Check spelling, Render output, Choose output format, Choose output location, Insert code chunk
• Jump to previous chunk, Jump to next chunk, Run selected lines, Publish to server, Show file outline
• Access markdown guide at Help > Markdown Quick Reference
• Chunk controls: Jump to chunk, Set knitr chunk options, Run this and all previous code chunks, Run this code chunk
RStudio recognizes that files named app.R, server.R, ui.R, and global.R belong to a shiny app.
• Run app, Choose location to view app, Publish to shinyapps.io or server, Manage publish accounts

Write Code
• Navigate tabs, Open in new window, Save, Find and replace, Compile as notebook, Run selected code
• Cursors of shared users, Re-run previous code, Source with or without Echo, Show file outline
• Multiple cursors/column selection with Alt + mouse drag.
• Code diagnostics that appear in the margin. Hover over diagnostic symbols for details.
• Syntax highlighting based on your file's extension
• Tab completion to finish function names, file paths, arguments, and more.
• Multi-language code snippets to quickly use common blocks of code.
• Jump to function in file, Change file type
• Console: Working Directory; Maximize, minimize panes; Press ↑ to see command history; Drag pane boundaries

R Support
• Import data with wizard, History of past commands to run/copy, Display .RPres slideshows (File > New File > R Presentation)
• Load workspace, Save workspace, Delete all saved objects, Search inside environment
• Choose environment to display from list of parent environments; Display objects as list or grid
• Displays saved objects by type with short description; View in data viewer; View function source code
RStudio opens plots in a dedicated Plots pane.
• Navigate recent plots, Open in window, Export plot, Delete plot, Delete all plots
GUI Package manager lists every installed package.
• Install Packages, Update Packages, Create reproducible package library for your project
• Click to load package with library(). Unclick to detach package with detach().
• Package version installed; Delete from library
A File browser keyed to your working directory. Click on file or directory name to open.
• Create folder, Upload file, Delete file, Rename file, Change directory; Path to displayed directory
RStudio opens documentation in a dedicated Help pane.
• Home page of helpful links, Search within help file, Search for help file
Viewer Pane displays HTML content, such as Shiny apps, RMarkdown reports, and interactive visualizations.
• Stop Shiny app; Publish to shinyapps.io, rpubs, RSConnect, …; Refresh
View(<data>) opens spreadsheet like view of data set.
• Filter rows by value or value range, Sort by values, Search for value

Pro Features
• Share Project with Collaborators; Active shared collaborators
• Start new R Session in current project; Close R Session in project; Select R Version

PROJECT SYSTEM
File > New Project
RStudio saves the call history, workspace, and working directory associated with a project. It reloads each when you re-open a project.
• Name of current project

Debug Mode
Open with debug(), browser(), or a breakpoint. RStudio will open the debugger mode when it encounters a breakpoint while executing code.
• Click next to line number to add/remove a breakpoint.
• Highlighted line shows where execution has paused.
• Launch debugger mode from origin of error
• Open traceback to examine the functions that R called before the error occurred
• Run commands in environment where execution has paused
• Examine variables in executing environment
• Select function in traceback to debug
• Step through code one line at a time; Step into and out of functions to run; Resume execution; Quit debug mode

Version Control with Git or SVN
Turn on at Tools > Project Options > Git/SVN
• Stage files, Show file diff, Commit staged files, Push/Pull to remote, View History
• File status: A Added, D Deleted, M Modified, R Renamed, ? Untracked
• Open shell to type commands; current branch

Package Writing
File > New Project > New Directory > R Package
Turn project into package, Enable roxygen documentation with Tools > Project Options > Build Tools
Roxygen guide at Help > Roxygen Quick Reference
RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at www.rstudio.com • RStudio IDE 0.99.832 • Updated: 2016-01

1 LAYOUT | Windows/Linux | Mac
Move focus to Source Editor | Ctrl+1 | Ctrl+1
Move focus to Console | Ctrl+2 | Ctrl+2
Move focus to Help | Ctrl+3 | Ctrl+3
Show History | Ctrl+4 | Ctrl+4
Show Files | Ctrl+5 | Ctrl+5
Show Plots | Ctrl+6 | Ctrl+6
Show Packages | Ctrl+7 | Ctrl+7
Show Environment | Ctrl+8 | Ctrl+8
Show Git/SVN | Ctrl+9 | Ctrl+9
Show Build | Ctrl+0 | Ctrl+0

2 RUN CODE | Windows/Linux | Mac
Search command history | Ctrl+↑ | Cmd+↑
Navigate command history | ↑/↓ | ↑/↓
Move cursor to start of line | Home | Cmd+←
Move cursor to end of line | End | Cmd+→
Change working directory | Ctrl+Shift+H | Ctrl+Shift+H
Interrupt current command | Esc | Esc
Clear console | Ctrl+L | Ctrl+L
Quit Session (desktop only) | Ctrl+Q | Cmd+Q
Restart R Session | Ctrl+Shift+F10 | Cmd+Shift+F10
Run current line/selection | Ctrl+Enter | Cmd+Enter
Run current (retain cursor) | Alt+Enter | Option+Enter
Run from current to end | Ctrl+Alt+E | Cmd+Option+E
Run the current function | Ctrl+Alt+F | Cmd+Option+F
Source a file | Ctrl+Alt+G | Cmd+Option+G
Source the current file | Ctrl+Shift+S | Cmd+Shift+S
Source with echo | Ctrl+Shift+Enter | Cmd+Shift+Enter

3 NAVIGATE CODE | Windows/Linux | Mac
Goto File/Function | Ctrl+. | Ctrl+.
Fold Selected | Alt+L | Cmd+Option+L
Unfold Selected | Shift+Alt+L | Cmd+Shift+Option+L
Fold All | Alt+O | Cmd+Option+O
Unfold All | Shift+Alt+O | Cmd+Shift+Option+O
Go to line | Shift+Alt+G | Cmd+Shift+Option+G
Jump to | Shift+Alt+J | Cmd+Shift+Option+J
Switch to tab | Ctrl+Shift+. | Ctrl+Shift+.
Previous tab | Ctrl+F11 | Ctrl+F11
Next tab | Ctrl+F12 | Ctrl+F12
First tab | Ctrl+Shift+F11 | Ctrl+Shift+F11
Last tab | Ctrl+Shift+F12 | Ctrl+Shift+F12
Navigate back | Ctrl+F9 | Cmd+F9
Navigate forward | Ctrl+F10 | Cmd+F10
Jump to Brace | Ctrl+P | Ctrl+P
Select within Braces | Ctrl+Shift+Alt+E | Ctrl+Shift+Option+E
Use Selection for Find | Ctrl+F3 | Cmd+E
Find in Files | Ctrl+Shift+F | Cmd+Shift+F
Find Next | Win: F3, Linux: Ctrl+G | Cmd+G
Find Previous | Win: Shift+F3, Linux: … | Cmd+Shift+G
Jump to Word | Ctrl+←/→ | Option+←/→
Jump to Start/End | Ctrl+↑/↓ | Cmd+↑/↓
Toggle Outline | Ctrl+Shift+O | Cmd+Shift+O

4 WRITE CODE | Windows/Linux | Mac
Attempt completion | Tab or Ctrl+Space | Tab or Cmd+Space
Navigate candidates | ↑/↓ | ↑/↓
Accept candidate | Enter, Tab, or → | Enter, Tab, or →
Dismiss candidates | Esc | Esc
Undo | Ctrl+Z | Cmd+Z
Redo | Ctrl+Shift+Z | Cmd+Shift+Z
Cut | Ctrl+X | Cmd+X
Copy | Ctrl+C | Cmd+C
Paste | Ctrl+V | Cmd+V
Select All | Ctrl+A | Cmd+A
Delete Line | Ctrl+D | Cmd+D
Select | Shift+[Arrow] | Shift+[Arrow]
Select Word | Ctrl+Shift+←/→ | Option+Shift+←/→
Select to Line Start | Alt+Shift+← | Cmd+Shift+←
Select to Line End | Alt+Shift+→ | Cmd+Shift+→
Select Page Up/Down | Shift+PageUp/Down | Shift+PageUp/Down
Select to Start/End | Shift+Alt+↑/↓ | Cmd+Shift+↑/↓
Delete Word Left | Ctrl+Backspace | Ctrl+Opt+Backspace
Delete Word Right | - | Option+Delete
Delete to Line End | - | Ctrl+K
Delete to Line Start | - | Option+Backspace
Indent | Tab (at start of line) | Tab (at start of line)
Outdent | Shift+Tab | Shift+Tab
Yank line up to cursor | Ctrl+U | Ctrl+U
Yank line after cursor | Ctrl+K | Ctrl+K
Insert yanked text | Ctrl+Y | Ctrl+Y
Insert <- | Alt+- | Option+-
Insert %>% | Ctrl+Shift+M | Cmd+Shift+M
Show help for function | F1 | F1
Show source code | F2 | F2
New document | Ctrl+Shift+N | Cmd+Shift+N
New document (Chrome) | Ctrl+Alt+Shift+N | Cmd+Shift+Opt+N
Open document | Ctrl+O | Cmd+O
Save document | Ctrl+S | Cmd+S
Close document | Ctrl+W | Cmd+W
Close document (Chrome) | Ctrl+Alt+W | Cmd+Option+W
Close all documents | Ctrl+Shift+W | Cmd+Shift+W
Extract function | Ctrl+Alt+X | Cmd+Option+X
Extract variable | Ctrl+Alt+V | Cmd+Option+V
Reindent lines | Ctrl+I | Cmd+I
(Un)Comment lines | Ctrl+Shift+C | Cmd+Shift+C
Reflow Comment | Ctrl+Shift+/ | Cmd+Shift+/
Reformat Selection | Ctrl+Shift+A | Cmd+Shift+A
Select within braces | Ctrl+Shift+E | Ctrl+Shift+E
Show Diagnostics | Ctrl+Shift+Alt+P | Cmd+Shift+Opt+P
Transpose Letters | - | Ctrl+T
Move Lines Up/Down | Alt+↑/↓ | Option+↑/↓
Copy Lines Up/Down | Shift+Alt+↑/↓ | Cmd+Option+↑/↓
Add New Cursor Above | Ctrl+Alt+Up | Ctrl+Option+Up
Add New Cursor Below | Ctrl+Alt+Down | Ctrl+Option+Down
Move Active Cursor Up | Ctrl+Alt+Shift+Up | Ctrl+Option+Shift+Up
Move Active Cursor Down | Ctrl+Alt+Shift+Down | Ctrl+Opt+Shift+Down
Find and Replace | Ctrl+F | Cmd+F
Use Selection for Find | Ctrl+F3 | Cmd+E
Replace and Find | Ctrl+Shift+J | Cmd+Shift+J

5 DEBUG CODE | Windows/Linux | Mac
Toggle Breakpoint | Shift+F9 | Shift+F9
Execute Next Line | F10 | F10
Step Into Function | Shift+F4 | Shift+F4
Finish Function/Loop | Shift+F6 | Shift+F6
Continue | Shift+F5 | Shift+F5
Stop Debugging | Shift+F8 | Shift+F8

6 VERSION CONTROL | Windows/Linux | Mac
Show diff | Ctrl+Alt+D | Ctrl+Option+D
Commit changes | Ctrl+Alt+M | Ctrl+Option+M
Scroll diff view | Ctrl+↑/↓ | Ctrl+↑/↓
Stage/Unstage (Git) | Spacebar | Spacebar
Stage/Unstage and move to next | Enter | Enter

7 MAKE PACKAGES | Windows/Linux | Mac
Build and Reload | Ctrl+Shift+B | Cmd+Shift+B
Load All (devtools) | Ctrl+Shift+L | Cmd+Shift+L
Test Package (Desktop) | Ctrl+Shift+T | Cmd+Shift+T
Test Package (Web) | Ctrl+Alt+F7 | Cmd+Opt+F7
Check Package | Ctrl+Shift+E | Cmd+Shift+E
Document Package | Ctrl+Shift+D | Cmd+Shift+D

8 DOCUMENTS AND APPS | Windows/Linux | Mac
Preview HTML (Markdown, etc.) | Ctrl+Shift+K | Cmd+Shift+K
Knit Document (knitr) | Ctrl+Shift+K | Cmd+Shift+K
Compile Notebook | Ctrl+Shift+K | Cmd+Shift+K
Compile PDF (TeX and Sweave) | Ctrl+Shift+K | Cmd+Shift+K
Insert chunk (Sweave and Knitr) | Ctrl+Alt+I | Cmd+Option+I
Insert code section | Ctrl+Shift+R | Cmd+Shift+R
Re-run previous region | Ctrl+Shift+P | Cmd+Shift+P
Run current document | Ctrl+Alt+R | Cmd+Option+R
Run from start to current line | Ctrl+Alt+B | Cmd+Option+B
Run the current code section | Ctrl+Alt+T | Cmd+Option+T
Run previous Sweave/Rmd code | Ctrl+Alt+P | Cmd+Option+P
Run the current chunk | Ctrl+Alt+C | Cmd+Option+C
Run the next chunk | Ctrl+Alt+N | Cmd+Option+N
Sync Editor & PDF Preview | Ctrl+F8 | Cmd+F8
Previous plot | Ctrl+Alt+F11 | Cmd+Option+F11
Next plot | Ctrl+Alt+F12 | Cmd+Option+F12
Show Keyboard Shortcuts | Alt+Shift+K | Option+Shift+K

WHY RSTUDIO SERVER PRO?
RSP extends the open source server with a commercial license, support, and more:
• open and run multiple R sessions at once
• tune your resources to improve performance
• edit the same project at the same time as others
• see what you and others are doing on your server
• switch easily from one version of R to a different version
• integrate with your authentication, authorization, and audit practices
Download a free 45 day evaluation at www.rstudio.com/products/rstudio-server-pro/
RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at www.rstudio.com • RStudio IDE 0.1.0 • Updated: 2017-09

Base R Cheat Sheet

Getting Help
Accessing the help files
  ?mean                          Get help of a particular function.
  help.search('weighted mean')   Search the help files for a word or phrase.
  help(package = 'dplyr')        Find help for a package.
More about an object
  str(iris)                      Get a summary of an object's structure.
  class(iris)                    Find the class an object belongs to.

Using Packages
  install.packages('dplyr')      Download and install a package from CRAN.
  library(dplyr)                 Load the package into the session, making all its functions available to use.
  dplyr::select                  Use a particular function from a package.
  data(iris)                     Load a built-in dataset into the environment.

Working Directory
  getwd()                        Find the current working directory (where inputs are found and outputs are sent).
  setwd('C://file/path')         Change the current working directory.
Use projects in RStudio to set the working directory to the folder you are working in.

Vectors
Creating Vectors
  c(2, 4, 6)           | 2 4 6        | Join elements into a vector
  2:6                  | 2 3 4 5 6    | An integer sequence
  seq(2, 3, by=0.5)    | 2.0 2.5 3.0  | A complex sequence
  rep(1:2, times=3)    | 1 2 1 2 1 2  | Repeat a vector
  rep(1:2, each=3)     | 1 1 1 2 2 2  | Repeat elements of a vector
Vector Functions
  sort(x)     Return x sorted.
  rev(x)      Return x reversed.
  table(x)    See counts of values.
  unique(x)   See unique values.

Selecting Vector Elements
By Position
  x[4]         The fourth element.
  x[-4]        All but the fourth.
  x[2:4]       Elements two to four.
  x[-(2:4)]    All elements except two to four.
  x[c(1, 5)]   Elements one and five.
By Value
  x[x == 10]             Elements which are equal to 10.
  x[x < 0]               All elements less than zero.
  x[x %in% c(1, 2, 5)]   Elements in the set 1, 2, 5.
Named Vectors
  x['apple']             Element with name 'apple'.

Programming
For Loop
  for (variable in sequence){
    Do something
  }
Example
  for (i in 1:4){
    j <- i + 10
    print(j)
  }
While Loop
  while (condition){
    Do something
  }
Example
  while (i < 5){
    print(i)
    i <- i + 1
  }
If Statements
  if (condition){
    Do something
  } else {
    Do something different
  }
Example
  if (i > 3){
    print('Yes')
  } else {
    print('No')
  }
Functions
  function_name <- function(var){
    Do something
    return(new_variable)
  }
Example
  square <- function(x){
    squared <- x*x
    return(squared)
  }

Reading and Writing Data (Also see the readr package.)
  Input                         | Output                         | Description
  df <- read.table('file.txt')  | write.table(df, 'file.txt')    | Read and write a delimited text file.
  df <- read.csv('file.csv')    | write.csv(df, 'file.csv')      | Read and write a comma separated value file. This is a special case of read.table/write.table.
  load('file.RData')            | save(df, file = 'file.Rdata')  | Read and write an R data file, a file type special for R.

Conditions
  a == b   Are equal        a != b   Not equal
  a > b    Greater than     a < b    Less than
  a >= b   Greater than or equal to
  a <= b   Less than or equal to
  is.na(a)   Is missing     is.null(a)   Is null
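
The selection rules and control-flow templates above can be combined in a few lines. The sketch below is illustrative only: the vector values and the variable names (fourth, tens, big_squares, and so on) are made up for the example, while square() follows the Functions example from the sheet.

```r
# Build a vector and select elements by position, by value, and by name.
x <- c(10, -3, 5, 10, 2)
fourth    <- x[4]                    # the fourth element
all_but_4 <- x[-4]                   # everything except the fourth
tens      <- x[x == 10]              # elements equal to 10
in_set    <- x[x %in% c(1, 2, 5)]    # elements found in the set 1, 2, 5

fruit  <- c(apple = 3, pear = 7)
apples <- fruit['apple']             # element named 'apple'

# A function, a for loop, and an if statement working together.
square <- function(x) {
  squared <- x * x
  return(squared)
}

big_squares <- c()
for (i in 1:4) {
  s <- square(i)
  if (s > 3) {
    big_squares <- c(big_squares, s) # keep only squares greater than 3
  }
}
print(big_squares)
```

Growing a vector inside a loop is fine for a handful of elements; for larger inputs a vectorised form such as `s <- square(1:4); s[s > 3]` is usually preferable.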

RStudio® is a trademark of RStudio, Inc. • CC BY Mhairi McNeill • mhairihmcneill@gmail.com Learn more at web page or vignette • package version • Updated: 3/15

Types
Converting between common data types in R. Can always go from a higher value in the table to a lower value.
  as.logical    | TRUE, FALSE, TRUE                  | Boolean values (TRUE or FALSE).
  as.numeric    | 1, 0, 1                            | Integers or floating point numbers.
  as.character  | '1', '0', '1'                      | Character strings. Generally preferred to factors.
  as.factor     | '1', '0', '1', levels: '1', '0'    | Character strings with preset levels. Needed for some statistical models.

Maths Functions
  log(x)        Natural log.                        sum(x)       Sum.
  exp(x)        Exponential.                        mean(x)      Mean.
  max(x)        Largest element.                    median(x)    Median.
  min(x)        Smallest element.                   quantile(x)  Percentage quantiles.
  round(x, n)   Round to n decimal places.          rank(x)      Rank of elements.
  signif(x, n)  Round to n significant figures.     var(x)       The variance.
  cor(x, y)     Correlation.                        sd(x)        The standard deviation.

Variable Assignment
  > a <- 'apple'
  > a
  [1] 'apple'

The Environment
  ls()              List all variables in the environment.
  rm(x)             Remove x from the environment.
  rm(list = ls())   Remove all variables from the environment.
You can use the environment panel in RStudio to browse variables in your environment.

Matrices
  m <- matrix(x, nrow = 3, ncol = 3)   Create a matrix from x.
  m[2, ]        Select a row
  m[ , 1]       Select a column
  m[2, 3]       Select an element
  t(m)          Transpose
  m %*% n       Matrix Multiplication
  solve(m, n)   Find x in: m * x = n

Lists
  l <- list(x = 1:5, y = c('a', 'b'))
A list is a collection of elements which can be of different types.
  l[[2]]   Second element of l.
  l[1]     New list with only the first element.
  l$x      Element named x.
  l['y']   New list with only element named y.

Data Frames (Also see the dplyr package.)
  df <- data.frame(x = 1:3, y = c('a', 'b', 'c'))
A special case of a list where all elements are the same length.
    x y
    1 a
    2 b
    3 c
List subsetting
  df$x   df[[2]]
Matrix subsetting
  df[ , 2]   df[2, ]   df[2, 2]
Understanding a data frame
  View(df)   See the full data frame.
  head(df)   See the first 6 rows.
  nrow(df)   Number of rows.
  ncol(df)   Number of columns.
  dim(df)    Number of rows and columns.
  cbind - Bind columns.
  rbind - Bind rows.

Strings (Also see the stringr package.)
  paste(x, y, sep = ' ')       Join multiple vectors together.
  paste(x, collapse = ' ')     Join elements of a vector together.
  grep(pattern, x)             Find regular expression matches in x.
  gsub(pattern, replace, x)    Replace matches in x with a string.
  toupper(x)                   Convert to uppercase.
  tolower(x)                   Convert to lowercase.
  nchar(x)                     Number of characters in a string.

Factors
  factor(x)             Turn a vector into a factor. Can set the levels of the factor and the order.
  cut(x, breaks = 4)    Turn a numeric vector into a factor by 'cutting' into sections.

Statistics
  lm(y ~ x, data=df)    Linear model.
  glm(y ~ x, data=df)   Generalised linear model.
  summary               Get more detailed information out of a model.
  t.test(x, y)          Perform a t-test for difference between means.
  pairwise.t.test       Perform a t-test for paired data.
  prop.test             Test for a difference between proportions.
  aov                   Analysis of variance.

Distributions
             | Random Variates | Density Function | Cumulative Distribution | Quantile
  Normal     | rnorm           | dnorm            | pnorm                   | qnorm
  Poisson    | rpois           | dpois            | ppois                   | qpois
  Binomial   | rbinom          | dbinom           | pbinom                  | qbinom
  Uniform    | runif           | dunif            | punif                   | qunif

Plotting (Also see the ggplot2 package.)
  plot(x)      Values of x in order.
  plot(x, y)   Values of x against y.
  hist(x)      Histogram of x.

Dates
See the lubridate package.
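
To make the list, data frame, statistics, and distribution entries above concrete, here is a small sketch. The data are invented so that y is exactly 2 * x + 1, which means the fitted coefficients are known in advance; the variable names are assumptions for illustration only.

```r
# A small data frame; y is exactly 2 * x + 1, so the fit is known in advance.
df <- data.frame(x = 1:5, y = 2 * (1:5) + 1)

n_rows     <- nrow(df)     # number of rows
second_col <- df[[2]]      # list-style subsetting returns the column as a vector
cell       <- df[2, 2]     # matrix-style subsetting: row 2, column 2

# Lists can hold elements of different types.
l <- list(x = 1:5, y = c('a', 'b'))
second        <- l[[2]]    # the element itself
first_as_list <- l[1]      # a new list containing only the first element

# Fit a linear model and pull out the coefficients.
fit   <- lm(y ~ x, data = df)
coefs <- coef(fit)

# Distribution helpers: p* is the cumulative distribution, q* its inverse.
half <- pnorm(0)           # P(Z <= 0) for a standard normal
med  <- qnorm(0.5)         # the median of a standard normal
```

The difference between `l[[2]]` and `l[1]` is the one the sheet describes: double brackets extract the element, single brackets return a shorter list.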

RStudio® is a trademark of RStudio, Inc. • CC BY Mhairi McNeill • mhairihmcneill@gmail.com • 844-448-1212 • rstudio.com Learn more at web page or vignette • package version • Updated: 3/15

Data Import : : CHEAT SHEET


R's tidyverse is built around tidy data stored in tibbles, which are enhanced data frames.
The front side of this sheet shows how to read text files into R with readr.
The reverse side shows how to create tibbles with tibble and to layout tidy data with tidyr.

OTHER TYPES OF DATA
Try one of the following packages to import other types of files
• haven - SPSS, Stata, and SAS files
• readxl - excel files (.xls and .xlsx)
• DBI - databases
• jsonlite - json
• xml2 - XML
• httr - Web APIs
• rvest - HTML (Web Scraping)

Save Data
Save x, an R object, to path, a file path, as:
Comma delimited file
  write_csv(x, path, na = "NA", append = FALSE, col_names = !append)
File with arbitrary delimiter
  write_delim(x, path, delim = " ", na = "NA", append = FALSE, col_names = !append)
CSV for excel
  write_excel_csv(x, path, na = "NA", append = FALSE, col_names = !append)
String to file
  write_file(x, path, append = FALSE)
String vector to file, one element per line
  write_lines(x, path, na = "NA", append = FALSE)
Object to RDS file
  write_rds(x, path, compress = c("none", "gz", "bz2", "xz"), ...)
Tab delimited files
  write_tsv(x, path, na = "NA", append = FALSE, col_names = !append)

Read Tabular Data - These functions share the common arguments:
  read_*(file, col_names = TRUE, col_types = NULL, locale = default_locale(), na = c("", "NA"),
    quoted_na = TRUE, comment = "", trim_ws = TRUE, skip = 0, n_max = Inf,
    guess_max = min(1000, n_max), progress = interactive())
Comma Delimited Files
  read_csv("file.csv")
  To make file.csv run: write_file(x = "a,b,c\n1,2,3\n4,5,NA", path = "file.csv")
Semi-colon Delimited Files
  read_csv2("file2.csv")
  write_file(x = "a;b;c\n1;2;3\n4;5;NA", path = "file2.csv")
Files with Any Delimiter
  read_delim("file.txt", delim = "|")
  write_file(x = "a|b|c\n1|2|3\n4|5|NA", path = "file.txt")
Fixed Width Files
  read_fwf("file.fwf", col_positions = c(1, 3, 5))
  write_file(x = "a b c\n1 2 3\n4 5 NA", path = "file.fwf")
Tab Delimited Files
  read_tsv("file.tsv") Also read_table().
  write_file(x = "a\tb\tc\n1\t2\t3\n4\t5\tNA", path = "file.tsv")

USEFUL ARGUMENTS
Example file
  write_file("a,b,c\n1,2,3\n4,5,NA","file.csv")
  f <- "file.csv"
Skip lines
  read_csv(f, skip = 1)
No header
  read_csv(f, col_names = FALSE)
Read in a subset
  read_csv(f, n_max = 1)
Provide header
  read_csv(f, col_names = c("x", "y", "z"))
Missing Values
  read_csv(f, na = c("1", "."))

Read Non-Tabular Data
Read a file into a single string
  read_file(file, locale = default_locale())
Read a file into a raw vector
  read_file_raw(file)
Read each line into its own string
  read_lines(file, skip = 0, n_max = -1L, na = character(), locale = default_locale(), progress = interactive())
Read each line into a raw vector
  read_lines_raw(file, skip = 0, n_max = -1L, progress = interactive())
Read Apache style log files
  read_log(file, col_names = FALSE, col_types = NULL, skip = 0, n_max = -1, progress = interactive())

Data types
readr functions guess the types of each column and convert types when appropriate (but will NOT convert strings to factors automatically).
A message shows the type of each column in the result.
  ## Parsed with column specification:
  ## cols(
  ##   age = col_integer(),      age is an integer
  ##   sex = col_character(),    sex is a character
  ##   earn = col_double()       earn is a double (numeric)
  ## )
1. Use problems() to diagnose problems.
  x <- read_csv("file.csv"); problems(x)
2. Use a col_ function to guide parsing.
• col_guess() - the default
• col_character()
• col_double(), col_euro_double()
• col_datetime(format = "") Also col_date(format = ""), col_time(format = "")
• col_factor(levels, ordered = FALSE)
• col_integer()
• col_logical()
• col_number(), col_numeric()
• col_skip()
  x <- read_csv("file.csv", col_types = cols(
    A = col_double(),
    B = col_logical(),
    C = col_factor()))
3. Else, read in as character vectors then parse with a parse_ function.
• parse_guess()
• parse_character()
• parse_datetime() Also parse_date() and parse_time()
• parse_double()
• parse_factor()
• parse_integer()
• parse_logical()
• parse_number()
  x$A <- parse_number(x$A)
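
The three parsing strategies on this sheet (diagnose with problems(), guide with col_ functions, parse afterwards with parse_ functions) can be tried end to end on a throwaway file. This is a minimal sketch assuming the readr package is installed; the temporary file and its contents are made up for the example, and arguments are passed by position because the name of the path argument of the write functions has changed between readr releases.

```r
library(readr)

# Write a small CSV to a temporary file (hypothetical data).
f <- tempfile(fileext = ".csv")
write_lines(c("a,b,c", "1,2,3", "4,5,NA"), f)

# Declare the column types up front instead of relying on guessing.
x <- read_csv(f, col_types = cols(
  a = col_double(),
  b = col_integer(),
  c = col_character()
))

# Check for parsing problems, then parse a character column after the fact.
probs <- problems(x)
x$c <- parse_number(x$c)
```

Reading column c as character first and converting it later mirrors step 3 of the sheet; supplying col_types also suppresses the column specification message.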
RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at tidyverse.org • readr 1.1.0 • tibble 1.2.12 • tidyr 0.6.0 • Updated: 2017-01

Tibbles - an enhanced data frame
The tibble package provides a new S3 class for storing tabular data, the tibble. Tibbles inherit the data frame class, but improve three behaviors:
• Subsetting - [ always returns a new tibble, [[ and $ always return a vector.
• No partial matching - You must use full column names when subsetting
• Display - When you print a tibble, R provides a concise view of the data that fits on one screen
[Figure: a large table to display, shown as a tibble display (first 10 rows, "# ... with 224 more rows, and 3 more variables") and as a data frame display (stops at getOption("max.print"), "omitted 68 rows")]
• Control the default appearance with options:
  options(tibble.print_max = n, tibble.print_min = m, tibble.width = Inf)
• View full data set with View() or glimpse()
• Revert to data frame with as.data.frame()

CONSTRUCT A TIBBLE IN TWO WAYS
tibble(…) Construct by columns.
  tibble(x = 1:3, y = c("a", "b", "c"))
tribble(…) Construct by rows.
  tribble( ~x, ~y,
            1, "a",
            2, "b",
            3, "c")
Both make this tibble:
  A tibble: 3 × 2
        x     y
    <int> <chr>
  1     1     a
  2     2     b
  3     3     c
as_tibble(x, …) Convert data frame to tibble.
enframe(x, name = "name", value = "value") Convert named vector to a tibble
is_tibble(x) Test whether x is a tibble.

Tidy Data with tidyr
Tidy data is a way to organize tabular data. It provides a consistent data structure across packages.
A table is tidy if:
• Each variable is in its own column
• Each observation, or case, is in its own row
Tidy data:
• Makes variables easy to access as vectors
• Preserves cases during vectorized operations

Reshape Data - change the layout of values in a table
Use gather() and spread() to reorganize the values of a table into a new layout.

gather(data, key, value, ..., na.rm = FALSE, convert = FALSE, factor_key = FALSE)
gather() moves column names into a key column, gathering the column values into a single value column.
  gather(table4a, `1999`, `2000`, key = "year", value = "cases")
  table4a                     result
  country  1999  2000         country  year  cases
  A        0.7K  2K           A        1999  0.7K
  B        37K   80K          B        1999  37K
  C        212K  213K         C        1999  212K
                              A        2000  2K
                              B        2000  80K
                              C        2000  213K

spread(data, key, value, fill = NA, convert = FALSE, drop = TRUE, sep = NULL)
spread() moves the unique values of a key column into the column names, spreading the values of a value column across the new columns.
  spread(table2, type, count)
  table2                            result
  country  year  type   count       country  year  cases  pop
  A        1999  cases  0.7K        A        1999  0.7K   19M
  A        1999  pop    19M         A        2000  2K     20M
  A        2000  cases  2K          B        1999  37K    172M
  A        2000  pop    20M         B        2000  80K    174M
  B        1999  cases  37K         C        1999  212K   1T
  B        1999  pop    172M        C        2000  213K   1T
  B        2000  cases  80K
  B        2000  pop    174M
  C        1999  cases  212K
  C        1999  pop    1T
  C        2000  cases  213K
  C        2000  pop    1T

Handle Missing Values
drop_na(data, ...) Drop rows containing NA's in … columns.
fill(data, ..., .direction = c("down", "up")) Fill in NA's in … columns with most recent non-NA values.
replace_na(data, replace = list(), ...) Replace NA's by column.
  x          drop_na(x, x2)   fill(x, x2)   replace_na(x, list(x2 = 2))
  x1  x2     x1  x2           x1  x2        x1  x2
  A   1      A   1            A   1         A   1
  B   NA     D   3            B   1         B   2
  C   NA                      C   1         C   2
  D   3                       D   3         D   3
  E   NA                      E   3         E   2

Expand Tables - quickly create tables with combinations of values
complete(data, ..., fill = list()) Adds to the data missing combinations of the values of the variables listed in …
  complete(mtcars, cyl, gear, carb)
expand(data, ...) Create new tibble with all possible combinations of the values of the variables listed in …
  expand(mtcars, cyl, gear, carb)

Split Cells
Use these functions to split or combine cells into individual, isolated values.

separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE, extra = "warn", fill = "warn", ...)
Separate each cell in a column to make several columns.
  separate(table3, rate, into = c("cases", "pop"))
  table3                         result
  country  year  rate            country  year  cases  pop
  A        1999  0.7K/19M        A        1999  0.7K   19M
  A        2000  2K/20M          A        2000  2K     20M
  B        1999  37K/172M        B        1999  37K    172M
  B        2000  80K/174M        B        2000  80K    174M
  C        1999  212K/1T         C        1999  212K   1T
  C        2000  213K/1T         C        2000  213K   1T

separate_rows(data, ..., sep = "[^[:alnum:].]+", convert = FALSE)
Separate each cell in a column to make several rows. Also separate_rows_().
  separate_rows(table3, rate)
  result
  country  year  rate
  A        1999  0.7K
  A        1999  19M
  A        2000  2K
  A        2000  20M
  B        1999  37K
  B        1999  172M
  B        2000  80K
  B        2000  174M
  C        1999  212K
  C        1999  1T
  C        2000  213K
  C        2000  1T

unite(data, col, ..., sep = "_", remove = TRUE)
Collapse cells across several columns to make a single column.
  unite(table5, century, year, col = "year", sep = "")
  table5                       result
  country  century  year      country  year
  Afghan   19       99        Afghan   1999
  Afghan   20       0         Afghan   2000
  Brazil   19       99        Brazil   1999
  Brazil   20       0         Brazil   2000
  China    19       99        China    1999
  China    20       0         China    2000
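
The reshaping and missing-value functions above can be exercised on a small hand-built table. This is a sketch under stated assumptions: it needs the tibble and tidyr packages, the table t3 is a made-up stand-in shaped like table3, and sep = "/" is passed explicitly because the default separator pattern also splits on the "." inside values such as 0.7K.

```r
library(tibble)
library(tidyr)

# A hypothetical table shaped like table3: rate packs two values into one cell.
t3 <- tribble(
  ~country, ~year, ~rate,
  "A",      1999,  "0.7K/19M",
  "A",      2000,  "2K/20M",
  "B",      1999,  "37K/172M"
)

# Split rate into two columns at the "/".
wide <- separate(t3, rate, into = c("cases", "pop"), sep = "/")

# Gather the two new columns into key/value pairs, then spread them back.
long <- gather(wide, cases, pop, key = "type", value = "count")
back <- spread(long, type, count)

# Fill and replace missing values.
x <- tibble(x1 = c("A", "B", "C"), x2 = c(1, NA, 3))
filled   <- fill(x, x2)                  # carry the last non-NA value down
replaced <- replace_na(x, list(x2 = 2))  # substitute a fixed value
```

gather() followed by spread() on the same key and value columns returns the table to its wide layout, which is a quick way to check that a reshape did what was intended.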

Factors with forcats : : CHEAT SHEET


The forcats package provides tools for working with factors, which are R's data structure for categorical data.

Factors stored displayed Change the order of levels Change the value of levels
R represents categorical integer 1 1= a a 1= a
data with factors. A factor vector 3 23 == bc c 23 == bc a 1= a a 1= b fct_relevel(.f, ..., after = 0L) a 1= a v 1= v fct_recode(.f, ...) Manually change
is an integer vector with a c 2= b c 2= c Manually reorder factor levels. c 2= b z 2= x levels. Also fct_relabel which obeys
2 b 3= c 3= a fct_relevel(f, c("b", "c", "a")) 3= c 3= z purrr::map syntax to apply a function
levels attribute that stores levels 1 a b b b x
a set of mappings between or expression to each level.
integers and categorical values. When you view a factor, R
a a a v fct_recode(f, v = "a", x = "b", z = "c")
displays not the integers, but the values associated with them. fct_infreq(f, ordered = NA) fct_relabel(f, ~ paste0("x", .x))
c 1= a c 1= c Reorder levels by the frequency
Create a factor with factor() 2= c 2= a in which they appear in the
a a 1= a c c 1=2
2= b factor(x = character(), levels, data (highest frequency first). a 1= a 2
c c labels = levels, exclude = NA, ordered a a f3 <- factor(c("c", "c", "a")) 2= b 2=1 fct_anon(f, prefix = ""))
b b
3= c c 3= c 1 3=3 Anonymize levels with random
= is.ordered(x), nmax = NA) Convert fct_infreq(f3)
a a a vector to a factor. Also as_factor. b 3 integers. fct_anon(f)
f <- factor(c("a", "c", "b", "a"), a 2
levels = c("a", "b", "c")) 1= a 1= b fct_inorder(f, ordered = NA)
b b
a 2= b a 2= a Reorder levels by order in
which they appear in the data.
    fct_inorder(f2)

Return its levels with levels()
levels(x) Return/set the levels of a factor.
    levels(f); levels(f) <- c("x","y","z")
Use unclass() to see its structure

fct_rev(f) Reverse level order.
    f4 <- factor(c("a","b","c"))
    fct_rev(f4)
fct_shift(f) Shift levels to left or right, wrapping around end.
    fct_shift(f4)
fct_shuffle(f, n = 1L) Randomly permute order of factor levels.
    fct_shuffle(f4)
fct_reorder(.f, .x, .fun=median, ..., .desc = FALSE) Reorder levels by their relationship with another variable.
    boxplot(data = iris, Sepal.Width ~ fct_reorder(Species, Sepal.Width))
fct_reorder2(.f, .x, .y, .fun = last2, ..., .desc = TRUE) Reorder levels by their final values when plotted with two other variables.
    ggplot(data = iris, aes(Sepal.Width, Sepal.Length,
      color = fct_reorder2(Species, Sepal.Width, Sepal.Length))) +
      geom_smooth()

fct_collapse(.f, ...) Collapse levels into manually defined groups.
    fct_collapse(f, x = c("a", "b"))
fct_lump(f, n, prop, w = NULL, other_level = "Other", ties.method = c("min", "average", "first", "last", "random", "max")) Lump together least/most common levels into a single level. Also fct_lump_min.
    fct_lump(f, n = 1)
fct_other(f, keep, drop, other_level = "Other") Replace levels with "other."
    fct_other(f, keep = c("a", "b"))

Inspect Factors
fct_count(f, sort = FALSE) Count the number of values with each level.
    fct_count(f)
fct_unique(f) Return the unique values, removing duplicates.
    fct_unique(f)

Combine Factors
fct_c(…) Combine factors with different levels.
    f1 <- factor(c("a", "c"))
    f2 <- factor(c("b", "a"))
    fct_c(f1, f2)
fct_unify(fs, levels = lvls_union(fs)) Standardize levels across a list of factors.
    fct_unify(list(f2, f1))

Add or drop levels
fct_drop(f, only) Drop unused levels.
    f5 <- factor(c("a","b"),c("a","b","x"))
    f6 <- fct_drop(f5)
fct_expand(f, …) Add levels to a factor.
    fct_expand(f6, "x")
fct_explicit_na(f, na_level="(Missing)") Assigns a level to NAs to ensure they appear in plots, etc.
    fct_explicit_na(factor(c("a", "b", NA)))
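The level-manipulation functions above are easiest to follow by checking levels() before and after each call. The following is a minimal sketch, assuming the forcats package is installed; the factor f is a hypothetical example and is not defined in this part of the cheat sheet.

```r
library(forcats)

# Hypothetical example factor; levels default to alphabetical order (a, b, c)
f <- factor(c("a", "c", "b", "a"))

levels(fct_rev(f))               # level order reversed
levels(fct_inorder(f))           # levels in order of first appearance
levels(fct_other(f, keep = "a")) # every level except "a" becomes "Other"
fct_count(f)                     # one row per level with its count

# Drop an unused level, then add it back
f5 <- factor(c("a", "b"), levels = c("a", "b", "x"))
f6 <- fct_drop(f5)
levels(f6)
levels(fct_expand(f6, "x"))
```

None of these calls modifies f in place; each returns a new factor, so the result has to be assigned if it is needed later.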
RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at forcats.tidyverse.org • Diagrams inspired by @LVaudor ! • forcats 0.3.0 • Updated: 2019-02

Data Transformation with dplyr : : CHEAT SHEET

dplyr functions work with pipes and expect tidy data. In tidy data:
  Each variable is in its own column
  & Each observation, or case, is in its own row
pipes: x %>% f(y) becomes f(x, y)

Summarise Cases
These apply summary functions to columns to create a new table of summary statistics. Summary functions take vectors as input and return one value (see back).

summarise(.data, …) Compute table of summaries.
    summarise(mtcars, avg = mean(mpg))
count(x, ..., wt = NULL, sort = FALSE) Count number of rows in each group defined by the variables in … Also tally().
    count(iris, Species)

VARIATIONS
summarise_all() - Apply funs to every column.
summarise_at() - Apply funs to specific columns.
summarise_if() - Apply funs to all cols of one type.

Group Cases
Use group_by() to create a "grouped" copy of a table. dplyr functions will manipulate each "group" separately and then combine the results.
    mtcars %>%
      group_by(cyl) %>%
      summarise(avg = mean(mpg))

group_by(.data, ..., add = FALSE) Returns copy of table grouped by …
    g_iris <- group_by(iris, Species)
ungroup(x, …) Returns ungrouped copy of table.
    ungroup(g_iris)

Manipulate Cases

EXTRACT CASES
Row functions return a subset of rows as a new table.

filter(.data, …) Extract rows that meet logical criteria.
    filter(iris, Sepal.Length > 7)
distinct(.data, ..., .keep_all = FALSE) Remove rows with duplicate values.
    distinct(iris, Species)
sample_frac(tbl, size = 1, replace = FALSE, weight = NULL, .env = parent.frame()) Randomly select fraction of rows.
    sample_frac(iris, 0.5, replace = TRUE)
sample_n(tbl, size, replace = FALSE, weight = NULL, .env = parent.frame()) Randomly select size rows.
    sample_n(iris, 10, replace = TRUE)
slice(.data, …) Select rows by position.
    slice(iris, 10:15)
top_n(x, n, wt) Select and order top n entries (by group if grouped data).
    top_n(iris, 5, Sepal.Width)

Logical and boolean operators to use with filter()
    <    <=    is.na()     %in%    |    xor()
    >    >=    !is.na()    !       &
See ?base::Logic and ?Comparison for help.

ARRANGE CASES
arrange(.data, …) Order rows by values of a column or columns (low to high), use with desc() to order from high to low.
    arrange(mtcars, mpg)
    arrange(mtcars, desc(mpg))

ADD CASES
add_row(.data, ..., .before = NULL, .after = NULL) Add one or more rows to a table.
    add_row(faithful, eruptions = 1, waiting = 1)

Manipulate Variables

EXTRACT VARIABLES
Column functions return a set of columns as a new vector or table.

pull(.data, var = -1) Extract column values as a vector. Choose by name or index.
    pull(iris, Sepal.Length)
select(.data, …) Extract columns as a table. Also select_if().
    select(iris, Sepal.Length, Species)

Use these helpers with select(), e.g. select(iris, starts_with("Sepal"))
    contains(match)      num_range(prefix, range)    :, e.g. mpg:cyl
    ends_with(match)     one_of(…)                   -, e.g. -Species
    matches(match)       starts_with(match)

MAKE NEW VARIABLES
These apply vectorized functions to columns. Vectorized funs take vectors as input and return vectors of the same length as output (see back).

mutate(.data, …) Compute new column(s).
    mutate(mtcars, gpm = 1/mpg)
transmute(.data, …) Compute new column(s), drop others.
    transmute(mtcars, gpm = 1/mpg)
mutate_all(.tbl, .funs, …) Apply funs to every column. Use with funs(). Also mutate_if().
    mutate_all(faithful, funs(log(.), log2(.)))
    mutate_if(iris, is.numeric, funs(log(.)))
mutate_at(.tbl, .cols, .funs, …) Apply funs to specific columns. Use with funs(), vars() and the helper functions for select().
    mutate_at(iris, vars( -Species), funs(log(.)))
add_column(.data, ..., .before = NULL, .after = NULL) Add new column(s). Also add_count(), add_tally().
    add_column(mtcars, new = 1:32)
rename(.data, …) Rename columns.
    rename(iris, Length = Sepal.Length)
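The case and variable functions above are usually chained with the pipe rather than called one at a time. The following is a minimal sketch, assuming dplyr 1.0.0 or later is installed; the object names result and logged are illustrative. The last statement uses across(), which current dplyr releases offer in place of the funs()-based mutate_if() shown on this sheet (funs() has been deprecated since dplyr 0.8.0).

```r
library(dplyr)

# Extract cases, make a new variable, then summarise by group
result <- mtcars %>%
  filter(mpg > 20) %>%                        # keep rows meeting a logical criterion
  mutate(gpm = 1 / mpg) %>%                   # add a gallons-per-mile column
  group_by(cyl) %>%                           # one group per cylinder count
  summarise(n = n(), avg_gpm = mean(gpm)) %>% # one summary row per group
  arrange(desc(n))                            # largest group first

result

# Assumed modern equivalent of mutate_if(iris, is.numeric, funs(log(.)))
logged <- mutate(iris, across(where(is.numeric), log))
head(logged)
```

Because each step returns a new table, the pipeline leaves mtcars and iris unchanged.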

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more with browseVignettes(package = c("dplyr", "tibble")) • dplyr 0.7.0 • tibble 1.2.0 • Updated: 2017-03

Vector Functions

TO USE WITH MUTATE ()
mutate() and transmute() apply vectorized functions to columns to create new columns. Vectorized functions take vectors as input and return vectors of the same length as output.

OFFSETS
dplyr::lag() - Offset elements by 1
dplyr::lead() - Offset elements by -1

CUMULATIVE AGGREGATES
dplyr::cumall() - Cumulative all()
dplyr::cumany() - Cumulative any()
cummax() - Cumulative max()
dplyr::cummean() - Cumulative mean()
cummin() - Cumulative min()
cumprod() - Cumulative prod()
cumsum() - Cumulative sum()

RANKINGS
dplyr::cume_dist() - Proportion of all values <=
dplyr::dense_rank() - rank with ties = min, no gaps
dplyr::min_rank() - rank with ties = min
dplyr::ntile() - bins into n bins
dplyr::percent_rank() - min_rank scaled to [0,1]
dplyr::row_number() - rank with ties = "first"

MATH
+, - , *, /, ^, %/%, %% - arithmetic ops
log(), log2(), log10() - logs
<, <=, >, >=, !=, == - logical comparisons
dplyr::between() - x >= left & x <= right
dplyr::near() - safe == for floating point numbers

MISC
dplyr::case_when() - multi-case if_else()
dplyr::coalesce() - first non-NA values by element across a set of vectors
dplyr::if_else() - element-wise if() + else()
dplyr::na_if() - replace specific values with NA
pmax() - element-wise max()
pmin() - element-wise min()
dplyr::recode() - Vectorized switch()
dplyr::recode_factor() - Vectorized switch() for factors

Summary Functions

TO USE WITH SUMMARISE ()
summarise() applies summary functions to columns to create a new table. Summary functions take vectors as input and return single values as output.

COUNTS
dplyr::n() - number of values/rows
dplyr::n_distinct() - # of uniques
sum(!is.na()) - # of non-NA’s

LOCATION
mean() - mean, also mean(!is.na())
median() - median

LOGICALS
mean() - Proportion of TRUE’s
sum() - # of TRUE’s

POSITION/ORDER
dplyr::first() - first value
dplyr::last() - last value
dplyr::nth() - value in nth location of vector

RANK
quantile() - nth quantile
min() - minimum value
max() - maximum value

SPREAD
IQR() - Inter-Quartile Range
mad() - median absolute deviation
sd() - standard deviation
var() - variance

Row Names
Tidy data does not use rownames, which store a variable outside of the columns. To work with the rownames, first move them into a column.

rownames_to_column() Move row names into col.
    a <- rownames_to_column(iris, var = "C")
column_to_rownames() Move col in row names.
    column_to_rownames(a, var = "C")
Also has_rownames(), remove_rownames()

Combine Tables

Example tables in the diagrams:
    x          y
    A B C      A B D
    a t 1      a t 3
    b u 2      b u 2
    c v 3      d w 1
The COMBINE CASES diagrams pair x with a second table that has columns A, B, C and rows (c, v, 3), (d, w, 4).

COMBINE VARIABLES
Use bind_cols() to paste tables beside each other as they are.
bind_cols(…) Returns tables placed side by side as a single table. BE SURE THAT ROWS ALIGN.

Use a "Mutating Join" to join one table to columns from another, matching values with the rows that they correspond to. Each join retains a different combination of values from the tables.

left_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), …) Join matching values from y to x.
right_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), …) Join matching values from x to y.
inner_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), …) Join data. Retain only rows with matches.
full_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), …) Join data. Retain all values, all rows.

Use by = c("col1", "col2", …) to specify one or more common columns to match on.
    left_join(x, y, by = "A")
Use a named vector, by = c("col1" = "col2"), to match on columns that have different names in each table.
    left_join(x, y, by = c("C" = "D"))
Use suffix to specify the suffix to give to unmatched columns that have the same name in both tables.
    left_join(x, y, by = c("C" = "D"), suffix = c("1", "2"))

COMBINE CASES
Use bind_rows() to paste tables below each other as they are.
bind_rows(…, .id = NULL) Returns tables one on top of the other as a single table. Set .id to a column name to add a column of the original table names.

intersect(x, y, …) Rows that appear in both x and y.
setdiff(x, y, …) Rows that appear in x but not y.
union(x, y, …) Rows that appear in x or y. (Duplicates removed). union_all() retains duplicates.
Use setequal() to test whether two data sets contain the exact same rows (in any order).

EXTRACT ROWS
Use a "Filtering Join" to filter one table against the rows of another.

semi_join(x, y, by = NULL, …) Return rows of x that have a match in y. USEFUL TO SEE WHAT WILL BE JOINED.
anti_join(x, y, by = NULL, …) Return rows of x that do not have a match in y. USEFUL TO SEE WHAT WILL NOT BE JOINED.
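The difference between the mutating and filtering joins shows up most clearly on two small tables. The following is a minimal sketch, assuming dplyr and tibble are installed; x and y are rebuilt here from the values shown in the cheat sheet diagrams, and the choice of by columns is illustrative.

```r
library(dplyr)
library(tibble)

# Example tables with the values from the cheat sheet diagrams
x <- tibble(A = c("a", "b", "c"), B = c("t", "u", "v"), C = 1:3)
y <- tibble(A = c("a", "b", "d"), B = c("t", "u", "w"), D = c(3, 2, 1))

# Mutating joins add columns from y to x
left_join(x, y, by = c("A", "B"))   # all rows of x; D is NA where y has no match
inner_join(x, y, by = c("A", "B"))  # only rows present in both tables
full_join(x, y, by = c("A", "B"))   # all rows from both tables

# Filtering joins keep or drop rows of x and add no columns
semi_join(x, y, by = c("A", "B"))   # rows of x with a match in y
anti_join(x, y, by = c("A", "B"))   # rows of x without a match in y

# Matching on A alone leaves B in both tables, so suffixes are applied
left_join(x, y, by = "A")
```

Running anti_join() before a mutating join is a cheap way to see which rows will end up with NA values.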


R Syntax Comparison : : CHEAT SHEET

The variety of R syntaxes gives you many ways to “say” the same thing. Read across the cheatsheet to see how different syntaxes approach the same problem.

Dollar sign syntax:  goal(data$x, data$y)
Formula syntax:      goal(y~x|z, data=data, group=w)    (~ is the tilde)
Tidyverse syntax:    data %>% goal(x)                   (%>% is the pipe)

SUMMARY STATISTICS:

one continuous variable:
  Dollar sign:  mean(mtcars$mpg)
  Formula:      mosaic::mean(~mpg, data=mtcars)
  Tidyverse:    mtcars %>% dplyr::summarize(mean(mpg))

one categorical variable:
  Dollar sign:  table(mtcars$cyl)
  Formula:      mosaic::tally(~cyl, data=mtcars)
  Tidyverse:    mtcars %>% dplyr::group_by(cyl) %>%
                  dplyr::summarize(n())

two categorical variables:
  Dollar sign:  table(mtcars$cyl, mtcars$am)
  Formula:      mosaic::tally(cyl~am, data=mtcars)
  Tidyverse:    mtcars %>% dplyr::group_by(cyl, am) %>%
                  dplyr::summarize(n())

one continuous, one categorical:
  Dollar sign:  mean(mtcars$mpg[mtcars$cyl==4])
                mean(mtcars$mpg[mtcars$cyl==6])
                mean(mtcars$mpg[mtcars$cyl==8])
  Formula:      mosaic::mean(mpg~cyl, data=mtcars)
  Tidyverse:    mtcars %>% dplyr::group_by(cyl) %>%
                  dplyr::summarize(mean(mpg))

PLOTTING:

one continuous variable:
  Dollar sign:  hist(mtcars$disp)
                boxplot(mtcars$disp)
  Formula:      lattice::histogram(~disp, data=mtcars)
                lattice::bwplot(~disp, data=mtcars)
  Tidyverse:    ggplot2::qplot(x=disp, data=mtcars, geom = "histogram")
                ggplot2::qplot(y=disp, x=1, data=mtcars, geom="boxplot")

one categorical variable:
  Dollar sign:  barplot(table(mtcars$cyl))
  Formula:      mosaic::bargraph(~cyl, data=mtcars)
  Tidyverse:    ggplot2::qplot(x=cyl, data=mtcars, geom="bar")

two continuous variables:
  Dollar sign:  plot(mtcars$disp, mtcars$mpg)
  Formula:      lattice::xyplot(mpg~disp, data=mtcars)
  Tidyverse:    ggplot2::qplot(x=disp, y=mpg, data=mtcars, geom="point")

two categorical variables:
  Dollar sign:  mosaicplot(table(mtcars$am, mtcars$cyl))
  Formula:      mosaic::bargraph(~am, data=mtcars, group=cyl)
  Tidyverse:    ggplot2::qplot(x=factor(cyl), data=mtcars, geom="bar") +
                  facet_grid(.~am)

one continuous, one categorical:
  Dollar sign:  histogram(mtcars$disp[mtcars$cyl==4])
                histogram(mtcars$disp[mtcars$cyl==6])
                histogram(mtcars$disp[mtcars$cyl==8])
                boxplot(mtcars$disp[mtcars$cyl==4])
                boxplot(mtcars$disp[mtcars$cyl==6])
                boxplot(mtcars$disp[mtcars$cyl==8])
  Formula:      lattice::histogram(~disp|cyl, data=mtcars)
                lattice::bwplot(cyl~disp, data=mtcars)
  Tidyverse:    ggplot2::qplot(x=disp, data=mtcars, geom = "histogram") +
                  facet_grid(.~cyl)
                ggplot2::qplot(y=disp, x=factor(cyl), data=mtcars,
                  geom="boxplot")

WRANGLING:

subsetting:
  Dollar sign:  mtcars[mtcars$mpg>30, ]
  Tidyverse:    mtcars %>% dplyr::filter(mpg>30)

making a new variable:
  Dollar sign:  mtcars$efficient[mtcars$mpg>30] <- TRUE
                mtcars$efficient[mtcars$mpg<30] <- FALSE
  Tidyverse:    mtcars <- mtcars %>%
                  dplyr::mutate(efficient = if_else(mpg>30, TRUE, FALSE))
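One way to confirm that the three syntaxes really say the same thing is to compute the same group means in each and compare the numbers. The following is a minimal sketch, assuming dplyr is installed. Base R aggregate() stands in for mosaic::mean() in the formula row so that the example does not depend on the mosaic package; the object names are illustrative.

```r
library(dplyr)

# Dollar sign syntax: index the column, subset by group with square brackets
by_dollar <- c(
  mean(mtcars$mpg[mtcars$cyl == 4]),
  mean(mtcars$mpg[mtcars$cyl == 6]),
  mean(mtcars$mpg[mtcars$cyl == 8])
)

# Formula syntax: response ~ grouping variable (aggregate() is a stand-in
# for mosaic::mean(), not the call used on the cheat sheet)
by_formula <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Tidyverse syntax: data first, steps chained with the pipe
by_tidy <- mtcars %>%
  group_by(cyl) %>%
  summarize(avg = mean(mpg))

# Both comparisons should report TRUE
all.equal(by_dollar, by_formula$mpg)
all.equal(by_formula$mpg, by_tidy$avg)
```

The dollar sign version needs one line per group, which is the main practical reason the formula and tidyverse forms are preferred for grouped summaries.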
RStudio® is a trademark of RStudio, Inc. • CC BY Amelia McNamara • amcnamara@smith.edu • @AmeliaMN • science.smith.edu/~amcnamara/ • Updated: 2018-01

Syntax is the set of rules that governs what code works and doesn’t work in a programming language. Most programming languages offer one standardized syntax, but R allows package developers to specify their own syntax. As a result, there is a large variety of (equally valid) R syntaxes.

The three most prevalent R syntaxes are:
1. The dollar sign syntax, sometimes called base R syntax, expected by most base R functions. It is characterized by the use of dataset$variablename, and is also associated with square bracket subsetting, as in dataset[1,2]. Almost all R functions will accept things passed to them in dollar sign syntax.
2. The formula syntax, used by modeling functions like lm(), lattice graphics, and mosaic summary statistics. It uses the tilde (~) to connect a response variable and one (or many) predictors. Many base R functions will accept formula syntax.
3. The tidyverse syntax used by dplyr, tidyr, and more. These functions expect data to be the first argument, which allows them to work with the “pipe” (%>%) from the magrittr package. Typically, ggplot2 is thought of as part of the tidyverse, although it has its own flavor of the syntax using plus signs (+) to string pieces together. ggplot2 author Hadley Wickham has said the package would have had different syntax if he had written it after learning about the pipe.

Educators often try to teach within one unified syntax, but most R programmers use some combination of all the syntaxes.

Internet research tip:
If you are searching on Google, StackOverflow, or another favorite online source and see code in a syntax you don’t recognize:
• Check to see if the code is using one of the three common syntaxes listed on this cheatsheet
• Try your search again, using a keyword from the syntax name (“tidyverse”) or a relevant package (“mosaic”)

! Sometimes particular syntaxes work, but are considered dangerous to use, because they are so easy to get wrong. For example, passing variable names without assigning them to a named argument.

Even more ways to say the same thing
Even within one syntax, there are often variations that are equally valid. As a case study, let’s look at the ggplot2 syntax. ggplot2 is the plotting package that lives within the tidyverse. All the code below is written in one syntax and looks different, but produces the same graphic.

quickplot
qplot() stands for quickplot, and allows you to make quick plots. It doesn’t have the full power of ggplot2, and it uses a slightly different syntax than the rest of the package.

    ggplot2::qplot(x=disp, y=mpg, data=mtcars, geom="point")
    ggplot2::qplot(x=disp, y=mpg, data=mtcars)    !
    ggplot2::qplot(disp, mpg, data=mtcars)    ! !

ggplot
To unlock the power of ggplot2, you need to use the ggplot() function (which sets up a plotting region) and add geoms to the plot. The plus sign adds layers.

    ggplot2::ggplot(mtcars) +
      geom_point(aes(x=disp, y=mpg))

    ggplot2::ggplot(data=mtcars) +
      geom_point(mapping=aes(x=disp, y=mpg))

    ggplot2::ggplot(mtcars, aes(x=disp, y=mpg)) +
      geom_point()

    ggplot2::ggplot(mtcars, aes(x=disp)) +
      geom_point(aes(y=mpg))

ggformula
The “third and a half way” to use the formula syntax, but get ggplot2-style graphics

    ggformula::gf_point(mpg~disp, data=mtcars)

formulas in base plots
Base R plots will also take the formula syntax, although it's not as commonly used

    plot(mpg~disp, data=mtcars)
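The claim that the ggplot() variants produce the same graphic, and the warning about unnamed arguments, can both be checked without looking at the plots by inspecting the data each layer would draw. The following is a minimal sketch, assuming ggplot2 is installed; it uses layer_data() from ggplot2, and the names p1, p2 and p_swapped are illustrative.

```r
library(ggplot2)

# Two of the equivalent spellings: aesthetics in the layer vs. in ggplot()
p1 <- ggplot(mtcars) +
  geom_point(aes(x = disp, y = mpg))
p2 <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point()

# layer_data() returns the computed data for a layer; both plots draw
# the same points
d1 <- layer_data(p1)
d2 <- layer_data(p2)
all.equal(d1[, c("x", "y")], d2[, c("x", "y")])

# The danger of positional arguments: swapping them raises no error,
# it silently puts mpg on the x axis instead of disp
p_swapped <- ggplot(mtcars, aes(mpg, disp)) +
  geom_point()
all.equal(as.numeric(layer_data(p_swapped)$x), mtcars$mpg)
```

Naming x and y explicitly costs a few characters and removes the possibility of the silent swap shown in the last plot.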
