Cheat Sheet - Dplyr PDF
dplyr
dplyr functions work with pipes and expect tidy data. In tidy data:
Each variable is in its own column.
Each observation, or case, is in its own row.
Pipes: x %>% f(y) becomes f(x, y).

Manipulate Cases
EXTRACT CASES
Row functions return a subset of rows as a new table. Use a variant that ends in _ for non-standard evaluation friendly code.

Manipulate Variables
EXTRACT VARIABLES
Column functions return a set of columns as a new table. Use a variant that ends in _ for non-standard evaluation friendly code.
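
The pipe rule, x %>% f(y) becomes f(x, y), can be checked with a minimal sketch. It assumes dplyr is installed and uses the built-in iris data; the object names piped, direct and narrowed are illustrative only.

```r
library(dplyr)

# Piped form: iris is passed as the first argument of filter()
piped <- iris %>% filter(Sepal.Length > 7)

# Equivalent direct call
direct <- filter(iris, Sepal.Length > 7)

# Both forms should give the same table
identical(piped, direct)

# A row function followed by a column function: fewer rows, then fewer columns
narrowed <- iris %>%
  filter(Sepal.Length > 7) %>%
  select(Sepal.Length, Species)
dim(narrowed)
```

Chaining with the pipe reads left to right, which tends to be easier to follow than nesting select(filter(...)).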
select(.data, …) Extract columns by name. Also select_if().
    select(iris, Sepal.Length, Species)
Use these helpers with select(), e.g. select(iris, starts_with("Sepal")):
    contains(match)     num_range(prefix, range)    :, e.g. mpg:cyl
    ends_with(match)    one_of(…)                   -, e.g. -Species
    matches(match)      starts_with(match)

filter(.data, …) Extract rows that meet logical criteria. Also filter_().
    filter(iris, Sepal.Length > 7)
distinct(.data, ..., .keep_all = FALSE) Remove rows with duplicate values. Also distinct_().
    distinct(iris, Species)
sample_frac(tbl, size = 1, replace = FALSE, weight = NULL, .env = parent.frame()) Randomly select fraction of rows.
    sample_frac(iris, 0.5, replace = TRUE)

Summarise Cases
These apply summary functions to columns to create a new table. Summary functions take vectors as input and return one value (see back).
summarise(.data, …) Compute table of summaries. Also summarise_().
    summarise(mtcars, avg = mean(mpg))
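
The statement that summary functions take a vector and return one value can be sketched as follows. This assumes dplyr is installed and uses the built-in mtcars data; avg_mpg and result are illustrative names.

```r
library(dplyr)

# A summary function collapses a whole vector to a single value
avg_mpg <- mean(mtcars$mpg)
length(avg_mpg)

# summarise() applies the summary function to a column and
# returns the result as a new one-row table
result <- summarise(mtcars, avg = mean(mpg))
nrow(result)
```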
count(x, ..., wt = NULL, sort = FALSE) Count number of rows in each group defined by the variables in … Also tally().
    count(iris, Species)

VARIATIONS
summarise_all() - Apply funs to every column.
summarise_at() - Apply funs to specific columns.
summarise_if() - Apply funs to all cols of one type.

Group Cases

EXTRACT CASES
sample_n(tbl, size, replace = FALSE, weight = NULL, .env = parent.frame()) Randomly select size rows.
    sample_n(iris, 10, replace = TRUE)
slice(.data, …) Select rows by position. Also slice_().
    slice(iris, 10:15)
top_n(x, n, wt) Select and order top n entries (by group if grouped data).
    top_n(iris, 5, Sepal.Width)

Logical and boolean operators to use with filter()
    <    <=    is.na()     %in%    |    xor()
    >    >=    !is.na()    !       &
See ?base::Logic and ?Comparison for help.

MAKE NEW VARIABLES
These apply vectorized functions to columns. Vectorized funs take vectors as input and return vectors of the same length as output (see back).
mutate(.data, …) Compute new column(s).
    mutate(mtcars, gpm = 1/mpg)
transmute(.data, …) Compute new column(s), drop others.
    transmute(mtcars, gpm = 1/mpg)
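
The difference between mutate() and transmute() can be seen in a minimal sketch, assuming dplyr is installed. It reuses the gpm = 1/mpg example; with_gpm and only_gpm are illustrative names.

```r
library(dplyr)

# mutate() keeps the existing columns and adds the new one
with_gpm <- mutate(mtcars, gpm = 1 / mpg)
ncol(with_gpm)

# transmute() computes the same column but drops all others
only_gpm <- transmute(mtcars, gpm = 1 / mpg)
names(only_gpm)

# 1 / mpg is vectorized: the output has one value per input row
nrow(only_gpm) == nrow(mtcars)
```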
RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more with browseVignettes(package = c("dplyr", "tibble")) • dplyr 0.5.0 • tibble 1.2.0 • Updated: 2017-01
Vectorized Functions
TO USE WITH MUTATE()
mutate() and transmute() apply vectorized functions to columns to create new columns.
dplyr::na_if() - replace specific values with NA
pmax() - element-wise max()
pmin() - element-wise min()
dplyr::recode() - Vectorized switch()
dplyr::recode_factor() - Vectorized switch() for factors

Summary Functions
TO USE WITH SUMMARISE()
summarise() applies summary functions to columns to create a new table.

Tidy data does not use rownames, which store a variable outside of the columns. To work with the rownames, first move them into a column.
rownames_to_column() Move row names into a column.
    a <- rownames_to_column(iris, var = "C")
column_to_rownames() Move a column into row names.
    column_to_rownames(a, var = "C")

Combine Tables
COMBINE VARIABLES
Use bind_cols() to paste tables beside each other as they are.
Use a named vector, by = c("col1" = "col2"), to match on columns with different names in each data set.
    left_join(x, y, by = c("C" = "D"))
    A.x B.x C A.y B.y
    a   t   1 d   w
    b   u   2 b   u
    c   v   3 a   t
Use suffix to specify suffix to give to duplicate column names.
    left_join(x, y, by = c("C" = "D"), suffix = c("1", "2"))
    A1 B1 C A2 B2
    a  t  1 d  w
    b  u  2 b  u
    c  v  3 a  t

COMBINE CASES
Use a "Filtering Join" to filter one table against the rows of another.
semi_join(x, y, by = NULL, …) Return rows of x that have a match in y. USEFUL TO SEE WHAT WILL BE JOINED.
    A B C
    a t 1
    b u 2
anti_join(x, y, by = NULL, …) Return rows of x that do not have a match in y. USEFUL TO SEE WHAT WILL NOT BE JOINED.
    A B C
    c v 3
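
The joins can be tried with a minimal sketch, assuming dplyr is installed. The tables x and y here are hypothetical: their values are an assumption, chosen to be consistent with the join output fragments on this sheet, and the result names (joined, renamed, kept, dropped) are illustrative. The filtering joins name their key columns explicitly instead of relying on by = NULL.

```r
library(dplyr)

# Hypothetical input tables (values are an assumption)
x <- data.frame(A = c("a", "b", "c"), B = c("t", "u", "v"), C = 1:3,
                stringsAsFactors = FALSE)
y <- data.frame(A = c("d", "b", "a"), B = c("w", "u", "t"), D = 1:3,
                stringsAsFactors = FALSE)

# Match x$C against y$D; the duplicated names A and B get .x / .y suffixes
joined <- left_join(x, y, by = c("C" = "D"))
names(joined)

# Same join with custom suffixes for the duplicated names
renamed <- left_join(x, y, by = c("C" = "D"), suffix = c("1", "2"))
names(renamed)

# Filtering joins keep only columns of x
kept    <- semi_join(x, y, by = c("A", "B"))  # rows of x with a match in y
dropped <- anti_join(x, y, by = c("A", "B"))  # rows of x without a match in y
```

Running semi_join() and anti_join() before a mutating join is a cheap way to see which rows will and will not find a partner.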