Material for Introduction to R

Jeffrey Leek
This book is for sale at http://leanpub.com/universities/courses/jhu/cbds-intro-r

This version was published on 2019-09-23

Copyright © Johns Hopkins University 2019. Creative Commons Attribution 4.0 International
License.
Contents

What is R?
RStudio Cloud Tour
R Packages
Objects in R
Basic Commands in R
Working with Logicals
Lists and Data Frames
Writing Functions in R
R Markdown
Getting Help in R
Pushing Code from R to GitHub
Creating Websites with R


What is R?
Introduction to R
While you completed a project with Leanpub data using RStudio Cloud¹ in the intro course to this
series, there wasn’t a ton of detail about what R, RStudio, and RStudio Cloud actually are in those
lessons. We’ll discuss a few more details about R here in this introductory lesson and then will get
into more and more detail about using R throughout this course.

The R programming language


R, most simply, is a programming language. Just like there are many different spoken languages
throughout the world, there are many different programming languages. Similar to how each spoken
language is used by a subset of the humans on this Earth, each programming language was created
for a different group of people who code. You may have heard of other programming languages,
such as C++, Java, or HTML previously. These are all enormously popular programming languages,
but each has what it does best along with its own disadvantages. For example, if you’re interested
in building software that runs really quickly, you may learn C++. If you want to build and edit
websites, you would maybe start by learning HTML. And, Java may be most helpful if you want to
build video games.
¹rstudio.cloud

Similarly, R has its strengths and weaknesses. R was designed to be helpful to those interested in
statistical computing and graphics. That said, in its simplest form, R is a calculator. If you type ‘3 +
7’ into the R console and hit enter, R will tell you the answer to that math problem is ‘10.’
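To try the calculator idea yourself, here is a minimal sketch of a few more expressions you could type at the console. The variable name total is only an illustration and does not come from the lesson.

```r
# Basic arithmetic typed at the console; each line is evaluated on its own.
3 + 7          # addition
10 - 4         # subtraction
6 * 7          # multiplication
20 / 4         # division
2^3            # exponentiation
(3 + 7) * 2    # parentheses control the order of operations

# A result can also be stored under a name and reused later.
total <- 3 + 7
total
```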

R is a calculator

However, R is much more than just a calculator. It also has the ability to work with data, such as
the information in spreadsheets. It’s able to tell you how many rows are in your data set. It’s able to
find the average age of individuals across a data set. It’s able to create plots to show you how many
males or females are included in your data set. And beyond data summary, you can run statistical
analyses, write your own software, and carry out complicated analyses start to finish in R. So, while
it is a calculator, it is much more than a calculator. It is a place where you can do all of your data
analysis. RStudio makes the process of doing an analysis in R easier.
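As a rough sketch of the data summaries described above, the code below builds a tiny data set and asks R the same kinds of questions. The data frame people and all of its values are made up for illustration; they are not part of the course data.

```r
# A small, made-up data set (hypothetical values, for illustration only).
people <- data.frame(
  name = c("Ana", "Ben", "Cara", "Dev"),
  age  = c(34, 28, 45, 31),
  sex  = c("female", "male", "female", "male")
)

n_rows     <- nrow(people)        # how many rows (individuals) are in the data set
avg_age    <- mean(people$age)    # average age across the data set
sex_counts <- table(people$sex)   # how many individuals of each sex

n_rows
avg_age
sex_counts

# A simple bar plot of those counts
barplot(sex_counts, main = "Individuals by sex")
```

Each of these functions will come up again in later lessons, so there is no need to memorize them now.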

RStudio
RStudio is a free, integrated development environment (IDE) for R. Generally, IDEs are software
applications that allow software developers to program more efficiently, putting everything the
programmer needs in one place. With regards to RStudio specifically, RStudio has a space for the
programmer to code, a separate space for that code to run (the Console), a place to see all the objects
created in the current session (the Workspace), and a place to see Plots that have been generated.
All of these spaces are viewable in a single window, simplifying programming and data analysis.
Those who work at RStudio seek to develop tools that support analysts to perform trustworthy and
high quality analysis. Their singular goal is to make your life programming in RStudio easier!

RStudio IDE

RStudio Cloud
RStudio Cloud² is a version of RStudio that can run in the cloud. This means that regardless of what
computer you’re on, you can access the analysis you were doing previously in your RStudio Cloud
session. Other than that, it has many of the same features and is being developed by the same group
of people who developed the version of RStudio you download and use on your individual laptop.
This means that in RStudio Cloud, like in RStudio, you have four main components, each of which is
visible in the same window. To review from the introductory lesson of this series of courses, RStudio
Cloud has the following four main components:

1. Scripting - where you write your code


2. Console - where your code runs
3. Environment - where you can see what objects have been created during your analysis
4. Files - where you can see all the files that are part of your project
²rstudio.cloud

RStudio Cloud

There are additional features that you can play around with; however, one important feature to note
is that whenever you create a plot in RStudio or RStudio Cloud, it will be visible in the plots tab at
the bottom right-hand of your screen. This, and a number of additional features will be discussed
in more detail in the next lesson in this course.

RStudio Cloud plots

Basic History
Knowing the background of a programming language often helps to add some context. So, very
briefly, R first appeared in 1993 and was developed to be very similar to another programming
language, S. R was initially written by Ross Ihaka³ and Robert Gentleman⁴ in the Department of
Statistics at the University of Auckland in New Zealand. Since its inception, many people have
contributed code and improvements to R. And, since 1997, the “R Core Team” has been responsible for all
modifications to the language. R is an open source language. This means that the language is free
to use and the source code is available to the general public.
As for RStudio, it was first released in 2011. It was founded by J.J. Allaire⁵, who is the company’s
current CEO. RStudio Cloud, the cloud-based version of RStudio, was first released for alpha testing
(meaning it would have bugs and things that still needed to be fixed and will likely be updated
significantly in the coming years) in 2017.
³https://www.stat.auckland.ac.nz/~ihaka/
⁴https://www.linkedin.com/in/robert-gentleman-06845098/
⁵https://www.linkedin.com/in/jjallaire/

R Basic History

Learning R
In the first course⁶ in this Course Set, we discussed that learning how to learn is part of the process
of learning data science. Here, we want to remind you that everything in that lesson applies here.
As a refresher, learning R can be difficult and frustrating. Know that if you get stuck, you’re not
alone! The Internet and conversations with others more experienced than yourself (even if those
conversations are on the Internet!) can be very helpful to you!
⁶https://leanpub.com/universities/courses/jhu/cbds-intro

frustration is normal

We just wanted to take a second to remind you that getting frustrated is normal and failure is
expected. The goal here is to learn how to use R, not to just memorize functions.

failure is inevitable

So, try things out on your own. Try to work through error messages when you’re stuck. But, if you
can’t figure it out, ask questions of others who have more experience than you!

try first; then ask questions

Slides and Video

View this Video at https://youtu.be/zdS-7wfGRso⁷.


What is R?

Slides⁸
⁸https://docs.google.com/presentation/d/1-GwpqNvTqNtobrgF-o0SCD0SvUwlv1b_MYF8QnyMHJk/edit?usp=sharing

Take this quiz online⁹


⁹http://leanpub.com/courses/jhu/cbds-intro-r/quizzes/quiz_00_what_is_R
RStudio Cloud Tour
At this point in the course, you’ve been briefly introduced to RStudio and RStudio Cloud, and you know
that RStudio Cloud is where you’ll be writing code. You’ve learned
how to organize your files within RStudio Cloud for data science projects. And, you have worked
with markdown files within RStudio Cloud. While you were introduced to the parts of RStudio
previously, we’ll review them in this lesson and then go into a little more depth to get you even
more comfortable working in RStudio Cloud.

Getting Started in RStudio Cloud


To get started working in RStudio Cloud¹⁰, you’ll want to go to rstudio.cloud¹¹ and log in using your
RStudio Cloud login. You’ll be logged into your home screen. As discussed previously, to get started
working on a new project, you would click on the “New Project” blue icon toward the top right. This
will create a new project. However, all your old projects will be listed at left underneath “Spaces.”
You can always return to an old project or start a new one.
¹⁰rstudio.cloud
¹¹rstudio.cloud

RStudio Cloud Home Screen - New Project

By starting a new project you’ll be brought to a screen where three spaces are available.

RStudio Cloud Project

However, if you remember from previously, there are four main quadrants when working in RStudio
Cloud. To access the fourth space, you’ll have to start a new R Script. To do so, you’ll click on File,
hover over New File from the drop-down menu that appears, and then click “R Script” from the
drop-down menu.

Open up a new R Script

This will open up a new R Script, which is currently called “Untitled1,” which you can see on the
tab at the top left of the quadrant that has just appeared.

RStudio Cloud

The Tour
Now that RStudio Cloud is open and you have access to each of the four quadrants, we can discuss
and review each quadrant’s purpose. We will go through each of the regions and describe some of
their main functions, so follow along with each step and make sure you understand the function
and how to access each part of RStudio Cloud on your own. But, it would be impossible to cover
everything that RStudio can do, so we urge you to explore RStudio Cloud further on your own too!

RStudio’s quadrants

The menu bar

In addition to the four main quadrants, there is also a menu bar. The menu bar runs across the top
of your screen and should have two rows. The first row should be a fairly standard menu, starting
with “File” and “Edit.” Below that, there is a row of icons that are shortcuts for functions that you’ll
frequently use.

The commonly used options of the main menu bar

To start, let’s explore the main sections of the menu bar that you will use. The first is the File menu.
Here we can open new or saved files, save our current document, or close RStudio. As we saw earlier
in this lesson, if you mouse over “New File”, a new menu will appear that suggests the various file
formats available to you. R Script and R Markdown files are the most common file types for use, but
you can also generate R notebooks, web apps, websites, or slide presentations. If you click on any
one of these, a new tab in the “Source” quadrant will open. We’ll spend more time in a future lesson
on R Markdown files and their use.

The File menu

The Session menu has some R specific functions, in which you can restart, interrupt or terminate R
- these can be helpful if R isn’t behaving or is stuck and you want to stop what it is doing and start
from scratch.

The Session menu

The Tools menu is a treasure trove of functions for you to explore. For now, you should know that
this is where you can go to install new packages (see the next lesson in this course!), set up your
version control software (GitHub was discussed in the last course in this series!), and set your options
and preferences for how RStudio looks and functions. For now, we will leave this alone, but be sure
to explore these menus on your own once you have a bit more experience with RStudio and see
what you can change to best suit your preferences!

The Tools menu

Console

This region should look familiar to you - when you opened R, you were presented with the console.
This is where you type and execute commands, and where the output of these commands is
displayed.

The console

To execute your first command, at the > prompt, try typing 1 + 1. Then, hit enter. You should see
the output [1] 2 below your command.
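If you are wondering about the [1] in front of the answer: it marks the position of the first value printed on that line. The short sketch below stores the result under the name result (a name chosen only for this illustration) and checks how many values it holds.

```r
# The result of 1 + 1 is a numeric vector of length one.
result <- 1 + 1
result           # printing it shows the value prefixed with [1]
length(result)   # the vector holds a single element
```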

Typing into the console and getting an output

Source: script editor panel


However, often you want to write code and save it so that you can open the code again and re-run
it later. This saved file with code in it is referred to as a script. When you want to write code and
save it in a script, you’ll do this in the Source panel.
To get started in your script file, copy and paste the following into your Source quadrant (top-left).

example <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 4, ncol = 2)

To run this code, you can’t just hit enter (as you were able to do in the Console). Hitting enter will
just bring your cursor to the next line in the script. Instead, with your cursor in the line of code you
want to run, you can click on “Run” at the top right of your script file. This will execute the code in
the Console.
Note: Alternatively, to run code, with your cursor on the line of code you’d like to run, you could
hit ‘ctrl + enter’ to run that line of code. This will save you a lot of time as you start writing a lot
of code and analyzing data. Practice this keyboard shortcut now!
What this code does is create an object (we’ll define what that is soon!) called ‘example’ that has the
numbers 1 through 8 in four different rows and two different columns. To see what this object looks
like, we’ll take a look at the environment quadrant of RStudio Cloud.
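You can also inspect the object directly from the Console. The sketch below recreates the same matrix and looks at it a few different ways. Note that matrix() fills its values column by column by default, so the first column holds 1 through 4 and the second holds 5 through 8.

```r
# Same object as in the lesson
example <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 4, ncol = 2)

example         # print the whole matrix in the console
dim(example)    # number of rows, then number of columns
example[1, 2]   # the value in row 1, column 2
example[, 1]    # every row of column 1
```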

Environment (& History)

To view this object we’ve just created, you’ll first want to ensure that the object was created. In the
Environment quadrant, you should see that ‘example’ is now there. The object was created!

The environment quadrant

Then, just click anywhere on the “example” line, and a new tab on the Source quadrant should
appear, showing the matrix you created.

Your newly made object, opened in a new tab of the source panel

RStudio Cloud also tells you some information about the object in the environment, like whether it is
a list or a data frame or if it contains numbers, integers or characters. This is very helpful information
to have as some functions only work with certain classes of data. We’ll get into the details of all this
later, but for now, knowing that this information is in the Environment tab is enough.
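The same kind of information can be requested from the Console. This is only a sketch using a few base R functions; the character vector words is made up for comparison.

```r
example <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 4, ncol = 2)

is.matrix(example)       # TRUE: the object is a matrix
is.data.frame(example)   # FALSE: it is not a data frame
typeof(example)          # how the values are stored ("double" for these numbers)
str(example)             # compact summary, similar to the Environment entry

# A character vector, for comparison
words <- c("a", "b", "c")
is.character(words)      # TRUE
```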
The quadrant has two other tabs running across the top of it. We’ll just look at the History tab now.
Your history tab should look something like this:

The history tab

Here you will see the commands that we have run in this session of R. If you click on any one
of them, you can click “To Console” or “To Source” and this will either rerun the command in the
console, or will move the command to the source, respectively.

From History to Source

Do so now for the View(example) command and send it to Source.



Sending ‘View(example)’ from History to Source

This line of code is now in your Source document. When you save this document, you’ll also have
this line of code saved for future use.

Saving Script Files

Now that you’ve created a script with code in it, you likely want to save it. To do so, you’ll want to
click on the save icon.

Save Icon

In RStudio Cloud this will open a Save File window.



Save File Window

In the File Organization Course, you learned that code is saved in a directory called code. So, we’ll
first create a “New Folder”.

New Folder

We’ll name this folder “code” by typing it in the box and clicking “OK”.

Create code folder

After creating this new folder, as discussed in a previous lesson, you’ll see along the top that you’re
now in the “code” directory. Within this folder, we’ll create another new folder called “raw_code.”

Create raw_code folder

This is where we’ll save this file as “R_basics.R” by typing that in the File name: box and clicking
“Save.”

Save file by typing file name in “File name:” box

This file name ‘R_basics.R’ will now show up in the tab at the top of the R Source quadrant.
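The same folder structure can also be built with code instead of clicking. The sketch below is one possible way to do it with base R functions. It works inside a temporary directory, and the folder name my_project is a made-up stand-in for your own project.

```r
# Work inside a temporary directory so this sketch does not touch a real project.
project_dir <- file.path(tempdir(), "my_project")   # hypothetical project location

# Create code/raw_code in one step; recursive = TRUE also creates "code".
raw_code_dir <- file.path(project_dir, "code", "raw_code")
dir.create(raw_code_dir, recursive = TRUE, showWarnings = FALSE)

# Write a one-line script and save it as R_basics.R in that folder.
script_path <- file.path(raw_code_dir, "R_basics.R")
writeLines(
  "example <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 4, ncol = 2)",
  script_path
)

# List everything in the project, much like browsing the Files tab.
list.files(project_dir, recursive = TRUE)
```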

Files/Help/Plots/Packages/Viewer

Files

You can also see where this file is saved using the fourth and final quadrant in RStudio Cloud that
we’ll discuss. In this final quadrant you’ll see five tabs: Files, Plots, Packages, Help, and Viewer.

Files, Plots, Packages, Help, Viewer

In Files, you can see everything in your current working directory. You should now be able to see
the code folder you just created.

code directory in Files tab

By clicking on that folder, you should then see the raw_code folder you created.

raw_code folder in Files tab

By clicking on this, you’ll see the script file you just saved!

R_basics.R is saved in code/raw_code/

After you save a file in a folder, if you realize it’s not where you wanted it, you do have the option
to move it around. To do so, click on the check box of the file you want to move, and click on the
“More” icon to expose options. Click through these to move your file to where you actually wanted
it.
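Files can be moved with code as well. The following is a minimal sketch using base R's file.rename(); the temporary folders and the folder name final_code are assumptions made only for this illustration.

```r
# Temporary folders stand in for a real project (hypothetical paths).
project_dir <- file.path(tempdir(), "move_demo")
dir.create(file.path(project_dir, "code", "raw_code"),
           recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_dir, "code", "final_code"),
           recursive = TRUE, showWarnings = FALSE)

old_path <- file.path(project_dir, "code", "raw_code", "R_basics.R")
new_path <- file.path(project_dir, "code", "final_code", "R_basics.R")
file.create(old_path)   # an empty file to move

# file.rename() moves the file and returns TRUE on success.
moved <- file.rename(old_path, new_path)
moved
```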

The “More” icon

Plots

In the Plots tab, if you generate a plot with your code, it will appear here. You can use the arrows to
navigate to previously generated plots. The Zoom function will open the plot in a new window that
is much larger than the quadrant. Export is one way to save the plot. (Saving plots will be discussed
in more detail in a future lesson.) The broom icon clears all plots from memory.
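Besides the Export button, a plot can be saved from code. The sketch below shows one common base R pattern: open a file device, draw the plot, then close the device. The values of x and y and the file name are made up for illustration.

```r
# Made-up values for illustration
x <- 1:10
y <- x^2

# In RStudio, running plot(x, y) on its own draws the figure in the Plots tab.
# To save the figure from code instead, open a file device, draw, and close it.
plot_file <- file.path(tempdir(), "example_plot.pdf")
pdf(plot_file)                       # open a PDF file device
plot(x, y, main = "y = x squared")   # draw into that file
dev.off()                            # close the device so the file is written
```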

The plots tab

Packages

The Packages tab will be explored more in depth in the next lesson on R packages. Here you can see
all the packages you have installed, load and unload these packages, and update them.
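Everything in this tab has a Console equivalent. As a rough sketch, the code below lists installed packages and then loads and unloads one. The tools package is used only because it ships with R, so the example should run on any installation.

```r
# Names of every installed package (the list shown in the Packages tab)
installed <- rownames(installed.packages())
head(installed)

# Loading a package corresponds to ticking its checkbox.
library(tools)
"package:tools" %in% search()   # TRUE once the package is attached

# Unloading corresponds to unticking the checkbox.
detach("package:tools")

# Updating is done with update.packages(); it is not run here because it
# downloads from the internet.
```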

The packages tab

Help

The Help tab is where you find the documentation for your R packages and various functions. In the
upper right of this panel there is a search function for when you have a specific function or package
in question. Navigating this tab will be discussed in more detail in a later lesson in this course.
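Help pages can also be opened from the Console. The sketch below uses base R's help functions, with mean() and the stats package as arbitrary examples.

```r
# Open the help page for a function (shown in the Help tab in RStudio)
help("mean")
?mean                        # shorthand for the same thing

# Search the installed documentation when the exact name is unknown
help.search("standard deviation")
??"standard deviation"       # shorthand for help.search()

# Overview of a package's documentation
help(package = "stats")

# Run the examples from a help page
example("mean")
```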

The help tab

Swirl
Throughout the courses in this Course Set, we’ll be using something called Swirl modules¹² to
practice the R code learned in many of the lessons. These modules will all be run within RStudio
Cloud. To make sure that you’re comfortable using Swirl, we’ll go through the steps on where to
go to run Swirl and how to work through a module. This will be important as many of the quizzes
accompanying these lessons will require you to use Swirl. Follow the steps in this section of the
lesson to get started with your first Swirl module!
Throughout this Course Set, whenever you’re asked to complete a Swirl module, you’ll always start
in the same place: the RStudio Cloud Cloud-based Data Science Space¹³. Click on this link now¹⁴. If
prompted, log into your RStudio Cloud account.
¹²https://swirlstats.com/
¹³http://bit.ly/cbds_projects
¹⁴http://bit.ly/cbds_projects

Cloud-based Data Science on RStudio Cloud

Among the projects listed you’ll see one called “swirl” (you may have to scroll down on the list to
see it). To the right of swirl, you’ll want to click on “Copy”.

make a copy of “swirl” project

This project contains all of the swirl modules you’ll be completing throughout the quizzes in this
course set. For each module you’re supposed to complete, there will be a quiz question specifying
which you’re supposed to complete.
But for now, let’s just get comfortable with how swirl works.
Any time you are within this space and supposed to complete a swirl module you’ll start by first
loading the swirl package (it has already been installed in that space for you) and running the
command swirl():

## load package
library(swirl)

## start swirl
swirl()

As a reminder, to run code, with your cursor on the line of code you’d like to run, you can hit ‘ctrl
+ enter’ to run that line of code. Similarly, if there are multiple lines you want to run, you can
highlight the lines you want to run and again hit ‘ctrl + enter’ to run those lines of code.
This will bring up a prompt asking you what swirl should call you. Type your first name as a response
here and hit “enter.”

starting swirl

Swirl will often send you some text to read. Always read the text as this text will help explain the
background information you need or will provide you with information you need to answer the
question. At this point, swirl is explaining that when you see ..., that’s when you should press
“enter” to continue.
When you see > or a list of options (like 1:, 2:, 3:), that lets you know swirl is looking for something
from you! When you see > that’s a prompt letting you know swirl is expecting you to write some
code. When you see a list of options, those are the possible answers to a question you’re being asked.
In these cases, you’ll want to select the number corresponding to the right answer. For this practice
in swirl, select 1, 2, or 3 and press enter.

Getting started in swirl

You’ll then be given a number of options that you can use within swirl whenever you see the >
prompt. Read the list here, but know that info() gives you this list of options again, main() returns
you to swirl’s main menu, and bye() saves your progress but exits swirl.

swirl menu options

After this, you will be shown a list of courses. The list will be longer than what you see here, but
we’re showing this simple example to demonstrate that if you wanted to start on the course “CBDS
Introduction to R”, you would type 1. You’ll be told which course to select throughout the course
set.

Selecting a course

Note that for each quiz question you complete in swirl, upon completion, you’ll receive a code. This
code is to be entered as the answer to the quiz question on Leanpub.
That’s a basic introduction to using swirl. You’ll have lots of quiz questions that require you to use
swirl in this Course Set, so be sure to walk through this introduction on RStudio Cloud now and get
comfortable navigating within swirl.

Summary
In this lesson we took a tour of RStudio Cloud. We became familiar with the menu bar and its
various menus. We looked at the Console, where R code is input and run. We then moved on to
the Environment panel that lists all of the objects that have been created within an R session and
allows you to view these objects in a new tab in Source. In this same quadrant, there is a History
tab that keeps a record of all commands that have been run. It also presents the option to either
rerun the command in the Console, or send the command to Source, to be saved. Source is where
you save your R commands. And the bottom right quadrant contains a listing of all the files in your
working directory, displays generated plots, lists your installed packages, and supplies help files for
when you need some assistance! Take some time to explore RStudio Cloud and get more comfortable
navigating swirl on your own!

Slides and Video

View this Video at https://youtu.be/G2cq4bAxMQQ¹⁵.


RStudio Cloud Tour

• Slides¹⁶

Take this quiz online¹⁷


¹⁶https://docs.google.com/presentation/d/17gq_-4nXwZRznS6OVxCwcZ6uYI_ym5mrO_3oNqqNFk4/edit?usp=sharing
¹⁷http://leanpub.com/courses/jhu/cbds-intro-r/quizzes/quiz_01_rstudio_cloud_tour
R Packages
Now that we’ve looked at R and RStudio and have a basic understanding of how they work together,
we can get at one thing that makes R so special: packages.

What is an R package?
So far, anything we’ve played around with in R uses the “base” R system. Base R, or everything
included in R when you download it, has rather basic functionality for statistics and plotting but it
can sometimes be limiting. To expand upon R’s basic functionality, people have developed packages.
A package is a collection of functions, data, and code conveniently provided in a nice, complete
format for you. At the time of writing, there are just over 17,600 packages available to download -
each with their own specialized functions and code, all developed for a specific but different purpose.
For a really in depth look at R Packages (what they are, how to develop them), check out Hadley
Wickham’s book from O’Reilly, “R Packages.”¹⁸
Side note: A package is not to be confused with a library (these two terms are often conflated in
colloquial speech about R). A library is the place where the package is located on your computer.
To think of an analogy, a library is, well, a library… and a package is a book within the library. The
library is where the books/packages are located.
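To see the difference on your own computer, here is a small sketch using two base R functions. The stats package is used only because it ships with R.

```r
# The library: the folder(s) on this computer where packages are installed
.libPaths()

# A package: one "book" in that library
find.package("stats")   # the folder where the stats package lives
```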
Packages are what make R so unique. Not only does base R have some great functionality but these
packages greatly expand its functionality. And perhaps most special of all, each package is developed
and published by the R community at large and deposited in repositories.

What are repositories?


For R packages, a repository is a central location where many developed packages are located and
available for download.
Note: You may remember the word “repository” from an earlier lesson on GitHub. Like in GitHub,
where a repository was where all the code for each data science project is stored, the repositories
for R packages are also places where information and code are stored. (In fact, as you’ll see below,
GitHub repositories are one of the main repositories for R packages!)
There are three big repositories for R packages:
1. CRAN (Comprehensive R Archive Network):¹⁹ R’s main repository (>12,100 packages available!)
2. Bioconductor:²⁰ A repository mainly for bioinformatics-focused packages
3. GitHub:²¹ A very popular, open source repository (not R specific!)
¹⁸http://r-pkgs.had.co.nz/
¹⁹https://cran.r-project.org/web/packages/
²⁰https://bioconductor.org/packages/release/BiocViews.html#___Software

Take a second to explore the links above and check out the various packages that are out there!

How do you know what package is right for you?


So, you know where to find packages… but there are so many of them, how can you find a package
that will do what you are trying to do in R? There are a few different avenues for exploring packages.
First, CRAN groups all of its packages by their functionality/topic into 35 “themes.” It calls this
its “Task view.”²² This at least allows you to narrow the packages you can look through to a topic
relevant to your interests.
Second, there is a great website, R Documentation,²³ which is a search engine for packages and
functions from CRAN, Bioconductor, and GitHub (i.e., the big three repositories). If you have a task
in mind, this is a great way to search for specific packages to help you accomplish that task! It also
has a “task” view²⁴ like CRAN, that allows you to browse themes.
More often, if you have a specific task in mind, Googling that task followed by “R package” is a great
place to start! From there, looking at tutorials, vignettes, and forums for people already doing what
you want to do is a great way to find relevant packages.

How do you install packages?


Great! You’ve found a package you want… How do you install it?

Installing from CRAN

If you are installing from the CRAN repository, use the install.packages() function, with the
name of the package you want to install in quotes between the parentheses (note: you can use either
single or double quotes). For example, if you want to install the package ggplot2, you would use:
install.packages("ggplot2")

Try doing so in your R console! This command downloads the ggplot2 package from CRAN and
installs it onto your computer.
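Since installing downloads files, it can be handy to install a package only when it is missing. The helper below, install_if_missing(), is a hypothetical function written for this illustration; it is not part of R or of the course materials. It relies on requireNamespace(), which reports whether a package can be loaded.

```r
# Hypothetical helper: install a CRAN package only when it is not already installed.
install_if_missing <- function(pkg) {
  # requireNamespace() returns TRUE when the package can be loaded
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call installs nothing.
# Swap in "ggplot2" to try it for real.
install_if_missing("stats")
```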
If you want to install multiple packages at once, you can do so by using a character vector (we’ll get
back to exactly what that means in a later lesson in this course!), like:
install.packages(c("ggplot2", "devtools", "lme4"))

If you want to use RStudio’s graphical interface (meaning you would point-and-click more than you
would type into the console) to install packages, go to the Tools menu, and the first option should be
“Install packages…” If installing from CRAN, select it as the repository and type the desired packages
in the appropriate box.
²¹https://github.com/collections
²²https://cran.r-project.org/web/views/
²³https://www.rdocumentation.org
²⁴https://www.rdocumentation.org/taskviews

You can install through the console interface using the above commands or using the Install Packages menu option

Select the appropriate repository and type in your desired packages

Installing from Bioconductor

The Bioconductor repository uses its own method to install packages²⁵. While you get started
and learn to code in R, you will likely not be installing packages from Bioconductor; however, if you
later on work in biology-focused fields, you’ll want to know about Bioconductor. So, we’ll cover
this now so you know about Bioconductor, even if you don’t install most of your packages from this
repository right now.
First, to get the basic functions required to install through Bioconductor, use:
install.packages("BiocManager")
This makes the main install function of Bioconductor, BiocManager::install(), available to you.
Following this, you put the name of the package you want to install in quotes, between the parentheses
of the BiocManager::install() command, like so:
BiocManager::install("GenomicFeatures")
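The two steps can be wrapped together so that BiocManager is only installed when it is missing. This is a minimal sketch; install_from_bioc() is a hypothetical name chosen for this illustration, not a function from Bioconductor.

```r
# Hypothetical wrapper around the two steps described above.
install_from_bioc <- function(pkg) {
  # Step 1: make sure BiocManager itself is installed (it comes from CRAN).
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # Step 2: ask BiocManager to install the requested Bioconductor package.
  BiocManager::install(pkg)
}

# Not run here, since it downloads from the internet:
# install_from_bioc("GenomicFeatures")
```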

Installing from GitHub

This is a more specific case that you probably won’t run into too often as you just get started working
in R. As packages are developed, the code is frequently put into a GitHub repository. At this point, as
long as the repository is public, anyone can install the package in RStudio. However, most developers,
once a package is complete, will submit it to CRAN (the first repository discussed above), making it
stably available to all R users. Thus, most of the packages discussed throughout these courses will be
available from CRAN. However, newer packages that are still under active development will likely
have to be installed directly from GitHub.
²⁵https://www.bioconductor.org/install/
In the event you want to do this, you first must find the package you want on GitHub and take note
of both the package name AND the author of the package. Check out this guide²⁶ for installing from
GitHub, but the general workflow is:

1. install.packages("devtools") - only run this if you don’t already have devtools installed.
If you've been following along with this lesson, you may have installed it when we
were practicing installations using the R console
2. library(devtools) - more on what this command is doing immediately below this
3. install_github("author/package") replacing “author” and “package” with their GitHub
username and the name of the package.

Loading packages
Installing a package does not make its functions immediately available to you. First you must load
the package into R; to do so, use the library() function. Think of this like any other software you
install on your computer. Just because you’ve installed a program doesn’t mean it’s automatically
running; you have to open the program. The same goes for R packages: you’ve installed one, but now you have to
“open” it. For example, to “open” the “ggplot2” package, you would run:
library(ggplot2)
NOTE: Unlike when you’re installing a package, when loading a package, you do not have to put
the package name in quotes.
Some packages require other packages in order to work. These
other packages are known as dependencies. When you install a new package, R reads the list of
dependencies that the package declares and, by default, installs the required ones as well. Likewise, when
you load a package with library(), R loads the dependencies it needs for you.
If you want to load a package using the RStudio interface, in the lower right quadrant there is a
tab called “Packages” that lists all of the packages you have installed, along with a brief description and the
version number of each. To load a package, just click on the checkbox beside
the package name.
²⁶http://kbroman.org/pkg_primer/pages/github.html

Find the package you want to load from the list of installed packages and check the box to load it

Updating, removing, unloading packages


Once you’ve got a package, there are a few things you might need to know how to do:

Checking what packages you have installed

If you aren’t sure if you’ve already installed a package, or want to check what packages are installed,
you can use either of: installed.packages() or library() with nothing between the parentheses
to check!
In RStudio, that package tab introduced earlier is another way to look at all of the packages you
have installed.
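As a minimal sketch, the same check can be done in code. The helper name is_installed() is hypothetical (it is not part of R); it relies only on installed.packages(), whose row names are the names of the installed packages. The stats package ships with R, so it makes a safe test case, and the second package name is made up and assumed not to exist.

```r
# Hypothetical helper (not part of base R): is a package installed?
is_installed <- function(pkg) {
  # installed.packages() returns a matrix with one row per installed package;
  # the row names are the package names
  pkg %in% rownames(installed.packages())
}

stats_installed <- is_installed("stats")                   # ships with R
fake_installed  <- is_installed("notARealPackageName123")  # made-up name

stats_installed
fake_installed
```

A check like this can be handy at the top of a script, before calling library() on a package that may not be there.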

Updating packages

Like your projects in GitHub, packages are version controlled. As updates are made to a package,
its version number changes. To see if packages you’ve installed need
an update, use the function old.packages(). This will identify all packages for which a newer version is
available than the one you have installed.
To update all packages, use update.packages(). If you only want to update a specific package, just
run install.packages("packagename") again.

Within the RStudio interface, still in that Packages tab, you can click “Update,” which will list all of
the packages that are not up to date. It gives you the option to update all of your packages, or allows
you to select specific packages. If all of your packages are up to date you will get the message “All
packages are up to date.”

Using the Update menu, you can select all or some of the packages you have installed that you can update

You will want to periodically check in on your packages and check if you’ve fallen out of date - be
careful though! Sometimes an update can change the functionality of certain functions, so if you
re-run some old code, the command may be changed or perhaps even outright gone and you will
need to update your code too!

Unloading packages

Sometimes you want to unload a package in the middle of a script - the package you have loaded
may not play nicely with another package you want to use.
To unload a given package you can use the detach() function. For example,
detach("package:ggplot2", unload=TRUE) would unload the ggplot2 package (that we loaded earlier). Within the RStudio
interface, in the Packages tab, you can unload a package by unchecking the box beside the
package name.
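The load and unload cycle can be sketched as follows. This example uses the splines package only because it ships with R and so needs no installation; any installed package would work the same way. The search() function lists what is currently attached, so we can use it to confirm each step.

```r
# splines ships with R; it stands in here for any installed package
library(splines)
loaded_after_library <- "package:splines" %in% search()

# Unload it again
detach("package:splines", unload = TRUE)
loaded_after_detach <- "package:splines" %in% search()

loaded_after_library
loaded_after_detach
```

In a fresh R session, the first check should be TRUE and the second FALSE.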

Uninstalling packages

If you no longer want to have a package installed, you can simply uninstall it using the function
remove.packages(). For example, remove.packages("ggplot2")

(Try that, but then actually re-install the ggplot2 package - it’s a super useful plotting package!)
Within RStudio, in the Packages tab, clicking on the “X” at the end of a package’s row will uninstall
that package.

Sidenote: How do you know what version of R you have?

Sometimes, when you are looking at a package that you might want to install, you will see that it
requires a certain version of R to run. To know if you can use that package, you need to know what
version of R you are running!
One way to know your R version is to check when you first open R/RStudio - the first thing it
outputs in the console tells you what version of R is currently running. If you didn’t pay attention
at the beginning, you can type version into the console and it will output information on the R
version you are running. Another helpful command is sessionInfo() - it will tell you what version
of R you are running along with a listing of all of the packages you have loaded. The output of this
command is a great detail to include when posting a question to forums - it tells potential helpers
a lot of information about your OS, R, and the packages (plus their version numbers!) that you are
using.
In the output from sessionInfo(), you’ll note that the end of each package’s name has an underscore
followed by a series of numbers. Those numbers indicate the package’s version. For example, the
version of ggplot2 installed in this session is version 2.2.1 (read “version two point two point one”). This
number will change (increase) every time developers release changes to this package.
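If you want to check versions in code rather than by reading console output, a sketch along these lines may help. The "3.5.0" requirement is an arbitrary example, not a requirement of any particular package, and stats is used only because it ships with R.

```r
# Version of R that is currently running, as a character string
r_version_text <- R.version.string

# getRversion() returns a version object that can be compared with a string
r_is_recent_enough <- getRversion() >= "3.5.0"  # arbitrary example requirement

# Version of one installed package (stats ships with R)
stats_version <- packageVersion("stats")

r_version_text
r_is_recent_enough
stats_version
```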

sessionInfo() shows you packages and versions

Using the functions in a package


In all of this information about packages, we haven’t actually discussed how to use a package’s
functions! While functions are discussed in greater detail in a later lesson in this course, for now,
know that to use the contents of a package, you’ll use functions.
First, you need to know what functions are included within a package. To do this, you can look at
the man/help pages included in all (well-made) packages. In the console, you can use the help()
function to access a package’s help files. Try help(package = "ggplot2") and you will see all of the
many functions that ggplot2 provides. Within the RStudio interface, you can access the help files
through the Packages tab (again): clicking on any package name should open up the associated help
files in the “Help” tab, found in that same quadrant, beside the Packages tab. Clicking on any one
of the functions listed there will take you to that function’s help page, which tells you what that function is for
and how to use it.
Once you know what function within a package you want to use, you simply call it in the console
like any other function we’ve been using throughout this lesson. Once a package has been loaded,
it is as if it were a part of the base R functionality.
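To make this concrete, here is a small sketch using the stats package, which is loaded by default in every R session, so nothing needs to be installed. It also shows the package::function() form that we saw earlier with BiocManager::install(), which names the package explicitly.

```r
values <- c(3, 1, 2)

# After a package is loaded, its functions are called by name
median_loaded <- median(values)

# The package::function() form names the package explicitly and
# works even when the package has not been loaded with library()
median_explicit <- stats::median(values)

median_loaded
median_explicit

# List a few of the functions that a loaded package provides
head(ls("package:stats"))
```

Both calls run the same function, so they give the same answer.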
If you still have questions about what functions within a package are right for you or how to use
them, many packages include “vignettes.” These are extended help files that include an overview of
the package and its functions, but often they go the extra mile and include detailed examples of how
to use the functions, in plain words that you can follow along with. To
see the vignettes included in a package, you can use the browseVignettes() function. For example,
let’s look at the vignettes included in ggplot2: browseVignettes("ggplot2"). You should see that
there are two included vignettes: “Extending ggplot2” and “Aesthetic specifications.” The
Aesthetic specifications vignette is a great example of how vignettes can provide helpful, clear instructions
on how to use the included functions.

Summary
In this lesson, we’ve explored R packages in depth. We examined what a package is (and how it
differs from a library), what repositories are, and how to find a package relevant to your interests. We
investigated all aspects of how packages work: how to install them (from the various repositories),
how to load them, how to check which packages are installed, and how to update, uninstall, and
unload packages. We took a small detour and looked at how to check what version of R you have,
which is often an important detail to know when installing packages. And finally, we spent some
time learning how to explore help files and vignettes, which often give you a good idea of how to
use a package and all of its functions.
If you still want to learn more about R packages, here is a great resource: Introduction to R Packages²⁷
from Ken Rice and Timothy Thornton.

Additional Resources
• “R Packages”²⁸, by Hadley Wickham
• CRAN (Comprehensive R Archive Network):²⁹
• BioConductor:³⁰
• GitHub:³¹
• Introduction to R Packages³², from Ken Rice and Timothy Thornton
²⁷http://faculty.washington.edu/kenrice/rintro/sess08.pdf
²⁸http://r-pkgs.had.co.nz/
²⁹https://cran.r-project.org/web/packages/
³⁰https://bioconductor.org/packages/release/BiocViews.html#___Software
³¹https://github.com/collections
³²http://faculty.washington.edu/kenrice/rintro/sess08.pdf

Slides and Video

View this Video at https://youtu.be/2vtK6pbPBJU³³.


• Slides³⁴

Take this quiz online³⁵


³⁴https://docs.google.com/presentation/d/1sVQJJELq39ctr29VXQGLqb5hw5lGzmbVMgI0ehFa_zo/edit?usp=sharing
³⁵http://leanpub.com/courses/jhu/cbds-intro-r/quizzes/quiz_02_intro_to_packages
Objects in R
Now that we’ve gone over what exactly R is and outlined the package functionality that makes it
useful for working with data, we are going to cover information needed to actually start working in
R. To work with data in R, you’ll need an understanding of objects.

What is an object?
Simply put, an object in R is something that contains information. In R there are a number of basic
classes of objects.

Classes of objects

Five classes that you’ll be working with commonly are:

• character
• integer
• numeric (real numbers)
• logical (TRUE/FALSE)
• factor (categorical information)

These classes are the building blocks for creating all sorts of objects in R.

Types of objects

We store these different classes of information in different ways. The way this information is stored is
referred to as the type of object.
When talking about objects in R, it may be helpful to think of actual objects in everyday life for
comparison. For example, think of three objects: a bucket, a pot you would cook with, and a backpack.
These three objects are clearly designed for and carry out different purposes. The bucket may be
used to carry water to clean your floor, the pot to cook pasta, and the backpack to carry notebooks,
but we can agree that they are all objects, just different types of objects. The water, pasta, and
notebooks would be the information contained in the object. In this real-life example, the “class” of
the information may be “liquid”, “food”, and “paper”.

Objects in real-life analogy

That said, each object in this example could hold any of the different classes of information. A pot
could hold the water to clean your floor; it just would not make much sense to do so. In R, as you get
more comfortable, you’ll see that each type of object can hold any of the classes of information,
but there are times where each type makes the most sense.
Thus, in R, like in this real-life object example, there are not only different classes of objects, but
different types of objects (different ways to store these classes of objects). Each type is used to store
information in a slightly different way.
The simplest type of object in R is called a vector, which is an object that can contain multiple items.
Generally, each individual vector can only contain objects of the same class, but a certain type of
vector, called a list, can contain objects of different classes. You will learn about lists in a later lesson.
For now, it’s not important to understand the details of that last paragraph, but it is important to
know that there are different types of objects and that these objects each hold information of a
specific class.
We’ll begin this lesson by looking at how to create objects in each of these five basic classes in R.

Storing objects
In R, as with all programming languages, it is important to be able to store objects that we create
so that we can use them in later code. The process of storing an object is called assignment, and it
entails giving an object a name. For example, the following code creates an object called min_age
and stores inside that object the value 21.

min_age <- 21

The <- operator is called the assignment operator. The = operator can also be used for assignment.
The following code accomplishes the same thing as the code above: it stores the number 21 in an object called
min_age:

min_age = 21

Having this minimum age variable stored in an object can be useful later if we have data where we
only want to keep individuals who exceed this minimum age.
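As a small sketch of that idea, suppose we have the ages of five individuals (the values below are made up for illustration). The square brackets used here to pick out values are a preview of subsetting, which comes up again later.

```r
min_age <- 21

# Hypothetical ages for five individuals
ages <- c(18, 25, 21, 30, 19)

# Keep only the ages that exceed the stored minimum
ages_over_min <- ages[ages > min_age]
ages_over_min
```

Because the cutoff is stored in min_age, changing the minimum age later only requires editing one line.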

Printing objects
Often we will want to print the contents of an object to see the information it contains. We can do
this by clicking in the Console in RStudio Cloud (bottom left corner). The R prompt is indicated
by the > in the Console. This indicates that R is ready to accept a command from you. If we simply
enter the number 21 at the R prompt, the 21 object will be printed, but you will not see an object
come up under the Environment pane (top right corner). When we create the min_age object, you
will see this object come up under the Environment pane. We can also print this object to the screen
by entering its name at the R prompt.

> 21
[1] 21

> min_age <- 21
> min_age
[1] 21

The 1 in square brackets that gets displayed in the printed output is simply an index that is provided
for convenience of reading in case the object contains several values. It indicates that the number
21 is the first number in this object. It also happens to be the only number in this object.
Throughout this course and curriculum, when we display code without the > prompt, we are
emphasizing only the R command. When we display
code with the > prompt, as it appears in the Console, we want to emphasize both the
commands and how the output is displayed on the screen.

Character
Character objects in R can be created by surrounding a string in either double quotes or single quotes,
as in the following two examples.
"This is a character object."
'This is also a character object.'
The example below shows how to store the first sentence above in an object named
my_char. my_char is a character vector of length 1.

my_char <- "This is a character object."

We can create a character vector named my_char_vec with multiple character objects using the
concatenate function, c(). While we’ll discuss functions more in later lessons, the word concatenate
means to link things together in a series, so this function links pieces of information together:

my_char_vec <- c("char object 1", "char object 2")

This character vector contains two different pieces of information. In R, the number of pieces of
information in a vector is referred to as that vector’s length. Thus, this vector is of length 2.
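To check this for yourself, the length() function (covered in more detail in the next lesson) reports the number of pieces of information in a vector. The sketch below also uses nchar(), a base R function not otherwise discussed here, to show that the length of a character vector is not the same thing as the number of letters in a string.

```r
my_char <- "This is a character object."
my_char_vec <- c("char object 1", "char object 2")

length(my_char)      # one piece of information
length(my_char_vec)  # two pieces of information

# nchar() counts the characters inside the string instead
nchar(my_char)
```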

Integer
Integers are whole numbers, such as 1, 23, or 1000. 1.2 is not an integer, as it contains a fraction of a
number. Integer objects in R can be created by specifying an integer number followed by the letter
“L”. The following creates an integer object called num and stores the value 1.

num <- 1L

Without the letter “L”, the number will be recognized as a more general, numeric object (discussed
below). We can create an integer vector with multiple items using the c function, the concatenation
function. The following creates an integer vector of length 3 with the numbers 1, 10, and 3.

num_vec <- c(1L, 10L, 3L)

We can also create an integer vector with the colon operator. When both values are whole numbers, the colon
operator includes every whole number from the value before the colon through the value after the colon. The following
command creates an integer vector with the numbers 2, 3, 4, and 5.

num_vec2 <- 2:5

If we create longer vectors and print the output, we can see the use of having the square bracket
indices at the beginning of the lines of the printed output. In the example below, we see that 4 is the
first number in the vector, and 12 is the ninth number in the vector, as specified by the 9 in brackets
at the start of the second line of output.

> 4:16
[1] 4 5 6 7 8 9 10 11
[9] 12 13 14 15 16
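The bracketed indices in the printed output match the positions you can ask for directly. As a brief sketch (square-bracket indexing is only previewed here), we can store the same vector and pull out individual positions:

```r
x <- 4:16

x[1]       # the first number in the vector
x[9]       # the ninth number in the vector
length(x)  # total number of values
```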

Numeric
Numeric objects in R represent real numbers and are created by simply entering a number. For example,
while 1.2 is not an integer, it is a real number, so 1.2 can be stored as a numeric but not as an
integer.

num1 <- 1
num2 <- 1.2

We can create a numeric vector with multiple items using the c function.

num_vec <- c(1.2, 9.8)

As discussed previously, we can also use R as a calculator. At the prompt, we can enter mathematical
expressions without assignment to display the results as a calculator would. The operators for
addition, subtraction, multiplication, division, and exponentiation in R are +, -, *, /, and ^
respectively.

> 1+5
[1] 6
> 2-3
[1] -1
> 4*2
[1] 8
> 4/5
[1] 0.8
> 3^2
[1] 9

Logical
Logical objects in R represent true or false conditions and can be created by typing TRUE or
FALSE in all capital letters, without quotes.

check_condition <- TRUE
check_condition <- FALSE

We can create a logical vector with multiple items using the c function.

check_condition <- c(TRUE, TRUE, FALSE)

Factor
Factor objects contain information for categorical variables (e.g. color, shape), where there are a
number of possible values the object can take, but these values are limited. For example, a categorical
variable could include the colors of the rainbow. Here, values could be red, orange, yellow, green,
blue, indigo, or violet. Thus, values could be one of seven different colors, but the categorical variable
is limited to one of these seven values.
To simplify this example and make factors explicitly clear, the following colors object is a character
vector containing five pieces of color information. There are only two unique colors present: red
and blue. These unique colors are called the levels of a factor.

colors <- c("red", "red", "blue", "red", "blue")

To create a factor object out of this character vector we can use the factor function or the as.factor
function. Let’s try both and look at the objects created.

> colors_factor1 <- factor(colors, levels = c("red", "blue"))
> colors_factor1
[1] red red blue red blue
Levels: red blue
> colors_factor2 <- as.factor(colors)
> colors_factor2
[1] red red blue red blue
Levels: blue red

When we used the factor function we also specified the levels to be red and blue. The order of the
levels we specified is important: first red, then blue. We can see that when we print this object the
levels are listed in the order we specified. A quick way to create a factor object is with the coercion
function as.factor. When we print this object, the levels are opposite to what we specified when
we used the factor function because by default, the levels are specified in alphabetical order. Here
the first level is blue and the second is red. The ordering of levels will be important in future courses
when we cover data tidying, plotting, and statistical modeling.
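One way to see why the order of levels matters is to look at how each factor is stored. The sketch below uses levels() and as.integer(), two base R functions not otherwise covered in this lesson, on the same two factors created above.

```r
colors <- c("red", "red", "blue", "red", "blue")
colors_factor1 <- factor(colors, levels = c("red", "blue"))
colors_factor2 <- as.factor(colors)

# levels() returns the levels in their stored order
levels(colors_factor1)
levels(colors_factor2)

# Internally, each value is stored as the position of its level,
# so the same colors get different integer codes in the two factors
codes1 <- as.integer(colors_factor1)
codes2 <- as.integer(colors_factor2)
codes1
codes2
```

The printed values of the two factors look identical, but red is level 1 in the first factor and level 2 in the second.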
One last topic to cover with factors is labeling. We can control the displayed labels of a factor with
the labels argument of the factor function. This need often arises if we want to create a factor object
from an integer object or from a character object with labels that we don’t like. In the example below,
we see that we originally had ozone information encoded with whole numbers. When we use the factor
function to make a corresponding factor object, we specify both the unique levels present in the
original object and the desired labels with a character vector. The order of the specified levels should
correspond to the order of the specified labels. The two examples, ozone_factor and ozone_factor2,
create the same labeling of the original vector, but the order of the levels is different between
the two approaches. In the first approach, the first level is low, the second is medium, and the third
is high, which is the most natural ordering. In the second approach, the first level is medium, the
second is low, and the third is high.

> ozone_levels <- c(1,2,1,3,1,1)
> ozone_factor <- factor(ozone_levels, levels = 1:3, labels = c("low", "medium", "high"))
> ozone_factor
[1] low medium low high low low
Levels: low medium high
> ozone_factor2 <- factor(ozone_levels, levels = c(2,1,3), labels = c("medium", "low", "high"))
> ozone_factor2
[1] low medium low high low low
Levels: medium low high

Data frames
Now that we’ve covered the common basic data classes, we will discuss data frames. Data frames
are a more complex data type than the simple vectors that we’ve seen so far. Data frames organize
data into a rectangular format where each column corresponds to a single variable and each row
corresponds to an observation. So a row of a data frame contains an observation’s values for all
variables. An example of a data frame is shown below:

Data frame example

We see along the columns different variables related to car properties, and each row gives
information on those properties for a particular car model. Every column in a data frame is a simple
vector of values all from the same class. Most often, the data that we work with can be represented
with data frames.
You will learn more about working with data frames in subsequent lessons in this course and also
in later courses.
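As a minimal sketch of these ideas, the code below builds a tiny data frame by hand. The object name cars_example and all of its values are made up for illustration. The sapply() call, which applies class() to every column, is a base R function not covered in this lesson.

```r
# A small, made-up data frame: one row per car, one column per variable
cars_example <- data.frame(
  model = c("Car A", "Car B", "Car C"),
  mpg = c(21.0, 22.8, 18.7),
  cyl = c(6L, 4L, 8L),
  stringsAsFactors = FALSE  # keep the text column as a character vector
)

cars_example

# Each column is a vector with a single class
column_classes <- sapply(cars_example, class)
column_classes
```

Here the three columns are a character vector, a numeric vector, and an integer vector, each of length 3.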

Missing values
The last topic that we should discuss in our introduction to R objects is missing values. During
nearly any type of data collection, there is information missing for one or more variables. Thus, it is
important to understand how R handles missing values. Most missing values that you will deal with
are encoded with NA in R. Below are some examples of creating objects of the various basic classes
we discussed above that contain missing values.

> char_vec <- c(NA, "two", "four")
> char_vec
[1] NA "two" "four"
> num_vec <- c(1L, 10L, NA, 3L)
> num_vec
[1] 1 10 NA 3
> num_vec <- c(1.2, 9.8, NA)
> num_vec
[1] 1.2 9.8 NA
> logi_vec <- c(TRUE, NA, FALSE, FALSE)
> logi_vec
[1] TRUE NA FALSE FALSE
> factor_vec <- as.factor(c(NA, "apple", "banana"))
> factor_vec
[1] <NA> apple banana
Levels: apple banana

Another missing value that can arise in R is NaN which stands for “not a number.” This can arise in
mathematical calculations, such as 0 divided by 0.

> 0/0
[1] NaN
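Missing values affect calculations, so it helps to know how to detect them. The following sketch uses is.na(), is.nan(), and the na.rm argument of mean(), all of which are part of base R but are not otherwise introduced in this lesson.

```r
num_vec <- c(1.2, 9.8, NA)

# is.na() flags which entries are missing
missing_flags <- is.na(num_vec)
missing_flags

# A calculation that touches an NA returns NA ...
mean_with_na <- mean(num_vec)
# ... unless the missing values are removed first
mean_without_na <- mean(num_vec, na.rm = TRUE)

mean_with_na
mean_without_na

# NaN counts as missing for is.na(), and is.nan() identifies it specifically
is.na(0/0)
is.nan(0/0)
```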

Determining the class of an object


In this lesson so far we have discussed how to create the five main classes of objects in R; however,
we haven’t yet described how to determine the class of an object once it’s been stored. To do so, you
would use the function class() and specify the name of the object within the parentheses:

> min_age <- 21
> class(min_age)
[1] "numeric"

> min_age <- 21L
> class(min_age)
[1] "integer"

> colors <- c("red", "red", "blue", "red", "blue")
> class(colors)
[1] "character"

> colors_factor1 <- factor(colors, levels = c("red", "blue"))
> class(colors_factor1)
[1] "factor"

As you can see, class() returns the class of the object named within the parentheses.
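The class() function also lets us check the earlier statement that a vector can only hold one class of information. As a short sketch, if we combine values of different classes with c(), R converts them all to a single class:

```r
# Mixing an integer and a numeric gives a numeric vector
class(c(1L, 2.5))

# Mixing numbers and text gives a character vector
mixed <- c(1, "two", 3)
mixed
class(mixed)

# Mixing a logical with a number turns TRUE into 1
class(c(TRUE, 5))
```

Notice that the numbers in mixed are printed in quotes: they have been turned into character strings.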

Summary
In this lesson, we’ve discussed that within R information can be assigned to objects. We’ve covered
the five main classes of objects in R and have started to touch on the different types of objects
in R, but will discuss this in greater detail in later lessons in this course. We’ve discussed how to
create each class of object in R as well as each class’ unique properties. Finally, we discussed how
to determine the class of an object in R using the function class().

Slides and Video

View this Video at https://youtu.be/w_R66uXAY4I³⁶.


• Slides³⁷

Take this quiz online³⁸


³⁷https://docs.google.com/presentation/d/1Q47qnIkVzE-JzCEE5Lm54P6yqReg09QJdr7kiFyCbGc/edit?usp=sharing
³⁸http://leanpub.com/courses/jhu/cbds-intro-r/quizzes/quiz_03_objects
Basic Commands in R
Now that we’ve covered some essentials about R objects, we’ll go over some basic commands that
will be helpful in working with data.

Functions
In working with data, we will be making substantial use of functions. Functions in R carry out some
task. A function call is always a name (a word, or a set of words connected by underscores or periods) followed by a
set of parentheses, so the general structure of a function call in R looks something like this:

function_name(input)

The input to a function in R is known as an argument. Many functions require at least one argument,
and some take several different arguments, depending on the function. These inputs are often
objects and other values detailing how you wish to view, summarize, or manipulate these objects.
Function outputs come in a variety of formats. They can return information about the contents of an
object; they can return a manipulated version of an object; and they can create entirely new objects.
In this lesson, we will cover some essential functions for exploring data. This will only consist of
functions that return information about the contents of an object. As you learn more about R, you
will learn about functions that can manipulate objects or create entirely new objects.
To visually understand the anatomy of a function call (the term for using a function),
let’s look at the following example:

mean(x, trim = 0.1)

We have an object x that presumably contains numbers, and we want to compute the mean of these
numbers with the mean function. As stated above, everything inside the parentheses is a
function input (also called an argument), and the arguments are separated by commas. In this command, I
have supplied the object x and an additional argument trim that I set to be 0.1. The trim argument
calls for a number between 0 and 0.5 and specifies the fraction of the observations in x to trim from
each of the upper and lower ends of the data. Here, by including the trim argument, I am specifying that I
want to take the mean of the middle 80% of the data.
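To see the effect of trim on actual numbers, here is a small worked sketch. The ten values in x are made up, with one extreme value at the top. With ten values and trim = 0.1, one value is dropped from each end before the mean is taken.

```r
# Ten made-up values with one extreme value at the top
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

plain_mean <- mean(x)                # uses all ten values
trimmed_mean <- mean(x, trim = 0.1)  # drops the lowest and highest value first

plain_mean
trimmed_mean
```

The plain mean is 14.5, pulled upward by the value 100, while the trimmed mean is 5.5, the mean of the values 2 through 9.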

What is this object?


If someone were to write down a mystery noun for us to guess, our first question would likely be:
“Is it a person, place, or thing?” When working with R objects, we will initially want similar types
of information. Here we will go over some functions that can help in this regard.
As discussed briefly in the last lesson, the class function returns the class of an R object. This is
useful for determining if an object is an atomic vector, list, or some other type of object. If it is an
atomic vector, this function tells you the class of the values it holds (for example, integer or numeric).

> x <- 1:10
> class(x)
[1] "integer"
> y <- c(1.1,2.2)
> class(y)
[1] "numeric"
> class(mtcars)
[1] "data.frame"

The str function stands for “structure”, and it returns a description of the structure of an object.
It tells you the class of an object, its size, and a preview of different components of the object. For
example, when we call the str function on a data frame object (mtcars), we see that its class is
data.frame, it has 32 rows and 11 columns, and a preview of each of the 11 columns, including the
class of each column. In this example, all of the columns are numeric variables relating to features
of different models of cars.

> str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...

How big is this object?


After we determine generally what an object is, it is useful to know how much information it
contains, that is, how big it is.

The dim function returns the dimensions of a rectangular object, such as a matrix or a data frame.
The output is an integer vector with two components: first is the number of rows (which can also
be obtained with nrow()), and second is the number of columns (which can also be obtained with
ncol()). We saw previously that the str function provides the same information and more, so why
would we use these functions instead? The str function provides this information by printing it to
the screen for us to visually see, but it does not extract this information directly. If we need to use
the dimensions later in the analysis as a variable, these functions provide a direct way to store this
information.

> dim(mtcars)
[1] 32 11
> nrow(mtcars)
[1] 32
> ncol(mtcars)
[1] 11
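As a sketch of what storing this information looks like, the code below saves the dimensions of mtcars and reuses them in a later calculation. The object names car_dims, n_rows, n_cols, and n_cells are chosen here for illustration.

```r
# Store the dimensions instead of just reading them off the screen
car_dims <- dim(mtcars)
n_rows <- car_dims[1]
n_cols <- car_dims[2]

# The stored values can be reused in later calculations
n_cells <- n_rows * n_cols
n_cells
```

This gives the total number of values in the data frame, which we could not compute directly from the printed output of str.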

The length function returns the number of items in a vector object. As we discussed briefly in the
last lesson, the number of items in a vector is referred to as its length. Here, we can quickly
find the length of an object by calling the length function.

> x <- c(1, 10, 3)
> length(x)
[1] 3

Are there named features of this object?


Another way to explore an object in R is to see what components it has. In R, these components are
designated with names.
The names function can be used to get and set the names of an R object, most often an atomic vector
or a list. For example, we can create an R object called prize_money that contains the prize money
for first, second, and third places:

prize_money <- c(1000, 500, 250)

If we want to label this vector with the prizes, we can use names combined with the assignment
operator <- and a character vector of labels:

names(prize_money) <- c("first", "second", "third")

Later in our work, if we want to remind ourselves of the labels, we can use the names function by
itself, which will print the names for the object.

> names(prize_money)
[1] "first" "second" "third"
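Names are useful for more than display. As a brief sketch, once a vector has names, we can look up a single value by its name using square brackets:

```r
prize_money <- c(1000, 500, 250)
names(prize_money) <- c("first", "second", "third")

# Look up a single value by its name
second_prize <- prize_money["second"]
second_prize

# Printing the whole vector shows each name above its value
prize_money
```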

Note that in many situations, it will be better practice to store the above information in a
two-column data frame instead of a named vector, as shown below.

prize_info <- data.frame(
  money = c(1000, 500, 250),
  place = c("first", "second", "third")
)

This is more convenient for further work if you have other objects that have information on first,
second, or third placing, but not prize money information. You’ll learn more about these concepts
when you learn about “tidy data” in a later course.
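To illustrate why the data frame form can be more convenient, here is a sketch that combines the prize information with a second, hypothetical table of winners. The winners data frame and the names in it are made up, and merge() is a base R function that is not covered in this lesson.

```r
prize_info <- data.frame(
  money = c(1000, 500, 250),
  place = c("first", "second", "third"),
  stringsAsFactors = FALSE
)

# Hypothetical second table: who finished in which place (no prize money here)
winners <- data.frame(
  place = c("first", "second", "third"),
  name = c("Ana", "Ben", "Chi"),
  stringsAsFactors = FALSE
)

# merge() matches rows from the two data frames on the shared "place" column
results <- merge(winners, prize_info, by = "place")
results
```

Because both tables share a place column, R can line up each winner with the right prize money.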
The colnames() and rownames() functions act analogously to the names function but are used for
the column labels and row labels of a matrix or data frame. The numbers in square brackets at the
beginning of the lines of printed output indicate the index of the first observation on the line. So for
the row names, we can see that “Duster 360” is the seventh element.

> colnames(mtcars)
[1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
[11] "carb"
> rownames(mtcars)
[1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
[4] "Hornet 4 Drive" "Hornet Sportabout" "Valiant"
[7] "Duster 360" "Merc 240D" "Merc 230"
[10] "Merc 280" "Merc 280C" "Merc 450SE"
[13] "Merc 450SL" "Merc 450SLC" "Cadillac Fleetwood"
[16] "Lincoln Continental" "Chrysler Imperial" "Fiat 128"
[19] "Honda Civic" "Toyota Corolla" "Toyota Corona"
[22] "Dodge Challenger" "AMC Javelin" "Camaro Z28"
[25] "Pontiac Firebird" "Fiat X1-9" "Porsche 914-2"
[28] "Lotus Europa" "Ford Pantera L" "Ferrari Dino"
[31] "Maserati Bora" "Volvo 142E"

colnames() and rownames() functions
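
Like names, these two functions can also be used to set labels. The sketch below assumes a small matrix m that we make up purely for illustration.

```r
# A small 2 x 3 matrix made up for illustration
m <- matrix(1:6, nrow = 2)
colnames(m)                       # NULL: a new matrix has no column names

# Assign column and row labels
colnames(m) <- c("x", "y", "z")
rownames(m) <- c("row1", "row2")
colnames(m)
rownames(m)
```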

What does this object look like?


Sometimes we may just want to see the information contained in an object. Here we will discuss
functions that allow you to see parts of objects.
The print function displays the entire contents of an object.

print(mtcars)

Recall that in R, the Console is where commands can be typed and entered for R to run. When R is
ready to accept a command, a greater than sign (>) is displayed. An alternative to calling the print
function is to simply type the name of the object in the Console and press enter. In general, printing
an entire object is not advisable because the object may be quite large, in which case your screen
would overflow with text!

mtcars

printing objects’ contents to the screen

Safer alternatives to printing are the head and tail functions. The head function displays the
beginning of an object. By default, it shows the first 6 items. If the object is a vector, head shows
the first 6 entries. If the object is a rectangle, such as a matrix or a data frame, head shows the first
6 rows. The tail function is analogous to head but for the end of the object.

> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

> tail(mtcars)
                mpg cyl  disp  hp drat    wt qsec vs am gear carb
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2

head() and tail() can be used to see a portion of the data
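
If six items is too many or too few, both functions accept an n argument. The short sketch below uses the built-in mtcars data frame and the built-in letters vector; the object names first_three and last_two are our own.

```r
# Show only the first 3 rows instead of the default 6
first_three <- head(mtcars, n = 3)
nrow(first_three)

# Show the last 2 rows
last_two <- tail(mtcars, n = 2)
rownames(last_two)

# head() also works on vectors
head(letters, n = 4)
```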

The summary function computes summary statistics for numeric data and performs tabulations for
categorical data, which are called factors in R.

> summary(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
Median :5.800 Median :3.000 Median :4.350 Median :1.300
Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
Species
setosa :50
versicolor:50
virginica :50

The summary() function summarizes data

The unique function shows only the unique elements of an object. For vectors, this returns the set
of unique elements. For rectangles such as matrices and data frames, this returns the unique rows.
This function is useful if we want to check the coding of our data. If we have sex information, then
we expect the result of unique to be two elements. If not, there is likely some data cleaning that
must be done. The unique function is also useful for simply exploring the values that a variable can
take. In the example below, we can see that in the mtcars data frame, there are only cars with 6, 4,
and 8 cylinders. Note that to extract the column corresponding to cylinders, we used a dollar sign
followed by the column name: $cyl. This is an example of subsetting that you will learn in later
lessons.

> unique(mtcars$cyl)
[1] 6 4 8
> dat <- data.frame(a = c(1,1), b = c(2,2))
> dat
a b
1 1 2
2 1 2
> unique(dat)
a b
1 1 2

The unique() function shows unique elements of an object
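
To make the data-cleaning use concrete, here is a minimal sketch with a made-up vector of responses (the object sex and its values are hypothetical). Seeing more than the expected number of distinct values is a hint that the coding is inconsistent.

```r
# Hypothetical survey responses with inconsistent coding
sex <- c("F", "M", "f", "M", "F", "male")
unique(sex)            # four distinct values, which signals cleaning is needed
length(unique(sex))    # number of distinct values
```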

Errors, Warnings, and Messages


In R, there are three types of information that R may print to your screen to provide you with
additional information. These come in the form of errors, warnings, and messages. While they
will often look similar to one another, it’s important to understand the difference between them.
The most serious of these messages is an error message. Errors indicate that the code you tried to
run did not run successfully. If you receive an error message, you should carefully look back at
your code to see what went wrong. Error messages cannot be ignored as they indicate that there
was no way for the code to run. Something has to be fixed before moving forward. For example, the
code here produces an error, since mtca is not a data frame or object in R.

unique(mtca$cyl)

errors

Warnings are generally less serious than error messages. They are generated when the code
executes (meaning, it runs without producing an error and stopping), but produces something
unexpected. Warning messages should always be read, and then you, the person writing the code,
have the option to decide whether or not the code that generated the warning needs to be
rewritten. For example, the log function is only defined for numbers greater than zero. If, in R, you
try to take the log of a negative number, you get an output (NaN, meaning “not a number”):

log(-1)

This output means the code executed (there was no error), but you also get a warning letting you
know that NaNs were produced. If you meant to take the log of a negative number, you would leave
the code as is. However, if you did not intend to do this, the warning message helps clue you into
the fact that you may want to revisit your code.

warnings

Last but not least, messages, in general, are simply there to provide you with more information.
They do not indicate that you have done anything wrong. For example, if you were to run a function
that creates a directory if it does not yet exist, the function may provide you a message informing
you whenever a new directory has been created. This message would just be there to provide you
with more information. No further action is generally necessary when a message is provided.

messages

Note that all three are in the same font and same color, so they’ll look similar in your RStudio
Cloud console. Over time, you’ll get more comfortable dealing with and understanding the difference
between the three. For now, be sure to remember that if you get an error, your code did not execute
successfully. Go back and find what caused the error.
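
One way to see the three types side by side is to trigger each on purpose. The sketch below uses tryCatch, a base R function that is not covered in this lesson, to capture each type and replace it with a short label of our own. The labels and object names are made up for illustration, and the sketch assumes no object called mtca exists in your session.

```r
# Error: mtca does not exist, so the code cannot run
err_result <- tryCatch(
  unique(mtca$cyl),
  error = function(e) "an error occurred"
)

# Warning: the code runs, but R flags the unexpected NaN
warn_result <- tryCatch(
  log(-1),
  warning = function(w) "a warning occurred"
)

# Message: purely informational
msg_result <- tryCatch(
  message("New directory created"),
  message = function(m) "a message occurred"
)

c(err_result, warn_result, msg_result)
```

You do not need tryCatch for everyday work at this stage. It is used here only so that all three cases can be run in a single script without stopping.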

Summary
In this lesson, you have been introduced to a number of commonly-used commands (functions)
that are available to you in R. These will help you to determine the class of objects (class()),
figure out how big an object is (length(), dim(), nrow(), ncol()), get an idea of what the object
looks like (str(), head(), tail()), and summarize the data contained in the object (summary(),
unique()), among many others. Understanding the functions discussed in this lesson and becoming
very comfortable with what each of these does is incredibly important for moving forward and
programming in R. Finally, we discussed errors, warnings, and messages in R. This is the foundation
of what we’ll use throughout the rest of the course, so spend some time here and ensure that you
understand what the code does in each example before moving on!

Slides and Video

View this Video at https://youtu.be/OfsyXMY2sAw³⁹.


Basic commands in R

• Slides⁴⁰

Take this quiz online⁴¹


⁴⁰https://docs.google.com/presentation/d/1ew_I5lM283x6Xxlywznp-02tvCKcwxcvNMJeFM4gXUM/edit?usp=sharing
⁴¹http://leanpub.com/courses/jhu/cbds-intro-r/quizzes/quiz_04_basic_commands
Working with Logicals
Earlier in this course, you learned that one of the basic classes of objects in R is the class of
logical objects which contain TRUE and FALSE values. Logicals come up very frequently in data
management and analysis because they form the basis of conditional operations (if a condition is
met, perform a task) and are instrumental in data exploration, visualization, and analysis. In this
lesson, we will cover the tools you will need to work with logical values in R.
As you work through this lesson, you’ll see TRUE and FALSE a lot. That is because
there are only two options when it comes to logicals. However, this is an incredibly important and
helpful class of objects. So, take your time to understand each example. Copying and pasting the
code into your own RStudio, running it, and spending time to understand the output will really help
you understand how to work with logicals!

Logical operators
One of the most common ways to create and combine logical objects is to use logical operators.
Broadly speaking, operators are symbols that indicate some action. We introduced arithmetic
operators in an earlier lesson for performing routine arithmetic calculations. There was + for
addition, - for subtraction, * for multiplication, / for division, and ^ for exponentiation. Logical
operators in R perform actions relating to logic checking and include the following:

• !: the “not” operator
• &: the “and” operator
• |: the “or” operator (Shift + backslash (\))
• ==: the “equals” operator
• !=: the “not equal” operator
• >: the “greater than” operator
• >=: the “greater than or equal to” operator
• <: the “less than” operator
• <=: the “less than or equal to” operator
• %in%: the “contained in” operator

Logical operators are used with one to several R objects in order to create logical objects. These
logical objects are the result of checking conditions, and they store the answers to yes/no questions
that you may ask throughout your work. Let’s look at several examples.
We have data on ages of some students, and we have stored this information in the ages object. We
have information on a common age cutoff that is applied to all students for a particular activity. This
is stored in the common_cutoff object. For another activity, we have individualized age cutoffs for
each student. This is stored in the indiv_cutoffs object. Let’s ask several yes/no questions relating
to this data.

ages <- c(12, 17, 16, 13, 14)
common_cutoff <- 13
indiv_cutoffs <- c(12, 12, 14, 14, 14)

Do the students’ ages equal the cutoff? To answer this, we would use the “equals” operator ==. (Note:
the equals operator requires two equals signs (==). You’ll recall that a single equals sign (=) is used
for object assignment and is equivalent to <- . Whenever you want to ask if two things are equal be
sure you have both equals signs in your code!) Here, each number in the ages object is compared to
13. Only the fourth student meets this condition.

> ages == common_cutoff
[1] FALSE FALSE FALSE TRUE FALSE

The output from this code prints “TRUE” for the individual (the fourth person) who meets this
condition.
Do the students’ ages equal the individualized cutoffs? Here, each number in the ages object is
compared to the corresponding number in the indiv_cutoffs vector. Only the first and fifth students
meet this condition.

> ages == indiv_cutoffs
[1] TRUE FALSE FALSE FALSE TRUE

This is obvious in the output from R, where the first and the fifth values are TRUE, while the rest
are FALSE.
Usually cutoffs are a bound rather than a specification of an equality, so we may instead ask if the
students are older than the cutoff by using the “greater than” operator >.

> ages > common_cutoff
[1] FALSE TRUE TRUE FALSE TRUE
> ages > indiv_cutoffs
[1] FALSE TRUE TRUE FALSE FALSE

Are they at least as old as the cutoff? We can answer this with the “greater than or equal to” operator
>=.

> ages >= common_cutoff
[1] FALSE TRUE TRUE TRUE TRUE
> ages >= indiv_cutoffs
[1] TRUE TRUE TRUE FALSE TRUE

If the cutoffs are upper bounds instead of lower bounds, we can answer similar questions as above
using the “less than” < and “less than or equal to” <= operators.

> ages < common_cutoff
[1] TRUE FALSE FALSE FALSE FALSE
> ages < indiv_cutoffs
[1] FALSE FALSE FALSE TRUE FALSE
> ages <= common_cutoff
[1] TRUE FALSE FALSE TRUE FALSE
> ages <= indiv_cutoffs
[1] TRUE FALSE FALSE TRUE TRUE

So far we have treated the common cutoff and the individualized cutoffs separately, and we have
thus only used one logical operator at a time. We can use several logical operators simultaneously
to answer more complex yes/no questions. Are the students older than the common cutoff and the
individualized cutoffs? We can combine the “greater than” operator with the “and” & operator.

> ages > common_cutoff & ages > indiv_cutoffs
[1] FALSE TRUE TRUE FALSE FALSE

Are the students older than the common cutoff or the individualized cutoffs? We can combine the
“greater than” operator with the “or” | operator.

> ages > common_cutoff | ages > indiv_cutoffs
[1] FALSE TRUE TRUE FALSE TRUE

Are the students older than the common cutoff but not the individualized cutoffs? We can answer
this with the “not” operator or without it by reasoning through with the inequalities. In using the
“not” operator, it is a good idea to wrap the condition that you are negating in parentheses to enhance
clarity and avoid errors.

> ages > common_cutoff & !(ages > indiv_cutoffs)
[1] FALSE FALSE FALSE FALSE TRUE
> ages > common_cutoff & ages <= indiv_cutoffs
[1] FALSE FALSE FALSE FALSE TRUE

When working with complex logical expressions, it can help to store different parts of the
expression in their own objects. In reproducing the example above, we have stored the result
of the logical operation dealing with the common cutoff in the meets_common_cut logical object.
We have also stored the result of the logical operation dealing with the individual cutoffs in the
not_meets_indiv_cut logical object. These two objects can be combined at the end in a more
readable expression.

> meets_common_cut <- ages > common_cutoff
> not_meets_indiv_cut <- !(ages > indiv_cutoffs)
> meets_common_cut
[1] FALSE TRUE TRUE FALSE TRUE
> not_meets_indiv_cut
[1] TRUE FALSE FALSE TRUE TRUE
> meets_common_cut & not_meets_indiv_cut
[1] FALSE FALSE FALSE FALSE TRUE
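
When a question can be written in more than one way, it can be reassuring to confirm that the versions agree. The sketch below repeats the lesson's data and uses the base R function identical to compare the two forms of the "older than the common cutoff but not the individualized cutoffs" question. The names with_not and without_not are our own.

```r
ages <- c(12, 17, 16, 13, 14)
common_cutoff <- 13
indiv_cutoffs <- c(12, 12, 14, 14, 14)

with_not    <- ages > common_cutoff & !(ages > indiv_cutoffs)
without_not <- ages > common_cutoff & ages <= indiv_cutoffs

# TRUE when the two logical vectors match element by element
identical(with_not, without_not)
```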

Although these examples have all used numbers, logical operators can also be used for character and
factor objects. Let’s start with character objects. For comparing character objects, you will primarily
use the “equals” == and “not equal” != operators. For example, we have a character vector of colors.
Are the colors “red”?

> colors <- c("red", "red", "green", "orange", "blue")
> colors == "red"
[1] TRUE TRUE FALSE FALSE FALSE

Are the colors not “blue”?

> colors != "blue"
[1] TRUE TRUE TRUE TRUE FALSE

Here it is useful to introduce the “contained in” operator %in%. This operator checks if the elements
in the left hand object are contained in the right hand object. Are “red” and “purple” contained in
this set of colors? The length of the output is the same as the length of the left hand side. We ask
about two colors, “red” and “purple”, and we see that “red” is contained in the colors object but
“purple” is not.

> c("red", "purple") %in% colors
[1] TRUE FALSE

If we had reversed the command, we would instead be asking, “Are the colors in the colors object
contained in the red and purple set?” Only the instances of “red” will be marked as TRUE.

> colors %in% c("red", "purple")
[1] TRUE TRUE FALSE FALSE FALSE
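
The “contained in” operator combines naturally with the “not” operator when we want the opposite question. A minimal sketch, reusing the colors vector from above (the name not_red_or_purple is our own):

```r
colors <- c("red", "red", "green", "orange", "blue")

# Which colors are NOT in the red and purple set?
not_red_or_purple <- !(colors %in% c("red", "purple"))
not_red_or_purple
```

As recommended earlier, the condition being negated is wrapped in parentheses.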

When dealing with logical operations with factors, we will generally use only the “equals” == and
“not equal” != operators, because comparisons such as < and > are not meaningful for factors
without an ordering. Usually we will want to compare factor objects with values of their labels. Let’s look
at logical operations for the following factor object containing height category information.

> height_factor <- factor(c(2,1,2,3,1), levels = 1:3,
+                         labels = c("short", "average", "tall"))
> height_factor
[1] average short average tall short
Levels: short average tall

Although we create this factor object from integers, comparing it to the value 1 will not give desired
results. The intention in comparing it to the integer 1 is to mark the short individuals with TRUE. We
can do this by either coercing the factor object to an integer object with as.integer or by comparing
the factor to the string label “short”.

> height_factor == 1
[1] FALSE FALSE FALSE FALSE FALSE
> as.integer(height_factor)
[1] 2 1 2 3 1
## coerce object to be an integer
> as.integer(height_factor) == 1
[1] FALSE TRUE FALSE FALSE TRUE
## compare to label directly
> height_factor == "short"
[1] FALSE TRUE FALSE FALSE TRUE

When we coerce the object to be an integer, we get the expected output. The second and fifth elements
are TRUE, corresponding to the values of “1” in the height_factor object. The output is the same
when the labels are compared directly: TRUE is returned for every position in the
height_factor object where the factor label is (equal to) “short”.
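
If you are unsure which labels a factor uses, the base R functions levels and as.character can help before you write the comparison. The sketch below rebuilds the same height_factor object so that it can be run on its own.

```r
height_factor <- factor(c(2, 1, 2, 3, 1), levels = 1:3,
                        labels = c("short", "average", "tall"))

levels(height_factor)                  # the available labels, in order
as.character(height_factor)            # labels as a plain character vector
as.character(height_factor) == "short" # same answer as height_factor == "short"
```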

Logical functions
So far we have used logical operators to ask yes/no questions on a unit-by-unit basis. That is, asking
the question for each data observation. This has given us TRUE/FALSE answers for each unit. We
might also want to summarize the results of these multiple responses with questions such as “Do all
units meet the condition?” or “Do any (at least one) units meet the condition?”
For the first question, “Do all units meet the condition?”, we can use the all function. The all
function takes a logical object as input and returns TRUE if all values in the logical object are TRUE,
and it returns FALSE otherwise. Are all student ages equal to the individual cutoffs? Are all ages
greater than or equal to zero?

> all(ages == indiv_cutoffs)
[1] FALSE
> all(ages >= 0)
[1] TRUE

For the second question, “Do any units meet the condition?”, we can use the any function. The any
function takes a logical object as input and returns TRUE if at least one of the values in the logical
object is TRUE, and it returns FALSE otherwise. Are any of the student ages equal to the common
cutoff? Are any ages greater than 100?

> any(ages == common_cutoff)
[1] TRUE
> any(ages > 100)
[1] FALSE
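
The all and any functions are closely related: asking “are not all units TRUE?” is the same as asking “is any unit not TRUE?”. The sketch below illustrates this with the ages data; the object name older is our own.

```r
ages <- c(12, 17, 16, 13, 14)
common_cutoff <- 13

older <- ages > common_cutoff

# "Not all are older" is the same question as "at least one is not older"
!all(older)
any(!older)
```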

Often we will want to combine the asking of yes/no questions with “who” and “how many”
questions. Who meets the condition? How many units meet the condition? For the first question,
“Who meets the condition?”, we can use the which function. The which function takes a logical object
as input and returns the indices of TRUE values. In this example, we see that the first and second
colors are the ones that are contained within the red and purple set.

> colors %in% c("red", "purple")
[1] TRUE TRUE FALSE FALSE FALSE
> which(colors %in% c("red", "purple"))
[1] 1 2
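
The same idea applies to the ages example. In the sketch below, the positions returned by which are stored in an object we call older_idx (a name chosen for illustration) and then reused. The last line uses single square brackets to pull out the matching ages, a form of subsetting covered in more detail later.

```r
ages <- c(12, 17, 16, 13, 14)
common_cutoff <- 13

older_idx <- which(ages > common_cutoff)
older_idx            # positions of students older than the cutoff
length(older_idx)    # how many students that is
ages[older_idx]      # their ages, using the positions to subset
```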

To answer, “How many units meet this condition?”, we can make use of the sum and mean functions.
The idea here is that logical values have a correspondence with the integer values of 0 and 1. TRUE
values correspond to 1, and FALSE values correspond to 0. Thus when we create a logical object, we
can use sum to count the number of TRUE values, and we can use mean to compute the fraction of
TRUE values.

## assign logical to ages that are greater than
## or equal to indiv_cutoffs
> meets_indiv_cut <- ages >= indiv_cutoffs
> meets_indiv_cut
[1] TRUE TRUE TRUE FALSE TRUE
## sum that object
> sum(meets_indiv_cut)
[1] 4
## get the mean of that object
> mean(meets_indiv_cut)
[1] 0.8

Here, the sum of meets_indiv_cut is 4. When you sum a logical, R returns the number of TRUE
responses. Similarly, when you take the mean() of an object of class logical, you get the proportion
of responses that were TRUE. Here, that’s 4 out of 5, or 0.8.
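
The correspondence between logicals and the integers 0 and 1 can be seen directly with as.integer. The short sketch below also shows that the condition can be passed straight to sum or mean without storing it first, using the colors vector from earlier in this lesson.

```r
colors <- c("red", "red", "green", "orange", "blue")

as.integer(c(TRUE, FALSE))   # TRUE becomes 1 and FALSE becomes 0
sum(colors == "red")         # number of red entries
mean(colors == "red")        # fraction of red entries
```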

Summary
This lesson walked you through how to work with operators and logical objects. This will be
incredibly helpful as you start to manipulate and clean data. Having a thorough understanding
of this class of objects and how to work with them will serve you well going forward.

Slides and Video

View this Video at https://youtu.be/EQYYXuPpVdE⁴².


Working with Logicals

• Slides⁴³

Take this quiz online⁴⁴


⁴³https://docs.google.com/presentation/d/1aAtIEkECvpYeOCa1lg6BXEvlw_mvkGhbhRJMVeBlBg4/edit?usp=sharing
⁴⁴http://leanpub.com/courses/jhu/cbds-intro-r/quizzes/quiz_06_logicals
Lists and Data Frames
Now that we’ve covered basic object types and some commands to explore and work with them, we
will cover a slightly more complex type of object: lists. Lists have similarities to data frames, which
we have introduced briefly in previous lessons. In this lesson, you will learn about the structure of
list objects, how to create them, and how to subset them.

What are lists in R?


Previously, you have learned about basic classes of objects in R. These included character, integer,
numeric, logical, and factor objects. In particular, we covered the creation of simple vectors of objects
in these classes. These simple vectors are a collection of items all of the same class. In other words,
each slot of these simple vectors is a length-1 vector of the same class. The terms slot and element
are often used interchangeably. They both refer to a particular item within the collection.
Lists are also a type of vector, but they are more complex than the simple vectors we have learned
about so far because each slot in a list can contain objects of different classes. The objects in these
slots can be simple vectors, lists, or other types of objects.
Lists are a general and flexible way to store information, and it turns out that data frames, which
you have learned about in previous lessons, are a special case of lists. As you learn how to create
and work with lists in the remainder of this lesson, we will also cover connections to working with
data frames.

Vectors and Lists
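
A quick way to see the difference between a simple vector and a list is to give both the same mixed inputs. In the sketch below, the object names mixed_vector and mixed_list are made up for illustration.

```r
# c() builds a simple vector, so mixed inputs are converted to one class
mixed_vector <- c(1, "a", TRUE)
class(mixed_vector)            # "character"

# list() keeps each item as its own object with its own class
mixed_list <- list(1, "a", TRUE)
sapply(mixed_list, class)      # "numeric" "character" "logical"
```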

Creating a list
The main way to create a list from scratch in R is with the list function. The list function takes
as many arguments as you want to give it and creates a list with each of the specified objects. For
example, we could conduct a poll in a first grade classroom and ask students to name some numbers,
animals, and colors that come to mind. In the example below, we have created a list with three slots
to store the responses for a single student. In the first slot, we have a numeric vector containing
three numbers. In the second, we have a character vector containing two animal names. In the
third, we have a character vector containing six colors. This will create a list with three slots, where
the responses in each slot are a different length.
Within the list function, each of these objects is separated with a comma.

responses_student1 <- list(c(4,20,3), c("bear", "giraffe"),
                           c("red", "orange", "yellow", "green", "blue", "purple"))

This means that the length of the list is three (one for the numbers, one for the animal names, and
one for the colors), but the length of the response differs from slot to slot (3 for the
numbers, 2 for the animal names, and 6 for the colors).
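
These two kinds of length can be checked in R. The sketch below recreates the list and uses length for the number of slots and the base R function lengths (note the final s) for the length of the object in each slot.

```r
responses_student1 <- list(c(4, 20, 3),
                           c("bear", "giraffe"),
                           c("red", "orange", "yellow", "green", "blue", "purple"))

length(responses_student1)    # number of slots in the list
lengths(responses_student1)   # length of the object in each slot
```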
When we print this object to the screen, it looks as below. The double square brackets indicate the
slot number, or element number. So [[1]] indicates the first slot or first element of the list, and we
see that this first element is a length-3 numeric vector. The [[2]] indicates the second element, and
we see that it is a length-2 character vector. The [[3]] indicates the third slot, and we see that it
is a length-6 character vector. The double bracket notation alludes to one way that we can access
certain elements of a list. We will cover this in detail in the next section of this lesson.

> responses_student1
[[1]]
[1] 4 20 3

[[2]]
[1] "bear" "giraffe"

[[3]]
[1] "red" "orange" "yellow" "green" "blue" "purple"

Note that to access a single value within a list you can use the following notation: list[[element]][index].
For example, using the list above, responses_student1[[3]][1] would print “red” to the screen, as
this is the first value of the third element in the list responses_student1.
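
As a runnable sketch of this notation, the example below recreates the list and extracts a whole element, a single value, and a pair of values.

```r
responses_student1 <- list(c(4, 20, 3),
                           c("bear", "giraffe"),
                           c("red", "orange", "yellow", "green", "blue", "purple"))

responses_student1[[3]]       # the whole third element (all six colors)
responses_student1[[3]][1]    # first value of the third element
responses_student1[[1]][2:3]  # second and third values of the first element
```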
All that said, the information contained in this list would be improved with labels for the three list
elements. This is achieved by adding names to the list. As we learned about in the Basic Commands
lesson, we can do this with the names function.

names(responses_student1) <- c("numbers", "animals", "colors")

Now when we display the object, we can see these labels. The double square brackets have been
replaced with $label where label is one of the names that we just entered. Similar to the double
bracket notation, the dollar sign notation here indicates another way that we can access certain
elements of a list. We will cover this in detail in the next section of this lesson.

> responses_student1
$numbers
[1] 4 20 3

$animals
[1] "bear" "giraffe"

$colors
[1] "red" "orange" "yellow" "green" "blue" "purple"

We can also specify the names of the list elements from the start in the list function. Let’s create
another list object that contains poll responses for a second student. We can specify the names as
argument names in the list function by stating the name of each slot in the list followed by an
equals sign and then the values you want in that slot in the list after. Again, each slot is separated
by a comma.

responses_student2 <- list(numbers = 1:5, animals = c("T-rex", "tiger", "lion"),
                           colors = c("red", "green"))

When we display this object, we see that the names have been added automatically.

> responses_student2
$numbers
[1] 1 2 3 4 5

$animals
[1] "T-rex" "tiger" "lion"

$colors
[1] "red" "green"

To highlight the flexibility and complexity of lists, note that lists can be contained within lists! We
create a new list object that contains the responses for both students.

responses_all_students <- list(responses_student1, responses_student2)

Here we did not specify labels with argument names, so when we print this object, we see the double
square bracket notation return:

> responses_all_students
[[1]]
[[1]]$numbers
[1] 4 20 3

[[1]]$animals
[1] "bear" "giraffe"

[[1]]$colors
[1] "red" "orange" "yellow" "green" "blue" "purple"

[[2]]
[[2]]$numbers
[1] 1 2 3 4 5

[[2]]$animals
[1] "T-rex" "tiger" "lion"

[[2]]$colors
[1] "red" "green"

If we had specified argument names to label this list, the resulting object would look as follows:

> list(st1 = responses_student1, st2 = responses_student2)
$st1
$st1$numbers
[1] 4 20 3

$st1$animals
[1] "bear" "giraffe"

$st1$colors
[1] "red" "orange" "yellow" "green" "blue" "purple"

$st2
$st2$numbers
[1] 1 2 3 4 5

$st2$animals
[1] "T-rex" "tiger" "lion"

$st2$colors
[1] "red" "green"
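
The printed labels such as $st2$animals hint at how values inside a nested list can be reached: one step for the outer list and one for the inner list. The sketch below uses a shortened version of the two students' responses (the object responses_all is our own, and the colors are left out to keep it brief).

```r
responses_all <- list(
  st1 = list(numbers = c(4, 20, 3), animals = c("bear", "giraffe")),
  st2 = list(numbers = 1:5, animals = c("T-rex", "tiger", "lion"))
)

responses_all$st2$animals          # all animals named by the second student
responses_all[[2]][["animals"]][1] # the same student's first animal, by position and name
names(responses_all)               # labels of the outer list
```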

Lists and data frames


We mentioned at the start of this lesson that data frames are a special case of lists. In particular,
data frames are lists where each element (column) is a simple vector of the same length. In the car
information data frame subset below, each column from mpg to carb is a simple vector. They are all
numeric vectors of length 6. Each element (column) in a data frame must be the
same length. The car models listed on the left-hand side do not actually form a column in the data
frame, but rather, they are the row names of the data frame. This ability to have row names is a
special feature of data frames that lists do not have.

mtcars data frame

We can see the relationship between data frames and lists by using the coercion function as.list.
We can see the familiar dollar sign notation indicating that the names of the list correspond to the
column names of the data frame. We can also see that the simple vectors in each of these slots have
length 6.

> as.list(head(mtcars))
$mpg
[1] 21.0 21.0 22.8 21.4 18.7 18.1

$cyl
[1] 6 6 4 6 8 6

$disp
[1] 160 160 108 258 360 225

$hp
[1] 110 110 93 110 175 105

$drat
[1] 3.90 3.90 3.85 3.08 3.15 2.76

$wt
[1] 2.620 2.875 2.320 3.215 3.440 3.460

$qsec
[1] 16.46 17.02 18.61 19.44 17.02 20.22

$vs
[1] 0 0 1 1 0 1

$am
[1] 1 1 1 0 0 0

$gear
[1] 4 4 4 3 3 3

$carb
[1] 4 4 1 1 2 1

Here, when displayed as a list, each column from the data frame is now a different element of the
list. This is meant to highlight the fact that data frames and lists are related. All data frames are lists.
But, data frames have the special constraint that each element must contain the same number of items as
all the other elements in the data frame and the special ability to have row names. Thus, data frames
are less flexible than lists.
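
Both points can be checked in a few lines. The sketch below uses the built-in mtcars data frame, then tries to build a data frame from columns of unequal length. It uses tryCatch (a base R function not covered in this lesson) so that the resulting error is replaced by a short label of our own instead of stopping the script. The names bad and ok are made up for illustration.

```r
is.list(mtcars)                    # TRUE: a data frame is a list
length(mtcars) == ncol(mtcars)     # one list element per column

# Columns of unequal length are not allowed in a data frame
bad <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) "columns must have the same length"
)
bad

# A plain list has no such restriction
ok <- list(a = 1:3, b = 1:2)
lengths(ok)
```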

Subsetting lists
It is often the case that we want to work with part of the information in an object, but not all of it.
As alluded to in the previous section, we can subset lists using double square bracket or dollar sign
notation. Because data frames are a special type of list, data frames can also be subset using double
bracket or dollar sign notation. In addition to double bracket and dollar sign notation, we will cover
single bracket notation for subsetting, and we will discuss the differences between these approaches.

Double square brackets

When using double square brackets, either an integer (e.g., 2) or a character string (e.g., “gear”) is
specified within the brackets. An integer specifies the index (also referred to as the position) within
the list to extract. A character string specifies that the extraction should be done by name. The
example below shows how both of these methods can be used to extract the second element of
the list l. Note that the extracted objects are character vectors, which we can see with the class
function. This is in contrast to the class of l being a list.

> ## create list
> l <- list(a = 1:7, b = c("foo", "bar", "biz"))
> ## extract by index
> res1 <- l[[2]]
> ## extract by character string
> res2 <- l[["b"]]
> res1
[1] "foo" "bar" "biz"
> res2
[1] "foo" "bar" "biz"
> class(l)
[1] "list"
> class(res1)
[1] "character"
> class(res2)
[1] "character"

This works similarly for data frames. The example below uses the same idea for a subset of the iris
data frame object.

> iris_subset <- head(iris, 3)
> iris_subset
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
> iris_subset[[2]]
[1] 3.5 3.0 3.2
> iris_subset[["Sepal.Width"]]
[1] 3.5 3.0 3.2

Dollar signs

When using dollar sign notation for subsetting, the syntax is object$name, where object is the
name of the list/data frame object, and name is the label for the element you want to extract. Here,
in contrast to double square bracket notation, we must have names for the elements we want to
extract. In the example below, we extract the element named “b” in the list object l, and we extract
the “Sepal.Width” column in the iris_subset data frame.

> l$b
[1] "foo" "bar" "biz"
> iris_subset$Sepal.Width
[1] 3.5 3.0 3.2

Single square brackets

When using either double square brackets or dollar signs to extract list/data frame elements, the
subsetting is simplifying the output. That is, the original object was a list/data frame and the
extracted object is simpler than a list - it is a simple vector. The opposite of a subsetting operation
that simplifies the class of the output is a subsetting operation that preserves the class of the
output. For lists/data frames this means that the output of the subsetting is also a list/data frame.
Class preservation is achieved using single square bracket subsetting. As with double square bracket
subsetting, an integer index or a character string can be specified within the brackets.

> l[2]
$b
[1] "foo" "bar" "biz"

> l["b"]
$b
[1] "foo" "bar" "biz"

> class(l[2])
[1] "list"
> class(l["b"])
[1] "list"

Here we can tell from the printed output that the extracted output is a list because we can see the
$b printed in the output. We can also verify this with the class function. In the example below, we
show the same for the iris_subset data frame.

> iris_subset[2]
Sepal.Width
1 3.5
2 3.0
3 3.2
> iris_subset["Sepal.Width"]
Sepal.Width
1 3.5
2 3.0
3 3.2

> class(iris_subset[2])
[1] "data.frame"
> class(iris_subset["Sepal.Width"])
[1] "data.frame"

Here we can tell from the printed output that the extracted output is a data frame because we can
see that it is printed as a column with a header and row numbers. We can also verify this with the
class function.
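
To put the simplifying and preserving operations side by side, here is a minimal sketch. The list scores is a hypothetical object made up for this illustration (it is not the list l from the lesson); iris is the data set that ships with R.

```r
# Hypothetical list for illustration; not the list `l` from the lesson.
scores <- list(a = 1:3, b = c("foo", "bar", "biz"))

# Simplifying: `[[` and `$` return the element itself (here, a character vector).
simplified <- scores[["b"]]
class(simplified)                     # "character"
identical(simplified, scores$b)       # TRUE

# Preserving: `[` returns a list that contains the element.
preserved <- scores["b"]
class(preserved)                      # "list"
length(preserved)                     # 1

# The same pattern holds for a data frame.
iris_subset <- head(iris, 3)
class(iris_subset[["Sepal.Width"]])   # "numeric"
class(iris_subset["Sepal.Width"])     # "data.frame"
```

If you are ever unsure which kind of object you ended up with, calling class() on the result, as above, is a quick check.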

Summary
Lists are a flexible way to store complex data. We have seen how to create lists using the list
function in this lesson. As you move through the course, you will learn about ways of getting data
and information into R that create list objects automatically.
Working with subsets of data will be a daily part of your work routine, so familiarity with subsetting
operations will be vital. Sometimes when data is acquired, labels will be present, and sometimes they
won’t, so it is important to be comfortable with knowing when integer indices can be used to subset.
Knowing the difference between class-simplifying and class-preserving subsetting operations is also
important so that you know exactly what type of object you are working with. You will gain an
appreciation for this as you move through the courses and work on projects.

Slides and Video

View this Video at https://youtu.be/tVEVFLxmi2M⁴⁵.


Lists

• Slides⁴⁶

Take this quiz online⁴⁷


⁴⁶https://docs.google.com/presentation/d/10_DQyZ_g9h-MB8yRaen-GFcH9boTwgrF6GLdVUS7Cvk/edit?usp=sharing
⁴⁷http://leanpub.com/courses/jhu/cbds-intro-r/quizzes/quiz_06_lists
Writing Functions in R
Up to this point, you’ve come across a number of functions. length() will tell you the length of an
object. names() will allow you to assign names to your object. And nrow() will tell you the number
of rows in your data frame. These are all functions within R. Very generally, all of these functions
take some input (often, an object in R), do something using that input when the function runs, and
then provide the user (you!) some output.
In addition to built-in functions, R allows users to write code and develop additional functions
that work alongside these included functions. We talked about these types of functions when
we talked about packages in an earlier lesson. In these cases, developers wrote code to accomplish a
task that was not included in the base functions that come with R.
While R packages are a great way to add functionality to your work in R, functions can be written
outside of packages. You can write your own functions at any point in time in R. When you write
code to carry out a function, you generate a User-Defined Function (UDF). UDFs are functions
beyond the set of functions that come built into R, and are written by users.
UDFs are incredibly helpful as you code. While they may seem intimidating now, take your time
working through this lesson. It’s worth the time investment now to really understand how to write
a function.

When to write a function


Writing functions can be a daunting task as you learn to code. As you get more comfortable coding,
you’ll naturally get more comfortable writing functions. However, you will always learn better and
more efficient ways to write functions. This means that the learning never stops, which can be a lot
of fun! But, if this is a difficult lesson, you’re not alone. Functions take a while to get the hang of
but are incredibly helpful once you do!
That said, when you’re new to coding, it often seems easier to avoid writing functions and just copy
and paste code over and over again each time you have to carry out the task that that chunk of
code carries out. However, a really important rule of thumb is that you should write a function
whenever you’ve copied and pasted code more than once. So, if you’ve written a chunk of code
and then used it in at least two more places, write a function. It will save you time in the long run!

How will writing functions help me?


Writing functions for tasks that you carry out multiple times is helpful because:

1. It makes your code more readable. - By having a single function such as, for example,
summarize_samples( samples ), instead of requiring some twenty lines of code to summarize
your samples in the middle of your script file, readers (and you later on when you come back to
this code!) will quickly and easily be able to determine what that part of your code accomplishes
by that single function call.
2. You only have to update code in one place. - After writing a function, to update this code in
the future (and trust me, you always end up editing code), you only have to update the function
code once. Alternatively, if you had copied and pasted this code all over your script files, you’d
have to find each instance, and edit the code in each individual case. From personal experience,
save yourself the hassle, and write a function!
3. You’ll avoid accidental errors. - Inevitably, as you copy and paste you’ll decide you want to
change an object name here or there. However, if you do this in one place, but forget to do
this in another place where you’ve copied and pasted the code, you’ll run into errors. By writing
functions, this issue is avoided!
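
As a small illustration of these three points, the sketch below does the same calculation twice by copy and paste and then once with a function. The objects heights and weights and the function name rescale_values are hypothetical, chosen only for this example.

```r
# Hypothetical measurements; the names below are illustrative only.
heights <- c(150, 160, 170, 180)
weights <- c(50, 60, 80, 90)

# Copy-and-paste version: the same calculation is typed out twice.
heights_scaled <- (heights - min(heights)) / (max(heights) - min(heights))
weights_scaled <- (weights - min(weights)) / (max(weights) - min(weights))

# Function version: the calculation is written once and reused.
rescale_values <- function(values) {
  # Shift so the smallest value is 0, then divide so the largest is 1
  (values - min(values)) / (max(values) - min(values))
}

identical(rescale_values(heights), heights_scaled)   # TRUE
identical(rescale_values(weights), weights_scaled)   # TRUE
```

If the calculation ever needs to change, only the body of rescale_values has to be edited, rather than every pasted copy.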

How to write a function


There are three main components to a function. Each function has a:

1. name
2. argument(s)
3. body

We’ll break down each of these three parts in this section, but know that generally every function
in R will use this format:

name <- function(arguments){
  body
}

You will assign a name to each function you write. This will go to the left of the assignment operator:
<-. Note that to create a function you will then call the built-in R function function (what a perfect
function name!). The arguments of the function you’re writing will go inside the parentheses after
you call function. This will be followed by curly braces. All the code you want your function to
execute will go within these curly braces. Now that you have an idea of what the basic components
of a function in R are, we’ll discuss each in greater detail.

Naming your function

Naming your function may seem trivial but is an important step in writing a function. To be clear,
R doesn’t really care what you name your function. You could name your function “asdfklj,” and as
long as the code in the body of the function is correct, R will know what to do and run your function
correctly. So, if R doesn’t care about the name, why should you? Well, honestly, you aren’t naming
your function well for R, you’re naming it well for humans (including your future self!).
Good function names in R explain what the function does clearly and succinctly. Thus, you’ll want
to choose function names that are the shortest possible names but still clearly describe what the
function does. Additionally, function names are generally verbs, or action words. This is because
functions are usually doing something, so it makes sense to have the function name reflect that.
Above I mentioned you may often want to summarize some details about the samples in your dataset.
Thus, you would likely want to write a function called summarize_samples. This is a reasonable
function name. It clearly describes what the function does. It may be a tad long, but it’s better than
the alternatives. ss is way too short for a function name, and summarize doesn’t do the trick here
because it’s not specific enough. So, summarize_samples is a reasonably good function name.
You may have noticed that the two words summarize and samples are joined by an underscore.
Function names, like objects in R, cannot contain spaces. So, I could have chosen summarizesamples,
but that’s hard to read quickly. Or, I could have chosen SummarizeSamples, but that’s harder to type,
having to capitalize two letters. Thus, it’s generally best to use what is known as snake case for
functions. Snake case means that all words include only lowercase letters and that separate words
are connected by an underscore (_). No spaces are allowed in snake case.
Lastly, you’ll want to choose a name that is not already a function that exists in R. For instance, you
would want to avoid naming a function mean, since that name is already used by R.
To recap, the best function names are:

• short
• clear
• descriptive
• verbs (action words)
• “snake case”
• not already R functions
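
To see why the last point matters, here is a minimal sketch of what happens if you reuse an existing name. The replacement function below is deliberately silly and exists only for this illustration.

```r
# Defining a function called `mean` in your workspace hides the built-in one.
mean <- function(values) "this is not an average"

masked_result <- mean(c(1, 2, 3))        # calls our function, not R's
base_result <- base::mean(c(1, 2, 3))    # base:: asks for the built-in version

# Removing our definition makes `mean` refer to the built-in function again.
rm(mean)
restored_result <- mean(c(1, 2, 3))
```

Here masked_result holds the text "this is not an average", while base_result and restored_result both hold the actual average, 2.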

Function arguments

Now that we’ve discussed how to name the functions you write, we have to discuss the arguments,
or inputs to your functions. Arguments go within the parentheses after function. A function can
have multiple arguments, but every function must have at least one argument. Arguments tell the
function on what input values the function should act.
For example, let’s return now to a function you’re already familiar with: the length function. This
function tells you the length of an object. Thus, the argument to this function is the object for which
you want to get the length.
Take the following code into consideration:

x <- c(1, 3, 7, 19)
length(x)

The object x contains four numbers. When you want to run the length function, you have to tell
R what you want the length of. Thus, the argument required is the object, which in this case is the
object x.
For functions you write, you will have to tell the function what the inputs are for the function by
defining at least one argument within the parentheses after you call function. We’ll work through
an example of this later in the lesson, but for now it’s important to note that unlike functions (which
are verbs), arguments tend to be nouns. Generally, arguments describe on what object the function
(verb) should be carried out. Nouns are good words to accomplish this.
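
As a quick sketch of the verb-and-noun idea, the hypothetical function below (round_mean is not part of R or of this lesson) has a verb-like name and two noun-like arguments, values and digits.

```r
# Hypothetical function: the arguments `values` and `digits` are nouns
# describing what the function acts on.
round_mean <- function(values, digits) {
  round(mean(values), digits)
}

# formals() lists the arguments a function was defined with.
argument_names <- names(formals(round_mean))   # "values" "digits"

# Arguments can be matched by position or by name.
by_position <- round_mean(c(1.234, 2.345), 1)
by_name <- round_mean(digits = 1, values = c(1.234, 2.345))
```

Both calls give the same result, 1.8, because naming the arguments makes their order irrelevant.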

Body of your function

Finally, the code you want to run using the input you’ve defined as an argument is included in the
body of the function. While the name of the function was for the humans using the function or
reading your code, this section is primarily for the computer. This section will only run properly if
the R code is correctly written.

Commenting your function


So, it was just mentioned that the body of your function is for the computer. While that’s true (the
computer will know what to do if your R code is correct!), to make your function even better the
code within your function should include comments that make sense to humans.
By including lines in your code that start with a pound sign (#) and are followed by human-readable
text, you make your function easily understandable to anyone trying to figure out what your code
does. See examples of this as we work through an example function below.
Additionally, for long functions with a lot of steps, it’s helpful to break up the sections of your
code with dashes (-) that visually separate the sections for others when reading the function. For
example, below, <YOUR CODE HERE> represents lines of code that carry out the task explained in
the comment above. The comment lines (#) with dashes (-) help visually separate out the sections
for anyone looking at this code. Note, the computer will skip over any line of code that starts with
a pound sign (#), so these comments are only helpful to humans. But they are incredibly helpful to
humans. Always take the time to comment your code, especially in functions.

# Read in sample --------------------------------------------------------

<YOUR CODE HERE>

# Calculate sample information --------------------------------------

<YOUR CODE HERE>

# Generate summary table --------------------------------------------

<YOUR CODE HERE>

Function Output
In R, the default for any function is to return the last statement evaluated. If the last bit of code
computes a value, the function will return the last value that was computed. So, if you write code
and the last value calculated in the body of your function is what you want, then you’re all set.
If, however, you want the function to return something else as its output (such as an object you
created earlier in the function or multiple values as the function’s output), you’ll need to specify
that using the return function. The argument of the return function is the object you want the
function to return. So generally, you would include something like the following at the end of the
body of your function (still within the curly braces):

return(object_you_want_to_return)
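
A minimal sketch of both behaviors, using hypothetical function names made up for this illustration: the first function relies on the default (the last value computed), and the second uses return to hand back several values bundled in a list.

```r
# Implicit return: the value of the last expression is the output.
square_value <- function(number) {
  number^2
}

# Explicit return: bundle several values into a named list and return it.
summarize_values <- function(values) {
  smallest <- min(values)
  largest <- max(values)
  return(list(smallest = smallest, largest = largest))
}

squared <- square_value(4)              # 16
summary_list <- summarize_values(c(3, 9, 4))
summary_list$largest                    # 9
```

Because the output of summarize_values is a list, you can pull out each piece with the subsetting operations from the previous lesson.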

An example function: Converting from Celsius to Fahrenheit


Now that we’ve covered the basic components of writing a function in R, we can walk through this
using an example function.
Specifically, imagine you need to convert degrees Celsius to degrees Fahrenheit. You want to know
what 0, 20, and 100 degrees Celsius are in Fahrenheit. We can write a function to accomplish this!
If you were to Google how to convert Celsius to Fahrenheit, you’d find that the conversion formula
is: T(F) = (9/5) * T(C) + 32. So to calculate what 0, 20, and 100 degrees Celsius are you can do the
calculation above for each temperature by hand. You could take zero degrees Celsius, multiply it by
(9/5) and then add 32 to it. Then, you could repeat this for 20 C and then again for 100 C. The more
temperatures you need to convert, the longer this will take you. But, as you’re repeating a task (a
calculation) multiple times and just changing the input for each calculation, it is the perfect time to
write a function. Instead of doing something over and over by hand (or by copying and pasting code),
you can just write a function in R to do the work for you!
Take for example this function:

celsius_to_fahrenheit <- function(C){
  C * (9/5) + 32
}

To recap:

• celsius_to_fahrenheit is the name of the function.
• C is the argument (or input).
• C * (9/5) + 32 is the body, and tells R what to do to the input.

Specifically, this code says, take whatever input C is, multiply it by (9/5) and then add 32 to that
number. This is exactly what we want to do to convert from Celsius to Fahrenheit! So, just copy and
paste this code into R, and then try running the following code!

Running a Function

To run a function, you take the function name (here, our function name is celsius_to_fahrenheit)
and then put the values for the input arguments within the parentheses. Here, our input is a
temperature in degrees Celsius, and specifically, we’ll use 70 degrees Celsius as our input for this
example.

> celsius_to_fahrenheit(70)
[1] 158

Here, the output lets us know that 70 degrees Celsius (our input argument) is 158 degrees Fahrenheit
(the output from running the function).
Additionally, this example demonstrates that sometimes a function can carry out a necessary and
important task with only a single line of code. But, what about our initial question? What about
calculating what 0, 20, and 100 degrees Celsius are in Fahrenheit?

> celsius_to_fahrenheit(c(0, 20, 100))
[1] 32 68 212

By passing these three values to our function as a single vector, we quickly and easily calculate the
three temperature values in Fahrenheit! And, we didn’t have to do each by hand!
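
This works because arithmetic in R is applied to every element of a vector. As a small sketch building on that, the code below converts a whole sequence of temperatures at once and stores the inputs next to the outputs. The object names celsius_values and conversion_table are made up for this example.

```r
celsius_to_fahrenheit <- function(C){
  C * (9/5) + 32
}

# A sequence of Celsius temperatures: 0, 25, 50, 75, 100
celsius_values <- seq(0, 100, by = 25)

# One call converts every value; a data frame keeps input and output together.
conversion_table <- data.frame(
  celsius = celsius_values,
  fahrenheit = celsius_to_fahrenheit(celsius_values)
)
conversion_table
```

Each row of conversion_table pairs a Celsius temperature with its Fahrenheit equivalent, so there is no need to call the function once per temperature.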
While the goal of this function is pretty straightforward and only accomplishes one task, functions
can and do get a lot more complicated and include a lot more code. However, even when this is the
case, the overall structure of the function in R will not change. The function will still have a name,
at least one argument, and code in the body of the function. Let’s take a look at how functions can
get a little more complicated.

Customizing the function’s output: return()

Now that we know what the basic structure of the function is, let’s make it a little bit fancier.
We mentioned it briefly above, but the return function can help customize what your function
returns. In the example above, we took advantage of the fact that R will automatically return the
last statement evaluated (which in this case is the last value calculated). Suppose instead that, when
the function is called, we would like it to return a sentence that is clearer.

celsius_to_fahrenheit <- function(C){
  F <- C * (9/5) + 32
  return(paste("The entered Celsius temperature is", F, "degrees Fahrenheit."))
}

Here, the first line of code takes the input argument, the degrees Celsius (C), and converts it to
degrees Fahrenheit, as the function did above. However, this time, that line of code is assigned to an
object F. The additional line of code uses the return function to explicitly state what the function
should return. Here, the function paste is used within return to combine the text “The entered
Celsius temperature is” with the value of variable F followed by “degrees Fahrenheit”, where each
is separated by a comma.
Note that the text we want paste to display is in quotes while the object F is not in quotes. This is
how R knows that the text is supposed to be displayed as text while F is supposed to be the value of
that object.
Now if we were to call this updated function, the output would provide a sentence that makes
explicitly clear what the output of the function means:

> celsius_to_fahrenheit(70)
[1] "The entered Celsius temperature is 158 degrees Fahrenheit."

Just as we wanted, we have a clear sentence output with the conversion of 70 C to 158 F!
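
One consequence worth seeing for yourself: because paste builds text, this version of the function returns a character string rather than a number. The sketch below repeats the function from above and checks what kind of object comes back; sentence and sentences are names chosen just for this illustration.

```r
celsius_to_fahrenheit <- function(C){
  F <- C * (9/5) + 32
  return(paste("The entered Celsius temperature is", F, "degrees Fahrenheit."))
}

sentence <- celsius_to_fahrenheit(70)
class(sentence)        # "character"
is.numeric(sentence)   # FALSE, so it can no longer be used in arithmetic

# paste works element by element, so a vector of inputs gives one sentence each.
sentences <- celsius_to_fahrenheit(c(0, 20, 100))
length(sentences)      # 3
```

If you need the number for later calculations, the earlier version that returns the value itself is the better fit; the sentence version is meant for reading.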

Required arguments

Earlier we stated that functions require at least one argument. That wasn’t quite true. You can write
a function that does not have an argument explicitly stated, but that instead requires input from
the user. While we’ll include an example here, we won’t go into explicit detail as these types of
functions are not frequently used in R code. For instance, we can write a function that can only be
called by celsius_to_fahrenheit() and no input is passed in the function. When this function is
run, a prompt appears asking the user to enter a value.

celsius_to_fahrenheit <- function(){
  # readline() returns text, so convert it to a number (decimals are kept)
  C <- readline(prompt="Enter a value in Celsius: ")
  F <- as.numeric(C) * (9/5) + 32
  return(paste("The entered Celsius temperature is", F, "degrees Fahrenheit."))
}

When this is run, the output would look as follows:

> celsius_to_fahrenheit()
Enter a value in Celsius: 70
[1] "The entered Celsius temperature is 158 degrees Fahrenheit."
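
Two details are easy to miss here: readline hands back whatever was typed as text, and it only waits for typing when R is being used interactively. The sketch below is one possible variation, not the lesson's own code: it moves the conversion into a hypothetical helper, text_to_fahrenheit, so that part can be tried out without typing anything, and it uses as.numeric so that decimal input is kept.

```r
# Hypothetical helper (not from the lesson): convert typed text to Fahrenheit.
text_to_fahrenheit <- function(text) {
  C <- as.numeric(text)   # readline() returns text, so convert it to a number
  C * (9/5) + 32
}

# The prompting function now only asks for input and builds the sentence.
celsius_to_fahrenheit <- function(){
  C <- readline(prompt = "Enter a value in Celsius: ")
  F <- text_to_fahrenheit(C)
  return(paste("The entered Celsius temperature is", F, "degrees Fahrenheit."))
}

# The helper can be checked directly, with text standing in for typed input.
whole_number <- text_to_fahrenheit("70")     # 158
with_decimal <- text_to_fahrenheit("36.6")   # the .6 is not dropped
```

Calling celsius_to_fahrenheit() in the RStudio console would still show the prompt and wait for you to type a value.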

Functions with multiple arguments

Functions can be a lot more advanced. For example, while we stated earlier that functions can
have multiple inputs, we haven’t exactly shown what that would look like. So, going back to our
Celsius/Fahrenheit example, let’s write a function that takes a temperature and can convert it both
from Celsius to Fahrenheit (as it did above) and can go the other direction, taking a temperature
from Fahrenheit to Celsius. To use this function, you’ll not only have to provide the temperature
you want to convert as an argument (as above), but you’ll have to tell the function whether
the temperature you’re inputting is in Fahrenheit or Celsius. Specifically, if the user passes the
values of 10 and C (for Celsius) to the function, it will convert it to Fahrenheit and return the value,
and if the user passes the values of 10 and F (for Fahrenheit), it will convert it to Celsius and return
the value.

convert_temp <- function(temp, unit){
  if (unit=="C"){
    D <- temp * (9/5) + 32
  } else if (unit=="F") {
    D <- (temp - 32) * (5/9)
  } else {
    D <- message("Please enter a correct unit -- either F or C")
  }
  return(D)
}

Ok, admittedly, this function has a lot going on! No need to worry; we’ll break it down now.
Let’s first identify the three parts of the function that we know will always be there for every
function.

• name: convert_temp
• argument(s): temp and unit
• body: all that code in there!

This function is similar to the previous function, but has a few extra bells and whistles.
First, instead of having the argument C, for a temperature in degrees Celsius, we’ve now changed that
argument to temp, as the input temperature can be in either degrees Celsius or degrees Fahrenheit.
There’s also that additional argument unit, which, as discussed above, will specify what unit
our temp input is in.
Additionally, the function now has a conditional statement based on the argument unit. What the
if, else if, and else statements say is: if the argument unit is “C” (meaning the temperature is in
Celsius), then convert the value to Fahrenheit. But, if the argument unit is “F” (stating the input
temp is in Fahrenheit), well, in that case convert the value to Celsius. Finally, if the argument unit
is neither “F” nor “C”, tell the user that they haven’t entered that unit correctly by displaying the
message “Please enter a correct unit -- either F or C”.
When we run this new function convert_temp, the output looks as follows! Feel free to copy that
function above and then test it out yourself! Go ahead and convert a few temperatures!

> convert_temp(70, "C")
[1] 158
> convert_temp(158, "F")
[1] 70
> convert_temp(158, "degrees")
Please enter a correct unit -- either F or C
NULL
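
Notice the NULL in that last output: message prints its text but has no value of its own, so D ends up empty and that is what gets returned. One alternative, sketched here under a different, hypothetical name (convert_temp_strict), is to use R's stop function, which halts the function with an error instead of returning anything. This is a variation for comparison, not the lesson's function.

```r
# Hypothetical variation: stop() raises an error for an unrecognized unit.
convert_temp_strict <- function(temp, unit){
  if (unit == "C"){
    converted <- temp * (9/5) + 32
  } else if (unit == "F") {
    converted <- (temp - 32) * (5/9)
  } else {
    stop("Please enter a correct unit -- either F or C")
  }
  return(converted)
}

valid_result <- convert_temp_strict(70, "C")   # 158

# tryCatch() captures the error so we can look at its message
# without halting the rest of the script.
error_text <- tryCatch(
  convert_temp_strict(158, "degrees"),
  error = function(e) conditionMessage(e)
)
```

Which behavior you want depends on the situation: a message lets the code keep running, while an error makes sure a bad unit cannot go unnoticed.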

Functions with default arguments


So far, we have required the user to state what each argument is every time they run one of the
functions. However, arguments can have default values. This is helpful when most of the time an
argument will take one value (the default) but you want to make the other input value available.
For example, the author of this lesson happens to live in the US, where we mostly think in degrees
Fahrenheit. But, working with someone from pretty much anywhere else in the world requires
understanding temperatures in degrees Celsius. Thus, if I were writing this function for myself or
co-workers in the States who like to cook, I would likely set the default in the function to Celsius,
as we’re much more likely to be looking at a temperature on a recipe that’s in Celsius but that we
need in Fahrenheit to set our ovens to the correct temperature.
To accomplish this, when you write the function, you can assign a default value to the argument
unit, for instance unit = "C". This means that the user does not have to explicitly state what the
value of the argument unit is. Instead, the function will assume that its value is "C", unless
the user states otherwise. This is how we modify the function:

convert_temp <- function(temp, unit = "C"){
  if (unit=="C"){
    D <- temp * (9/5) + 32
  } else if (unit=="F") {
    D <- (temp - 32) * (5/9)
  } else {
    D <- message("Please enter a correct unit -- either F or C")
  }
  return(D)
}

Now, if we run that function without the specified unit, the function assumes the input is in degrees
Celsius. Note, if you specify unit="C", the output doesn’t change. Additionally, the function can still
go in the other direction ( F -> C), but to do so, it must be specified in the arguments:

> convert_temp(70)
[1] 158
> convert_temp(70, unit = "C")
[1] 158
> convert_temp(158, unit = "F")
[1] 70
> convert_temp(158, "degrees")
Please enter a correct unit -- either F or C
NULL
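
If you want to check what default a function was given, or call it with the arguments in a different order, the following sketch may help. It repeats the function from above so it can be run on its own; the object names on the left of each assignment are made up for this illustration.

```r
convert_temp <- function(temp, unit = "C"){
  if (unit=="C"){
    D <- temp * (9/5) + 32
  } else if (unit=="F") {
    D <- (temp - 32) * (5/9)
  } else {
    D <- message("Please enter a correct unit -- either F or C")
  }
  return(D)
}

# formals() shows the arguments and any default values they were given.
default_unit <- formals(convert_temp)$unit       # "C"

# With the default, only the temperature is needed.
boiling_f <- convert_temp(100)                   # 212

# Named arguments can be supplied in any order.
boiling_c <- convert_temp(unit = "F", temp = 212)   # 100
```

Naming the arguments, as in the last call, also makes it clearer to a reader which value is the temperature and which is the unit.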

Commenting your function

Earlier in this lesson we discussed the importance of commenting your function for others who
read your code. So, let’s practice that now with the function we just worked with:

convert_temp <- function(temp, unit = "C"){
  if (unit=="C"){
    # if temp in C, convert to F
    D <- temp * (9/5) + 32
  } else if (unit=="F") {
    # if temp in F, convert to C
    D <- (temp - 32) * (5/9)
  } else {
    D <- message("Please enter a correct unit -- either F or C")
  }
  return(D)
}

Here, we’ve added two lines of comments to make explicitly clear to anyone looking at the code
what it is doing. The longer your functions get, the more critical commenting
your code will become.

Summary
In this lesson, we’ve covered the components of a function in R (name, argument(s), and body)
and broken down each of them using an example of converting temperatures between Celsius and
Fahrenheit. We’ve discussed when to write a function, best practices for naming functions, how to
run a function, and how to appropriately comment a function. The more you practice with functions
early on as you learn R, the better you’ll be as you start to write longer and longer pieces of code
and analyze data.

Additional Resources
• Chapter 19: Functions in r4ds⁴⁸, by Hadley Wickham

Slides and Video

View this Video at https://youtu.be/Iqm6Q0W8hzA⁴⁹.


Writing Functions in R

• Slides⁵⁰

Take this quiz online⁵¹


⁴⁸http://r4ds.had.co.nz/functions.html
⁵⁰https://docs.google.com/presentation/d/1Q7pkb4lM8M8MRQzxCCfhXx8ddEi9XeaKSKT82OJAZX8/edit?usp=sharing
⁵¹http://leanpub.com/courses/jhu/cbds-intro-r/quizzes/quiz_07_functions
R Markdown
We’ve spent a lot of time getting R and RStudio working, learning about projects and version control
- you are practically an expert at this! There is one last major functionality of R/RStudio that we
would be remiss to not include in your introduction to R - R Markdown!⁵²

What is R Markdown?
R Markdown is a type of document that allows you to generate fully reproducible reports. In these
documents, text, code, and the results of the code can be easily combined. In fact, these lessons
are written using what you’ve already learned about Markdown and all the R code you’ve recently
mastered!
To refresh your memory, this is how we use plain text in Markdown formatting:

Markdown review

Throughout this lesson we’ll remind you of what you learned in the previous lesson on Markdown
and discuss all the new things you’ll be able to do with R Markdown documents!
⁵²http://rmarkdown.rstudio.com/

R Markdown documents generally take one of two file extensions: .Rmd or .rmd. If a file ends with
either of these two file extensions (e.g., Project_Analysis.Rmd), then you know it’s an R Markdown
document. While this type of file is a plain text file, it can be rendered (“Knit”) into HTML pages,
PDFs, Word documents, or slides! We’ll get into exactly what that means in just a second!

.Rmd to PDF

Why use R Markdown?


One of the main benefits of using R Markdown is reproducibility. Since you can easily combine
text and code chunks in one document, what this means for a data science project is that you can
easily integrate an introduction about what your project question is and where your data came from
and the code that you are running, the results of that code, some pretty plots and figures, and your
conclusions all in one document.
Sharing what you did, why you did it and how it turned out becomes so simple - and that person
you share it with can re-run your code and get the exact same answers you got. That’s what we
mean by reproducibility.
Beyond reproducibility, there will be times that you’re working on a project that
takes many weeks or months to complete. In these cases, you want to be able to see what you did a
long time ago (and perhaps be reminded exactly why you were doing this). By using an R Markdown
document, you’ll be able to see exactly what you did previously, what code you used AND the results
of that code!

Another major benefit to R Markdown is that since it is plain text, it works very well with version
control systems, such as git and GitHub. It is easy to track what character changes occur between
commits, unlike other formats that aren’t plain text. For example, in one version of this lesson, I
may have forgotten to bold this word. When I catch my mistake, I can make the plain text changes
to signal I would like that word bolded, and in the commit, you can see the exact character changes
that occurred to now make the word bold.
Check out this video⁵³ that the RStudio developers released about R Markdown and what it is!

Getting started with R Markdown


The best way to follow along for the rest of this lesson is to open up RStudio Cloud⁵⁴, and follow along
on your own step-by-step. In the process, you’ll get to generate your first R Markdown document!
Generating and working with R Markdown documents is incredibly easy when working within
RStudio (or RStudio Cloud). To get started in RStudio Cloud⁵⁵, go to File > New File > R Markdown...

R Markdown…

If a window pops up specifying that you need to install and update a few packages before using R
Markdown, click “Yes” to install those updates.
⁵³https://vimeo.com/178485416
⁵⁴rstudio.cloud
⁵⁵rstudio.cloud

At this point, you will be presented with the following window:

R Markdown

You’ll want to add a Title to this document and put your name in the Author box.

Title and Author information filled out

When you are done entering this information, click OK, and a new .Rmd document will open with
a little explanation on R Markdown files.

R Markdown document

There are three main sections of an R Markdown document. The first is the YAML at the top,
bounded by the three dashes. This is where you can specify details like the title, your name, the
date, and what kind of document you want output. If you filled in the blanks in the window earlier,
these will be filled out for you. The spacing of this section matters, so if you edit anything here and
then get an error when you try to Knit your document, it may be worth returning to this section to
make sure spacing is as it should be.
Also on this page, you can see text sections. In these sections, text should be written in Markdown.
This means that the “## R Markdown” will appear as an H2 header when the document is rendered,
and **Knit** will be bold, as discussed in a previous lesson.
And finally, you will see code chunks. These are bounded by triple backticks. They are pieces of
R code, and are referred to as “code chunks”. The code in these chunks can run right from within
your document - and the output of this code will be included in the document when you Knit it.
The easiest way to see how each of these sections behave is to produce the HTML! To do so, you’ll
learn how to knit the document below.
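
To make the three sections concrete, here is a minimal sketch that writes a tiny R Markdown file from R code. The title, author, and file name are placeholders chosen for this example, and the file is written to a temporary folder so nothing in your project is touched. The triple backticks are built with strrep only so that they can be written inside this example.

```r
# Three backticks in a row mark the start and end of a code chunk.
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",                                   # YAML header starts
  'title: "Test Document"',
  'author: "Your Name"',
  "output: html_document",
  "---",                                   # YAML header ends
  "",
  "## R Markdown",                         # text section, written in Markdown
  "",
  "Click the **Knit** button to render this document.",
  "",
  paste0(fence, "{r cars}"),               # code chunk starts
  "summary(cars)",
  fence                                    # code chunk ends
)

# Write the lines to a temporary .Rmd file and read them back.
rmd_path <- file.path(tempdir(), "test_document.Rmd")
writeLines(rmd_lines, rmd_path)
readLines(rmd_path)
```

Printing the lines back shows the same layout you see in the RStudio editor: the YAML between the two lines of dashes, then Markdown text, then a code chunk.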

“Knitting” documents
When you want to preview an R Markdown document and when you are finished with an R
Markdown document, you’ll want to “knit” the plain text and code from your .Rmd into your final
document.

To do so with the example document that opened when you created your R Markdown file, click on
the “Knit” button along the top of the source panel.

Knit

When you do so, it will prompt you to save the document. For now, we’ll type “test_document”
into the box. (However, when you’re generating these documents for projects, you’ll want to be
sure that this document is saved in the appropriate directory, which is likely the raw_code directory.)
Click “Save.”

Save .Rmd

Upon saving the document, you should see a document like this appear in a new window:

Knit HTML

So here you can see that the content of the header was rendered into a title, followed by your name
and the date. The text chunks produced a section header called “R Markdown” which is followed by
two paragraphs of text. Following this, you can see the R code summary(cars), importantly, followed
by the output of running that code. And further down you will see code that ran to produce a plot,
and then that plot. This is one of the huge benefits of R Markdown - rendering the results of code
inline.
Go back to the R Markdown file that produced this HTML and see if you can see how you signify
you want text bolded. (Hint: Look at the word “Knit” and see what it is surrounded by). Additionally,
feel free to change the text in this document or add additional code. Then, click on “Knit” again and
see how the changes alter the HTML that is produced.
Upon Knitting to HTML, an additional file will now be saved in the same directory where you saved
your .Rmd file. However, as expected, this file will have the extension .html. If you make changes
and re-knit your file, this HTML file will be re-generated and all changes will be saved in this file.

Saved HTML file

One final note on knitting. In this example, we have Knit to HTML (a format that can be easily
viewed in any web browser), but you can also knit to a PDF or Word document (among other
options). To Knit to a different output format, click on the arrow to the right of the Knit icon to expose
a drop-down menu. Select the output document you’d like from this list. The new file type will be
generated and saved in the same directory where the .Rmd file is, but with the appropriate extension
(e.g. .pdf if you selected “Knit to PDF”). Feel free to play around with how these different file output
options look when you Knit!

Other file output options when Knitting

Summary
In this lesson we introduced you to R Markdown documents, discussing what they are and why
you should use them. We briefly reviewed Markdown formatting, but if you are unclear on what
Markdown is, feel free to go back to your previous Markdown lesson. In addition to introducing
the what and why, we got you started with actually using R Markdown. Hopefully, you were able
to generate and knit your first R Markdown document! Finally, this lesson just touched on the basics of R
Markdown. There are a number of options you can specify to display your code chunks in varied
ways, different changes that can be made to your YAML to customize your output documents, and
ways to create tables and use citations that we did not discuss. As you use R Markdown more and
more, you’ll get more acquainted with these capabilities. For now, it’s great to know the basics and
that there are more advanced features!

Additional Resources
• R Markdown Documentation⁵⁶, from RStudio
• R Markdown video⁵⁷, from RStudio
⁵⁶https://rmarkdown.rstudio.com/index.html
⁵⁷https://vimeo.com/178485416

• Basic R Markdown cheatsheet⁵⁸


• “R Markdown cheatsheet”⁵⁹

Slides and Video

View this Video at https://youtu.be/3jO4uRqR6PA⁶⁰.


R Markdown

• Slides⁶¹

Take this quiz online⁶²


⁵⁸https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
⁵⁹http://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf
⁶¹https://docs.google.com/presentation/d/1COQq29mnEWgt1NO0qC8tmurswmaDRJpBzmKhKKmJgiw/edit?usp=sharing
⁶²http://leanpub.com/courses/jhu/cbds-intro-r/quizzes/quiz_08_Rmd
Getting Help in R
In a lesson in an earlier course, we discussed some basic guidelines for carrying out an effective
Google Search. Now that you’ve been introduced to the basics of R, we wanted to guide you to
some incredibly helpful resources that can help you work through issues when you’re trying to
write your own R code.

R Help: ?
To access documentation directly within RStudio, you can type a question mark followed by the
name of a function, dataset, or other documented object directly in the R Console. The documentation
will display in the Help tab in the bottom-right pane of RStudio.
For example, earlier in this course, you were introduced to the function summary() in R. If you later
can’t remember what this function does, but you can remember its name, you can always type
?summary in your Console. The following will display in the Help tab:

?summary documentation

In this documentation we see:

• a general description of what the summary() function can be used for
• what package the function is from – “base” means it’s part of the base installation in R, rather
than from a specific package
• the syntax you should use to carry out this function
• some sample code

As you work in R, you’ll find that some documentation is more helpful than others. Some packages
and functions have incredibly helpful documentation pages. Others are less helpful. A thing to
remember is that humans are responsible for writing this documentation. It’s great when someone
who’s skilled at documenting software, and who has the time to do so, writes thorough documentation
pages. However, for those times when the documentation proves less than helpful, there are other
places you can look to for help.

Stack Overflow
Stack Overflow⁶³ is an “online community for programmers.” Stack Overflow is not just a place for R
programmers, but rather, is a place to ask and answer questions about any programming language.
That said, Stack Overflow has hundreds of thousands of questions and answers from R users.

How Stack Overflow works

On Stack Overflow, people (who have logins on the website) can ask questions. Then, others on the
site can provide answers. The user who originally posed the question can mark one answer as the
“accepted” answer. Any user, however, can up-vote an answer, helping it rise to the top of the answer
section. This means that when someone else comes across this question, they come across the best
answers first.
Additionally, questions can be tagged with their subject area. For example, a user looking for help
coding in R would tag their question with the “r” tag to help direct this question to the right
community for answering.
⁶³stackoverflow.com

Stack Overflow Homepage

Getting Answers

Stack Overflow is a place for specific coding questions and answers. It is not a discussion board nor
is it a place for polling. This design is intended to help you quickly get an answer. In that spirit,
anyone can search questions and answers, regardless of whether they have an account on the site.
All that said, before asking a question on Stack Overflow, it’s best to check to see if someone else
has already asked a similar question.
For example, if you were trying to learn more about how to work with objects in R, you might
type “objects in R” into the search bar. The results would look something like this:

Stack Overflow Search Output

For each search result, you will see:

• the question posed
• the tags accompanying that post
• the number of responses for the question
• when the question was posted and by whom

Clicking on the link will provide you with the full question and the posted answers. You’ll most
often find that the question you have has already been answered on Stack Overflow, which is great.
This way you get an answer immediately and don’t have to wait!
If, after searching questions that already exist, you still cannot find an answer to your question, you
can post a question on Stack Overflow. This will require you to sign up for an account.
Once you’ve created an account and logged in, you will see a blue “Ask Question” box in the top
right-hand corner of your screen.

Ask Question

Clicking on “Ask Question” will bring you to an additional screen reminding you of how to proceed
before posting a question (search for existing answers!) and how to post a question.

Posting a Question Guidelines

Reproducible Examples (reprex)

When posting a question on Stack Overflow, or anywhere really, it’s always best to post what is
known as a reproducible example, or reprex.
A reprex is the simplest possible piece of code that demonstrates your question from start to finish. By
including this code in your question on Stack Overflow, others can see what issue you have run into
and hopefully help you find a solution!
By “simplest bit of code possible,” we mean that the code:

1. Must be reproducible – it must run start to finish, up until the part where you’re running into
trouble. (This includes any packages you’ve loaded and all necessary objects.)
2. Should be minimal – remove any code that is in your script but that is not directly related to
the problem at hand. Often this requires making a smaller dataset than the one you’re working
with.

By generating a reprex, you’ll often figure out the answer to your question in the process, which is
great! The rest of the time, by creating the reprex and including it in your question, others are able
to help you as quickly as possible, which is also great!
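As a rough sketch of what this can look like in practice, the terminal commands below write a tiny reprex to a script file and, if R is available, run it in a fresh R session to confirm that it is self-contained. The file name reprex_example.R, the three-row data frame, and the question about mean() are all made up for illustration; they are not from this lesson.

```shell
# Hypothetical example: save a minimal reprex to its own script file.
reprex_dir="$(mktemp -d)"
cat > "$reprex_dir/reprex_example.R" <<'EOF'
# Minimal: a three-row data frame instead of the full dataset
df <- data.frame(score = c(10, 20, NA))

# The line in question: this returns NA, but I expected the mean of 10 and 20
mean(df$score)
EOF

# If R is installed, run the script in a fresh session.
# A reprex that only works in your current session is not reproducible.
if command -v Rscript >/dev/null 2>&1; then
  Rscript --vanilla "$reprex_dir/reprex_example.R"
fi
```

Running the script in a new session is one simple way to catch a forgotten library() call or an object that only exists in your own workspace.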

Providing Help to Others

Stack Overflow allows users with a login to both ask and answer questions. While in
the beginning you’ll likely just be searching for (and getting) answers to questions, as you become
more of an expert, answering others’ questions is a great way to help the community!
That said, it’s probably no surprise that sometimes people can be mean to others on the Internet. The
R community does its best to be a welcoming community, and hopefully you will never run into people
who are mean on Stack Overflow. Even more importantly, when you become an expert in R
and answer others’ questions online, remember that everyone starts as a beginner.
If someone’s question isn’t perfectly asked or their reprex isn’t perfect, let it pass. Be as helpful and
kind as you can as you interact with others online.

RStudio Community
In addition to Stack Overflow, RStudio Community⁶⁴ is an online community for questions and
answers that is specifically geared toward R users.

RStudio Community

With this interface, in addition to the Topic, you can see the Category (similar to tags in Stack
Overflow) and the number of Replies and Views for each topic. You can sort by any of these
columns by clicking on the label at the top. Or, you can search for specific questions or topics. Clicking
on any topic will direct you to a new window where you can read through the answers for each
question.
⁶⁴https://community.rstudio.com/

Summary
In this lesson we discussed documentation and help pages available directly in R that can be
accessed using a question mark followed by the function or package name in question. We discussed
two incredibly helpful online forums: Stack Overflow and RStudio community. And, finally, we
highlighted the importance of creating a reprex when asking a question on any of these forums.
Knowing where to find help and how to best ask for it is an important skill as you’re learning a new
programming language!

Additional Community Resources


The R community is a welcoming and helpful community. Many people go years as R users before
they learn what a rich community there is out there. We’re hoping to change that by introducing
new users to a number of resources that will help them feel more welcome to the large community
of individuals using R! Below is a short list of ways to connect with the R community:

• R for Data Science Learning Community⁶⁵ - a welcoming and helpful community for those
new to R and data science
• Twitter⁶⁶ - Using or searching the hashtag #rstats can be incredibly helpful and can connect
you to others who use R
• ROpenSci⁶⁷ - an online community of developers developing tools for open science
• R Project Help in R⁶⁸
• Tidyverse Help Documentation⁶⁹ - more details on creating a reprex
⁶⁵https://www.jessemaegan.com/post/r4ds-the-next-iteration/
⁶⁶https://twitter.com
⁶⁷https://ropensci.org/
⁶⁸https://www.r-project.org/help.html
⁶⁹https://www.tidyverse.org/help/

Slides and Video

View this Video at https://youtu.be/xOww087Vp9g⁷⁰.


Getting Help in R

• Slides⁷¹

Take this quiz online⁷²


⁷¹https://docs.google.com/presentation/d/1xDXjuZZ8OifFKW3MzKhQL0kI5f3XcCfHn9oHgyhpyMk/edit?usp=sharing
⁷²http://leanpub.com/courses/jhu/cbds-intro-r/quizzes/quiz_09_help_in_R
Pushing Code from R to GitHub
In the previous course in this series, you learned the basics of version control using git commands
and GitHub. Through the lessons in this course you learned the basics of coding in R. Throughout
the rest of the courses in the Cloud-based Data Science series we will continue to use R and GitHub
extensively, so we figure it’s best to get some practice with this workflow now, before any projects
are due.

GitHub repository: my_first_project


In the Version Control course you set up a GitHub repository called my_first_project where there
were a number of directories with the following file structure:

• data/
– raw_data/
– tidy_data/
• code/
– raw_code/
– final_code/
• figures/
– exploratory_figures/
– explanatory_figures/
• products/
– writing/

If you created that project in RStudio Cloud and pushed it to GitHub, go to that project on RStudio
Cloud now.

RStudio Cloud project

If you haven’t yet created the my_first_project project on RStudio Cloud and pushed it to GitHub,
do that now.
Before we move on, a quick reminder: you won’t yet see on GitHub the folder structure that you created.
Unfortunately, a folder won’t be pushed to GitHub unless it contains at least one file.
No worries though! We’re about to start populating this project with some code right now! Then,
you’ll start to see some of the folders you’ve created!

my_first_project
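To see why empty folders never reach GitHub, here is a small sketch you could run in a terminal. It assumes git is installed and works in a throwaway directory (created with mktemp) rather than in your real project: git reports nothing to track while the folder is empty, and only notices the folder once it contains a file.

```shell
# Work in a throwaway directory so nothing here touches a real project.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet

# Create an empty folder: git has nothing to track yet.
mkdir -p code/raw_code
empty_status="$(git status --porcelain)"
echo "status with empty folder: '${empty_status}'"

# Add a file inside the folder: now git lists it as untracked.
echo 'summary(mtcars$mpg)' > code/raw_code/mtcars_code.r
file_status="$(git status --porcelain)"
echo "status after adding a file: '${file_status}'"
```

The --porcelain option simply prints git status in a short, script-friendly form.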

Adding code to your GitHub project


Now that you’ve made your way back to your my_first_project repository, you can go to the
“Terminal” in RStudio Cloud. Here, if you haven’t made any changes since the last time you pushed
to GitHub, when you type git status you should get a message letting you know that “Your branch
is up-to-date.”

git status

Adding an R script

To get started adding some code to this repository, let’s just first create an R Script (.r file), and add
some code.
To do so, in the menu along the top within RStudio Cloud click on File > New File > R Script.

New R Script

This will open a blank file (currently titled “Untitled1”) in the scripting area of RStudio Cloud.
As this is just an example, add two lines of code to this file:

# Let's get started
summary(mtcars$mpg)

R Script in RStudio Cloud

Now that there is some code in that R Script file, let’s save it by going to File > Save As…

Save As…

We’ll save this file in the code/raw_code directory. Navigate to this directory. Then save the file
as “mtcars_code.r” by typing this in the “File name” box at the top of the Save File window. Click
“Save”. The new file name will show up on the tab at the top of the scripting area of RStudio Cloud.

Save file in code/raw_code

Adding an .Rmd file

In this course we also learned how to generate R Markdown files (.Rmd). So, we might as well
generate one of them now before pushing our changes to GitHub.
To get started go to File > New File > R Markdown…

New R Markdown File

In the window that pops up, add the title “mtcars” and your name in the Author box. Click ‘OK’.

Info for Rmd file

Replace the default text you see in the .Rmd document with the text and code you see here in this
‘Untitled1’ .Rmd document.

New Rmd document

Once that’s completed, save your file as “mtcars.Rmd”, again in the code/raw_code directory.
Tabs for both the .r file and .Rmd file you created in this lesson should now be visible in the scripting
area of RStudio Cloud.

Save and knit file

Before we push these changes to GitHub, let’s knit the .Rmd file to an HTML document. To do so,
click “Knit” at the top of your .Rmd file tab.
The rendered HTML document should pop up in a new window.

HTML document

Additionally, there should now be a “mtcars.html” file in the “Files” window of RStudio Cloud along
with “mtcars.Rmd” and “mtcars_code.r.”

Files tab in RStudio Cloud

Pushing to GitHub

If you were to go back to the Terminal and type git status and hit “enter”, the output should
indicate that three new files have been created that are not yet being tracked by git.

git status

To add these to GitHub, you’ll want to run:

git add .
git commit -m "add mtcars scripts"
git push

add, commit, push
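If you would like to rehearse the add, commit, push sequence without touching your real repository, the sketch below may help. It assumes git is installed, and it uses a local “bare” repository as a stand-in for GitHub; the temporary paths and the placeholder identity are illustrative choices, not part of the course project.

```shell
# Hypothetical setup: a local bare repository stands in for GitHub.
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"
git clone --quiet "$work/remote.git" "$work/my_first_project"
cd "$work/my_first_project"

# Identity for this practice repository only (placeholder values).
git config user.email "jane.everyday.doe@gmail.com"
git config user.name "janeeverydaydoe"

# Create a file to commit.
mkdir -p code/raw_code
echo 'summary(mtcars$mpg)' > code/raw_code/mtcars_code.r

git add .                                   # stage every new or changed file
git commit --quiet -m "add mtcars scripts"  # record the snapshot locally
git push --quiet origin HEAD                # send the commit to the remote

# The stand-in remote now holds the commit we just made.
last_message="$(git --git-dir="$work/remote.git" log -1 --format=%s)"
echo "$last_message"
```

Note that the commit step only records changes in your local copy; nothing reaches the remote until the push.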

In this process, however, git may ask you who you are. It will tell you that you have to configure
your identity, using commands similar to these, but where you replace the email address
and username below with the email address used to set up your GitHub account and your GitHub
username:

git config --global user.email jane.everyday.doe@gmail.com
git config --global user.name janeeverydaydoe

git config
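To check what git has stored, you can read the values back with git config --get. The sketch below assumes git is installed and uses a throwaway repository; it leaves out --global so that the settings apply to that repository only and your real configuration is left alone. The email address and username are the same placeholders used above.

```shell
# Throwaway repository so your real git settings are not modified.
cfg_dir="$(mktemp -d)"
cd "$cfg_dir"
git init --quiet

# Without --global, these settings apply to this repository only.
git config user.email "jane.everyday.doe@gmail.com"
git config user.name "janeeverydaydoe"

# Read the values back to confirm they were saved.
configured_email="$(git config --get user.email)"
configured_name="$(git config --get user.name)"
echo "$configured_name <$configured_email>"
```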

Once this has been established, you can then rerun:

git commit -m "add mtcars scripts"

The output will indicate that those three files have been committed. (They are committed to your local copy of the repository; they reach GitHub when you push.)

git commit

When you then run git push, you may be asked for your GitHub username and password.
(GitHub no longer accepts account passwords at this prompt; enter a personal access token instead.)
After entering them, you will get output that indicates that your files have been successfully pushed
to GitHub.

input username and password to push

You can now go to your repository on GitHub. You’ll see the commit message you just specified and
will find the files you pushed within code/raw_code on your GitHub repository!

code has been pushed to GitHub repository

Summary
In this lesson we went through all the steps of making changes to a project in RStudio Cloud and
pushing those changes to an existing GitHub repository that you had previously cloned into
RStudio Cloud. The more comfortable you are with the steps in this lesson, the easier it will be for
you to carry out future data science projects using RStudio Cloud and GitHub.

Slides and Video

View this Video at https://youtu.be/X38P9vn3NNM⁷³.


Pushing Code from R to GitHub

• Slides⁷⁴

Take this quiz online⁷⁵


⁷⁴https://docs.google.com/presentation/d/1nNKiebsQieBUr645KDfMmbBFr26J2HQ0FAFH8WuSBQQ/edit?usp=sharing
⁷⁵http://leanpub.com/courses/jhu/cbds-intro-r/quizzes/quiz_10_R_to_GitHub
Creating Websites with R
In previous lectures you have learned how to:

• Create a GitHub repository
• Use R Markdown for combining R code and text

Now we are going to use these two skills to create a website from within R. At the end of this lecture
you will have your own personal website made with R!

Create your website repository


GitHub has the nice property that it can automatically host websites for you from a repository. While
this can be done for any repository, there is a special case: the yourUsername.github.com repository
that makes setting up your own website a little bit simpler. We will start here because it’s the most
straightforward case.

Create Repository

We’ll work through an example for someone whose GitHub username is JaneEverydayDoe, but you
should follow along and complete each step using your own GitHub account. To get started, first
log in to github.com⁷⁶ and create a new repository (that nice green button). While normally for
projects you’ll be able to name your repository whatever you want, this new repository for your
website must have a specific name. The repository must be named either username.github.io or
username.github.com, where your GitHub username would replace username. So, in our case, the
repo would be named: JaneEverydayDoe.github.io.
Remember, this repository must be named: username.github.io or username.github.com, with
your GitHub username in the place of username.
Yay, we now have our repository to get started!

Sync files with GitHub


Now we are going to start putting your website together. We need to keep our files in sync with
GitHub, and to do so we will clone the repository into RStudio Cloud. That is, we will clone the
username/username.github.com repository. For JaneEverydayDoe that would be JaneEverydayDoe/JaneEverydayDoe
We will do this via the terminal from RStudio Cloud. Once you are in the terminal, run the following
(but change the username ‘JaneEverydayDoe’ to your GitHub username):

git clone https://github.com/JaneEverydayDoe/janeeverydaydoe.github.com.git


⁷⁶http://github.com

git clone

We now have completed the necessary GitHub setup!

One last setup step


It’s time to start working on our website! In the terminal access the username.github.com repository
using the cd (change directory) command. For JaneEverydayDoe that would be:

cd janeeverydaydoe.github.com/

cd into directory

The directory is currently empty. At this point, we can almost start our website, but there is one more,
rather obscure, thing we need to do. GitHub was initially set up for hosting Jekyll static websites,
and this is still the default. We are not using Jekyll, so we need to tell GitHub that. To do so, we
create the hidden file .nojekyll and version control it. You can create the file from the RStudio
Cloud terminal using the touch command:

touch .nojekyll

We haven’t discussed the command touch or hidden files yet, so now is a great time to explain
briefly that touch simply creates an empty file and hidden files are files that start with a period in
the file name.
However, despite being hidden, we can check to make sure this file was created by using the
command ls -a (ls, as discussed previously, stands for list and the -a option states that we also want
to display hidden files). When we run this command, we see the “.nojekyll” file has been created.

ls -a

touch and ls
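If you want to convince yourself of how touch and hidden files behave before working in your real repository, a minimal sketch like the following can be run in any terminal. The temporary directory created with mktemp is just a scratch location chosen for this illustration.

```shell
# Scratch directory so the demonstration does not touch your repository.
site_dir="$(mktemp -d)"
cd "$site_dir"

touch .nojekyll            # create an empty file whose name starts with a period

plain_listing="$(ls)"      # hidden files are left out of a plain listing
full_listing="$(ls -a)"    # -a includes hidden files (as well as . and ..)

echo "ls:    '${plain_listing}'"
echo "ls -a:"
echo "$full_listing"
```

The plain ls output is empty here because .nojekyll is the only file in the directory and it is hidden.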

Cool, we now have the required .nojekyll file! But, it’s not on GitHub yet. So, we need to version
control it. You can do so in the RStudio Cloud terminal window using the following commands:

git add .nojekyll
git commit -m "This is not a Jekyll website"
git push

Note: If prompted to set your global git configuration or to provide your GitHub
username and password, do so.

touch and ls

Using the JaneEverydayDoe example, you can see here⁷⁷ how the repository should look at this point
in the process.
⁷⁷https://github.com/JaneEverydayDoe/janeeverydaydoe.github.com/tree/a5f889e3c9749720ee0435e95f77d0b4eaeea8d9

Repo at this point in the process

Make the website


Ok, now we can make the website!
First, web pages display HTML code. With that knowledge, we are going to take advantage of another
nice property of the web. If a directory (say that directory is named hello) contains an index.html
file, then that file is displayed in the web browser when someone visits that directory on the website.
In other words, index.html is the default HTML file that is displayed, and the index.html file at the
top level of the repository serves as the website’s home page. This means that
we need to create an index.html file. How can we do that from R?

Well, let’s create an index.Rmd file inside the username.github.com/ directory. Here, let’s take
advantage of the nice RStudio menus. Go to File -> New File -> R Markdown. (Note: If prompted to
install packages, click yes to install them.)

New Rmd document

Then ensure Document is selected along the left and HTML is selected for the output format. Click OK.

New R Markdown

Now let’s save it as index.Rmd. Go to File -> Save As, then type index.Rmd in the file name box,
making sure that it’s saved inside the username.github.com directory. In JaneEverydayDoe’s case,
the relative file path would be janeeverydaydoe.github.com/index.Rmd.

How to Knit

Now click on the Knit button and R will create the index.html file!

Publish the website


Just so we can see the website in action, let’s publish it. What does that mean? Well, we have the
index.html file on our computer but it’s not on GitHub’s computers. So, we need to version control
the index.html file (let’s also version control the index.Rmd source file) and upload (git push) to
GitHub.
In the terminal window of RStudio Cloud, run the following commands:

## Tell git to version control the index.* files
git add index.*

## Provide a good commit message
git commit -m "First version of the website"

## Upload the files to GitHub
git push

And after a few seconds or a minute or so, you can view your website at https://username.github.io.
For JaneEverydayDoe that is https://janeeverydaydoe.github.io/ and it looks like this:

First website!

You can check JaneEverydayDoe’s files at this point in time⁷⁸ and compare them against yours.
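The index.* pattern in the git add command above is handled by the shell rather than by git: the shell replaces the pattern with every matching file name before git runs. The following sketch, using placeholder files in a scratch directory, shows which names the pattern picks up. Note that the hidden .nojekyll file is not matched.

```shell
# Scratch directory with empty placeholder files.
glob_dir="$(mktemp -d)"
cd "$glob_dir"
touch index.Rmd index.html .nojekyll

# The shell expands index.* to all file names starting with "index."
matched_files="$(echo index.*)"
echo "$matched_files"
```

This is why a single git add index.* stages both the .Rmd source and the knitted .html file.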

Customize the website


Ok, so now we have a website that is publicly available and that we made with R. Isn’t that
awesome!? Next let’s work on customizing our website. This part is super flexible and it really
depends on what you want to show. In this example, JaneEverydayDoe is going to describe a little
about herself: her interests, projects, work-related online profiles, her contact information (her email)
and a nice little image that shows her GitHub activity. Take the template below and edit your
index.Rmd file. Note that everything here is written using Markdown formatting.

⁷⁸https://github.com/JaneEverydayDoe/janeeverydaydoe.github.com/tree/6ef6468c9fb0dc93ed436b056ddd602e13658377

## About

Describe who you are. For example, what you are currently studying.

Summarize your trajectory. You could mention what you've done. Like what you've studied or where you've worked.

## Interests

* Interest 1
* Interest 2
* etc

## Projects

* List some of your recent projects
* You could include this website as a project!

## Profiles

* [LinkedIn](https://www.linkedin.com/in/yourprofile/)
* ...
* [GitHub](http://github.com/username)

## Contact

* [youremail@email](mailto:youremail@email)

Once you are done editing, click on the Knit button to update the index.html file on your computer.
Next, update your files on GitHub using the following commands.

git add index.*
git commit -m "Add my initial information"
git push

Updated website

Check JaneEverydayDoe’s files at this point in time⁷⁹.

Change the theme


R Markdown has several themes that are available to you. These themes include preset fonts and colors
for HTML pages that you can use. You can find them all listed at
rmarkdown.rstudio.com/html_document_format.html#appearance_and_style⁸⁰. Let’s try the spacelab theme by editing the YAML
front matter (the top section) of the index.Rmd file from:

---
title: "JaneEverydayDoe's website"
output: html_document
---

to

⁷⁹https://github.com/JaneEverydayDoe/janeeverydaydoe.github.com/tree/c0b15cf33e3fe88322d46e9f35def6c485eb0fee
⁸⁰https://rmarkdown.rstudio.com/html_document_format.html#appearance_and_style

---
title: "JaneEverydayDoe's website"
output:
  html_document:
    theme: spacelab
---

Again, knit then update the files on GitHub.

git add index.*
git commit -m "Use spacelab"
git push

JaneEverydayDoe’s files at this point in time⁸¹.

Adding a table of contents


A similar process could be used to add a table of contents. Update your YAML using the code you
see here.

---
title: "JaneEverydayDoe's website"
output:
  html_document:
    theme: spacelab
    toc: true
    toc_depth: 3
    toc_float: true
---

Again, knit then update the files on GitHub.

git add index.*
git commit -m "Add a floating table of contents"
git push

Your website should now reflect these changes!


⁸¹https://github.com/JaneEverydayDoe/janeeverydaydoe.github.com/tree/33db3fac9619ef32bdb032b56e0ca5836b5e28c5

Website at this point in the process

JaneEverydayDoe’s files at this point in time⁸².

Adding an image
Now let’s add an image to our website. To keep our files organized, let’s make an img directory inside
our repository using mkdir (make directory).

mkdir img

In that directory, we’ll upload a picture file (any format). In this case, we are uploading a file called
Jane.png. If you don’t have an image, take a screenshot of your GitHub profile and rename it
accordingly. You can verify your upload worked using ls.

$ ls img
Jane.png

Now that we have the image, let’s add the following code to our index.Rmd file:

⁸²https://github.com/JaneEverydayDoe/janeeverydaydoe.github.com/tree/3caa9cc00e9ea167360b1299763ad3b01b6e9de6

![](img/Jane.png){width=100px}

This code introduces the image and specifies the width of the image in pixels. We want a small
image, so we are setting it to 100 pixels (100px).
After knitting and uploading to GitHub, we now have a website with an image!

git add img/Jane.png
git add index.*
git commit -m "Add Jane's image"
git push

Website with image added

JaneEverydayDoe’s files at this point in the process⁸³.

A similar website
If you need some inspiration, check out Amy Peterson’s website: amy-peterson.github.io⁸⁴. It was
made using exactly the same tools.
The raw files for her website are available here⁸⁵.
⁸³https://github.com/JaneEverydayDoe/janeeverydaydoe.github.com/tree/505eaf2049930084526e79a005fc9e8a75f6b143
⁸⁴https://amy-peterson.github.io/
⁸⁵https://raw.githubusercontent.com/amy-peterson/amy-peterson.github.com/bf9637d0351e1494cbd0c34528b261e340539b06/index.Rmd

Beyond this lesson


In this lesson you learned how to create websites from R and publish them on GitHub. You also
made your own personal website that you can put on your business cards or Twitter profile, share with
friends, and use anywhere you want. Plus it was all free!
You might be interested in going beyond this quick website. For example, you might want to add
a visitors’ map using clustrmaps.com or track visitors using Google Analytics. You could also be
interested in buying your own domain, setting up a multi-page website or even making a blog. All
those things are possible but beyond the scope of this lesson. If you want to spend the time learning
how to do some of those things, check:

• Recent changes to JaneEverydayDoe website at https://github.com/JaneEverydayDoe/janeeverydaydoe.github.c


• https://pages.github.com/⁸⁷
• Emily Zabor’s great tutorial http://www.emilyzabor.com/tutorials/rmarkdown_websites_tutorial.html⁸⁸
• https://bookdown.org/yihui/blogdown/⁸⁹ for making blogs

Slides and Video

View this Video at https://youtu.be/phEJ5oI37bs⁹⁰.


Creating Websites with R

• Slides⁹¹

Take this quiz online⁹²


⁸⁶https://github.com/JaneEverydayDoe/janeeverydaydoe.github.com/commits/master
⁸⁷https://pages.github.com/
⁸⁸http://www.emilyzabor.com/tutorials/rmarkdown_websites_tutorial.html
⁸⁹https://bookdown.org/yihui/blogdown/
⁹¹https://docs.google.com/presentation/d/18cfusRGwEtQCD4MKew4S3s7HdK8AuSr_RRPQS6S3KKU/edit?usp=sharing
⁹²http://leanpub.com/courses/jhu/cbds-intro-r/quizzes/quiz_11_website
