2^3
[1] 8
# Modulo: returns the remainder of the division of 8/3
8 %% 3
[1] 2
BASIC ARITHMETIC FUNCTIONS
2. Trigonometric functions:
cos(x)  # Cosine of x
sin(x)  # Sine of x
tan(x)  # Tangent of x
acos(x) # Arc-cosine of x
asin(x) # Arc-sine of x
atan(x) # Arc-tangent of x
For example, the R code below will store the price of a lemon in a variable, say “lemon_price”:
# Price of a lemon = 2 euros
lemon_price <- 2
# or use this
lemon_price = 2
Note that R is case-sensitive: lemon_price is not the same object as Lemon_Price.
To print the value of the created object, just type its name:
lemon_price
[1] 2
print(lemon_price)
[1] 2
R saves the object lemon_price (also known as a variable) in memory, so you can use it in
subsequent operations.
# Multiply lemon price by 5
5 * lemon_price
[1] 10
The following R code creates two variables holding the width and the height of a rectangle.
These two variables will be used to compute the area of the rectangle.
# Rectangle height
height <- 10
# rectangle width
width <- 5
# compute rectangle area
area <- height*width
print(area)
[1] 50
The function ls() can be used to see the list of objects we have created:
ls()
[1] "area"        "height"      "info"        "lemon_price" "PACKAGES"    "R_VERSION"
[7] "width"
Note that each variable takes up space in the computer’s memory. If you work on a big
project, it’s good practice to clean up your workspace by removing objects you no longer need.
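A minimal sketch of such a clean-up, using the base R function rm(). The object names tmp_a and tmp_b are hypothetical and exist only for this illustration:

```r
# Create two throwaway objects (hypothetical names)
tmp_a <- 1
tmp_b <- "text"

# Remove a single object from the workspace
rm(tmp_a)

# Check which of the two names still exist
exists("tmp_a")
exists("tmp_b")

# To remove every object in the workspace at once (use with care):
# rm(list = ls())
```

The last line is left commented out on purpose, because it deletes everything you have created in the session.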
Note that a character vector can be created using double (") or single (') quotes. If your text
contains the same kind of quote, escape it with a backslash (for example, \").
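The escaping rule can be sketched as follows. The example sentences are made up for illustration and are not taken from the original listing:

```r
# Both quote styles create the same character value
a <- "lemon"
b <- 'lemon'
identical(a, b)

# Escape a double quote inside a double-quoted string with a backslash
quoted <- "The \"lemon\" costs 2 euros"

# cat() prints the text itself, without the escape characters
cat(quoted, "\n")

# Alternatively, wrap the text in the other kind of quote
quoted2 <- 'The "lemon" costs 2 euros'
identical(quoted, quoted2)
```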
It’s possible to use the function class() to see what type a variable is:
class(my_age)
[1] "numeric"
class(my_name)
[1] "character"
You can also use the functions is.numeric(), is.character(), is.logical() to check whether a
variable is numeric, character or logical, respectively. For instance:
is.numeric(my_age)
[1] TRUE
is.numeric(my_name)
[1] FALSE
If you want to change the type of a variable to another one, use the as.* functions, including:
as.numeric(), as.character(), as.logical(), etc.
my_age
[1] 28
# Convert my_age to a character variable
as.character(my_age)
[1] "28"
Note that converting a character string that does not represent a number to numeric will
output NA (for not available). R doesn’t know how to convert such a character value to a number.
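A short sketch of both cases. The string "Nicolas" is an assumed example value, since the content of my_name is not shown here:

```r
# A character string that represents a number converts cleanly
as.numeric("28")

# A string that is not a number becomes NA. R also emits a warning,
# which is suppressed here to keep the output quiet.
not_a_number <- suppressWarnings(as.numeric("Nicolas"))
not_a_number
is.na(not_a_number)
```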
Vectors
A vector is a combination of multiple values (numeric, character or logical) in the same object.
In this case, you can have numeric vectors, character vectors or logical vectors.
Create a vector
A vector is created using the function c() (for concatenate), for example c(1, 2, 3).
It’s possible to give a name to the elements of a vector using the function names().
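The sketch below is a reconstruction, not the original listing. The names and ages are taken from the outputs printed later in this document, and friend_ages2 is a hypothetical name used only for comparison:

```r
# Character vector: names of four friends
my_friends <- c("Nicolas", "Thierry", "Bernard", "Jerome")

# Numeric vector: their ages
friend_ages <- c(27, 25, 29, 26)

# Logical vector: whether each friend is married
are_married <- c(TRUE, FALSE, TRUE, TRUE)

# Name the elements of friend_ages after the friends
names(friend_ages) <- my_friends
friend_ages

# The same named vector built in a single call
friend_ages2 <- c(Nicolas = 27, Thierry = 25, Bernard = 29, Jerome = 26)
identical(friend_ages, friend_ages2)
```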
Note that a vector can only hold elements of the same type. For example, you cannot have a
vector that contains both characters and numeric values.
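In practice R does not raise an error when types are mixed inside c(); it silently converts all elements to one common type. A small sketch with made-up values:

```r
# Mixing numbers and text: every element is converted to character
mixed <- c(1, "a", TRUE)
mixed
class(mixed)

# Mixing numbers and logicals: TRUE and FALSE become 1 and 0
num_logical <- c(5, TRUE, FALSE)
num_logical
class(num_logical)
```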
# Number of friends
length(my_friends)
[1] 4
Case of missing values
I know that some of my friends (Nicolas and Thierry) have 2 children, but this information is
not available (NA) for the remaining friends (Bernard and Jerome).
In R, missing values (or missing information) are represented by NA.
The function is.na() checks whether data contain missing values. Its result is a logical
vector in which the value TRUE specifies that the corresponding element of the input is NA.
Note that there is a second type of missing value named NaN (“Not a Number”). It is
produced when a mathematical operation has no defined result, for example 0/0 = NaN.
Note also that the function is.na() is TRUE for both NA and NaN values. To tell them apart,
use the function is.nan(), which is TRUE only for NaN.
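A minimal sketch of these checks. The vector name have_child is hypothetical, and the values follow the description above:

```r
# Number of children: known for Nicolas and Thierry, missing for the others
have_child <- c(Nicolas = 2, Thierry = 2, Bernard = NA, Jerome = NA)
have_child

# TRUE where the value is missing
is.na(have_child)

# 0/0 is undefined and yields NaN
not_a_num <- 0 / 0

# is.na() is TRUE for both NA and NaN; is.nan() is TRUE only for NaN
is.na(c(NA, not_a_num, 1))
is.nan(c(NA, not_a_num, 1))
```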
Get a subset of a vector
Selection by positive indexing: select an element of a vector by its position (index) in square
brackets.
Note that R indexes from 1, NOT 0, so the first element of a vector is at [1] and not [0].
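A short sketch of positive indexing, assuming the my_friends vector holds the four names used throughout this document:

```r
my_friends <- c("Nicolas", "Thierry", "Bernard", "Jerome")

# Select the second element
my_friends[2]

# Select elements 2 and 4
my_friends[c(2, 4)]

# Select elements 1 to 3
my_friends[1:3]
```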
If you have a named vector, it’s also possible to use the name for selecting an element:
friend_ages["Bernard"]
Bernard
29
Selection by logical vector: only the elements for which the corresponding value in the
selecting vector is TRUE are kept in the subset.
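A minimal sketch of logical selection. The ages are those shown elsewhere in this document; the names is_27_or_older and x are hypothetical:

```r
friend_ages <- c(Nicolas = 27, Thierry = 25, Bernard = 29, Jerome = 26)

# Logical vector: TRUE where the age is at least 27
is_27_or_older <- friend_ages >= 27
is_27_or_older

# Keep only the elements for which the logical vector is TRUE
friend_ages[is_27_or_older]

# The same idea can be used to drop missing values
x <- c(2, NA, 5)
x[!is.na(x)]
```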
Note that all the basic arithmetic operators (+, -, *, / and ^) as well as the common arithmetic
functions (log, exp, sin, cos, tan, sqrt, abs, …) described in the previous sections can be applied
to a numeric vector.
If you perform an operation with vectors, the operation will be applied to each element of the
vector. An example is provided below:
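The listing here is a reconstruction rather than the original; the salaries values are taken from the output printed later in this document:

```r
# Monthly salaries of my friends
salaries <- c(Nicolas = 2000, Thierry = 1800, Bernard = 2500, Jerome = 3000)

# Multiply every salary by 2
salaries * 2
```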
As you can see, R multiplies each element in the salaries vector by 2.
Now, suppose that you want to multiply the salaries by different coefficients. The following R
code can be used:
# Create a coefs vector with the same length as salaries
coefs <- c(2, 1.5, 1, 3)
# Multiply salaries by coefs
salaries * coefs
Nicolas Thierry Bernard Jerome
4000 2700 2500 9000
Note that the calculation is done element-wise. The first element of salaries vector is multiplied
by the first element of coefs vector, and so on.
Compute the square root of a numeric vector:
my_vector <- c(4, 16, 9)
sqrt(my_vector)
[1] 2 4 3
Other useful functions are:
max(x) # Get the maximum value of x
min(x) # Get the minimum value of x
# Get the range of x. Returns a vector containing
# the minimum and the maximum of x
range(x)
length(x) # Get the number of elements in x
sum(x) # Get the total of the elements in x
prod(x) # Get the product of the elements in x
For example, if you want to compute the total sum of salaries, type this:
sum(salaries)
[1] 9300
Compute the mean of salaries:
mean(salaries)
[1] 2325
The range (minimum, maximum) of salaries is:
range(salaries)
[1] 1800 3000
MATRICES
A matrix is like an Excel sheet containing multiple rows and columns. It’s used to combine
vectors with the same type, which can be either numeric, character or logical. Matrices are used
to store a data table in R. The rows of a matrix are generally individuals/observations and the
columns are variables.
# Numeric vectors
col1 <- c(5, 6, 7, 8, 9)
col2 <- c(2, 4, 5, 9, 8)
col3 <- c(7, 3, 4, 8, 7)
# Combine the vectors by column
my_data <- cbind(col1, col2, col3)
my_data
col1 col2 col3
[1,] 5 2 7
[2,] 6 4 3
[3,] 7 5 4
[4,] 8 9 8
[5,] 9 8 7
# Change rownames
rownames(my_data) <- c("row1", "row2", "row3", "row4", "row5")
my_data
col1 col2 col3
row1 5 2 7
row2 6 4 3
row3 7 5 4
row4 8 9 8
row5 9 8 7
# Transpose my_data (rows become columns and vice versa)
t(my_data)
row1 row2 row3 row4 row5
col1 5 6 7 8 9
col2 2 4 5 9 8
col3 7 3 4 8 7
Note that it’s also possible to construct a matrix using the function matrix().
The simplified format of matrix() is as follows:
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
byrow: logical value. If FALSE (the default) the matrix is filled by columns, otherwise the
matrix is filled by rows.
dimnames: A list of two vectors giving the row and column names respectively.
In the R code below, the input data has length 6 and we want to create a matrix with two rows.
You don’t need to specify the number of columns (here ncol = 3); R will infer it automatically.
The matrix is filled column by column when the argument byrow = FALSE (the default). Because
we want to fill the matrix by rows, we use byrow = TRUE.
mdat <- matrix(
  data = c(1, 2, 3, 11, 12, 13),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("row1", "row2"), c("C.1", "C.2", "C.3"))
)
mdat
C.1 C.2 C.3
row1 1 2 3
row2 11 12 13
DIMENSIONS OF A MATRIX
The R functions nrow() and ncol() return the number of rows and columns present in the data,
respectively.
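A minimal sketch, rebuilding the my_data matrix from the earlier section so that the example runs on its own. dim() is a related base R function that returns both values at once:

```r
my_data <- cbind(
  col1 = c(5, 6, 7, 8, 9),
  col2 = c(2, 4, 5, 9, 8),
  col3 = c(7, 3, 4, 8, 7)
)

nrow(my_data) # number of rows
ncol(my_data) # number of columns
dim(my_data)  # rows and columns together
```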
Selection by logical: in the R code below, we want to keep only the rows where col3 >= 4:
col3 <- my_data[, "col3"]
my_data[col3 >= 4, ]
col1 col2 col3
row1 5 2 7
row3 7 5 4
row4 8 9 8
row5 9 8 7
It’s also possible to perform simple operations on matrices. For example, the following R code
multiplies each element of the matrix by 2:
my_data*2
col1 col2 col3
row1 10 4 14
row2 12 8 6
row3 14 10 8
row4 16 18 16
row5 18 16 14
rowSums() and colSums() functions: Compute the total of each row and the total of each
column, respectively.
For row and column means, use the functions rowMeans() and colMeans(), respectively.
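These four functions can be sketched on the my_data matrix from the earlier section, rebuilt here so that the example is self-contained:

```r
my_data <- cbind(
  col1 = c(5, 6, 7, 8, 9),
  col2 = c(2, 4, 5, 9, 8),
  col3 = c(7, 3, 4, 8, 7)
)
rownames(my_data) <- c("row1", "row2", "row3", "row4", "row5")

rowSums(my_data)  # total of each row
colSums(my_data)  # total of each column
rowMeans(my_data) # mean of each row
colMeans(my_data) # mean of each column
```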
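Note that it’s also possible to use the function apply() to apply any statistical function to
the rows or columns of a matrix.
The simplified format of apply() is apply(X, MARGIN, FUN), where X is a matrix, MARGIN is 1
for rows or 2 for columns, and FUN is the function to apply.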
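A minimal sketch of the call, assuming the usual three arguments X (the matrix), MARGIN (1 for rows, 2 for columns) and FUN (the function to apply). The matrix is rebuilt here so that the example runs on its own:

```r
my_data <- cbind(
  col1 = c(5, 6, 7, 8, 9),
  col2 = c(2, 4, 5, 9, 8),
  col3 = c(7, 3, 4, 8, 7)
)

# Mean of each row (MARGIN = 1)
row_means <- apply(my_data, 1, mean)
row_means

# Median of each column (MARGIN = 2)
col_medians <- apply(my_data, 2, median)
col_medians
```

For sums and means, rowSums(), colSums(), rowMeans() and colMeans() give the same results more directly; apply() is useful for functions that have no such shortcut, such as median().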
Factor variables represent categories or groups in your data. The function factor() can be used
to create a factor variable.
CREATE A FACTOR
Note that the function is.factor() can be used to check whether a variable is a factor. The
result is TRUE for a factor and FALSE otherwise.
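The factor used in the examples that follow could be created as sketched below. This is a reconstruction inferred from the printed outputs in this document (group membership and level order), not the original listing:

```r
# Group of each friend, in the order Nicolas, Thierry, Bernard, Jerome
friend_groups <- factor(
  c("best_friend", "not_best_friend", "best_friend", "not_best_friend"),
  levels = c("not_best_friend", "best_friend") # explicit level order
)
friend_groups

# Names of the categories
levels(friend_groups)

# Check whether the variable is a factor
is.factor(friend_groups)
```

Without the levels argument, R would order the levels alphabetically.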
If you want to know the number of individuals in each level, use the function summary():
summary(friend_groups)
not_best_friend     best_friend
              2               2
In the following example, I want to compute the mean salary of my friends by groups. The
function tapply() can be used to apply a function, here mean(), to each group.
# Salaries of my friends
salaries
Nicolas Thierry Bernard  Jerome
   2000    1800    2500    3000
# Friend groups
friend_groups
[1] best_friend     not_best_friend best_friend     not_best_friend
Levels: not_best_friend best_friend
# Compute the mean salaries by groups
mean_salaries <- tapply(salaries, friend_groups, mean)
mean_salaries
not_best_friend     best_friend
           2400            2250
# Compute the size/length of each group
tapply(salaries, friend_groups, length)
not_best_friend     best_friend
              2               2
It’s also possible to use the function table() to create a frequency table, also known as a
contingency table of the counts at each combination of factor levels.
table(friend_groups)
friend_groups
not_best_friend     best_friend
              2               2
# Cross-tabulation between
# friend_groups and are_married variables
table(friend_groups, are_married)
                 are_married
friend_groups     FALSE TRUE
  not_best_friend     1    1
  best_friend         0    2
DATA FRAMES
A data frame is like a matrix but can have columns with different types (numeric, character,
logical). Rows are observations (individuals) and columns are variables.
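The friends_data data frame used below could be built as in this sketch. It is a reconstruction from the outputs printed in this section (column names, values and row names), not the original listing:

```r
friends_data <- data.frame(
  name    = c("Nicolas", "Thierry", "Bernard", "Jerome"),
  age     = c(27, 25, 29, 26),
  height  = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, TRUE, TRUE),
  row.names = c("Nicolas", "Thierry", "Bernard", "Jerome")
)
friends_data

# Each column keeps its own type
sapply(friends_data, class)
```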
is.data.frame(friends_data)
[1] TRUE
is.data.frame(my_data)
[1] FALSE
The object “friends_data” is a data frame, but the object “my_data” (a matrix) is not. We can
convert my_data to a data frame using the as.data.frame() function:
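A minimal sketch of the conversion, rebuilding the my_data matrix so that the example runs on its own. The name my_data2 is hypothetical:

```r
my_data <- cbind(
  col1 = c(5, 6, 7, 8, 9),
  col2 = c(2, 4, 5, 9, 8),
  col3 = c(7, 3, 4, 8, 7)
)

# Convert the matrix to a data frame
my_data2 <- as.data.frame(my_data)

is.data.frame(my_data)  # the matrix itself is not a data frame
is.data.frame(my_data2) # the converted object is
```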
As described in the matrix section, you can use the function t() to transpose a data frame (the result is a matrix):
t(friends_data)
SUBSET A DATA FRAME
To select just certain columns from a data frame, you can either refer to the columns by name
or by their location (i.e., column 1, 2, 3, etc.).
1. Positive indexing by name and by location:
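A sketch of both forms, assuming friends_data has the columns and values shown in the outputs of this section:

```r
friends_data <- data.frame(
  name    = c("Nicolas", "Thierry", "Bernard", "Jerome"),
  age     = c(27, 25, 29, 26),
  height  = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, TRUE, TRUE),
  row.names = c("Nicolas", "Thierry", "Bernard", "Jerome")
)

# Access one column by name with the $ operator
friends_data$name

# Select several columns by name
by_name <- friends_data[, c("name", "age")]
by_name

# Select the same columns by location
by_location <- friends_data[, c(1, 2)]
identical(by_name, by_location)
```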
2. Negative indexing:
# Exclude column 1
friends_data[, -1]
age height married
Nicolas 27 180 TRUE
Thierry 25 170 FALSE
Bernard 29 185 TRUE
Jerome 26 169 TRUE
3. Index by characteristics:
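Rows are selected with a logical condition such as friends_data$age >= 27, which returns a
logical vector in which TRUE specifies that the row contains a value of age >= 27.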
# Select the rows that meet the condition
friends_data[friends_data$age >= 27, ]
           name age height married
Nicolas Nicolas  27    180    TRUE
Bernard Bernard  29    185    TRUE
The R code above tells R to get all rows from friends_data where age >= 27, and then to return
all the columns.
If you don’t want to see all the column data for the selected rows but are just interested in
displaying, for example, friend names and age for friends with age >= 27, you could use the
following R code:
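A minimal sketch, with friends_data rebuilt from the outputs shown in this section. The subset() call is an equivalent base R alternative added for comparison, and the names older and older2 are hypothetical:

```r
friends_data <- data.frame(
  name    = c("Nicolas", "Thierry", "Bernard", "Jerome"),
  age     = c(27, 25, 29, 26),
  height  = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, TRUE, TRUE),
  row.names = c("Nicolas", "Thierry", "Bernard", "Jerome")
)

# Rows with age >= 27, columns name and age only
older <- friends_data[friends_data$age >= 27, c("name", "age")]
older

# The same selection written with subset()
older2 <- subset(friends_data, age >= 27, select = c(name, age))
older2
```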
Another option is to use the functions attach() and detach(). The function attach() takes a data
frame and makes its columns accessible by simply giving their names.
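A short sketch of attach() and detach(), with friends_data rebuilt from the outputs shown in this section. The with() call at the end is a base R alternative that avoids changing the search path; it is added here for comparison and is not part of the original text:

```r
friends_data <- data.frame(
  name    = c("Nicolas", "Thierry", "Bernard", "Jerome"),
  age     = c(27, 25, 29, 26),
  height  = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, TRUE, TRUE),
  row.names = c("Nicolas", "Thierry", "Bernard", "Jerome")
)

# Make the columns available by their bare names
attach(friends_data)
older <- friends_data[age >= 27, ]
older

# Always detach when done, to avoid name clashes later
detach(friends_data)

# Alternative without attach(): evaluate the expression inside the data frame
older_with <- with(friends_data, friends_data[age >= 27, ])
```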
Create a list
# Create a list
my_family <- list(
  mother = "Veronique",
  father = "Michel",
  sisters = c("Alicia", "Monica"),
  sister_age = c(12, 22)
)
# Print
my_family
$mother
[1] "Veronique"
$father
[1] "Michel"
$sisters
[1] "Alicia" "Monica"
$sister_age
[1] 12 22
# Names of elements in the list
names(my_family)
[1] "mother" "father" "sisters" "sister_age"
# Number of elements in the list
length(my_family)
[1] 4
The list object “my_family” contains four components, which may be individually referred to
as my_family[[1]], my_family[[2]] and so on.
SUBSET A LIST
It’s possible to select an element from a list by its name or its index:
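A minimal sketch of the common forms, using the my_family list created above (rebuilt here so that the example is self-contained):

```r
my_family <- list(
  mother = "Veronique",
  father = "Michel",
  sisters = c("Alicia", "Monica"),
  sister_age = c(12, 22)
)

# Select by name
my_family$mother
my_family[["father"]]

# Select by index
my_family[[3]]

# Select the first element of the third component
my_family[[3]][1]

# Single brackets return a list, not the element itself
class(my_family["mother"])
```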
Lists can also be concatenated with the function c(). The result is also a list, whose
components are those of the argument lists joined together in sequence.
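A small sketch of joining lists with c(). The names list_a, list_b and joined are hypothetical and used only for illustration:

```r
list_a <- list(mother = "Veronique", father = "Michel")
list_b <- list(sisters = c("Alicia", "Monica"))

# Join the two lists; components keep their order and names
joined <- c(list_a, list_b)

names(joined)
length(joined)
is.list(joined)
```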
R base functions for importing data
READ TABLE
The R base function read.table() is a general function that can be used to read a file in table
format. The data will be imported as a data frame.
Depending on the format of your file, several variants of read.table() are available to
make your life easier:
read.csv(): for reading “comma-separated value” files (“.csv”). By default, a point (“.”) is
used as the decimal point.
read.csv2(): variant used in countries that use a comma (“,”) as the decimal point and a
semicolon (“;”) as the field separator.
read.delim(): for reading “tab-separated value” files (“.txt”). By default, a point (“.”) is
used as the decimal point.
read.delim2(): for reading “tab-separated value” files (“.txt”). By default, a comma (“,”) is
used as the decimal point.
The simplified formats of these functions are as follows:
# Read tabular data into R
read.table(file, header = FALSE, sep = "", dec = ".")
# Read "comma separated value" files (".csv")
read.csv(file, header = TRUE, sep = ",", dec = ".", ...)
# Or use read.csv2: variant used in countries that
# use a comma as decimal point and a semicolon as field separator.
read.csv2(file, header = TRUE, sep = ";", dec = ",", ...)
# Read TAB delimited files
read.delim(file, header = TRUE, sep = "\t", dec = ".", ...)
read.delim2(file, header = TRUE, sep = "\t", dec = ",", ...)
file: the path to the file containing the data to be imported into R.
sep: the field separator character. “\t” is used for tab-delimited files.
header: logical value. If TRUE, the first row of the file is read as the column names. If your
file has no header row, use header = FALSE.
dec: the character used in the file for decimal points.
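To see the effect of these arguments without needing an existing file, the sketch below writes a small temporary file and reads it back. The file content is made up for illustration, and the names tmp, my_data and no_header are hypothetical:

```r
# Write a small "|"-separated file to a temporary location
tmp <- tempfile(fileext = ".txt")
writeLines(
  c("name|age|height",
    "Nicolas|27|180",
    "Thierry|25|170"),
  tmp
)

# Read it back: first row is the header, fields are separated by "|"
my_data <- read.table(tmp, header = TRUE, sep = "|", dec = ".")
my_data

# With header = FALSE, the first row is treated as data
no_header <- read.table(tmp, header = FALSE, sep = "|")
nrow(no_header)

# Remove the temporary file
unlink(tmp)
```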
It’s also possible to choose a file interactively using the function file.choose(), which I
recommend if you’re a beginner in R programming:
# Read a txt file
my_data <- read.delim(file.choose())
# Read a csv file
my_data <- read.csv(file.choose())
If you use the R code above in RStudio, you will be asked to choose a file.
If your data contains columns with text, R may treat those columns as factors or grouping
variables (e.g.: “good”, “good”, “bad”, “bad”, “bad”). This was the default behaviour before
R 4.0.0. If you don’t want your text data to be converted to factors, add
stringsAsFactors = FALSE in the read.delim(), read.csv() and read.table() functions. In this
case, the data frame columns corresponding to strings in your text file will be character.
For example:
my_data <- read.delim(file.choose(),
                      stringsAsFactors = FALSE)
If your field separator is, for example, “|”, it’s possible to use the general function
read.table() with additional arguments:
my_data <- read.table(file.choose(),
                      sep = "|", header = TRUE, dec = ".")
READING A FILE FROM INTERNET
It’s possible to use the functions read.delim(), read.csv() and read.table() to import files from
the web.
my_data <- read.delim("http://www.sthda.com/upload/boxplot_format.txt")
head(my_data)
Nom variable Group
1 IND1 10 A
2 IND2 7 A
3 IND3 20 A
4 IND4 14 A
5 IND5 14 A
6 IND6 12 A