R Language Workshop
• It is an open-source tool
• R supports Object-oriented as well as Procedural programming.
• It provides an environment for statistical computation and software
development.
• Provides extensive packages & libraries
• R has a wonderful community for people to share and learn from experts
• R can connect to numerous data sources.
• Data Science
• Statistical computing
• Machine Learning
Installation of R
Reserved words in R programming are a set of words that have special meaning
and cannot be used as an identifier (variable name, function name etc.).
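Reserved words in R
The reserved words in R's parser are: if, else, repeat, while, function, for, next, break,
TRUE, FALSE, NULL, Inf, NaN, NA, NA_integer_, NA_real_, NA_complex_ and NA_character_,
as well as ... and ..1, ..2 and so on. The list can be viewed with help(reserved) or ?Reserved.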
Variables in R
Variables are used to store data whose value can be changed as needed. The unique
name given to a variable (or to a function or object) is its identifier.
1. Identifiers can be a combination of letters, digits, period (.) and underscore (_).
2. It must start with a letter or a period. If it starts with a period, it cannot be followed
by a digit.
3. Reserved words in R cannot be used as identifiers.
Valid identifiers in R: total, Sum, .fine.with.dot, this_is_acceptable, Number5
Invalid identifiers in R: tot@l, 5um, _fine, TRUE, .0ne
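The rules above can be checked from the console. The following is a minimal sketch: is_valid_name is a hypothetical helper (not a base R function) that treats a name as valid when make.names() leaves it unchanged, and the candidate names are illustrative.

```r
# A name is syntactically valid when make.names() returns it unchanged.
# is_valid_name is a hypothetical helper written for this example.
is_valid_name <- function(x) x == make.names(x)

candidates <- c("total", "Sum", ".fine.with.dot", "this_is_acceptable", "Number5",
                "tot@l", "5um", "_fine", "TRUE", ".0ne")
validity <- is_valid_name(candidates)

# The first five names pass; the last five break one of the three rules.
print(data.frame(name = candidates, valid = validity))
```

make.names() also appends a period to reserved words, which is why TRUE is reported as invalid.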
Data structures are very important to understand because these are the
objects you will manipulate on a day-to-day basis in R
Everything in R is an object.
R has 6 basic data types: the five listed below, plus raw, which is not covered in this workshop.
• character
• numeric (real or decimal)
• integer
• logical
• complex
Constants in R
Constants, as the name suggests, are entities whose value cannot be altered. Basic
types of constant are numeric constants and character constants.
Numeric Constants
> typeof(5)
[1] "double"
> typeof(5L)
[1] "integer"
> typeof(5i)
[1] "complex"
Character Constants
Character constants can be represented using either single quotes (') or double
quotes (") as delimiters.
> 'example'
[1] "example"
> typeof("5")
[1] "character"
Built-in Constants
Some of the built-in constants defined in R, along with their values, are shown below.
> LETTERS
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
> letters
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
> pi
[1] 3.141593
> month.name
 [1] "January"   "February"  "March"     "April"     "May"       "June"
 [7] "July"      "August"    "September" "October"   "November"  "December"
> month.abb
 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
R Operators
R has many operators to carry out different mathematical and logical operations.
Operators in R can mainly be classified into the following categories.
R Arithmetic Operators
These operators are used to carry out mathematical operations like addition and
multiplication. Here is a list of arithmetic operators available in R.
An example
> x <- 5
> y <- 16
> x+y
[1] 21
> x-y
[1] -11
> x*y
[1] 80
> y/x
[1] 3.2
> y%/%x
[1] 3
> y%%x
[1] 1
> y^x
[1] 1048576
R Relational Operators
Relational operators are used to compare between values. Here is a list of relational
operators available in R.
An example
> x <- 5
> y <- 16
> x<y
[1] TRUE
> x>y
[1] FALSE
> x<=5
[1] TRUE
> y>=20
[1] FALSE
> y == 16
[1] TRUE
> x != 5
[1] FALSE
R Logical Operators
Logical operators are used to carry out Boolean operations like AND, OR etc.
Logical Operators in R
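Operator Description
! Logical NOT
& Element-wise logical AND
&& Logical AND
| Element-wise logical OR
|| Logical OR
Operators & and | perform element-wise operations, producing a result with the length
of the longer operand.
&& and || work on single values and return a single logical value. Older versions of R
silently examined only the first element of longer operands; since R 4.3.0, passing an
operand of length greater than one to && or || is an error.
Zero is considered FALSE and non-zero numbers are taken as TRUE. An example
run:
> x <- c(TRUE, FALSE, 0, 6)
> y <- c(FALSE, TRUE, FALSE, TRUE)
> !x
[1] FALSE  TRUE  TRUE FALSE
> x & y
[1] FALSE FALSE FALSE  TRUE
> x[1] && y[1]
[1] FALSE
> x | y
[1]  TRUE  TRUE FALSE  TRUE
> x[1] || y[1]
[1] TRUE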
R Assignment Operators
These operators are used to assign values to variables.
The operators <- and = can be used, almost interchangeably, to assign to variable
in the same environment.
The <<- operator is used for assigning to variables in the parent environments
(more like global assignments). The rightward assignments, although available are
rarely used.
> x <- 5
> x
[1] 5
> x = 9
> x
[1] 9
> 10 -> x
> x
[1] 10
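The difference between <- and <<- is easiest to see inside a function. The sketch below uses a hypothetical make_counter function: the inner function uses <<- to update count in the enclosing function's environment rather than creating a new local variable.

```r
# make_counter is a hypothetical example function.
make_counter <- function() {
  count <- 0                 # local to this call of make_counter
  function() {
    count <<- count + 1      # assigns to count in the enclosing environment
    count
  }
}

counter <- make_counter()
first <- counter()
second <- counter()
print(c(first, second))      # the value persists between calls
```

With <- instead of <<-, each call would create a fresh local count and always return 1.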
Operator Precedence
(2 + 6) * 5
[1] 40
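A few more cases show how precedence and associativity affect a result. This is a small illustrative sketch; the variable names are arbitrary.

```r
a <- 2 + 6 * 5      # * binds tighter than +, so this is 2 + 30
b <- (2 + 6) * 5    # parentheses force the addition first
c1 <- -2^2          # ^ binds tighter than unary minus: -(2^2)
d <- 2^3^2          # ^ groups right to left: 2^(3^2)
e <- -1:3           # unary minus binds tighter than ':' : (-1):3
f <- 1:3 * 2        # ':' binds tighter than * : (1:3) * 2

print(list(a = a, b = b, c1 = c1, d = d, e = e, f = f))
```

The full precedence table is available in the console with ?Syntax.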
Operator Precedence in R
Output
Enter age: 17
R provides many functions to examine features of vectors and other objects, for
example class(), typeof(), length() and attributes().
Objects Attributes
Objects can have attributes. Attributes are part of the object. These include:
• names
• dimnames
• dim
• class
• attributes (contain metadata)
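As a brief sketch of how these attributes can be inspected and set, consider a small matrix and a named vector. The object names and the "unit" attribute are arbitrary choices for illustration.

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

class(m)        # high-level kind of object
typeof(m)       # low-level storage type
length(m)       # number of elements
attributes(m)   # dim and dimnames are stored as attributes

v <- c(first = 1, second = 2)
names(v)                   # the names attribute
attr(v, "unit") <- "cm"    # "unit" is an arbitrary, user-defined attribute
attributes(v)
```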
R DATA STRUCTURES
• R Vectors
• R Matrix
• R array
• List in R
• R Data Frame
• R Factor
R Vector
A vector is the basic data structure in R. It contains elements of the same type. The
data types can be logical, integer, double, character, complex or raw.
A vector’s type can be checked with the typeof() function.
Another important property of a vector is its length. This is the number of elements
in the vector and can be checked with the function length().
Vectors are the most basic R data objects and there are six types of atomic vectors:
logical, integer, double, complex, character and raw.
How to Create Vector in R?
Vectors are generally created using the c() function
Since, a vector must have elements of the same type, this function will try and
coerce elements to the same type, if they are different.
Coercion is from lower to higher types from logical to integer to double to
character.
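> x <- c(1, 5, 4, 9, 0)
> typeof(x)
[1] "double"
> length(x)
[1] 5
> x <- c(1, 5.4, TRUE, "hello")
> x
[1] "1"     "5.4"   "TRUE"  "hello"
> typeof(x)
[1] "character"
Vectors of consecutive numbers can be created with the : operator, and more general
sequences with the seq() function.
> 1:7
[1] 1 2 3 4 5 6 7
> 2:-2
[1]  2  1  0 -1 -2
> seq(1, 3, by=0.2) # specify step size
 [1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0
> seq(1, 5, length.out=4) # specify length of the vector
[1] 1.000000 2.333333 3.666667 5.000000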
Vector index in R starts from 1, unlike most programming languages where index
start from 0.
We can use a vector of integers as index to access specific elements.
We can also use negative integers to return all elements except that those specified.
But we cannot mix positive and negative integers while indexing and real numbers,
if used, are truncated to integers.
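> x <- c(0, 2, 4, 6, 8, 10)
> x
[1]  0  2  4  6  8 10
> x[3] # access 3rd element
[1] 4
> x[c(2, 4)] # access 2nd and 4th element
[1] 2 6
> x[-1] # access all but 1st element
[1]  2  4  6  8 10
> x[c(2, -4)] # cannot mix positive and negative integers
Error in x[c(2, -4)] : only 0's may be mixed with negative subscripts
> x[c(2.4, 3.54)] # real numbers are truncated to integers
[1] 2 4
A named vector can also be indexed with a character vector.
> x <- c("first"=3, "second"=0, "third"=9)
> names(x)
[1] "first"  "second" "third"
> x["second"]
second
     0
> x[c("first", "third")]
first third
    3     9
Elements can be modified by assigning to the indexed positions.
> x <- -3:2
> x
[1] -3 -2 -1  0  1  2
> x[2] <- 0; x # modify 2nd element
[1] -3  0 -1  0  1  2
> x[x<0] <- 5; x # modify elements less than 0
[1] 5 0 5 0 1 2
> x <- x[1:4]; x # truncate x to first 4 elements
[1] 5 0 5 0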
How to delete a Vector?
We can discard the contents of a vector by assigning NULL to it. To remove the
variable itself, use rm(x).
> x
[1] -3 -2 -1  0  1  2
> x <- NULL
> x
NULL
> x[4]
NULL
Missing Data
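> sum(2,7,5)
[1] 14
> x <- c(2, NA, 3, 1, 4)
> x
[1]  2 NA  3  1  4
> sum(x) # if any element is NA or NaN, the result is NA or NaN
[1] NA
> sum(x, na.rm=TRUE) # this way we can ignore NA and NaN values
[1] 10
> mean(x, na.rm=TRUE)
[1] 2.5
> prod(x, na.rm=TRUE)
[1] 24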
Other Special Values
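Besides NA, R has a few other special values: Inf and -Inf for infinities, NaN ("not a number") for undefined numeric results, and NULL for an empty object. A minimal sketch with arbitrary variable names:

```r
pos_inf <- 1 / 0     # Inf
neg_inf <- -1 / 0    # -Inf
not_num <- 0 / 0     # NaN

is.finite(pos_inf)   # FALSE: infinities are not finite
is.nan(not_num)      # TRUE
is.na(not_num)       # TRUE: NaN also counts as missing
is.nan(NA)           # FALSE: NA is missing, but it is not NaN
length(NULL)         # 0: NULL is an empty object
```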
R Matrix
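> a <- matrix(1:9, nrow = 3, ncol = 3)
> a
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> class(a) # R 4.0.0 and later also report "array"; older versions report only "matrix"
[1] "matrix" "array"
> attributes(a)
$dim
[1] 3 3
> dim(a)
[1] 3 3
> matrix(1:9, nrow = 3, ncol = 3)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> matrix(1:9, nrow = 3) # same result is obtained by providing only one dimension
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
We can see that the matrix is filled column-wise. This can be reversed to row-wise
filling by passing TRUE to the argument byrow.
> matrix(1:9, nrow = 3, byrow = TRUE) # fill matrix row-wise
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9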
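Rows and columns can be named when the matrix is created, by passing a list of two
character vectors to the dimnames argument.
> x <- matrix(1:9, nrow = 3, dimnames = list(c("X","Y","Z"), c("A","B","C")))
> x
  A B C
X 1 4 7
Y 2 5 8
Z 3 6 9
> colnames(x)
[1] "A" "B" "C"
> rownames(x)
[1] "X" "Y" "Z"
> colnames(x) <- c("C1","C2","C3") # names can also be changed
> rownames(x) <- c("R1","R2","R3")
> x
   C1 C2 C3
R1  1  4  7
R2  2  5  8
R3  3  6  9
A matrix can also be created by binding vectors as columns with cbind() or as rows
with rbind().
> cbind(c(1,2,3),c(4,5,6))
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> rbind(c(1,2,3),c(4,5,6))
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
Finally, you can also create a matrix from a vector by setting its dimension
using dim().
> x <- c(1,2,3,4,5,6)
> x
[1] 1 2 3 4 5 6
> class(x)
[1] "numeric"
> dim(x) <- c(2,3)
> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> class(x)
[1] "matrix" "array"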
We specify the row numbers and column numbers as vectors and use it for
indexing.
If any field inside the bracket is left blank, it selects all.
We can use negative integers to specify rows or columns to be excluded.
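> x <- matrix(1:9, nrow = 3)
> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> x[c(1,2),c(2,3)] # select rows 1 & 2 and columns 2 & 3
     [,1] [,2]
[1,]    4    7
[2,]    5    8
> x[c(3,2),] # leaving column field blank will select entire columns
     [,1] [,2] [,3]
[1,]    3    6    9
[2,]    2    5    8
> x[,] # leaving row as well as column field blank will select entire matrix
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> x[-1,] # select all rows except first
     [,1] [,2] [,3]
[1,]    2    5    8
[2,]    3    6    9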
One thing to notice here is that, if the matrix returned after indexing is a row
matrix or column matrix, the result is given as a vector.
> x[1,]
[1] 1 4 7
> class(x[1,])
[1] "integer"
This behavior can be avoided by using the argument drop = FALSE while
indexing.
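> x[1,,drop=FALSE] # now the result is a 1X3 matrix rather than a vector
     [,1] [,2] [,3]
[1,]    1    4    7
> class(x[1,,drop=FALSE])
[1] "matrix" "array"
It is also possible to index a matrix with a single vector. In that case the matrix is
treated as a vector formed by stacking its columns.
> x <- matrix(c(4, 6, 1, 8, 0, 2, 3, 7, 9), nrow = 3)
> x
     [,1] [,2] [,3]
[1,]    4    8    3
[2,]    6    0    7
[3,]    1    2    9
> x[1:4]
[1] 4 6 1 8
> x[c(3,5,7)]
[1] 1 0 3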
Two logical vectors can be used to index a matrix. In such situation, rows and
columns where the value is TRUE is returned. These indexing vectors are recycled
if necessary and can be mixed with integer vectors.
> x
     [,1] [,2] [,3]
[1,]    4    8    3
[2,]    6    0    7
[3,]    1    2    9
> x[c(TRUE,FALSE,TRUE),c(TRUE,TRUE,FALSE)]
     [,1] [,2]
[1,]    4    8
[2,]    1    2
> x[c(TRUE,FALSE),c(2,3)] # the 2 element logical vector is recycled to 3 elements
     [,1] [,2]
[1,]    8    3
[2,]    2    9
It is also possible to index using a single logical vector where recycling takes place
if necessary.
> x[c(TRUE, FALSE)]
[1] 4 1 0 3 9
In the above example, the matrix x is treated as vector formed by stacking columns
of the matrix one after another, i.e., (4,6,1,8,0,2,3,7,9).
The indexing logical vector is also recycled and thus alternating elements are
selected. This property is utilized for filtering of matrix elements as shown below.
> x[x>5] # select elements greater than 5
[1] 6 8 7 9
Indexing with character vector is possible for matrix with named row or column.
This can be mixed with integer or logical indexing.
> colnames(x) <- c("A", "B", "C")
> x
     A B C
[1,] 4 8 3
[2,] 6 0 7
[3,] 1 2 9
> x[,"A"]
[1] 4 6 1
> x[TRUE,c("A","C")]
     A C
[1,] 4 3
[2,] 6 7
[3,] 1 9
> x[2:3,c("A","C")]
     A C
[1,] 6 7
[2,] 1 9
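A matrix can be modified by combining the assignment operator with the indexing
methods shown above.
> x <- matrix(1:9, nrow = 3)
> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> x[2,2] <- 10; x # modify a single element
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2   10    8
[3,]    3    6    9
> x[x<5] <- 0; x # modify elements less than 5
     [,1] [,2] [,3]
[1,]    0    0    7
[2,]    0   10    8
[3,]    0    6    9
A common operation with matrix is to transpose it. This can be done with the
function t().
> t(x)
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0   10    6
[3,]    7    8    9
We can add a row or column using the rbind() and cbind() functions respectively.
Similarly, rows and columns can be removed through reassignment.
> cbind(x, c(1, 2, 3)) # add column
     [,1] [,2] [,3] [,4]
[1,]    0    0    7    1
[2,]    0   10    8    2
[3,]    0    6    9    3
> rbind(x,c(1,2,3)) # add row
     [,1] [,2] [,3]
[1,]    0    0    7
[2,]    0   10    8
[3,]    0    6    9
[4,]    1    2    3
> x <- x[1:2,]; x # remove last row
     [,1] [,2] [,3]
[1,]    0    0    7
[2,]    0   10    8
The dimensions of a matrix can be modified as well, using the dim() function.
> x <- matrix(1:6, nrow = 2)
> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> dim(x) <- c(3,2); x # change to 3X2 matrix
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> dim(x) <- c(1,6); x # change to 1X6 matrix
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    3    4    5    6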
Arrays:
Arrays are the R data objects which can store data in more than two dimensions. It
takes vectors as input and uses the values in the dim parameter to create an array.
R Array Syntax
Array_NAME <- array(data, dim = (row_Size, column_Size,
matrices, dimnames)
,,1
[,1] [,2] [,3]
[1,] 5 10 13
[2,] 9 11 14
[3,] 3 12 15
,,2
[,1] [,2] [,3]
[1,] 5 10 13
[2,] 9 11 14
[3,] 3 12 15
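Array elements are accessed with one index per dimension. The sketch below rebuilds an array with the same values as the output above and adds dimension names; the names r1, c1, m1 and so on are illustrative assumptions.

```r
arr <- array(c(5, 9, 3, 10, 11, 12, 13, 14, 15), dim = c(3, 3, 2),
             dimnames = list(c("r1", "r2", "r3"),
                             c("c1", "c2", "c3"),
                             c("m1", "m2")))

elem <- arr[2, 3, 1]            # row 2, column 3 of the first matrix
first_matrix <- arr[, , "m1"]   # a blank index selects everything in that dimension
totals <- apply(arr, 3, sum)    # sum of each matrix (third dimension)

print(elem)
print(first_matrix)
print(totals)
```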
R Lists
> x <- list("a" = 2.5, "b" = TRUE, "c" = 1:3)
> x
$a
[1] 2.5
$b
[1] TRUE
$c
[1] 1 2 3
> typeof(x)
[1] "list"
> length(x)
[1] 3
> str(x)
List of 3
$ a: num 2.5
$ b: logi TRUE
$ c: int [1:3] 1 2 3
In this example, a, b and c are called tags which makes it easier to reference the
components of the list.
However, tags are optional. We can create the same list without the tags as follows.
In such scenario, numeric indices are used by default.
> x <- list(2.5, TRUE, 1:3)
> x
[[1]]
[1] 2.5
[[2]]
[1] TRUE
[[3]]
[1] 1 2 3
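> x <- list("name" = "John", "age" = 19, "speaks" = c("English", "French"))
> x
$name
[1] "John"
$age
[1] 19
$speaks
[1] "English" "French"
> x[c(1:2)] # index using integer vector
$name
[1] "John"
$age
[1] 19
> x[-2] # using negative integer to exclude second component
$name
[1] "John"
$speaks
[1] "English" "French"
> x[c(TRUE,FALSE,FALSE)] # index using logical vector
$name
[1] "John"
> x[c("age","speaks")] # index using character vector
$age
[1] 19
$speaks
[1] "English" "French"
Indexing with [ returns a sublist, not the content. To retrieve the content of a
component, use [[ or $.
> x["age"]
$age
[1] 19
> typeof(x["age"]) # single [ returns a list
[1] "list"
> x[["age"]] # double [[ returns the content
[1] 19
> typeof(x[["age"]])
[1] "double"
> x$name # same as x[["name"]]
[1] "John"
> x$a # partial matching, same as x$ag or x$age
[1] 19
> x[["a"]] # cannot do partial match with [[
NULL
> x$speaks[1]
[1] "English"
> x[["speaks"]][2]
[1] "French"
Components of a list can be changed through reassignment.
> x[["name"]] <- "Clair"; x
$name
[1] "Clair"
$age
[1] 19
$speaks
[1] "English" "French"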
Adding new components is easy. We simply assign values using new tags and it
will pop into action.
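> x[["married"]] <- FALSE
> x
$name
[1] "Clair"
$age
[1] 19
$speaks
[1] "English" "French"
$married
[1] FALSE
We can delete a component by assigning NULL to it.
> x[["age"]] <- NULL
> str(x)
List of 3
 $ name   : chr "Clair"
 $ speaks : chr [1:2] "English" "French"
 $ married: logi FALSE
> x$married <- NULL
> str(x)
List of 2
 $ name  : chr "Clair"
 $ speaks: chr [1:2] "English" "French"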
R Data Frame
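> x <- data.frame("SN" = 1:2, "Age" = c(21,15), "Name" = c("John","Dora"), stringsAsFactors = TRUE)
> str(x) # structure of x
'data.frame':	2 obs. of  3 variables:
 $ SN  : int  1 2
 $ Age : num  21 15
 $ Name: Factor w/ 2 levels "Dora","John": 2 1
Notice above that the third column, Name, is of type factor instead of a
character vector.
In versions of R before 4.0.0, data.frame() converted character vectors into factors by
default. Since R 4.0.0 the default is stringsAsFactors = FALSE, so character columns stay
as they are unless stringsAsFactors = TRUE is passed.
> x <- data.frame("SN" = 1:2, "Age" = c(21,15), "Name" = c("John","Dora"), stringsAsFactors = FALSE)
> str(x) # now the Name column is a character vector
'data.frame':	2 obs. of  3 variables:
 $ SN  : int  1 2
 $ Age : num  21 15
 $ Name: chr  "John" "Dora"
We can check if a variable is a data frame or not using the class() function.
> x
  SN Age Name
1  1  21 John
2  2  15 Dora
> typeof(x) # a data frame is a special case of a list
[1] "list"
> class(x)
[1] "data.frame"
> names(x)
[1] "SN"   "Age"  "Name"
> ncol(x)
[1] 3
> nrow(x)
[1] 2
> length(x) # returns the length of the list, same as ncol()
[1] 3
Components of a data frame can be accessed like a list, using [, [[ or $.
> x["Name"]
  Name
1 John
2 Dora
> x$Name
[1] "John" "Dora"
> x[["Name"]]
[1] "John" "Dora"
> x[[3]]
[1] "John" "Dora"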
Data frames can be accessed like a matrix by providing an index for row and
column.
To illustrate this, we use datasets already available in R. Datasets that are
available can be listed with the command library(help = "datasets").
We will use the trees dataset, which contains Girth, Height and Volume for
Black Cherry Trees.
A data frame can be examined using functions like str() and head().
> str(trees)
'data.frame':	31 obs. of  3 variables:
 $ Girth : num  8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
 $ Height: num  70 65 63 72 81 83 66 75 80 75 ...
 $ Volume: num  10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
> head(trees,n=3)
  Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
3   8.8     63   10.2
We can see that trees is a data frame with 31 rows and 3 columns. We also
display the first 3 rows of the data frame.
Now we proceed to access the data frame like a matrix.
> trees[2:3,] # select 2nd and 3rd row
  Girth Height Volume
2   8.6     65   10.3
3   8.8     63   10.2
> trees[trees$Height > 82,] # selects rows with Height greater than 82
   Girth Height Volume
6   10.8     83   19.7
17  12.9     85   33.8
18  13.3     86   27.4
31  20.6     87   77.0
> trees[10:12,2]
[1] 75 79 76
We can see in the last case that the returned type is a vector, since we
extracted data from a single column.
This behavior can be avoided by passing the argument drop=FALSE as
follows.
> trees[10:12,2, drop=FALSE]
   Height
10     75
11     79
12     76
How to modify a Data Frame in R?
Data frames can be modified like we modified matrices, through
reassignment.
> x
  SN Age Name
1  1  21 John
2  2  15 Dora
> x[1,"Age"] <- 20; x
  SN Age Name
1  1  20 John
2  2  15 Dora
Adding Components
Rows can be added to a data frame using rbind(), and columns using cbind().
> rbind(x,list(1,16,"Paul"))
  SN Age Name
1  1  20 John
2  2  15 Dora
3  1  16 Paul
> cbind(x,State=c("NY","FL"))
  SN Age Name State
1  1  20 John    NY
2  2  15 Dora    FL
Since data frames are implemented as lists, we can also add new columns
through simple list-like assignments.
> x
  SN Age Name
1  1  20 John
2  2  15 Dora
> x$State <- c("NY","FL"); x
  SN Age Name State
1  1  20 John    NY
2  2  15 Dora    FL
Deleting Component
Columns can be deleted by assigning NULL to them, and rows can be deleted through
reassignment.
> x$State <- NULL
> x
  SN Age Name
1  1  20 John
2  2  15 Dora
> x <- x[-1,]
> x
  SN Age Name
2  2  15 Dora
R Factors
Factor is a data structure used for fields that takes only predefined, finite number of
values (categorical data). For example: a data field such as marital status may
contain only values from single, married, separated, divorced, or widowed.
In such case, we know the possible values beforehand and these predefined,
distinct values are called levels. Following is an example of factor in R.
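> x
[1] single  married married single
Levels: married single
Here, we can see that factor x has four elements and two levels. We can check if a
variable is a factor or not using the class() function.
Similarly, levels of a factor can be checked using the levels() function.
> class(x)
[1] "factor"
> levels(x)
[1] "married" "single"
We can create a factor using the function factor(). Levels are inferred from the data
if they are not provided.
> x <- factor(c("single", "married", "married", "single"))
> x
[1] single  married married single
Levels: married single
> x <- factor(c("single", "married", "married", "single"), levels = c("single", "married", "divorced"))
> x
[1] single  married married single
Levels: single married divorced
We can see from the above example that levels may be predefined even if not used.
Factors are closely related with vectors. In fact, factors are stored as integer
vectors. This is clearly seen from its structure.
> str(x)
 Factor w/ 3 levels "single","married",..: 1 2 2 1
We see that levels are stored in a character vector and the individual elements are
actually stored as indices.
Factors are also created when we read non-numerical columns into a data frame with
stringsAsFactors = TRUE. This was the default before R 4.0.0; since then, data.frame()
keeps character vectors as they are unless we pass stringsAsFactors = TRUE.
Components of a factor are accessed and modified much like those of a vector, but
only values that are among the levels can be assigned.
> x[3] # access 3rd element
[1] married
Levels: single married divorced
> x[2] <- "divorced" # modify second element
> x[3] <- "widowed" # cannot assign values outside levels
Warning message:
In `[<-.factor`(`*tmp*`, 3, value = "widowed") :
  invalid factor level, NA generated
> x
[1] single   divorced <NA>     single
Levels: single married divorced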
R if statement
The syntax of if statement is:
if (test_expression) {
  statement
}
If the test_expression is TRUE, the statement gets executed. But if it’s FALSE,
nothing happens.
Here, test_expression must be a single logical or numeric value. Older versions of R
used only the first element of a longer vector; since R 4.2.0, a condition of length
greater than one is an error.
In the case of a numeric value, zero is taken as FALSE, the rest as TRUE.
Flowchart of if statement
Example: if statement
x <- 5
if (x > 0) {
  print("Positive number")
}
Output
[1] "Positive number"
if…else statement
The syntax of if…else statement is:
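if (test_expression) {
  statement1
} else {
  statement2
}
The else must be on the same line as the closing brace of the if block.
Example: if…else statement
x <- -5
if (x >= 0) {
  print("Non-negative number")
} else {
  print("Negative number")
}
Output
[1] "Negative number"
Because if…else returns a value, it can also be used on the right-hand side of an
assignment.
> x <- -5
> y <- if (x > 0) 5 else 6
> y
[1] 6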
if…else Ladder
The if…else ladder (if…else…if) statement allows you to execute one block of code
among more than two alternatives.
The syntax of the if…else ladder is:
if ( test_expression1) {
statement1
} else if ( test_expression2) {
statement2
} else if ( test_expression3) {
statement3
} else {
statement4
}
Only one statement will get executed depending upon the test_expressions.
Example of if…else ladder
x <- 0
if (x < 0) {
print("Negative number")
} else if (x > 0) {
print("Positive number")
} else
print("Zero")
Output
[1] "Zero"
R ifelse() Function
ifelse(test_expression, x, y)
Here, test_expression must be a logical vector (or an object that can be coerced to
logical). The return value is a vector with the same length as test_expression.
This returned vector has element from x if the corresponding value
of test_expression is TRUE or from y if the corresponding value
of test_expression is FALSE.
This is to say, the i-th element of result will
be x[i] if test_expression[i] is TRUE else it will take the value of y[i].
The vectors x and y are recycled whenever necessary.
> a = c(5,7,2,9)
> ifelse(a %% 2 == 0,"even","odd")
[1] "odd"  "odd"  "even" "odd"
In the above example, the test_expression is a %% 2 == 0 which will result into the
vector (FALSE,FALSE,TRUE ,FALSE).
Similarly, the other two vectors in the function argument gets recycled
to ("even","even","even","even") and ("odd","odd","odd","odd") respectively.
And hence the result is evaluated accordingly.
Switch Statements:
1. First of all it will enter the switch case which has an expression.
2. Next it will go to Case 1 condition, checks the value passed to the condition.
If it is true, Statement block will execute. After that, it will break from that
switch case.
3. In case it is false, then it will switch to the next case. If Case 2 condition is
true, it will execute the statement and break from that case, else it will again
jump to the next case.
4. Now let’s say you have not specified any case or there is some wrong input
from the user, then it will go to the default case where it will print your default
statement.
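In R, switch() is a function rather than a statement: it evaluates only the alternative that matches and returns its value, so no explicit break is needed. The sketch below is one way to write the steps above; the vector vtr, the option names and the error message are assumptions chosen for illustration.

```r
vtr <- c(150, 200, 250, 300, 350, 400)   # illustrative data
option <- "mean"                          # the value that selects a case

result <- switch(option,
  "mean"   = mean(vtr),
  "median" = median(vtr),
  "range"  = diff(range(vtr)),
  stop("Unknown option: ", option)        # unnamed last argument acts as the default case
)
print(result)
```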
Output :
[1] 275
For loop:
A for loop is used to iterate over a vector in R programming.
for (val in sequence) {
  statement
}
Here, sequence is a vector and val takes on each of its value during the loop. In each
iteration, statement is evaluated.
x <- c(2,5,3,9,8,11,6)
count <- 0
for (val in x) {
  if (val %% 2 == 0) count <- count + 1   # count the even numbers in x
}
print(count)
Output
[1] 3
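# Factorial of a number
num = as.integer(readline(prompt = "Enter a number: "))
factorial = 1
if(num < 0) {
  print("Sorry, factorial does not exist for negative numbers")
} else if(num == 0) {
  print("The factorial of 0 is 1")
} else {
  for(i in 1:num) {
    factorial = factorial * i
  }
  print(paste("The factorial of", num, "is", factorial))
}
Output
Enter a number: 8
[1] "The factorial of 8 is 40320"
# Multiplication table of a number
num = as.integer(readline(prompt = "Enter a number: "))
for(i in 1:10) {
  print(paste(num, "x", i, "=", num * i))
}
Output
Enter a number: 7
# Check whether a number is prime
num = as.integer(readline(prompt = "Enter a number: "))
flag = 0
if(num > 1) {
  flag = 1
  for(i in 2:(num-1)) {
    if ((num %% i) == 0) {
      flag = 0
      break
    }
  }
}
if(num == 2) flag = 1
if(flag == 1) {
  print(paste(num, "is a prime number"))
} else {
  print(paste(num, "is not a prime number"))
}
Output 1
Enter a number: 25
[1] "25 is not a prime number"
Output 2
Enter a number: 19
[1] "19 is a prime number"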
R while Loop
Loops are used in programming to repeat a specific block of code.
In R programming, a while loop repeats its body as long as a specific condition holds.
while (test_expression) {
  statement
}
Here, test_expression is evaluated and the body of the loop is entered if the result
is TRUE.
The statements inside the loop are executed and the flow returns to evaluate
the test_expression again.
This is repeated each time until test_expression evaluates to FALSE, in which case,
the loop exits.
i <- 1
while (i < 6) {
  print(i)
  i = i+1
}
Output
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
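# Check whether a number is an Armstrong number (sum of the cubes of its digits)
num = as.integer(readline(prompt = "Enter a number: "))
# initialize sum
sum = 0
temp = num
while(temp > 0) {
  digit = temp %% 10
  sum = sum + (digit ^ 3)
  temp = floor(temp / 10)
}
if(num == sum) {
  print(paste(num, "is an Armstrong number"))
} else {
  print(paste(num, "is not an Armstrong number"))
}
Output 1
Enter a number: 23
[1] "23 is not an Armstrong number"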
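# Print the Fibonacci sequence
nterms = as.integer(readline(prompt = "How many terms? "))
n1 = 0
n2 = 1
count = 2
if(nterms <= 0) {
  print("Please enter a positive integer")
} else {
  if(nterms == 1) {
    print("Fibonacci sequence:")
    print(n1)
  } else {
    print("Fibonacci sequence:")
    print(n1)
    print(n2)
    while(count < nterms) {
      nth = n1 + n2
      print(nth)
      # update values
      n1 = n2
      n2 = nth
      count = count + 1
    }
  }
}
Output
How many terms? 7
[1] "Fibonacci sequence:"
[1] 0
[1] 1
[1] 1
[1] 2
[1] 3
[1] 5
[1] 8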
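# Sum of the natural numbers up to num
num = as.integer(readline(prompt = "Enter a number: "))
if(num < 0) {
  print("Enter a positive number")
} else {
  sum = 0
  while(num > 0) {
    sum = sum + num
    num = num - 1
  }
  print(paste("The sum is", sum))
}
Output
Enter a number: 10
[1] "The sum is 55"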
R repeat loop
A repeat loop is used to iterate over a block of code multiple number of times.
There is no condition check in repeat loop to exit the loop.
We must ourselves put a condition explicitly inside the body of the loop and use
the break statement to exit the loop. Failing to do so will result into an infinite loop.
Syntax of repeat loop
repeat {
  statement
}
In the statement block, we must use the break statement to exit the loop.
x <- 1
repeat {
  print(x)
  x = x+1
  if (x == 6){
    break
  }
}
Output
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Control statements
break statement
A break statement is used inside a loop (repeat, for, while) to stop the iterations and
flow the control outside of the loop.
In a nested looping situation, where there is a loop inside another loop, this statement
exits from the innermost loop that is being evaluated.
if (test_expression) {
  break
}
Note: the break statement can also be used inside the else branch
of if...else statement.
Flowchart of break statement
x <- 1:5
for (val in x) {
  if (val == 3){
    break
  }
  print(val)
}
Output
[1] 1
[1] 2
In this example, we iterate over the vector x, which has consecutive numbers from
1 to 5.
Inside the for loop we have used a if condition to break if the current value is equal
to 3.
As we can see from the output, the loop terminates when it encounters
the break statement.
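The behaviour in nested loops can be sketched as follows: break leaves only the inner loop, and the outer loop carries on with its next value. The variable names are arbitrary.

```r
visited <- character(0)
for (i in 1:3) {
  for (j in 1:3) {
    if (j == 2) break                    # exits the inner loop only
    visited <- c(visited, paste(i, j))   # record the (i, j) pairs that were reached
  }
}
print(visited)
```

Only the pairs with j equal to 1 are recorded, once for each value of i.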
next statement
A next statement is useful when we want to skip the current iteration of a loop
without terminating it. On encountering next, R skips the rest of the current iteration
and starts the next iteration of the loop.
if (test_condition) {
  next
}
Note: the next statement can also be used inside the else branch
of if...else statement.
Flowchart of next statement
x <- 1:5
for (val in x) {
  if (val == 3){
    next
  }
  print(val)
}
Output
[1] 1
[1] 2
[1] 4
[1] 5
R Functions
Functions are used to logically break our code into simpler parts which become easy
to maintain and understand.
It’s pretty straightforward to create your own function in R programming.
func_name <- function (argument) {
  statement
}
• Here, we can see that the reserved word function is used to declare a function in R.
• The statements within the curly braces form the body of the function. These braces
are optional if the body contains only a single expression.
• Finally, this function object is given a name by assigning it to a variable, func_name.
Example of a Function
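pow <- function(x, y) {
  # function to print x raised to the power y
  result <- x^y
  print(paste(x, "raised to the power", y, "is", result))
}
> pow(8, 2)
[1] "8 raised to the power 2 is 64"
> pow(2, 8)
[1] "2 raised to the power 8 is 256"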
Here, the arguments used in the function declaration (x and y) are called formal
arguments and those used while calling the function are called actual arguments.
Named Arguments
In the above function calls, the argument matching of formal argument to the actual
arguments takes place in positional order.
This means that, in the call pow(8,2), the formal arguments x and y are assigned 8
and 2 respectively.
We can also call the function using named arguments.
When calling a function in this way, the order of the actual arguments doesn’t matter.
For example, all of the function calls given below are equivalent.
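> pow(8, 2)
[1] "8 raised to the power 2 is 64"
> pow(x = 8, y = 2)
[1] "8 raised to the power 2 is 64"
> pow(y = 2, x = 8)
[1] "8 raised to the power 2 is 64"
Named and unnamed arguments can be mixed in a call. The named arguments are matched
first, and the remaining unnamed arguments are then matched in positional order.
> pow(x=8, 2)
[1] "8 raised to the power 2 is 64"
In all the examples above, x gets the value 8 and y gets the value 2.
Default Values for Arguments
We can assign default values to arguments in the function declaration. Giving an
argument a default value makes it optional when calling the function.
pow <- function(x, y = 2) {
  # y defaults to 2 when it is not supplied
  result <- x^y
  print(paste(x, "raised to the power", y, "is", result))
}
> pow(3)
[1] "3 raised to the power 2 is 9"
> pow(3,1)
[1] "3 raised to the power 1 is 3"
Here, y is optional and will take the value 2 when not provided.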
R Return Value from Function
Many a times, we will require our functions to do some processing and return back
the result. This is accomplished with the return() function in R.
Syntax of return()
return(expression)
Example: return()
Let us look at an example which will return whether a given number is positive,
negative or zero.
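check <- function(x) {
  if (x > 0) {
    result <- "Positive"
  }
  else if (x < 0) {
    result <- "Negative"
  }
  else {
    result <- "Zero"
  }
  return(result)
}
> check(1)
[1] "Positive"
> check(-10)
[1] "Negative"
> check(0)
[1] "Zero"
Functions without return()
If there is no explicit return from a function, the value of the last evaluated
expression is returned automatically.
check <- function(x) {
  if (x > 0) {
    result <- "Positive"
  }
  else if (x < 0) {
    result <- "Negative"
  }
  else {
    result <- "Zero"
  }
  result
}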
We generally use explicit return() functions to return a value immediately from a
function.
If it is not the last statement of the function, it will prematurely end the function
bringing the control to the place from which it was called.
check <- function(x) {
  if (x>0) {
    return("Positive")
  }
  else if (x<0) {
    return("Negative")
  }
  else {
    return("Zero")
  }
}
In the above example, if x > 0, the function immediately returns "Positive" without
evaluating rest of the body.
Multiple Returns
The return() function can return only a single object. If we want to return multiple
values in R, we can use a list (or other objects) and return it.
Following is an example.
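multi_return <- function() {
  my_list <- list("color" = "red", "size" = 20, "shape" = "round")
  return(my_list)
}
Here, we create a list my_list with multiple elements and return this single list.
> a <- multi_return()
> a$color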
[1] "red"
> a$size
[1] 20
> a$shape
[1] "round"
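# Program to make a simple calculator that can add, subtract, multiply and divide using
# functions
add <- function(x, y) {
  return(x + y)
}
subtract <- function(x, y) {
  return(x - y)
}
multiply <- function(x, y) {
  return(x * y)
}
divide <- function(x, y) {
  return(x / y)
}
print("Select operation.")
print("1.Add")
print("2.Subtract")
print("3.Multiply")
print("4.Divide")
choice = as.integer(readline(prompt = "Enter choice[1/2/3/4]: "))
num1 = as.integer(readline(prompt = "Enter first number: "))
num2 = as.integer(readline(prompt = "Enter second number: "))
result <- switch(choice, add(num1, num2), subtract(num1, num2), multiply(num1, num2), divide(num1, num2))
print(result)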
Output
[1] "1.Add"
[1] "2.Subtract"
[1] "3.Multiply"
[1] "4.Divide"
Enter choice[1/2/3/4]: 4
R Bar Plot
Bar plots can be created in R using the barplot() function. We can supply a vector or
matrix to this function. If we supply a vector, the plot will have bars with their
heights equal to the elements in the vector.
Let us suppose we have a vector of maximum temperatures (in degrees Celsius) for
seven days, for example:
max.temp <- c(22, 27, 26, 24, 23, 26, 28)
barplot(max.temp)
This function can take a lot of argument to control the way our data is plotted. You
can read about them in the help section ?barplot.
Some of the frequently used ones are, main to give the title, xlab and ylab to provide
labels for the axes, names.arg for naming each bar, col to define color etc.
We can also plot bars horizontally by providing the argument horiz = TRUE.
barplot(max.temp,
ylab = "Day",
col = "darkred",
horiz = TRUE)
Plotting Categorical Data
Sometimes we have to plot the count of each item as bar plots from categorical data.
For example, here is a vector of the ages of 10 college freshmen.
age <- c(17,18,18,17,18,19,18,16,18,18)
Simply doing barplot(age) will not give us the required plot. It will plot 10 bars with
height equal to the student’s age. But we want to know the number of student in each
age category.
This count can be quickly found using the table() function, as shown below.
> table(age)
age
16 17 18 19
1 2 6 1
Now plotting this data will give our required bar plot. Note below, that we define
the argument density to shade the bars.
barplot(table(age),
xlab="Age",
ylab="Count",
border="red",
col="blue",
density=10
)
How to plot higher dimensional tables?
Sometimes the data is in the form of a contingency table. For example, let us take
the built-in Titanic dataset.
This data set provides information on the fate of passengers on the fatal maiden
voyage of the ocean liner ‘Titanic’, summarized according to economic status
(class), sex, age and survival.-R documentation.
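> Titanic
, , Age = Child, Survived = No

      Sex
Class  Male Female
  1st     0      0
  2nd     0      0
  3rd    35     17
  Crew    0      0

, , Age = Adult, Survived = No

      Sex
Class  Male Female
  1st   118      4
  2nd   154     13
  3rd   387     89
  Crew  670      3

, , Age = Child, Survived = Yes

      Sex
Class  Male Female
  1st     5      1
  2nd    11     13
  3rd    13     14
  Crew    0      0

, , Age = Adult, Survived = Yes

      Sex
Class  Male Female
  1st    57    140
  2nd    14     80
  3rd    75     76
  Crew  192     20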
We can see that this data has 4 dimensions, class, sex, age and survival. Suppose we
wanted to bar plot the count of males and females.
In this case we can use the margin.table() function. This function sums up the table
entries according to the given index.
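> margin.table(Titanic,1) # count according to class
Class
 1st  2nd  3rd Crew
 325  285  706  885
> margin.table(Titanic,4) # count according to survival
Survived
  No  Yes
1490  711
> margin.table(Titanic) # gives total count if index is not provided
[1] 2201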
Now that we have our data in the required format, we can plot, survival for example,
as barplot(margin.table(Titanic,4)) or plot male vs female count
as barplot(margin.table(Titanic,2)).
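We can also pass a matrix to barplot(), in which case the bars are stacked. For example,
the counts of survival within each class:
> titanic.data <- apply(Titanic, c(4,1), sum)
> titanic.data
        Class
Survived 1st 2nd 3rd Crew
     No  122 167 528  673
     Yes 203 118 178  212
barplot(titanic.data,
col = c("red","green")
)
legend("topleft",
c("Not survived","Survived"),
fill = c("red","green")
)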
We have used the legend() function to appropriately display the legend.
Instead of a stacked bar we can have different bars for each element in a column
juxtaposed to each other by specifying the parameter beside = TRUE as shown
below.
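A minimal sketch of the grouped version follows. The matrix name survival_by_class, the title and the axis label are assumptions made for this example; the counts come from the built-in Titanic dataset.

```r
# Rows are Survived (No/Yes), columns are Class (1st, 2nd, 3rd, Crew).
survival_by_class <- apply(Titanic, c(4, 1), sum)

# beside = TRUE draws the bars of each column side by side instead of stacking them.
# barplot() returns the bar midpoints, one row per bar within a group.
mids <- barplot(survival_by_class,
                main = "Survival of Each Class",
                xlab = "Class",
                col = c("red", "green"),
                beside = TRUE)
legend("topleft",
       c("Not survived", "Survived"),
       fill = c("red", "green"))
```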
R Histograms
Histogram can be created using the hist() function in R programming language. This
function takes in a vector of values for which the histogram is plotted.
Let us use the built-in dataset airquality which has Daily air quality measurements
in New York, May to September 1973.-R documentation.
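> str(airquality)
'data.frame':	153 obs. of  6 variables:
 $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
 $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
 $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
 $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
We will use the temperature parameter, which has 153 observations in degrees
Fahrenheit.
Temperature <- airquality$Temp
hist(Temperature)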
We can see above that there are 9 cells with equally spaced breaks. In this case, the
height of a cell is equal to the number of observation falling in that cell.
We can pass in additional parameters to control the way our plot looks. You can read
about them in the help section ?hist.
Some of the frequently used ones are, main to give the title, xlab and ylab to provide
labels for the axes, xlim and ylim to provide range of the axes, col to define color
etc.
Additionally, with the argument freq=FALSE we can get the probability distribution
instead of the frequency.
hist(Temperature,
xlim=c(50,100),
col="darkmagenta",
freq=FALSE
)
Note that the y axis is labelled density instead of frequency. In this case, the total
area of the histogram is equal to 1.
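The claim about the total area can be checked numerically. In this sketch, hist() is called with plot = FALSE so that it only returns the computed breaks, counts and densities; the variable names are arbitrary.

```r
Temperature <- airquality$Temp

# plot = FALSE returns the histogram object without drawing it.
h <- hist(Temperature, plot = FALSE)

# Area of each cell = density * cell width; the areas should add up to 1.
total_area <- sum(h$density * diff(h$breaks))
print(total_area)

# The counts, in contrast, add up to the number of observations.
print(sum(h$counts))
```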
hist(Temperature,
xlim=c(50,100),
col="chocolate",
border="brown",
breaks=c(55,60,70,75,80,100)
)
R Pie Chart
Pie chart is drawn using the pie() function in R programming . This function takes
in a vector of non-negative numbers.
> expenditure <- c(600, 300, 150, 100, 200)
Suppose the above data represent the monthly expenditure breakdown of an
individual.
Example: Simple pie chart using pie()
Now let us draw a simple pie chart out of this data using the pie() function.
expenditure<-c(600,300,150,100,200)
pie(expenditure)
We can see above that a pie chart was plotted with 5 slices. The chart was drawn in
anti-clockwise direction using pastel colors.
We can pass in additional parameters to affect the way pie chart is drawn. You can
read about it in the help section ?pie.
Some of the frequently used ones are, labels-to give names to slices, main-to add a
title, col-to define colors for the slices and border-to color the borders.
We can also pass the argument clockwise=TRUE to draw the chart in clockwise
fashion.
pie(expenditure,
labels=as.character(expenditure),
border="brown",
clockwise=TRUE
)
As seen in the above figure, we have used the actual amount as labels. Also, the chart
is drawn in clockwise fashion.
R Box Plot
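> str(airquality)
'data.frame':	153 obs. of  6 variables:
 $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
 $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
 $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
 $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...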
boxplot(airquality$Ozone)
We can see that data above the median is more dispersed. We can also notice two
outliers at the higher extreme.
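The numbers behind the box plot can be inspected with boxplot.stats(), which returns the five values used to draw the box and whiskers together with any points treated as outliers. A short sketch, with arbitrary variable names:

```r
# Missing values in Ozone are dropped by boxplot.stats().
ozone_stats <- boxplot.stats(airquality$Ozone)

# Lower whisker, lower hinge, median, upper hinge, upper whisker.
print(ozone_stats$stats)

# Points drawn individually beyond the whiskers.
print(ozone_stats$out)

# Spread above and below the median, measured between the hinges.
upper_spread <- ozone_stats$stats[4] - ozone_stats$stats[3]
lower_spread <- ozone_stats$stats[3] - ozone_stats$stats[2]
print(c(upper = upper_spread, lower = lower_spread))
```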
We can pass in additional parameters to control the way our plot looks. You can read
about them in the help section ?boxplot.
Some of the frequently used ones are, main-to give the title, xlab and ylab-to provide
labels for the axes, col to define color etc.
Additionally, with the argument horizontal = TRUE we can plot it horizontally and
with notch = TRUE we can add a notch to the box.
boxplot(airquality$Ozone,
ylab = "Ozone",
col = "orange",
border = "brown",
horizontal = TRUE,
notch = TRUE
)
Multiple Boxplots
We can draw multiple boxplots in a single plot, by passing in a list, data frame or
multiple vectors.
Let us consider the Ozone and Temp fields of the airquality dataset.
ozone <- airquality$Ozone
temp <- airquality$Temp
boxplot(ozone, temp,
col = c("orange","red"),
border = "brown",
horizontal = TRUE,
notch = TRUE
)
boxplot() also accepts a formula of the form y ~ group, which draws one box for each group.
boxplot(Temp~Month,
data=airquality,
xlab="Month Number",
ylab="Degree Fahrenheit",
col="orange",
border="brown"
)
It is clear from the above figure that the month number 7 (July) is relatively hotter
than the rest.
R Plot Function
In the simplest case, we can pass in a vector and we will get a scatter plot of
magnitude vs index. But generally, we pass in two vectors and a scatter plot of these
points is plotted.
For example, the command plot(c(1,2),c(3,5)) would plot the points (1,3) and (2,5).
Here is a more concrete example where we plot a sine function from -pi to pi.
x <- seq(-pi,pi,0.1)
plot(x, sin(x))
plot(x, sin(x),
ylab="sin(x)")
Changing Color and Plot Type
We can see above that the plot is of circular points and black in color. This is the
default color.
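We can change the plot type with the argument type. It accepts the following strings
and has the given effect.
"p" - points
"l" - lines
"b" - both points and lines
"c" - empty points joined by lines
"o" - overplotted points and lines
"s" and "S" - stair steps
"h" - histogram-like vertical lines
"n" - does not produce any points or lines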
plot(x, sin(x),
ylab="sin(x)",
type="l",
col="blue")
Overlaying Plots Using legend() function
Calling plot() multiple times will have the effect of plotting the current graph on the
same window replacing the previous one.
However, sometimes we wish to overlay the plots in order to compare the results.
This is made possible with the functions lines() and points() to add lines and points
respectively, to the existing plot.
plot(x, sin(x),
main="Overlaying Graphs",
ylab="",
type="l",
col="blue")
lines(x,cos(x), col="red")
legend("topleft",
c("sin(x)","cos(x)"),
fill=c("blue","red")
)
We have used the function legend() to appropriately display the legend.
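The example above only uses lines(). The following sketch shows the points() counterpart, under the assumption that we want to mark every fifth cosine value on top of the sine curve; the name every_fifth is illustrative. The legend here uses lty and pch instead of fill so that it shows a line for one entry and a point symbol for the other.

```r
x <- seq(-pi, pi, 0.1)

# Base plot: the sine curve as a line
plot(x, sin(x),
     main = "Lines and Points",
     ylab = "",
     type = "l",
     col = "blue")

# Overlay every fifth cosine value as filled points
every_fifth <- seq(1, length(x), by = 5)
points(x[every_fifth], cos(x[every_fifth]), col = "red", pch = 19)

# NA suppresses the line or the symbol for an entry
legend("topleft",
       legend = c("sin(x)", "cos(x)"),
       col = c("blue", "red"),
       lty = c(1, NA),
       pch = c(NA, 19))
```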
Importing Data
The first step to any data analysis process is to get the data. Data can come from
many sources but two of the most common include text and Excel files.
Text file formats use delimiters to separate the different elements in a line, and
each record sits on its own line in the text file. Therefore, importing different
kinds of text files can follow a fairly consistent process once you’ve identified the
delimiter.
CSV files can be opened by any spreadsheet program: Microsoft Excel, OpenOffice,
Google Sheets, etc. You can open a CSV file in a simple text editor as well. It is a very
widespread and popular file format for storing and reading data because it is simple
and compatible with most platforms. But this simplicity has some disadvantages.
CSV is only capable of storing a single sheet in a file, without any formatting or
formulas.
Here’s an example CSV spreadsheet:
Rank,Movie,Director,Year,Gross profit
Base R functions
read.table() is a multipurpose work-horse function in base R for importing data.
The functions read.csv() and read.delim() are special cases of read.table() in which
the defaults have been adjusted to suit comma-delimited and tab-delimited files.
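# strings become factors (this was the default before R 4.0)
mydata <- read.csv("mydata.csv", stringsAsFactors = TRUE)
View(mydata)
str(mydata)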
## 'data.frame': 3 obs. of 3 variables:
## $ variable.1: int 10 25 8
## $ variable.2: Factor w/ 3 levels "beer","cheese",..: 1 3 2
## $ variable.3: logi TRUE TRUE FALSE
# keep strings as character vectors
mydata_2 <- read.csv("mydata.csv", stringsAsFactors = FALSE)
str(mydata_2)
## 'data.frame': 3 obs. of 3 variables:
## $ variable.1: int 10 25 8
## $ variable.2: chr "beer" "wine" "cheese"
## $ variable.3: logi TRUE TRUE FALSE
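The difference between the two str() outputs comes from the stringsAsFactors argument. The sketch below reproduces it without needing a file on disk, assuming the CSV content implied by the output above; the text = argument and the names csv_text, as_factor and as_chr are illustrative choices.

```r
# Same content as the example file, supplied inline
csv_text <- "variable 1,variable 2,variable 3
10,beer,TRUE
25,wine,TRUE
8,cheese,FALSE"

as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_chr    <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$variable.2)   # "factor"
levels(as_factor$variable.2)  # "beer" "cheese" "wine"
class(as_chr$variable.2)      # "character"
```

Note that base R also rewrites the header "variable 1" to the syntactically valid name variable.1.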
As previously stated read.csv is just a wrapper function for read.table but with
adjusted default arguments. Therefore, we can use read.table to read in this same
data. The two arguments we need to be aware of are the field separator (sep) and
the argument indicating whether the file contains the names of the variables as its
first line (header). In read.table the defaults are sep = "" and header =
FALSE whereas in read.csv the defaults are sep = "," and header = TRUE.
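As a minimal sketch of this equivalence, the code below writes the example data to a temporary file (csv_path is a hypothetical location, not the mydata.csv used above) and reads it back both ways.

```r
# Write a small CSV to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("variable 1,variable 2,variable 3",
             "10,beer,TRUE",
             "25,wine,TRUE",
             "8,cheese,FALSE"),
           csv_path)

# read.csv() relies on its defaults: sep = "," and header = TRUE
via_csv <- read.csv(csv_path)

# read.table() needs both arguments spelled out
via_table <- read.table(csv_path, sep = ",", header = TRUE)

identical(via_csv, via_table)  # TRUE
```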
In addition to .csv files, read.table() works with other delimited text files. The
primary difference is the character that separates the elements. For example,
tab-delimited text files typically end with the .txt or .tsv extension. You can also use
the read.delim() function because, similar to read.csv(), read.delim() is a wrapper
of read.table() with defaults set specifically for tab-delimited files. A tab-delimited
.txt file can therefore be read with read.delim() directly, or with read.table() by
setting sep = "\t" and header = TRUE.
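A minimal sketch of reading a tab-delimited file, assuming the same three columns as the CSV example; the temporary file tsv_path and the object names are hypothetical.

```r
# Write a small tab-delimited file to a temporary location
tsv_path <- tempfile(fileext = ".txt")
writeLines(c("variable 1\tvariable 2\tvariable 3",
             "10\tbeer\tTRUE",
             "25\twine\tTRUE",
             "8\tcheese\tFALSE"),
           tsv_path)

# read.delim() defaults to sep = "\t" and header = TRUE
mydata_tab <- read.delim(tsv_path)

# The same result with read.table() and explicit arguments
mydata_tab_2 <- read.table(tsv_path, sep = "\t", header = TRUE)

str(mydata_tab)
```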
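readr package
The readr package provides read_csv(), a faster alternative to read.csv() that keeps
strings as character vectors and leaves column names unchanged.
library(readr)
mydata_3 <- read_csv("mydata.csv")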
mydata_3
## variable 1 variable 2 variable 3
## 1 10 beer TRUE
## 2 25 wine TRUE
## 3 8 cheese FALSE
str(mydata_3)
## Classes 'tbl_df', 'tbl' and 'data.frame': 3 obs. of 3 variables:
## $ variable 1: int 10 25 8
## $ variable 2: chr "beer" "wine" "cheese"
## $ variable 3: logi TRUE TRUE FALSE
From Excel
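You can use the xlsx package to access Excel files. The first row should contain
variable/column names.
# read in the first worksheet from the workbook myexcel.xlsx
library(xlsx)
mydata <- read.xlsx("myexcel.xlsx", sheetIndex = 1)
# the readxl package is an alternative; it returns a tibble
library(readxl)
mydata <- read_excel("myexcel.xlsx", sheet = 1)
str(mydata)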
## Classes 'tbl_df', 'tbl' and 'data.frame': 3 obs. of 3 variables:
## $ variable 1: num 10 25 8
## $ variable 2: chr "beer" "wine" "cheese"
## $ variable 3: logi TRUE TRUE FALSE
Exporting Data
This section covers how to export data to text files and to Excel files (along with
some additional formatting capabilities).
Base R functions
write.table() is the multipurpose work-horse function in base R for exporting data.
The functions write.csv() and write.csv2() are special cases of write.table() in
which the defaults have been adjusted for comma- and semicolon-separated output.
Base R has no write.delim(), so tab-delimited files are written with
write.table() and sep = "\t". To illustrate these functions
let’s work with a data frame that we wish to export to a CSV file in our working
directory.
df
## var1 var2 var3
## billy 10 beer TRUE
## bob 25 wine TRUE
## thornton 8 cheese FALSE
To export df to a .csv file we can use write.csv(). Additional arguments allow you
to exclude row and column names, specify what to use for missing values, add or
remove quotations around character strings, etc.
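As a sketch of these arguments, the code below rebuilds a data frame matching the df printed above and writes it to a temporary file; out_path and tsv_path are hypothetical locations chosen so the example does not touch the working directory.

```r
# Recreate the example data frame, including its row names
df <- data.frame(var1 = c(10, 25, 8),
                 var2 = c("beer", "wine", "cheese"),
                 var3 = c(TRUE, TRUE, FALSE),
                 row.names = c("billy", "bob", "thornton"))

out_path <- tempfile(fileext = ".csv")

# Drop row names, leave strings unquoted, write missing values as empty
write.csv(df, file = out_path, row.names = FALSE, quote = FALSE, na = "")

readLines(out_path)
# "var1,var2,var3" "10,beer,TRUE" "25,wine,TRUE" "8,cheese,FALSE"

# Tab-delimited output goes through write.table() with sep = "\t"
tsv_path <- tempfile(fileext = ".txt")
write.table(df, file = tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)
```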
readr package
The readr package uses write functions similar to base R. However, readr write
functions are about twice as fast and they do not write row names. One thing to
note: from readr 1.4.0 onward the output file is given with file =, as in base R.
Earlier readr versions named this argument path =, which is now deprecated.
library(readr)
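A minimal sketch of writing with readr, assuming the package is installed. The call is guarded with requireNamespace() so it is simply skipped otherwise, and the output path is passed by position so the example does not depend on whether the argument is called file or path in the installed version. The names df, out_path and written are illustrative.

```r
df <- data.frame(var1 = c(10, 25, 8),
                 var2 = c("beer", "wine", "cheese"),
                 var3 = c(TRUE, TRUE, FALSE),
                 row.names = c("billy", "bob", "thornton"))

out_path <- tempfile(fileext = ".csv")  # hypothetical output location
written <- NULL

if (requireNamespace("readr", quietly = TRUE)) {
  # Row names (billy, bob, thornton) are not written
  readr::write_csv(df, out_path)
  written <- readLines(out_path)
}

written
```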
As previously mentioned, many organizations still rely on Excel to hold and share
data.
xlsx package
The xlsx package provides exporting and formatting capabilities for Excel 2007
(.xlsx) and Excel 97/2000/XP/2003 (.xls) file formats. Although these file formats
are a bit outdated, this package provides some nice formatting options. Saving a data
frame to a .xlsx file is as easy as saving to a .csv file:
library(xlsx)