R - Programming - Module 1 - Module 4

The document outlines a course on R programming. It covers 5 modules including numeric, arithmetic, assignment, vectors, matrices and arrays, lists and data frames, functions, and pointers to further programming techniques. It also includes course objectives, outcomes, textbook references, assessment details and module notes.

R - Programming

Course Code: 21CB484


Course Instructor: Praahas Amin
Department of CSBS
Canara Engineering College
Course Outline
Module 1 - Numeric, Arithmetic, Assignment, and Vectors
Module 2 - Matrices and Arrays
Module 3 - Lists and Data Frames
Module 4 - Functions
Module 5 - Pointers to Further Programming Techniques

Course Objectives
▪ Explore and understand the R and RStudio interactive environment.
▪ Learn and practice programming techniques using R.
▪ Read structured data into R from various sources.
▪ Understand the different data structures and data types in R.
▪ Develop small applications using R.

Course Outcomes
CO  | Outcome | RBT Level
CO1 | Understand the fundamental syntax of R data types, expressions and the usage of the R-Studio IDE | L2
CO2 | Apply critical programming language concepts of control structures in R for conditional branching and looping | L3
CO3 | Apply the List and Data Frame data structures of R programming language and import data into R programs | L3
CO4 | Utilize the functions in R programs and understand their scope in R language | L3
CO5 | Use advanced R concepts of debugging and object oriented concepts | L3

CO-PO-PSO Mapping

Course Outcomes | Program Outcomes | Program Specific Outcomes

CO PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2

CO1 2 2 3 3 1 1 1 2
CO2 2 2 3 3 1 1 1 2
CO3 2 2 3 3 1 1 1 2
CO4 2 2 3 3 1 1 1 2
CO5 2 2 3 3 1 1 1 2

Text Books & References
1. Jones, O., Maillardet, R. and Robinson, A. (2014). Introduction to Scientific Programming and Simulation Using R. Chapman & Hall/CRC, The R Series.
2. Crawley, M. J. (2015). Statistics: An Introduction Using R, Second edition. Wiley.
3. Wickham, H. & Grolemund, G. (2018). R for Data Science. O'Reilly, New York. Available for free at http://r4ds.had.co.nz/

Assessment Details
Teaching Hours/Week(L:T:P:S) 0:2:0:0
Total Hours 24
Credits 01
CIE (1 hour) 3
Assignments 2
Quiz/GD/Seminar (1 Hour) 1
SEE (1 Hour) 1
CIE Test Marks 20 Marks
Assignment Marks 10 Marks
Quiz/GD/Seminar Marks 20 Marks
CIE Marks 50
SEE Marks 50
Total Marks 100
CIE Type MCQ
SEE Type MCQ
Min Passing Marks CIE 40% of Max (i.e 20/50)
Min Passing Marks SEE 35% of Max(18/50)
Total Min Passing Marks 40% of Total Max (40/100)
Module 1
Numeric, Arithmetic, Assignment, and Vectors: R for Basic Math, Arithmetic, Variables, Functions, Vectors, Expressions and Assignments, and Logical Expressions.
Text Book 1: Chapter 2 (2.1 to 2.7)

Variables
▪ A placeholder to hold a value (like a folder)
▪ Can place a value in it, operate on it or modify it, but the name of the placeholder remains the same.
▪ Assigning values to variables
  ▪ x <- 2.5
  ▪ x = 2.5
▪ Variables are created when values are assigned to them.
▪ Naming of variables
  ▪ Any name made up of letters, numbers and . or _
  ▪ Name should start with a letter, or with a . followed by a letter.
  ▪ Names are case-sensitive
  ▪ Use informative names for readability
▪ When assigning a value to a variable, the expression on the RHS is evaluated first and then the value is placed in the variable on the LHS
▪ Note:
  ▪ To display the value of the variable we can use print(x) or show(x)
  ▪ To get the datatype of the variable we can use typeof(x)
  ▪ We can show the outcome of an assignment by surrounding it with parentheses
    ▪ x <- 200
    ▪ (y <- (1+1/x)^x)

Data Types in R
▪ Integer
  ▪ x <- 2L
  ▪ x = 2L
▪ Double
  ▪ x <- 2.5
▪ Complex
  ▪ x <- 3+2i
▪ Character
  ▪ x <- "h"
▪ Logical
  ▪ x <- T
  ▪ x <- F
  ▪ x <- TRUE
  ▪ x <- FALSE
▪ typeof(x) shows the datatype of the variable

Arithmetic Operators
▪ Addition
  ▪ x1 <- 2L
  ▪ x2 <- 2.5
  ▪ x <- x1+x2 # x will have 4.5 as result
▪ Subtraction
  ▪ x <- 4-5 # This will store -1 in x
▪ Multiplication
  ▪ x <- 3*2 # This will store 6 in x
▪ Division
  ▪ x <- 3/2 # This will store 1.5 in x
▪ Exponentiation
  ▪ x <- 3^2 # This will store 9 in x
▪ Modulus
  ▪ x <- 3%%2 # This will store remainder 1 in x
▪ Integer Division
  ▪ x <- 17%/%5 # This will store quotient 3 in x
▪ Note:
  ▪ The [1] that prefixes output indicates that this is item 1 in a vector of output.
  ▪ R by default displays only 7 significant digits.
  ▪ The display can be changed to show "x" digits by using options(digits=x)
  ▪ This however does not guarantee accuracy to x digits.

Functions
▪ Functions take 1 or more arguments (inputs) and produce 1 or more outputs (return values)
  ▪ Eg: seq(from=1, to=9, by=2) # [1] 1 3 5 7 9
  ▪ Eg: seq(from=9, to=1, by=-2) # [1] 9 7 5 3 1
▪ Some arguments are optional and have a predefined value if we omit them (here by=1 by default)
  ▪ Eg: seq(from=1, to=9) # [1] 1 2 3 4 5 6 7 8 9
▪ Functions can have no arguments at all.
▪ Arguments can be a constant, a variable, another function call or an algebraic combination of these
  ▪ Eg: seq(1, x, x/3)
▪ Order of arguments
  ▪ Every function has a default order for arguments.
  ▪ If arguments are provided in the same order, then naming the arguments is not required
  ▪ If the arguments are not provided in the default order, then their names must be provided
▪ To find out about default values and alternative usages use help(function_name)
  ▪ Eg: help(fname)
  ▪ Or ?fname
▪ If we type just the function name without the parentheses for arguments, R prints the function object itself instead of calling it
▪ To see a demonstration of a function use demo(function_name)
  ▪ Eg: demo(graphics)
Vectors
▪ A vector is an indexed list of variables.
▪ It is a data structure that has a name, and within it there are different variables that are labelled sequentially
▪ Variables within a vector are labelled 1, 2, 3, 4, ...
▪ Observe that the first index is 1 and not 0
▪ Vectors are created the first time values are assigned to them, just like variables
▪ A variable is a vector of length 1 (called atomic)
▪ To create vectors of length greater than 1, we use functions that produce vector-valued output.
  ▪ c(...) # Combine
  ▪ seq(from, to, by) # Sequence
  ▪ rep(x, times) # Repeat
▪ Examples
  ▪ (x <- seq(1, 20, by=2)) # [1] 1 3 5 7 9 11 13 15 17 19
  ▪ (y <- rep(3, 4)) # [1] 3 3 3 3
  ▪ (z <- c(y, x)) # [1] 3 3 3 3 1 3 5 7 9 11 13 15 17 19
▪ from:to is shorthand for seq(from, to, by=1) (or by=-1 when from is greater than to)
  ▪ (x <- 100:105) # [1] 100 101 102 103 104 105
▪ To get a sequence from 1 to n+1 use 1:(n+1)
  ▪ : takes precedence over *, +, /, -
  ▪ n <- 5
  ▪ (x <- 1:n+1) # [1] 2 3 4 5 6
  ▪ (y <- 1:(n+1)) # [1] 1 2 3 4 5 6
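The precedence of the : operator is an easy source of off-by-one mistakes, so here is a small sketch that contrasts the two spellings. It uses the same n as the slide; x and y are just illustrative names.

```r
n <- 5

# ':' binds tighter than '+', so this is (1:n) + 1
x <- 1:n + 1

# Parentheses force the addition to happen first
y <- 1:(n + 1)

print(x)  # 2 3 4 5 6
print(y)  # 1 2 3 4 5 6
```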

Vectors
▪ Element "i" of vector "x" is referred to using x[i].
▪ If "i" is a vector of positive integers, then x[i] is the corresponding subvector of "x"
▪ If "i" is a vector of negative integers, then the corresponding values of "x" are omitted
  ▪ (x <- 100:110) # [1] 100 101 102 103 104 105 106 107 108 109 110
  ▪ i <- c(1, 3, 2)
  ▪ j <- c(-1, -2, -3)
  ▪ x[i] # [1] 100 102 101
  ▪ x[j] # [1] 103 104 105 106 107 108 109 110
▪ Square brackets can be used to get or set a value
  ▪ x[1] <- 3000 # x is now 3000 101 102 103 104 105 106 107 108 109 110
▪ length(x) gives the number of elements of x.
▪ It is possible to have a vector with no elements
  ▪ x <- c()
  ▪ length(x) # [1] 0
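As a quick check of the indexing rules, the following sketch repeats the slide's example and stores each result under an illustrative name (first_three and without_first_three are not from the slides).

```r
x <- 100:110

# Positive indices select elements in the order given
first_three <- x[c(1, 3, 2)]

# Negative indices drop the listed positions
without_first_three <- x[c(-1, -2, -3)]

# Assignment through an index changes a single element
x[1] <- 3000

print(first_three)
print(without_first_three)
print(length(x))
```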

Vectors
▪ Algebraic operations on vectors act element-wise
  ▪ x <- c(1,2,3)
  ▪ y <- c(4,5,6)
  ▪ x*y # [1] 4 10 18
  ▪ x+y # [1] 5 7 9
  ▪ y^x # [1] 4 25 216
▪ When algebraic expressions are applied to two vectors of unequal lengths, R automatically repeats the shorter vector until it has the same length as the longer vector.
  ▪ c(1,2,3,4) + c(1,2) # [1] 2 4 4 6
  ▪ (1:10)^c(1,2) # [1] 1 4 3 16 5 36 7 64 9 100
  ▪ 2+c(1,2,3) # [1] 3 4 5
  ▪ 2*c(1,2,3) # [1] 2 4 6
  ▪ (1:10)^2 # [1] 1 4 9 16 25 36 49 64 81 100
▪ R will still duplicate the shorter vector even if it cannot match the longer vector with a whole number of multiples, but will produce a warning
  ▪ c(1,2,3) + c(1,2) # [1] 2 4 4
    Warning message:
    In c(1,2,3) + c(1,2) :
    longer object length is not a multiple of shorter object length
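The recycling rule can be sketched in three cases: a clean fit, an uneven fit, and a single number. The names below are illustrative, and suppressWarnings() is used only so that the uneven case runs quietly.

```r
# Lengths 4 and 2: the shorter vector is recycled twice, no warning
even_fit <- c(1, 2, 3, 4) + c(1, 2)

# Lengths 3 and 2: recycling still happens, but R warns about the mismatch.
# suppressWarnings() is used here only to keep the output quiet.
uneven_fit <- suppressWarnings(c(1, 2, 3) + c(1, 2))

# A single number is a vector of length 1 and is recycled to every element
scaled <- 2 * c(1, 2, 3)

print(even_fit)    # 2 4 4 6
print(uneven_fit)  # 2 4 4
print(scaled)      # 2 4 6
```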

Vectors
▪ Useful functions that take vector arguments are:
  ▪ sum(), prod(), max(), min(), sqrt(), sort(), mean(x), var(x)
▪ Note that functions applied to a vector may be defined to act element-wise, or may act on the whole vector input and return a single result
  ▪ sqrt(1:4) # [1] 1.000000 1.414214 1.732051 2.000000
  ▪ mean(1:6) # [1] 3.5
  ▪ sort(c(5,1,3,4,2)) # [1] 1 2 3 4 5
▪ Example: Mean and Variance
  ▪ x <- c(1.2, 0.9, 0.8, 1.0, 1.2)
  ▪ x.mean <- sum(x)/length(x)
  ▪ x.mean - mean(x) # [1] 0
  ▪ x.var <- sum((x-x.mean)^2)/(length(x)-1)
  ▪ x.var - var(x) # [1] 0
▪ Example: Numerical Integration
  ▪ dt <- 0.005
  ▪ t <- seq(0, pi/6, by=dt)
  ▪ ft <- cos(t)
  ▪ (I <- sum(ft)*dt) # [1] 0.5015487
  ▪ t is a vector. ft is also a vector.
  ▪ plot(t, ft)
  ▪ Note: when using plot(x, y, type), x and y must be vectors of the same length
  ▪ type can be set to "p" (points, the default), "l" (lines), "o" (points over lines), etc.
▪ Example: Exponential Limit
  ▪ x <- seq(10, 200, by=10)
  ▪ y <- (1+1/x)^x
  ▪ exp(1) - y
  ▪ plot(x, y)
Missing Data
▪ In real experiments certain observations may be missing for one reason or another
▪ Missing data can be ignored or imputed (invented) depending on the statistical analysis involved.
▪ Represented in R using NA
▪ NA can be thought of as a placeholder for a value that should have been there but is missing
▪ We can check for missing values using is.na
  ▪ a <- NA
  ▪ is.na(a) # [1] TRUE
  ▪ a <- c(11,NA,13)
  ▪ is.na(a) # [1] FALSE TRUE FALSE
  ▪ any(is.na(a)) # [1] TRUE
  ▪ mean(a) # [1] NA
  ▪ mean(a,na.rm=TRUE) # [1] 12 (NAs can be removed)
▪ NA is not the same as NULL
▪ NA is a placeholder for something that is missing. NULL is something that never existed at all
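The difference between NA and NULL shows up most clearly in vector lengths. The sketch below uses illustrative names (with_na, with_null) and only base R functions.

```r
with_na <- c(11, NA, 13)
with_null <- c(11, NULL, 13)

print(length(with_na))    # NA occupies a slot in the vector
print(length(with_null))  # NULL contributes nothing

print(is.na(with_na))
print(is.null(NULL))

print(mean(with_na))                # NA propagates through the calculation
print(mean(with_na, na.rm = TRUE))  # 12 once the NA is removed
```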

Expressions & Assignments

▪ Expression is used to denote a phrase of code that can be executed
▪ Eg: seq(10,20, by=3) #[1] 10 13 16 19
▪ Eg: 4 #[1] 4
▪ Eg: mean(c(1,2,3)) #[1] 2
▪ Eg: 1>2 #[1] FALSE
▪ If the evaluation of an expression is saved using the <- operator, then the combination is called an assignment
▪ Eg: x1 <- seq(10,20, by=3)
▪ Eg: x2 <- 4
▪ Eg: x3 <- mean(c(1,2,3))
▪ Eg: x4 <- 1>2

Logical Expressions
▪ A logical expression is formed using comparison operators and logical operators
▪ The value of a logical expression is always TRUE or FALSE.
▪ Integers 1 & 0 can represent TRUE & FALSE respectively (coercion)
▪ Comparison operators: <, >, <=, >=, ==, !=
▪ &, |, ! (element-wise logical and, or and not)
  ▪ Work on vectors on an element-by-element basis
  ▪ v <- c(3,1,TRUE,2+3i)
  ▪ t <- c(4,1,FALSE,2+3i)
  ▪ print(v&t) # [1] TRUE TRUE FALSE TRUE
  ▪ v <- c(3,0,TRUE,2+2i)
  ▪ t <- c(4,0,FALSE,2+3i)
  ▪ print(v|t) # [1] TRUE FALSE TRUE TRUE
  ▪ v <- c(3,0,TRUE,2+2i)
  ▪ print(!v) # [1] FALSE TRUE FALSE FALSE
  ▪ c(0,0,1,1) | c(0,1,0,1) # [1] FALSE TRUE TRUE TRUE
  ▪ xor(c(0,0,1,1),c(0,1,0,1)) # [1] FALSE TRUE TRUE FALSE
▪ &&, || work only on scalars and give a single TRUE or FALSE as output
  ▪ Older versions of R used only the first element of a longer vector; from R 4.3.0 onwards a vector of length greater than 1 is an error, so select the element explicitly
  ▪ v <- c(3,0,TRUE,2+2i)
  ▪ t <- c(1,3,TRUE,2+3i)
  ▪ print(v[1]&&t[1]) # [1] TRUE
  ▪ v <- c(0,0,TRUE,2+2i)
  ▪ t <- c(0,3,TRUE,2+3i)
  ▪ print(v[1]||t[1]) # [1] FALSE
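To make the contrast between the element-wise and the scalar operators concrete, here is a minimal sketch with plain logical vectors. The names v and w are illustrative, and any()/all() are shown as one common way to reduce a logical vector to a single value before using it in an if condition.

```r
v <- c(TRUE, FALSE, TRUE)
w <- c(TRUE, TRUE, FALSE)

# Element-wise operators return one result per element
elementwise_and <- v & w
elementwise_or <- v | w

# '&&' and '||' expect single values, so pick one element explicitly
scalar_and <- v[1] && w[1]
scalar_or <- v[2] || w[3]

# any() and all() reduce a logical vector to a single value
any_true <- any(v & w)
all_true <- all(v | w)

print(elementwise_and)
print(elementwise_or)
print(c(scalar_and, scalar_or, any_true, all_true))
```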

Logical Expressions
▪ Logical expressions can be applied to vectors to produce vectors of TRUE/FALSE values. This can be used for selecting subvectors using the indexing operation.
▪ Eg: Find all integers between 1 & 20 that are divisible by 4.
  ▪ x <- 1:20
  ▪ x %% 4 == 0
     [1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
    [11] FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE
  ▪ (y <- x[x %% 4 == 0]) # [1] 4 8 12 16 20
▪ The result of x[subset] is the subvector of x for which the corresponding elements of subset are TRUE
▪ The subset function can also be used for selecting a subvector of x
▪ One difference between the subset function and the index operator is that subset ignores missing index values (NA), whereas x[subset] preserves the NA values
  ▪ x <- c(1,NA,3,4)
  ▪ x>2 # [1] FALSE NA TRUE TRUE
  ▪ x[x>2] # [1] NA 3 4
  ▪ subset(x,subset=x>2) # [1] 3 4
▪ Another difference between subset(x,subset=?) and x[?] is that the latter accepts expressions that resolve to integer or logical objects, whereas the former only works with logical objects.
▪ To know the positions of the TRUE elements of a logical vector x use which(x)
  ▪ x <- c(1,1,2,3,5,8,13)
  ▪ which(x%%2==0) # [1] 3 6
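The NA handling described on this slide can be compared side by side in a short sketch. The result names (by_index, by_subset, positions) are illustrative.

```r
x <- c(1, NA, 3, 4)

by_index <- x[x > 2]                     # the NA is kept
by_subset <- subset(x, subset = x > 2)   # the NA is dropped

# which() returns the positions of TRUE values and also skips NA
positions <- which(x > 2)

print(by_index)
print(by_subset)
print(positions)
```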

Rounding Error
▪ Note: Rounding Error
  ▪ Only integers and fractions whose denominator is a power of 2 can be represented exactly in floating point representation. All other numbers are subject to rounding error.
  ▪ 2*2==4 # [1] TRUE
  ▪ sqrt(2)*sqrt(2)==2 # [1] FALSE
  ▪ sqrt(2) has a rounding error which gets amplified when we square it.
  ▪ all.equal(x,y) # Returns TRUE if the difference between x and y is smaller than some set tolerance based on R's operational level of accuracy.
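A minimal sketch of comparing floating point values safely is given below. Wrapping all.equal() in isTRUE() is a common idiom, because all.equal() returns a description of the difference rather than FALSE when the values differ; the variable names are illustrative.

```r
a <- sqrt(2) * sqrt(2)

# Exact comparison fails because of rounding error
exact_match <- a == 2

# Comparison within the default tolerance succeeds
close_enough <- isTRUE(all.equal(a, 2))

print(exact_match)
print(close_enough)
```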

Module 2
Matrices and Arrays and Conditions and Looping: Defining a Matrix, Sub-setting, Matrix Operations, if statements, looping with for, looping with while, vector based programming.
Text Book 1: Chapter 2 (2.8), Chapter 3 (3.2 to 3.5)

Matrices
▪ Matrix: created from a vector using the matrix function
▪ matrix(data, nrow=1, ncol=1, byrow=FALSE)
  ▪ data is a vector of length at most nrow*ncol
  ▪ nrow: number of rows (default value 1)
  ▪ ncol: number of columns (default value 1)
  ▪ byrow defines whether to fill the matrix with the elements of data row-by-row or column-by-column
  ▪ byrow defaults to FALSE
  ▪ If length(data) is less than nrow*ncol, then data is re-used as many times as needed.
▪ (A <- matrix(1:6, nrow=2, ncol=3, byrow=TRUE))
       [,1] [,2] [,3]
  [1,]    1    2    3
  [2,]    4    5    6
▪ dim(A) returns the dimensions of a matrix
  ▪ dim(A) # [1] 2 3
▪ Creating a diagonal matrix
  ▪ Use diag(x)
▪ Joining matrices with rows of the same length (stacking vertically)
  ▪ Use rbind(...)
▪ Joining matrices with columns of the same length (stacking horizontally)
  ▪ Use cbind(...)
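The stacking functions named above can be sketched on the same 2 x 3 matrix. The names taller, wider and D are illustrative.

```r
A <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

# Stack a new row underneath A (its length must match ncol(A))
taller <- rbind(A, c(7, 8, 9))

# Attach a new column to the right of A (its length must match nrow(A))
wider <- cbind(A, c(10, 20))

# diag() with a vector argument builds a diagonal matrix
D <- diag(c(1, 2, 3))

print(dim(taller))  # 3 3
print(dim(wider))   # 2 4
print(D)
```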

Matrices
▪ Elements of matrices are referenced using two indices
▪ A[1,3] <- 0 # Sets the first row, third column value to 0
       [,1] [,2] [,3]
  [1,]    1    2    0
  [2,]    4    5    6
▪ A[, 2:3] # References all rows and columns 2 to 3
       [,1] [,2]
  [1,]    2    0
  [2,]    5    6
▪ (B <- diag(c(1,2,3)))
       [,1] [,2] [,3]
  [1,]    1    0    0
  [2,]    0    2    0
  [3,]    0    0    3
▪ Algebraic operations including * (multiply) act element-wise on matrices. To perform matrix multiplication we use %*%
▪ Functions for use with matrices
  ▪ nrow(x), ncol(x), det(x) (determinant), t(x) (transpose)
  ▪ solve(A,B) # returns x such that A%*%x==B
  ▪ solve(A) # if A is invertible, the matrix inverse of A is returned
▪ (A <- matrix(c(3,5,2,3),nrow=2,ncol=2))
▪ (B <- matrix(c(1,1,0,1),nrow=2,ncol=2))
▪ A%*%B
       [,1] [,2]
  [1,]    5    2
  [2,]    8    3
▪ A*B
       [,1] [,2]
  [1,]    3    0
  [2,]    5    3
▪ (A.inv <- solve(A))
       [,1] [,2]
  [1,]   -3    2
  [2,]    5   -3
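Both forms of solve() can be checked with the slide's 2 x 2 matrix. In this sketch the right-hand side b is an illustrative choice, and results are compared with all.equal() because matrix inversion works in floating point.

```r
A <- matrix(c(3, 5, 2, 3), nrow = 2, ncol = 2)
b <- c(7, 11)

# solve(A, b) returns x such that A %*% x equals b
x <- solve(A, b)

# solve(A) returns the inverse; multiplying back should give the identity
A_inv <- solve(A)
identity_check <- A %*% A_inv

print(x)
print(A_inv)
print(round(identity_check, 10))
```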

Matrices
▪ A <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
       [,1] [,2] [,3]
  [1,]    1    2    3
  [2,]    4    5    6
  [3,]    7    8    9
▪ In R, a matrix is stored as a vector with an added dimension attribute, which gives the number of rows and columns. Matrix elements are stored columnwise in the vector.
▪ Here referencing with a single index gives: A[1] = 1, A[2] = 4, A[3] = 7, A[4] = 2, A[5] = 5, A[6] = 8, A[7] = 3, A[8] = 6, A[9] = 9
▪ R prints out a vector x as a row vector; however, in matrix operations it will treat x as either a row or a column vector in an attempt to make the components conformable.
▪ A <- matrix(c(3,5,2,3), nrow=2, ncol=2)
       [,1] [,2]
  [1,]    3    2
  [2,]    5    3
▪ (x <- c(1,2))
  [1] 1 2
▪ A%*%x
       [,1]
  [1,]    7
  [2,]   11
▪ t(x) treats x as a column vector by default and produces an array with the fixed dimension attributes of a row vector
       [,1] [,2]
  [1,]    1    2
▪ A%*%t(x) # Error in A %*% t(x) : non-conformable arguments
▪ To check if an object is a matrix or a vector you can use is.matrix(x) and is.vector(x)
▪ Mathematically speaking they're equivalent, but they're treated as different objects in R.
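The columnwise storage and the vector/matrix distinction described above can be inspected directly. This is a small sketch that uses only base R; x is the same two-element vector as on the slide.

```r
A <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

# A single index walks down the columns, not across the rows
print(A[4])          # the element in row 1, column 2
print(as.vector(A))  # the columns laid end to end

x <- c(1, 2)
print(is.vector(x))     # TRUE
print(is.matrix(x))     # FALSE
print(is.matrix(t(x)))  # TRUE: t() returns a 1 x 2 matrix
print(dim(t(x)))
```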

Matrices
▪ To create a matrix A with one column from a vector x, we use as.matrix(x)
  ▪ A <- as.matrix(x)
▪ To create a vector x from the columns of a matrix A, we use as.vector(A)
  ▪ x <- as.vector(A)
▪ This just strips the dimension attribute from A and leaves the elements as they are (stored columnwise)
▪ This process of changing object type is called "coercion"
▪ In many instances R will implicitly coerce the type of an object in order to apply specified operations or functions
▪ Sometimes it is convenient to arrange objects in arrays of more than two dimensions.
▪ This is done with arrays
▪ array(data,dim)
  ▪ data is a vector containing the elements of the array
  ▪ dim is a vector whose length is the number of dimensions and whose elements give the size of the array along each dimensional axis
▪ To fill the array you need length(data) equal to prod(dim)
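A three-dimensional array can be sketched as follows. The dimensions c(2, 3, 4) and the names dims, arr and flat are illustrative choices.

```r
dims <- c(2, 3, 4)

# length(data) equals prod(dim), so the array is filled exactly once
arr <- array(1:prod(dims), dim = dims)

print(dim(arr))
print(arr[1, 2, 1])  # the first index varies fastest, as with matrices
print(arr[2, 3, 4])  # the last element

# as.vector() strips the dim attribute again
flat <- as.vector(arr)
print(length(flat))
```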

*The Workspace

▪ Objects created in R exist until explicitly deleted or until the session is concluded.
▪ To list all currently defined objects: ls() or objects()
▪ To remove object x: rm(x)
▪ To remove all currently defined objects: rm(list=ls())
▪ To save all existing objects to a file fname: save.image(file="fname")
▪ To save objects x and y: save(x,y, file="fname")
▪ To load a set of saved objects: load(file="fname")
▪ When quitting R, if you save the data when prompted, then the objects will be stored in the file .RData in the current working directory
▪ R keeps a record of all commands you type. To save the history use savehistory(file="fname") and for loading use loadhistory(file="fname")
▪ If the workspace image is saved when quitting, then the current history is saved in .Rhistory in the current working directory

Branching with if
▪ Useful to choose the execution of some or other part of a program depending on a condition.
▪ if(logical_expression){
      expression_1
      ...
  }
▪ if(logical_expression){
      expression_1
      ...
  } else {
      expression_2
      ...
  }
▪ Braces { } are used to group together one or more expressions
▪ If there is only one expression, then braces are optional
▪ During evaluation of an "if" expression, if the logical_expression evaluates to TRUE, then the first group of expressions is executed and the second group is not executed.
▪ During evaluation of an "if" expression, if the logical_expression evaluates to FALSE, then only the second group of expressions is executed and the first group is not executed.
▪ If statements can be nested to create elaborate pathways through a program

Branching with if
▪ The else part is optional. If the "if" statement is finished before R sees the "else" part written on a new line, then R treats else as the start of a new command. Since there is no command starting with else, it will give an error
▪ if(logical_expression){
      expression_1
      ...
  }
  else { # This will cause an error
      expression_2
      ...
  }

Branching with if
▪ Example: Find the roots of a quadratic equation
  # find the zeros of a2*x^2 + a1*x + a0 = 0
  # clear the workspace
  rm(list=ls())
  # input
  a2 <- 1
  a1 <- 4
  a0 <- 5
  # calculate the discriminant
  discrim <- a1^2 - 4*a2*a0
  # calculate the roots depending on the value of the discriminant
  if(discrim > 0){
      roots <- c((-a1+sqrt(a1^2-4*a2*a0))/(2*a2),
                 (-a1-sqrt(a1^2-4*a2*a0))/(2*a2))
  } else {
      if(discrim == 0){
          roots <- -a1/(2*a2)
      } else {
          roots <- c()
      }
  }
  # output
  show(roots)
▪ Modify code to handle a2=0
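One possible way to approach the a2=0 exercise is sketched below. The function name quad_roots is hypothetical (it does not appear in the slides or the textbook), and returning an empty vector for the fully degenerate case a2 = a1 = 0 is an assumption made for this sketch, not a prescribed answer.

```r
# Hypothetical helper: real roots of a2*x^2 + a1*x + a0 = 0
quad_roots <- function(a2, a1, a0) {
  if (a2 == 0) {
    if (a1 == 0) {
      # Degenerate case (assumption of this sketch): no x term at all,
      # so either nothing or everything solves it; report no roots
      return(c())
    }
    # Linear equation a1*x + a0 = 0
    return(-a0 / a1)
  }
  discrim <- a1^2 - 4 * a2 * a0
  if (discrim > 0) {
    c((-a1 + sqrt(discrim)) / (2 * a2),
      (-a1 - sqrt(discrim)) / (2 * a2))
  } else if (discrim == 0) {
    -a1 / (2 * a2)
  } else {
    c()
  }
}

show(quad_roots(1, -3, 2))  # two real roots
show(quad_roots(0, 2, -4))  # linear case
show(quad_roots(1, 4, 5))   # negative discriminant, no real roots
```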

Branching with if
▪ Several conditions can be tested in sequence with else if. Only the group of expressions for the first condition that evaluates to TRUE is executed; if none is TRUE, the final else group is executed.
▪ if(logical_expression_1){
      expression_1
      ...
  } else if(logical_expression_2) {
      expression_2
      ...
  } else {
      expression_3
      ...
  }

Looping with for
▪ The for command executes the group of expressions within braces { } once for each element of a vector.
▪ The grouped expressions can make use of x, which takes on each of the values of the elements of the vector as the loop repeats. The vector can be a list
▪ for(x in vector){
      expression_1
      ...
  }
▪ cat (concatenate) allows us to combine text and variables together and display them, unlike print and show
▪ Example: Summing a vector
  (x_list <- seq(1, 9, by = 2)) # [1] 1 3 5 7 9
  sum_x <- 0
  for(x in x_list){
      sum_x <- sum_x + x
      cat("The current loop element is", x, "\n")
      cat("The cumulative total is", sum_x, "\n")
  }
▪ Built-in function for the same
  ▪ sum(x_list)

Looping with for
▪ Calculate n factorial (n!)
▪ #clear the workspace
▪ rm(list=ls())
▪ #input
▪ n <- 6
▪ #Calculation
▪ n_factorial <- 1
▪ for(i in 1:n){
n_factorial <- n_factorial*i
}
▪ #Output
▪ show(n_factorial) #[1] 720
▪ Alternate methods:
▪ prod(1:n)
▪ factorial(n)

Looping with for
▪ Example – Pension Value – Forecast Pension growth under compound
interest
▪ #clear the workspace
▪ rm(list=ls())
▪ #input
▪ r <- 0.11 #Annual Rate of Interest
▪ term <- 10 #forecast duration
▪ period <- 1/12 #Time between payments in years
▪ payments <- 100 #Amount deposited each period
▪ #Calculations
▪ n <- floor(term/period) #Number of payments
▪ pension <- 0
▪ for(i in 1:n){
pension[i+1] <- pension[i] * (1+r*period)+payments
}
time <- (0:n)*period
▪ #Output
▪ plot(time,pension)

Looping with for
▪ Program 1: Preallocation
  n <- 1000000
  x <- rep(0, n)
  for(i in 1:n){
      x[i] <- i
  }
▪ Program 2: Redimensioning
  n <- 1000000
  x <- 1
  for(i in 2:n){
      x[i] <- i
  }
▪ Program 1 is faster than Program 2 to achieve the same result
▪ Changing the size of a vector takes about as long as creating a new vector does.
▪ R needs to reconsider its allocation of memory to the object each time the size changes
▪ In program 1, memory for x is already preallocated
▪ In program 2, the size of vector x is changed with every execution of x[i] <- i (redimensioning)
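The two programs can be timed with system.time() from base R, as in the sketch below. The value of n is reduced here so that the comparison runs quickly, and no timings are quoted because the size of the gap depends on the machine and on the R version in use.

```r
n <- 100000  # smaller than the slide's value so the comparison runs quickly

# Program 1: allocate the full vector once, then fill it
prealloc_time <- system.time({
  x <- rep(0, n)
  for (i in 1:n) {
    x[i] <- i
  }
})

# Program 2: start with length 1 and let each assignment grow the vector
grow_time <- system.time({
  y <- 1
  for (i in 2:n) {
    y[i] <- i
  }
})

print(prealloc_time["elapsed"])
print(grow_time["elapsed"])
print(identical(x, y))  # both programs build the same vector
```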

Looping with while
▪ When we do not know beforehand how many times we need to go around a loop, we check some condition to see if we are done yet. A while loop is used for this
▪ while(logical_expression){
      expression_1
      ...
  }
▪ When a while command is executed, logical_expression is evaluated first.
▪ If it evaluates to TRUE, then the group of expressions in braces { } is executed.
▪ Control is then passed back to the start of the command. If logical_expression is still TRUE, the grouped expressions are executed again.
▪ For the loop to stop, the logical_expression must evaluate to FALSE, and this usually depends on a variable that is modified within the grouped expressions
▪ The while loop is more fundamental than the for loop, as we can always rewrite a for loop as a while loop

Looping with while
▪ Example: Fibonacci Sequence
  # clear the workspace
  rm(list=ls())
  # initialize variables
  F <- c(1,1) # list of Fibonacci numbers
  n <- 2 # length of F
  # iteratively calculate new Fibonacci numbers
  while(F[n] <= 100){
      # cat("n =", n, "F[n] =", F[n], "\n")
      n <- n+1
      F[n] <- F[n-1]+F[n-2]
  }
  # output
  cat("The First Fibonacci Number >100 is F(", n, ") = ", F[n], "\n")
▪ Example: Compound Interest - Duration of a loan under compound interest
  # clear the workspace
  rm(list=ls())
  # inputs
  r <- 0.11 # annual rate of interest
  period <- 1/12 # time between repayments in years
  debt_initial <- 1000
  repayments <- 12 # amount repaid each period
  # calculations
  time <- 0
  debt <- debt_initial
  while(debt > 0){
      time <- time+period
      debt <- debt*(1+r*period) - repayments
  }
  # output
  cat("Loan will be repaid in ", time, " years\n")

Vector Based Programming
▪ Often it is necessary to perform operations on each element of a vector.
▪ R is designed so that such tasks can be accomplished using vector operations rather than looping
▪ Vector operations are more efficient and more concise.
▪ Find the sum of the first n squares
  ▪ i.e. n(n+1)(2n+1)/6
▪ Looping
  n <- 100
  S <- 0
  for(i in 1:n){
      S <- S+i^2
  }
  S # [1] 338350
▪ Vector operation
  sum((1:n)^2) # [1] 338350
▪ In the vector operation, R interprets 1:n as "integers from 1 up to n, inclusive", then squares those integers using the vectorized "^2", and then adds them up in sum

Vector Based Programming
▪ The ifelse function performs element-wise conditional evaluation upon a vector
▪ ifelse(test, A, B)
▪ It takes 3 vector arguments
  ▪ A logical expression test
  ▪ Two expressions A and B
▪ The function returns a vector that is a combination of the evaluated expressions A and B
  ▪ The elements of A that correspond to elements of test that are TRUE
  ▪ The elements of B that correspond to elements of test that are FALSE
▪ If vectors have different lengths, R will repeat the shorter vector(s) to match the longer
▪ x <- c(-2,-1,1,2)
▪ ifelse(x>0, "Positive", "Negative")
  #[1] "Negative" "Negative" "Positive" "Positive"
  ▪ x>0 is the test logical expression that evaluates whether elements of vector x are greater than 0 or not.
  ▪ "Positive" is the expression returned for each vector element whose value is greater than 0
  ▪ "Negative" is the expression returned for each vector element whose value is not greater than 0
  ▪ The final result returned is a vector of the results of evaluating all the values of x
▪ pmin and pmax provide vectorized versions of the minimum and maximum
  ▪ pmin(c(1,2,3),c(3,2,1),c(2,2,2)) #[1] 1 2 1
  ▪ The function returns our desired vector (the minimum value at each position across the vectors)
  ▪ pmax(c(1,2,3),c(3,2,1),c(2,2,2)) #[1] 3 2 3
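As a further sketch of element-wise evaluation, ifelse() and pmin()/pmax() can replace small loops. The clamping range of -1 to 1 and the result names are illustrative choices.

```r
x <- c(-2, -1, 1, 2)

# One label per element, chosen by the test
signs <- ifelse(x > 0, "Positive", "Negative")

# Vectorized absolute value without a loop
magnitudes <- ifelse(x < 0, -x, x)

# pmin() and pmax() compare position by position; together they clamp x to [-1, 1]
clamped <- pmax(pmin(x, 1), -1)

print(signs)
print(magnitudes)
print(clamped)
```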

Module 3
Lists and Data Frames: Data Frames, Lists, Special Values, The apply family
Text Book 1: Chapter 6 (6.2 to 6.4)

Dataframes
▪ In the vector data structure in R, all components must be of the same mode: numeric, character or logical
▪ Real datasets require grouping of data of differing modes.
▪ Matrices cannot contain heterogeneous data (data of different modes)
▪ Lists and Dataframes are able to store much more complicated data structures
▪ Dataframe is a list that is tailor-made to meet the practical needs of representing multivariate datasets
▪ It is a list of vectors restricted to be of equal lengths
▪ Each vector or column corresponds to a variable in an experiment and each row corresponds to a single
observation.
▪ Each vector can be of any of the basic modes of object

Dataframes
▪ Large dataframes are usually read into R from a file.
▪ read.table(file, header=FALSE, sep="")
  ▪ It returns a dataframe
  ▪ file: the name of the file to be read, relative to the current working directory, absolute, or a URL.
  ▪ header: indicates whether the first line of the file is a line of text giving the variable names.
  ▪ sep: gives the character used to separate the values in each row. The default, sep="", means a variable amount of white space.
  ▪ ?read.table can be used for more details
▪ Commonly used variants:
  ▪ read.csv(file): comma-separated data
  ▪ read.delim(file): tab-delimited data
▪ Equivalents
  ▪ read.table(file,header=TRUE,sep=",")
  ▪ read.table(file,header=TRUE,sep="\t")
▪ If a header is present, it is used to name the columns of the dataframe.
▪ The column names can be assigned after reading the file using the "names" function, or when reading it in using the col.names argument, which should be assigned a character vector whose length is the same as the number of columns.
▪ If there is no col.names argument and no header, then R uses the names "V1", "V2", etc.

Dataframes
Sample Dataset ufc.csv
▪ "plot","tree","species","dbh.cm","height.m"
▪ 2,1,"DF",39,20.5
▪ 2,2,"WL",48,33
▪ 3,2,"GF",52,30
▪ 3,5,"WC",36,20.7
▪ 3,8,"WC",38,22.5
▪ ufc <-
read.csv("C:/Users/Praahas/OneDrive/Documents/Desktop/ufc.csv")
▪ ufc
▪ To examine the dataset head(ufc) and tail(ufc) can be used
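To try the sample dataset without depending on a particular folder, the rows above can be written to a temporary file first. This is a sketch: the temporary path from tempfile() and the name ufc2 are illustrative, and only the five rows shown on the slide are used.

```r
# Write the sample rows to a temporary file (the path is illustrative)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  '"plot","tree","species","dbh.cm","height.m"',
  '2,1,"DF",39,20.5',
  '2,2,"WL",48,33',
  '3,2,"GF",52,30',
  '3,5,"WC",36,20.7',
  '3,8,"WC",38,22.5'
), csv_path)

# read.csv() assumes header = TRUE and sep = ","
ufc <- read.csv(csv_path)

# The same file read through read.table() with explicit arguments
ufc2 <- read.table(csv_path, header = TRUE, sep = ",")

print(head(ufc, 3))
print(names(ufc))
print(dim(ufc))
```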

Dataframes
▪ Each column or variable in a dataframe has a unique name and can be extracted using the dataframe name, the column name and a dollar sign
  ▪ x <- ufc$height.m
  ▪ x[1:5] #[1] 20.5 33.0 30.0 20.7 22.5
  ▪ Note: Indexing starts from 1
▪ We can use [[ ]] notation to extract columns.
  ▪ ufc$height.m, ufc[[5]] and ufc[["height.m"]] are all equivalent
▪ Elements of the dataframe can be extracted directly using matrix indexing
  ▪ ufc[1:5, 5] #[1] 20.5 33.0 30.0 20.7 22.5
▪ To select more than one of the variables in a dataframe we use [ ] notation. We can also use names.
  ▪ ufc[4:5] is the same as ufc[c("dbh.cm", "height.m")]
  ▪ diam.height <- ufc[4:5] # the "dbh.cm" and "height.m" columns will be stored in diam.height
  ▪ diam.height[1:4,] # will display rows 1 to 4
▪ Check if an object is a dataframe
  ▪ is.data.frame(diam.height) #[1] TRUE

Dataframes
▪ Result of selecting columns using [ ] is another dataframe. This can sometimes cause confusion when you select only one variable
▪ x <- ufc[5]
▪   height.m
▪ 1 20.5
▪ 2 33.0
▪ 3 30.0
▪ 4 20.7
▪ 5 22.5
▪ x[1:5] # Error in `[.data.frame`(x, 1:5) : undefined columns selected
▪ Variables can be extracted one at a time using [[ ]]
▪ Selecting a column using [[ ]] preserves the mode of the object that is being extracted
▪ Using [ ] preserves the mode of the object from which the extraction is being made.
▪ mode(ufc) #[1] "list"
▪ mode(ufc[5]) #[1] "list"
▪ mode(ufc[[5]]) #[1] "numeric"
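▪ The difference between [ ] and [[ ]] can be checked on a small, made-up dataframe (three rows only, not the real ufc data). The names one.col and vec are illustrative.

```r
ufc <- data.frame(dbh.cm = c(39, 48, 52), height.m = c(20.5, 33, 30))

one.col <- ufc[2]     # single brackets: a one-column dataframe
vec <- ufc[[2]]       # double brackets: the numeric vector itself

mode(one.col)         # "list", the mode of the dataframe it came from
mode(vec)             # "numeric", the mode of the extracted column

# Rows of the one-column dataframe need the [rows, ] form
first.two <- one.col[1:2, ]
```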

Dataframes
▪ Dataframes can be constructed from a collection of vectors and/or existing dataframes using data.frame
▪ data.frame(col1=x1,col2=x2…….,df1,df2,…..)
▪ col1, col2,… are column names given as character strings without quotes.
▪ x1, x2,… are vectors of equal length
▪ df1, df2,… are dataframes whose columns must have the same length as the vectors x1, x2
▪ Column names may be omitted, in which case R will choose a name
▪ A new variable can also be created within a dataframe by naming it and assigning a value
▪ In the ufc example, let's add a new variable to the dataset: volume
▪ ufc$volume.m3 <- pi*(ufc$dbh.cm/200)^2 * ufc$height.m/2
▪ mean(ufc$volume.m3) #[1] 1.93294
▪ Equivalently we could assign to ufc[6], ufc["volume.m3"], ufc[[6]] or ufc[["volume.m3"]]
▪ For better readability you can use
▪ ufc$volume.m3 <- with(ufc, pi*(dbh.cm/200)^2*height.m/2)
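▪ A small sketch of building a dataframe from vectors, adding a derived column, and combining a dataframe with a further vector. The three rows and the names trees and trees2 are made up for illustration; the volume formula is the one used above.

```r
x1 <- c(39, 48, 52)       # diameters in cm
x2 <- c(20.5, 33, 30)     # heights in m
trees <- data.frame(dbh.cm = x1, height.m = x2)

# New variable created by naming it and assigning a value;
# with() lets the formula refer to the columns directly
trees$volume.m3 <- with(trees, pi * (dbh.cm / 200)^2 * height.m / 2)

# An existing dataframe combined with one more vector
trees2 <- data.frame(trees, species = c("DF", "WL", "GF"))
dim(trees2)
```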

Dataframes
▪ names(df) returns the names of the dataframe df as a vector of character strings
▪ names(ufc)
▪ #[1] "plot" "tree" "species" "dbh.cm"
▪ #[5] "height.m" "volume.m3"
▪ names(ufc) <- c("P","T","S","D","H","V")
▪ names(ufc)
▪ #[1] "P" "T" "S" "D" "H" "V"
▪ names can be used to set or get an object's names
▪ Technically, names is an attribute
▪ We must have exactly one name for each column and they must all be different
▪ If you delete a column, the remaining column names are unchanged
▪ dim (dimension) of a matrix is another example of an attribute
▪ As long as the total number of elements remains the same, we can change the shape of a matrix by changing the dim attribute
▪ R will reassign values from the old matrix to the new one column by column
▪ dim(df) will return the number of rows and columns of a dataframe
▪ dim(df) <- c(x,y) will however generate an error
▪ dim is not an attribute of a dataframe; it has been extended to dataframes only for convenience
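▪ A brief illustration of the names and dim attributes, using a made-up matrix m and dataframe df.

```r
m <- matrix(1:6, nrow = 2)   # columns are (1,2), (3,4), (5,6)
dim(m) <- c(3, 2)            # same six values, refilled column by column
m[, 1]                       # first column of the reshaped matrix

df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(df) <- c("A", "B", "C")   # set the names
df$B <- NULL                    # delete a column
names(df)                       # remaining names are unchanged
dim(df)                         # rows and columns of the dataframe
```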

Dataframes
▪ A dataframe also has row names. By default they are named "1", "2", "3", etc. when the dataframe is created
▪ Both read.table and data.frame take an optional argument row.names, where row names can be specified
▪ row.names(df) will return the row names of df as a character vector
▪ row.names is an attribute of a dataframe and therefore row names can be set by making an assignment to row.names(df)
▪ If you delete a row, the remaining row names are unchanged
▪ The subset function is useful for selecting rows of a dataframe, especially combined with %in%
▪ For example, if we are interested only in DF and GF tree heights in the ufc dataset:
▪ fir.height <- subset(ufc, subset=species %in% c("DF","GF"), select=c(plot,tree,height.m))
▪ For vectors x and y of the same mode, the expression x %in% y returns a logical vector the same length as x whose i-th element is TRUE if and only if x[i] is an element of y
▪ The %in% operator performs many-to-many matching
▪ The subset argument accepts a logical vector and determines which rows are selected
▪ The select argument determines which columns are kept. Note that it is given a vector of columns, not column names.
▪ Note that expressions assigning values to subset and select can directly use the columns of the target dataframe, which is given as the first argument
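▪ The subset call above can be tried on the five sample rows of ufc.csv, typed in by hand here as a stand-in for the full file.

```r
ufc <- data.frame(
  plot = c(2, 2, 3, 3, 3),
  tree = c(1, 2, 2, 5, 8),
  species = c("DF", "WL", "GF", "WC", "WC"),
  dbh.cm = c(39, 48, 52, 36, 38),
  height.m = c(20.5, 33, 30, 20.7, 22.5)
)

# %in% tests each element of the left vector for membership in the right one
in.check <- c("DF", "XX") %in% c("DF", "GF")

# Keep only DF and GF rows, and only three of the columns
fir.height <- subset(ufc, subset = species %in% c("DF", "GF"),
                     select = c(plot, tree, height.m))

# The selected rows keep their original row names
row.names(fir.height)
```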

Dataframes
▪ To write a dataframe to a file:
▪ write.table(x, file="", append=FALSE, sep=" ", row.names=TRUE, col.names=TRUE)
▪ For the complete list of arguments use ?write.table
▪ x - dataframe to be written
▪ file - name and address of the file to write to. The file is created if it doesn't exist. By default (file="") it writes to the screen
▪ append - indicates whether to append to the file or overwrite it
▪ sep - indicates the character used to separate values within a row. Rows are separated by new lines
▪ row.names - indicates whether or not to include the existing row names as the first column, or gives a character vector of row names to write
▪ Complete rows (without missing values) in a 2-dimensional object such as a dataframe can be identified using the complete.cases command
▪ Rows with missing values can be removed using the na.omit function.
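▪ A minimal round trip, assuming a made-up dataframe df with one missing value: incomplete rows are dropped, the result is written to a temporary file (tempfile() is used so that no real path is assumed), and then read back.

```r
df <- data.frame(id = 1:3, value = c(1.5, NA, 3.5))

complete.cases(df)      # one logical per row, FALSE where a value is missing
clean <- na.omit(df)    # rows with missing values removed

out <- tempfile(fileext = ".txt")   # throwaway file for the sketch
write.table(clean, file = out, sep = ",",
            row.names = FALSE, col.names = TRUE)

back <- read.table(out, header = TRUE, sep = ",")
back
```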

Dataframes
▪ R allows a dataframe to be attached to the workspace. When attached, the variables in the dataframe can be referred to without being prefixed by the dataframe name.
▪ To detach use detach(df)
▪ When a dataframe is attached, R makes a copy of each variable, which is deleted when the dataframe is detached. So changing an attached variable does not change the dataframe
▪ attach(ufc)
▪ max(height.m[species=="GF"]) #[1] 47
▪ height.m <- 0 # changing the value of an attached df variable
▪ max(height.m) #[1] 0
▪ max(ufc$height.m) #[1] 47
▪ detach(ufc)
▪ max(ufc$height.m) #[1] 47
▪ attach is preferably avoided as it can be a source of potential errors
▪ The with and transform functions provide a safer alternative
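▪ A sketch of the safer alternatives, on a made-up three-row dataframe (not the real ufc data). The column height.ft and the conversion factor are illustrative additions.

```r
ufc <- data.frame(species = c("DF", "GF", "GF"),
                  height.m = c(20.5, 30, 47))

# with() evaluates an expression using the columns, without attaching
tallest.gf <- with(ufc, max(height.m[species == "GF"]))

# transform() returns a modified copy; the original is left untouched
ufc2 <- transform(ufc, height.ft = height.m * 3.28084)
names(ufc2)
```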

Lists
▪ A generic container for other objects.
▪ Like a vector, a list is an indexed set of objects (and has length)
▪ But unlike a vector, elements of a list can be of different types, including other lists
▪ Mode of a list is list
▪ It might contain an individual measurement, a vector of observations on a single response variable, a dataframe or a list of dataframes containing the results of several experiments
▪ A list is created using the list(…) command with comma-separated arguments.
▪ Single square brackets are used to select a sub-list
▪ Double square brackets are used to extract a single element
▪ my.list <- list("one", TRUE, 3, c("f","o","u","r"))
▪ my.list[[2]] #[1] TRUE
▪ mode(my.list[[2]]) #[1] "logical"
▪ my.list[[4]] #[1] "f" "o" "u" "r"
▪ my.list[[4]][1] #[1] "f"
▪ R uses double square brackets [[1]] to indicate list elements, then single square brackets [1] to indicate vector elements within the list
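▪ The contrast between a sub-list and a single element can be checked directly. The variable names sub and el are illustrative.

```r
my.list <- list("one", TRUE, 3, c("f", "o", "u", "r"))

sub <- my.list[2:3]    # single brackets: still a list, here of length 2
el <- my.list[[4]]     # double brackets: the character vector itself

mode(sub)              # "list"
mode(el)               # "character"
el[1]                  # first element of the extracted vector
```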

Lists
▪ Elements of a list can be named when the list is created using arguments of the form name1=x1, name2=x2, etc.
▪ Elements of a list can be named later by assigning a value to the names attribute
▪ Unlike a dataframe, the elements of a list do not have to be named
▪ Names can be used (within quotes) when indexing with single or double square brackets.
▪ Or they can also be used (with or without quotes) after a dollar sign to extract a list element
▪ my.list <- list(first="one",second=TRUE,third=3,fourth=c("f","o","u","r"))
▪ > my.list
▪ $first
▪ [1] "one"
▪ $second
▪ [1] TRUE
▪ $third
▪ [1] 3
▪ $fourth
▪ [1] "f" "o" "u" "r"
▪ names(my.list) # [1] "first" "second" "third" "fourth"
▪ my.list$second # [1] TRUE
▪ names(my.list) <- c("First Element","Second Element","Third Element","Fourth Element") # changes element names
▪ my.list$'Second Element' # [1] TRUE
▪ x <- 'Second Element'
▪ my.list[[x]] # [1] TRUE

Lists
▪ To Flatten a list x, i.e convert it into a vector,
we use unlist(x)
▪ x<-list(1,c(2,3),c(4,5,6))
▪ unlist(x) # [1] 1 2 3 4 5 6
▪ If the list itself contains lists, then these lists are also flattened, unless the argument recursive = FALSE is set

Lists
▪ Linear Regression:
▪ lm.xy<-lm(y ~ x,data=data.frame(x=1:5,y=1:5))
▪ mode(lm.xy) #[1] "list"
▪ names(lm.xy)

The apply family
▪ R has functions that allow you to easily apply a function to all or selected elements of a list or dataframe
▪ apply() - takes a data frame or matrix as input and gives output as a vector, list or array. apply() is primarily used to avoid explicit loop constructs. It is the most basic of the family and can be used over a matrix.
▪ lapply()
▪ sapply()
▪ tapply()
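▪ As a quick orientation before the individual slides, the sketch below runs each of the four functions once on tiny, made-up inputs. All variable names are illustrative.

```r
m <- matrix(1:6, nrow = 2)                   # rows are (1,3,5) and (2,4,6)
row.sums <- apply(m, 1, sum)                 # one result per row

sq.list <- lapply(1:3, function(i) i^2)      # always returns a list
sq.vec <- sapply(1:3, function(i) i^2)       # simplified to a vector

# Mean of the values within each group label
by.group <- tapply(c(10, 20, 30, 40), c("a", "b", "a", "b"), mean)
by.group
```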

The apply family
▪ apply() - applies a function to the rows or columns of a matrix or data frame. It takes the matrix or data frame as an argument, along with the function and whether it has to be applied by row or by column, and returns a vector, array or list.
▪ apply(X,MARGIN,FUN)
▪ If MARGIN is 1, FUN is applied to each row
▪ If MARGIN is 2, FUN is applied to each column

▪ # create sample data
▪ sample_matrix <- matrix(1:10, nrow=3, ncol=10)
▪ print("sample matrix:")
▪ sample_matrix
▪ # use apply() function across rows to find sum
▪ print("sum across rows:")
▪ apply(sample_matrix, 1, sum)
▪ # use apply() function across columns to find mean
▪ print("mean across columns:")
▪ apply(sample_matrix, 2, mean)

The apply family
▪ lapply() - applies a function to each element of a list, vector, or data frame and returns a list of the same length. Since it applies the operation to all the elements, it doesn't need a MARGIN.
▪ lapply(X,FUN)

▪ # create sample data


▪ names <- c("tony stark", "steve rogers","stephen strange",
"peter parker","natasha romanoff")
▪ print( "original data:")
▪ names
▪ # apply lapply() function
▪ print("data after lapply():")
▪ lapply(names, toupper)

The apply family
▪ sapply() - applies a function to each element of a list, vector, or data frame and simplifies the result to a vector, matrix, or array where possible. Since sapply() applies the operation to all the elements of the object, it doesn't need a MARGIN.
▪ It is the same as lapply(), with the only difference being the type of the returned object.
▪ sapply(X,FUN)

▪ # create sample data


▪ sample_data<- data.frame( x=c(1,2,3,4,5,6),y=c(3,2,4,2,34,5))
▪ print( "original data:")
▪ sample_data
▪ # apply sapply() function
▪ print("data after sapply():")
▪ sapply(sample_data, max)

The apply family
▪ tapply() - applies a function to subsets of a vector. It is useful for applying an operation for each level of a factor: it splits the vector into subsets by level and then applies the function to each of the subsets
▪ tapply(X, INDEX, FUN, …)
▪ X – Target Vector to which function will be applied
▪ INDEX – It is a factor, which is used to group the elements of X. It
will be coerced to a factor if it is not one already. It has same
length as X
▪ FUN – Function to be applied. It is applied to subvectors of X
corresponding to a single level of Index

▪ #install.packages("tidyverse")
▪ # load library tidyverse
▪ library(tidyverse)
▪ # print head of diamonds dataset
▪ print(" Head of data:")
▪ head(diamonds)
▪ # apply tapply function to get average price by cut
▪ print("Average price for each cut of diamond:")
▪ tapply(diamonds$price, diamonds$cut, mean)
The apply family
▪ mapply() - stands for multivariate apply and applies a function to multiple lists or vectors in parallel: FUN is called with the first elements of each, then the second elements, and so on.
▪ mapply(FUN, LIST1, LIST2, …)
▪ LIST1, LIST2, … - the lists (or vectors) to iterate over
▪ FUN - function to be applied to the lists.

▪ # Creating a list
▪ A = list(c(1, 2, 3, 4))
▪ # Creating another list
▪ B = list(c(2, 5, 1, 6))
▪ # Applying mapply()
▪ result = mapply(sum, A, B)
▪ print(result) #[1] 24
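▪ In the example above each list holds a single vector, so FUN is called once and the two vectors are summed together. For comparison, a sketch of mapply() working element by element on plain vectors (inputs made up for illustration):

```r
a <- c(1, 2, 3, 4)
b <- c(2, 5, 1, 6)

# FUN is called four times: (1,2), (2,5), (3,1), (4,6)
pairwise <- mapply(function(x, y) x + y, a, b)
pairwise

# Results of different lengths cannot be simplified, so a list is returned
reps <- mapply(rep, 1:3, 3:1)
reps
```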
Module 4
Functions: Calling functions, scoping, argument matching, writing functions: the function command, arguments, specialized functions.
Text Book 1: Chapter 5 - 5.1 to 5.6

Functions
▪ Building blocks for large programs and essential for structuring complex algorithms.
▪ Once loaded, a function can be reused without having to reload it.
▪ Break down a program into smaller logical units, each of which does a simple, well-defined task
▪ A function's general form:
▪ name <- function(arg_1, arg_2, …) {
    exp_1
    exp_2
    <some other exp>
    return(output)
  }
▪ arg_1, arg_2 etc. are names of variables
▪ exp_1, exp_2 and output are all regular R expressions
▪ name is the name of the function
▪ A function call is made using name(x1,x2)
▪ The value of this expression is the value of the expression output.
▪ The values of x1, x2 etc. are copied to arg_1, arg_2 etc.; the arguments then act as variables within the function
▪ The function next evaluates the grouped expressions contained within the braces { }
▪ The value of the expression output is returned as the value of the function
▪ A function may have more than one return statement, in which case it stops after executing the first one it reaches.
▪ If there is no return statement, then the value returned by the function is the value of the last expression in the braces. A function ALWAYS returns a value in R.
▪ NULL may be returned by the function
▪ Some functions have no arguments
▪ Braces are necessary only if the function comprises more than one expression
▪ When a function is called, if the returned value is not assigned to a variable then it is printed.
▪ The expression invisible(x) will return the same value as x, but the value is not printed.
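▪ The return rules listed above can be seen in a few small functions. All four function names are made up for this sketch.

```r
# Single expression: braces are optional
add.one <- function(x) x + 1

# Two exit points: the first return() reached ends the call,
# otherwise the last expression is the value
sign.label <- function(x) {
  if (x < 0) return("negative")
  "non-negative"
}

# The value is returned but not printed at the prompt
quiet <- function(x) invisible(x * 2)

# No arguments and an empty body: the value is NULL
no.value <- function() {}

doubled <- quiet(2)   # the value can still be assigned and used
```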

Functions
▪ Roots of a quadratic equation
▪ quad <- function(a0, a1, a2){
    # Find the zeros of a2*x^2 + a1*x + a0 = 0
    if (a2 == 0 && a1 == 0 && a0 == 0){
      roots <- NA
    } else if (a2 == 0 && a1 == 0){
      roots <- NULL
    } else if (a2 == 0){
      roots <- -a0/a1
    } else {
      # calculate the discriminant
      discrim <- a1^2 - 4*a2*a0
      # calculate the roots depending on the value of the discriminant
      if (discrim > 0){
        roots <- (-a1 + c(1,-1)*sqrt(discrim))/(2*a2)
      } else if (discrim == 0){
        roots <- -a1/(2*a2)
      } else {
        roots <- NULL
      }
    }
    return(roots)
  }
▪ #Main
▪ rm(list=ls())
▪ source("C:/Users/Praahas/Projects/R/quad.r")
▪ quad(1,0,-1)
▪ quad(1,-2,1)
▪ quad(1,1,1)

Functions
▪ $^nC_r = \frac{n!}{r!\,(n-r)!}$
▪ n_factorial <- function(n){
    # Calculate n factorial; seq_len(n) is empty when n is 0, so 0! gives 1
    n_fact <- prod(seq_len(n))
    return(n_fact)
  }
▪ ncr <- function(n,r){
    # Calculate nCr
    n_ch_r <- n_factorial(n)/n_factorial(r)/n_factorial(n-r)
    return(n_ch_r)
  }
▪ #Main
▪ rm(list=ls())
▪ source("C:/Users/Praahas/Projects/R/ncr.r")
▪ ncr(4,2) #[1] 6
▪ ncr(6,4) #[1] 15
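▪ A self-contained check of the two functions against R's built-in choose(). This sketch uses prod(seq_len(n)) for the factorial, an assumption made here so that 0! evaluates to 1 and cases such as r = 0 or r = n work.

```r
n_factorial <- function(n) {
  # seq_len(0) is empty and prod() of an empty vector is 1
  prod(seq_len(n))
}

ncr <- function(n, r) {
  n_factorial(n) / n_factorial(r) / n_factorial(n - r)
}

ncr(4, 2)
ncr(6, 4)
ncr(5, 0)                  # edge case that relies on 0! being 1
ncr(6, 4) == choose(6, 4)  # agrees with the built-in
```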

Functions
▪ Trimmed mean: discard the k smallest and k largest values and then calculate the mean. This reduces the influence of outliers compared to the untrimmed mean
▪ Winsorised mean: instead of discarding the k largest and k smallest values, we replace them by $x_{(n-k)}$ and $x_{(k+1)}$ respectively
▪ This can be used when a sample may contain occasional extraordinary values
▪ wmean <- function(x,k){
    x <- sort(x)
    n <- length(x)
    x[1:k] <- x[k+1]
    x[(n-k+1):n] <- x[n-k]
    return(mean(x))
  }
▪ #Main
▪ rm(list=ls())
▪ source("C:/Users/Praahas/Projects/R/wmean.r")
▪ x <- c(8.244,51.421,39.020,90.574,44.697,83.600,73.760,81.106,38.811,68.517)
▪ mean(x)
▪ wmean(x,2)
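▪ A runnable version of the example with the function defined inline instead of sourced from a file. The values in the comments were worked out by hand from the ten sample numbers.

```r
wmean <- function(x, k) {
  x <- sort(x)
  n <- length(x)
  x[1:k] <- x[k + 1]            # k smallest replaced by the (k+1)-th value
  x[(n - k + 1):n] <- x[n - k]  # k largest replaced by the (n-k)-th value
  mean(x)
}

x <- c(8.244, 51.421, 39.020, 90.574, 44.697,
       83.600, 73.760, 81.106, 38.811, 68.517)

mean(x)       # 57.975
wmean(x, 2)   # 59.8773: 8.244 and 38.811 become 39.020; 83.600 and 90.574 become 81.106
```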

Functions
▪ When a function is executed, the computer sets aside space for the function variables, makes a copy of the function code and then transfers control to the function
▪ When the function finishes executing, the output is passed to the main program and the copy of the function variables and code is deleted
▪ Function to swap numbers
▪ swap <- function(x){
    # swap values of x[1] and x[2]
    y <- x[2]
    x[2] <- x[1]
    x[1] <- y
    return(x)
  }
▪ #Main
▪ x <- c(7,8,9)
▪ source("C:/Users/Praahas/Projects/R/swap.r")
▪ x[1:2] <- swap(x[1:2]) # x is now 8 7 9
▪ x[2:3] <- swap(x[2:3]) # x is now 8 9 7

Scope & its Consequences
▪ Arguments and variables that are defined within a function exist only within that function
▪ If variables with the same name exist inside and outside a function, then they are separate and do not interact at all
▪ If we execute the command rm(list=ls()) inside a function, we only delete those objects that are defined inside the function
▪ The part of a program in which a variable is defined is called its scope
▪ Restricting the scope of variables ensures that a function call will not modify a variable outside the function, except by assigning the returned value.
▪ test <- function(x){
    y <- x+1
    return(y)
  }
▪ #main
▪ test(1) #[1] 2
▪ y # Error: object 'y' not found
▪ y <- 10
▪ test(1) #[1] 2
▪ y #[1] 10

Scope & its Consequences
▪ Scope of a variable is not symmetric
▪ Variables defined inside a function cannot be seen outside, but variables defined outside the function can be seen inside the function, provided there is no variable with the same name defined inside the function.
▪ test2 <- function(x){
    y <- x+z
    return(y)
  }
▪ z <- 1
▪ test2(1) #[1] 2
▪ z <- 2
▪ test2(1) #[1] 3
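▪ The "provided there is no variable with the same name" condition can be illustrated by putting a local z next to a global one. The function names uses.global and uses.local are made up for this sketch.

```r
z <- 1

# No local z: the function sees the z defined outside
uses.global <- function(x) x + z

# A local z hides the outer one for the duration of the call
uses.local <- function(x) {
  z <- 100
  x + z
}

uses.global(1)   # uses the outer z
uses.local(1)    # uses the local z
z                # the outer z is unchanged by the call
```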

Arguments
▪ Arguments used in a function are named when the function is created
▪ Some arguments may be assigned default values, which are used in case the argument is not provided in the function call.
▪ test3 <- function(x=1){
    return(x)
  }
▪ test3(2) #[1] 2
▪ test3() #[1] 1
▪ Sometimes arguments have to be defined so that they can only take a small number of different values, and the function will stop informatively if an inappropriate value is passed.
▪ This can be done with an if statement, but R provides a method for this: include a vector of permissible values for any such argument and check them using the match.arg function
▪ funk <- function(words=c("Apple", "Bat", "Cat", "Dog")){
    words <- match.arg(words)
    return(words)
  }
▪ funk() #[1] "Apple"
▪ funk("Bat") #[1] "Bat"
▪ funk("Dum") # Error in match.arg(words) : 'arg' should be one of "Apple", "Bat", "Cat", "Dog"
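▪ A runnable sketch of the same function, with tryCatch() used to capture the error so that the script continues. The string "rejected" is just a placeholder chosen for this example.

```r
funk <- function(words = c("Apple", "Bat", "Cat", "Dog")) {
  words <- match.arg(words)   # checks the value against the default vector
  words
}

funk()        # no argument: the first permissible value is used
funk("Bat")   # a permissible value is passed through
funk("Ca")    # match.arg also accepts an unambiguous abbreviation

# An inappropriate value raises an error, caught here for illustration
bad <- tryCatch(funk("Dum"), error = function(e) "rejected")
bad
```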
Arguments
▪ R provides a means for passing arguments unaltered from the function that is being called to the functions that are called within it.
▪ These arguments do not have to be named explicitly in the outer function
▪ Three dots (…), an ellipsis, act as a placeholder for any extra arguments
▪ R assigns arguments to variables from the left, unless an argument is named
▪ Naming an argument in the function call is good practice for better readability
▪ test4 <- function(x, ...){
    return(sd(x, ...))
  }
▪ test4(1:3) #[1] 1
▪ test4(c(1:2,NA)) #[1] NA
▪ test4(c(1:2,NA),na.rm=TRUE) #[1] 0.7071068
▪ test4(c(1:2,NA),TRUE) #[1] 0.7071068
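▪ The same forwarding idea with mean() as the inner function: the wrapper, given the made-up name centre, never mentions na.rm or trim, yet both reach mean() through the ellipsis.

```r
centre <- function(x, ...) {
  mean(x, ...)   # any extra arguments are handed on unchanged
}

centre(c(1, NA))                        # NA: nothing extra was passed
centre(c(1, NA), na.rm = TRUE)          # na.rm is forwarded to mean()
centre(c(1, 2, 3, 100), trim = 0.25)    # trim is forwarded too: mean of 2 and 3
```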

Arguments
▪ R provides a means for partial matching of arguments, where doing so is not ambiguous
▪ Argument names in the function call do not have to be complete
▪ This can make the code more fragile, and this style is therefore not encouraged
▪ test6 <- function(a=1, b.c.d=1){
    return(a+b.c.d)
  }
▪ test6() #[1] 2
▪ test6(b=5) #[1] 6
▪ Note: in each pair below, the first call relies on partial matching and the second spells the argument name out
▪ seq.int(0, 1, len = 11)
▪ seq.int(0, 1, length.out = 11)
▪ ls(all = TRUE)
▪ ls(all.names = TRUE)
▪ Partial matching exists to save you typing long argument names.
▪ The danger with it is that functions may gain additional arguments later on which conflict with your partial match.
▪ This means that it is only suitable for interactive use. If you are writing code that will stick around for a long time (to go in a package, for example) then you should always write the full argument name.
▪ The other problem is that by abbreviating an argument name, you can make your code less readable.
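▪ The fragility can be demonstrated by imagining that the function later gains a second argument starting with the same letters. The function clash and its argument b.x are hypothetical, introduced only for this sketch.

```r
test6 <- function(a = 1, b.c.d = 1) a + b.c.d
test6(b = 5)   # "b" uniquely abbreviates b.c.d

# Hypothetical later version with one more argument starting with "b"
clash <- function(a = 1, b.c.d = 1, b.x = 1) a + b.c.d + b.x

# The same abbreviated call is now ambiguous and stops with an error
outcome <- tryCatch(clash(b = 5), error = function(e) "ambiguous")
outcome

clash(b.c.d = 5)   # the full name keeps working
```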

Vector Based Programming using Functions
▪ Many R functions are vectorized: for a given vector input, the function acts on each element separately and returns a vector output.
▪ This enables R to have compact, efficient and readable code
▪ Applying a function to a vector is much faster than iteratively looping and applying the function to each element
▪ apply, sapply, lapply, tapply, mapply
▪ Example: sapply(X,FUN)
▪ The use of the above expression is to apply the function FUN to every element of vector X.
▪ X can be a list or an atomic vector (a vector that contains atomic objects like logical, integer, numeric, complex, character and raw)
▪ sapply(X,FUN) returns a vector whose i-th element is the value of the expression FUN(X[i])

Vector Based Programming using Functions
▪ Example for sapply() - Density of Primes
▪ Write a function prime that tests if a given integer is prime or not
▪ Use sapply() to apply the prime checker function to the vector 2:n so that we know all primes less than or equal to n
▪ ρ(n) -> number of primes less than or equal to n
▪ Legendre and Gauss' assertion -> $\lim_{n \to \infty} \frac{\rho(n)\,\log(n)}{n} = 1$
▪ Result proved by Hadamard and de la Vallée Poussin
▪ Cumulative sum function of a vector X -> cumsum(X)

Vector Based Programming using Functions
▪ rm(list=ls())
▪ prime <- function(n){
    if (n == 1){
      is.prime <- FALSE
    } else if (n == 2){
      is.prime <- TRUE
    } else {
      is.prime <- TRUE
      for (m in 2:(n/2)){
        if (n %% m == 0) is.prime <- FALSE
      }
    }
    return(is.prime)
  }
▪ n <- 1000
▪ m.vec <- 2:n
▪ primes <- sapply(m.vec, prime)
▪ num.primes <- cumsum(primes)
▪ #print(num.primes)
▪ par(mfrow = c(1,2), las=1)
▪ plot(m.vec, num.primes/m.vec, type="l", main="prime density", xlab="n", ylab="")
▪ lines(m.vec, 1/log(m.vec), col="red")
▪ plot(m.vec, num.primes/m.vec*log(m.vec), type="l", main="prime density * log(n)", xlab="n", ylab="")

Vector Based Programming using Functions
▪ Optimised Code for Prime Density
▪ Check for factors up to $\sqrt{n}$: since n=ab, at least one of a and b is less than or equal to $\sqrt{n}$
▪ Once we find one factor we don't need to keep checking
▪ prime <- function(n){
    if (n == 1){
      is.prime <- FALSE
    } else if (n == 2){
      is.prime <- TRUE
    } else {
      is.prime <- TRUE
      m <- 2
      m.max <- sqrt(n)
      while (is.prime && m <= m.max){
        if (n %% m == 0) is.prime <- FALSE
        m <- m+1
      }
    }
    return(is.prime)
  }
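▪ A quick check of the optimised function without the plotting step, assuming a smaller limit of 100 so the result is easy to verify by hand (there are 25 primes up to 100).

```r
prime <- function(n) {
  if (n == 1) {
    is.prime <- FALSE
  } else if (n == 2) {
    is.prime <- TRUE
  } else {
    is.prime <- TRUE
    m <- 2
    m.max <- sqrt(n)
    # stop as soon as a factor is found or sqrt(n) is passed
    while (is.prime && m <= m.max) {
      if (n %% m == 0) is.prime <- FALSE
      m <- m + 1
    }
  }
  is.prime
}

m.vec <- 2:100
primes <- sapply(m.vec, prime)   # TRUE/FALSE for each of 2, 3, ..., 100
num.primes <- cumsum(primes)     # running count of primes
num.primes[length(num.primes)]   # number of primes up to 100
m.vec[primes][1:5]               # the first few primes found
```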
Recursive Programming
▪ A programming technique made possible by functions, where a function calls itself.
▪ Example: n factorial -> n! = n*((n-1)!)
▪ When a function is called, a new copy of the function is created with a new set of function variables in a new environment
▪ Recursion is therefore elegant but not efficient

▪ nfact2 <- function(n){
    if (n == 1){
      cat("called nfact2(1)\n")
      return(1)
    } else {
      cat("called nfact2(", n, ")\n", sep="")
      return(n*nfact2(n-1))
    }
  }
▪ nfact2(6)
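▪ For comparison, a sketch of the recursive definition next to a plain loop that computes the same value without creating a new function call at every step. The names nfact.rec and nfact.loop are made up; the n <= 1 base case is an assumption added here so that 0 is handled as well.

```r
# Recursive version: each call waits on the result of the next one
nfact.rec <- function(n) {
  if (n <= 1) 1 else n * nfact.rec(n - 1)
}

# Iterative version: a single call and a running product
nfact.loop <- function(n) {
  out <- 1
  for (i in seq_len(n)) out <- out * i
  out
}

nfact.rec(6)
nfact.loop(6)
nfact.rec(6) == factorial(6)   # agrees with the built-in
```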

Recursive Programming
▪ Example: Sieve of Eratosthenes - finding all of the primes less than or equal to a given number n
1. Start with a list 2,3,….n and largest known prime p=2
2. Remove from the list all elements that are multiples of p (but keep p itself)
3. Increase p to the smallest element of the remaining list that is larger than the current p.
4. If p is larger than $\sqrt{n}$ then stop, otherwise go back to step 2
▪ primesieve <- function(sieved, unsieved){
    p <- unsieved[1]
    n <- unsieved[length(unsieved)]
    if (p^2 > n){
      return(c(sieved, unsieved))
    } else {
      unsieved <- unsieved[unsieved %% p != 0]
      sieved <- c(sieved, p)
      return(primesieve(sieved, unsieved))
    }
  }
▪ primesieve(c(), 2:200)

Sieve of Eratosthenes (276 B.C.)
Debugging Functions
▪ Unexpected inputs can lead to undesirable consequences and the user may not know why
▪ Functions can work, but may return plausible nonsense.
▪ Perform simple checks of the input to ensure it conforms to expectations
▪ The stop("Your message here.") function is useful for this. It ceases processing and prints the message to the user.
▪ The browser() function is useful to invoke inside your own functions. It temporarily stops the program and allows inspection of objects
▪ You can step through the code, executing one instruction at a time.
▪ In the browser environment, R commands can be entered and evaluated normally, but some commands have specific new interpretations.
▪ n - evaluates the current step and prints the next step to be evaluated. The Return key has the same effect
▪ c - stops the browser and continues evaluation from the next expression to the end of the current set of expressions, whether that be the end of the current loop or the end of the function. cont has the same effect
▪ Q - stops evaluation and exits the browser, returning the user to the top-level prompt.
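▪ A minimal sketch of input checking with stop(). The function safe.sqrt and its messages are made up for illustration; tryCatch() is used only so the error text can be captured and the script continues. A browser() call could be placed at the top of the function body to pause there and inspect x interactively.

```r
safe.sqrt <- function(x) {
  # simple checks that the input conforms to expectations
  if (!is.numeric(x)) stop("x must be numeric")
  if (any(x < 0)) stop("x must be non-negative")
  sqrt(x)
}

safe.sqrt(4)

# Capture the message instead of halting the script
msg <- tryCatch(safe.sqrt(-1), error = function(e) conditionMessage(e))
msg
```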
