R - Programming - Module 1 - Module 4
Course Objectives
▪ Explore and understand R and the RStudio interactive environment.
▪ Learn and practice programming techniques using R.
▪ Read structured data into R from various sources.
▪ Understand the different data structures and data types in R.
▪ Develop small applications using R programming.
Course Outcomes
CO | Outcome | RBT Level
CO1 | Understand the fundamental syntax of R data types, expressions and the usage of the R-Studio IDE | L2
CO2 | Apply critical programming language concepts of control structures in R for conditional branching and looping | L3
CO3 | Apply the List and Data Frame data structures of R programming language and import data into R programs | L3
CO4 | Utilize the functions in R programs and understand their scope in R language | L3
CO5 | Use advanced R concepts of debugging and object oriented concepts | L3
CO-PO-PSO Mapping
Course Outcomes (CO) mapped to Program Outcomes (PO) and Program Specific Outcomes (PSO)
CO PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2
CO1 2 2 3 3 1 1 1 2
CO2 2 2 3 3 1 1 1 2
CO3 2 2 3 3 1 1 1 2
CO4 2 2 3 3 1 1 1 2
CO5 2 2 3 3 1 1 1 2
Text Books & References
1. Jones, O., Maillardet, R. and Robinson, A. (2014). Introduction to Scientific Programming and Simulation Using R. Chapman & Hall/CRC, The R Series.
2. Michael J. Crawley, "Statistics: An Introduction using R", Second edition, Wiley, 2015.
3. Wickham, H. & Grolemund, G. (2018). R for Data Science. O'Reilly, New York. Available for free at http://r4ds.had.co.nz/
Assessment Details
Teaching Hours/Week(L:T:P:S) 0:2:0:0
Total Hours 24
Credits 01
CIE (1 hour) 3
Assignments 2
Quiz/GD/Seminar (1 Hour) 1
SEE (1 Hour) 1
CIE Test Marks 20 Marks
Assignment Marks 10 Marks
Quiz/GD/Seminar Marks 20 Marks
CIE Marks 50
SEE Marks 50
Total Marks 100
CIE Type MCQ
SEE Type MCQ
Min Passing Marks CIE 40% of Max (i.e. 20/50)
Min Passing Marks SEE 35% of Max (i.e. 18/50)
Total Min Passing Marks 40% of Total Max (i.e. 40/100)
Module 1
Numeric, Arithmetic, Assignment, and Vectors: R for Basic Math, Arithmetic, Variables, Functions, Vectors, Expressions and Assignments and Logical expressions.
Text Book 1: Chapter 2 (2.1 to 2.7)
Variables
▪ A placeholder to hold a value (like a folder)
▪ We can place a value in it, operate on it or modify it, but the name of the placeholder remains the same.
▪ Assigning values to variables
▪ x <- 2.5
▪ x = 2.5
▪ Variables are created when values are assigned to them.
▪ Naming of variables
▪ Any name made up of letters, numbers and . or _
▪ A name should start with a letter, or with a . followed by a letter.
▪ Names are case-sensitive
▪ Use informative names for readability
▪ When assigning a value to a variable, the expression on the RHS is evaluated first and then the value is placed in the variable on the LHS
Note:
▪ To display the value of a variable we can use print(x) or show(x)
▪ To get the datatype of a variable we can use typeof(x)
▪ We can show the outcome of an assignment by surrounding it with parentheses
▪ x <- 200
▪ (y <- (1+1/x)^x)
▪ v <- c(3,0,TRUE,2+2i)
▪ print(!v) # [1] FALSE TRUE FALSE FALSE
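The assignment rules above can be tried in a short session. This is a minimal sketch; the names n, N and x_type are illustrative and do not come from the slides.

```r
# Both assignment operators create the variable on first use.
x <- 2.5
x = 2.5

# typeof() reports the storage type of the value held by x.
x_type <- typeof(x)          # "double"

# The right-hand side is evaluated first, so x may appear on both sides.
x <- x + 1                   # x is now 3.5

# Wrapping an assignment in parentheses also prints the assigned value.
n <- 200
(y <- (1 + 1/n)^n)           # close to exp(1) for large n

# Names are case-sensitive: n and N are different variables.
N <- 5
```

Because names are case-sensitive, assigning to X and then reading x would refer to two different variables.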
Matrices
▪ Matrix – It is created from a vector using the matrix function
▪ matrix(data, nrow=1, ncol=1, byrow=FALSE)
▪ data is a vector of length at most nrow*ncol
▪ nrow – No. of rows (default value 1)
▪ ncol – No. of columns (default value 1)
▪ byrow is used to define whether to fill the matrix with the elements of data row-by-row or column-by-column.
▪ byrow defaults to FALSE
▪ If length(data) is less than nrow*ncol, then data is re-used as many times as needed.
▪ (A <- matrix(1:6, nrow=2, ncol=3, byrow=TRUE))
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
▪ dim(A) returns the dimensions of a matrix
▪ dim(A) #[1] 2 3
▪ Creating a diagonal matrix
▪ Use diag(x)
▪ Joining matrices with rows of the same length (stacking vertically)
▪ Use rbind(…)
▪ Joining matrices with columns of the same length (stacking horizontally)
▪ Use cbind(…)
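The matrix functions listed above can be combined in a short sketch. The object names B, recycled, V, H and D are illustrative assumptions, not part of the slides.

```r
# Fill row-by-row versus the default column-by-column.
A <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
B <- matrix(1:6, nrow = 2, ncol = 3)

dim(A)                       # number of rows, then number of columns

# Data shorter than nrow*ncol is reused to fill the matrix.
recycled <- matrix(c(1, 2), nrow = 2, ncol = 3)

# Stack vertically (equal column counts) and horizontally (equal row counts).
V <- rbind(A, B)             # 4 x 3
H <- cbind(A, B)             # 2 x 6

# diag() applied to a vector builds a diagonal matrix from it.
D <- diag(c(1, 2, 3))
```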
▪ … number of rows and columns. Matrix elements are stored columnwise in the vector.
▪ Here referencing using a single index gives: A[1] = 1, A[2] = 4, A[3] = 7, A[4] = 2, A[5] = 5, A[6] = 8, A[7] = 3, A[8] = 6, A[9] = 9
▪ Now t(x) treats x as a column vector by default and produces an array with the fixed dimension attributes of a row vector
     [,1] [,2]
[1,]    1    2
▪ A%*%t(x) #Error in A %*% t(x) : non-conformable arguments
▪ To check if an object is a matrix or a vector you can use is.matrix(x) and is.vector(x)
▪ Mathematically speaking they're equivalent, but they're treated as different objects in R.
Dataframes
▪ In the vector data structure in R, all components must be of the same mode: numeric, character or logical
▪ Real datasets require grouping of data of differing modes.
▪ Matrices cannot contain heterogeneous data, i.e. data of different modes
▪ Lists and Dataframes are able to store much more complicated data structures
▪ Dataframe is a list that is tailor-made to meet the practical needs of representing multivariate datasets
▪ It is a list of vectors restricted to be of equal lengths
▪ Each vector or column corresponds to a variable in an experiment and each row corresponds to a single
observation.
▪ Each vector can be of any of the basic modes of object
Module 3
Dataframes
▪ Large dataframes are usually read into R from a file.
▪ read.table(file, header=FALSE, sep="")
▪ It returns a dataframe
▪ file – the name of the file to be read: relative to the current working directory, absolute, or a URL.
▪ header – indicates whether or not the first line of the file is a line of text giving the variable names.
▪ sep – gives the character used to separate the values in each row. The default, sep="", means a variable amount of white space.
▪ ?read.table can be used for more details
▪ If a header is present, it is used to name the columns of the dataframe.
▪ The column names can be assigned after reading the file using the names function, or when reading it in using the col.names argument, which should be assigned a character vector whose length is the same as the number of columns.
▪ If there is no col.names argument and no header, then R uses the names "V1", "V2", etc.
▪ Commonly used variants:
▪ read.csv(file) – comma-separated data
▪ read.delim(file) – tab-delimited data
▪ Equivalents (apart from a few other defaults such as fill and comment.char)
▪ read.table(file, header=TRUE, sep=",")
▪ read.table(file, header=TRUE, sep="\t")
Dataframes
Sample Dataset ufc.csv
▪ "plot","tree","species","dbh.cm","height.m"
▪ 2,1,"DF",39,20.5
▪ 2,2,"WL",48,33
▪ 3,2,"GF",52,30
▪ 3,5,"WC",36,20.7
▪ 3,8,"WC",38,22.5
▪ ufc <- read.csv("C:/Users/Praahas/OneDrive/Documents/Desktop/ufc.csv")
▪ ufc
▪ To examine the dataset head(ufc) and tail(ufc) can be used
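To try the reading functions without depending on a particular path, the sample rows can be written to a temporary file first. This is a sketch only; the temporary file and the names csv_lines, tmp and ufc2 are assumptions made for illustration.

```r
# Write the sample rows to a temporary file so the example is self-contained.
csv_lines <- c(
  '"plot","tree","species","dbh.cm","height.m"',
  '2,1,"DF",39,20.5',
  '2,2,"WL",48,33',
  '3,2,"GF",52,30',
  '3,5,"WC",36,20.7',
  '3,8,"WC",38,22.5'
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# read.csv() uses header = TRUE and sep = "," by default.
ufc <- read.csv(tmp)
head(ufc, 3)                 # first three rows
tail(ufc, 2)                 # last two rows

# The same file read with read.table() needs the options spelled out.
ufc2 <- read.table(tmp, header = TRUE, sep = ",")
unlink(tmp)                  # remove the temporary file
```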
Dataframes
▪ Each column or variable in a dataframe has a unique name and can be extracted using the dataframe name, a dollar sign and the column name
▪ x <- ufc$height.m
▪ x[1:5] #[1] 20.5 33.0 30.0 20.7 22.5
▪ Note: Indexing starts from 1
▪ We can use [[ ]] notation to extract columns.
▪ ufc$height.m, ufc[[5]] and ufc[["height.m"]] are all equivalent
▪ Elements of the dataframe can be extracted directly using matrix indexing: ufc[1:5, 5]
▪ #[1] 20.5 33.0 30.0 20.7 22.5
▪ To select more than one of the variables in a dataframe we use [ ] notation. We can also use names.
▪ ufc[4:5] is the same as ufc[c("dbh.cm", "height.m")]
▪ diam.height <- ufc[4:5] # the "dbh.cm" and "height.m" columns will be stored in diam.height
▪ diam.height[1:4,] # Will display rows 1 to 4
▪ Check if an object is a dataframe – is.data.frame(diam.height) #[1] TRUE
Dataframes
▪ The result of selecting columns using [ ] is another dataframe. This can sometimes cause confusion when you select only one variable
▪ x <- ufc[5]
▪ height.m
▪ 1 20.5
▪ 2 33.0
▪ 3 30.0
▪ 4 20.7
▪ 5 22.5
▪ x[1:5] #Error in `[.data.frame`(x, 1:5) : undefined columns selected
▪ Variables can be extracted one at a time using [[ ]]
▪ Selecting a column using [[ ]] preserves the mode of the object that is being extracted
▪ Using [ ] preserves the mode of the object from which the extraction is being made.
▪ mode(ufc)
▪ [1] "list"
▪ mode(ufc[5])
▪ [1] "list"
▪ mode(ufc[[5]])
▪ [1] "numeric"
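The difference between the three extraction styles can be checked directly. The sketch below rebuilds the five sample rows inline as a stand-in for the full dataset; the names by_dollar, by_double, by_single and err are illustrative.

```r
# A small stand-in for the ufc data (values taken from the sample rows).
ufc <- data.frame(
  plot = c(2, 2, 3, 3, 3),
  tree = c(1, 2, 2, 5, 8),
  species = c("DF", "WL", "GF", "WC", "WC"),
  dbh.cm = c(39, 48, 52, 36, 38),
  height.m = c(20.5, 33, 30, 20.7, 22.5)
)

by_dollar <- ufc$height.m        # numeric vector
by_double <- ufc[["height.m"]]   # the same numeric vector
by_single <- ufc["height.m"]     # one-column dataframe

mode(by_double)                  # "numeric"
mode(by_single)                  # "list"
is.data.frame(by_single)         # TRUE

# A single index on a dataframe selects columns, so asking the one-column
# dataframe for columns 1 to 5 fails; capture the message instead of stopping.
err <- tryCatch(by_single[1:5], error = function(e) conditionMessage(e))
```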
Dataframes
▪ Dataframes can be constructed from a collection of vectors and/or existing dataframes using data.frame
▪ data.frame(col1=x1, col2=x2, ..., df1, df2, ...)
▪ col1, col2, ... are column names given as character strings without quotes.
▪ x1, x2, ... are vectors of equal length
▪ df1, df2, ... are dataframes whose columns must be the same length as the vectors x1, x2, ...
▪ Column names may be omitted, in which case R will choose a name
▪ A new variable can also be created within a dataframe by naming it and assigning a value
▪ In the ufc example, let's add a new variable to the dataset: volume
▪ ufc$volume.m3 <- pi*(ufc$dbh.cm/200)^2 * ufc$height.m/2
▪ mean(ufc$volume.m3) #[1] 1.93294
▪ Equivalently we could assign to ufc[6], ufc["volume.m3"], ufc[[6]] or ufc[["volume.m3"]]
▪ For better readability you can use
▪ ufc$volume.m3 <- with(ufc, pi*(dbh.cm/200)^2*height.m/2)
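A minimal sketch of building a dataframe from vectors and from an existing dataframe, then adding a column by assignment. The names trees and more, and the three-row data, are assumptions for illustration.

```r
dbh.cm <- c(39, 48, 52)
height.m <- c(20.5, 33, 30)

# Column names are given without quotes.
trees <- data.frame(dbh = dbh.cm, height = height.m)

# An existing dataframe can be combined with further vectors.
more <- data.frame(trees, species = c("DF", "WL", "GF"))

# New column by assignment; with() lets the formula use bare column names.
more$volume.m3 <- with(more, pi * (dbh / 200)^2 * height / 2)
```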
Dataframes
▪ names(df) returns the names of the dataframe df as a vector of character strings
▪ names(ufc)
▪ #[1] "plot" "tree" "species" "dbh.cm"
▪ #[5] "height.m" "volume.m3"
▪ names(ufc) <- c("P","T","S","D","H","V")
▪ names(ufc) #[1] "P" "T" "S" "D" "H" "V"
▪ names can be used to set or get the object's names
▪ names is technically an attribute
▪ We must have exactly one name for each column and they must all be different
▪ dim (dimension) of a matrix is another example of an attribute
▪ As long as the total number of elements remains the same we can change the shape of a matrix by changing the dim attribute
▪ R will reassign values from the old matrix to the new one column by column
▪ dim(df) will return the number of rows and columns of a dataframe
▪ dim(df) <- c(x,y) will however generate errors
▪ dim is not an attribute of a dataframe; it has been extended to dataframes only for convenience
▪ If you delete a column, the remaining column names are unchanged
Dataframes
▪ A dataframe also has row names. By default they are named "1", "2", "3", etc. when the dataframe is created
▪ Both read.table and data.frame take an optional argument row.names, where row names can be specified
▪ row.names(df) will return the row names of df as a character vector
▪ row.names is an attribute of a dataframe and therefore row names can be set by making an assignment to row.names(df)
▪ If you delete a row, the remaining row names are unchanged
▪ The subset function is useful for selecting rows of a dataframe, especially combined with %in%
▪ For example, if we are interested only in DF and GF tree heights in the ufc dataset:
▪ fir.height <- subset(ufc, subset=species %in% c("DF","GF"), select=c(plot,tree,height.m))
▪ For vectors x and y of the same mode, the expression x %in% y returns a logical vector of the same length as x whose i-th element is TRUE if and only if x[i] is an element of y
▪ The %in% operator is performing many-to-many matching
▪ The subset argument accepts a logical vector and determines which rows are selected
▪ Note that the vector given to select is of columns, not column names.
▪ Note that the expressions assigned to subset and select can directly use the columns of the target dataframe, which is given as the first argument
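The behaviour of %in% and subset can be seen on the five sample rows. This is a sketch; the inline dataframe stands in for the full ufc dataset and the name membership is illustrative.

```r
ufc <- data.frame(
  plot = c(2, 2, 3, 3, 3),
  tree = c(1, 2, 2, 5, 8),
  species = c("DF", "WL", "GF", "WC", "WC"),
  height.m = c(20.5, 33, 30, 20.7, 22.5)
)

# %in% gives one logical value per element of the left-hand vector.
membership <- ufc$species %in% c("DF", "GF")

# subset picks rows by a logical condition; select picks columns by bare name.
fir.height <- subset(ufc, subset = species %in% c("DF", "GF"),
                     select = c(plot, tree, height.m))

# The surviving rows keep their original row names.
row.names(fir.height)
```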
Dataframes
▪ To write a dataframe to a file:
▪ write.table(x, file="", append=FALSE, sep=" ", row.names=TRUE, col.names=TRUE)
▪ For the complete list of arguments use ?write.table
▪ x – dataframe to be written
▪ file – name and address of the file to write to. The file is created if it doesn't exist. By default (file="") it writes to the screen
▪ append – indicates whether to append to the file or overwrite it
▪ sep – indicates the character used to separate values within a row. Rows are separated by new lines
▪ row.names – indicates whether or not to include the existing row names as the first column; it can also be a character vector of row names to write
▪ Complete rows (without missing values) can be identified in a 2-dimensional object such as a dataframe using the complete.cases command
▪ Rows with missing values can be removed using the na.omit function.
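A short sketch of dropping incomplete rows and writing the result to a file. The dataframe df, the temporary file and the tab separator are assumptions chosen for illustration.

```r
df <- data.frame(id = 1:4, height = c(20.5, NA, 30, 22.5))

# One logical value per row: TRUE where no value is missing.
keep <- complete.cases(df)

# na.omit() drops the rows that contain missing values.
clean <- na.omit(df)

# Write tab-separated text without the row names, then read it back.
out <- tempfile(fileext = ".txt")
write.table(clean, file = out, sep = "\t", row.names = FALSE)
back <- read.table(out, header = TRUE, sep = "\t")
unlink(out)                  # remove the temporary file
```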
Dataframes
▪ R allows you to attach a dataframe to the workspace. When attached, the variables in the dataframe can be referred to without being prefixed by the dataframe name.
▪ attach(ufc)
▪ max(height.m[species=="GF"]) #[1] 47
▪ To detach use detach(df)
▪ When a dataframe is attached, R makes a copy of each variable, which is deleted when the dataframe is detached. So changing an attached variable does not change the dataframe
▪ height.m <- 0 # Changing the value of an attached variable
▪ max(height.m) #[1] 0
▪ max(ufc$height.m) #[1] 47
▪ detach(ufc)
▪ max(ufc$height.m) #[1] 47
▪ attach is preferably avoided as it can be a source of potential errors
▪ The with and transform functions provide a safer alternative
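As a sketch of the safer alternatives, with() evaluates an expression using the columns of a dataframe and transform() returns a modified copy. The three-row data and the height.ft column are illustrative assumptions.

```r
ufc <- data.frame(species = c("DF", "GF", "GF"),
                  height.m = c(20.5, 30, 47))

# with() gives access to the columns by name without attaching anything.
tallest_gf <- with(ufc, max(height.m[species == "GF"]))

# transform() returns a new dataframe; the original is left unchanged.
ufc2 <- transform(ufc, height.ft = height.m * 3.28084)
```

Neither call leaves extra copies of the columns behind in the workspace, which is the main source of confusion with attach.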
Lists
▪ A generic container for other objects.
▪ Like a vector, a list is an indexed set of objects (and has a length)
▪ But unlike a vector, elements of a list can be of different types, including other lists
▪ The mode of a list is list
▪ It might contain an individual measurement, a vector of observations on a single response variable, a dataframe or a list of dataframes containing the results of several experiments
▪ A list is created using the list(…) command with comma-separated arguments.
▪ Single square brackets are used to select a sub-list
▪ Double square brackets are used to extract a single element
▪ my.list <- list("one", TRUE, 3, c("f","o","u","r"))
▪ my.list[[2]] #[1] TRUE
▪ mode(my.list[[2]]) #[1] "logical"
▪ my.list[[4]] #[1] "f" "o" "u" "r"
▪ my.list[[4]][1] #[1] "f"
▪ R uses double square brackets [[1]] to indicate list elements, then single square brackets [1] to indicate vector elements within the list
Lists
▪ Elements of a list can be named when the list is created using arguments of the form name1=x1, name2=x2, etc.
▪ Elements of a list can be named later by assigning a value to the names attribute
▪ Unlike a dataframe, the elements of a list do not have to be named
▪ Names can be used (with or without quotes) after a dollar sign to extract a list element
▪ my.list <- list(first="one", second=TRUE, third=3, fourth=c("f","o","u","r"))
▪ > my.list
▪ $first
▪ [1] "one"
▪ $second
▪ [1] TRUE
▪ $third
▪ [1] 3
▪ $fourth
▪ [1] "f" "o" "u" "r"
▪ my.list$'Second Element' #[1] TRUE (this assumes the second element has been renamed "Second Element" through the names attribute)
▪ x <- 'Second Element'
▪ my.list[[x]] #[1] TRUE
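The list operations can be put together in one sketch. Renaming the second element through names() is an assumption made so that the 'Second Element' lookups have something to find; the names sub, elem, key and letter are illustrative.

```r
my.list <- list(first = "one", second = TRUE, third = 3,
                fourth = c("f", "o", "u", "r"))

sub <- my.list[2]            # single brackets: a sub-list of length 1
elem <- my.list[[2]]         # double brackets: the element itself

# Rename one element through the names attribute.
names(my.list)[2] <- "Second Element"

# A name containing a space must be quoted (or backquoted) after $.
by_dollar <- my.list$`Second Element`

# Double brackets accept the name held in a variable.
key <- "Second Element"
by_name <- my.list[[key]]

# List element first, then the vector element inside it.
letter <- my.list[[4]][1]
```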
Lists
▪ To flatten a list x, i.e. convert it into a vector, we use unlist(x)
▪ x <- list(1, c(2,3), c(4,5,6))
▪ unlist(x) # [1] 1 2 3 4 5 6
▪ If the list itself contains lists, then these lists are also flattened, unless the argument recursive = FALSE is set
Lists
▪ Linear Regression:
▪ lm.xy<-lm(y ~ x,data=data.frame(x=1:5,y=1:5))
▪ mode(lm.xy) #[1] "list"
▪ names(lm.xy)
The apply family
▪ R has functions that allow you to easily apply a function to all or selected elements of a list or dataframe
▪ apply() - takes a data frame or matrix as input and gives its output as a vector, list or array. It is primarily used to avoid explicit loop constructs. It is the most basic member of the family and can be used over a matrix.
▪ lapply()
▪ sapply()
▪ tapply()
The apply family
▪ apply() - applies a function to the rows or columns of a matrix or data frame. It takes the matrix or data frame as an argument, along with the function and whether it has to be applied by row or by column, and returns a vector, array or list of the results
▪ apply(X, MARGIN, FUN)
▪ If MARGIN is 1, FUN is applied across rows
▪ If MARGIN is 2, FUN is applied across columns
▪ # create sample data
▪ sample_matrix <- matrix(1:10, nrow=3, ncol=10)
▪ print("sample matrix:")
▪ sample_matrix
▪ # use apply() function across rows to find sum
▪ print("sum across rows:")
▪ apply(sample_matrix, 1, sum)
▪ # use apply() function across columns to find mean
▪ print("mean across columns:")
▪ apply(sample_matrix, 2, mean)
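A smaller matrix makes the effect of MARGIN easy to verify by hand. This is a sketch; the matrix m and the anonymous range function are illustrative.

```r
# Filled column-by-column: the columns are (1,2), (3,4) and (5,6).
m <- matrix(1:6, nrow = 2, ncol = 3)

row_sums <- apply(m, 1, sum)     # MARGIN = 1: one result per row
col_means <- apply(m, 2, mean)   # MARGIN = 2: one result per column

# FUN can be any function, including an anonymous one.
row_ranges <- apply(m, 1, function(r) max(r) - min(r))
```

For sums and means specifically, rowSums and colMeans give the same results.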
The apply family
▪ lapply() - applies a function to each element of a list and returns a list of the same length. It takes a list, vector, or data frame as input and gives its output in the form of a list. Since it applies the operation to all the elements of the list, it doesn't need a MARGIN.
▪ lapply(X,FUN)
The apply family
▪ sapply() – applies a function to each element of a list, vector, or data frame and simplifies the result: where possible it returns a vector or matrix rather than a list. Since sapply() applies the operation to all the elements of the object, it doesn't need a MARGIN.
▪ It is the same as lapply(), with the only difference being the type of the returned object.
▪ sapply(X,FUN)
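The difference in return type between lapply and sapply shows up on the same input. A minimal sketch with an illustrative list named scores:

```r
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

as_list <- lapply(scores, mean)   # always a list, one element per input
as_vec <- sapply(scores, mean)    # simplified to a named numeric vector

# When FUN returns vectors of equal length, sapply builds a matrix
# with one column per input element.
as_mat <- sapply(scores, range)
```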
The apply family
▪ tapply() – applies a function to subsets of a vector, where the subsets are defined by the levels of a factor. It splits the vector into groups and then applies the function to each of the groups
▪ tapply(X, INDEX, FUN, …)
▪ X – Target vector to which the function will be applied
▪ INDEX – A factor which is used to group the elements of X. It will be coerced to a factor if it is not one already. It has the same length as X
▪ FUN – Function to be applied. It is applied to the subvectors of X corresponding to a single level of INDEX
▪ #install.packages("tidyverse")
▪ # load library tidyverse
▪ library(tidyverse)
▪ # print head of diamonds dataset
▪ print(" Head of data:")
▪ head(diamonds)
▪ # apply tapply function to get average price by cut
▪ print("Average price for each cut of diamond:")
▪ tapply(diamonds$price, diamonds$cut, mean)
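The same grouping idea works without installing any package. The sketch below uses the heights and species from the ufc sample rows rather than the diamonds data; the name mean_by_species is illustrative.

```r
height <- c(20.5, 33, 30, 20.7, 22.5)
species <- c("DF", "WL", "GF", "WC", "WC")

# species is coerced to a factor; mean is applied within each level.
mean_by_species <- tapply(height, species, mean)

# Extract the value for one group by its level name.
wc_mean <- as.numeric(mean_by_species["WC"])
```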
The apply family
▪ mapply() – This function stands for multivariate apply. It applies a function to the corresponding elements of several lists or vectors in parallel: first to the first elements of each argument, then to the second elements, and so on.
▪ mapply(FUN,LIST1, LIST2 …)
▪ LIST1, LIST2… – Created Lists
▪ FUN – Function to be applied on the lists.
▪ # Creating a list
▪ A = list(c(1, 2, 3, 4))
▪ # Creating another list
▪ B = list(c(2, 5, 1, 6))
▪ # Applying mapply()
▪ result = mapply(sum, A, B)
▪ print(result) #[1] 24
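In the example above each list holds a single vector, so FUN is called once with both vectors and the result is their combined sum. The sketch below contrasts that with element-wise use; the names pair_sums, reps and whole are illustrative.

```r
# FUN is called with the first elements of each argument, then the second, ...
pair_sums <- mapply(function(a, b) a + b, c(1, 2, 3, 4), c(2, 5, 1, 6))

# Results of different lengths cannot be simplified, so a list is returned.
reps <- mapply(rep, 1:3, 3:1)

# One-element lists: a single call sum(c(1,2,3,4), c(2,5,1,6)).
whole <- mapply(sum, list(c(1, 2, 3, 4)), list(c(2, 5, 1, 6)))
```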
Module 4
Functions: Calling functions, scoping, Arguments matching, writing functions: The function command, Arguments, specialized function.
Text Book 1: Chapter 5 (5.1 to 5.6)
Functions
▪ Building blocks for large programs and essential for structuring complex algorithms.
▪ Once loaded, a function can be reused without having to reload it.
▪ Break down a program into smaller logical units, each of which does a simple, well-defined task
▪ A function's general form:
▪ name <- function(arg_1, arg_2, …) {
    exp_1
    exp_2
    <some other exp>
    return(output)
  }
▪ arg_1, arg_2 etc. are names of variables
▪ exp_1, exp_2 and output are all regular R expressions
▪ name is the name of the function
▪ A function call is made using name(x1, x2)
▪ The value of this expression is the value of the expression output.
▪ The values of x1, x2 etc. are copied to arg_1, arg_2 etc.; the arguments then act as variables within the function
▪ The function next evaluates the grouped expressions contained within the braces { }
▪ The value of the expression output is returned as the value of the function
▪ A function may have more than one return statement, in which case it stops after executing the first one it reaches.
▪ If there is no return statement, then the value returned by the function is the value of the last expression in the braces. A function ALWAYS returns a value in R.
▪ NULL may be returned by the function
▪ Some functions have no arguments
▪ Braces are necessary only if the function comprises more than one expression
▪ When a function is called, if the returned value is not assigned to a variable then it is printed.
▪ The expression invisible(x) will return the same value as x, but the value is not printed.
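The rules about return values can be illustrated with a few tiny functions. All four function names below are hypothetical and chosen only for this sketch.

```r
# A single expression needs no braces; its value is returned.
area <- function(w, h) w * h

# The first return() that is reached ends the call.
sign_label <- function(x) {
  if (x < 0) return("negative")
  if (x == 0) return("zero")
  "positive"                 # no return(): the last expression is the value
}

# invisible() returns the value without printing it at the prompt.
quiet_square <- function(x) invisible(x^2)

# A function with no arguments that returns NULL.
no_args <- function() NULL
```

Calling quiet_square(3) at the prompt prints nothing, but y <- quiet_square(3) still stores 9 in y.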
Functions
▪ Roots of a quadratic equation
▪ quad <- function(a0, a1, a2){
    # Find the zeros of a2*x^2 + a1*x + a0 = 0
    if (a2 == 0 && a1 == 0 && a0 == 0){
      roots <- NA
    } else if (a2 == 0 && a1 == 0){
      roots <- NULL
    } else if (a2 == 0){
      roots <- -a0/a1
    } else {
      # calculate the discriminant
      discrim <- a1^2 - 4*a2*a0
      # calculate the roots depending on the value of the discriminant
      if (discrim > 0){
        roots <- (-a1 + c(1,-1)*sqrt(discrim))/(2*a2)
      } else if (discrim == 0){
        roots <- -a1/(2*a2)
      } else {
        roots <- NULL
      }
    }
    return(roots)
  }
▪ #Main
▪ rm(list=ls())
▪ source("C:/Users/Praahas/Projects/R/quad.r")
▪ quad(1,0,-1)
▪ quad(1,-2,1)
▪ quad(1,1,1)
Functions
▪ $^nC_r = \frac{n!}{r!\,(n-r)!}$
▪ n_factorial <- function(n){
    # Calculate n factorial; seq_len(n) is empty when n is 0, so 0! gives 1
    n_fact <- prod(seq_len(n))
    return(n_fact)
  }
▪ ncr <- function(n, r){
    # Calculate nCr
    n_ch_r <- n_factorial(n)/n_factorial(r)/n_factorial(n-r)
    return(n_ch_r)
  }
▪ #Main
▪ rm(list=ls())
▪ source("C:/Users/Praahas/Projects/R/ncr.r")
▪ ncr(4,2) #[1] 6
▪ ncr(6,4) #[1] 15
Functions
▪ Trimmed mean: discard the k smallest and k largest values and then calculate the mean. This reduces the influence of outliers compared to the untrimmed mean
▪ Winsorised mean: instead of discarding the k largest and k smallest values, we replace them by $x_{(n-k)}$ and $x_{(k+1)}$ respectively
▪ This can be used when a sample may contain occasional extraordinary values
▪ wmean <- function(x, k){
    x <- sort(x)
    n <- length(x)
    x[1:k] <- x[k+1]
    x[(n-k+1):n] <- x[n-k]
    return(mean(x))
  }
▪ #Main
▪ rm(list=ls())
▪ source("C:/Users/Praahas/Projects/R/wmean.r")
▪ x <- c(8.244,51.421,39.020,90.574,44.697,83.600,73.760,81.106,38.811,68.517)
▪ mean(x)
▪ wmean(x,2)
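A small sample with one extreme value shows how the three means differ. This is a sketch; sample_x is an illustrative vector, and the trimmed mean uses the trim argument of the built-in mean function.

```r
wmean <- function(x, k) {
  x <- sort(x)
  n <- length(x)
  x[1:k] <- x[k + 1]               # replace the k smallest values
  x[(n - k + 1):n] <- x[n - k]     # replace the k largest values
  mean(x)
}

sample_x <- c(1, 2, 3, 4, 100)

plain <- mean(sample_x)                  # pulled upwards by the value 100
trimmed <- mean(sample_x, trim = 0.2)    # drops one value at each end
winsor <- wmean(sample_x, 1)             # works on 2, 2, 3, 4, 4
```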
Functions
▪ When a function is executed, the computer sets aside space for the function variables, makes a copy of the function code and then transfers control to the function
▪ When the function finishes executing, the output is passed to the main program and the copy of the function variables and code is deleted
Functions
▪ swap<-function(x){
▪ #swap values of x[1] and x[2]
▪ y<-x[2]
▪ x[2]<-x[1]
▪ x[1]<-y
▪ return(x)
▪ }
▪ #Main
▪ x<-c(7,8,9)
▪ source("C:/Users/Praahas/Projects/R/swap.r")
▪ x[1:2]<-swap(x[1:2]) #[1] 8 7 9
▪ x[2:3]<-swap(x[2:3]) #[1] 8 9 7
Scope & its Consequences
▪ Arguments and variables that are defined within a function exist only within that function
▪ If variables with the same name exist inside and outside a function, then they are separate and do not interact at all
▪ If we execute the command rm(list=ls()) inside a function, then we only delete those objects that are defined inside the function
▪ The part of a program in which a variable is defined is called its scope
▪ Restricting the scope of variables ensures that a function call will not modify a variable outside the function, except by assigning the returned value.
▪ test <- function(x){
    y <- x+1
    return(y)
  }
▪ #main
▪ test(1) #[1] 2
▪ y # Error: object 'y' not found
▪ y <- 10
▪ test(1) #[1] 2
▪ y #[1] 10
Scope & its Consequences
▪ The scope of a variable is not symmetric
▪ Variables defined inside a function cannot be seen outside, but variables defined outside the function can be seen inside the function, provided there is no variable with the same name defined inside the function.
▪ test2 <- function(x){
    y <- x+z
    return(y)
  }
▪ z <- 1
▪ test2(1) #[1] 2
▪ z <- 2
▪ test2(1) #[1] 3
Arguments
▪ Arguments used in a function are named when the function is created
▪ Some arguments may be assigned default values, which are used in case the argument is not provided in the function call.
▪ test3 <- function(x=1){
    return(x)
  }
▪ test3(2) #[1] 2
▪ test3() #[1] 1
▪ Sometimes arguments have to be defined so that they can only take a small number of different values, and the function will stop informatively if an inappropriate value is passed.
▪ This can be done with an if statement, but R provides a method for this: include a vector of permissible values for any such argument and check them using the match.arg function
▪ funk <- function(words=c("Apple", "Bat", "Cat", "Dog")){
    words <- match.arg(words)
    return(words)
  }
▪ funk() #[1] "Apple"
▪ funk("Bat") #[1] "Bat"
▪ funk("Dum") # Error in match.arg(words) : 'arg' should be one of "Apple", "Bat", "Cat", "Dog"
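A typical use of match.arg is to let an argument select between a few named behaviours. The function centre and the vector d below are hypothetical, introduced only for this sketch.

```r
centre <- function(x, type = c("mean", "median")) {
  # With no value supplied, match.arg picks the first permitted choice.
  type <- match.arg(type)
  if (type == "mean") mean(x) else median(x)
}

d <- c(1, 2, 30)
by_default <- centre(d)              # uses "mean"
by_median <- centre(d, "median")
by_prefix <- centre(d, "med")        # unambiguous prefixes are accepted

# A value outside the permitted set stops the function with an error.
bad <- tryCatch(centre(d, "mode"), error = function(e) "rejected")
```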
Arguments
▪ R provides a means for passing arguments unaltered from the function that is being called to the functions that are called within it, using the ... (three dots) argument.
▪ test4 <- function(x, ...){
    return(sd(x, ...))
  }
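To see the pass-through in action, an extra argument such as na.rm can be handed to test4 and will reach sd unchanged. The vector vals is an illustrative assumption.

```r
test4 <- function(x, ...) {
  # Anything supplied after x is forwarded to sd() as is.
  return(sd(x, ...))
}

vals <- c(1, 2, 3, NA)
with_na <- test4(vals)                    # NA, because of the missing value
without_na <- test4(vals, na.rm = TRUE)   # na.rm is passed on to sd()
```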
Arguments
▪ R provides a means for partial matching of arguments, where doing so is not ambiguous
▪ Argument names in the function call do not have to be complete
▪ This can make the code more fragile, and this style is therefore not encouraged
▪ test6 <- function(a=1, b.c.d=1){
    return(a+b.c.d)
  }
▪ test6() #[1] 2
▪ test6(b=5) #[1] 6
▪ Note
▪ seq.int(0, 1, len = 11)
▪ seq.int(0, 1, length.out = 11)
▪ ls(all = TRUE)
▪ ls(all.names = TRUE)
▪ Partial matching exists to save you typing long argument names.
▪ The danger with it is that functions may gain additional arguments later on which conflict with your partial match.
▪ This means that it is only suitable for interactive use. If you are writing code that will stick around for a long time (to go in a package, for example) then you should always write the full argument name.
▪ The other problem is that by abbreviating an argument name, you can make your code less readable.
Vector Based Programming using Functions
▪ Many R functions are vectorized: for a given vector input, the function acts on each element separately and returns a vector output.
▪ This enables R to have compact, efficient and readable code
▪ Applying a function to a vector is much faster than iteratively looping and applying the function on each element
▪ apply, sapply, lapply, tapply, mapply
▪ Example: sapply(X,FUN)
▪ The use of the above expression is to apply the function FUN to every element of vector X.
▪ X can be a list or an atomic vector (a vector that contains atomic objects like logical, integer, numeric, complex, character and raw)
▪ sapply(X,FUN) returns a vector whose i-th element is the value of the expression FUN(X[i])
Vector Based Programming using Functions
▪ Example for sapply() – Density of Primes
▪ Write a function prime that tests if a given integer is prime
or not
▪ Use sapply() to apply the prime checker function to the
vector 2:n so that we know all primes less than or equal to
n
▪ ρ(n) -> number of primes less than or equal to n
▪ Legendre and Gauss' assertion -> $\lim_{n \to \infty} \frac{\rho(n)\log(n)}{n} = 1$
▪ Result proved by Hadamard and de la Vallee Poussin
▪ Cumulative Sum Function of a vector X-> cumsum(X)
Vector Based Programming using Functions
▪ rm(list=ls())
▪ prime <- function(n){
    if (n == 1){
      is.prime <- FALSE
    } else if (n == 2){
      is.prime <- TRUE
    } else {
      is.prime <- TRUE
      for (m in 2:(n/2)){
        if (n %% m == 0) is.prime <- FALSE
      }
    }
    return(is.prime)
  }
▪ n <- 1000
▪ m.vec <- 2:n
▪ primes <- sapply(m.vec, prime)
▪ num.primes <- cumsum(primes)
▪ #print(num.primes)
▪ par(mfrow = c(1,2), las=1)
▪ plot(m.vec, num.primes/m.vec, type="l", main="prime density", xlab="n", ylab="")
▪ lines(m.vec, 1/log(m.vec), col="red")
▪ plot(m.vec, num.primes/m.vec*log(m.vec), type="l", main="prime density * log(n)", xlab="n", ylab="")
Vector Based Programming using Functions
▪ Optimised code for prime density
▪ Check for factors only up to $\sqrt{n}$, since if n = ab then at least one of a and b is less than or equal to $\sqrt{n}$
▪ Once we find one factor we don't need to keep checking
▪ prime <- function(n){
    if (n == 1){
      is.prime <- FALSE
    } else if (n == 2){
      is.prime <- TRUE
    } else {
      is.prime <- TRUE
      m <- 2
      m.max <- sqrt(n)
      while (is.prime && m <= m.max){
        if (n %% m == 0) is.prime <- FALSE
        m <- m + 1
      }
    }
    return(is.prime)
  }
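The prime checker can be combined with sapply and cumsum to count primes without drawing any plots. The sketch below uses a compact variant of the checker that returns as soon as a factor is found; it is an assumption for illustration, not the slide's exact code, and the name found is illustrative.

```r
prime <- function(n) {
  if (n < 2) return(FALSE)
  m <- 2
  while (m <= sqrt(n)) {
    if (n %% m == 0) return(FALSE)   # a factor was found, stop checking
    m <- m + 1
  }
  TRUE
}

m.vec <- 2:100
primes <- sapply(m.vec, prime)       # one logical value per candidate
num.primes <- cumsum(primes)         # running count: primes up to each value
found <- m.vec[primes]               # the primes themselves
```

The last element of num.primes is the number of primes less than or equal to 100.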
Recursive Programming
▪ A programming technique made possible by functions, where a function calls itself.
▪ Example: n factorial -> n! = n*((n-1)!)
▪ When a function is called, a new copy of the function is created with a new set of function variables in a new environment
▪ Recursion is therefore elegant but not always efficient
▪ nfact2 <- function(n){
    if (n == 1){
      cat("called nfact2(1)\n")
      return(1)
    } else {
      cat("called nfact2(", n, ")\n", sep="")
      return(n*nfact2(n-1))
    }
  }
▪ nfact2(6)
Recursive Programming
▪ Example: Sieve of Eratosthenes – finding all of the primes less than or equal to a given number n
1. Start with a list 2,3,….n and largest known prime p=2
2. Remove from the list all elements that are multiples of p (but keep p itself)
3. Increase p to the smallest element of the remaining list that is larger than the current p.
4. If p is larger than $\sqrt{n}$ then stop, otherwise go back to step 2
▪ primesieve <- function(sieved, unsieved){
    p <- unsieved[1]
    n <- unsieved[length(unsieved)]
    if (p^2 > n){
      return(c(sieved, unsieved))
    } else {
      unsieved <- unsieved[unsieved %% p != 0]
      sieved <- c(sieved, p)
      return(primesieve(sieved, unsieved))
    }
  }
▪ primesieve(c(), 2:200)
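One way to gain confidence in the recursive sieve is to compare its output with plain trial division. This is a sketch; is_prime is a hypothetical helper written only for the comparison, and the sieve is restated in compact form so the block runs on its own.

```r
primesieve <- function(sieved, unsieved) {
  p <- unsieved[1]
  n <- unsieved[length(unsieved)]
  if (p^2 > n) return(c(sieved, unsieved))
  # Move p to the sieved primes and drop its multiples from the rest.
  primesieve(c(sieved, p), unsieved[unsieved %% p != 0])
}

# Trial division by every integer from 2 up to floor(sqrt(n)).
is_prime <- function(n) {
  n > 1 && all(n %% seq_len(floor(sqrt(n)))[-1] != 0)
}

from_sieve <- primesieve(c(), 2:200)
from_check <- Filter(is_prime, 2:200)
same <- identical(as.integer(from_sieve), as.integer(from_check))
```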
Sieve of Eratosthenes, 276 B.C.
Debugging Functions
▪ Unexpected inputs can lead to undesirable consequences and the user may not know why
▪ Functions can work, but may return plausible nonsense.
▪ Perform simple checks of the input to ensure it conforms to expectations
▪ The stop("Your message here.") function is useful for this. It ceases processing and prints the message to the user.
▪ The browser() function is useful to invoke inside your own functions. It temporarily stops the program and allows inspection of objects
▪ You can step through the code, executing one instruction at a time.
▪ In the browser environment, R commands can be entered and evaluated normally, but some commands have specific new interpretations.
▪ n – evaluates the current step and prints the next step to be evaluated. The Return key has the same effect
▪ c – continues evaluation from the next expression to the end of the current set of expressions, whether that be the end of the current loop or the end of the function. cont has the same effect as c
▪ Q – stops evaluation and exits the browser, returning the user to the top-level prompt.
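Input checks with stop can be sketched as follows. The function safe_sqrt and its messages are hypothetical; tryCatch is used here only so that the error message can be inspected without halting the script.

```r
safe_sqrt <- function(x) {
  # Check the input before doing any work.
  if (!is.numeric(x)) stop("x must be numeric")
  if (any(x < 0)) stop("x must be non-negative")
  sqrt(x)
}

ok <- safe_sqrt(c(4, 9))

# Capture the error message raised for an invalid input.
msg <- tryCatch(safe_sqrt(-1), error = function(e) conditionMessage(e))
```

During interactive debugging, a call to browser() could be placed just before the checks to inspect x.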