
NRI INSTITUTE OF TECHNOLOGY

R – Programming Lab
INDEX

Exp. No.  Name of the Experiment
1         Implementation of Vectors and Lists
2         Implementation of Data Frames
3         Implementation of Matrix Addition, Subtraction, Multiplication and Division
4         Implementation of Quick Sort
5         Implementation of Binary Search Tree
6         Implementation of Set Operations
7         Implementation of Reading and Writing Files
8         Implementation of Graph Operations
9         Implementation of Correlation
10        Implementation of ANOVA
11        Implementation of Linear Regression
12        Implementation of Logistic Regression
13        Implementation of Random Forest
14        Viva Voce Questions

Sk.Mahaboob Basha,Associate Professor of IT Page 1


1. Implementation of Vectors and Lists.

Vector: a collection of elements of the same type.
R supports vectors of type logical, integer, double, character, complex or raw.
The elements contained in a vector are known as the components of the vector.
The type of a vector can be checked with the typeof() function.

# Vector of strings
fruits <- c("banana", "apple", "orange")

# Vector of numerical values

numbers <- c(1, 2, 3)

# Vector of logical values

log_values <- c(TRUE, FALSE, TRUE, FALSE)
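A quick check of the three vectors above with typeof(), the function mentioned earlier. The extra c(1L, 2L) line is an addition for comparison: plain numbers such as 1, 2, 3 are stored as double, and only the L suffix (or the colon operator) produces an integer vector.

```r
# Vectors from the example above
fruits <- c("banana", "apple", "orange")
numbers <- c(1, 2, 3)
log_values <- c(TRUE, FALSE, TRUE, FALSE)

typeof(fruits)      # "character"
typeof(numbers)     # "double"  (plain numbers are stored as doubles)
typeof(log_values)  # "logical"
typeof(c(1L, 2L))   # "integer" (the L suffix makes an integer)
```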

Vectors are most commonly created with the c() function. The elements passed to
c() should be of the same type; if they are of different types, R converts all
of them to one common type, moving from the lower type to the higher one in the
order logical < integer < double < complex < character.

x <- c(5, 3.2, TRUE, "R") # Converted to characters

x
[1] "5"    "3.2"  "TRUE" "R"
typeof(x)
[1] "character"
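The coercion order described above can be checked one step at a time. This is a small sketch; the variable names v_int, v_dbl and v_chr are only illustrative.

```r
# Each vector mixes two types, so the lower type is converted to the higher one.
v_int <- c(TRUE, 2L)   # logical + integer   -> integer
v_dbl <- c(2L, 3.5)    # integer + double    -> double
v_chr <- c(3.5, "a")   # double  + character -> character

typeof(v_int)  # "integer"
typeof(v_dbl)  # "double"
typeof(v_chr)  # "character"
v_int          # 1 2      (TRUE becomes 1)
v_chr          # "3.5" "a"
```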

Vectors of consecutive numeric values can be generated with the colon (:)
operator as follows:

Syntax:-
c(start:end)
or
x <- start:end
Example

print("R Creating Vector using Colon")

x <- 1:5
x

y <- 3:-3
y

Vector using seq() function

The seq() function creates a vector of evenly spaced values, either with a
given step size (by) or with a given number of elements (length.out).
seq(startValue, endValue, by=stepSize)
Example:
a <- seq(1,5,by=1)
a
b <- seq(1,5,by=2)
b
c <- seq(1,5,by=3)
c
or
seq_vec <- seq(1,4,length.out=6)
seq_vec
class(seq_vec)
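The results of the seq() calls above, written out so they can be compared with the console. The third vector is called c3 here (an illustrative name) only so it is not confused with the c() function.

```r
a <- seq(1, 5, by = 1)                 # 1 2 3 4 5
b <- seq(1, 5, by = 2)                 # 1 3 5
c3 <- seq(1, 5, by = 3)                # 1 4  (stops before passing 5)
seq_vec <- seq(1, 4, length.out = 6)   # 1.0 1.6 2.2 2.8 3.4 4.0 (step = 3 / 5)
class(seq_vec)                         # "numeric"
```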

Accessing Vector Elements

Vector elements are accessed by passing one or more index values in square
brackets [ ]. An index can be an integer, logical or character vector.
Integer index:-
An integer index denotes the position of an element. Indexing starts at 1, not 0.
# Accessing vector elements using integer indexing.
t <- c("January","February","March","April","May","June")
u <- t[c(2,4)]
print(u)

# Accessing vector elements using logical indexing.

v <- t[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE)]
print(v)
Naming Vectors

Elements can be given names when the vector is created; the names can then be
used as character indices.

# Accessing vector elements using character indexing.
t <- c(l1="January",l2="February",l3="March",l4="April",l5="May",l6="June")
q <- t[c("l1","l5")]
print(q)
Names can also be assigned with the names() function:

x <- 10:13
y <- c("l1","l2","l3","l4")
names(x) <- y

Now the elements can be accessed by name:

x["l1"]
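The indexing forms side by side on the named month vector. The negative index and the condition built with nchar() are additions not covered in the text above: a negative position drops that element, and a logical vector computed from the data selects the elements where it is TRUE. unname() is used only to print the values without their names.

```r
t <- c(l1 = "January", l2 = "February", l3 = "March",
       l4 = "April", l5 = "May", l6 = "June")

t[c(2, 4)]                 # by position: February, April
t[-1]                      # a negative index drops element 1
t[nchar(t) > 5]            # logical index from a condition: January, February
t[c("l1", "l5")]           # by name: January, May
unname(t[c("l1", "l5")])   # same values without names: "January" "May"
```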

Combining vectors
p<-c(1,2,4,5,7,8)
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
r<-c(p,q)
Because q is a character vector, the numbers from p are converted to character
strings in r.

Vector Manipulation
Vector arithmetic
Two vectors of the same length can be added, subtracted, multiplied or divided;
the operation is applied element by element and the result is a vector.
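A short example of the element-by-element arithmetic described above, using two illustrative vectors v1 and v2. The last line shows a related rule: when the lengths differ, R recycles the shorter vector.

```r
v1 <- c(3, 8, 4, 5, 0, 11)
v2 <- c(4, 11, 0, 8, 1, 2)

v1 + v2   # 7 19 4 13 1 13
v1 - v2   # -1 -3 4 -3 -1 9
v1 * v2   # 12 88 0 40 0 22
v1 / v2   # 0.75 0.7272727 Inf 0.625 0 5.5  (4 / 0 gives Inf)

# If the lengths differ, the shorter vector is recycled.
c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24
```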

Vector Length

To find out how many items a vector has, use the length() function:

length(vector)

Sort a Vector

To sort items in a vector alphabetically or numerically, use the sort() function
(R is case-sensitive, so function and argument names must be written in
lowercase):

Example

sort(vector)

Sort in reverse order

sort(vector, decreasing = TRUE)

To delete the contents of a vector, assign NULL to it (rm(vector) removes the
variable itself):

vector <- NULL
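The length, sort and delete steps above applied to one small vector, v (an illustrative name). It also shows the difference between assigning NULL, which leaves an empty variable, and rm(), which removes the variable.

```r
v <- c(42, 7, 19, 3)
length(v)                              # 4
sort(v)                                # 3 7 19 42
sort(v, decreasing = TRUE)             # 42 19 7 3
sort(c("banana", "apple", "cherry"))   # "apple" "banana" "cherry"

v <- NULL                  # the variable still exists but is now empty
length(v)                  # 0
rm(v)                      # removes the variable itself
exists("v", inherits = FALSE)   # FALSE
```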

Applications of vectors

1. In machine learning, vectors are used in principal component analysis,
where they are extended to eigenvalues and eigenvectors and used to perform
decompositions in vector spaces.

2. The inputs to a deep learning model are vectors of standardized data that
are supplied to the input layer of the neural network.

3. Vectors are used in the development of support vector machine algorithms.

4. Vector operations are used in neural networks for tasks such as image
recognition and text processing.

LISTS

A list is an object that can contain elements of different types, such as
strings, numbers, vectors, matrices and even other lists. We can imagine an R
list as a bag holding many different items: when we need an item, we open the
bag and take it out.

A list is created with the list() function. In other words, a list is a
generic vector containing other objects.

Let’s create a list containing strings, numbers, a vector and a logical value.
For example:
list_data <- list("Red", "White", c(1,2,3), TRUE, 22.4)
print(list_data)

vec <- c(3,4,5,6)
char_vec <- c("shubham","nishka","gunjan","sumit")
logic_vec <- c(TRUE,FALSE,FALSE,TRUE)
out_list <- list(vec,char_vec,logic_vec)
out_list

OUTPUT

[[1]]
[1] 3 4 5 6
[[2]]
[1] "shubham" "nishka" "gunjan" "sumit"
[[3]]
[1] TRUE FALSE FALSE TRUE

Example 1: Creating list with same data type

list_1 <- list(1,2,3)
list_2 <- list("Shubham","Arpita","Vaishali")
list_3 <- list(c(1,2,3))
list_4 <- list(TRUE,FALSE,TRUE)
list_1
list_2
list_3
list_4
OUTPUT

[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3

[[1]]
[1] "Shubham"
[[2]]
[1] "Arpita"
[[3]]
[1] "Vaishali"

[[1]]
[1] 1 2 3

[[1]]
[1] TRUE
[[2]]
[1] FALSE
[[3]]
[1] TRUE
Sometimes it’s necessary to have repeated values, for which we use rep()
> rep(5,3)
[1] 5 5 5
> rep(2:5,each=3)
[1] 2 2 2 3 3 3 4 4 4 5 5 5
> rep(-1:3, length.out=10)
[1] -1 0 1 2 3 -1 0 1 2 3
Naming List Elements
The list elements can be given names and they can be accessed using these
names.

# Create a list containing a vector, a matrix and a list.

list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
  list("green",12.3))

# Give names to the elements in the list.

names(list_data) <- c("1st_Quarter", "A_Matrix", "A_Inner_list")

# Show the list.

print(list_data)

output

$`1st_Quarter`
[1] "Jan" "Feb" "Mar"

$A_Matrix
     [,1] [,2] [,3]
[1,]    3    5   -2
[2,]    9    1    8

$A_Inner_list
$A_Inner_list[[1]]
[1] "green"

$A_Inner_list[[2]]
[1] 12.3

Accessing List Elements

Elements of a list can be accessed by their index in the list, and the elements
of a named list can also be accessed by name.

We continue to use the list from the above example:

# Create a list containing a vector, a matrix and a list.

list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
  list("green",12.3))

# Give names to the elements in the list.

names(list_data) <- c("1st_Quarter", "A_Matrix", "A_Inner_list")

# Access the first element of the list.

print(list_data[1])

# Access the third element. As it is also a list, all its elements will be printed.

print(list_data[3])

# Access the list element using the name of the element.

print(list_data$A_Matrix)
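Single brackets and double brackets behave differently on lists, which is easy to miss in the printed output above. This sketch rebuilds the same named list and shows that [ ] returns a smaller list while [[ ]] and $ return the element itself, so it can be indexed further.

```r
list_data <- list(c("Jan", "Feb", "Mar"),
                  matrix(c(3, 9, 5, 1, -2, 8), nrow = 2),
                  list("green", 12.3))
names(list_data) <- c("1st_Quarter", "A_Matrix", "A_Inner_list")

class(list_data[1])              # "list": single brackets return a sub-list
class(list_data[[1]])            # "character": double brackets return the element
list_data[["A_Matrix"]][2, 3]    # 8: row 2, column 3 of the matrix
list_data$A_Inner_list[[2]]      # 12.3
list_data$`1st_Quarter`[2]       # "Feb": backticks for a name starting with a digit
```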

Manipulating List Elements

We can add, delete and update list elements as shown below. The example adds an
element at the end and removes it again; assigning NULL removes an element at
any position, and any element can be updated.

# Add element at the end of the list.

list_data[4] <- "New element"

print(list_data[4])

# Remove the last element.

list_data[4] <- NULL

# Print the 4th Element.

print(list_data[4])

# Update the 3rd Element.

list_data[3] <- "updated element"

print(list_data[3])
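Adding, deleting and updating by name on a small illustrative list, lst. It shows that NULL removes an element from the middle of a list too, and that the later elements shift up.

```r
lst <- list(a = 1, b = "two", c = TRUE)

lst[["d"]] <- 4.5    # add a new named element at the end
length(lst)          # 4

lst[["b"]] <- NULL   # remove an element from the middle
names(lst)           # "a" "c" "d"

lst$a <- 100         # update an element in place
lst$a                # 100
```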

2. Implementation of Data Frames.
Data Frames

A data frame stores data as a table: each column is a vector, all columns have the same length, and different columns can hold different types.

Use the data.frame() function to create a data frame:

ex1.
#Author DataFlair
int_vec <- c(1,2,3)
char_vec <- c("a", "b", "c")
bool_vec <- c(TRUE, TRUE, FALSE)
data_frame <- data.frame(int_vec, char_vec, bool_vec)
ex2.
sno<-c(1,2,3)
sname<-c("smb","bbk","BNR")
marks<-c(97,96,95)
df<-data.frame(sno,sname,marks)
df

sno sname marks

1 1 smb 97
2 2 bbk 96
3 3 BNR 95
summary(df)

      sno         sname               marks
 Min.   :1.0   Length:3           Min.   :95.0
 1st Qu.:1.5   Class :character   1st Qu.:95.5
 Median :2.0   Mode  :character   Median :96.0
 Mean   :2.0                      Mean   :96.0
 3rd Qu.:2.5                      3rd Qu.:96.5
 Max.   :3.0                      Max.   :97.0

Create employee data:

employee_data <- data.frame(
  employee_id = c(1:5),
  employee_name = c("James","Harry","Shinji","Jim","Oliver"),
  sal = c(642.3,535.2,681.0,739.0,925.26),
  join_date = as.Date(c("2013-02-04", "2017-06-21", "2012-11-14", "2018-05-19","2016-03-25")),
  stringsAsFactors = FALSE)

print(employee_data)

  employee_id employee_name    sal  join_date
1           1         James 642.30 2013-02-04
2           2         Harry 535.20 2017-06-21
3           3        Shinji 681.00 2012-11-14
4           4           Jim 739.00 2018-05-19
5           5        Oliver 925.26 2016-03-25

Get the Structure of the R Data Frame

The structure of a data frame can be seen with the str() function.
> str(employee_data)
'data.frame': 5 obs. of 4 variables:
 $ employee_id  : int 1 2 3 4 5
 $ employee_name: chr "James" "Harry" "Shinji" "Jim" ...
 $ sal          : num 642 535 681 739 925
 $ join_date    : Date, format: "2013-02-04" ...

Extract data from Data Frame
Extract specific columns by using the column names.

emp_data <- data.frame(employee_data$employee_id, employee_data$employee_name)

emp_data

  employee_data.employee_id employee_data.employee_name
1                         1                       James
2                         2                       Harry
3                         3                      Shinji
4                         4                         Jim
5                         5                      Oliver

Extract first two rows

a <- employee_data[1:2,]
a

  employee_id employee_name   sal  join_date
1           1         James 642.3 2013-02-04
2           2         Harry 535.2 2017-06-21

Extract first two columns

a <- employee_data[1:2]
a

  employee_id employee_name
1           1         James
2           2         Harry
3           3        Shinji
4           4           Jim
5           5        Oliver

Extract the 1st and 2nd rows with the 3rd and 4th columns:

> result <- employee_data[c(1,2),c(3,4)]
> result
    sal  join_date
1 642.3 2013-02-04
2 535.2 2017-06-21
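Columns can also be picked by name and rows by a condition on a column. This sketch rebuilds a shorter employee_data (the join_date column is left out for brevity) and keeps only the rows whose salary is above 700; the name high_paid is illustrative.

```r
employee_data <- data.frame(
  employee_id = 1:5,
  employee_name = c("James", "Harry", "Shinji", "Jim", "Oliver"),
  sal = c(642.3, 535.2, 681.0, 739.0, 925.26),
  stringsAsFactors = FALSE)

# Columns selected by name instead of position
employee_data[, c("employee_name", "sal")]

# Rows selected by a logical condition on a column
high_paid <- employee_data[employee_data$sal > 700, ]
high_paid$employee_name   # "Jim" "Oliver"

nrow(employee_data)       # 5
ncol(employee_data)       # 3
```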

Expand R Data Frame

A data frame can be expanded by adding columns and rows.

employee_data$dept <-
c("IT","Finance","Operations","HR","Administration")

Add Row

Create the second R data frame:
#DataFlair
employee_new_data <- data.frame(
  employee_id = c(6:8),
  employee_name = c("Aman", "Piyush", "Aakash"),
  sal = c(523.0,721.3,622.8),
  join_date = as.Date(c("2015-06-22","2016-04-30","2011-03-17")),
  stringsAsFactors = FALSE
)
# rbind() needs both data frames to have the same columns, and employee_data
# now has a dept column; the departments of the new employees are unknown here.
employee_new_data$dept <- NA

Bind the two data frames:

> employee_out_data <- rbind(employee_data,employee_new_data)
> employee_out_data

3. Implementation of Matrix Addition, Subtraction, Multiplication and Division.
Theory: Matrices are a special type of two-dimensional array. Matrices are used
heavily in statistics, and so play an important role in R. To create a matrix,
use the function matrix():

> matrix(1:12, nrow=3, ncol=4)

[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12

This is called column-major order. Of course, we need only give one of the
dimensions:
> matrix(1:12, nrow=3)
unless we want vector recycling to help us:
> matrix(1:3, nrow=3, ncol=4)
[,1] [,2] [,3] [,4]
[1,] 1 1 1 1
[2,] 2 2 2 2
[3,] 3 3 3 3
Sometimes it’s useful to specify the elements by row first
> matrix(1:12, nrow=3, byrow=TRUE)
There are special functions for constructing certain matrices:
> diag(3)
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
> diag(1:3)
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 2 0
[3,] 0 0 3

> 1:5 %o% 1:5
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    2    4    6    8   10
[3,]    3    6    9   12   15
[4,]    4    8   12   16   20
[5,]    5   10   15   20   25
The %o% operator computes the outer product of two vectors.

A matrix can also be created with the following functions: rbind() binds
vectors as rows and cbind() binds them as columns.

a <- rbind(c(1:3),c(4:6))
a
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6

a <- cbind(c(1:3),c(4:6))
a
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

a[1,2]
[1] 4

a[1,]
[1] 1 4
Standard functions exist for common mathematical operations on matrices. The
examples below use the matrix
> A <- matrix(c(1:8, 10), nrow=3)
> t(A) # transpose
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 10
> det(A) # determinant
[1] -3
> diag(A) # diagonal
[1] 1 5 10
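One more standard function is solve(), which returns the inverse of a square matrix whose determinant is not zero (here det(A) is -3). The sketch uses %*%, R's true matrix product, to check that A times its inverse is the identity matrix; A_inv is an illustrative name.

```r
A <- matrix(c(1:8, 10), nrow = 3)

det(A)              # -3, non-zero, so A has an inverse
A_inv <- solve(A)   # inverse of A

# A matrix multiplied by its inverse (with %*%) gives the identity matrix,
# up to floating-point rounding.
round(A %*% A_inv, 10)
all.equal(A %*% A_inv, diag(3))   # TRUE
```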
Array:
If a data set records more than two pieces of categorical information about
each subject, a matrix is not sufficient. The generalization of matrices to
higher dimensions is the array. Arrays are defined much like matrices, with a
call to the array() command.
The syntax of an array in the R programming language is

Array_Name <- array(data, dim = c(row_size, column_size, no_of_matrices),
  dimnames = list(row_names, column_names, matrix_names))

Here is a 2 × 3 × 3 array:
> arr = array(1:18, dim=c(2,3,3))
> arr
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18
Each 2-dimensional slice defined by the last co-ordinate of the array is shown
as a 2 × 3 matrix. Note that we no longer specify the number of rows and
columns separately, but use a single vector dim whose length is the number of
dimensions. You can recover this vector with the dim() function.
> dim(arr)
[1] 2 3 3
Note that a 2-dimensional array is identical to a matrix. Arrays can be
subsetted and modified in exactly the same way as a matrix, only using the
appropriate number of co-ordinates:
three_d_arr <- array(1:24,
  dim = c(4, 3, 2),
  dimnames = list(
    c("one", "two", "three", "four"),
    c("ray", "karl", "mimo"),
    c("steve", "mark")
  ))
three_d_arr
, , steve

ray karl mimo

one 1 5 9
two 2 6 10
three 3 7 11
four 4 8 12

, , mark

ray karl mimo

one 13 17 21
two 14 18 22
three 15 19 23
four 16 20 24

# creating 2 vectors of dissimilar lengths

vec1 <- c (3, 4, 2)
vec2 <- c (11, 12, 13, 14, 15, 16)
# taking these vectors as input for this array
res1 <- array(c(vec1, vec2), dim=c(3,3,2))
print(res1)
, , 1

     [,1] [,2] [,3]
[1,]    3   11   14
[2,]    4   12   15
[3,]    2   13   16

, , 2

     [,1] [,2] [,3]
[1,]    3   11   14
[2,]    4   12   15
[3,]    2   13   16
Naming Columns and rows
# Creating 2 vectors having different lengths.
vec1 <- c(2, 4, 6)
vec2 <- c(11, 12, 13, 14, 15, 16)
column.names <- c("COLA","COLB","COLC")
row.names <- c("ROWA","ROWB","ROWC")
matrix.names <- c("MatA", "MatB")

# dimnames lists the names in the order rows, columns, matrices.
res1 <- array(c(vec1,vec2), dim=c(3,3,2), dimnames=list(row.names,
  column.names, matrix.names))
print(res1)
, , MatA

COLA COLB COLC

ROWA 2 11 14
ROWB 4 12 15
ROWC 6 13 16

, , MatB

COLA COLB COLC

ROWA 2 11 14
ROWB 4 12 15
ROWC 6 13 16

Accessing and Indexing Arrays

print(res1[3,,1])
# this statement prints the 3rd row of the first matrix of the array.
print(res1[2,2,1])
# the above statement prints the element in the 2nd row and 2nd column of
# the 1st matrix.
print(res1[,,2])
# the above statement prints the 2nd matrix entirely
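Because res1 has dimnames, elements can also be indexed by name, and apply() can compute a summary over any dimension. This sketch rebuilds res1 and uses apply() with margin 3 (one result per matrix) and margins c(1, 3) (row sums within each matrix); apply() is not used elsewhere in the text.

```r
vec1 <- c(2, 4, 6)
vec2 <- c(11, 12, 13, 14, 15, 16)
res1 <- array(c(vec1, vec2), dim = c(3, 3, 2),
              dimnames = list(c("ROWA", "ROWB", "ROWC"),
                              c("COLA", "COLB", "COLC"),
                              c("MatA", "MatB")))

res1["ROWB", "COLC", "MatB"]   # 15: indexing by names instead of positions
dim(res1[, , "MatA"])          # 3 3: one slice is an ordinary matrix
apply(res1, 3, sum)            # MatA 93, MatB 93: sum of each matrix
apply(res1, c(1, 3), sum)      # row sums within each matrix: 27 31 35
```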
Factors
R has a special data structure for storing categorical variables: making a
variable a factor tells R that it is nominal or ordinal. The steps are:
1. Create a vector.
2. Convert the vector into a factor using the function factor().
# Creating a vector
x<-c("female", "male", "male", "female")
print(x)

# Converting the vector x into a factor named gender

gender<-factor(x)
print(gender)
Output:
[1] "female" "male" "male" "female"
[1] female male male female
Levels: female male
Simplest form of the factor function: factor(x)

Ideal form of the factor function: factor(x, levels, labels)

The factor function has three parameters:

1. Vector Name
2. Values (Optional)
3. Value labels (Optional)
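The optional levels and labels parameters in action, plus the ordered argument of factor(), which the list above does not mention. The vectors sizes, f, g and o are illustrative.

```r
sizes <- c("small", "large", "medium", "small")

# Levels in a chosen order instead of alphabetical order
f <- factor(sizes, levels = c("small", "medium", "large"))
levels(f)        # "small" "medium" "large"
as.integer(f)    # 1 3 2 1  (each value is stored as a level number)
table(f)         # small 2, medium 1, large 1

# labels renames the levels
g <- factor(c("m", "f", "m"), levels = c("f", "m"), labels = c("Female", "Male"))
as.character(g)  # "Male" "Female" "Male"

# ordered = TRUE makes comparisons between levels meaningful
o <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
o[1] < o[2]      # TRUE, because small < large
```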

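Implementation of Matrix Addition, Subtraction, Multiplication and Division.
The operators +, -, * and / work element by element on two matrices of the same
dimensions.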

m1 = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

print("Matrix-1:")
print(m1)
m2 = matrix(c(0, 1, 2, 3, 0, 2), nrow = 2)
print("Matrix-2:")

print(m2)

result = m1 + m2
print("Result of addition")
print(result)

result = m1 - m2
print("Result of subtraction")
print(result)

result = m1 * m2
print("Result of multiplication")
print(result)

result = m1 / m2
print("Result of division:")
print(result)
[1] "Matrix-1:"
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
[1] "Matrix-2:"
[,1] [,2] [,3]
[1,] 0 2 0
[2,] 1 3 2
[1] "Result of addition"
[,1] [,2] [,3]
[1,] 1 5 5
[2,] 3 7 8
[1] "Result of subtraction"
[,1] [,2] [,3]
[1,] 1 1 5
[2,] 1 1 4
[1] "Result of multiplication"
[,1] [,2] [,3]
[1,] 0 6 0
[2,] 2 12 12
[1] "Result of division:"
[,1] [,2] [,3]
[1,] Inf 1.500000 Inf
[2,] 2 1.333333 3
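The * used above multiplies matching elements. Matrix multiplication in the linear-algebra sense uses the %*% operator and needs the number of columns of the first matrix to equal the number of rows of the second. With the same m1 and m2 (both 2 x 3), m1 %*% m2 is not allowed, but m1 %*% t(m2) is.

```r
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
m2 <- matrix(c(0, 1, 2, 3, 0, 2), nrow = 2)

m1 * m2          # element-wise: both matrices must be 2 x 3
m1 %*% t(m2)     # matrix product: (2 x 3) %*% (3 x 2) gives a 2 x 2 matrix
#      [,1] [,2]
# [1,]    6   20
# [2,]    8   26

# m1 %*% m2 stops with "non-conformable arguments" (2 x 3 times 2 x 3).
```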

4. Implementation of Quick Sort.
sort(x, decreasing = FALSE, na.last = NA, ...)

sort.int(x, partial = NULL, na.last = NA, decreasing = FALSE,
         method = c("auto", "shell", "quick", "radix"), index.return = FALSE)

Arguments
x : for sort, an R object with a class, or a numeric, complex, character or
logical vector. For sort.int, a numeric, complex, character or logical vector,
or a factor.
decreasing : logical. Should the sort be increasing or decreasing? For the
"radix" method, this can be a vector of length equal to the number of
arguments in .... For the other methods, it must be length one. Not available
for partial sorting.
... : arguments to be passed to or from methods or (for the default methods and
objects without a class) to sort.int.
na.last : for controlling the treatment of NAs. If TRUE, missing values in the
data are put last; if FALSE, they are put first; if NA, they are removed.
partial : NULL or a vector of indices for partial sorting.
method : character string specifying the algorithm used. Not available for
partial sorting. Can be abbreviated.
index.return : logical indicating if the ordering index vector should be
returned as well. Supported by method == "radix" for any na.last mode and
data type, and the other methods when na.last = NA (the default) and fully
sorting non-factors.

Ex:
A<-c(51:60,5:50,60:100,1:5)
A
[1] 51 52 53 54 55 56 57 58 59 60 5 6
[13] 7 8 9 10 11 12 13 14 15 16 17 18
[25] 19 20 21 22 23 24 25 26 27 28 29 30
[37] 31 32 33 34 35 36 37 38 39 40 41 42
[49] 43 44 45 46 47 48 49 50 60 61 62 63
[61] 64 65 66 67 68 69 70 71 72 73 74 75
[73] 76 77 78 79 80 81 82 83 84 85 86 87
[85] 88 89 90 91 92 93 94 95 96 97 98 99
[97] 100 1 2 3 4 5
A<-sort(A,decreasing=FALSE,method="quick")

A
[1] 1 2 3 4 5 5 6 7 8 9 10 11
[13] 12 13 14 15 16 17 18 19 20 21 22 23
[25] 24 25 26 27 28 29 30 31 32 33 34 35
[37] 36 37 38 39 40 41 42 43 44 45 46 47
[49] 48 49 50 51 52 53 54 55 56 57 58 59
[61] 60 60 61 62 63 64 65 66 67 68 69 70
[73] 71 72 73 74 75 76 77 78 79 80 81 82
[85] 83 84 85 86 87 88 89 90 91 92 93 94
[97] 95 96 97 98 99 100

Quick Sort
Quicksort is a sorting algorithm based on the divide and conquer approach, in
which an array is divided into subarrays around a pivot element (an element
selected from the array).

1. While dividing the array, the pivot element is positioned so that elements
less than the pivot are kept on its left side and elements greater than the
pivot are on its right side.

2. The left and right subarrays are divided using the same approach. This
process continues until each subarray contains a single element.

3. At this point, the elements are already in order, and they are combined to
form a sorted array.

# Simple implementation of Quicksort in R.

# Quick sort algorithm:
# 1. Select a random value (the pivot) from the array.
# 2. Put all values less than the pivot in arrayLeft.
# 3. Put all values greater than the pivot in arrayRight.
# 4. If arrayLeft or arrayRight has more than 1 value, repeat the
#    above steps on it.
# 5. The sorted result is arrayLeft, pivot, arrayRight.

R has a built-in quicksort (sort(x, method = "quick")), but in some rare cases
you might want to modify the pivot selection part of the algorithm. Here is a
custom implementation:
quickSort <- function(arr) {
  # A vector with 0 or 1 elements is already sorted.
  if (length(arr) <= 1) return(arr)
  # Pick the pivot at random. Indexing by position avoids sample()'s special
  # behaviour when it is given a single number.
  p <- arr[sample.int(length(arr), 1)]
  # Move the smaller values to the left and the bigger values to the right;
  # values equal to the pivot stay in the middle, so duplicates are kept.
  left <- arr[arr < p]
  middle <- arr[arr == p]
  right <- arr[arr > p]
  # Sort both sides recursively and combine the results.
  c(quickSort(left), middle, quickSort(right))
}

x <- sample(1:100,10) # random sample, so your values will differ
x
[1] 22 92 30 12 48 20 88 80 8 34

RES <- quickSort(x)

RES
[1] 8 12 20 22 30 34 48 80 88 92
OR
quicksort <- function(x)
{
  # An empty or single-element vector is already sorted.
  if (length(x) <= 1)
    return(x)
  pivot <- x[1]
  rest <- x[-1]
  pivot_less <- quicksort(rest[rest < pivot])
  pivot_greater <- quicksort(rest[rest >= pivot])
  return(c(pivot_less, pivot, pivot_greater))
}
quicksort(c(10,24,33,21,22,66,11))
[1] 10 11 21 22 24 33 66
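Both versions above build new vectors at every level. The textbook form of quicksort instead partitions the array in place. The sketch below swaps in the Lomuto partition scheme (last element as pivot) and uses an explicit stack of (lo, hi) ranges instead of recursion; the function name quicksort_lomuto is hypothetical and this is one possible variant, not the method used above.

```r
quicksort_lomuto <- function(v) {
  # Each stack entry is a pair (lo, hi) of positions that still need sorting.
  stack <- list(c(1L, length(v)))
  while (length(stack) > 0) {
    bounds <- stack[[length(stack)]]
    stack[[length(stack)]] <- NULL         # pop the last entry
    lo <- bounds[1]
    hi <- bounds[2]
    if (lo >= hi) next                     # 0 or 1 element: already sorted

    pivot <- v[hi]                         # Lomuto scheme: last element is the pivot
    i <- lo - 1L
    for (j in lo:(hi - 1L)) {
      if (v[j] <= pivot) {                 # grow the "<= pivot" region
        i <- i + 1L
        v[c(i, j)] <- v[c(j, i)]           # swap v[i] and v[j]
      }
    }
    p <- i + 1L
    v[c(p, hi)] <- v[c(hi, p)]             # put the pivot in its final place

    # Push both sides of the pivot instead of recursing.
    stack[[length(stack) + 1L]] <- c(lo, p - 1L)
    stack[[length(stack) + 1L]] <- c(p + 1L, hi)
  }
  v
}

quicksort_lomuto(c(10, 24, 33, 21, 22, 66, 11, 21))
# [1] 10 11 21 21 22 24 33 66
```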
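5. Implementation of Binary Search Tree.
The program below performs binary search on a sorted vector: it compares the
target with the middle element and keeps only the half that can still contain it.
binarysearch <- function(v, t) {
  low <- 1
  high <- length(v)
  while (low <= high) {
    # Middle position of the current search range (an index, not a value).
    mid <- (low + high) %/% 2
    if (v[mid] == t)
      return(mid)
    else if (v[mid] > t)
      high <- mid - 1
    else
      low <- mid + 1
  }
  # 0 signals that the target is not in the vector.
  return(0)
}
a <- seq(1,10)
a
x <- binarysearch(a,2)
x
if (x == 0) {
  print("search unsuccessful")
} else {
  print("search successful")
  cat("\n","Element found at",x)
}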
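The program above searches a sorted vector. For the tree structure named in the title, here is a minimal sketch of a binary search tree in R, assuming each node is represented as a list(key, left, right) and an empty tree as NULL; the functions bst_insert, bst_search and bst_inorder are hypothetical names.

```r
# A tree node is list(key, left, right); an empty tree is NULL.
bst_insert <- function(node, key) {
  if (is.null(node)) {
    return(list(key = key, left = NULL, right = NULL))
  }
  if (key < node$key) {
    node$left <- bst_insert(node$left, key)    # smaller keys go to the left
  } else {
    node$right <- bst_insert(node$right, key)  # equal or larger keys go right
  }
  node
}

# Walk down from the root, going left or right at each node.
bst_search <- function(node, key) {
  while (!is.null(node)) {
    if (key == node$key) return(TRUE)
    node <- if (key < node$key) node$left else node$right
  }
  FALSE
}

# In-order traversal (left, node, right) returns the keys in sorted order.
bst_inorder <- function(node) {
  if (is.null(node)) return(NULL)
  c(bst_inorder(node$left), node$key, bst_inorder(node$right))
}

root <- NULL
for (k in c(50, 30, 70, 20, 40, 60, 80)) root <- bst_insert(root, k)
bst_inorder(root)      # 20 30 40 50 60 70 80
bst_search(root, 60)   # TRUE
bst_search(root, 65)   # FALSE
```

Because R copies a list when it is modified, each insertion returns an updated tree that must be assigned back to root.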

6. Implementation of Set Operations.
R includes some handy set operations, including these:
union(x,y): Union of the sets x and y
intersect(x,y): Intersection of the sets x and y
setdiff(x,y): Set difference between x and y, consisting of all elements
of x that are not in y
setequal(x,y): Test for equality between x and y
c %in% y: Membership, testing whether c is an element of the set y
choose(n,k): Number of possible subsets of size k chosen from a set of size n
Here are some simple examples of using these functions:
> x <- c(1,2,5)
> y <- c(5,1,8,9)
> union(x,y)
[1] 1 2 5 8 9
> intersect(x,y)
[1] 1 5
> setdiff(x,y)
[1] 2
> setdiff(y,x)
[1] 8 9
> setequal(x,y)
[1] FALSE
> setequal(x,c(1,2,5))
[1] TRUE
> 2 %in% x
[1] TRUE
> 2 %in% y
[1] FALSE
> choose(5,2)
[1] 10
x<-c(1:10)
y<-c(5,6,7,8)
a<- union(x,y)
a
[1] 1 2 3 4 5 6 7 8 9 10
b<-intersect(x,y)
b
[1] 5 6 7 8
c <- setdiff(x,y)
c
[1] 1 2 3 4 9 10
d <- setequal(x,y)
d
[1] FALSE
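These functions treat their arguments as sets, so repeated values are dropped and order does not matter for equality. A small sketch with illustrative vectors containing duplicates; is.element() is the function form of %in%.

```r
x <- c(3, 1, 3, 2)
y <- c(2, 4, 4)

union(x, y)                     # 3 1 2 4 (duplicates removed, first-seen order)
unique(x)                       # 3 1 2
setequal(c(1, 2, 2), c(2, 1))   # TRUE: order and repeats are ignored
is.element(4, y)                # TRUE, same as 4 %in% y
```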

7. Implementation of Reading and Writing files.
a. Reading data sets (.txt, .csv) from the web or from disk, and writing a file
to a specific disk location.
library(utils)
data<- read.csv("F:/R -smb/sampledata/input.csv")
data
eno ename sal doj dept
1 101 Ramesh 50000 13-06-2016 IT
2 102 Rama 25000 14-06-2016 CSE
3 103 sirish 30000 15-06-2016 ECE
4 104 Bindhu 35000 16-06-2016 IT
5 105 karthik 69000 17-06-2016 IT
6 106 nagaraju 85000 18-06-2016 IT
7 107 Anil 65000 19-06-2016 CSE
8 108 Uma 65000 20-06-2016 ECE
9 109 venkat 5000 21-06-2016 CSE
10 110 hari 30000 22-06-2016 ECE
print(is.data.frame(data))
[1] TRUE
print(ncol(data))
[1] 5
print(nrow(data))
[1] 10
sal <- max(data$sal)
sal
[1] 85000
# Get the details of the person with the maximum salary.
retval <- subset(data, sal == max(sal))
retval
eno ename sal doj dept
6 106 nagaraju 85000 18-06-2016 IT
# Get all the people working in the IT department.
retval <- subset(data, dept == "IT")
retval
eno ename sal doj dept
1 101 Ramesh 50000 13-06-2016 IT
4 104 Bindhu 35000 16-06-2016 IT
5 105 karthik 69000 17-06-2016 IT
6 106 nagaraju 85000 18-06-2016 IT
# Write filtered data into a new file.
write.csv(retval,"F:/R -smb/sampledata/output.csv")

newdata <- read.csv("F:/R -smb/sampledata/output.csv")
newdata
X eno ename sal doj dept
1 1 101 Ramesh 50000 13-06-2016 IT
2 4 104 Bindhu 35000 16-06-2016 IT
3 5 105 karthik 69000 17-06-2016 IT
4 6 106 nagaraju 85000 18-06-2016 IT
The extra first column X holds the row names that write.csv() saved; pass
row.names = FALSE to write.csv() to leave them out.
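A self-contained round trip that can run on any machine, assuming temporary files from tempfile() stand in for the F:/ paths used above. It writes a small .csv without row names (so no X column appears when it is read back) and writes and reads a plain .txt file with writeLines() and readLines().

```r
df <- data.frame(eno = c(101, 104), ename = c("Ramesh", "Bindhu"),
                 dept = c("IT", "IT"))

# .csv: write without row names, then read back
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
back <- read.csv(csv_path)
names(back)              # "eno" "ename" "dept" (no extra X column)

# .txt: write and read plain lines of text
txt_path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), txt_path)
readLines(txt_path)      # "first line" "second line"
```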
b. Reading Excel data sheet in R.
install.packages("xlsx")
library("xlsx")
data<- read.xlsx("input.xlsx", sheetIndex = 1)
data
eno ename sal doj dept
1 101 Ramesh 50000 2016-06-13 IT
2 102 Rama 25000 2016-06-14 CSE
3 103 sirish 30000 2016-06-15 ECE
4 104 Bindhu 35000 2016-06-16 IT
5 105 karthik 69000 2016-06-17 IT
6 106 nagaraju 85000 2016-06-18 IT
7 107 Anil 65000 2016-06-19 CSE
8 108 Uma 65000 2016-06-20 ECE
9 109 venkat 5000 2016-06-21 CSE
10 110 hari 30000 2016-06-22 ECE
c. Reading XML dataset in R.
install.packages("XML")
library("XML")
library("methods")
result<- xmlParse(file = "F:/R -smb/sampledata/input.xml")
result
<?xml version="1.0"?>
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
……..

8. Implementation of Graph Operations.
R offers three main graphics packages: traditional (or base), lattice and
ggplot2. Traditional graphics are built into R, create nice looking graphs,
and are very flexible. However, they require a lot of work when repeating a
graph for different groups in your data. Lattice graphics excel at repeating
graphs for various groups. The ggplot2 package also deals with groups well
and is quite a bit more flexible than lattice graphics.
Data sets used (all built into R):
cars
mtcars
iris
1. Scatter Plot
A scatter plot (aka scatter chart, scatter graph) uses dots to represent
values for two different numeric variables. The position of each dot on the
horizontal and vertical axis indicates values for an individual data point.
Scatter plots are used to observe relationships between variables.
To make a scatter plot use plot() with a vector of x values and a vector of y
values:
# base R

The plot() function is used to draw points (markers) in a diagram.

The function takes parameters for specifying points in the diagram.

Parameter 1 specifies points on the x-axis.

Parameter 2 specifies points on the y-axis.

Draw one point in the diagram, at position (1, 3):

plot(1, 3)

Draw two points in the diagram, one at position (1, 3) and one at position
(8, 10):
plot(c(1, 8), c(3, 10))

You can plot as many points as you like; just make sure the vectors for the
x-axis and the y-axis have the same number of points:

x <- c(1, 2, 3, 4, 5)
y <- c(3, 7, 8, 9, 12)

plot(x, y)

If you want to draw dots in a sequence, on both the x-axis and the y-axis,
use the : operator:
plot(1:10)

plot(iris)


plot(iris$Sepal.Length)

Draw a Line

The plot() function also takes a type parameter with the value l to draw a line to
connect all the points in the diagram:

plot(1:10, type="l")

Plot Labels

The plot() function also accepts other parameters, such as main, xlab and ylab,
if you want to customize the graph with a main title and different labels for
the x- and y-axis:

plot(1:10, main="My Graph", xlab="x-axis", ylab="y axis")


Graph Appearance

There are many other parameters you can use to change the appearance of the
points.

Colors

Use col="color" to add a color to the points:

plot(1:10, col="red")

Size

Use cex=number to change the size of the points (1 is default, while 0.5 means
50% smaller, and 2 means 100% larger):

Point Shape

Use pch with a value from 0 to 25 to change the point shape format:

plot(1:10, pch=25, cex=2, col="green")

The pch parameter takes values from 0 to 25, which means that we can choose
from 26 different point shapes.
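The labels and appearance parameters above combined on one of the listed data sets. This sketch plots the built-in cars data (speed in mph against stopping distance in feet); the title and axis texts are illustrative choices.

```r
# cars is a built-in data set with columns speed and dist
plot(cars$speed, cars$dist,
     main = "Stopping distance vs speed",
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     col = "blue", pch = 19, cex = 1.2)   # filled circles, 20% larger
```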

Line Graphs

A line graph has a line that connects all the points in a diagram.

To create a line, use the plot() function and add the type parameter with a
value of "l"

Line Width

To change the width of the line, use the lwd parameter (1 is default,
while 0.5 means 50% smaller, and 2 means 100% larger):

plot(1:10, type="l", lwd=2)

Line Styles

The line is solid by default. Use the lty parameter with a value from 0 to 6 to
specify the line format.

Available parameter values for lty:

- 0 removes the line
- 1 displays a solid line
- 2 displays a dashed line
- 3 displays a dotted line
- 4 displays a "dot dashed" line
- 5 displays a "long dashed" line
- 6 displays a "two dashed" line

plot(1:10, type="l", lwd=1, lty=6)

Multiple Lines

To display more than one line in a graph, use the plot() function together with
the lines() function:

line1 <- c(1,2,3,4,5,10)

line2 <- c(2,5,7,8,9,10)

plot(line1, type = "l", col = "blue")

lines(line2, type="l", col = "red")
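With two lines it helps to say which is which. This sketch adds a legend() box (the same function used for pie charts in this lab) and sets ylim from both vectors so the second line cannot fall outside the axis range set by the first plot() call; the legend texts are illustrative.

```r
line1 <- c(1, 2, 3, 4, 5, 10)
line2 <- c(2, 5, 7, 8, 9, 10)

# ylim covers both lines, because lines() does not rescale the axes
plot(line1, type = "l", col = "blue", ylim = range(line1, line2))
lines(line2, type = "l", col = "red")
legend("topleft", legend = c("line1", "line2"), col = c("blue", "red"), lty = 1)
```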


Pie Charts

A pie chart is a circular graphical view of data.

Use the pie() function to draw pie charts:

# Create a vector of pies

x <- c(10,20,30,40)

# Display the pie chart

pie(x)


Labels and Header

Use the labels parameter to add a label to each slice of the pie chart, and use
the main parameter to add a header:

# Create a vector of pies

x <- c(10,20,30,40)

# Create a vector of labels

mylabel <- c("Apples", "Bananas", "Cherries", "Dates")

# Display the pie chart with labels

pie(x, labels = mylabel, main = "Fruits")
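Because pie() uses names(x) as the default labels, the same chart can also be drawn from a named vector. A sketch of that alternative:

```r
# Values with the labels attached as names
fruits <- c(Apples = 10, Bananas = 20, Cherries = 30, Dates = 40)
# No labels argument needed: pie() picks up names(fruits)
pie(fruits, main = "Fruits")
```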


Colors

You can give each slice its own color with the col parameter:

# Create a vector of colors

colors <- c("blue", "yellow", "green", "black")

# Display the pie chart with colors

pie(x, labels = mylabel, main = "Fruits", col = colors)
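Instead of typing colour names, a palette function from base R's grDevices package can generate one colour per slice. A sketch using rainbow() with the same values and labels:

```r
x <- c(10, 20, 30, 40)
mylabel <- c("Apples", "Bananas", "Cherries", "Dates")
# rainbow(n) returns n distinct colours as hex strings
colors <- rainbow(length(x))
pie(x, labels = mylabel, main = "Fruits", col = colors)
```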

Legend

To add a legend that explains each slice, use the legend() function:

# Create a vector of labels

mylabel <- c("Apples", "Bananas", "Cherries", "Dates")


# Create a vector of colors

colors <- c("blue", "yellow", "green", "black")

# Display the pie chart with colors

pie(x, labels = mylabel, main = "Pie Chart", col = colors)

# Display the explanation box

legend("bottomright", mylabel, fill = colors)

The legend can be placed at any of these positions:

bottomright, bottom, bottomleft, left, topleft, top, topright, right, center

Bar Charts

A bar chart uses rectangular bars to visualize data. Bar charts can be
displayed horizontally or vertically. The height or length of each bar is
proportional to the value it represents.

Use the barplot() function to draw a vertical bar chart:

# x-axis values
x <- c("A", "B", "C", "D")

# y-axis values

y <- c(2, 4, 6, 8)

barplot(y, names.arg = x)

• The x variable holds the category names shown on the x-axis (A, B, C, D)
• The y variable holds the values shown on the y-axis (2, 4, 6, 8)
• The barplot() function then draws a bar chart of the values
• names.arg sets the name shown under each bar on the x-axis
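If the values are stored in a named vector, names.arg can be left out, because barplot() uses the names of the vector by default. A sketch:

```r
# Same data as above, with the categories attached as names
y <- c(A = 2, B = 4, C = 6, D = 8)
barplot(y)
```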
Bar Color

Use the col parameter to change the color of the bars:

x <- c("A", "B", "C", "D")

y <- c(2, 4, 6, 8)

barplot(y, names.arg = x, col = "red")
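col is matched to the bars by position, so passing a vector gives each bar its own colour. A sketch with one colour per category:

```r
x <- c("A", "B", "C", "D")
y <- c(2, 4, 6, 8)
# One colour per bar, matched by position
bar_colors <- c("red", "blue", "green", "orange")
barplot(y, names.arg = x, col = bar_colors)
```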


Density / Bar Texture

To change the bar texture, use the density parameter:

x <- c("A", "B", "C", "D")

y <- c(2, 4, 6, 8)

barplot(y, names.arg = x, density = 10)
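density gives the number of shading lines per inch, and the related angle argument gives their slope in degrees. Both accept vectors; a sketch that varies them per bar:

```r
x <- c("A", "B", "C", "D")
y <- c(2, 4, 6, 8)
# Sparser to denser shading, with the hatch angle changing per bar
barplot(y, names.arg = x, density = c(5, 10, 20, 30), angle = c(0, 45, 90, 135))
```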

Bar Width

Use the width parameter to change the width of the bars:

x <- c("A", "B", "C", "D")

y <- c(2, 4, 6, 8)

barplot(y, names.arg = x, width = c(1,2,3,4))
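barplot() invisibly returns the x-coordinates of the bar midpoints, which is handy once the widths differ. A sketch that uses them to print each value above its bar:

```r
x <- c("A", "B", "C", "D")
y <- c(2, 4, 6, 8)
# Keep the midpoints returned by barplot(); ylim leaves headroom for labels
mids <- barplot(y, names.arg = x, width = c(1, 2, 3, 4), ylim = c(0, 10))
# Write each value just above the top of its bar
text(mids, y + 0.5, labels = y)
```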

Horizontal Bars

If you want the bars to be displayed horizontally instead of vertically,
use horiz = TRUE:

x <- c("A", "B", "C", "D")

y <- c(2, 4, 6, 8)

barplot(y, names.arg = x, horiz = TRUE)
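With horizontal bars the category names sit on the vertical axis. Setting the las graphical parameter to 1 keeps every axis label horizontal, which is easier to read; a sketch:

```r
x <- c("A", "B", "C", "D")
y <- c(2, 4, 6, 8)
# las = 1 draws all axis labels horizontally
barplot(y, names.arg = x, horiz = TRUE, las = 1)
```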


(Histograms and boxplots) Try the commands below, where DATA and Volume stand
for numeric vectors:

hist(DATA)
boxplot(Volume)
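For a concrete run, the sketch below assumes the built-in trees data set (base R's datasets package), whose Volume column is numeric; choosing this data set is an assumption, not part of the original exercise.

```r
# trees ships with base R; Volume is timber volume in cubic feet
volume <- trees$Volume
# Histogram: how many values fall in each range
hist(volume, main = "Histogram of tree volume", xlab = "Volume")
# Boxplot: median, quartiles and any outliers
boxplot(volume, main = "Boxplot of tree volume", ylab = "Volume")
```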

9. Implementation of Correlation.
What is Correlation?

Correlation is a statistical measure of the relationship between two variables,
that is, of how the two variables are linked with each other. It describes how a
change in one variable is associated with a change in another variable.

If the two variables increase or decrease together, they have a positive
correlation; if one variable increases while the other decreases, they have a
negative correlation. If a change in one variable shows no linear association
with the other, their correlation is zero.

Correlation measures the degree of the linear relationship between two variables.
It is represented by ρ and calculated as:

$$\rho(x, y) = \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y}$$

where

$\operatorname{cov}(x, y)$ = covariance of x and y

$\sigma_x$ = standard deviation of x

$\sigma_y$ = standard deviation of y

$\rho(x, y)$ = correlation between x and y

The value of ρ(x, y) varies between -1 and +1.

A positive value lies between 0 and 1, where ρ(x, y) = 1 indicates a perfect
positive linear correlation between the variables.

A negative value lies between -1 and 0, where ρ(x, y) = -1 indicates a perfect
negative linear correlation between the variables.

ρ(x, y) = 0 indicates no linear correlation.
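The formula can be checked directly against R's cor(). In the sketch below, cov() and sd() both use the n - 1 denominator, so it cancels and the hand computation matches cor(); the data are the height and weight values used in the example that follows.

```r
height <- c(168, 169, 170, 172, 174)
weight <- c(65, 70, 75, 78, 80)
# rho = cov(x, y) / (sd(x) * sd(y))
rho_manual <- cov(height, weight) / (sd(height) * sd(weight))
print(rho_manual)                                  # about 0.938
print(all.equal(rho_manual, cor(height, weight)))  # TRUE
```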

height<-c(168,169,170,172,174)

weight<-c(65,70,75,78,80)

plot(height,weight,main="human",col="green", type="l")

cor(height,weight)

[1] 0.9382329

Tes<- cor.test(height,weight,method = "pearson")

Tes

Pearson's product-moment correlation

data: height and weight
t = 4.6967, df = 3, p-value = 0.01826
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.3249548 0.9960214
sample estimates:
      cor
0.9382329
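The t statistic reported by cor.test() follows from r and the sample size n as $t = r\sqrt{n-2}/\sqrt{1-r^2}$, with n - 2 degrees of freedom. A sketch that reproduces the statistic and the two-sided p-value for the height and weight data:

```r
height <- c(168, 169, 170, 172, 174)
weight <- c(65, 70, 75, 78, 80)
r <- cor(height, weight)
n <- length(height)
# Test statistic for H0: true correlation is 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 2)
print(c(t = t_stat, p = p_value))  # t about 4.6967, p about 0.01826
```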

km <- c(0, 20, 40, 60, 80, 100)

oilquantity <- c(20, 19, 18, 17, 16, 15)

plot(km, oilquantity, main = "Oil quantity vs distance", col = "green", type = "l")

cor(km, oilquantity)

[1] -1

Here the oil quantity falls by exactly 1 for every 20 km, so the relationship is
perfectly linear and negative and r = -1. Because 1 - r^2 = 0, the t statistic
below is infinite.

Tes <- cor.test(km, oilquantity, method = "pearson")

Tes

Pearson's product-moment correlation

data: km and oilquantity
t = -Inf, df = 4, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -1 -1
sample estimates:
cor
 -1

10. Implementation of ANOVA.
ANOVA, also known as analysis of variance, is used to investigate the relationship
between a categorical variable and a continuous variable in R. It is a type of
hypothesis test that compares the variation between groups with the variation
within groups to decide whether the population means differ. An ANOVA test
involves setting up:
• Null Hypothesis: All population means are equal.
• Alternative Hypothesis: At least one population mean differs from the others.
ANOVA tests are of two types:
• One-way ANOVA: It takes one categorical grouping variable into consideration.
• Two-way ANOVA: It takes two categorical grouping variables into consideration.
Performing One Way ANOVA test
The one-way ANOVA test is performed on the mtcars dataset, which comes
preinstalled with base R (datasets package), between the disp attribute, a
continuous attribute, and the gear attribute, a categorical attribute.
# Installing the package (optional: mtcars does not need dplyr)
install.packages("dplyr")
# Loading the package
library(dplyr)
# Variance in mean within group and between group
boxplot(mtcars$disp ~ factor(mtcars$gear), xlab = "gear", ylab = "disp")
# Step 1: Set up the null and alternative hypotheses
# H0: mu3 = mu4 = mu5 (there is no difference in average
# displacement between cars with 3, 4 and 5 gears)
# H1: not all means are equal
# Step 2: Calculate the test statistic using the aov() function
mtcars_aov <- aov(mtcars$disp ~ factor(mtcars$gear))
summary(mtcars_aov)
# Step 3: Choose the significance level, alpha = 0.05
# (the matching F critical value is qf(0.95, df1, df2))
# Step 4: Compare the test statistic with the F critical value
# and conclude: if p < alpha, reject the null hypothesis
The box plot shows the distribution of displacement for each gear level (the
thick line in each box is the median). Here the categorical variable is gear,
converted with the factor() function, and the continuous variable is disp.
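Steps 3 and 4 can be carried out in code. The sketch below pulls the F statistic, its degrees of freedom and the p-value out of summary() (its first element is a table with columns Df, F value and Pr(>F)), computes the critical value with qf(), and applies the decision rule; the variable names are illustrative.

```r
# One-way ANOVA of displacement across gear levels (mtcars ships with base R)
fit <- aov(disp ~ factor(gear), data = mtcars)
tab <- summary(fit)[[1]]
# Row 1 is the gear factor, row 2 the residuals
f_stat <- tab[["F value"]][1]
p_val  <- tab[["Pr(>F)"]][1]
alpha  <- 0.05
# F critical value for alpha with (between, within) degrees of freedom
f_crit <- qf(1 - alpha, df1 = tab[["Df"]][1], df2 = tab[["Df"]][2])
decision <- if (p_val < alpha) "Reject H0" else "Fail to reject H0"
print(c(F = f_stat, F_crit = f_crit, p = p_val))
print(decision)
```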
Performing Two Way ANOVA test
The two-way ANOVA test is performed on the mtcars dataset between the disp
attribute, a continuous attribute, and two categorical attributes, gear and am.
# Installing the package (optional: mtcars ships with base R)
install.packages("dplyr")
# Loading the package
library(dplyr)
# Variance in mean within group and between group
boxplot(mtcars$disp ~ mtcars$gear, subset = (mtcars$am == 0),
        xlab = "gear", ylab = "disp", main = "Automatic")
boxplot(mtcars$disp ~ mtcars$gear, subset = (mtcars$am == 1),
        xlab = "gear", ylab = "disp", main = "Manual")
# Step 1: Set up the null and alternative hypotheses
# H0: mu3 = mu4 = mu5 (there is no difference in average
# displacement for different gear)
# H1: not all means are equal
# Step 2: Calculate the test statistics using the aov() function
mtcars_aov2 <- aov(mtcars$disp ~ factor(mtcars$gear) * factor(mtcars$am))
summary(mtcars_aov2)
# Step 3: Choose the significance level, alpha = 0.05
# Step 4: Compare each test statistic with its F critical value
# and conclude: if p < alpha, reject the null hypothesis
The box plots show the distribution of displacement for each gear level,
separately for automatic and manual cars. Here the categorical variables are gear
and am, converted with the factor() function, and the continuous variable is disp.
The summary shows that the gear attribute is highly significant for displacement
(marked by three stars), while the am attribute is not. The p-value for gear is
below 0.05, so gear has a significant effect on displacement. The p-value for am
is above 0.05, so there is no significant evidence that am affects displacement
once gear is taken into account.

11. Implementation of LINEAR REGRESSION
In linear regression, two variables, a predictor and a response, are related
through an equation in which the exponent (power) of both variables is 1.
Mathematically, a linear relationship represents a straight line when plotted as
a graph. A non-linear relationship, where the exponent of any variable is not
equal to 1, creates a curve.
The general mathematical equation for a linear regression is: y = ax + b
Following is the description of the parameters used:
• y is the response variable.
• x is the predictor variable.
• a and b are constants which are called the coefficients.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y ~ x)
print(relation)

When we execute the above code, it produces the following result:
Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
   -38.4551       0.6746

So the fitted line is y = 0.6746x - 38.4551, that is, a = 0.6746 and b = -38.4551.
predict() Function
Syntax
The basic syntax for predict() in linear regression is: predict(object, newdata)
Following is the description of the parameters used:
• object is the model which is already created using the lm() function.
• newdata is a data frame containing the new values for the predictor variable.
Predict the weight of new persons
# The predictor vector.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
# The response vector.
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y ~ x)
# Find the weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation, a)
print(result)
When we execute the above code, it produces the following result:
       1
76.22869
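The coefficients and the prediction can be reproduced by hand with the least-squares formulas a = cov(x, y) / var(x) and b = mean(y) - a × mean(x). A sketch that checks both against lm() and the predicted value:

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Least-squares slope (a) and intercept (b) for y = ax + b
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)
relation <- lm(y ~ x)
# lm() stores the intercept first, then the slope
print(all.equal(unname(coef(relation)), c(intercept, slope)))  # TRUE
# Predicted weight for height 170 from the hand-computed line
print(slope * 170 + intercept)  # about 76.23
```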
12. Implementation of LOGISTIC REGRESSION
Logistic regression is a regression model in which the response variable
(dependent variable) has categorical values such as True/False or 0/1. It
measures the probability of a binary response, based on a mathematical equation
relating it to the predictor variables.
The general mathematical equation for logistic regression is:

$$y = \frac{1}{1 + e^{-(a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots)}}$$

Following is the description of the parameters used:
• y is the response variable.
• x1, x2, x3, ... are the predictor variables.
• a and b1, b2, b3, ... are the coefficients, which are numeric constants.
The function used to create the regression model is the glm() function.
Syntax
The basic syntax for the glm() function in logistic regression is:
glm(formula, data, family)
Following is the description of the parameters used:
• formula is the symbolic description of the relationship between the variables.
• data is the data set giving the values of these variables.
• family is an R object that specifies the details of the model. Its value is
binomial for logistic regression.
Example
The built-in data set "mtcars" describes different models of a car with their
various engine specifications. In the "mtcars" data set, the transmission mode
(automatic or manual) is described by the column am, which is a binary value
(0 or 1). We can create a logistic regression model between the column "am"
and 3 other columns: hp, wt and cyl.
# Select some columns from mtcars.
input <- mtcars[, c("am", "cyl", "hp", "wt")]
print(head(input))
When we execute the above code, it produces the following result:
                  am cyl  hp    wt
Mazda RX4          1   6 110 2.620
Mazda RX4 Wag      1   6 110 2.875
Datsun 710         1   4  93 2.320
Hornet 4 Drive     0   6 110 3.215
Hornet Sportabout  0   8 175 3.440
Valiant            0   6 105 3.460
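The model itself is fitted with glm() and family = binomial, following the syntax above. A minimal sketch; the object names are illustrative:

```r
# Columns used in the model
input <- mtcars[, c("am", "cyl", "hp", "wt")]
# Logistic regression of transmission type on cyl, hp and wt
am.data <- glm(formula = am ~ cyl + hp + wt, data = input, family = binomial)
print(summary(am.data))
# Fitted probability that each car has a manual transmission (am = 1)
prob <- predict(am.data, type = "response")
print(head(round(prob, 3)))
```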

13. Implementation of Random forest
In the random forest approach, a large number of decision trees are created. Each
tree is grown on a bootstrap sample of the observations, and at each split only a
random subset of the predictors is tried. For classification, a new observation is
fed into all the trees, and the class chosen by the majority of trees (the majority
vote) is the final output.
The error is estimated on the cases that were left out of each tree's bootstrap
sample while building it. This is called the OOB (out-of-bag) error estimate,
which is reported as a percentage.
The R package "randomForest" is used to create random forests. PROGRAM:
We will use the randomForest() function to create the forest and view its
output.
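A minimal sketch of the program, reconstructed from the Call line in the output below. It assumes the party package (which provides the readingSkills data set) and the randomForest package are installed; because the trees are random, the error rate and importance values may differ slightly from the printed output.

```r
# readingSkills (nativeSpeaker, age, shoeSize, score) comes from the party package
library(party)
library(randomForest)
# Fix the random seed so repeated runs give the same forest
set.seed(42)
output.forest <- randomForest(nativeSpeaker ~ age + shoeSize + score,
                              data = readingSkills)
# Forest summary: number of trees, OOB error rate, confusion matrix
print(output.forest)
# Mean decrease in Gini impurity for each predictor
print(importance(output.forest, type = 2))
```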
When we execute the program, it produces the following result:
Call:
 randomForest(formula = nativeSpeaker ~ age + shoeSize + score, data = readingSkills)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 1

        OOB estimate of error rate: 1%
Confusion matrix:
    no yes class.error
no  99   1        0.01
yes  1  99        0.01

         MeanDecreaseGini
age              13.95406
shoeSize         18.91006
score            56.73051

Viva Voce questions
1. What is R?
R is an interpreted computer programming language created by Ross Ihaka and
Robert Gentleman at the University of Auckland, New Zealand.

R is a language and environment for statistical computing and graphics. R is
available as Free Software, in source code form, under the terms of the Free
Software Foundation's GNU General Public License.

2. List some of the functions that R provides.

The functions that R provides include:

• Mean
• Median
• Distribution
• Covariance
• Regression
• Non-linear

3. Explain how you can start the R Commander GUI.

Typing the command library("Rcmdr") into the R console loads the Rcmdr package
and starts the R Commander GUI.

4. Differentiate between vector, List, Matrix, and Data frame.

A vector is a series of data elements of the same basic type. The members of a
vector are known as its components.

A list is an R object that can contain elements of different types, such as
numbers, strings, vectors, or another list.

A matrix is a two-dimensional data structure formed by binding vectors of the
same length. All elements of a matrix have the same type.

A data frame is a generalization of a matrix that combines features of lists and
matrices: it is a list of equal-length columns, and different columns can hold
different data types.
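A quick way to see the four structures side by side is to build a small example of each and test it with the is.*() functions; a sketch with made-up values:

```r
v <- c(1, 2, 3)                                # vector: one type only
l <- list(1, "two", c(3, 4))                   # list: mixed types allowed
m <- matrix(1:6, nrow = 2)                     # matrix: 2-D, one type
d <- data.frame(id = 1:2, name = c("a", "b"))  # data frame: typed columns
print(c(vector = is.vector(v), list = is.list(l),
        matrix = is.matrix(m), data.frame = is.data.frame(d)))
# A data frame is stored as a list of equal-length columns
print(is.list(d))  # TRUE
```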

5. Give any five features of R.

1. It is a simple and effective programming language.

2. It is data analysis software.

3. It provides effective data handling and storage facilities.

4. It provides highly extensible graphical techniques.

5. It is an interpreted language.

6. What are the data structures in R that are used to perform statistical
analyses and create graphs?

R has data structures like

• Vectors
• Matrices
• Arrays
• Data frames

7. Explain the general format of matrices in R.

The general format is

mymatrix <- matrix(vector, nrow = r, ncol = c, byrow = FALSE,
                   dimnames = list(char_vector_rownames, char_vector_colnames))
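A concrete instance of this template, with values chosen only for illustration; with byrow = FALSE the vector fills the matrix column by column:

```r
mymatrix <- matrix(1:6, nrow = 2, ncol = 3, byrow = FALSE,
                   dimnames = list(c("r1", "r2"), c("A", "B", "C")))
print(mymatrix)
#    A B C
# r1 1 3 5
# r2 2 4 6
```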
8. What function is used to combine datasets in R?

The rbind() function can be used to join two data frames (datasets) by rows. The
two data frames must have the same variables, but they do not have to be in the
same order.
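A sketch with made-up data showing that rbind() matches data-frame columns by name, so the second data frame can list its columns in a different order:

```r
df1 <- data.frame(id = 1:2, name = c("a", "b"))
df2 <- data.frame(name = c("c", "d"), id = 3:4)  # same columns, other order
combined <- rbind(df1, df2)
print(combined)
```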

9. What is the use of the subset() and sample() functions in R?

In R, the subset() function selects the variables and observations that meet given
conditions, while the sample() function draws a random sample of size n from a
dataset.
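A sketch of both functions on the built-in mtcars data; the mpg threshold and the sample size are arbitrary choices for illustration:

```r
# Observations with mpg above 25, keeping only the mpg and hp variables
efficient <- subset(mtcars, mpg > 25, select = c(mpg, hp))
print(efficient)
# Random sample of 5 rows (without replacement by default)
set.seed(1)
random_rows <- mtcars[sample(nrow(mtcars), 5), ]
print(random_rows)
```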

10. Explain how you can create a table in R without an external file.

Use the code

myTable <- data.frame()
myTable <- edit(myTable)

edit() opens R's spreadsheet-style data editor; assigning its result back to
myTable keeps the values typed in.

11. Give the commands to create a histogram and to remove a vector from
the R workspace.

The hist() function creates a histogram, and the rm() function removes a vector
(or any other object) from the R workspace.
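A short sketch of both commands; the vector name and its values are made up:

```r
scores <- c(55, 62, 68, 70, 71, 75, 78, 82, 90)
# Draw a histogram of the vector
hist(scores, main = "Histogram of scores")
# Remove the vector from the workspace
rm(scores)
print(exists("scores"))  # FALSE
```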

12. Differentiate between "%%" and "%/%".

The "%%" operator gives the remainder of dividing the first vector by the second,
and the "%/%" operator gives the quotient (integer division) of dividing the first
vector by the second.
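A few worked values on a made-up vector, showing both operators and the identity that connects them:

```r
v <- c(7, 8, 9)
print(v %% 4)   # 3 0 1  (remainders)
print(v %/% 4)  # 1 2 2  (quotients)
# The two are linked: x == (x %/% y) * y + x %% y
print(all(v == (v %/% 4) * 4 + v %% 4))  # TRUE
# For negative numbers, %% takes the sign of the divisor
print((-7) %% 3)   # 2
print((-7) %/% 3)  # -3
```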

13. How do you list the preloaded datasets in R?

To view a list of preloaded datasets in R, simply type data() into the console and
hit enter.

14. What are the disadvantages of R?

Just as you should know what R does well, you should understand its failings.

Memory and performance. In comparison to Python, R is often said to be the
weaker language in terms of memory use and performance.

Open source. Being open source has its disadvantages as well as its
advantages. Although the R Core Team maintains the language itself, most
packages are contributed by the community, so there is no single source for
support or quality control, and some packages developed for R are not of the
highest quality.

Security. R was not built with security in mind, so it must rely on external
resources to fill these gaps.

15. Write a custom function in R.

Sometimes you'll be asked to create a custom function on the fly. The general
form of a custom function is:

myFunction <- function(arg1, arg2, ...) {
  statements
  return(object)
}
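A filled-in instance of the template; the function name and its default argument are hypothetical:

```r
# Area of a rectangle; width defaults to 1 when it is not supplied
rect_area <- function(height, width = 1) {
  area <- height * width
  return(area)
}
print(rect_area(4, 5))  # 20
print(rect_area(3))     # 3
```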

16. How do you install a package in R?

There are many ways to install a package in R. Some even include using the
GUI. We’re coders, so we’re not going to give those attention.

Type the following into your console and hit enter:

install.packages("package_name")

Followed by:

library(package_name)

It’s that simple. The first command installs the package and the second loads
the package into the session.

17. What is a factor variable, and why would you use one?
A factor variable is R's representation of a categorical variable; its values can be
numbers or character strings, stored as a fixed set of levels. The main reason to
use a factor variable is that modelling functions such as lm() and glm() treat it
as categorical and create the indicator (dummy) variables for it automatically.
Factors can also be more memory efficient, because each distinct level is stored
once and the values are kept as integer codes.

Simply use the factor() function to create a factor variable.
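A sketch with made-up data showing the levels and the integer codes behind a factor:

```r
# Explicit levels fix the order of the categories
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))
print(levels(sizes))      # "small" "medium" "large"
print(as.integer(sizes))  # 1 3 2 1  (the stored integer codes)
print(table(sizes))       # counts per level
```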

18. How do you concatenate strings in R?

Concatenating strings in R is less than intuitive. You don't use a . operator, nor
a + operator, and forget about the & operator. In fact, you don't use an
operator at all. Concatenating strings in R requires the use of
the paste() function, which separates its arguments with a space by default.
Here's an example:

hello <- "Hello,"

world <- "World."
paste(hello, world)
[1] "Hello, World."
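paste() also takes a sep argument for a different separator and a collapse argument that joins the elements of one vector into a single string; paste0() is shorthand for sep = "". A sketch:

```r
print(paste("2024", "01", "15", sep = "-"))      # "2024-01-15"
print(paste0("Hello,", "World."))                # "Hello,World."
print(paste(c("x", "y", "z"), collapse = ", "))  # "x, y, z"
```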

19. How do you read a CSV file in R?
We’ve covered this already with the import process. Simply use
the read.csv() function.

yourRDataHere <- read.csv("Data.csv", header = TRUE)
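A self-contained round trip using only base R: write a small data frame to a temporary CSV file with write.csv(), then read it back with read.csv().

```r
# Temporary file path, so no real file is overwritten
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), tmp, row.names = FALSE)
# header = TRUE takes the column names from the first line
yourRDataHere <- read.csv(tmp, header = TRUE)
str(yourRDataHere)
```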

20. What are 3 sorting algorithms available in R?

R's sort() function orders a vector or factor, and its method argument selects one
of the three algorithms listed and described below.

Radix: Usually the most performant algorithm, this is a non-comparative
sorting algorithm that avoids overhead. It's stable, and it's the default
algorithm for integer vectors and factors.

Quick Sort: This method "uses Singleton (1969)'s implementation of Hoare's
Quicksort method and is only available when x is numeric (double or integer)
and partial is NULL," according to R Documentation. It's not considered a
stable sort.

Shell: This method "uses Shellsort (an $O(n^{4/3})$ variant from Sedgewick
(1986))," according to R Documentation.
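All three methods can be requested explicitly through sort()'s method argument; on an ordinary numeric vector they return the same values and differ only in speed and stability. A sketch with made-up values:

```r
v <- c(3.2, 1.5, 4.8, 1.5, 2.0)
by_radix <- sort(v, method = "radix")
by_quick <- sort(v, method = "quick")
by_shell <- sort(v, method = "shell")
print(by_radix)  # 1.5 1.5 2.0 3.2 4.8
# Same sorted values from all three algorithms
print(identical(by_radix, by_quick) && identical(by_quick, by_shell))  # TRUE
```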

21. Why is R useful for data science?

R turns graphics-intensive jobs that would otherwise take hours into minutes and
keystrokes. In practice, you rarely encounter R outside data science or an
adjacent field. It's great for linear modeling, nonlinear modeling, time-series
analysis, plotting, clustering, and much more.

Simply put, R is designed for data manipulation and visualization, so it's
natural that it would be used for data science.

22. What is t.test() in R?

The t.test() function performs a t-test, for example to determine whether the
means of two groups are equal or not.
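A sketch on the built-in mtcars data, comparing mean mpg between automatic (am = 0) and manual (am = 1) cars; by default t.test() uses the Welch version, which does not assume equal variances.

```r
result <- t.test(mpg ~ am, data = mtcars)
print(result)
# Small p-value: evidence that mean mpg differs between the two groups
print(result$p.value < 0.05)
```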

23. Differentiate between lapply and sapply.

lapply() always returns its output as a list, whereas sapply() tries to simplify
the output to a vector or matrix, falling back to a list when it cannot.
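A sketch applying sum() to the same made-up list with both functions:

```r
nums <- list(a = 1:3, b = 4:6)
print(lapply(nums, sum))  # a list with elements a = 6 and b = 15
print(sapply(nums, sum))  # a named vector: a = 6, b = 15
```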

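24. Explain the anova() function.

The anova() function computes analysis-of-variance tables for fitted models;
given two or more nested models, it tests whether the larger model fits
significantly better.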
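A sketch comparing two nested linear models on the built-in mtcars data, where the second model adds hp to the first:

```r
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)
# F-test of whether adding hp significantly improves the fit
comparison <- anova(m1, m2)
print(comparison)
```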

25. Give names of visualization packages.

The following are some visualization packages in R:

1. Plotly

2. ggplot2

3. tidyquant

4. geofacet

5. googleVis

6. Shiny
