SMB-R Programming Lab

The document provides an index of 14 experiments related to R programming. It then summarizes Experiment 1 on implementation of vectors and lists in R. It defines vectors as collections of homogeneous elements that can be of different data types. It describes how to create, access, and manipulate vector elements using functions like c(), [ ], length(), sort(), and NULL. It also defines lists as objects that can contain elements of different types, and provides examples of creating and accessing list elements.
NRI INSTITUTE OF TECHNOLOGY

R – Programming Lab
INDEX

Experiment No.   Name of the Experiment
1    Implementation of Vectors and Lists
2    Implementation of Data Frames
3    Implementation of Matrix Addition, Subtraction, Multiplication and Division
4    Implementation of Quick Sort
5    Implementation of Binary Search Tree
6    Implementation of Set Operations
7    Implementation of Reading and Writing Files
8    Implementation of Graph Operations
9    Implementation of Correlation
10   Implementation of ANOVA
11   Implementation of Linear Regression
12   Implementation of Logistic Regression
13   Implementation of Random Forest
14   Viva Voce Questions

Sk. Mahaboob Basha, Associate Professor of IT


1. Implementation of Vectors and Lists.


Vector: a collection of homogeneous elements, that is, elements of a single data type.
A vector can hold logical, integer, double, character, complex, or raw data.
The elements contained in a vector are known as the components of the
vector. We can check the type of a vector with the typeof() function.

# Vector of strings
fruits <- c("banana", "apple", "orange")

# Vector of numerical values
numbers <- c(1, 2, 3)

# Vector of logical values
log_values <- c(TRUE, FALSE, TRUE, FALSE)

Vectors are commonly created using the c() function, which is the easiest way to
create vectors in R. All elements of a vector must be of the same type. If
elements of different types are passed, they are converted (coerced) to a common
type, moving from lower to higher types in the order logical, integer, double,
character.

x <- c(5, 3.2, TRUE, "hello")   # all elements are converted to character
x
typeof(x)
[1] "character"

Vectors of consecutive numeric values can be generated with the colon (:)
operator as follows.

Syntax:
c(start:end)
or
x <- start:end

Example:
print("R Creating Vector using Colon")
x <- 1:5
x
y <- 3:-3
y

Vector using seq() function

The seq() function enables us to create vectors of sequential values with a
specified step size.
seq(startValue, endValue, by=stepSize)
Example:
a <- seq(1,5,by=1)
a
b <- seq(1,5,by=2)
b
d <- seq(1,5,by=3)
d
or, to request a given number of equally spaced values:
seq_vec<-seq(1,4,length.out=6)
seq_vec
class(seq_vec)

Accessing Vector Elements

Vector elements can be accessed by passing index value(s) in brackets [ ]. An
index value can be logical, integer or character.
Integer index:
An integer index denotes the position of an element. Integer index values
start at 1.
# Accessing vector elements using integer indexing.
t <- c("January","February","March","April","May","June")
u <- t[c(2,4)]
print(u)

# Accessing vector elements using logical indexing.
v <- t[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE)]
print(v)

# Accessing vector elements using character indexing.
Naming Vectors

t <- c(l1="January",l2="February",l3="March",l4="April",l5="May",l6="June")
q <- t[c("l1","l5")]

We can also name the elements of an existing vector with the names() function:

x<-10:13

y<-c("l1","l2","l3","l4")

names(x)<-y

Now we can access elements by name:

x["l1"]

Combining vectors
p<-c(1,2,4,5,7,8)
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
r<-c(p,q)

Vector Manipulation
Vector arithmetic
Two vectors of same length can be added, subtracted, multiplied or divided
giving the result as a vector output.
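
As a small illustration of the element-wise arithmetic described above, the sketch below uses two made-up numeric vectors (v1 and v2 are illustrative names and values, not taken from the lab sheet). Each operator is applied position by position.

```r
# Two numeric vectors of the same length (illustrative values)
v1 <- c(3, 8, 4, 5)
v2 <- c(1, 2, 2, 5)

add_result <- v1 + v2   # element-wise addition
sub_result <- v1 - v2   # element-wise subtraction
mul_result <- v1 * v2   # element-wise multiplication
div_result <- v1 / v2   # element-wise division

print(add_result)  # 4 10 6 10
print(sub_result)  # 2 6 2 0
print(mul_result)  # 3 16 8 25
print(div_result)  # 3 4 2 1
```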

Vector Length

To find out how many items a vector has, use the length() function:

length(vector)

Sort a Vector

To sort items in a vector alphabetically or numerically, use the sort() function:

Example

sort(vector)

Sort in Reverse Order

sort(v, decreasing=TRUE)

To delete vectors

Assigning NULL replaces the contents of the vector with the empty object NULL:

v<-NULL

Applications of vectors

1. In machine learning, vectors are used for principal component analysis.
They are extended to eigenvalues and eigenvectors and then used for
performing decomposition in vector spaces.

2. The inputs which are provided to a deep learning model are in the form
of vectors. These vectors consist of standardized data which is supplied
to the input layer of the neural network.

3. Vectors are used in the development of support vector machine algorithms.

4. Vector operations are utilized in neural networks for various operations
like image recognition and text processing.

LISTS

A list is an object that can contain elements of different types, such as
strings, numbers, vectors, matrices and even other lists. In other words, a
list is a generic vector whose components may have mixed data types. We can
imagine an R list as a bag holding many different items: when we need an item,
we open the bag and take it out.

A list is created using the list() function in R.

Let's create a list containing strings, a numeric vector, a logical value and a number.
For example:
list_data <- list("Red", "White", c(1,2,3), TRUE, 22.4)
print(list_data)

vec <- c(3,4,5,6)
char_vec<-c("shubham","nishka","gunjan","sumit")
logic_vec<-c(TRUE,FALSE,FALSE,TRUE)
out_list<-list(vec,char_vec,logic_vec)
out_list

OUTPUT

[[1]]
[1] 3 4 5 6
[[2]]
[1] "shubham" "nishka" "gunjan" "sumit"
[[3]]
[1] TRUE FALSE FALSE TRUE

Example 1: Creating list with same data type

list_1<-list(1,2,3)
list_2<-list("Shubham","Arpita","Vaishali")
list_3<-list(c(1,2,3))
list_4<-list(TRUE,FALSE,TRUE)
list_1
list_2
list_3
list_4
OUTPUT

[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3

[[1]]
[1] "Shubham"
[[2]]
[1] "Arpita"
[[3]]
[1] "Vaishali"

[[1]]
[1] 1 2 3

[[1]]
[1] TRUE
[[2]]
[1] FALSE
[[3]]
[1] TRUE
Sometimes it’s necessary to have repeated values, for which we use rep()
> rep(5,3)
[1] 5 5 5
> rep(2:5,each=3)
[1] 2 2 2 3 3 3 4 4 4 5 5 5
> rep(-1:3, length.out=10)
[1] -1 0 1 2 3 -1 0 1 2 3
Naming List Elements
The list elements can be given names, and they can then be accessed using these
names.

# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
                  list("green",12.3))

# Give names to the elements in the list.
names(list_data) <- c("1st_Quarter", "A_Matrix", "A_Inner_list")

# Show the list.
print(list_data)

Output

$`1st_Quarter`
[1] "Jan" "Feb" "Mar"

$A_Matrix
     [,1] [,2] [,3]
[1,]    3    5   -2
[2,]    9    1    8

$A_Inner_list
$A_Inner_list[[1]]
[1] "green"

$A_Inner_list[[2]]
[1] 12.3

Accessing List Elements

Elements of a list can be accessed by their index in the list. In a named list
they can also be accessed by name.

We continue to use the list from the example above.

# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
                  list("green",12.3))

# Give names to the elements in the list.
names(list_data) <- c("1st_Quarter", "A_Matrix", "A_Inner_list")

# Access the first element of the list.
print(list_data[1])

# Access the third element. As it is also a list, all its elements will be printed.
print(list_data[3])

# Access the list element using the name of the element.
print(list_data$A_Matrix)
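
The access forms above return different kinds of objects, which the following minimal sketch tries to make visible. It assumes a small made-up list (the names months and count are illustrative only): single brackets keep the result wrapped in a list, while double brackets and $ return the element itself.

```r
# A small named list (illustrative names and values)
list_data <- list(months = c("Jan", "Feb", "Mar"), count = 3)

sub_list <- list_data[1]     # single brackets: a list of length 1
first    <- list_data[[1]]   # double brackets: the character vector itself

print(class(sub_list))             # "list"
print(class(first))                # "character"
print(first[2])                    # "Feb"
print(list_data$count)             # 3
print(list_data[["months"]][3])    # "Mar"
```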

Manipulating List Elements

We can add, delete and update list elements as shown below. The examples add
and delete an element at the end of the list, but any element can be updated or
removed by its index or name.

# Add element at the end of the list.
list_data[4] <- "New element"
print(list_data[4])

# Remove the last element.
list_data[4] <- NULL

# The 4th element no longer exists, so this prints a NULL entry.
print(list_data[4])

# Update the 3rd element.
list_data[3] <- "updated element"
print(list_data[3])
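
Elements can also be inserted or removed in the middle of a list. The sketch below is one way to do it with base R's append() and a NULL assignment; the list items and its values are made up for illustration.

```r
# A short list to experiment with (illustrative values)
items <- list("a", "b", "c")

# Insert a new element after position 1
items <- append(items, list("new"), after = 1)
print(length(items))   # 4
print(items[[2]])      # "new"

# Delete the element at position 2 by assigning NULL
items[2] <- NULL
print(length(items))   # 3
print(items[[2]])      # "b"
```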

2. Implementation of Data Frames.
Data Frames

A data frame stores data in a table format: each column holds the values of one
variable and each row holds one record.

Use the data.frame() function to create a data frame:

Ex. 1
# Author: DataFlair
int_vec <- c(1,2,3)
char_vec <- c("a", "b", "c")
bool_vec <- c(TRUE, TRUE, FALSE)
data_frame <- data.frame(int_vec, char_vec, bool_vec)
Ex. 2
sno<-c(1,2,3)
sname<-c("smb","bbk","BNR")
marks<-c(97,96,95)
df<-data.frame(sno,sname,marks)
df

  sno sname marks
1   1   smb    97
2   2   bbk    96
3   3   BNR    95
summary(df)
      sno         sname               marks
 Min.   :1.0   Length:3           Min.   :95.0
 1st Qu.:1.5   Class :character   1st Qu.:95.5
 Median :2.0   Mode  :character   Median :96.0
 Mean   :2.0                      Mean   :96.0
 3rd Qu.:2.5                      3rd Qu.:96.5
 Max.   :3.0                      Max.   :97.0



# Create employee data
employee_data <- data.frame(
  employee_id = c(1:5),
  employee_name = c("James","Harry","Shinji","Jim","Oliver"),
  sal = c(642.3,535.2,681.0,739.0,925.26),
  join_date = as.Date(c("2013-02-04", "2017-06-21", "2012-11-14",
                        "2018-05-19", "2016-03-25")),
  stringsAsFactors = FALSE)

print(employee_data)

  employee_id employee_name    sal  join_date
1           1         James 642.30 2013-02-04
2           2         Harry 535.20 2017-06-21
3           3        Shinji 681.00 2012-11-14
4           4           Jim 739.00 2018-05-19
5           5        Oliver 925.26 2016-03-25

Get the Structure of the R Data Frame

The structure of the data frame can be seen by using the str() function.
> str(employee_data)
'data.frame': 5 obs. of 4 variables:
 $ employee_id  : int 1 2 3 4 5
 $ employee_name: chr "James" "Harry" "Shinji" "Jim" ...
 $ sal          : num 642 535 681 739 925
 $ join_date    : Date, format: "2013-02-04" ...


Extract data from Data Frame
Specific columns can be extracted from a data frame by using the column names.

emp_data <- data.frame(employee_data$employee_id, employee_data$employee_name)
emp_data

  employee_data.employee_id employee_data.employee_name
1                         1                       James
2                         2                       Harry
3                         3                      Shinji
4                         4                         Jim
5                         5                      Oliver

- Extract first two rows

a<-employee_data[1:2,]
a

  employee_id employee_name   sal  join_date
1           1         James 642.3 2013-02-04
2           2         Harry 535.2 2017-06-21

- Extract first two columns

a<-employee_data[1:2]
a

  employee_id employee_name
1           1         James
2           2         Harry
3           3        Shinji
4           4           Jim
5           5        Oliver

- Extract the 1st and 2nd rows with the 3rd and 4th columns.

> result <- employee_data[c(1,2),c(3,4)]
> result
    sal  join_date
1 642.3 2013-02-04
2 535.2 2017-06-21
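
Rows can also be selected by a condition rather than by position. The following is a minimal sketch under the assumption of a small data frame shaped like employee_data (the name emp and the threshold 650 are illustrative): a logical vector in the row position keeps only the rows where the condition is TRUE.

```r
# A small data frame shaped like employee_data (emp is an illustrative name)
emp <- data.frame(
  employee_id = 1:5,
  employee_name = c("James", "Harry", "Shinji", "Jim", "Oliver"),
  sal = c(642.3, 535.2, 681.0, 739.0, 925.26),
  stringsAsFactors = FALSE
)

# Keep rows with sal above 650, and only the name and salary columns
high_paid <- emp[emp$sal > 650, c("employee_name", "sal")]
print(high_paid)
```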

Expand R Data Frame

A data frame can be expanded by adding columns and rows.

Add Column

employee_data$dept <- c("IT","Finance","Operations","HR","Administration")

Add Row

To add rows, create a second data frame with the same columns as the first and
bind the two with rbind().

- Create the second R data frame
# DataFlair
employee_new_data <- data.frame(
  employee_id = c(6:8),
  employee_name = c("Aman", "Piyush", "Aakash"),
  sal = c(523.0,721.3,622.8),
  join_date = as.Date(c("2015-06-22","2016-04-30","2011-03-17")),
  # dept is needed because employee_data now has a dept column;
  # these department values are illustrative.
  dept = c("IT","Operations","Finance"),
  stringsAsFactors = FALSE
)

- Bind the two data frames.

> employee_out_data <- rbind(employee_data,employee_new_data)
> employee_out_data

3. Implementation of Matrix Addition, Subtraction, Multiplication and Division.
Theory: A matrix is a special type of two-dimensional array.
Matrices are widely used in statistics, and so play an important role in R. To
create a matrix, use the function matrix():

> matrix(1:12, nrow=3, ncol=4)


[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12

This is called column-major order. Of course, we need only give one of the
dimensions:
> matrix(1:12, nrow=3)
unless we want vector recycling to help us:
> matrix(1:3, nrow=3, ncol=4)
[,1] [,2] [,3] [,4]
[1,] 1 1 1 1
[2,] 2 2 2 2
[3,] 3 3 3 3
Sometimes it’s useful to specify the elements by row first
> matrix(1:12, nrow=3, byrow=TRUE)
There are special functions for constructing certain matrices:
> diag(3)
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
> diag(1:3)
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 2 0
[3,] 0 0 3

> 1:5 %o% 1:5
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    2    4    6    8   10
[3,]    3    6    9   12   15
[4,]    4    8   12   16   20
[5,]    5   10   15   20   25
The last operator, %o%, computes an outer product.

A matrix can also be created by using the following functions:

rbind(), cbind()

a<- rbind(c(1:3),c(4:6))
a
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6

a<- cbind(c(1:3),c(4:6))
a
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

a[1,2]
[1] 4

a[1,]
[1] 1 4


Standard functions exist for common mathematical operations on matrices. The
outputs below correspond to the following 3 × 3 matrix A:
> A <- matrix(c(1:8, 10), nrow=3)
> t(A) # transpose
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8   10
> det(A) # determinant
[1] -3
> diag(A) # diagonal
[1] 1 5 10
Array:
If we have a data set consisting of more than two pieces of categorical
information about each subject, then a matrix is not sufficient. The
generalization of matrices to higher dimensions is the array. Arrays are defined
much like matrices, with a call to the array() function.
The syntax of array() in the R programming language is

Array_Name <- array(data, dim = c(row_Size, column_Size, no_of_matrices),
                    dimnames = dimnames)

Here is a 2 × 3 × 3 array:
> arr = array(1:18, dim=c(2,3,3))
> arr
,,1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
,,2
[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12
,,3
[,1] [,2] [,3]
[1,] 13 15 17
[2,] 14 16 18
Each 2-dimensional slice defined by the last co-ordinate of the array is shown
as a 2 × 3 matrix. Note that we no longer specify the number of rows and columns
separately, but use a single vector dim whose length is the number of
dimensions. You can recover this vector with the dim() function.
> dim(arr)
[1] 2 3 3
Note that a 2-dimensional array is identical to a matrix. Arrays can be
subsetted and modified in exactly the same way as a matrix, only using the
appropriate number of co-ordinates:
three_d_arr <- array(1:24,
  dim = c(4, 3, 2),
  dimnames = list(
    c("one", "two", "three", "four"),
    c("ray", "karl", "mimo"),
    c("steve", "mark")
  ))
three_d_arr
, , steve

      ray karl mimo
one     1    5    9
two     2    6   10
three   3    7   11
four    4    8   12

, , mark

      ray karl mimo
one    13   17   21
two    14   18   22
three  15   19   23
four   16   20   24

# creating 2 vectors of dissimilar lengths
vec1 <- c(3, 4, 2)
vec2 <- c(11, 12, 13, 14, 15, 16)
# taking these vectors as input for this array
res1 <- array(c(vec1, vec2), dim=c(3,3,2))
print(res1)
, , 1

     [,1] [,2] [,3]
[1,]    3   11   14
[2,]    4   12   15
[3,]    2   13   16

, , 2

     [,1] [,2] [,3]
[1,]    3   11   14
[2,]    4   12   15
[3,]    2   13   16
Naming Columns and Rows
# Creating 2 vectors having different lengths.
vec1 <- c(2, 4, 6)
vec2 <- c(11, 12, 13, 14, 15, 16)
column.names <- c("COLA","COLB","COLC")
row.names <- c("ROWA","ROWB","ROWC")
matrix.names <- c("MatA", "MatB")

# dimnames lists the row names first, then the column names, then the matrix names.
res1 <- array(c(vec1,vec2), dim=c(3,3,2),
              dimnames=list(row.names, column.names, matrix.names))
print(res1)
, , MatA

     COLA COLB COLC
ROWA    2   11   14
ROWB    4   12   15
ROWC    6   13   16

, , MatB

     COLA COLB COLC
ROWA    2   11   14
ROWB    4   12   15
ROWC    6   13   16

Accessing and Indexing Arrays

print(res1[3,,1])
# this statement prints the 3rd row of the first matrix of the array.
print(res1[2,2,1])
# the above statement prints the element in the 2nd row and 2nd column of
# the 1st matrix.
print(res1[,,2])
# the above statement prints the 2nd matrix entirely


Factors
R has a special data structure to store categorical variables. Making a
variable a factor tells R that it is nominal or ordinal.
- Creating a vector
- Converting the vector created into a factor using function factor()
# Creating a vector
x<-c("female", "male", "male", "female")
print(x)

# Converting the vector x into a factor named gender
gender<-factor(x)
print(gender)
Output:
[1] "female" "male" "male" "female"
[1] female male male female
Levels: female male
Simplest form of the factor function:
factor(vector_name)

Ideal form of the factor function:
factor(vector_name, levels = values, labels = value_labels)

The factor function has three parameters:

1. Vector Name
2. Values (Optional), passed as levels
3. Value labels (Optional), passed as labels
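
The three parameters listed above can be sketched as follows. This is an illustrative example only: the labels "M" and "F" and the size vector are assumptions made for the demonstration, not part of the lab sheet. The levels argument fixes the order of the categories, labels renames them, and ordered = TRUE makes the factor ordinal so that comparisons work.

```r
# Nominal factor with explicit values (levels) and value labels
x <- c("female", "male", "male", "female")
gender <- factor(x, levels = c("male", "female"), labels = c("M", "F"))
print(gender)
print(table(gender))   # counts per category

# Ordinal factor: the order of the levels is meaningful
size <- factor(c("low", "high", "medium"),
               levels = c("low", "medium", "high"),
               ordered = TRUE)
print(size[1] < size[2])   # TRUE, because low < high
```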

Implementation of Matrix Addition and Multiplication.

m1 = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
print("Matrix-1:")
print(m1)
m2 = matrix(c(0, 1, 2, 3, 0, 2), nrow = 2)
print("Matrix-2:")
print(m2)

result = m1 + m2
print("Result of addition")
print(result)

result = m1 - m2
print("Result of subtraction")
print(result)

result = m1 * m2
print("Result of multiplication")
print(result)

result = m1 / m2
print("Result of division:")
print(result)
[1] "Matrix-1:"
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
[1] "Matrix-2:"
[,1] [,2] [,3]
[1,] 0 2 0
[2,] 1 3 2
[1] "Result of addition"
[,1] [,2] [,3]
[1,] 1 5 5
[2,] 3 7 8
[1] "Result of subtraction"
[,1] [,2] [,3]
[1,] 1 1 5
[2,] 1 1 4
[1] "Result of multiplication"
[,1] [,2] [,3]
[1,] 0 6 0
[2,] 2 12 12
[1] "Result of division:"
[,1] [,2] [,3]
[1,] Inf 1.500000 Inf
[2,] 2 1.333333 3
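
Note that the * operator in the program above multiplies matrices element by element. For the linear-algebra matrix product, R provides the %*% operator, which requires the number of columns of the left matrix to equal the number of rows of the right one. A minimal sketch, reusing the same m1 and m2 and transposing m2 so that the dimensions conform (the name product is illustrative):

```r
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)   # 2 x 3
m2 <- matrix(c(0, 1, 2, 3, 0, 2), nrow = 2)   # 2 x 3

elementwise <- m1 * m2        # element-wise product, still 2 x 3
product <- m1 %*% t(m2)       # matrix product: (2 x 3) %*% (3 x 2) gives 2 x 2

print(elementwise)
print(product)
#      [,1] [,2]
# [1,]    6   20
# [2,]    8   26
```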


4. Implementation of Quick Sort.
sort(x, decreasing = FALSE, na.last = NA, …)

sort.int(x, partial = NULL, na.last = NA, decreasing = FALSE,
         method = c("auto", "shell", "quick", "radix"), index.return = FALSE)

Arguments
x: for sort, an R object with a class or a numeric, complex, character or
logical vector. For sort.int, a numeric, complex, character or logical vector,
or a factor.

decreasing: logical. Should the sort be increasing or decreasing? For
the "radix" method, this can be a vector of length equal to the number of
arguments in …. For the other methods, it must be length one. Not available
for partial sorting.

…: arguments to be passed to or from methods or (for the default methods
and objects without a class) to sort.int.

na.last: for controlling the treatment of NAs. If TRUE, missing values in the
data are put last; if FALSE, they are put first; if NA, they are removed.

partial: NULL or a vector of indices for partial sorting.

method: character string specifying the algorithm used. Not available for
partial sorting. Can be abbreviated.

index.return: logical indicating if the ordering index vector should be
returned as well. Supported by method == "radix" for any na.last mode and
data type, and the other methods when na.last = NA (the default) and fully
sorting non-factors.

Ex:
A<-c(51:60,5:50,60:100,1:5)
A
[1] 51 52 53 54 55 56 57 58 59 60 5 6
[13] 7 8 9 10 11 12 13 14 15 16 17 18
[25] 19 20 21 22 23 24 25 26 27 28 29 30
[37] 31 32 33 34 35 36 37 38 39 40 41 42
[49] 43 44 45 46 47 48 49 50 60 61 62 63
[61] 64 65 66 67 68 69 70 71 72 73 74 75
[73] 76 77 78 79 80 81 82 83 84 85 86 87
[85] 88 89 90 91 92 93 94 95 96 97 98 99
[97] 100 1 2 3 4 5
A<-sort(A,decreasing=FALSE,method="quick")

A
[1] 1 2 3 4 5 5 6 7 8 9 10 11
[13] 12 13 14 15 16 17 18 19 20 21 22 23
[25] 24 25 26 27 28 29 30 31 32 33 34 35
[37] 36 37 38 39 40 41 42 43 44 45 46 47
[49] 48 49 50 51 52 53 54 55 56 57 58 59
[61] 60 60 61 62 63 64 65 66 67 68 69 70
[73] 71 72 73 74 75 76 77 78 79 80 81 82
[85] 83 84 85 86 87 88 89 90 91 92 93 94
[97] 95 96 97 98 99 100

Quick Sort
Quicksort is a sorting algorithm based on the divide and conquer approach,
in which an array is divided into subarrays by selecting a pivot
element (an element selected from the array).

1. While dividing the array, the pivot element should be positioned in such a
way that elements less than the pivot are kept on the left side and elements
greater than the pivot are on the right side of the pivot.

2. The left and right subarrays are also divided using the same approach.
This process continues until each subarray contains a single element.

3. At this point, elements are already sorted. Finally, elements are combined
to form a sorted array.

# Simple implementation of Quicksort in R.
# Quick sort algorithm:
# 1. Select a random value from the array.
# 2. Put all values less than the random value in arrayLeft.
# 3. Put all values greater than the random value in arrayRight.
# 4. If arrayLeft or arrayRight has more than 1 value, repeat the
#    above steps on it.
# 5. The sorted result is arrayLeft, random value, arrayRight.

R's sort() already offers a quicksort method (method = "quick"), but in some rare
cases you might want to modify the pivot selection part of the algorithm. Here is a
custom implementation:
quickSort <- function(arr) {
  # Nothing to sort for vectors of length 0 or 1.
  if (length(arr) <= 1) {
    return(arr)
  }
  # Pick a pivot at random, by position.
  p <- arr[sample(length(arr), 1)]
  # Move all the smaller values to the left, bigger values to the right.
  # Values equal to the pivot are kept together so duplicates are not lost.
  left <- arr[arr < p]
  middle <- arr[arr == p]
  right <- arr[arr > p]
  # Sort each side recursively and return the combined values.
  c(quickSort(left), middle, quickSort(right))
}

x <-sample(1:100,10)
x
[1] 22 92 30 12 48 20 88 80 8 34

RES <- quickSort(x)


RES
[1] 8 12 20 22 30 34 48 80 88 92
OR
quicksort = function(x)
{
  # A vector of length 0 or 1 is already sorted.
  if (length(x) <= 1)
    return(x)
  # Use the first element as the pivot.
  pivot = x[1]
  rest = x[-1]
  pivot_less = quicksort(rest[rest < pivot])
  pivot_greater = quicksort(rest[rest >= pivot])
  return(c(pivot_less, pivot, pivot_greater))
}
quicksort(c(10,24,33,21,22,66,11))
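
The partitioning step described in the theory can also be written with explicit swaps inside one vector instead of building separate left and right vectors. The sketch below uses the Lomuto partition scheme with the last element as pivot; this is a swapped-in variant for illustration, not the method used in the lab programs above, and the function name quicksort_lomuto is hypothetical. Comparing its result with sort() is a simple sanity check.

```r
# Quicksort using Lomuto partitioning on the index range lo..hi.
# R copies vectors on modification, so the sorted vector is returned.
quicksort_lomuto <- function(v, lo = 1, hi = length(v)) {
  if (lo < hi) {
    pivot <- v[hi]          # last element of the range is the pivot
    i <- lo - 1             # boundary of the "<= pivot" region
    for (j in lo:(hi - 1)) {
      if (v[j] <= pivot) {
        i <- i + 1
        tmp <- v[i]; v[i] <- v[j]; v[j] <- tmp
      }
    }
    # Put the pivot just after the "<= pivot" region.
    tmp <- v[i + 1]; v[i + 1] <- v[hi]; v[hi] <- tmp
    p <- i + 1
    v <- quicksort_lomuto(v, lo, p - 1)   # sort the left part
    v <- quicksort_lomuto(v, p + 1, hi)   # sort the right part
  }
  v
}

values <- c(10, 24, 33, 21, 22, 66, 11, 21)
sorted_values <- quicksort_lomuto(values)
print(sorted_values)
print(identical(sorted_values, sort(values)))   # TRUE
```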


5. Implementation of Binary Search Tree.
# Binary search for the value t in the sorted vector v.
binarysearch<-function(v,t){
  # i and l are the lower and upper index bounds of the search range.
  i<-1
  l<-length(v)
  while(i<=l){
    # Midpoint index of the current range.
    mid<-(i+l)%/%2
    if(v[mid]==t)
      return(mid)
    else if(v[mid]>t)
      l<-mid-1
    else
      i<-mid+1
  }
  # 0 signals that t was not found.
  return(0)
}
a<-seq(1,10)
a
x<-binarysearch(a,2)
x
if(x==0){
  print("search unsuccessful")
}else{
  print("search successful")
  cat("\n","Element found at",x)
}
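
The experiment title mentions a binary search tree, while the program above performs a binary search on a sorted vector. As a complement, here is a minimal sketch of a binary search tree built from nested lists. All names (bst_insert, bst_search, bst_inorder) and the node layout are assumptions made for illustration: an empty tree is NULL and each node is a list with value, left and right. Duplicate values are ignored on insert.

```r
# Insert a value and return the updated (sub)tree.
bst_insert <- function(node, value) {
  if (is.null(node)) {
    return(list(value = value, left = NULL, right = NULL))
  }
  if (value < node$value) {
    node$left <- bst_insert(node$left, value)
  } else if (value > node$value) {
    node$right <- bst_insert(node$right, value)
  }
  node
}

# Return TRUE if the value is stored in the tree.
bst_search <- function(node, value) {
  while (!is.null(node)) {
    if (value == node$value) return(TRUE)
    node <- if (value < node$value) node$left else node$right
  }
  FALSE
}

# In-order traversal visits the values in ascending order.
bst_inorder <- function(node) {
  if (is.null(node)) return(c())
  c(bst_inorder(node$left), node$value, bst_inorder(node$right))
}

tree <- NULL
for (v in c(50, 30, 70, 20, 40, 60, 80)) {
  tree <- bst_insert(tree, v)
}
print(bst_inorder(tree))      # 20 30 40 50 60 70 80
print(bst_search(tree, 60))   # TRUE
print(bst_search(tree, 65))   # FALSE
```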

6.Implementation of Set Operations.
R includes some handy set operations, including these:
union(x,y): Union of the sets x and y
intersect(x,y): Intersection of the sets x and y
setdiff(x,y): Set difference between x and y, consisting of all elements
of x that are not in y
setequal(x,y): Test for equality between x and y
c %in% y: Membership, testing whether c is an element of the set y
choose(n,k): Number of possible subsets of size k chosen from a set of size n
Here are some simple examples of using these functions:
> x <- c(1,2,5)
> y <- c(5,1,8,9)
> union(x,y)
[1] 1 2 5 8 9
> intersect(x,y)
[1] 1 5
> setdiff(x,y)
[1] 2
> setdiff(y,x)
[1] 8 9
> setequal(x,y)
[1] FALSE
> setequal(x,c(1,2,5))
[1] TRUE
> 2 %in% x
[1] TRUE
> 2 %in% y
[1] FALSE
> choose(5,2)
[1] 10
x<-c(1:10)
y<-c(5,6,7,8)
a<- union(x,y)
a
[1] 1 2 3 4 5 6 7 8 9 10
b<-intersect(x,y)
b
[1] 5 6 7 8
c<-setdiff(x,y)
c
[1] 1 2 3 4 9 10
d<-setequal(x,y)
d
[1] FALSE
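
Two further points may be worth a quick experiment: the set functions drop duplicate values from their result, and a symmetric difference (elements in exactly one of the two sets) can be composed from union() and setdiff(). The sketch below uses made-up vectors; the name sym_diff is illustrative.

```r
x <- c(1, 2, 2, 5)    # note the duplicated 2
y <- c(5, 1, 8, 9)

u <- union(x, y)      # duplicates are dropped
print(u)              # 1 2 5 8 9

# Elements that belong to exactly one of the two sets
sym_diff <- union(setdiff(x, y), setdiff(y, x))
print(sym_diff)       # 2 8 9
```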


7. Implementation of Reading and Writing files.
a. Reading different types of data sets (.txt, .csv) from the web and from disk,
and writing to a file in a specific disk location.
library(utils)
data<- read.csv("F:/R -smb/sampledata/input.csv")
data
eno ename sal doj dept
1 101 Ramesh 50000 13-06-2016 IT
2 102 Rama 25000 14-06-2016 CSE
3 103 sirish 30000 15-06-2016 ECE
4 104 Bindhu 35000 16-06-2016 IT
5 105 karthik 69000 17-06-2016 IT
6 106 nagaraju 85000 18-06-2016 IT
7 107 Anil 65000 19-06-2016 CSE
8 108 Uma 65000 20-06-2016 ECE
9 109 venkat 5000 21-06-2016 CSE
10 110 hari 30000 22-06-2016 ECE
print(is.data.frame(data))
[1] TRUE
print(ncol(data))
[1] 5
print(nrow(data))
[1] 10
# Get the maximum salary.
sal<- max(data$sal)
sal
[1] 85000
# Get the details of the person having the maximum salary.
retval<- subset(data,sal == max(sal))
retval
  eno    ename   sal        doj dept
6 106 nagaraju 85000 18-06-2016   IT
# Get all the people working in the IT department.
retval<- subset( data, dept == "IT")
retval
eno ename sal doj dept
1 101 Ramesh 50000 13-06-2016 IT
4 104 Bindhu 35000 16-06-2016 IT
5 105 karthik 69000 17-06-2016 IT
6 106 nagaraju 85000 18-06-2016 IT
# Write filtered data into a new file.
write.csv(retval,"F:/R -smb/sampledata/output.csv")
newdata<- read.csv("F:/R -smb/sampledata/output.csv")
newdata
# The extra column X holds the row names that write.csv() saved by default.
  X eno    ename   sal        doj dept
1 1 101   Ramesh 50000 13-06-2016   IT
2 4 104   Bindhu 35000 16-06-2016   IT
3 5 105  karthik 69000 17-06-2016   IT
4 6 106 nagaraju 85000 18-06-2016   IT
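
The additional first column in newdata comes from the row names that write.csv() stores by default. If that column is not wanted, one option is to pass row.names = FALSE when writing. A minimal sketch, using a temporary file and a small made-up data frame in place of the lab's F: drive paths and input.csv:

```r
# Small stand-in for the filtered data (illustrative values)
retval <- data.frame(
  eno = c(101, 104),
  ename = c("Ramesh", "Bindhu"),
  sal = c(50000, 35000),
  stringsAsFactors = FALSE
)

out_file <- tempfile(fileext = ".csv")            # temporary output path
write.csv(retval, out_file, row.names = FALSE)    # do not write row names

newdata <- read.csv(out_file, stringsAsFactors = FALSE)
print(names(newdata))   # "eno" "ename" "sal"
print(newdata)
```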
b. Reading Excel data sheet in R.
install.packages("xlsx")
library("xlsx")
data<- read.xlsx("input.xlsx", sheetIndex = 1)
data
eno ename sal doj dept
1 101 Ramesh 50000 2016-06-13 IT
2 102 Rama 25000 2016-06-14 CSE
3 103 sirish 30000 2016-06-15 ECE
4 104 Bindhu 35000 2016-06-16 IT
5 105 karthik 69000 2016-06-17 IT
6 106 nagaraju 85000 2016-06-18 IT
7 107 Anil 65000 2016-06-19 CSE
8 108 Uma 65000 2016-06-20 ECE
9 109 venkat 5000 2016-06-21 CSE
10 110 hari 30000 2016-06-22 ECE
c. Reading XML dataset in R.
install.packages("XML")
library("XML")
library("methods")
result<- xmlParse(file = "F:/R -smb/sampledata/input.xml")
result
<?xml version="1.0"?>
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
……..


8. Implementation of Graph Operations.
R offers three main graphics packages: traditional (or base), lattice and
ggplot2. Traditional graphics are built into R, create nice looking graphs,
and are very flexible. However, they require a lot of work when repeating a
graph for different groups in your data. Lattice graphics excel at repeating
graphs for various groups. The ggplot2 package also deals with groups well
and is quite a bit more flexible than lattice graphics.
Use data Sets
cars
mtcars
iris
1.Scatter Plot
A scatter plot (aka scatter chart, scatter graph) uses dots to represent
values for two different numeric variables. The position of each dot on the
horizontal and vertical axis indicates values for an individual data point.
Scatter plots are used to observe relationships between variables.
To make a scatter plot, use plot() with a vector of x values and a vector of y
values.

The plot() function is used to draw points (markers) in a diagram.

The function takes parameters for specifying points in the diagram.

Parameter 1 specifies points on the x-axis.

Parameter 2 specifies points on the y-axis.

Draw one point in the diagram, at position (1, 3):

plot(1, 3)

Draw two points in the diagram, one at position (1, 3) and one at position (8,
10):

plot(c(1, 8), c(3, 10))

You can plot as many points as you like; just make sure you have the same
number of points on both axes:

x <- c(1, 2, 3, 4, 5)
y <- c(3, 7, 8, 9, 12)

plot(x, y)

If you want to draw dots in a sequence, on both the x-axis and the y-axis,
use the : operator:
plot(1:10)

plot(iris)


plot(iris$Sepal.Length)

Draw a Line

The plot() function also takes a type parameter with the value l to draw a line to
connect all the points in the diagram:

plot(1:10, type="l")

Plot Labels

The plot() function also accept other parameters, such as main, xlab and ylab if
you want to customize the graph with a main title and different labels for the x
and y-axis:

plot(1:10, main="My Graph", xlab="x-axis", ylab="y axis")


Graph Appearance

There are many other parameters you can use to change the appearance of the
points.

Colors

Use col="color" to add a color to the points:

plot(1:10, col="red")

Size

Use cex=number to change the size of the points (1 is default, while 0.5 means
50% smaller, and 2 means 100% larger):

plot(1:10, cex=2)

Point Shape

Use pch with a value from 0 to 25 to change the point shape format:

plot(1:10, pch=25, cex=2, col="green")

The values of the pch parameter range from 0 to 25, which means that we can
choose from 26 different types of point shapes.

Line Graphs

A line graph has a line that connects all the points in a diagram.

To create a line, use the plot() function and add the type parameter with a
value of "l"

Line Width

To change the width of the line, use the lwd parameter (1 is default,
while 0.5 means 50% smaller, and 2 means 100% larger):

plot(1:10, type="l", lwd=2)


Line Styles

The line is solid by default. Use the lty parameter with a value from 0 to 6 to
specify the line format.

Available parameter values for lty:

- 0 removes the line
- 1 displays a solid line
- 2 displays a dashed line
- 3 displays a dotted line
- 4 displays a "dot dashed" line
- 5 displays a "long dashed" line
- 6 displays a "two dashed" line

plot(1:10, type="l", lwd=1, lty=6)


Multiple Lines

To display more than one line in a graph, use the plot() function together with
the lines() function:

line1 <- c(1,2,3,4,5,10)
line2 <- c(2,5,7,8,9,10)

plot(line1, type = "l", col = "blue")
lines(line2, type="l", col = "red")


Pie Charts

A pie chart is a circular graphical view of data.

Use the pie() function to draw pie charts:

# Create a vector of pies
x <- c(10,20,30,40)

# Display the pie chart
pie(x)


Labels and Header

Use the labels parameter to add a label to each slice of the pie chart, and use
the main parameter to add a header:

# Create a vector of pies
x <- c(10,20,30,40)

# Create a vector of labels
mylabel <- c("Apples", "Bananas", "Cherries", "Dates")

# Display the pie chart with labels
pie(x, labels = mylabel, main = "Fruits")


Colors

You can add a color to each slice of the pie with the col parameter:

# Create a vector of colors
colors <- c("blue", "yellow", "green", "black")

# Display the pie chart with colors
pie(x, labels = mylabel, main = "Fruits", col = colors)

Legend

To add an explanation for each slice, use the legend() function:

# Create a vector of labels
mylabel <- c("Apples", "Bananas", "Cherries", "Dates")

# Create a vector of colors
colors <- c("blue", "yellow", "green", "black")

# Display the pie chart with colors
pie(x, labels = mylabel, main = "Pie Chart", col = colors)

# Display the explanation box
legend("bottomright", mylabel, fill = colors)

The legend can be positioned as either:

bottomright, bottom, bottomleft, left, topleft, top, topright, right, center

Bar Charts

A bar chart uses rectangular bars to visualize data. Bar charts can be
displayed horizontally or vertically. The height or length of each bar is
proportional to the value it represents.

Use the barplot() function to draw a vertical bar chart:

# x-axis values
x <- c("A", "B", "C", "D")

# y-axis values
y <- c(2, 4, 6, 8)

barplot(y, names.arg = x)

• The x variable represents values in the x-axis (A,B,C,D)
• The y variable represents values in the y-axis (2,4,6,8)
• Then we use the barplot() function to create a bar chart of the values
• names.arg defines the names of each observation in the x-axis

Bar Color

Use the col parameter to change the color of the bars:

x <- c("A", "B", "C", "D")
y <- c(2, 4, 6, 8)

barplot(y, names.arg = x, col = "red")


Density / Bar Texture

To change the bar texture, use the density parameter:

x <- c("A", "B", "C", "D")
y <- c(2, 4, 6, 8)

barplot(y, names.arg = x, density = 10)

Bar Width

Use the width parameter to change the width of the bars:

x <- c("A", "B", "C", "D")
y <- c(2, 4, 6, 8)

barplot(y, names.arg = x, width = c(1,2,3,4))

Horizontal Bars

If you want the bars to be displayed horizontally instead of vertically,
use horiz=TRUE:

x <- c("A", "B", "C", "D")
y <- c(2, 4, 6, 8)

barplot(y, names.arg = x, horiz = TRUE)


(Histograms and boxplots) Try the commands


hist(DATA)
boxplot(Volume)
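
DATA and Volume in the commands above stand for your own numeric data. As a hedged illustration, the sketch below assumes the built-in trees data set (not named in the original), which has a numeric Volume column:

```r
# trees is a built-in data set with the columns Girth, Height and Volume
data(trees)
volume <- trees$Volume

# Histogram: shows how the volume values are distributed
hist(volume, main = "Histogram of Volume", xlab = "Volume")

# Boxplot: shows the median, quartiles and outliers of the same values
boxplot(volume, main = "Boxplot of Volume", ylab = "Volume")
```

Any other numeric vector can be passed to hist() and boxplot() in the same way.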

9. Implementation of Correlation.
What is Correlation?

It is a statistical measure that defines the relationship between two variables,
that is, how the two variables are linked with each other. It describes how a
change in one variable is associated with a change in another variable.

If the two variables are increasing or decreasing in parallel then they have a
positive correlation between them and if one of the variables is increasing and
another one is decreasing then they have a negative correlation with each
other. If the change of one variable has no effect on another variable then they
have a zero correlation between them.

It is used to identify the degree of the linear relationship between two variables.
It is represented by 𝝆 and calculated as:-

𝜌 (𝑥, 𝑦) = 𝑐𝑜𝑣(𝑥, 𝑦) /(𝜎𝑥 × 𝜎𝑦 )

Where

𝑐𝑜𝑣(𝑥, 𝑦) = covariance of x and y

𝜎𝑥 = Standard deviation of x

𝜎𝑦 = Standard deviation of y

𝜌 (𝑥, 𝑦) = correlation between x and y

The value of 𝜌 (𝑥, 𝑦) varies from -1 to +1.

A positive value lies between 0 and 1, where 𝜌 (𝑥, 𝑦) = 1 indicates a perfect
positive linear correlation between the variables.

A negative value lies between -1 and 0, where 𝜌 (𝑥, 𝑦) = -1 indicates a perfect
negative linear correlation between the variables.

𝜌 (𝑥, 𝑦) = 0 indicates that there is no linear correlation.
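
The formula can be checked directly in R. The sketch below is only an illustration: it applies the definition with cov() and sd() to the height and weight values used in this experiment and compares the result with the built-in cor() function. The names rho_manual and rho_builtin are assumptions made for this example:

```r
# Height and weight values used in this experiment
x <- c(168, 169, 170, 172, 174)
y <- c(65, 70, 75, 78, 80)

# Correlation from the definition: covariance divided by the
# product of the two standard deviations
rho_manual <- cov(x, y) / (sd(x) * sd(y))

# Correlation from the built-in function (Pearson by default)
rho_builtin <- cor(x, y)

print(rho_manual)
print(all.equal(rho_manual, rho_builtin))
```

Both cov() and sd() use the n - 1 denominator, so the factors cancel and the two results agree.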

height<-c(168,169,170,172,174)

weight<-c(65,70,75,78,80)

plot(height,weight,main="Height vs Weight",col="green", type="l")

cor(height,weight)

[1] 0.9382329

Tes<- cor.test(height,weight,method = "pearson")

Tes

Pearson's product-moment correlation

data: height and weight
t = 4.6967, df = 3, p-value = 0.01826
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.3249548 0.9960214
sample estimates:
cor
0.9382329

km<-c(0,20,40,60,80,100)

oilquantity<-c(20,19,18,17,16,15)

plot(km,oilquantity,main="Distance vs Oil Quantity",col="green", type="l")

cor(km,oilquantity)

[1] -1

Tes<- cor.test(km,oilquantity,method = "pearson")

Tes

Pearson's product-moment correlation

data: km and oilquantity
t = -Inf, df = 4, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-1 -1
sample estimates:
cor
-1

10. Implementation of ANOVA.
ANOVA, also known as Analysis of Variance, is used to investigate the relationship
between a categorical variable and a continuous variable. It is a type of
hypothesis test that compares population means by analysing the variance within
and between groups. An ANOVA test involves setting up:
• Null Hypothesis: All population means are equal.
• Alternate Hypothesis: At least one population mean is different from the others.
ANOVA tests are of two types:
• One-way ANOVA: It takes one categorical variable into consideration.
• Two-way ANOVA: It takes two categorical variables into consideration.
Performing One Way ANOVA test
The one-way ANOVA test is performed on the mtcars dataset, which ships with
base R in the datasets package, between the disp attribute (a continuous
attribute) and the gear attribute (a categorical attribute).
# mtcars is available in base R, so no package needs to be installed or loaded
# Variance in mean within group and between group
boxplot(mtcars$disp~factor(mtcars$gear),xlab = "gear", ylab ="disp")
# Step 1: Set up the Null Hypothesis and Alternate Hypothesis
# H0: mu1 = mu2 = mu3 (there is no difference between the average
# displacement for different gears)
# H1: Not all means are equal
# Step 2: Calculate the test statistic using the aov function
mtcars_aov <- aov(mtcars$disp~factor(mtcars$gear))
summary(mtcars_aov)
# Step 3: Fix the significance level, alpha = 0.05
# (this determines the F-critical value)
# Step 4: Compare the test statistic with the F-critical value, or
# equivalently the p-value with alpha: if p < alpha, reject the Null Hypothesis
The box plot shows the distribution of displacement for each gear value. Here the
categorical variable is gear, to which the factor function is applied, and the
continuous variable is disp.
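
Steps 3 and 4 can also be carried out in code. The following is a sketch under stated assumptions rather than part of the original program: it reads the F statistic and p-value from the aov summary table and obtains the F-critical value with qf(). The names fit, tab, f_stat, f_crit and reject_h0 are illustrative:

```r
# One-way ANOVA of displacement by gear
fit <- aov(disp ~ factor(gear), data = mtcars)

# The first element of the summary is the ANOVA table
tab <- summary(fit)[[1]]
f_stat  <- tab[["F value"]][1]   # F statistic for the gear factor
p_value <- tab[["Pr(>F)"]][1]    # matching p-value

# F-critical value at alpha = 0.05, using the degrees of freedom
# of the factor (row 1) and of the residuals (row 2)
alpha  <- 0.05
f_crit <- qf(1 - alpha, df1 = tab[["Df"]][1], df2 = tab[["Df"]][2])

# Reject H0 when p < alpha (equivalently, when F > F-critical)
reject_h0 <- p_value < alpha
print(c(F = f_stat, F_critical = f_crit, p = p_value))
print(reject_h0)
```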
Performing Two Way ANOVA test
The two-way ANOVA test is performed on the mtcars dataset, which ships with
base R in the datasets package, between the disp attribute (a continuous
attribute) and two categorical attributes, gear and am.
# mtcars is available in base R, so no package needs to be installed or loaded
# Variance in mean within group and between group
boxplot(mtcars$disp~mtcars$gear, subset = (mtcars$am == 0),
xlab = "gear", ylab = "disp", main = "Automatic")
boxplot(mtcars$disp~mtcars$gear, subset = (mtcars$am == 1),
xlab = "gear", ylab = "disp", main = "Manual")
# Step 1: Set up the Null Hypothesis and Alternate Hypothesis
# H0: mu1 = mu2 = mu3 (there is no difference between the average
# displacement for different gears)
# H1: Not all means are equal
# Step 2: Calculate the test statistics using the aov function
mtcars_aov2 <- aov(mtcars$disp~factor(mtcars$gear) * factor(mtcars$am))
summary(mtcars_aov2)
# Step 3: Fix the significance level, alpha = 0.05
# (this determines the F-critical value)
# Step 4: Compare each test statistic with the F-critical value, or
# equivalently each p-value with alpha: if p < alpha, reject the Null Hypothesis
The box plots show the distribution of displacement for each gear value,
separately for automatic and manual cars. Here the categorical variables are gear
and am, to which the factor function is applied, and the continuous variable is
disp.
The summary shows that the gear attribute is highly significant for displacement
(three stars denote this), whereas the am attribute is not. The p-value of gear
is less than 0.05, so gear and displacement are related. The p-value of am is
greater than 0.05, so there is no significant evidence that am and displacement
are related.
11. Implementation of LINEAR REGRESSION
In linear regression, the predictor variable and the response variable are
related through an equation in which the exponent (power) of both variables is 1.
Mathematically, a linear relationship represents a straight line when plotted as
a graph. A non-linear relationship, where the exponent of any variable is not
equal to 1, creates a curve.
The general mathematical equation for a linear regression is: y = ax + b
Following is the description of the parameters used:
• y is the response variable.
• x is the predictor variable.
• a and b are constants which are called the coefficients.
# The predictor vector.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
# The response vector.
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y~x)
print(relation)
When we execute the above code, it produces the following result:
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
-38.4551 0.6746
predict() Function
Syntax
The basic syntax for predict() in linear regression is:
predict(object, newdata)
Following is the description of the parameters used:
• object is the model which is already created using the lm() function.
• newdata is the data frame containing the new value for the predictor variable.
Predict the weight of new persons
# The predictor vector.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
# The response vector.
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y~x)
# Find weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation,a)
print(result)
When we execute the above code, it produces the following result:
1
76.22869
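
To connect the prediction with the equation y = ax + b, the sketch below extracts the fitted coefficients with coef() and evaluates the equation by hand for a height of 170. It also draws the fitted line with abline(). The names slope, intercept and manual are assumptions made for this illustration:

```r
# Same data as in the experiment: heights (x) and weights (y)
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)

# Extract the coefficients of y = ax + b
intercept <- coef(relation)[["(Intercept)"]]   # b
slope     <- coef(relation)[["x"]]             # a

# Evaluate the equation by hand for x = 170
manual <- slope * 170 + intercept
print(manual)

# Scatter plot of the data with the fitted regression line
plot(x, y, xlab = "Height", ylab = "Weight", main = "Height and Weight Regression")
abline(relation, col = "blue")
```

The hand-computed value agrees with the result returned by predict().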
12. Implementation of LOGISTIC REGRESSION
Logistic regression is a regression model in which the response variable
(dependent variable) has categorical values such as True/False or 0/1. It
models the probability of a binary response as a function of the predictor
variables, through the mathematical equation relating them.
The general mathematical equation for logistic regression is:
y = 1/(1+e^-(a+b1x1+b2x2+b3x3+...))
Following is the description of the parameters used:
• y is the response variable.
• x is the predictor variable.
• a and b are the coefficients which are numeric constants.
The function used to create the regression model is the glm() function.
Syntax
The basic syntax for glm() function in logistic regression is:
glm(formula,data,family)
Following is the description of the parameters used:
• formula is the symbol presenting the relationship between the variables.
• data is the data set giving the values of these variables.
• family is an R object that specifies the details of the model. Its value is
binomial for logistic regression.
Example
The built-in data set "mtcars" describes different models of a car with their
various engine specifications. In the "mtcars" data set, the transmission mode
(automatic or manual) is described by the column am, which is a binary value
(0 or 1). We can create a logistic regression model between the column "am"
and 3 other columns: hp, wt and cyl.
# Select some columns from mtcars.
input <- mtcars[,c("am","cyl","hp","wt")]
print(head(input))
When we execute the above code, it produces the following result:
                  am cyl  hp    wt
Mazda RX4          1   6 110 2.620
Mazda RX4 Wag      1   6 110 2.875
Datsun 710         1   4  93 2.320
Hornet 4 Drive     0   6 110 3.215
Hornet Sportabout  0   8 175 3.440
Valiant            0   6 105 3.460
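
The example above stops after selecting the columns. A minimal sketch of the model-fitting step described in the text could look like the following. The formula and the object name am.data are assumptions for illustration, not taken from the original:

```r
# Select the response (am) and the three predictors
input <- mtcars[, c("am", "cyl", "hp", "wt")]

# Fit the logistic regression: family = binomial selects the logistic model
am.data <- glm(am ~ cyl + hp + wt, data = input, family = binomial)

# Coefficient estimates, standard errors and p-values
print(summary(am.data))

# Predicted probability of a manual transmission (am = 1) for each car
prob <- predict(am.data, type = "response")
print(head(round(prob, 3)))
```

In the summary, a predictor with a p-value below 0.05 is usually read as significant for the response.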

13. Implementation of Random forest
In the random forest approach, a large number of decision trees are created.
Every observation is fed into every decision tree, and the most common outcome
for each observation is used as the final output. A new observation is likewise
fed into all the trees, and a majority vote decides its classification.
An error estimate is made for the cases which were not used while building the
tree. That is called an OOB (Out-of-bag) error estimate which is mentioned as a
percentage.
The R package "randomForest" is used to create random forests.
PROGRAM:
We will use the randomForest() function to create the forest of decision trees
and view its summary.
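
The program listing itself is missing from this copy. The sketch below is a reconstruction based on the call shown in the result, so treat it as an assumption rather than the original program. It assumes that the readingSkills data set comes from the "party" package, and it runs only when both packages are installed:

```r
# Stays NULL when the required packages are not installed
output.forest <- NULL

if (requireNamespace("party", quietly = TRUE) &&
    requireNamespace("randomForest", quietly = TRUE)) {
  # Load the readingSkills data set (assumed to be provided by "party")
  data("readingSkills", package = "party")

  # Build the forest: predict nativeSpeaker from age, shoeSize and score
  output.forest <- randomForest::randomForest(
    nativeSpeaker ~ age + shoeSize + score, data = readingSkills)

  # Summary with OOB error estimate and confusion matrix
  print(output.forest)

  # Importance of each predictor (type = 2 gives MeanDecreaseGini)
  print(randomForest::importance(output.forest, type = 2))
}
```

Because the trees are built from random samples, the exact numbers can differ slightly between runs unless set.seed() is called first.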
When we execute the above code, it produces the following result:
Call:
randomForest(formula = nativeSpeaker ~ age + shoeSize + score, data = readingSkills)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 1
OOB estimate of error rate: 1%
Confusion matrix:
    no yes class.error
no  99   1        0.01
yes  1  99        0.01
         MeanDecreaseGini
age              13.95406
shoeSize         18.91006
score            56.73051

Viva Voce questions
1. What is R?
R is an interpreted computer programming language which was created by
Ross Ihaka and Robert Gentleman at the University of Auckland, New
Zealand.

R is a language and environment for statistical computing and graphics. R is
available as Free Software under the terms of the Free Software
Foundation’s GNU General Public License in source code form.

2. List some of the functions that R provides.

Some of the functions that R provides are:

• Mean
• Median
• Distribution
• Covariance
• Regression
• Non-linear

3. Explain how you can start the R commander GUI?

• Typing the command library("Rcmdr") into the R console starts the R
commander GUI.

4. Differentiate between vector, List, Matrix, and Data frame.

A vector is a series of data elements of the same basic type. The members of
a vector are known as components.

An R object that contains elements of different types, such as numbers,
strings, vectors, or another list inside it, is known as a list.

A matrix is a two-dimensional data structure formed by binding vectors of the
same length. A matrix contains elements of the same type.

A data frame is a more general form of a matrix. It is a combination of lists
and matrices. In a data frame, different columns can contain different data
types.

5. Give any five features of R.

1. Simple and effective programming language.

2. It is a data analysis software.

3. It provides an effective data handling and storage facility.

4. It provides highly extensible graphical techniques.

5. It is an interpreted language.

6. What are the data structures in R that are used to perform statistical
analyses and create graphs?

R has data structures like

• Vectors
• Matrices
• Arrays
• Data frames

7. Explain general format of Matrices in R?

General format is

Mymatrix <- matrix(vector, nrow=r, ncol=c, byrow=FALSE,
dimnames = list(char_vector_rownames, char_vector_colnames))
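
As a concrete instance of the general format, the following sketch builds a 2 x 2 matrix with row and column names. The values and the names cells, rnames and cnames are made up for illustration:

```r
# Values to place in the matrix, plus row and column names
cells  <- c(1, 26, 24, 68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2")

# byrow = TRUE fills the matrix row by row instead of column by column
Mymatrix <- matrix(cells, nrow = 2, ncol = 2, byrow = TRUE,
                   dimnames = list(rnames, cnames))
print(Mymatrix)

# Elements can be accessed by name or by position
print(Mymatrix["R1", "C2"])
```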
8. What is the function used for adding datasets in R?

rbind function can be used to join two data frames (datasets). The two data
frames must have the same variables, but they do not have to be in the same
order.
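
A small sketch of this behaviour, with two made-up data frames whose columns are in a different order:

```r
# Two data frames with the same variables in a different column order
df1 <- data.frame(id = 1:2, score = c(90, 85))
df2 <- data.frame(score = c(70, 60), id = 3:4)

# rbind() matches the columns by name and stacks the rows
combined <- rbind(df1, df2)
print(combined)
```

The result keeps the column order of the first data frame.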

9. What is the use of subset() function and sample() function in R ?

In R, the subset() function helps you to select variables and observations, while
the sample() function lets you choose a random sample of size n from a
dataset.

10. Explain how you can create a table in R without external file?

Use the code

myTable = data.frame()
myTable = edit(myTable)

11. Give the command to create a histogram and to remove a vector from
the R workspace?

The hist() function creates a histogram, and the rm() function removes a
vector (or any other object) from the R workspace.

12. Differentiate b/w "%%" and "%/%".

The "%%" operator gives the remainder of the division of the first vector by the
second, and the "%/%" operator gives the integer quotient of the division of the
first vector by the second.
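
A short sketch with made-up vectors shows both operators working element by element:

```r
a <- c(7, 10, 15)
b <- c(2, 3, 4)

# Remainder of each division
remainder <- a %% b
print(remainder)   # 1 1 3

# Integer quotient of each division
quotient <- a %/% b
print(quotient)    # 3 3 3

# The two results always satisfy: a == b * quotient + remainder
print(all(a == b * quotient + remainder))
```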

13. How do you list the preloaded datasets in R?


To view a list of preloaded datasets in R, simply type data() into the console and
hit enter.

14. What are the disadvantages of R?


Just as you should know what R does well, you should understand its failings.

Memory and performance. In comparison to Python, R is often said to be the
lesser language in terms of memory and performance.

Open source. Being open source has its disadvantages as well as its
advantages. For one, there’s no governing body managing R, so there’s no
single source for support or quality control. This also means that sometimes
the packages developed for R are not the highest quality.

Security. R was not built with security in mind, so it must rely on external
resources to mind these gaps.

15. Write a custom function in R

Sometimes you’ll be asked to create a custom function on the fly. The general
form of a custom function is:

myFunction <- function(arg1, arg2, ... ){
statements
return(object)
}
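
A concrete instance of this template might look like the sketch below. The function rectangle_area and its arguments are hypothetical names chosen for illustration:

```r
# Hypothetical example: area of a rectangle.
# height has a default value, so it may be omitted in the call.
rectangle_area <- function(width, height = 1) {
  area <- width * height
  return(area)
}

print(rectangle_area(3, 4))   # 12
print(rectangle_area(5))      # 5, because height defaults to 1
```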

16. How do you install a package in R?


There are many ways to install a package in R. Some even include using the
GUI. We’re coders, so we’re not going to give those attention.

Type the following into your console and hit enter:

install.packages("package_name")

Followed by:

library(package_name)

It’s that simple. The first command installs the package and the second loads
the package into the session.

17. What is a factor variable, and why would you use one?
A factor variable is a categorical variable whose values come from a fixed set
of levels, which can be given as either numeric or character string values. The
most salient reason to use a factor variable is that statistical modeling
functions then treat it as categorical rather than numeric data. Another reason
is that factors are stored as integer codes, which can be more memory efficient.

Simply use the factor() function to create a factor variable.
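
The sketch below, with made-up gear values, shows what factor() produces: a set of levels plus an integer code for each element:

```r
# Numeric values that really represent categories
gears <- c(4, 4, 3, 5, 3)

# Convert to a factor
gear_factor <- factor(gears)

# The distinct categories, sorted
print(levels(gear_factor))       # "3" "4" "5"

# Number of observations in each category
print(table(gear_factor))

# Internal integer codes: the position of each value among the levels
print(as.integer(gear_factor))   # 2 2 1 3 1
```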

18. How do you concatenate strings in R?


Concatenating strings in R is less than intuitive. You don’t use a . operator, nor
a + operator, and forget about the & operator. In fact, you don’t use an
operator at all. Concatenating strings in R requires the use of
the paste() function. Here’s an example:

# paste() inserts a single space between its arguments by default
hello <- "Hello,"
world <- "World."
paste(hello, world)
[1] "Hello, World."

19. How do you read a CSV file in R?
We’ve covered this already with the import process. Simply use
the read.csv() function.

yourRDateHere <- read.csv("Data.csv", header = TRUE)

20. What are 3 sorting algorithms available in R?


The sort() function orders a vector or factor. Its method argument selects
among three algorithms, listed and described below.

Radix: Usually the most performant algorithm, this is a non-comparative
sorting algorithm that avoids overhead. It’s stable, and it’s the default
algorithm for integer vectors and factors.

Quick Sort: This method “uses Singleton (1969)’s implementation of Hoare’s
Quicksort method and is only available when x is numeric (double or integer)
and partial is NULL,” according to R Documentation. It’s not considered a
stable sort.

Shell: This method “uses Shellsort (an O(n^(4/3)) variant from Sedgewick
(1986)),” according to R Documentation.

21. Why is R useful for data science?

R turns otherwise hours of graphically intensive jobs into minutes and
keystrokes. In reality, you probably wouldn’t encounter the language of R
outside the realm of data science or an adjacent field. It’s great for linear
modeling, nonlinear modeling, time-series analysis, plotting, clustering, and so
much more.

Simply put, R is designed for data manipulation and visualization, so it’s
natural that it would be used for data science.

22. What is the t.test() function in R?

The t.test() function is used to determine whether the means of two groups are
equal or not.
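
As an illustration, the sketch below compares the mean fuel consumption (mpg) of automatic and manual cars in the built-in mtcars data set. The choice of data set and the variable names are assumptions made for this example:

```r
# Split mpg into two groups by transmission type (0 = automatic, 1 = manual)
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Two-sample t-test (Welch's test by default)
result <- t.test(auto, manual)
print(result)

# A p-value below 0.05 suggests that the two group means differ
print(result$p.value < 0.05)
```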

23. Differentiate b/w lapply and sapply.

lapply always returns its output in the form of a list, whereas sapply
simplifies the output to a vector or matrix where possible.
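
The difference is easy to see on a small made-up list:

```r
nums <- list(a = 1:3, b = 4:6)

# lapply: the result is a list with one element per input element
as_list <- lapply(nums, sum)
print(as_list)

# sapply: the same results, simplified to a named vector
as_vector <- sapply(nums, sum)
print(as_vector)
```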

24. Explain anova() function.

The anova() function is used for comparing nested models.
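
A minimal sketch, assuming two linear models on the built-in mtcars data set where the smaller model is nested in the larger one (the model names are illustrative):

```r
# The smaller model uses wt only; the larger model adds hp
small <- lm(mpg ~ wt, data = mtcars)
large <- lm(mpg ~ wt + hp, data = mtcars)

# F-test of whether the extra predictor improves the fit
comparison <- anova(small, large)
print(comparison)
```

A small p-value in the second row suggests that the larger model fits significantly better.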

25. Give names of visualization packages.

There are the following packages of visualization in R:

1. Plotly

2. ggplot2

3. tidyquant

4. geofacet

5. googleVis

6. Shiny
