SMB-R Programming Lab
R – Programming Lab
# Vector of strings
fruits <- c("banana", "apple", "orange")
Vectors are most commonly created with the c() function, which is the easiest way to
create a vector in R. All elements of a vector must be of the same type; if elements of
different types are passed, R converts them to a common type, moving from the lower
to the higher type in the order logical, integer, double, character.
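The following is a small sketch of this coercion rule. typeof() reports the storage type R chose for each mixed vector. The variable names are illustrative only.

```r
# Mixing element types in c() makes R coerce everything to the "highest" type
v1 <- c(TRUE, 2L)       # logical + integer    -> integer
v2 <- c(2L, 3.5)        # integer + double     -> double
v3 <- c(1, "a", TRUE)   # anything + character -> character
typeof(v1)  # "integer"
typeof(v2)  # "double"
typeof(v3)  # "character"
v3          # "1" "a" "TRUE"
```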
Syntax:-
c(start:end)
or
x <- start:end
Example
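For instance, with start = 1 and end = 5 (values chosen only for illustration), both forms of the syntax produce the same integer sequence. The colon operator also counts downwards when start is larger than end.

```r
x <- 1:5          # 1 2 3 4 5
y <- c(1:5)       # same values; wrapping in c() changes nothing here
z <- 5:1          # 5 4 3 2 1
identical(x, y)   # TRUE
length(z)         # 5
```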
Naming Vectors
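# Name the elements while creating the vector
months_v <- c(l1="January",l2="February",l3="March",l4="April",l5="May",l6="June")
# Select elements by their names
q <- months_v[c("l1","l5")]
# Name the elements of an existing vector (one name per element)
x <- 10:13
y <- c("l1","l2","l3","l4")
names(x) <- y
x["l1"]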
Combining vectors
p<-c(1,2,4,5,7,8)
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
r <- c(p, q)   # r is a character vector: the numbers in p are coerced to strings
Vector Manipulation
Vector arithmetic
Two vectors of the same length can be added, subtracted, multiplied or divided
element by element, giving a vector as the result.
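The following is a minimal illustration with two made-up vectors of length 6. Each operation works element by element, and dividing by zero gives Inf rather than an error.

```r
v1 <- c(3, 8, 4, 5, 0, 11)
v2 <- c(4, 11, 0, 8, 1, 2)
add_result <- v1 + v2   # 7 19 4 13 1 13
sub_result <- v1 - v2   # -1 -3 4 -3 -1 9
mul_result <- v1 * v2   # 12 88 0 40 0 22
div_result <- v1 / v2   # 0.75 0.7272727 Inf 0.625 0 5.5
print(add_result)
print(div_result)
```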
Vector Length
To find out how many items a vector has, use the length() function:
length(vector)
Sort a Vector
Example
sort(v)
sort(v, decreasing = TRUE)
v <- NULL   # delete the contents of the vector
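The following is a concrete run of these three commands on an arbitrary numeric vector. R function names and argument names are case-sensitive, so sort and decreasing must be written in lower case.

```r
v <- c(3, 8, 4, 5, 0, 11, -9, 304)
sorted_up <- sort(v)                       # -9 0 3 4 5 8 11 304
sorted_down <- sort(v, decreasing = TRUE)  # 304 11 8 5 4 3 0 -9
print(sorted_up)
print(sorted_down)
# Assigning NULL empties the vector
v <- NULL
length(v)                                  # 0
```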
Applications of vectors
2. The inputs to a deep learning model are given in the form of vectors. These
vectors contain standardized data that is fed to the input layer of the neural
network.
LISTS
A list is an R object that can contain elements of different types, such as strings,
numbers, vectors, matrices and even other lists. We can imagine an R list as a
bag holding many different items: when we need an item, we open the bag and
take it out.
A list is created with the list() function. In other words, a list is a generic
vector containing other objects.
Let’s create a list containing string, numbers, vectors and logical values.
For example:
list_data <- list("Red", "White", c(1,2,3), TRUE, 22.4)
print(list_data)
OUTPUT
[[1]]
[1] "Red"
[[2]]
[1] "White"
[[3]]
[1] 1 2 3
[[4]]
[1] TRUE
[[5]]
[1] 22.4
list_1 <- list(1, 2, 3)
list_2 <- list("Shubham", "Arpita", "Vaishali")
list_3 <- list(c(1, 2, 3))
list_4 <- list(TRUE, FALSE, TRUE)
list_1
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
list_2
[[1]]
[1] "Shubham"
[[2]]
[1] "Arpita"
[[3]]
[1] "Vaishali"
list_3
[[1]]
[1] 1 2 3
list_4
[[1]]
[1] TRUE
[[2]]
[1] FALSE
[[3]]
[1] TRUE
Sometimes we need repeated values; for this we use rep():
> rep(5,3)
[1] 5 5 5
> rep(2:5,each=3)
[1] 2 2 2 3 3 3 4 4 4 5 5 5
> rep(-1:3, length.out=10)
[1] -1 0 1 2 3 -1 0 1 2 3
Naming List Elements
The list elements can be given names and they can be accessed using these
names.
list("green",12.3))
print(list_data)
output
$`1st_Quarter`
$A_Matrix
[1,] 3 5 -2
[2,] 9 1 8
$A_Inner_list
$A_Inner_list[[1]]
$A_Inner_list[[2]]
[1] 12.3
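The following is a self-contained sketch of naming list elements with names(). The matrix reproduces the one in the output above. The "1st_Quarter" values are an illustrative assumption, because they are not shown. A name that starts with a digit must be wrapped in backticks when used with $, which is why R prints $`1st_Quarter`.

```r
list_data <- list(c("Jan", "Feb", "Mar"),                 # assumed values
                  matrix(c(3, 9, 5, 1, -2, 8), nrow = 2),
                  list("green", 12.3))
# Give each element a name
names(list_data) <- c("1st_Quarter", "A_Matrix", "A_Inner_list")
print(list_data)
# Access elements by name
list_data$A_Matrix
list_data$`1st_Quarter`
```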
Elements of the list can be accessed by the index of the element in the list. In
case of named lists it can also be accessed using the names.
list("green",12.3))
print(list_data[1])
# Access the third element. As it is also a list, all its elements will be printed.
print(list_data[3])
print(list_data$A_Matrix)
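The difference between single and double brackets is easy to miss: [ ] returns a sub-list, while [[ ]] and $ return the element itself. The following is a minimal sketch with a small, illustrative named list.

```r
info <- list(name = "Shubham", marks = c(78, 91), passed = TRUE)
info[1]           # a list of length 1, still carrying the name "name"
info[[1]]         # the character string "Shubham"
info$marks        # the numeric vector 78 91
info[["passed"]]  # TRUE
class(info[1])    # "list"
class(info[[1]])  # "character"
```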
We can add, delete and update list elements as shown below. New elements are
usually added at the end of a list, while any element can be updated, or deleted by
assigning NULL to it.
print(list_data[4])
print(list_data[4])
print(list_data[3])
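The following sketch performs all three operations on a small, hypothetical list. When an element is deleted with NULL, the elements after it shift one position to the left.

```r
items <- list("Red", "White", c(1, 2, 3))
# Add an element at the end
items[[4]] <- "New element"
# Update any element by index
items[[2]] <- "Updated element"
# Delete an element by assigning NULL; later elements shift left
items[[1]] <- NULL
length(items)   # 3
items[[1]]      # "Updated element"
```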
employee_data <- data.frame(
employee_id = 1:5,
employee_name = c("James","Harry","Shinji","Jim","Oliver"),
sal = c(642.3,535.2,681.0,739.0,925.26),
stringsAsFactors = FALSE)
print(employee_data)
$ employee_id : int 1 2 3 4 5
emp_data <- data.frame(employee_data$employee_id, employee_data$employee_name)
emp_data
employee_data.employee_id employee_data.employee_name
1 1 James
2 2 Harry
3 3 Shinji
4 4 Jim
5 5 Oliver
a <- employee_data[, 1:2]   # all rows, first two columns
a
employee_id employee_name
1 1 James
2 2 Harry
3 3 Shinji
4 4 Jim
5 5 Oliver
Extract the 1st and 2nd rows with the 3rd and 4th columns of the data:
employee_data[1:2, 3:4]
sal join_date
1 642.3 2013-02-04
2 535.2 2017-06-21
# Add a new column
employee_data$dept <- c("IT","Finance","Operations","HR","Administration")
Add Row
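One common way to add a row is rbind(), which appends a one-row data frame with the same column names. The following sketch uses a trimmed-down, hypothetical copy of the employee data.

```r
staff <- data.frame(
  employee_id = 1:2,
  employee_name = c("James", "Harry"),
  sal = c(642.3, 535.2),
  stringsAsFactors = FALSE
)
# The new row must supply a value for every column
new_row <- data.frame(employee_id = 3L, employee_name = "Shinji", sal = 681.0,
                      stringsAsFactors = FALSE)
staff <- rbind(staff, new_row)
print(staff)
nrow(staff)  # 3
```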
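R fills a matrix column by column; this is called column-major order. We need only
give one of the dimensions:
> matrix(1:12, nrow=3)
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
Giving both dimensions lets vector recycling fill the matrix for us:
> matrix(1:3, nrow=3, ncol=4)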
[,1] [,2] [,3] [,4]
[1,] 1 1 1 1
[2,] 2 2 2 2
[3,] 3 3 3 3
Sometimes it is useful to fill the elements row by row instead, with byrow=TRUE:
> matrix(1:12, nrow=3, byrow=TRUE)
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
There are special functions for constructing certain matrices:
> diag(3)
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
> diag(1:3)
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 2 0
[3,] 0 0 3
> t(1:5)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
a <- rbind(1:3, 4:6)
a
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
a <- cbind(1:3, 4:6)
a
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
a[1,2]
[1] 4
a[1,]
[1] 1 4
Here is a 2 × 3 × 3 array:
> arr = array(1:18, dim=c(2,3,3))
> arr
, , 1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
, , 2
[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12
, , 3
[,1] [,2] [,3]
[1,] 13 15 17
[2,] 14 16 18
Each 2-dimensional slice defined by the last coordinate of the array is shown
as a 2 × 3 matrix.
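m1 = matrix(1:6, nrow = 2)
print("Matrix-1:")
print(m1)
m2 = matrix(c(0, 1, 2, 3, 0, 2), nrow = 2)
print("Matrix-2:")
print(m2)
result = m1 + m2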
print("Result of addition")
print(result)
result = m1 - m2
print("Result of subtraction")
print(result)
result = m1 * m2
print("Result of multiplication")
print(result)
result = m1 / m2
print("Result of division:")
print(result)
[1] "Matrix-1:"
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
[1] "Matrix-2:"
[,1] [,2] [,3]
[1,] 0 2 0
[2,] 1 3 2
[1] "Result of addition"
[,1] [,2] [,3]
[1,] 1 5 5
[2,] 3 7 8
[1] "Result of subtraction"
[,1] [,2] [,3]
[1,] 1 1 5
[2,] 1 1 4
[1] "Result of multiplication"
[,1] [,2] [,3]
[1,] 0 6 0
[2,] 2 12 12
[1] "Result of division:"
[,1] [,2] [,3]
[1,] Inf 1.500000 Inf
[2,] 2 1.333333 3
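The * operator above multiplies cell by cell. True matrix multiplication uses %*% and requires the number of columns of the left matrix to equal the number of rows of the right one. The following sketch therefore transposes m2 with t(). Writing m1 %*% m2 directly would fail, because both matrices are 2 x 3.

```r
m1 <- matrix(1:6, nrow = 2)                   # 2 x 3
m2 <- matrix(c(0, 1, 2, 3, 0, 2), nrow = 2)   # 2 x 3
elementwise <- m1 * m2        # cell by cell, as in the output above
product <- m1 %*% t(m2)       # (2 x 3) %*% (3 x 2) gives a 2 x 2 matrix
print(product)
#      [,1] [,2]
# [1,]    6   20
# [2,]    8   26
```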
Arguments
x: for sort, an R object with a class, or a numeric, complex, character or
logical vector. For sort.int, a numeric, complex, character or logical vector,
or a factor.
method: character string specifying the algorithm used. Not available for
partial sorting. Can be abbreviated.
index.return: logical indicating if the ordering index vector should be
returned as well. Supported by method == "radix" for any na.last mode and
data type, and by the other methods when na.last = NA (the default) and when
fully sorting non-factors.
Ex:
A<-c(51:60,5:50,60:100,1:5)
A
[1] 51 52 53 54 55 56 57 58 59 60 5 6
[13] 7 8 9 10 11 12 13 14 15 16 17 18
[25] 19 20 21 22 23 24 25 26 27 28 29 30
[37] 31 32 33 34 35 36 37 38 39 40 41 42
[49] 43 44 45 46 47 48 49 50 60 61 62 63
[61] 64 65 66 67 68 69 70 71 72 73 74 75
[73] 76 77 78 79 80 81 82 83 84 85 86 87
[85] 88 89 90 91 92 93 94 95 96 97 98 99
[97] 100 1 2 3 4 5
A<-sort(A,decreasing=FALSE,method="quick")
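The following sketch checks the result of the quicksort call above and shows index.return. The duplicated values 5 and 60 in A are both kept in the sorted output. The small vector used for index.return is arbitrary.

```r
A <- c(51:60, 5:50, 60:100, 1:5)
s <- sort(A, decreasing = FALSE, method = "quick")
head(s, 8)    # 1 2 3 4 5 5 6 7
length(s)     # 102
# index.return = TRUE also returns the original positions of the sorted values
r <- sort(c(30, 10, 20), index.return = TRUE)
r$x           # 10 20 30
r$ix          # 2 3 1
```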
Quick Sort
Quicksort is a sorting algorithm based on the divide-and-conquer approach, in
which an array is divided into subarrays around a pivot element (an element
selected from the array).
1. While dividing the array, the pivot element is positioned so that elements
smaller than the pivot are on its left and elements greater than the pivot are
on its right.
2. The left and right subarrays are divided using the same approach.
This process continues until each subarray contains a single element.
3. At this point the elements are already in order, and combining them
yields the sorted array.
R's sort() already provides quicksort through method = "quick", but in some rare
cases you might want to change how the pivot value is selected. Here is a custom
implementation:
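quickSort <- function(arr) {
  # Base case: a vector with 0 or 1 elements is already sorted.
  if (length(arr) <= 1) {
    return(arr)
  }
  # Pick a pivot value at random.
  p <- sample(arr, 1)
  # Partition into values smaller than, equal to and larger than the pivot.
  left <- arr[arr < p]
  middle <- arr[arr == p]
  right <- arr[arr > p]
  # Sort both sides recursively, then combine the pieces.
  c(quickSort(left), middle, quickSort(right))
}
# Sort 10 random numbers (x differs on every run)
x <- sample(1:100, 10)
x
[1] 22 92 30 12 48 20 88 80 8 34
quickSort(x)
[1] 8 12 20 22 30 34 48 80 88 92
quickSort(c(10,24,33,21,22,66,11))
[1] 10 11 21 22 24 33 66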
Draw one point in the diagram, at position (1, 3):
plot(1, 3)
Draw two points in the diagram, one at position (1, 3) and one at position (8,
10):
plot(c(1, 8), c(3, 10))
You can plot as many points as you like; just make sure you have the same
number of values for both axes:
x <- c(1, 2, 3, 4, 5)
y <- c(3, 7, 8, 9, 12)
plot(x, y)
If you want to draw dots in a sequence, on both the x-axis and the y-axis,
use the : operator:
plot(1:10)
plot(iris)
plot(iris$Sepal.Length)
Draw a Line
The plot() function also takes a type parameter; the value "l" draws a line
connecting all the points in the diagram:
plot(1:10, type="l")
Plot Labels
The plot() function also accepts other parameters, such as main, xlab and ylab, if
you want to customize the graph with a main title and different labels for the x-
and y-axis:
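The following is a minimal example of these three parameters. The title and label strings are placeholders chosen for illustration.

```r
# main sets the title; xlab and ylab set the axis labels
plot(1:10, main = "My Graph", xlab = "The x-axis", ylab = "The y-axis")
```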
Graph Appearance
There are many other parameters you can use to change the appearance of the
points.
Colors
plot(1:10, col="red")
Size
Use cex=number to change the size of the points (1 is default, while 0.5 means
50% smaller, and 2 means 100% larger):
Point Shape
Use pch with a value from 0 to 25 to change the point shape. Since pch ranges
from 0 to 25, we can choose from 26 different point shapes:
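The following sketch combines the size and shape parameters described above. The second plot draws every pch value from 0 to 25 in a row, so that shape k appears at x = k + 1.

```r
# Points twice the default size (cex = 2), drawn with shape 25
plot(1:10, pch = 25, cex = 2)
# All 26 point shapes side by side
plot(0:25, rep(1, 26), pch = 0:25, cex = 1.5, xlab = "pch value", ylab = "")
```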
A line graph has a line that connects all the points in a diagram.
To create a line, use the plot() function and add the type parameter with a
value of "l"
Line Width
To change the width of the line, use the lwd parameter (1 is default,
while 0.5 means 50% smaller, and 2 means 100% larger):
The line is solid by default. Use the lty parameter with a value from 0 to 6 to
specify the line format.
To display more than one line in a graph, use the plot() function together with
the lines() function:
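The following sketch covers lwd, lty and lines() together. The two data vectors are made up for illustration. lines() draws onto the plot that is already open, so the first plot() call fixes the axis ranges.

```r
line1 <- c(1, 2, 3, 4, 5, 10)
line2 <- c(2, 5, 7, 8, 9, 10)
# First line: twice the default width (lwd = 2) and dashed (lty = 2)
plot(line1, type = "l", col = "blue", lwd = 2, lty = 2)
# lines() adds a second line to the existing plot
lines(line2, type = "l", col = "red")
```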
Pie Charts
Use the labels parameter to add labels to the pie chart, and use
the main parameter to add a title:
Colors
You can add a color to each pie with the col parameter:
Legend
To add a legend explaining each slice, use the legend() function:
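The following sketch puts labels, main, col and legend() together in one pie chart. The values, labels and colours are illustrative assumptions.

```r
x <- c(10, 20, 30, 40)
mylabel <- c("Apples", "Bananas", "Cherries", "Dates")
colors <- c("blue", "yellow", "green", "black")
pie(x, labels = mylabel, main = "Fruits", col = colors)
# One legend entry per slice, filled with the matching colour
legend("bottomright", legend = mylabel, fill = colors)
```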
Bar Charts
A bar chart uses rectangular bars to visualize data. Bar charts can be
displayed horizontally or vertically. The height or length of each bar is
proportional to the value it represents.
# x-axis values
x <- c("A", "B", "C", "D")
# y-axis values
barplot(y, names.arg = x)
Bar Width
Horizontal Bars
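The following is a self-contained sketch of the two options named above. The y values are illustrative assumptions. width sets the relative width of each bar, and horiz = TRUE draws the bars horizontally. barplot() returns the midpoints of the bars.

```r
x <- c("A", "B", "C", "D")
y <- c(2, 4, 6, 8)   # assumed example values
# Bars of increasing relative width
mid_v <- barplot(y, names.arg = x, width = c(1, 2, 3, 4))
# Horizontal bars
mid_h <- barplot(y, names.arg = x, horiz = TRUE)
```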
If two variables increase or decrease together, they have a positive
correlation; if one variable increases while the other decreases, they have a
negative correlation. If a change in one variable has no effect on the other,
they have zero correlation.
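The (Pearson) correlation coefficient measures the degree of the linear
relationship between two variables. It is represented by ρ and calculated as:
$$\rho(x, y) = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$$
Where
cov(x, y) = Covariance of x and y
σx = Standard deviation of x
σy = Standard deviation of y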
A positive value lies between 0 and 1, where ρ(x, y) = 1 indicates a perfect
positive linear correlation between the variables.
A negative value lies between -1 and 0, where ρ(x, y) = -1 indicates a perfect
negative linear correlation between the variables.
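The following sketch applies the formula directly and compares the result with cor(). The data are made up. cov() and sd() both divide by n - 1, and those factors cancel in the ratio, so the manual value matches cor() exactly (about 0.775 here).

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
# Covariance divided by the product of the standard deviations
manual <- cov(x, y) / (sd(x) * sd(y))
builtin <- cor(x, y)          # Pearson correlation by default
all.equal(manual, builtin)    # TRUE
```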
height<-c(168,169,170,172,174)
plot(height,weight,main="human",col="green", type="l")
cor(height,weight)
[1] 0.9382329
cor.test(height, weight)
95 percent confidence interval:
0.3249548 0.9960214
sample estimates:
cor
0.9382329
km<-c(0,20,40,60,80,100)
oil_quantity <- c(20,19,18,17,16,15)
cor(km, oil_quantity)
[1] -1
cor.test(km, oil_quantity)
95 percent confidence interval:
-1 -1
sample estimates:
cor
-1
When we execute the above code, it produces the following result:
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
-38.4551 0.6746
predict() Function
Syntax
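The basic syntax for predict() in linear regression is:
predict(object, newdata)
Following is the description of the parameters used:
• object is the model already created with the lm() function.
• newdata is a data frame containing the new values of the predictor variable.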
Predict the weight of new persons
# The predictor vector.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
# The response vector.
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y ~ x)
# Find the weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation, a)
print(result)
When we execute the above code, it produces the following result:
1
76.22869
• y is the response variable.
• formula is the symbol presenting the relationship between the variables.
• data is the data set giving the values of these variables.
• family is an R object specifying the details of the model. Its value is binomial
for logistic regression.
• x is the predictor variable.
• a and b are the coefficients which are numeric constants.
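The following is a minimal logistic-regression sketch using these arguments. It relies on the built-in mtcars data set, which is our choice and not part of the original exercise. In mtcars, am is 1 for a manual and 0 for an automatic gearbox, and wt is the weight in 1000 lbs.

```r
# y = am (response), formula = am ~ wt, data = mtcars, family = binomial
model <- glm(am ~ wt, data = mtcars, family = binomial)
print(coef(model))    # a (intercept) and b (coefficient of wt)
# Predicted probability of a manual gearbox for a car weighing 3000 lbs
p <- predict(model, newdata = data.frame(wt = 3), type = "response")
print(p)
```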
Mean
Median
Distribution
Covariance
Regression
Non-linear
A vector is a series of data elements of the same basic type. The members of
a vector are known as its components.
A matrix is a two-dimensional data structure that binds together vectors of the
same length. All elements of a matrix are of the same type.
5. It is an interpreted language.
Vectors
Matrices
Arrays
Data frames
General format is
The rbind() function can be used to join two data frames (datasets) by rows. The
two data frames must have the same variables, but they do not have to be in the
same order.
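The following sketch shows that column order does not matter, using two tiny hypothetical data frames. rbind() matches the columns of a data frame by name, so the result keeps the column order of the first argument.

```r
df1 <- data.frame(id = 1:2, name = c("A", "B"), stringsAsFactors = FALSE)
# Same variables, different column order
df2 <- data.frame(name = "C", id = 3L, stringsAsFactors = FALSE)
combined <- rbind(df1, df2)   # columns are matched by name
print(combined)
```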
10. Explain how you can create a table in R without an external file.
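The following sketch shows one possible answer; it is not the only one. You can type a table directly into the script with data.frame(), or tabulate values entered in code with table(). The names and values are made up.

```r
# A table typed directly into the script, no external file needed
marks <- data.frame(
  student = c("Arpita", "Gunjan", "Sumit"),
  score = c(82, 74, 91),
  stringsAsFactors = FALSE
)
print(marks)
# A frequency table built from values entered in code
grades <- c("A", "B", "A", "C", "A")
freq <- table(grades)
print(freq)   # A: 3, B: 1, C: 1
```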
11. Give the command to create a histogram and to remove a vector from
the R workspace?
The hist() and rm() functions are used to create a histogram and to remove a
vector from the R workspace, respectively.
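The following is a short sketch of both commands on an illustrative vector. hist() draws the histogram and also returns the bin counts. After rm(), the name no longer exists in the workspace.

```r
scores <- c(55, 62, 68, 70, 71, 75, 78, 80, 85, 91)
h <- hist(scores, main = "Histogram of scores", col = "lightblue")
# rm() deletes the object from the workspace
rm(scores)
exists("scores")  # FALSE
```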
The "%%" operator gives the remainder of dividing the first vector by the second,
and "%/%" gives the quotient of the integer division of the first vector by the
second.
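The following is a small element-wise example with made-up vectors. In R, %/% rounds the quotient down, and %% takes the sign of the divisor. As a result, (a %/% b) * b + a %% b always gives back a.

```r
a <- c(7, 10, -7)
b <- c(3, 4, 3)
a %% b    # 1 2 2   (remainder)
a %/% b   # 2 2 -3  (quotient, rounded down)
```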
Open source. Being open source has its disadvantages as well as its
advantages. For one, there is no governing body managing R, so there is no
single source of support or quality control. This also means that some of the
packages developed for R are not of the highest quality.
Security. R was not built with security in mind, so it must rely on external
resources to fill these gaps.
install.packages("package_name")
Followed by:
library(package_name)
It’s that simple. The first command installs the package and the second loads
the package into the session.
17. What is a factor variable, and why would you use one?
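A factor variable is a categorical variable whose values (called levels) can be
given as numbers or character strings. The most important reason to use a factor
is that modelling functions such as lm() and glm() treat factors as categorical
variables and create the appropriate dummy (indicator) variables for them. Another
reason is that factors can be more memory-efficient, because each level is stored
once and the data itself is stored as integer codes.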
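The following is a minimal factor example with made-up values. It shows the levels, the integer codes R stores internally, and a count per level. Setting levels explicitly fixes their order, because the default order is alphabetical.

```r
sizes <- c("low", "high", "medium", "low", "low")
f <- factor(sizes, levels = c("low", "medium", "high"))
levels(f)      # "low" "medium" "high"
as.integer(f)  # 1 3 2 1 1  (each value stored as an integer code)
table(f)       # low 3, medium 1, high 1
```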
Shell: This method "uses Shellsort (an O(n^{4/3}) variant from Sedgewick
(1986))," according to the R documentation.
The t.test() function is used to test whether the means of two groups are
equal.
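The following sketch runs a two-sample test on R's built-in sleep data set, which is our choice of example data. Without var.equal = TRUE, t.test() performs the Welch version of the test. A small p-value (conventionally below 0.05) is taken as evidence that the two means differ.

```r
# extra = additional hours of sleep; group = which drug was given (1 or 2)
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]
res <- t.test(g1, g2)   # Welch two-sample t-test by default
print(res$p.value)
```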
lapply() always returns its output as a list, whereas sapply() tries to simplify
the output to a vector or matrix (falling back to a list when it cannot).
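The following is a side-by-side sketch on a small illustrative list. Both functions apply sum() to each element, but only sapply() simplifies the result to a named vector.

```r
nums <- list(a = 1:3, b = 4:6)
l <- lapply(nums, sum)   # a list: $a 6, $b 15
s <- sapply(nums, sum)   # a named vector: a 6, b 15
class(l)  # "list"
class(s)  # "integer"
```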
1. Plotly
2. ggplot2
3. tidyquant
4. geofacet
5. googleVis
6. Shiny