R Practicals
Elements of R Programming
BBA BUSINESS ANALYTICS, Sem-II
2022-23
Faculty
Dr. D. Srinivasa Rao
Practical 1: Creating Data Structures in R
2. Concept: Data structures in R. R provides a number of data structures. A data structure is a
framework for holding different types of data in R.
There are 5 main data structures in R:
1. Vector
2. Matrix
3. Array
4. Data frame
5. List
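A short sketch, assuming base R only (the names v, m, l and df are illustrative), of how these structures differ in the types they can hold: a vector coerces mixed values to one common type, a matrix is simply a two-dimensional array, while a list and the columns of a data frame each keep their own type.

```r
# Atomic vector: one type only; mixed values are coerced to the most
# flexible type (logical < integer < double < complex < character)
v <- c(1, "a", TRUE)
class(v)           # "character": 1 became "1" and TRUE became "TRUE"

# A matrix is an array with exactly two dimensions
m <- matrix(1:6, nrow = 2)
is.array(m)        # TRUE
dim(m)             # 2 3

# A list keeps the type of every element
l <- list(1, "a", TRUE)
sapply(l, class)   # "numeric" "character" "logical"

# A data frame is a list of equal-length columns, each with its own type
df <- data.frame(num = c(1, 2), chr = c("a", "b"), stringsAsFactors = FALSE)
sapply(df, class)  # num: "numeric", chr: "character"
is.list(df)        # TRUE
```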
3. Example: We shall create all five data structures.
4. Procedure:
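- To create a vector we shall use the c() function
- To create a matrix we shall use the matrix() function
- To create an array we shall use the array() function
- To create a data frame we shall use the data.frame() function
- To create a list we shall use the list() function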
##############################
#Data structure in R program
#############################
#Creating a vector
# vector of Doubles
x=c(1.1,2,9,3.4,4.7)
class(x)
#Vector of integers
Y=c(1L,3L,5L)
class(Y)
# character vector
z=c('a','b','c')
class(z)
# logical vector
A=c(T,F,T)
class(A)
# complex vector
B=complex(real=c(1,2,3), imaginary=c(2,3,1))
class(B)
# creating matrices
mymat=matrix(1:10,nrow=2)
mymat
class(mymat)
# creating an Array
myarr=array(1:12,dim=c(2,2,3))
myarr
class(myarr)
#data frame
x1=c(1,2,3,5)
y1=c('a','b','c','d')
z1=c(T,F,T,F)
mydf=data.frame(x1,y1,z1)
mydf
class(mydf)
# creating a list
mylist=list(c(12,3), matrix(1:10,2), mydf, myarr)
mylist
class(mylist)
Screenshot of R syntax:
Practical 2: Basic Mathematical and Statistical Functions in R
2. Concept: Built-in Mathematical and Statistical Functions in R
The following are some of the mathematical functions in R:
sum() # sum of numbers
prod() # product of numbers
seq() # sequence of numbers
rep() # repeat an input
min() # minimum of a numeric vector
max() # maximum of a numeric vector
log() # logarithm with base e (natural logarithm)
exp() # exponential function (e raised to a power)
abs() # absolute value
length() # no. of elements in a vector
dim() # no. of rows and columns of a matrix or data frame
sqrt() # square root
factorial() # factorial of a given number
choose() # number of combinations (nCr)
rank() # ranks of the elements of a vector
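A few of these functions have details worth checking directly. The sketch below (base R only; n and k are illustrative names) shows the base argument of log(), confirms that choose() agrees with the formula n! / (k! (n - k)!), and shows how rank() treats ties.

```r
# log() uses base e unless a 'base' argument is given
log(100, base = 10)   # 2, same as log10(100)
log(8, base = 2)      # 3, same as log2(8)
exp(log(5))           # 5 (up to floating-point rounding): exp() undoes log()

# choose(n, k) = n! / (k! * (n - k)!)
n <- 5
k <- 3
choose(n, k)                                      # 10
factorial(n) / (factorial(k) * factorial(n - k))  # 10

# rank(): tied values share the average of their ranks by default
rank(c(10, 20, 20, 30))  # 1.0 2.5 2.5 4.0
```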
The following are some of the statistical functions in R:
mean() # arithmetic mean
median() # median
sd() # standard deviation of a vector
var() # variance of a vector
quantile() # quantiles (quartiles by default)
skew() # skewness (from the psych package)
range() # minimum and maximum of a vector
cor() # correlation
summary() # descriptive summary of a data frame
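The sketch below (base R only; x is the same vector used for var() in the example) checks three points that are easy to get wrong: var() divides by n - 1, sd() is the square root of var(), and range() returns the minimum and maximum rather than their difference.

```r
x <- c(12, 10, 20, 31, 21)

# Sample variance: sum of squared deviations divided by n - 1
var(x)                                   # 69.7
sum((x - mean(x))^2) / (length(x) - 1)   # 69.7

# Standard deviation is the square root of the variance
sd(x)
sqrt(var(x))

# range() gives c(min, max); the statistical range is max - min
range(x)         # 10 31
diff(range(x))   # 21

# quantile() returns the 0%, 25%, 50%, 75% and 100% points by default
quantile(x)
```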
3. Example:
sum(1:100) # sum of first 100 numbers
prod(1:10) # product of first 10 numbers
seq(1, 30) # sequence from 1 to 30
rep(c(1,2,3),3) # repeat the input 3 times
min(c(2,10,20,30)) # minimum of a numeric vector
max(c(10,30,100,7)) # maximum of a numeric vector
log(10) # logarithm of 10 with base e
exp(20) # e raised to the power 20
abs(c(-2,-3,1)) # absolute values of a vector with negative elements
length(c(2,2,8,9,10)) # no. of elements in the vector
dim(trees) # no. of rows and columns in the trees data frame
sqrt(169) # square root of 169
factorial(5) # factorial of 5
choose(5,3) # 5C3
rank(c(2,10,12,7,9,3,14)) # ranks of the elements in the vector
# Statistical functions:
mean(1:10) # mean of first 10 numbers
median(20:40) # median of numbers from 20 to 40
sd(c(2,2,3,10,10)) # standard deviation of a vector
var(c(12,10,20,31,21)) # variance of a vector
quantile(c(1,3,5,9,10)) # quartiles of a vector
library(psych) # skew() comes from the psych package
skew(c(23,13,14,20,12)) # skewness
range(c(2,10,34,23,18)) # range (minimum and maximum)
cor(x=c(12,18,23,12,10),y=c(1,3,8,4,12)) # correlation between x and y
summary(trees) # descriptive summary of the trees data frame
4. Procedure: We shall execute these functions in an R script.
5. R code output:
####################################
##Lab 2: Mathematical functions in R
####################################
sum(1:100) # sum of first 100 numbers
prod(1:10) # product of first 10 numbers
seq(1, 30) # sequence from 1 to 30
rep(c(1,2,3),3) # repeating an input 3 times
min(c(2,10,20,30)) # minimum of a numeric vector
max(c(10,30,100,7)) # maximum of a numeric vector
log(10) #logarithm with a base e
exp(20) # e raised to the power 20
abs(c(-2,-3,1)) # absolute value of vector with negative elements
length(c(2,2,8,9,10)) #no. of elements in the vector
dim(trees) # no. of rows and columns in trees data frame
sqrt(169)# square root of 169
factorial(5) #factorial of 5
choose(5,3) #5c3
rank(c(2,10,12,7,9,3,14)) # ranking of elements in the vector
################################
##Statistical functions in R
###############################
mean(1:10) #mean of first 10 numbers
median(20:40)# median of numbers from 20 - 40
sd(c(2,2,3,10,10)) # standard deviation of a vector
var(c(12,10,20,31,21)) #variance of vector
quantile(c(1,3,5,9,10)) # quartiles of a vector
library(psych)
skew(c(23,13,14,20,12)) #skewness
range(c(2,10,34,23,18)) #range
cor(x=c(12,18,23,12,10),y=c(1,3,8,4,12)) # correlation between x and y
summary(trees) #descriptive summary of data frames:trees
#############################################
##Practical-3: Indexing and Sub-setting of data frames
#############################################
## creating a numeric vector x and a character vector y
x=c(1,3,6,9,3,3)
y=c('a','b','c')
##subset of first three elements of x
x[1:3]
## subset of third and fifth elements of x
x[c(3,5)]
## subset of last element of y
y[3]
## subsetting of a data frame 'trees'
## method 1: subsetting a column with $ operator
## subset 'Volume' column from trees data frame
trees$Volume
## subset species column from 'iris' data frame
iris$Species
## method 2: using [] operator
## subset first 12 rows of trees data frame
trees[1:12,]
## subset rows 1, 4, 5 and 9 of trees data frame
trees[c(1,4,5,9),]
## subset first two columns of trees data frame
trees[,c(1:2)]
## subset first three rows and first two columns
trees[c(1:3),c(1:2)]
## negative subsetting: drop rows 4 to 31 and column 3 (Volume)
trees[-(4:31),-3]
## method-3: subset() function (columns can be named directly inside subset())
## find trees with height > 75 in trees data frame
subset(trees,Height>75)
## find trees with height > 75 and volume > 18 in trees data frame
subset(trees,Height>75&Volume>18)
## find trees with height > 75, volume > 18 and girth < 18 in trees data frame
subset(trees,Height>75&Volume>18&Girth<18)
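Two points about the methods above can be checked in code: subset() selects the same rows as logical indexing with [ ] (subset() additionally drops rows where the condition is NA, which does not occur in trees), and $ or [[ ]] return a column as a vector while single [ ] returns a data frame. A minimal sketch using the built-in trees data (tall_a and tall_b are illustrative names):

```r
# Logical indexing with [ ] versus subset(): the same rows of trees
tall_a <- trees[trees$Height > 75, ]
tall_b <- subset(trees, Height > 75)
identical(rownames(tall_a), rownames(tall_b))  # TRUE

# $ and [[ ]] extract a column as a plain vector
is.numeric(trees$Volume)        # TRUE
is.numeric(trees[["Volume"]])   # TRUE

# Single [ ] with a column name keeps the data frame structure
is.data.frame(trees["Volume"])  # TRUE

# Negative subsetting from the example: 31 - 28 = 3 rows, 3 - 1 = 2 columns
dim(trees[-(4:31), -3])         # 3 2
```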
3.Output:
Practical-4: Looping Functions in R
1.Concept:
3.Output:
Practical-5: Basic Graphs with R
1.Concept:
b. Box plot:
c. Density plot:
Lattice package:
The lattice package is built on the grid graphics system
It uses a formula interface
Lattice syntax format: function(DV ~ IV | group, data = ...)
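A minimal sketch of the formula interface, assuming the lattice package (shipped with standard R installations) and the built-in iris and trees data: the variable after | splits the plot into one panel per group, and a numeric conditioning variable can first be grouped with cut(). The object names p1 and p2 are illustrative.

```r
library(lattice)

# DV ~ IV | group: one scatter panel of Petal.Length vs Sepal.Length per Species
p1 <- xyplot(Petal.Length ~ Sepal.Length | Species, data = iris)

# A numeric variable can act as the group after cutting it into intervals;
# here Height is split into 2 equal-width height classes
p2 <- xyplot(Volume ~ Girth | cut(Height, breaks = 2), data = trees)

# lattice functions return "trellis" objects; print() draws them
print(p1)
print(p2)
```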
ggplot2 package:
Based on the grammar of graphics
It uses three components in the syntax:
o Data
o Aesthetics (aes)
o Geometry (geom)
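The three components map directly onto a ggplot() call. A sketch assuming ggplot2 is installed and using the built-in iris data: the data frame is the data, aes() maps columns to the x, y and colour aesthetics, and geom_point() is the geometry; further layers are added with +. The names p and p2 are illustrative.

```r
library(ggplot2)

p <- ggplot(data = iris,                                   # 1. data
            aes(x = Sepal.Length, y = Petal.Length,        # 2. aesthetics
                colour = Species)) +
  geom_point()                                             # 3. geometry

# Adding a second geometry layer: a least-squares line for each Species
p2 <- p + geom_smooth(method = "lm", se = FALSE)
print(p2)
```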
3.R-code:
#############################
## Graphs with lattice package
## Graphs for quantitative variables
# Histogram
library(lattice)
histogram(~ Height,data=trees)
## Grouped Histogram
histogram(~Sepal.Length|Species,data=iris)
## Density plot
densityplot(~Sepal.Width,data=iris)
## grouped density plots
densityplot(~Sepal.Width|Species,data=iris)
## Box plot
bwplot(~Petal.Length,data=iris)
## Grouped Box plots
bwplot(~Petal.Length|Species,data=iris)
# Scatter plot
xyplot(Height~Girth,data=trees)
## Bar plot
barchart(~Sepal.Length|Species,data=iris)
######################
##plots with ggplot2
#####################
## univariate plots
## plots in Quantitative variables
library(ggplot2)
##Histogram
ggplot(data=trees,aes(x=Height))+geom_histogram(bins=10)
## Box plot
ggplot(data=trees,aes(x=Girth))+geom_boxplot()
##Density plot
ggplot(data=trees,aes(x=Volume))+geom_density()
##plots for qualitative variable
## Bar plot
ggplot(data=chickwts,aes(x=feed))+geom_bar()
##Pie chart: one stacked bar wrapped into polar coordinates (theta = "y")
ggplot(data=chickwts,aes(x="",fill=feed))+geom_bar(width=1)+coord_polar(theta="y")
##Bivariate plots
## Scatter plot
ggplot(data=trees,aes(x=Height,y=Volume))+geom_point()
4. Screenshot of R syntax:
5.Graphs:
# Lattice Package
#Quantitative variables
1.Histogram:
2.Grouped Histogram:
3.Density plot:
4.Grouped Density Plot:
5.Box plot:
6.Grouped Box plots:
7.Scatter plot:
8.Bar plot:
## ggplot2 package
## Univariate plots
##Quantitative variables
1. Histogram:
2. Box plot:
3.Density plot:
# Plots for Qualitative Variables:
1. Bar plot:
2. Pie chart:
##Bivariate plot:
1. Scatter plot:
Practical 7: Correlation and Regression with R
1.Concept:
a) Correlation:
- Correlation is a measure of linear association
between two numeric variables
- Correlation can be positive, negative or zero
- We can measure correlation by two methods:
o Pearson’s Correlation Coefficient
o Spearman’s Rank Correlation Coefficient
- The correlation between two variables after removing (controlling for)
the effect of one or more other variables is called Partial Correlation
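The sketch below (base R, built-in mtcars data; x, y, z, r_manual and r_partial are illustrative names) computes Pearson's coefficient from its definition cov(x, y) / (sd(x) sd(y)), shows that Spearman's coefficient is Pearson's coefficient applied to the ranks, and obtains the partial correlation of mpg and hp controlling for wt by correlating the residuals of two regressions on wt.

```r
x <- mtcars$mpg
y <- mtcars$hp
z <- mtcars$wt

# Pearson's r from its definition
r_manual <- cov(x, y) / (sd(x) * sd(y))
all.equal(r_manual, cor(x, y))                                     # TRUE

# Spearman's rho is Pearson's r computed on the ranks
all.equal(cor(rank(x), rank(y)), cor(x, y, method = "spearman"))  # TRUE

# Partial correlation of mpg and hp controlling for wt:
# correlate what is left of each variable after regressing it on wt
r_partial <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
r_partial
```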
b) Regression:
Regression tries to find the functional relationship between a dependent
variable and one or more independent variables
Regression is of two types: Linear and Non-linear
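For simple linear regression the least-squares estimates have a closed form: slope b1 = cov(x, y) / var(x) and intercept b0 = mean(y) - b1 mean(x). The sketch below (base R, built-in trees data; the names model, b1, b0 and mse are illustrative) checks these against lm() and computes the mean squared error as the average squared residual.

```r
# Simple linear regression of Volume (DV) on Height (IV)
x <- trees$Height
y <- trees$Volume
model <- lm(y ~ x)

# Least-squares estimates by hand
b1 <- cov(x, y) / var(x)
b0 <- mean(y) - b1 * mean(x)
all.equal(unname(coef(model)), c(b0, b1))    # TRUE

# Mean squared error: average of the squared residuals
mse <- mean(residuals(model)^2)
all.equal(mse, mean((fitted(model) - y)^2))  # TRUE
mse
```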
2. Example:
## Simple Linear Regression Model
#DV = Volume, IV = Height
Model=lm(Volume~Height,data=trees)
## Summary of Model
summary(Model)
## Finding MSE
mse=mean((Model$fitted.values-trees$Volume)^2)
##Multiple Linear Regression Model
#DV = Volume, IVs = Height, Girth
Model=lm(Volume~Height+Girth,data=trees)
## Summary of Model
summary(Model)
## Finding MSE
mse=mean((Model$fitted.values-trees$Volume)^2)
4. R-Output:
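## Pearson correlation between mpg and hp
cor(mtcars$mpg,mtcars$hp)
## Spearman rank correlation between mpg and hp
cor(mtcars$mpg,mtcars$hp,method = 'spearman')
## Correlation Matrix
cor(mtcars)
## Simple linear regression: DV = Volume, IV = Height
Model=lm(Volume~Height,data=trees)
summary(Model)
##Finding MSE
MSE=mean((Model$fitted.values-trees$Volume)^2)
MSE
## Multiple linear regression: DV = Volume, IVs = Height, Girth
Model=lm(Volume~Height+Girth,data=trees)
##Summary
summary(Model)
#Finding MSE
MSE=mean((Model$fitted.values-trees$Volume)^2)
MSE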
5. R-Syntax:
Practical-8: Sampling with R
1. Concept:
i. Sampling-
Sampling is the method of selecting a sample
from a population, with or without replacement.
ii. Types of Sampling-
There are two types of sampling:
o Probability sampling
o Non-probability sampling
iii. Probability Sampling- It is based on the rules of
probability: every unit of the population has a known,
non-zero chance of being selected. There are four types
of probability sampling:
o Simple Random Sampling
o Systematic Random Sampling
o Stratified Random Sampling
o Cluster Sampling
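Each of the four probability sampling methods can be carried out with base R's sample(). A sketch, assuming the 150 rows of the built-in iris data stand in for the population; the sample sizes, the interval k = 15 and the ten blocks of 15 rows used as clusters are all illustrative choices, not part of the original practical.

```r
pop <- iris                 # illustrative population of N = 150 units
N <- nrow(pop)
set.seed(123)               # reproducible random draws

# 1. Simple random sampling: 10 units without and with replacement
srs_wor <- pop[sample(N, 10), ]
srs_wr  <- pop[sample(N, 10, replace = TRUE), ]

# 2. Systematic sampling: random start in 1..k, then every k-th unit
k <- 15                     # sampling interval = N / sample size (150 / 10)
start <- sample(k, 1)
sys_s <- pop[seq(start, N, by = k), ]

# 3. Stratified sampling: 5 units at random from each Species (stratum)
strata  <- split(pop, pop$Species)
strat_s <- do.call(rbind, lapply(strata, function(s) s[sample(nrow(s), 5), ]))

# 4. Cluster sampling: split the rows into 10 clusters of 15 consecutive
#    rows (a hypothetical clustering) and keep every unit of 3 random clusters
cluster_id <- rep(1:10, each = 15)
chosen     <- sample(1:10, 3)
clus_s     <- pop[cluster_id %in% chosen, ]
```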
Practical-9: Decomposition of Time Series with R
1. Concept:
Decomposition in time series means separating
the components of a time series.
There are two types of decomposition:
o Additive: Y = T + C + S + R
o Multiplicative: Y = T * C * S * R
o Here Y is the observed series, T is Trend, C is Cycles,
S is Seasonality, and R is the Random component
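2. Procedure:
From a given time series we shall calculate
Trend, Seasonality and Cycles separately and
then remove these components from the time series
(subtract them in the additive model, divide them
out in the multiplicative model) to get the random
component. Note that R's decompose() estimates the
trend with a moving average and does not separate
the cycle from the trend.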
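The procedure can be reproduced step by step for the multiplicative model. A sketch in base R on the JohnsonJohnson series from the example: the trend is a centred moving average over one year, the seasonal factors are quarter-by-quarter averages of the detrended series rescaled to average 1, and the random part is what remains after dividing out both. As in R's decompose(), which the sketch is compared against, the cycle is not estimated separately but stays inside the trend. The variable names are illustrative.

```r
x <- JohnsonJohnson              # quarterly series, frequency 4
f <- frequency(x)

# 1. Trend: centred moving average over one year
#    (weights 1/8, 1/4, 1/4, 1/4, 1/8 for quarterly data)
trend <- stats::filter(x, c(0.5, rep(1, f - 1), 0.5) / f)

# 2. Seasonality: average the detrended values quarter by quarter,
#    then rescale so the four seasonal factors average to 1
detrended <- x / trend
fig <- tapply(as.numeric(detrended), as.integer(cycle(x)), mean, na.rm = TRUE)
fig <- as.numeric(fig / mean(fig))
seasonal <- fig[as.integer(cycle(x))]

# 3. Random: what is left after dividing out trend and seasonality
random <- x / (trend * seasonal)

# Compare with R's built-in decompose()
mod <- decompose(x, type = "multiplicative")
all.equal(as.numeric(mod$seasonal), seasonal)  # TRUE
```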
3. Example:
We shall consider the built-in R dataset 'JohnsonJohnson'
(quarterly earnings per Johnson & Johnson share, 1960 to 1980) for decomposition.
R-code for decomposition of
‘JohnsonJohnson’ data.
4. R-CODE:
#############
## Lab-9
#############
## data: JohnsonJohnson
#######################
data("JohnsonJohnson")
## Structure of data
str(JohnsonJohnson)
## Plot the original series
plot(JohnsonJohnson)
## Multiplicative decomposition
mod=decompose(JohnsonJohnson,type="multiplicative")
mod_trend=mod$trend
mod_season=mod$seasonal
mod_rand=mod$random
## Plot all components of the multiplicative model
plot(mod)
5. R-output: