R Practicals

A LAB Manual of
Elements of R Programming
BBA BUSINESS ANALYTICS, Sem-II
2022-23
Faculty
Dr. D. Srinivasa Rao
INDEX
PNO Name of the Experiment Remarks
1 Introduction to R programming: creating Data Structures in R
2 Basic Mathematical and Statistical Functions in R
3 Indexing and sub-setting of Data Frames
4 Looping Functions in R
5 Basic Graphs with R
6 Advanced Graphs with R packages
7 Correlation and Regression with R
8 Sampling with R
9 Time Series Analysis: Decomposition
10
Signature of Faculty
Declaration
The data and information created and presented in this manual are
original to the best of my knowledge.
Signature of Student
Practical 1: Creating Data Structures in R
2. Concept: Data structures in R. R provides a number of data
structures. A data structure is a framework for holding data of
different types in R.
There are 5 main data structures in R:
1. Vector
2. Matrix
3. Array
4. Data frame
5. List
3. Example: We shall create all five data structures.
4. Procedure:
- To create a vector we shall use the c() function
- To create a matrix we shall use the matrix() function
- To create an array we shall use the array() function
- To create a data frame we shall use the data.frame() function
- To create a list we shall use the list() function
5.R code:
##############################
#Data structure in R program
#############################
#Creating a vector
# vector of Doubles
x=c(1.1,2,9,3.4,4.7)
class(x)
#Vector of integers
Y=c(1L,3L,5L)
class(Y)
# character vector
z=c('a','b','c')
class(z)
# logical vector
A=c(T,F,T)
class(A)
# complex vector
B=complex(real=c(1,2,3), imaginary=c(2,3,1))
class(B)
# creating matrices
mymat=matrix(1:10,nrow=2)
mymat
class(mymat)
# creating an Array
myarr=array(1:12,dim=c(2,2,3))
myarr
class(myarr)
#data frame
x1=c(1,2,3,5)
y1=c('a','b','c','d')
z1=c(T,F,T,F)
mydf=data.frame(x1,y1,z1)
mydf
class(mydf)
# creating a list
mylist=list(c(12,3), matrix(1:10,2), mydf, myarr)
mylist
class(mylist)
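Once the five structures exist, base R helpers show how they differ in shape and content. The following is a minimal sketch that assumes only base R; the object names (v, m, a, df, lst) are illustrative and are not part of the lab script above.

```r
# Illustrative objects (names are assumptions, not from the lab script)
v   <- c(1.1, 2, 9)                                   # vector: one dimension, one type
m   <- matrix(1:10, nrow = 2)                         # matrix: two dimensions, one type
a   <- array(1:12, dim = c(2, 2, 3))                  # array: any number of dimensions
df  <- data.frame(id = 1:3, grp = c("a", "b", "c"))   # columns may differ in type
lst <- list(v, m, df)                                 # list: elements may be any object

length(v)            # number of elements in the vector
dim(m)               # rows and columns of the matrix
dim(a)               # extent of each array dimension
str(df)              # compact structure of the data frame
lapply(lst, class)   # class of each list element
```

str() is usually the quickest way to check what a newly created object contains before working with it.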
6. Screenshot of R syntax:
Practical 2: Basic Mathematical and Statistical Functions in R
2. Concept: Built-in mathematical and statistical functions in R
The following are some of the mathematical functions in R:
- sum() # sum of numbers
- prod() # product of numbers
- seq() # sequence of numbers
- rep() # repeating an input
- min() # minimum of a numeric vector
- max() # maximum of a numeric vector
- log() # natural logarithm (base e)
- exp() # exponential function (e raised to a power)
- abs() # absolute value
- length() # no. of elements in a vector
- dim() # no. of rows and columns in a data frame
- sqrt() # square root
- factorial() # factorial of a given number
- choose() # combinations
- rank() # ranking of numbers
The following are some of the statistical functions in R:
- mean() # arithmetic mean
- median() # median
- sd() # standard deviation of a vector
- var() # variance of a vector
- quantile() # quantiles
- skew() # skewness (from the psych package)
- range() # minimum and maximum
- cor() # correlation
- summary() # descriptive summary of data frames
3. Example:
- sum(1:100) # sum of first 100 numbers
- prod(1:10) # product of first 10 numbers
- seq(1,30) # sequence from 1 to 30
- rep(c(1,2,3),3) # repeating an input 3 times
- min(c(2,10,20,30)) # minimum of a numeric vector
- max(c(10,30,100,7)) # maximum of a numeric vector
- log(10) # natural logarithm (base e) of 10
- exp(20) # e raised to the power 20
- abs(c(-2,-3,1)) # absolute value of a vector with negative elements
- length(c(2,2,8,9,10)) # no. of elements in the vector
- dim(trees) # no. of rows and columns in trees data frame
- sqrt(169) # square root of 169
- factorial(5) # factorial of 5
- choose(5,3) # 5C3
- rank(c(2,10,12,7,9,3,14)) # ranking of elements in the vector
# Statistical functions:
- mean(1:10) # mean of first 10 numbers
- median(20:40) # median of numbers from 20 to 40
- sd(c(2,2,3,10,10)) # standard deviation of a vector
- var(c(12,10,20,31,21)) # variance of a vector
- quantile(c(1,3,5,9,10)) # quartiles of a vector
- library(psych) # skew() comes from the psych package
- skew(c(23,13,14,20,12)) # skewness
- range(c(2,10,34,23,18)) # minimum and maximum
- cor(x=c(12,18,23,12,10),y=c(1,3,8,4,12)) # correlation between x and y
- summary(trees) # descriptive summary of data frame: trees
4. Procedure: We shall execute these functions in an R script.
5.R code output:
####################################
##Lab 2: Mathematical function in R
####################################
sum(1:100) # sum of first 100 numbers
prod(1:10) # product of first 10 numbers
seq(1,30) # sequence from 1 to 30
rep(c(1,2,3),3) # repeating an input 3 times
min(c(2,10,20,30)) # minimum of a numeric vector
max(c(10,30,100,7)) # maximum of a numeric vector
log(10) #logarithm with a base e
exp(20) # e raised to the power 20
abs(c(-2,-3,1)) # absolute value of vector with negative elements
length(c(2,2,8,9,10)) #no. of elements in the vector
dim(trees) # no. of rows and columns in trees data frame
sqrt(169)# square root of 169
factorial(5) #factorial of 5
choose(5,3) #5c3
rank(c(2,10,12,7,9,3,14)) # ranking of elements in the vector
################################
##Statistical functions in R
###############################
mean(1:10) #mean of first 10 numbers
median(20:40)# median of numbers from 20 - 40
sd(c(2,2,3,10,10)) # standard deviation of a vector
var(c(12,10,20,31,21)) #variance of vector
quantile(c(1,3,5,9,10)) # quartiles of a vector
library(psych)
skew(c(23,13,14,20,12)) #skewness
range(c(2,10,34,23,18)) #range
cor(x=c(12,18,23,12,10),y=c(1,3,8,4,12)) # correlation between x and y
summary(trees) #descriptive summary of data frames:trees
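Skewness is the only function above that needs an extra package. As a rough sketch of what it measures, the moment-based coefficient can be computed in base R. The helper name skew_moment is hypothetical, and psych::skew() may apply a different small-sample correction, so its value can differ slightly from this one.

```r
# Hypothetical helper (not from the manual): moment-based skewness
skew_moment <- function(x) {
  d  <- x - mean(x)   # deviations from the mean
  m2 <- mean(d^2)     # second central moment
  m3 <- mean(d^3)     # third central moment
  m3 / m2^(3/2)
}

skew_moment(c(23, 13, 14, 20, 12))   # positive: the right tail is longer
skew_moment(c(1, 2, 3, 4, 5))        # symmetric data: 0
```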
Screen shot of R syntax:

Practical-3: Indexing and Sub-setting of Data Frames
1. Concept:
Indexing or sub-setting means extracting a part of the data.
The data could be a vector or a data frame.
- To subset a vector we use the [] operator
- To subset a data frame we use three methods:
1. The $ operator
2. The [i,j] operator, 'i' for rows and 'j' for columns
3. The subset() function
2. Examples & Procedure:
#############################################
##Practical-3: Indexing and Sub-setting of data frames
#############################################
## creating a numeric vector x and character vector y
x=c(1,3,6,9,3,3)
y=c('a','b','c')
##subset of first three elements of x
x[1:3]
## subset of third and fifth element of x
x[c(3,5)]
## subset of last element of y
y[3]
## subsetting of a data frame 'trees'
## method 1: subsetting a column with $ operator
## subset 'volume' column from trees data frame
trees$Volume
## subset species column from 'iris' data frame
iris$Species
## method 2: using [] operator
## subset first 12 rows of trees data frame
trees[1:12,]
## subset rows 1, 4, 5 and 9 of trees data frame
trees[c(1,4,5,9),]
## subset first two columns of trees data frame
trees[,c(1:2)]
## subset first three rows and first two columns
trees[c(1:3),c(1:2)]
## negative subsetting: drop rows 4 to 31 and the third column
trees[-(4:31),-3]
## method-3:subset() function
## find trees with height > 75 in trees data frame
subset(trees,Height>75)
## find trees with height > 75 and volume > 18 in trees data frame
subset(trees,Height>75&Volume>18)
## find trees with height > 75, volume > 18 and girth < 18 in trees data frame
subset(trees,Height>75&Volume>18&Girth<18)
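The subset() function and the [i,j] operator can express the same row filter. The sketch below, which assumes only the built-in trees data, writes one condition both ways and then uses the select argument of subset() to keep chosen columns; the object names by_subset and by_bracket are illustrative.

```r
# Same filter written two ways on the built-in trees data
by_subset  <- subset(trees, Height > 75 & Volume > 18)
by_bracket <- trees[trees$Height > 75 & trees$Volume > 18, ]

nrow(by_subset) == nrow(by_bracket)   # both keep the same rows

# subset() can also pick columns with its select argument
subset(trees, Height > 75, select = c(Girth, Volume))
```

Inside subset() the column names can be written without the trees$ prefix, which keeps long conditions readable.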
3.Output:
Practical-4: Looping Functions in R
1.Concept:
Looping means repeatedly applying a function to the elements of a vector or a data
frame. In R programming there are several looping functions:
1. tapply()
2. apply()
3. lapply()
4. sapply()
o tapply() applies a function to a vector separately for each group.
o apply() applies a function to the rows or the columns of a data frame or matrix.
o lapply() applies a function to each column of a data frame (or each element of a list) and returns a list.
o sapply() works like lapply() but simplifies the output to a vector or matrix where possible.
2.Examples and Procedure:

#############################################
##Practical-4: Looping functions in R
#############################################
## tapply() function
## find group (species) wise arithmetic mean of Sepal.Length of iris data
tapply(iris$Sepal.Length,iris$Species,mean)
## find group wise median of petal width of iris data
tapply(iris$Petal.Width,iris$Species,median)
## apply() function
## find standard deviation of all variables in trees data frame
apply(trees,2,sd)
## find summary of all variables in trees data frame
apply(trees,2,summary)
### lapply() function: applies a function to each column
## the output is a list
lapply(trees,var)
## sapply(): similar to lapply() but simplifies the output
sapply(trees,var)
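The script above uses apply() only column-wise (second argument 2). As a small sketch on the same trees data, the call below goes row-wise (second argument 1) and then compares three ways of getting column means. vapply() is a base R relative of sapply() that is not covered in the manual and is shown here only for comparison.

```r
# Row-wise: MARGIN = 1 applies the function to every row
row_totals <- apply(trees, 1, sum)
head(row_totals)

# Column-wise means three ways
apply(trees, 2, mean)              # named numeric vector
lapply(trees, mean)                # list, one element per column
vapply(trees, mean, numeric(1))    # like sapply(), but the result type is checked
```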
3.Output:
Practical-5: Basic Graphs with R
1. Concept:
- Graphs and charts convert numbers into visuals
- The type of graph depends on the type of data
- We have two types of data: Quantitative and Qualitative
- We have three types of graphs and charts:
o Univariate
o Bivariate
o Multivariate
- From the above classification we have:
o Plots for univariate quantitative variables
o Plots for univariate qualitative variables
o Plots for bivariate quantitative variables
o Plots for bivariate qualitative variables
o Plots for multivariate quantitative variables
2. Examples and Procedure:
- Plots for univariate quantitative variables:
o Histogram
o Box plot
o Density plot
- Plots for univariate qualitative variables:
o Pie chart
o Bar chart
- Plots for bivariate quantitative variables:
o Scatter plot
- Plots for bivariate qualitative variables:
o Stacked bar plot
o Side by side bar plot
- Plots for multivariate quantitative variables:
o Heat map
3. R code:
#################################
### Practical-5:Basic Graphs with R
###############################
###Plots for univariate quant variables
## Histogram
hist(trees$Girth)
## Box plot
boxplot(iris$Sepal.Length)
## Density plot
plot(density(chickwts$weight))
###Plots for univariate qualitative variables
## Pie chart
pie(table(chickwts$feed))
## Bar chart
barplot(table(chickwts$feed))
## Plots for Bivariate Quant variables
##Scatter plot
plot(iris$Sepal.Length,iris$Sepal.Width)
## Plots for Bivariate Qualitative variables
## stacked bar diagram
## create gender and religion
gender=rep(c('m','f','m'),30)
religion=rep(c('H','M','C','O'),c(50,20,10,10))
barplot(table(gender,religion))
## side by side barplot
barplot(table(gender,religion),beside = T)
### Multivariate plots
## Heat map
library(psych)
## cor.plot() draws a heat map of the correlation matrix
cor.plot(trees)
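If the psych package is not installed, a similar picture can be drawn with base R alone. This is a sketch of an alternative, not the method used in the manual: it computes the correlation matrix with cor() and passes it to heatmap() from the stats package. The name cor_mat is illustrative.

```r
# Correlation matrix of the three numeric columns in trees
cor_mat <- cor(trees)

# Draw it as a heat map; Rowv/Colv = NA keeps the original column order
# and scale = "none" plots the correlations as they are
heatmap(cor_mat, Rowv = NA, Colv = NA, scale = "none",
        main = "Correlation heat map: trees")
```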
4.Screen shot of R syntax:
5.Graphs:
#Univariate quant variable:
a. Histogram:
b. Box plot:
c. Density plot:
# Univariate Qualitative variables

a.Pie chart:
b. Bar plot:
# Bivariate of Quant variables:

a. Scatter plot:
#Bivariate for Qualitative variables:
a. Stacked bar diagram:
b. Side by side bar plot:

#Multivariate plots Quant variables :
a. Heat Map:
Practical-6: Advanced Graphs with R packages
1. Concept:
- In R there are two popular packages for advanced graphs: lattice
and ggplot2
- The lattice package is built on the grid graphics system
- The ggplot2 package is based on the grammar of graphics
- We can draw both basic and advanced graphs with these
packages
2. Examples and Procedure:
Lattice package:
- The lattice package is built on the grid graphics system
- It uses a formula interface
- Lattice syntax format: function(DV ~ IV | group, data)
ggplot2 package:
- Based on the grammar of graphics
- It uses three components in the syntax
o Data
o Aesthetics (aes)
o Geometry (geom)
3.R-code:
#############################
## Graphs with lattice package
## Graphs for quantitative variables
# Histogram
library(lattice)
histogram(~ Height,data=trees)
## Grouped Histogram
histogram(~Sepal.Length|Species,data=iris)
## Density plot
densityplot(~Sepal.Width,data=iris)
## grouped density plots
densityplot(~Sepal.Width|Species,data=iris)
## Box plot
bwplot(~Petal.Length,data=iris)
## Grouped Box plots
bwplot(~Petal.Length|Species,data=iris)
# Scatter plot
xyplot(Height~Girth,data=trees)
## Bar plot
barchart(~Sepal.Length|Species,data=iris)
######################
##plots with ggplot2
#####################
## univariate plots
## plots in Quantitative variables
library(ggplot2)
##Histogram
ggplot(data=trees,aes(x=Height))+geom_histogram(bins=10)
## Box plot
ggplot(data=trees,aes(x=Girth))+geom_boxplot()
##Density plot
ggplot(data=trees,aes(x=Volume))+geom_density()
##plots for qualitative variable
## Bar plot
ggplot(data=chickwts,aes(x=feed))+geom_bar()
##Pie chart (a single stacked bar drawn in polar coordinates)
ggplot(data=chickwts,aes(x="",fill=feed))+geom_bar(width=1)+coord_polar(theta="y")
##Bivariate plots
## Scatter plot
ggplot(data=trees,aes(x=Height,y=Volume))+geom_point()
4.Screen shot of R syntax:
5.Graphs:
# Lattice Package
#Quantitative variables
1.Histogram:
2.Grouped Histogram:
3.Density plot:
4.Grouped Density Plot:
5.Box plot:
6.Grouped Box plots:
7.Scatter plot:
8.Bar plot:
## ggplot2 package
## Univariate plots
##Quantitative variables
1. Histogram:
2. Box plot:
3.Density plot:
# Plots for Qualitative Variables:
1. Bar plot:
2. Pie chart:
##Bivariate plot:
1. Scatter plot:
Practical 7: Correlation and Regression with R
1.Concept:
a) Correlation:
- Correlation is a measure of linear association
between two numeric variables
- Correlation can be positive, negative or zero
- We can measure correlation by two methods:
o Pearson's Correlation Coefficient
o Spearman's Rank Correlation Coefficient
- Correlation between two variables after removing the effect
of the other variables is called Partial Correlation
b) Regression:
- Regression tries to find a functional relationship
between the dependent and independent variables
- Regression is of two types: Linear and Non-linear
2. Example:
- We shall consider the mtcars data from base R.
- We use the R function cor() to find both coefficients: Pearson is
the default, and method='spearman' gives Spearman's coefficient
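The two coefficients are closely related: Spearman's coefficient is Pearson's coefficient computed on the ranks of the data. A short sketch on the mtcars columns mpg and hp illustrates this; the variable names x, y, pearson, spearman and on_ranks are illustrative.

```r
x <- mtcars$mpg
y <- mtcars$hp

pearson  <- cor(x, y)                        # default method = "pearson"
spearman <- cor(x, y, method = "spearman")

# Spearman's coefficient is Pearson's coefficient computed on the ranks
on_ranks <- cor(rank(x), rank(y))
all.equal(spearman, on_ranks)
```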
## Simple Linear Regression Model:
- We shall use the trees data set for this
- DV=Volume , IV=Height
- The R function for linear regression is lm()
3. Procedure: We shall execute them in an R script.
> cor(mtcars) # Correlation matrix
## Simple Linear Regression Model
> Model=lm(Volume~Height, data=trees)
## Summary of Model
> summary(Model)
## Finding MSE
> mse=mean((Model$fitted.values-trees$Volume)^2)
## Multiple Linear Regression Model
# DV=Volume , IVs=Height, Girth
> Model=lm(Volume~Height+Girth, data=trees)
## Summary of Model
> summary(Model)
## Finding MSE
> mse=mean((Model$fitted.values-trees$Volume)^2)
4. R-Output:
## Finding correlation coefficients
## Pearson correlation coefficient between mpg and hp of mtcars data
cor(mtcars$mpg,mtcars$hp)
## Spearman's Rank Correlation between mpg and hp
cor(mtcars$mpg,mtcars$hp,method = 'spearman')
## Correlation Matrix
cor(mtcars)
##Simple Linear Regression
Model=lm(Volume~Height,data=trees)
summary(Model)
##R-Square value is 0.3579
##Finding MSE
MSE=mean((Model$fitted.values-trees$Volume)^2)
MSE
## We find R-Square as 0.3579 and MSE as 167.89
##Multiple Linear Regression
Model=lm(Volume~Height+Girth,data=trees)
##Summary
summary(Model)
## we find the adjusted R-Square value is 0.9442
#Finding MSE
MSE=mean((Model$fitted.values-trees$Volume)^2)
MSE
## We find that MSE is 13.61
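Both R-Square and MSE come from the residuals of the fitted model. The sketch below recomputes them by hand for the multiple regression model, so the numbers printed by summary() can be checked. The names fit, res, sse and sst are illustrative and not part of the script above.

```r
fit <- lm(Volume ~ Height + Girth, data = trees)

res <- residuals(fit)                               # observed minus fitted
sse <- sum(res^2)                                   # residual sum of squares
sst <- sum((trees$Volume - mean(trees$Volume))^2)   # total sum of squares

r_squared <- 1 - sse / sst       # matches summary(fit)$r.squared
mse       <- sse / nrow(trees)   # same as mean(res^2)

c(r_squared = r_squared, mse = mse)
```

Note that this MSE divides by the number of observations, as the script above does; the residual standard error reported by summary() divides by the residual degrees of freedom instead.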
5. R-Syntax:
Practical-8: Sampling with R
1. Concept:
i. Sampling-
- Sampling means the method of taking a sample
from a population with or without replacement.
ii. Types of Sampling-
- There are two types of sampling
o Probability sampling
o Non-probability sampling
iii. Probability Sampling- It is based on the rules of
probability. There are four types of probability
sampling:
o Simple Random Sampling
o Systematic Random Sampling
o Stratified Random Sampling
o Cluster Sampling
2. Procedure: We shall execute three of these sampling
methods in R: simple random, systematic and stratified sampling.
# Simple random sampling with/without replacement of a vector
###############################
##Practical 8- sampling with R
###############################
## simple random sampling
##Create a population of 100 natural numbers
pop=1:100
##Random sample of 20 members without replacement
sample(pop,size = 20)
##Random sample of 20 members with replacement
sample(pop,size=20,replace = T)
##Systematic random sampling
##from 20 natural numbers take a systematic random sample of 3 elements
##code for systematic sampling
obtain_sys=function(N,n){
k=floor(N/n)      # sampling interval; floor keeps every index within 1..N
r=sample(1:k,1)   # random start within the first interval
seq(r,r+k*(n-1),k)}
obtain_sys(20,3)
##Stratified Random Sampling
## we shall do stratified random sampling on chickwts data
library(sampling)
sampling::strata(chickwts,stratanames=c('feed'),size=c(4,4,3,2,3,3))
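Stratified sampling can also be sketched in base R, without the sampling package: split the row numbers by stratum, draw from each group, and combine. The example below assumes an equal sample of 2 rows per feed type, which is a choice made for illustration and differs from the sizes used above. The names rows_by_feed, picked and strat_sample are hypothetical.

```r
set.seed(1)   # arbitrary seed, only to make the draw reproducible

# Split row numbers by feed type, then draw 2 rows from each stratum
rows_by_feed <- split(seq_len(nrow(chickwts)), chickwts$feed)
picked       <- unlist(lapply(rows_by_feed, function(idx) sample(idx, 2)))

strat_sample <- chickwts[picked, ]
table(strat_sample$feed)   # 2 rows per feed type
```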
3. R-output screen shots:
Practical-9: Time Series Analysis: Decomposition
1. Concept:
- Decomposition in time series means separating
the components of a time series.
- There are two types of decomposition:
o Additive: T+C+S+R
o Multiplicative: T*C*S*R
o Here T is Trend, C is Cycle, S is
Seasonality, and R is the Random component
2. Procedure:
- From a given time series we shall estimate
Trend, Seasonality and Cycles separately and
then remove these three components from the
time series to get the random component.
3. Example:
- We shall consider the built-in R data set
'JohnsonJohnson' for decomposition.
- R code for decomposition of the
'JohnsonJohnson' data follows.
4. R-CODE:
#############
## Lab-9
#############
## Decomposition of Time series data
## data: JohnsonJohnson
#######################
# Loading the data
data("JohnsonJohnson")
## Structure of data
str(JohnsonJohnson)
## it is quarterly time series data
## Examining the time series plot
plot(JohnsonJohnson)
## we find that the seasonal variation increases with the level of the series
## so we shall use the multiplicative model: T*S*R
mod=decompose(JohnsonJohnson,type="multiplicative")
## Finding trend,seasonality and random components

mod_trend=mod$trend
mod_season=mod$seasonal
mod_rand=mod$random
## plotting the multiplicative decomposition of JohnsonJohnson data
plot(decompose(JohnsonJohnson,type="multiplicative"))
## the plot shows the observed series and all three components
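A quick way to check a multiplicative decomposition is to multiply the three components back together and compare the result with the original series. The sketch below does this for the JohnsonJohnson data; the names rebuilt and ok are illustrative.

```r
mod <- decompose(JohnsonJohnson, type = "multiplicative")

# Rebuild the series from its parts; the ends are NA because the
# moving-average trend is undefined there
rebuilt <- mod$trend * mod$seasonal * mod$random
ok      <- !is.na(rebuilt)

all.equal(as.numeric(rebuilt[ok]), as.numeric(JohnsonJohnson[ok]))

mod$figure   # the four quarterly seasonal indices
```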
5. R-output:
