
R Practicals


A LAB Manual of

Elements of R Programming
BBA BUSINESS ANALYTICS, Sem-II
2022-23

Faculty
Dr. D. Srinivasa Rao
INDEX

PNO  Name of the Experiment  Remarks

1 Introduction to R programming: creating Data Structures in R
2 Basic Mathematical and Statistical Functions in R
3 Indexing and sub-setting of Data Frames
4 Looping Functions in R
5 Basic Graphs with R
6 Advanced Graphs with R packages
7 Correlation and Regression with R
8 Sampling with R
9
10

Signature of Faculty
Declaration

The data and information created and presented in this manual are
original to the best of my knowledge.

Signature of Student
Practical 1: Creating Data Structures in R
2. Concept: Data structures in R. R provides a number of
data structures. A data structure is a framework for
holding data of different types in R.
There are 5 main data structures in R:
1. Vector
2. Matrix
3. Array
4. Data frame
5. List
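To make the difference between these structures concrete, here is a small sketch showing that a vector holds a single type while a list keeps each element's own type. The values and the object names v, l and df are illustrative assumptions, not part of the manual's exercises.

```r
# A vector is homogeneous: mixing types coerces everything to one type
v <- c(1, "a", TRUE)
class(v)            # "character"

# A list is heterogeneous: each element keeps its own type
l <- list(1, "a", TRUE)
sapply(l, class)    # "numeric" "character" "logical"

# A data frame is a list of equal-length columns, so types can differ by column
df <- data.frame(num = c(1, 2), chr = c("a", "b"))
sapply(df, class)
```

Matrices and arrays behave like vectors in this respect: all their elements share one type.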
3. Example: We shall create all five data Structures.
4. Procedure:
- To create a vector we shall use the c() function
- To create a matrix we shall use the matrix() function
- To create an array we shall use the array() function
- To create a data frame we shall use the data.frame() function
- To create a list we shall use the list() function
5.R code:

##############################
#Data structure in R program
#############################
#Creating a vector
# vector of Doubles
x=c(1.1,2,9,3.4,4.7)
class(x)
#Vector of integers
Y=c(1L,3L,5L)
class(Y)
# character vector
z=c('a','b','c')
class(z)
# logical vector
A=c(T,F,T)
class(A)
# complex vector
B=complex(real=c(1,2,3), imaginary=c(2,3,1))
class(B)
# creating matrices
mymat=matrix(1:10,nrow=2)
mymat
class(mymat)
# creating an Array
myarr=array(1:12,dim=c(2,2,3))
myarr
class(myarr)
#data frame
x1=c(1,2,3,5)
y1=c('a','b','c','d')
z1=c(T,F,T,F)
mydf=data.frame(x1,y1,z1)
mydf
class(mydf)
# creating a list
mylist=list(c(12,3), matrix(1:10,2), mydf, myarr)
mylist
class(mylist)

Screenshot of R syntax
Practical 2: Basic Mathematical and Statistical Functions in R
2. Concept: Built-in Mathematical and Statistical Functions in R
The following are some of the mathematical functions in R:
- sum() # sum of numbers
- prod() # product of numbers
- seq() # sequence of numbers
- rep() # repeating an input
- min() # minimum of a numeric vector
- max() # maximum of a numeric vector
- log() # natural logarithm (base e)
- exp() # exponential function, e raised to a power
- abs() # absolute value
- length() # no. of elements in the vector
- dim() # no. of rows and columns in a data frame
- sqrt() # square root
- factorial() # factorial of a given number
- choose() # number of combinations
- rank() # ranking of numbers
The following are some of the statistical functions in R:
- mean() # arithmetic mean
- median() # median
- sd() # standard deviation of a vector
- var() # variance of a vector
- quantile() # quantiles (quartiles by default)
- skew() # skewness (from the psych package)
- range() # minimum and maximum
- cor() # correlation
- summary() # descriptive summary of data frames
3. Example:
- sum(1:100) # sum of first 100 numbers
- prod(1:10) # product of first 10 numbers
- seq(1, 30) # sequence from 1 to 30
- rep(c(1,2,3),3) # repeating an input 3 times
- min(c(2,10,20,30)) # minimum of a numeric vector
- max(c(10,30,100,7)) # maximum of a numeric vector
- log(10) # natural logarithm (base e) of 10
- exp(20) # e raised to the power 20
- abs(c(-2,-3,1)) # absolute value of vector with negative elements
- length(c(2,2,8,9,10)) # no. of elements in the vector
- dim(trees) # no. of rows and columns in trees data frame
- sqrt(169) # square root of 169
- factorial(5) # factorial of 5
- choose(5,3) # 5C3
- rank(c(2,10,12,7,9,3,14)) # ranking of elements in the vector
# Statistical functions :
- mean(1:10) # mean of first 10 numbers
- median(20:40) # median of numbers from 20 to 40
- sd(c(2,2,3,10,10)) # standard deviation of a vector
- var(c(12,10,20,31,21)) # variance of a vector
- quantile(c(1,3,5,9,10)) # quartiles of a vector
- library(psych) # needed for skew()
- skew(c(23,13,14,20,12)) # skewness
- range(c(2,10,34,23,18)) # range
- cor(x=c(12,18,23,12,10),y=c(1,3,8,4,12)) # correlation between x and y
- summary(trees) # descriptive summary of data frame trees
4.Procedure: We shall execute them in R script
5.R code output:
####################################
##Lab 2: Mathematical function in R
####################################
sum(1:100) # sum of first 100 numbers
prod(1:10) # product of first 10 numbers
seq(1, 30) # sequence from 1 to 30
rep(c(1,2,3),3) # repeating an input 3 times
min(c(2,10,20,30)) # minimum of a numeric vector
max(c(10,30,100,7)) # maximum of a numeric vector
log(10) #logarithm with a base e
exp(20) # e raised to the power 20
abs(c(-2,-3,1)) # absolute value of vector with negative elements
length(c(2,2,8,9,10)) #no. of elements in the vector
dim(trees) # no. of rows and columns in trees data frame
sqrt(169)# square root of 169
factorial(5) #factorial of 5
choose(5,3) #5c3
rank(c(2,10,12,7,9,3,14)) # ranking of elements in the vector
################################
##Statistical functions in R
###############################
mean(1:10) #mean of first 10 numbers
median(20:40)# median of numbers from 20 - 40
sd(c(2,2,3,10,10)) # standard deviation of a vector
var(c(12,10,20,31,21)) #variance of vector
quantile(c(1,3,5,9,10)) # quartiles of a vector
library(psych)
skew(c(23,13,14,20,12)) #skewness
range(c(2,10,34,23,18)) #range
cor(x=c(12,18,23,12,10),y=c(1,3,8,4,12)) # correlation between x and y
summary(trees) #descriptive summary of data frames:trees
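
skew() is the only function in this list that needs an extra package. As a minimal sketch for when psych is not installed, skewness can be computed from central moments in base R. The helper name skew_moment is hypothetical, and psych::skew() supports more than one skewness formula, so its value may differ slightly from this one.

```r
# Moment-based sample skewness: third central moment divided by
# the 1.5th power of the second central moment
skew_moment <- function(x) {
  x <- x[!is.na(x)]          # drop missing values
  d <- x - mean(x)           # deviations from the mean
  mean(d^3) / mean(d^2)^1.5
}

skew_moment(c(23, 13, 14, 20, 12))
skew_moment(c(1, 2, 3, 4, 5))   # 0 for a symmetric vector
```

A positive value indicates a longer right tail, a negative value a longer left tail.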

Screen shot of R syntax:


Practical-3: Indexing and Sub-Setting of Data Frames
1.Concept:

Indexing or sub-setting means extracting a part of the data.
The data could be a vector or a data frame.
- To subset a vector we use the [] operator
- To subset a data frame we use three methods:
1. The $ operator
2. The [i,j] operator, 'i' for rows and 'j' for columns
3. The subset() function

2. Examples & Procedure:

#############################################
##Practical-3: Indexing and Sub-setting of data frames
#############################################
## creating a numeric vector x and character vector y
x=c(1,3,6,9,3,3)
y=c('a','b','c')
##subset of first three elements of x
x[1:3]
## subset of third and fifth element of x
x[c(3,5)]
## subset of last element of y
y[3]
## subsetting of a data frame 'trees'
## method 1: subsetting a column with $ operator
## subset 'volume' column from trees data frame
trees$Volume
## subset species column from 'iris' data frame
iris$Species
## method 2: using [] operator
## subset first 12 rows of trees data frame
trees[1:12,]
## subset rows 1, 4, 5 and 9 of trees data frame
trees[c(1,4,5,9),]
## subset first two columns of trees data frame
trees[,c(1:2)]
## subset first three rows and first two columns
trees[c(1:3),c(1:2)]
## negative subsetting: drop rows 4 to 31 and column 3
trees[-(4:31),-3]
## method-3:subset() function
## find trees with height > 75 in trees data frame
subset(trees,trees$Height>75)
## find trees with height > 75 and volume > 18 in trees data frame
subset(trees,trees$Height>75&trees$Volume>18)
## find trees with height > 75 and volume > 18 and girth < 18 in trees data frame
subset(trees,trees$Height>75&trees$Volume>18&trees$Girth<18)
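
The [i,j] operator can perform the same filtering as subset() when a logical condition is placed in the row position. The following sketch compares the two on the trees data; the object names tall and tall2 are illustrative assumptions.

```r
# Logical indexing with [i, j]: keep rows where the condition is TRUE
tall <- trees[trees$Height > 75, ]

# Same rows using subset(); column names can be used directly inside it
tall2 <- subset(trees, Height > 75)

# Check that both approaches select the same data
all.equal(tall, tall2)

# Combine conditions with & and pick columns by name
trees[trees$Height > 75 & trees$Volume > 18, c("Girth", "Volume")]
```

subset() is convenient for interactive work, while [i,j] is the more general form because it also handles numeric and negative indices.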
3.Output:
Practical-4: Looping Functions in R

1.Concept:

Looping means repeatedly applying a function over the elements of a vector or a data
frame. In R programming there are several looping functions:
1. tapply()
2. apply()
3. lapply()
4. sapply()

o tapply() applies a function to a vector separately for each group.
o apply() applies a function to a data frame or matrix, row-wise or column-wise.
o lapply() applies a function to each element of a list; for a data frame these are its columns. It returns a list.
o sapply() works like lapply() but simplifies the output to a vector or matrix where possible.

2.Examples and Procedure:


#############################################
##Practical-4: Looping functions in R
#############################################
## tapply() function
## find group (species) wise arithmetic mean of Sepal.Length of iris data
tapply(iris$Sepal.Length,iris$Species,mean)
## find group wise median of petal width of iris data
tapply(iris$Petal.Width,iris$Species,median)
## apply() function (2 = column-wise)
## find standard deviation of all variables in trees data frame
apply(trees,2,sd)
## find summary of all variables in trees data frame
apply(trees,2,summary)
## lapply() function - applies a function to each column
## the output is a list
lapply(trees,var)
## sapply() - similar to lapply() but simplifies the output to a vector
sapply(trees,var)
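
The examples above use apply() column-wise only. As a brief sketch, the code below shows the row-wise case (MARGIN = 1) and the difference in return type between lapply() and sapply(); the object names are illustrative assumptions.

```r
# apply() with MARGIN = 1 works row-wise: one mean per tree (row)
row_means <- apply(trees, 1, mean)
length(row_means)          # 31, one value per row of trees

# lapply() returns a list; sapply() simplifies it to a named numeric vector
v_list <- lapply(trees, var)
v_vec  <- sapply(trees, var)
class(v_list)              # "list"
class(v_vec)               # "numeric"
```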

3.Output:
Practical-5: Basic Graphs with R

1.Concept:

- Graphs and charts convert numbers into visuals
- The type of graph depends on the type of data
- We have two types of data: Quantitative and Qualitative
- We have three types of Graphs and Charts
o Uni-variate
o Bi-variate
o Multivariate

- From the above classification we have
o Plots for Univariate Quantitative variables
o Plots for Univariate Qualitative variables
o Plots for Bivariate Quantitative variables
o Plots for Bivariate Qualitative variables
o Plots for Multivariate Quantitative variables

2. Examples and Procedures:

- Plots for Univariate Quantitative variables:
o Histogram
o Box plot
o Density plot
- Plots for Univariate Qualitative variables:
o Pie chart
o Bar chart
- Plots for Bivariate Quantitative variables:
o Scatter plot
- Plots for Bivariate Qualitative variables:
o Stacked Bar plot
o Side by Side bar plot
- Plots for Multivariate Quantitative variables:
o Heat map
3.R – Code :
#################################
### Practical-5:Basic Graphs with R
###############################
###Plots for univariate quant variables
## Histogram
hist(trees$Girth)
## Box plot
boxplot(iris$Sepal.Length)
## Density plot
plot(density(chickwts$weight))
###Plots for univariate qualitative variables
## Pie chart
pie(table(chickwts$feed))
## Bar chart
barplot(table(chickwts$feed))
## Plots for Bivariate Quant variables
##Scatter plot
plot(iris$Sepal.Length,iris$Sepal.Width)
## Plots for Bivariate Qualitative variables
## stacked bar diagram
## create gender and religion
gender=rep(c('m','f','m'),30)
religion=rep(c('H','M','C','O'),c(50,20,10,10))
barplot(table(gender,religion))
## side by side barplot
barplot(table(gender,religion),beside = T)
### Multivariate plots
## Heat map
library(psych)
cor.plot(trees)
4.Screen shot of R syntax:
5.Graphs:
#Univariate quant variable:
a. Histogram:

b. Box plot:
c. Density plot:

# Univariate Qualitative variables


a.Pie chart:
b. Bar plot:

# Bivariate of Quant variables:


a. Scatter plot:
#Bivariate for Qualitative variables:
a. Stacked bar diagram:

b. Side by side bar plot:


#Multivariate plots Quant variables :
a. Heat Map:
Practical-6: Advanced Graphs with R packages
1.Concept:
- In R there are two special packages for advanced graphs: lattice and ggplot2
- The lattice package is built on the grid graphics system
- The ggplot2 package is based on the grammar of graphics
- We can draw both basic and advanced graphs with these packages

2. Examples and Procedure:

Lattice package:
- The lattice package is built on the grid graphics system
- It uses a formula interface
- Lattice syntax format: (DV ~ IV | group, data), where DV is the dependent variable and IV is the independent variable
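
As a minimal sketch of the formula interface, the code below plots one panel per group. The grouping column HeightBand and the data frame trees2 are hypothetical, created here only so that the trees data has a group to condition on.

```r
library(lattice)

# Hypothetical grouping variable: trees split into three height bands
trees2 <- transform(trees, HeightBand = cut(Height, breaks = 3))

# DV ~ IV | group: Volume against Girth, one panel per height band
p <- xyplot(Volume ~ Girth | HeightBand, data = trees2)
print(p)
```

The part before ~ goes on the vertical axis, the part after it on the horizontal axis, and the variable after | decides how the panels are split.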

ggplot2 package:
- Based on the grammar of graphics
- It uses three components in the syntax
o Data
o Aesthetics (aes)
o Geometry (geom)
3.R-code:
#############################
## Graphs with lattice package
## Graphs for quantitative variables
# Histogram
library(lattice)
histogram(~ Height,data=trees)
## Grouped Histogram
histogram(~Sepal.Length|Species,data=iris)
## Density plot
densityplot(~Sepal.Width,data=iris)
## grouped density plots
densityplot(~Sepal.Width|Species,data=iris)
## Box plot
bwplot(~Petal.Length,data=iris)
## Grouped Box plots
bwplot(~Petal.Length|Species,data=iris)
# Scatter plot
xyplot(Height~Girth,data=trees)
## Bar plot
barchart(~Sepal.Length|Species,data=iris)
######################
##plots with ggplot2
#####################
## univariate plots
## plots in Quantitative variables
library(ggplot2)
##Histogram
ggplot(data=trees,aes(x=Height))+geom_histogram(bins=10)
## Box plot
ggplot(data=trees,aes(x=Girth))+geom_boxplot()
##Density plot
ggplot(data=trees,aes(x=Volume))+geom_density()
##plots for qualitative variable
## Bar plot
ggplot(data=chickwts,aes(x=feed))+geom_bar()
##Pie chart (a single stacked bar drawn in polar coordinates)
ggplot(data=chickwts,aes(x="",fill=feed))+geom_bar(width=1)+coord_polar(theta="y")
##Bivariate plots
## Scatter plot
ggplot(data=trees,aes(x=Height,y=Volume))+geom_point()
4.Screen shot of R syntax:

5.Graphs:

# Lattice Package

#Quantitative variables

1.Histogram:
2.Grouped Histogram:

3.Density plot:
4.Grouped Density Plot:

5.Box plot:
6.Grouped Box plots:

7.Scatter plot:
8.Bar plot:

## ggplot2 package

## Univariate plots

##Quantitative variables
1. Histogram:

2. Box plot:

3.Density plot:
# Plots for Qualitative Variables:
1. Bar plot:

2. Pie chart:
##Bivariate plot:

1. Scatter plot:
Practical 7: Correlation and Regression with R

1.Concept:

a) Correlation:
- Correlation is a measure of linear association
between two numeric variables
- Correlation can be positive, negative, or zero
- We can measure correlation by two methods:
o Pearson’s Correlation Coefficient
o Spearman’s Rank Correlation Coefficient
- Correlation between two variables after controlling for
the effect of other variables is called Partial Correlation
b) Regression:
- Regression tries to find a functional relationship
between the dependent and independent variables
- Regression is of two types: Linear and Non-linear
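
The two correlation coefficients named above can be computed from their definitions and compared with cor(). This is a small sketch using the mpg and hp columns of mtcars; the object names are illustrative assumptions.

```r
x <- mtcars$mpg
y <- mtcars$hp

# Pearson: covariance scaled by the two standard deviations
r_pearson <- cov(x, y) / (sd(x) * sd(y))
all.equal(r_pearson, cor(x, y))

# Spearman: Pearson correlation computed on the ranks of the data
r_spearman <- cor(rank(x), rank(y))
all.equal(r_spearman, cor(x, y, method = "spearman"))
```

Because Spearman's coefficient uses ranks, it measures monotonic association and is less affected by outliers than Pearson's coefficient.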

2.Example :

- We shall consider the mtcars data from base R.
- We use the R function cor(mtcars) to find the Pearson
correlation coefficients; adding method = 'spearman' gives
Spearman’s rank correlation coefficients
## Simple Linear Regression Model:

- We shall use the trees data set for this
- DV=Volume , IV=Height
- The R function for Linear Regression is lm()

3.Procedure: We shall execute them in R script.

## Correlation Matrix
>cor(mtcars)
## Simple Linear Regression Model
>Model=lm(Volume~Height,data=trees)
## Summary of Model
>summary(Model)
## Finding MSE
>mse=mean((Model$fitted.values-trees$Volume)^2)
## Multiple Linear Regression Model
## DV=Volume , IVs=Height, Girth
>Model=lm(Volume~Height+Girth,data=trees)
## Summary of Model
>summary(Model)
## Finding MSE
>mse=mean((Model$fitted.values-trees$Volume)^2)
4. R-Output:

## Finding correlation coefficients

## Pearson correlation coefficient between mpg and hp of mtcars data

cor(mtcars$mpg,mtcars$hp)

## Spearman's Rank Correlation between mpg and hp

cor(mtcars$mpg,mtcars$hp,method = 'spearman')

## Correlation Matrix

cor(mtcars)

##Simple Linear Regression

Model=lm(Volume~Height,data=trees)
summary(Model)

##R-Square value is 0.3579

##Finding MSE

MSE=mean((Model$fitted.values-trees$Volume)^2)

MSE

## We find R-Square as 0.3579 and MSE as 167.89

##Multiple Linear Regression

Model=lm(Volume~Height+Girth,data=trees)

##Summary

summary(Model)

## we find multiple R-Square value is 0.948 (adjusted R-Square is 0.9442)

#Finding MSE

MSE=mean((Model$fitted.values-trees$Volume)^2)

MSE

## We find that MSE is 13.61
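
The figures reported by summary() can be reproduced from the residuals. The sketch below does this for the multiple regression model; the variable names rss, tss, r2, adj_r2 and mse are illustrative assumptions, and the formulas are the standard definitions of R-squared and adjusted R-squared.

```r
model <- lm(Volume ~ Height + Girth, data = trees)

rss <- sum(residuals(model)^2)                      # residual sum of squares
tss <- sum((trees$Volume - mean(trees$Volume))^2)   # total sum of squares
n <- nrow(trees)                                    # number of observations
p <- 2                                              # number of predictors

r2     <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
mse    <- rss / n        # same quantity as mean((fitted - observed)^2)

c(r2 = r2, adj_r2 = adj_r2, mse = mse)
```

Adjusted R-squared penalises extra predictors, so it is always a little lower than multiple R-squared and is the fairer figure when comparing models with different numbers of predictors.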

5. R-Syntax:
Practical-8: Sampling with R

1. Concept:
i. Sampling-
- Sampling means the method of taking a sample
from a population with or without replacement.
ii. Types of Sampling-
- There are two types of sampling
o Probability sampling
o Non-Probability sampling
iii. Probability Sampling- It is based on the rules of
probability. There are four types of Probability
sampling:
o Simple Random Sampling
o Systematic Random Sampling
o Stratified Random Sampling
o Cluster Sampling

2. Procedure: We shall execute three different sampling methods in R.
#Simple Random Sampling with/without Replacement of a Vector.
###############################
##Practical 8- sampling with R
###############################
## simple random sampling
##Create a population of 100 natural numbers
pop=1:100
##Random sample of 20 members without replacement
sample(pop,size = 20)
##Random sample of 20 members with replacement
sample(pop,size=20,replace = T)
##Systematic random sampling
##with 20 natural numbers take a systematic random sample of 3 elements.
##code for systematic sampling
obtain_sys=function(N,n){
k=floor(N/n) # sampling interval; floor keeps every index within 1..N
r=sample(1:k,1) # random start within the first interval
seq(r,r+k*(n-1),k)}
obtain_sys(20,3)
##Stratified Random Sampling
## we shall do stratified random sampling on chickwts data
library(sampling)
## sizes are given in the order in which the feed groups appear in the data
sampling::strata(chickwts,stratanames=c('feed'),size=c(4,4,3,2,3,3))
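
Cluster sampling is listed in the concept but not executed above. The following is a minimal base R sketch, assuming each feed type in chickwts is treated as one cluster; that choice, the number of clusters drawn, and the object names are illustrative assumptions.

```r
set.seed(1)  # only to make the illustration repeatable

# Treat each feed type in chickwts as a cluster (an assumption for this sketch)
clusters <- levels(chickwts$feed)

# Step 1: draw 2 clusters at random
chosen <- sample(clusters, size = 2)

# Step 2: keep every chick that belongs to a chosen cluster
cluster_sample <- chickwts[chickwts$feed %in% chosen, ]
table(droplevels(cluster_sample$feed))
```

Unlike stratified sampling, which draws a few units from every group, cluster sampling selects whole groups and takes all units inside them.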
3. R-output screen shots:
Practical-9: Time Series Analysis: Decomposition

1. Concept:
- Decomposition in time series means separating
the components of a time series.
- There are two types of Decomposition
o Additive : T+C+S+R
o Multiplicative : T*C*S*R
o Here T is Trend, C is Cycles, S is
Seasonality, and R is the Random component
2. Procedure:
- From a given time series we shall estimate
Trend, Seasonality, and Cycles separately and
then remove these three components from the
time series (by subtraction in the additive model,
by division in the multiplicative model) to get
the random component.
3. Example:
- We shall consider the built-in R data set
‘JohnsonJohnson’ for decomposition.
- R-code for decomposition of
‘JohnsonJohnson’ data.
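
As a quick check of the multiplicative model, the components returned by decompose() can be multiplied back together to recover the original series. This is a sketch only; note that decompose() estimates trend, seasonal and random components and does not return a separate cycle component. The object names rebuilt and ok are illustrative assumptions.

```r
mod <- decompose(JohnsonJohnson, type = "multiplicative")

# Rebuild the series from its parts: trend * seasonal * random
rebuilt <- mod$trend * mod$seasonal * mod$random

# The moving-average trend is NA at both ends, so compare only complete points
ok <- !is.na(rebuilt)
all.equal(as.numeric(rebuilt[ok]), as.numeric(JohnsonJohnson[ok]))
```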

4. R-CODE:
#############

## Lab-9

#############

## Decomposition of Time series data

## data: Johnsonjohnson

#######################

# Loading the data

data("JohnsonJohnson")

## Structure of data

str(JohnsonJohnson)

## it is quarterly time series data

## Examining the time series plot

plot(JohnsonJohnson)

## we find that seasonality in the data is increasing

## we shall use multiplicative model :T *S*R

mod=decompose(JohnsonJohnson,type="multiplicative")

## Finding trend,seasonality and random components


mod_trend=mod$trend

mod_season=mod$seasonal

mod_rand=mod$random

##plotting the multiplicative decomposition of JohnsonJohnson data

plot(mod)

## in the plot the observed series and all the three components are shown

5. R-output:
