
R Lab File Deepak


INTRODUCTION TO R

R is an open-source programming language that is widely used as a statistical software and data analysis tool. R generally comes with a command-line interface and is available on widely used platforms such as Windows, Linux, and macOS.
It was designed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team. R is an implementation of the S programming language, combined with lexical scoping semantics inspired by Scheme. The project was conceived in 1992, with an initial version released in 1995 and a stable beta version in 2000.

WHY R PROGRAMMING LANGUAGE?

 R programming is a leading tool for machine learning, statistics, and data analysis. Objects, functions, and packages are easy to create in R.
 R is not only a statistics package; it also integrates with other languages (C, C++), so you can easily interact with many data sources and statistical packages.
 R is currently one of the most requested programming languages in the Data Science job market.

FEATURES OF R PROGRAMMING LANGUAGE

1. STATISTICAL FEATURES OF R:

 Basic Statistics: The most common basic statistics are the mean, median, and mode, together known as the “Measures of Central Tendency.” The R language makes these measures easy to compute.
 Static graphics: R is rich in facilities for creating static graphics, with functionality for many plot types including graphic maps, mosaic plots, biplots, and more.
 Probability distributions: Probability distributions play a vital role in statistics, and R makes it easy to work with many of them, such as the Binomial, Normal, and Chi-squared distributions.
 Data analysis: R provides a large, coherent, and integrated collection of tools for data analysis.
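For instance, the measures of central tendency and the distribution functions mentioned above are available directly in base R; a minimal sketch (the sample vector is made up for illustration):

```r
# illustrative sample vector
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# measures of central tendency
mean(x)    # arithmetic mean
median(x)  # middle value
# base R has no mode function for data; take the most frequent value
as.numeric(names(which.max(table(x))))

# probability distributions follow the d/p/q/r naming scheme
dnorm(0)                          # normal density at 0
pbinom(3, size = 10, prob = 0.5)  # P(X <= 3) for Binomial(10, 0.5)
qchisq(0.95, df = 2)              # 95th percentile of chi-squared, 2 df
```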

2. PROGRAMMING FEATURES OF R

 R Packages: One of the major features of R is the wide availability of libraries.
 Distributed Computing: Components of a software system can be shared among multiple computers to improve efficiency and performance.
EXPERIMENT-1

To perform the basic arithmetic operations.

a <- 7.5
b <- 2
print ( a+b )   #addition
print ( a-b )   #subtraction
print ( a*b )   #multiplication
print ( a/b )   #division
print ( a%%b )  #remainder
print ( a%/%b ) #integer quotient
print ( a^b )   #exponentiation

OUTPUT-
EXPERIMENT-2
TO CREATE A DATA FRAME.
# R program to create dataframe
# creating a data frame
friend.data <- data.frame(
friend_id = c(1:5),
friend_name = c("Sachin", "Sourav",
"Dravid", "Sehwag",
"Dhoni"),
stringsAsFactors = FALSE
)
# print the data frame
print(friend.data)

OUTPUT-
EXPERIMENT-3
TO PRINT THE MAXIMUM AND MINIMUM VALUES OF DATA
FRAME COLUMNS.
# create a dataframe
data=data.frame(column1=c(23,4,56,21),
column2=c("sai","deepu","ram","govind"),
column3=c(1.3,4.6,7.8,6.3))
# get the minimum value in first column
print(min(data$column1))
# get the minimum value in second column (alphabetical for character data)
print(min(data$column2))
# get the minimum value in third column
print(min(data$column3))
# get the maximum value in first column
print(max(data$column1))
# get the maximum value in second column
print(max(data$column2))
# get the maximum value in third column
print(max(data$column3))

OUTPUT-
EXPERIMENT-4
TO GET INPUT FROM THE USER AND PERFORM NUMERICAL
OPERATIONS (MAX, MIN, AVG, SUM, SORT, ROUND) IN R
print("Enter the numbers: ")
x = scan()
print(x)
print("Max value is: ")
print(max(x))
print("Min value is: ")
print(min(x))
print("Average is: ")
print(mean(x))
print("Sum is: ")
print(sum(x))
print("Sorted array is: ")
print(sort(x))
data <- c(.3, 1.03, 2.67, 5, 8.91)
print(round(data, digits = 1))
OUTPUT-
EXPERIMENT-5
TO PERFORM DATA IMPORT/ EXPORT (.CSV, .XLS, .TXT)
OPERATIONS USING DATA FRAMES IN R.
#Read csv file
code <- read.csv("c:\\Users\\deepak\\Desktop\\TEST.csv")
code
#Read xls file
library(readxl)
mydatasheet <- read_excel("c:\\Users\\deepak\\Desktop\\TEST2.xls")
mydatasheet

OUTPUT-
EXPERIMENT-6
TO GET AN INPUT MATRIX FROM THE USER AND PERFORM
MATRIX ADDITION, SUBTRACTION, MULTIPLICATION,
TRANSPOSE, INVERSE, AND DIVISION OPERATIONS USING
VECTORS IN R.
#matrix a
data.a = scan()
matrix.a <-matrix(data.a,nrow = 3,ncol = 3, byrow = TRUE)
matrix.a
#matrix b
data.b = scan()
matrix.b <-matrix(data.b,nrow = 3,ncol = 3, byrow = TRUE)
matrix.b
#addition
data.a+data.b
sum<-data.a+data.b
matrix.sum <-matrix(sum,nrow = 3,ncol = 3,byrow = TRUE)
matrix.sum
#subtraction
data.a-data.b
diff<-data.a-data.b
matrix.diff <-matrix(diff,nrow = 3,ncol = 3,byrow = TRUE)
matrix.diff
#element-wise multiplication
data.a*data.b
mul<-data.a*data.b
matrix.mul<-matrix(mul,nrow = 3,ncol = 3,byrow = TRUE)
matrix.mul
#transpose of matrix a
matrix.t <- t(matrix.a)
matrix.t
#element-wise division
div <- data.a/data.b
matrix.div <- matrix(div, nrow = 3, ncol = 3, byrow = TRUE)
matrix.div
#determinant of matrix a
det(matrix.a)
#inverse of matrix a
solve(matrix.a)
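Note that `*` above multiplies element by element; true matrix multiplication uses the `%*%` operator. A small sketch contrasting the two (the 2x2 matrices are made up for illustration):

```r
A <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
B <- matrix(c(5, 6, 7, 8), nrow = 2, byrow = TRUE)

A * B          # element-wise product
A %*% B        # matrix product (rows of A times columns of B)
A %*% solve(A) # a matrix times its inverse yields the identity
```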

OUTPUT-
EXPERIMENT-7
TO PERFORM STATISTICAL OPERATIONS (MEAN, MEDIAN,
MODE AND STANDARD DEVIATION) USING R.
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)

# Find Mean.
result.mean <- mean(x)
print(result.mean)
# Find the median.
median.result <- median(x)
print(median.result)
# Create the function.
getmode <- function(v) {
uniqv <- unique(v)
uniqv[which.max(tabulate(match(v, uniqv)))]
}
# Create the vector with numbers.
v <- c(2,1,2,3,1,2,3,4,1,5,5,3,2,3)
# Calculate the mode using the user function.
result <- getmode(v)
print(result)
#standard deviation
v <- c(12,24,74,32,14,29,84,56,67,41)
s<-sd(v)
print(s)
OUTPUT-
EXPERIMENT-8
TO PERFORM DATA PREPROCESSING OPERATIONS
i) HANDLING MISSING DATA
ii) MIN-MAX NORMALIZATION
# load packages and data
#install.packages("caret")
library(caret)
# creating a dataset
data = data.frame(var1=c(120, 345, 145, 122, 596, 285, 211),
var2=c(10, 15, 45, 22, 53, 28, 12),
var3=c(-34, 0.05, 0.15, 0.12, -6, 0.85, 0.11))
data
# summary of data
summary(data)
# preprocess the data
preproc <- preProcess(data, method=c("range"))
# perform normalization
norm <- predict(preproc, data)
head(norm)
# checking summary after normalization
summary(norm)
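caret's `preProcess(..., method = c("range"))` applies min-max scaling; the same transformation can be written by hand to see the underlying formula, (x - min) / (max - min). A base-R sketch using the same var1 values as above:

```r
# min-max normalization by hand: (x - min) / (max - min)
minmax <- function(x) (x - min(x)) / (max(x) - min(x))

var1 <- c(120, 345, 145, 122, 596, 285, 211)
round(minmax(var1), 3)

# apply column-wise to a whole data frame
df <- data.frame(var1 = var1, var2 = c(10, 15, 45, 22, 53, 28, 12))
norm <- as.data.frame(lapply(df, minmax))
summary(norm)  # every column now ranges from 0 to 1
```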
OUTPUT-
EXPERIMENT-9
TO PERFORM DIMENSIONALITY REDUCTION USING PCA FOR
THE IRIS DATA SET
install.packages("stats")
install.packages("dplyr")
#importing the libraries
library(stats)
library(dplyr)
# Iris data set
View(iris)
#unsupervised learning - hence converting iris data to an unlabelled data set
mydata = select(iris,c(1,2,3,4))
#check PCA eligibility via correlations
cor(mydata)
mean(cor(mydata))
#PRINCIPAL COMPONENT ANALYSIS
PCA = princomp(mydata)
#evaluate the PCA
PCA$loadings
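The loadings show each variable's weight in the components; the reduced-dimension data themselves are the component scores. A short sketch keeping only the first two components of the built-in iris data:

```r
pca <- princomp(iris[, 1:4])
summary(pca)                  # proportion of variance per component
reduced <- pca$scores[, 1:2]  # iris reduced from 4 columns to 2
dim(reduced)
```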
OUTPUT-
EXPERIMENT-10
TO PERFORM SIMPLE LINEAR REGRESSION WITH R
#import the file to apply linear regression
ads <- read.csv('d:/TEST1.csv')

#viewing columns, rows, column names etc


View(ads)
nrow(ads)
ncol(ads)

colnames(ads)

TV <- ads$TV
Sales <- ads$sales

#plotting the values in graph


plot(TV,Sales)

plot(TV,Sales,pch=16,cex=1,col='blue',
main='TV vs Sales',xlab = 'TV',ylab = 'Sales')

#applying linear regression model(least square method)


model <- lm(Sales ~ TV)

summary(model)

abline(model)
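Once fitted, an lm model can predict the response for new predictor values via predict(). A self-contained sketch; the numbers below are made up for illustration, not taken from TEST1.csv:

```r
# fit a simple linear model on small illustrative data
tv    <- c(10, 20, 30, 40, 50)
sales <- c(12, 19, 33, 38, 52)
m <- lm(sales ~ tv)
coef(m)                          # intercept and slope
predict(m, data.frame(tv = 35))  # predicted sales at tv = 35
```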
OUTPUT-
EXPERIMENT-11
TO PERFORM K-MEANS CLUSTERING OPERATIONS AND
VISUALIZE FOR IRIS DATA SET.
#install the required libraries
install.packages("stats")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("ggfortify")

#import the libraries


library(stats)
library(dplyr)
library(ggplot2)
library(ggfortify)

#unsupervised learning - hence converting the iris data set into an unlabelled data set
View(iris)
mydata = select(iris,c(1,2,3,4))

#WSS plot function to choose the number of clusters
#(wssplot is not built in, so define it first)
wssplot <- function(data, nc = 15, seed = 1234) {
  wss <- (nrow(data) - 1) * sum(apply(data, 2, var))
  for (i in 2:nc) {
    set.seed(seed)
    wss[i] <- sum(kmeans(data, centers = i)$withinss)
  }
  plot(1:nc, wss, type = "b",
       xlab = "Number of clusters",
       ylab = "Within-groups sum of squares")
}
wssplot(mydata)
#spot the kink (elbow) in the plot (in this case it is at 2)

#applying k-mean cluster analysis


KM = kmeans(mydata,2)

#cluster plot
autoplot(KM,mydata,frame=TRUE)
#cluster centers
KM$centers
OUTPUT-
EXPERIMENT-12
LEARN HOW TO COLLECT DATA VIA WEB SCRAPING, APIs,
AND DATA CONNECTORS FROM SUITABLE SOURCES AS
SPECIFIED BY THE INSTRUCTOR.
#install the packages
#install.packages("dplyr")
#install.packages("rvest")

#import the library


library("rvest")
library("dplyr")

#creating the link variable for the desired website


link = "https://www.imdb.com/chart/moviemeter/?ref_=nv_mv_mpm"
page = read_html(link)

name = page %>% html_nodes("#main a") %>% html_text()
year = page %>% html_nodes("a+ .secondaryInfo") %>% html_text()
ratings = page %>% html_nodes("strong") %>% html_text()
OUTPUT-
EXPERIMENT-13
PERFORM ASSOCIATION ANALYSIS ON A GIVEN DATASET AND
EVALUATE ITS ACCURACY.
#get the data
#install the libraries
#install.packages("arules")
library("arules")
data("Groceries")
head(as(Groceries,"list"),5)
#apriori algo
model = apriori(data = Groceries,
parameter = list(support = 0.001,
confidence = 0.15))
#inspect the top 10 rules sorted by lift
inspect(sort(model,by = 'lift')[1:10])
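The support and confidence thresholds above, and the lift used for sorting, have simple definitions that can be checked by hand on a toy transaction list (the items below are made up, not from Groceries):

```r
trans <- list(c("milk", "bread"),
              c("milk", "butter"),
              c("bread", "butter"),
              c("milk", "bread", "butter"))
n   <- length(trans)
# count transactions containing all the given items
cnt <- function(items) sum(sapply(trans, function(t) all(items %in% t)))

support    <- cnt(c("milk", "bread")) / n            # fraction with both items
confidence <- cnt(c("milk", "bread")) / cnt("milk")  # P(bread | milk)
lift       <- confidence / (cnt("bread") / n)        # confidence vs. chance
c(support, confidence, lift)
```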

OUTPUT-
