R Programming Lab Manual
Lab Manual
III YEAR- I SEM
R22
2024-2025
Vision:
To emerge as the Data Science Centre of Excellence in the region and to offer technology
services to industry, academia, and the community.
Mission:
o To provide modern infrastructure facilities to access, process, and analyze large-scale
data sets.
o To continuously empower faculty and students in the latest technological
advancements in Data Science.
o To strengthen industry connections, collaborate with academia, and develop
projects to serve the community at large.
Program Outcomes and Program Specific Outcomes
PO 2 An ability to identify, formulate and analyze complex engineering problems, solving them by
applying principles of mathematics and engineering sciences.
PO 3 An ability to design effective and efficient solutions for problems to meet the requirements.
PO 5 An ability to adopt and apply modern technical concepts, tools and practices.
PO 7 Apply knowledge and skill set to develop solutions by considering environmental and
sustainability constraints.
PO 11 Ability to deliver cost-effective solutions and contribute towards project management.
PO 12 Ability to engage in lifelong learning to keep abreast of modern technologies and practices.
PSO 1 Professional Skills and Foundations of Software Development: Ability to analyze, design and
implement applications by adapting to the dynamic nature of software development.
PSO 2 Applications of Computing and Research Ability: Ability to use knowledge of cutting-edge
technologies in identifying research gaps and to render solutions with innovative ideas.
COURSE DESCRIPTION
Name of the Dept.: Data Science
Regulation R22
Team of Instructors
Course Objectives:
Familiarize students with basic R programming concepts, various data structures for handling
datasets, various graph representations, and Exploratory Data Analysis concepts.
Course Outcomes:
History of R Programming:
The history of R goes back to the early 1990s. R was developed by Ross Ihaka and Robert
Gentleman at the University of Auckland, New Zealand, and the R Core Team
currently develops it. The language takes its name from the first letter of both
developers' first names. The project was conceived in 1992. The initial version was released in 1995,
and in 2000, the first stable version (1.0.0) was released.
Features of R:
1. Statistical Analysis: R provides a wide array of statistical techniques such as linear and nonlinear
modeling, classical statistical tests, time-series analysis, classification, clustering, and more.
2. Graphics: R is well-known for its capabilities to produce high-quality graphical data
representations. You can create everything from simple graphs to complex multi-panel plots.
3. Packages: R's functionality is greatly enhanced by its package system. Thousands of packages are
available on CRAN (Comprehensive R Archive Network), allowing users to extend R's base
functionality.
4. Data Handling: R provides extensive tools for data manipulation, cleaning, and aggregation.
5. Programming: R is a full-fledged programming language, supporting loops, conditional
statements, and user-defined functions.
EXP 1: Download and install the R programming environment and install basic
packages using the install.packages() command in R
R-Environment Setup:
R is a very popular programming language. To work with it, we have to install two things:
R and RStudio. R and RStudio work together to create an R project.
Install R in Windows
Step 1:
When we click on Download R 3.6.1 for Windows, the download of the R setup starts.
Once the download is finished, we run the R setup as follows:
1) Select the path where we want to install R and proceed to Next.
2) Select the components we want to install, and then proceed to Next.
3) In the next step, select either a customized startup or accept the default, and then
proceed to Next.
4) When we proceed to Next, the installation of R on our system starts.
5) Finally, click on Finish to complete the installation of R.
Installation of RStudio:
Step 1:
In the first step, we visit the RStudio official site and click on Download RStudio.
Step 2:
In the next step, we select RStudio Desktop under the open-source license and click on
Download.
Step 3:
In the next step, we select the appropriate installer. Once we select the installer, the
download of the RStudio setup starts.
Step 4:
In the next step, we will run our setup in the following way:
1) Click on Next.
2) Click on Install.
3) Click on finish.
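Once R and RStudio are installed, packages are added with install.packages() and loaded with library(). The following is a minimal sketch, not a prescribed procedure: the helper install_if_missing and the package list are hypothetical names introduced for illustration, and the CRAN cloud mirror URL is an assumed choice of repository. The list uses packages that ship with R, so running the sketch as written installs nothing.

```r
# Packages to make available. This list is illustrative: "stats" and "utils"
# ship with R, so nothing is downloaded when the sketch is run unchanged.
wanted <- c("stats", "utils")

# Hypothetical helper: install only the packages that are not yet present.
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns TRUE when a package can be loaded
  present <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!present]
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  invisible(to_install)
}

not_present <- install_if_missing(wanted)
print(not_present)

# After installation, a package is loaded with library()
library(stats)
```

To install a package from CRAN, replace the entries in wanted with its name, for example "plotrix", which is used later in this manual.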
# Numeric
num <- 42.5
print(paste("Numeric:", num))
print(paste("Type:", typeof(num)))
# Integer
int <- 42L
print(paste("Integer:", int))
print(paste("Type:", typeof(int)))
# Character (String)
char <- "Hello, R!"
print(paste("Character:", char))
print(paste("Type:", typeof(char)))
# Logical (Boolean)
logi <- TRUE
print(paste("Logical:", logi))
print(paste("Type:", typeof(logi)))
# Complex
comp <- 2 + 3i
print(paste("Complex:", comp))
print(paste("Type:", typeof(comp)))
# Factor
fact <- factor(c("Male", "Female", "Male", "Female"))
print("Factor:")
print(fact)
print(paste("Type:", typeof(fact)))
# Date
date <- as.Date("2024-08-26")
print(paste("Date:", date))
print(paste("Type:", typeof(date)))
# Date-Time
datetime <- as.POSIXct("2024-08-26 14:00:00")
print(paste("Date-Time:", datetime))
print(paste("Type:", typeof(datetime)))
Output:
[1] "Numeric: 42.5"
[1] "Type: double"
[1] "Integer: 42"
[1] "Type: integer"
[1] "Character: Hello, R!"
[1] "Type: character"
[1] "Logical: TRUE"
[1] "Type: logical"
[1] "Complex: 2+3i"
[1] "Type: complex"
[1] "Factor:"
[1] Male Female Male Female
Levels: Female Male
[1] "Type: integer"
[1] "Date: 2024-08-26"
[1] "Type: double"
[1] "Date-Time: 2024-08-26 14:00:00"
[1] "Type: double"
2. Operators in R:
R supports various operators for performing calculations, comparisons, and logical
operations.
Arithmetic Operators
+ : Addition
- : Subtraction
* : Multiplication
/ : Division
^ : Exponentiation
%% : Modulus (remainder of division)
%/% : Integer division
Comparison Operators
== : Equal to
!= : Not equal to
> : Greater than
< : Less than
>= : Greater than or equal to
<= : Less than or equal to
Logical Operators
& : Logical AND
| : Logical OR
! : Logical NOT
Assignment Operators
<- : Assign value to a variable (left assignment)
-> : Assign value to a variable (right assignment)
<<- : Superassignment operator, used within functions to assign a value to an existing
variable in an enclosing environment (or in the global environment if none is found).
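The logical and assignment operators listed above can be tried with a short sketch. The variable names (p, q, age, counter, increment) are chosen only for illustration.

```r
# Two logical vectors to compare element by element
p <- c(TRUE, FALSE, TRUE)
q <- c(TRUE, TRUE, FALSE)

print(p & q)   # element-wise AND
print(p | q)   # element-wise OR
print(!p)      # element-wise NOT

# Right assignment stores the value on the left in the name on the right
25 -> age
print(age)

# <<- inside a function updates a variable defined outside the function
counter <- 0
increment <- function() {
  counter <<- counter + 1
}
increment()
increment()
print(counter)
```

With a plain <- inside increment(), the function would only change a local copy and counter would stay at 0.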
PROGRAM OPERATORS:
# Assigning values to variables
a <- 10
b <- 3
# 1. Arithmetic Operators
sum <- a + b
difference <- a - b
product <- a * b
quotient <- a / b
exponentiation <- a ^ b
remainder <- a %% b
int_division <- a %/% b
# 2. Comparison Operators
is_equal <- a == b
is_not_equal <- a != b
is_greater <- a > b
is_less <- a < b
is_greater_equal <- a >= b
is_less_equal <- a <= b
# 4. Assignment Operators
z <- 42 # Left assignment
42 -> w # Right assignment
# Updating Variables
x <- 10 # x must be defined before it can be updated
# Incrementing x by 5
x <- x + 5
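# Example vectors (assumed values, chosen to be consistent with the output shown below)
vector1 <- c(10, 20, 30, 40, 50)
vector2 <- c(5, 4, 3, 2, 1)
# 1. Summation of Vectors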
vector_sum <- vector1 + vector2
print("Summation of vectors:")
print(vector_sum)
# 2. Subtraction of Vectors
vector_difference <- vector1 - vector2
print("Subtraction of vectors:")
print(vector_difference)
# 3. Multiplication of Vectors
vector_product <- vector1 * vector2
print("Multiplication of vectors:")
print(vector_product)
# 4. Division of Vectors
vector_quotient <- vector1 / vector2
print("Division of vectors:")
print(vector_quotient)
Output:
[1] "Summation of vectors:"
[1] 15 24 33 42 51
# Example vector (assumed values, for illustration only)
vector_A <- c(10, 20, 30, 40, 50, 60)
# 1. Subsetting by Index
subset1 <- vector_A[2] # Extract the 2nd element
subset2 <- vector_A[3:5] # Extract elements from the 3rd to the 5th position
In R, a vector is a sequence of elements that share the same data type. A
vector can hold logical, integer, double, character, complex, or raw data. The
elements contained in a vector are known as the components of the vector. We
can check the type of a vector with the help of the typeof() function.
1. Using the c() Function: The c() function combines values into a vector.
3. Using the rep() Function: The rep() function repeats elements of a vector.
my_vector <- rep(1:3, times = 3) # Repeats the sequence 1, 2, 3 three times
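A short sketch of the vector-creation functions described above, together with typeof(). The variable names and values are illustrative only.

```r
# c() combines values into a vector
nums <- c(10, 20, 30)
print(nums)
print(typeof(nums))

# Mixing types coerces every element to one common type
mixed <- c(1, "a", TRUE)
print(mixed)
print(typeof(mixed))

# rep() with times repeats the whole sequence;
# with each it repeats every element in turn
rep_times <- rep(1:3, times = 3)
rep_each <- rep(1:3, each = 2)
print(rep_times)
print(rep_each)
```

Because a vector holds a single data type, the mixed example ends up as a character vector.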
Naming Vectors
You can assign names to the elements of a vector using the names() function. This
is useful for identifying elements within the vector.
1. Assigning Names:
# Output: 10
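Naming with names() can be sketched as follows. The vector scores and its element names are hypothetical values chosen for illustration.

```r
# A numeric vector with hypothetical values
scores <- c(10, 20, 30)

# Assign a name to each element
names(scores) <- c("first", "second", "third")
print(scores)

# Access an element by its name
first_score <- scores["first"]
print(first_score)

# unname() drops the name and leaves only the value
print(unname(first_score))

# List all names
print(names(scores))
```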
Introduction to Lists
A list in R is an ordered collection of elements, where each element can be
of a different type or structure. Lists are particularly useful when dealing
with datasets that are heterogeneous in nature.
Key Characteristics of a List:
Heterogeneous Elements: Lists can store elements of different types (e.g.,
numeric, character, logical, and even other lists).
Indexing: Elements in a list can be accessed by their index or by their names
(if the list elements are named).
Nested Structure: Lists can contain other lists, allowing for complex, nested
data structures.
PROGRAM:
Creating a List
You can create a list in R using the list() function. Here's how you can create
a simple list:
Example 1: Creating a Basic List
# Creating a list with different types of elements
my_list <- list(
name = "Alice",
age = 25,
height = 5.5,
is_student = TRUE
)
# Printing the list
print(my_list)
Output:
$name
[1] "Alice"
$age
[1] 25
$height
[1] 5.5
$is_student
[1] TRUE
Output:
$numbers
[1] 1 2 3 4 5
$letters
[1] "A" "B" "C"
Output:
[1] "Alice"
[1] 5.5
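Accessing list elements by name and by index, and nesting one list inside another, can be sketched as below. The list contents reuse the values from Example 1; the names nested, person and grade are assumptions added for illustration.

```r
my_list <- list(name = "Alice", age = 25, height = 5.5, is_student = TRUE)

# Access by name with $ or [[ ]]
print(my_list$name)
print(my_list[["height"]])

# Access by position
print(my_list[[2]])

# Single brackets return a sub-list rather than the element itself
sub_list <- my_list["age"]
print(class(sub_list))

# A nested list: one element is itself a list
nested <- list(numbers = 1:5, letters = c("A", "B", "C"), person = my_list)
print(nested$person$name)

# Modify an existing element and add a new one
my_list$age <- 26
my_list$grade <- "A"
print(names(my_list))
```

The difference between [ ] and [[ ]] matters in practice: [[ ]] extracts the element itself, while [ ] keeps it wrapped in a list.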
Introduction to Data Frame:
A data frame in R is one of the most commonly used data structures, particularly
for data analysis. It is similar to a table or spreadsheet in that it organizes data into
rows and columns, where each column can hold different types of data (e.g.,
numeric, character, factor). Data frames are powerful because they allow you to
work with and manipulate structured datasets efficiently.
Key Features of a Data Frame
1. Tabular Structure: A data frame is essentially a list of vectors of equal
length, where each vector forms a column.
2. Mixed Data Types: Unlike matrices, which can only hold one type of data,
data frames can have different types of data in different columns (e.g.,
numeric, character, factor).
3. Row and Column Names: Data frames have row names (often just
numbers) and column names (which describe the data in each column).
4. Data Manipulation: R provides various functions for data manipulation,
such as subsetting, filtering, and transforming data frames.
PROGRAM:
Creating a Data Frame:
Output:
     Name Age Gender Height Weight
2     Bob  30   Male    6.0    150
3 Charlie  35   Male    5.8    180
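A minimal sketch of creating and subsetting a data frame with these columns. The rows for Bob and Charlie come from the output above; the first row (Alice) uses assumed values, and the filter on Gender is one plausible way to obtain rows 2 and 3, not necessarily the one used originally.

```r
# Build a data frame from equal-length vectors (one vector per column)
df <- data.frame(
  Name   = c("Alice", "Bob", "Charlie"),
  Age    = c(25, 30, 35),
  Gender = c("Female", "Male", "Male"),
  Height = c(5.5, 6.0, 5.8),
  Weight = c(130, 150, 180)   # Alice's row is hypothetical
)

# Inspect the structure and dimensions
str(df)
print(dim(df))

# Access a single column
print(df$Age)

# Subset rows that satisfy a condition
males <- df[df$Gender == "Male", ]
print(males)

# Add a derived column
df$Age_in_months <- df$Age * 12
print(head(df))
```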
EXP-5: Write an R program to draw i) Pie Chart, ii) 3D Pie Chart, iii) Bar Chart,
along with a chart legend, using a suitable CSV file
PROGRAM:
install.packages("plotrix")
# Load necessary library
library(plotrix) # for 3D pie chart
3D Pie chart:
Bar Chart:
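The three charts can be sketched as follows. This is an illustration under stated assumptions: the CSV content (columns Category and Value) is hypothetical and is written to a temporary file so the sketch runs on its own, and the 3D pie chart is drawn only when the plotrix package is installed.

```r
# When run as a script, discard graphics instead of writing Rplots.pdf
if (!interactive()) pdf(NULL)

# Write a small hypothetical CSV file so the sketch is self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(Category = c("A", "B", "C", "D"), Value = c(30, 25, 20, 25)),
  csv_path,
  row.names = FALSE
)

# Read the CSV file
chart_data <- read.csv(csv_path)
slice_colors <- c("tomato", "skyblue", "lightgreen", "gold")

# i) Pie chart with legend
pie(chart_data$Value, labels = chart_data$Value, col = slice_colors,
    main = "Pie Chart")
legend("topright", legend = chart_data$Category, fill = slice_colors)

# ii) 3D pie chart (requires plotrix)
if (requireNamespace("plotrix", quietly = TRUE)) {
  plotrix::pie3D(chart_data$Value, labels = chart_data$Category,
                 col = slice_colors, explode = 0.1, main = "3D Pie Chart")
}

# iii) Bar chart with legend
barplot(chart_data$Value, names.arg = chart_data$Category, col = slice_colors,
        main = "Bar Chart", xlab = "Category", ylab = "Value")
legend("topright", legend = chart_data$Category, fill = slice_colors)
```

To use a real data set, replace csv_path with the path to your own CSV file and adjust the column names.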
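# Data (assumed here: the built-in cars dataset; substitute your own columns as needed)
speed <- cars$speed
distance <- cars$dist
# i) Box Plots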
boxplot(speed, distance,
names = c("Speed", "Distance"),
main = "Box Plots for Speed and Distance",
col = c("lightblue", "lightgreen"))
# ii) Histogram
hist(speed,
main = "Histogram of Speed",
xlab = "Speed",
col = "lightblue",
border = "black")
hist(distance,
main = "Histogram of Distance",
xlab = "Distance",
col = "lightgreen",
border = "black")
# iii) Line Graph
plot(speed, type = "l", col = "blue", xlab = "Index", ylab = "Speed",
main = "Line Graph of Speed")
# v) Scatter Plot
plot(speed, distance,
main = "Scatter Plot of Speed vs Distance",
xlab = "Speed",
ylab = "Distance",
col = "darkgreen",
pch = 19)
Output:
Box Plots:
Histogram:
Line Graph:
Multiple Line Graphs:
Scatter Plot:
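A multiple line graph draws several series on the same axes. The sketch below is one way to do this in base R; it assumes the built-in cars dataset as the data source, which is an assumption and not confirmed by the program above.

```r
# When run as a script, discard graphics instead of writing Rplots.pdf
if (!interactive()) pdf(NULL)

# Assumed data source: the built-in cars dataset
speed <- cars$speed
distance <- cars$dist

# Draw the first series, sizing the y-axis so that both series fit
y_limits <- range(c(speed, distance))
plot(speed, type = "l", col = "blue", ylim = y_limits,
     xlab = "Index", ylab = "Value", main = "Multiple Line Graphs")

# Add the second series to the existing plot
lines(distance, col = "red")

# Legend identifying each line
legend("topleft", legend = c("Speed", "Distance"),
       col = c("blue", "red"), lty = 1)
```

The first call to plot() sets up the axes, so ylim must cover both series; lines() then adds to the existing plot without redrawing it.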
EXP-8: Write an R program to read a CSV file and analyze the data in the file using
EDA (Exploratory Data Analysis) techniques
Exploratory Data Analysis refers to the critical process of performing initial
investigations on data in order to discover patterns, spot anomalies, test
hypotheses, and check assumptions with the help of summary statistics and
graphical representations.
It is good practice to understand the data first and gather as many insights
from it as possible. EDA is all about making sense of the data in hand before getting
your hands dirty with it.
PROGRAM:
library('ggvis')
library('tidyverse')
library('ggplot2')
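# stringsAsFactors = TRUE keeps text columns as factors so that levels() works below
bike_buyers <- read.csv("C:/Users/SPHOORTHY/Downloads/bike_buyers.csv",
header = TRUE, na.strings = '', stringsAsFactors = TRUE)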
head(bike_buyers)
class(bike_buyers)
str(bike_buyers)
summary(bike_buyers)
levels(bike_buyers$Gender)
1. 'Female'
2. 'Male'
str(bike_buyers)
colSums(is.na(bike_buyers))
ID                  0
Marital.Status      7
Gender             11
Income              6
Children            8
Education           0
Occupation          0
Home.Owner          4
Cars                9
Commute.Distance    0
Region              0
Age                 8
Purchased.Bike      0
summary(bike_buyers)
hist(bike_buyers$Age)
Dealing with NA values
Since the distributions of Income and Age are skewed, we impute the missing values with
the median, which is less sensitive to extreme values than the mean.
median(bike_buyers$Income, na.rm = TRUE)
median(bike_buyers$Age, na.rm = TRUE)
60000
43
bike_buyers_clean <- bike_buyers
colSums(is.na(bike_buyers_clean))
ID                  0
Marital.Status      7
Gender             11
Income              6
Children            8
Education           0
Occupation          0
Home.Owner          4
Cars                9
Commute.Distance    0
Region              0
Age                 8
Purchased.Bike      0
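Median imputation itself can be sketched on a small stand-in data frame. The data frame bike_small and its values are hypothetical; only the column names Income and Age follow the data set above.

```r
# Hypothetical miniature data with missing values
bike_small <- data.frame(
  Income = c(40000, NA, 60000, 80000, NA),
  Age    = c(30, 43, NA, 50, 41)
)

# Work on a copy so the original stays unchanged
bike_small_clean <- bike_small

# Replace each NA with the median of the observed values in that column
for (col in c("Income", "Age")) {
  col_median <- median(bike_small_clean[[col]], na.rm = TRUE)
  missing_rows <- is.na(bike_small_clean[[col]])
  bike_small_clean[[col]][missing_rows] <- col_median
}

# Confirm that no missing values remain in the imputed columns
print(colSums(is.na(bike_small_clean)))
print(bike_small_clean)
```

The same loop can be applied to bike_buyers_clean by replacing bike_small_clean with that data frame.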
counts <- table(bike_buyers$Cars, bike_buyers$Gender)
# Columns of counts are genders; the stacked segments are numbers of cars
barplot(counts, main = '',
xlab = "Gender",
legend = rownames(counts))
plot(bike_buyers$Income, type= "p")
# na.rm = TRUE is required because Income contains missing values
plot(density(bike_buyers$Income, na.rm = TRUE), main = 'Income Density Spread')
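# Independent variables
set.seed(123) # assumed seed, for reproducible random numbers
size <- 100 # assumed sample size (not given in the original)
feature1 <- runif(size, 50, 100)
feature2 <- runif(size, 20, 80)
feature3 <- runif(size, 30, 90)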
Comparison:
Multiple Linear Regression:
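A simple and a multiple linear regression on simulated features like those above can be compared with lm(). This is a sketch under stated assumptions: the seed, the sample size, the response variable target and the coefficients used to generate it are all hypothetical.

```r
set.seed(42)   # assumed seed, for reproducibility
size <- 100    # assumed sample size

# Independent variables, simulated as in the program above
feature1 <- runif(size, 50, 100)
feature2 <- runif(size, 20, 80)
feature3 <- runif(size, 30, 90)

# Hypothetical dependent variable: a linear combination plus random noise
target <- 5 + 2 * feature1 - 1.5 * feature2 + 0.5 * feature3 +
  rnorm(size, sd = 3)

sim_data <- data.frame(feature1, feature2, feature3, target)

# Simple linear regression: one predictor
simple_fit <- lm(target ~ feature1, data = sim_data)

# Multiple linear regression: all three predictors
multi_fit <- lm(target ~ feature1 + feature2 + feature3, data = sim_data)

# Compare estimated coefficients and goodness of fit
print(coef(multi_fit))
simple_r2 <- summary(simple_fit)$r.squared
multi_r2 <- summary(multi_fit)$r.squared
print(c(simple = simple_r2, multiple = multi_r2))
```

Because the simulated response depends on all three features, the multiple regression should explain more of the variance than the single-predictor model.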