11 RIC Journal

THAKUR COLLEGE OF SCIENCE AND COMMERCE

Affiliated to University of Mumbai 2022 – 2023

PRACTICAL JOURNAL OF
RESEARCH IN COMPUTING
SUBJECT GUIDE
NILESH SINGH
(Asst. Professor)

SUBMITTED BY:
BHAVESH SUTRAVE
ROLL NO: 111

SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR QUALIFYING M.Sc. IT PART I
(SEMESTER I EXAMINATION)

CERTIFICATE OF APPROVAL

This is to certify that Mr. Bhavesh Sutrave, a student of “Master of Science
(Information Technology)” at “Thakur College of Science and Commerce”,
Roll No. 111, has successfully completed and submitted the Practical &
Assignment entitled “Research in Computing” in partial fulfilment of the
syllabus defined by the University of Mumbai for the academic year 2022-2023.

It is further certified that the student has completed all the required phases of the
practical & assignment.

__________________________
HEAD OF DEPARTMENT

_________________________ ________________________
PROFESSOR INCHARGE EXTERNAL EXAMINER

INDEX

SR. NO | PRACTICAL NAME | DATE | PAGE NO | SIGN
1 | Write a program to create and save the pie chart in the current R working directory | 26-08-2022 | 4 |
2 | Perform program based on Data Frame/ CSV operation | 02-09-2022 | 8 |
3 | Write a program to draw the line graph in the other directory. Draw a line based on the Temp and Sales parameters of the above data. | 20-09-2022 | 12 |
4 | Perform nominal, Interval/Ratio and Ordinal variables using Factor () method. | 04-11-2022 | 14 |
5 | Write a program to read JSON file and convert JSON data into DataFrame | 06-11-2022 | 19 |
6 | Perform program based on Return value from Function | 05-09-2022 | 21 |
7 | Data 'mtcars' | 27-09-2022 | 24 |
8 | Use Covid data and perform following: | 11-11-2022 | 26 |
9 | Write a program to visualize and analyze the statistic of sample data using plot chart, histogram, multi-line, Scattered plot. | 27-09-2022 | 38 |

Practical No: 1
Aim: Write a program to create and save the pie chart in the current R working
directory
a. X=21, 62, 10, 53 Y="London", "New York", "Singapore", "Mumbai"

b. Salary and name of employee:

Description:
R has numerous libraries for creating charts and graphs. A pie chart represents
values as slices of a circle, each drawn in a different colour. The slices are
labelled, and the number corresponding to each slice is also shown in the chart.
In R, a pie chart is created with the pie() function, which takes a vector of
non-negative numbers as input. Additional parameters control the labels,
colours, title and so on. Here the main parameter adds a title to the chart and
the col parameter uses the rainbow colour palette to colour the slices.
Syntax: pie(x, labels, radius, main, col, clockwise)
Parameters:
• x: a vector containing the numeric values used in the pie chart.
• labels: the descriptions given to the slices of the pie chart.
• radius: the radius of the circle of the pie chart (a value between -1 and +1).
• main: the title of the pie chart.
• clockwise: a logical value indicating whether the slices are drawn clockwise
or anti-clockwise.
• col: the colours given to the slices of the pie.
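
The source code in this practical does not use the radius and clockwise parameters described above, so the following is a minimal sketch of how they might be combined with percentage labels. The temporary output file and the label format are assumptions made for illustration, not part of the original practical.

```r
# Values and labels from part (a) of this practical.
x <- c(21, 62, 10, 53)
cities <- c("London", "New York", "Singapore", "Mumbai")

# Percentage share of each slice, rounded to one decimal place.
share <- round(100 * x / sum(x), 1)
slice_labels <- paste0(cities, " (", share, "%)")

# Assumption: write to a temporary PNG so the working directory is untouched.
out_file <- tempfile(fileext = ".png")
png(filename = out_file)
pie(x,
    labels = slice_labels,
    radius = 0.8,        # stays inside the -1..+1 plotting box
    clockwise = TRUE,    # draw the slices clockwise
    col = rainbow(length(x)),
    main = "City pie chart (clockwise)")
dev.off()
```

Keeping the radius below 1 leaves room for the longer labels at the edge of the plot.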

(a) Data: X=21, 62, 10, 53 Y="London", "New York", "Singapore", "Mumbai"

Source Code:
#Create data for the graph.
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")
#Give the chart file a name. jpeg() is used so the contents match the .jpg extension.
jpeg(file = "city_title_colours.jpg")
#Plot the chart with title and rainbow colour palette.
pie(x, labels, main = "City pie chart", col = rainbow(length(x)))
#Save the file.
dev.off()

#Create data for the graph.
x <- c(21, 62, 10, 53)
labels <- c("London","New York","Singapore","Mumbai")
#Percentage share of each slice.
piepercent<- round(100*x/sum(x), 1)
#Give the chart file a name.
jpeg(file = "city_percentage_legends.jpg")
#Plot the chart with percentages as slice labels and city names in the legend.
pie(x, labels = piepercent, main = "City Pie Chart",col = rainbow(length(x)))
legend("topright", labels, cex = 0.8, fill = rainbow(length(x)))
#Save the file.
dev.off()

library(plotrix)
#Create data for the graph.
x <- c(21, 62, 10, 53)
lbl <- c("London","New York","Singapore","Mumbai")
#Give the chart file a name.
jpeg(file = "3d_pie_chart.jpg")
#Plot the 3D chart with the slices pulled slightly apart.
pie3D(x,labels = lbl,explode = 0.1, main = "Pie Chart of Cities")
#Save the file.
dev.off()

Output:

(b) Data: Salary and name of employee:
emp_id emp_name salary start_date
1 Rick 623.30 2012-01-01
2 Dan 515.20 2013-09-23
3 Michelle 611.00 2014-11-15
4 Ryan 729.00 2014-05-11
5 Gary 843.25 2015-03-27

Source Code:
library(plotrix)
#Read the employee table; its column names are lower case (emp_name, salary).
data<-read.csv("Employee.csv")
data
jpeg(file = "Prac.jpg")
pie3D(data$salary,labels=data$emp_name,radius=1.5,
main = "PIE CHART OF THE EMPLOYEE")
dev.off()

Output:

Practical No: 2
Aim: Perform program based on DataFrame/ CSV operation
a. Convert given data in DataFrame
b. Print TRUE if data available and find the number of column and rows
c. Find Average of total sales
d. Add 3 rows in DataFrame
e. Find day, date when maximum sales are done
f. Find day, date when minimum sales are done
g. Find maximum sales based on temperature
h. Add new column Total
Data:

Date Day Temperature Rainfall Flyers Price Sales
01/01/2017 Sunday 27 2 15 0.3 10
01/02/2017 Monday 28.9 1.3 15 0.3 13
01/03/2017 Tuesday 34.5 1.3 27 0.3 15
01/04/2017 Wednesday 44.1 1.1 28 0.3 17
01/05/2017 Thursday 42.4 1 33 0.3 18
01/06/2017 Friday 25.3 1.5 23 0.3 11
01/07/2017 Saturday 32.9 1.5 99 0.5 13
01/08/2017 Sunday 37.5 1.2 28 0.5 15
01/09/2017 Monday 38.1 1.2 20 0.5 17
01/10/2017 Tuesday 43.4 1.1 33 0.5 18
01/11/2017 Wednesday 32.6 1.5 23 0.5 12
01/12/2017 Thursday 38.2 1.3 16 0.5 14
01/13/2017 Friday 37.5 1.3 19 0.5 15
01/14/2017 Saturday 44.1 1.1 23 0.3 17
01/15/2017 Sunday 43.4 1.1 33 0.3 18
01/16/2017 Monday 30.6 1.7 24 0.3 12
01/17/2017 Tuesday 32.2 1.4 26 0.3 14

Description:
In R, we can read data from files stored outside the R environment. We can also
write data into files which will be stored and accessed by the operating system.
R can read and write into various file formats like csv, excel, xml etc. The
contents of a CSV file can be read as a data frame in R using the read.csv
function. The CSV file to be read should be either present in the current
working directory or the directory should be set accordingly using the setwd
command in R. The CSV file can also be read from a URL
using read.csv() function.
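
As a small illustration of the round trip described above, the sketch below writes two rows of the sales table to a CSV file and reads them back as a data frame. The temporary directory and the file name sales_sample.csv are assumptions chosen so that the example does not depend on the working directory.

```r
# First two rows of the sales table, reduced to four columns for brevity.
sample_rows <- data.frame(
  Date = c("01/01/2017", "01/02/2017"),
  Day = c("Sunday", "Monday"),
  Temperature = c(27, 28.9),
  Sales = c(10, 13)
)

# Assumption: a file in the session's temporary directory stands in for sales.csv.
csv_path <- file.path(tempdir(), "sales_sample.csv")
write.csv(sample_rows, csv_path, row.names = FALSE)

# read.csv() returns a data frame; numeric columns are detected automatically.
sales <- read.csv(csv_path)
str(sales)
dim(sales)   # rows, columns
```

getwd() shows the current working directory and setwd() changes it when the file lives elsewhere.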
Source Code:
#Perform program based on dataframe/ csv operation (112)
data = read.csv("sales.csv")
#Converting given data in dataframe
test <- data.frame(data)
print(test)
#Printing TRUE if data available and finding the number of column and rows
nr <- nrow(test)
nc <- ncol(test)
cat("Data available ",nrow(test) > 0,"\n")
cat("Number of rows ",nr,"\n")
cat('Number of columns ',nc,"\n")
#Finding average of total sales
cat("Average of sales",mean(test$Sales),"\n")

#Adding 3 rows in dataframe
new.data <- data.frame(
Date = c("1/11/2017","1/12/2017","1/13/2017"),
Day = c("Wednesday","Thursday","Friday"),
Temperature = c(32.6,38.2,37.5),
Rainfall = c(1.54,1.33,0.33),
Flyers = c(23,16,19),
Price = c(0.5,0.5,0.5),
Sales = c(12,14,15)
)
final.data <- rbind(test,new.data)
print(final.data)
print("3 rows are added")
#Finding maximum sales over the full data, not only the three added rows
salesmax<-max(final.data$Sales)
print(salesmax)
#Finding day, date when maximum sales are done
data2<-subset(final.data,Sales==salesmax,select=c(Date,Day))
print(data2)
#Finding minimum sales
salesmin<-min(final.data$Sales)
print(salesmin)
#Finding day, date when minimum sales are done
data3<-subset(final.data,Sales==salesmin,select = c(Day,Date))
print(data3)
#Finding maximum sales based on temp.
data4 <- tapply(final.data$Sales, final.data$Temperature, max)
print(data4)
#Adding new column Total
#Assumption: Total is taken as Price * Sales, the revenue for the day.
final.data$Total <- final.data$Price * final.data$Sales
v<-final.data
print(v)
Output:

Practical No: 3
Aim: Write a program to draw the line graph in the other directory. Draw a line
based on the Temp and Sales parameters of the above data.
Data:
Date Day Temperature Rainfall Flyers Price Sales
1/1/2017 Sunday 8 2 15 0.3 12
2/1/2017 Monday 12 1.33 15 0.3 5
3/1/2017 Tuesday 22 1.33 17 0.3 6
4/1/2017 Wednesday 3 1.05 28 0.3 16
5/1/2017 Thursday 31 1 33 0.3 7

Description:
A line chart is a graph that connects a series of points by drawing line segments
between them. The points are ordered by one of their coordinates, usually the
x-value. Line charts are mostly used to identify trends in data. The plot()
function in R is used to create the line graph.
Syntax: plot(v,type,col,xlab,ylab,main)
Parameters:
• v is a vector containing the numeric values.
• type takes the value "p" to draw only the points, "l" to draw only the lines and
"o" to draw both points and lines.
• xlab is the label for x axis.
• ylab is the label for y axis.
• main is the title of the chart.
• col is used to give colors to both the points and lines.
Source Code:

data = read.csv("DTST.csv")
print(data)
t <-data[, c('Temperature')]
s <-data[, c('Sales')]
print(t)
print(s)
# Give the chart file a name. To save in another directory, put its path in front of the file name.
jpeg(file = "Sales_Temp_Line2.jpg")
# Plot the line chart.
plot(t,type = "o",col = "red", xlab = "Day", ylab = "", main = "Sales & Temperature")
lines(s, type = "o", col = "blue")
# Save the file.
dev.off()

Output:

Practical No: 4
Aim: Perform nominal, Interval/Ratio and Ordinal variables using Factor ()
method.
a. Write a R program to find the levels of factor of a given vector.
b. Write a R program to change the first level of a factor with another level
of a given factor.
c. Write a R program to create an ordered factor from data consisting of the
names of months.
d. Write a R program to concatenate two given factors in a single factor.
e. Write a R program to convert a given pH levels of soil to an ordered
factor.
Description:
Factors in R are data structures used to represent categorical data. A factor is
stored as a vector of integer codes together with a label for every unique
integer. Although factors may look like character vectors, they are integers
underneath, so care must be taken when using them as strings. A factor accepts
only a restricted set of distinct values, known as levels. This makes factors
useful for columns with a limited number of unique values, such as "Male" and
"Female" or TRUE and FALSE, and for statistical modelling in data analysis.
All the possible values are known beforehand and are predefined. By default the
levels of a newly created factor are sorted.
Syntax: factor(x, levels, labels, exclude, ordered, nmax)
Parameters:
• x: the vector that needs to be converted into a factor.
• levels: the set of distinct values that the input vector x may take.
• labels: a character vector of labels for the levels.
• exclude: the values to be excluded when forming the levels.
• ordered: a logical value that decides whether the levels are ordered.
• nmax: an upper bound on the number of levels.
Source Code:
#Write a R program to find the levels of factor of a given vector.
data = c("Mercedes", "BMW", "Lamborghini", "Bugatti", "Porsche", "Ferrari")
print(data)
a = factor(data)
print(a)
print(levels(a))

#Write a R program to change the first level of a factor with another level of a given factor.
A= c("1", "1", "2", "3", "4")
print ("Original vector is: ")
print(A)
fa = factor(A)
print ("Factor of the vector is:")
print(fa)
#Rename the first level; "0" is only an illustrative replacement value.
levels(fa)[1] = "0"
print ("Factor after changing the first level:")
print(fa)

#Write a R program to create an ordered factor from data consisting of the names of months.
mons_v = c("March","April","January","November","January",
"September","October","September","November","August","February",
"January","November","November","February","May","August","February",
"July","December","August","August","September","November","September",
"February","April")
print("Original vector:")
print(mons_v)
#month.name holds the twelve month names in calendar order.
f = factor(mons_v, levels = month.name, ordered = TRUE)
print("Ordered factors of the said vector:")
print(f)
print(table(f))
#Write a R program to concatenate two given factor in a single factor.


fac1 <- as.factor (letters [1:5])
print ("Factor1: ")
print (fac1)
sapply(fac1,class)
fac2 <- as.factor (c("Bhavesh"))
print ("Factor2: ")
print (fac2)
sapply(fac2,class)
# Combine into one factor
concatenate <- unlist (list (fac1, fac2))
print ("Concatenate Factor: ")
print (concatenate)
sapply(concatenate,class)

#Write a R program to convert a given pH levels of soil to an ordered factor.
ph = c(1,3,10,7,5,4,3,7,8,7,5,3,10,10,7)
print("Original data:")
print(ph)
#Only 3, 7 and 10 are kept as levels; every other pH value becomes NA.
ph_f = factor(ph,levels=c(3,7,10),ordered=TRUE)
print("pH levels of soil to an ordered factor:")
print(ph_f)

#Write a R program to extract five of the levels of a factor created from a random sample from LETTERS.
L = sample(LETTERS,size=50,replace=TRUE)
print("Original data:")
print(L)
f = factor(L)
print("Original factors:")
print(f)
print("Only five of the levels")
print(table(L[1:5]))

#Write a R program to create a factor corresponding to height from the women data set, which contains heights and weights for a sample of women.
data = women
print ("Women data set of height and weights:")
print(data)
#cut() splits the heights into 3 intervals of equal width.
height_f = cut (women$height, 3)
print ("Factor corresponding to height:")
print (table(height_f))
Output:

Practical No: 5
Aim: Write a program to read JSON file and convert JSON data into
DataFrame.
Data:
input.json:
{
"ID": ["1","2","3","4","5","6","7","8"],
"Name": ["Rick","Dan","Michelle","Ryan","Gary","Nina","Simon","Guru"],
"Salary": ["623.3","515.2","611","729","843.25","578","632.8","722.5"],
"StartDate": ["1/1/2012","9/23/2013","11/15/2014","5/11/2014","3/27/2015","5/21/2013",
"7/30/2013","6/17/2014"],
"Dept": ["IT","Operations","IT","HR","Finance","IT","Operations","Finance"]
}

Description:
JSON stands for JavaScript Object Notation. JSON files store data in a
human-readable format, i.e., as text. Like any other file, a JSON file can be
read as well as written. To work with JSON files in R, the “rjson” package has
to be installed. R reads the JSON file with the function fromJSON() and stores
the result as a list. The extracted data can then be converted to an R data
frame for further analysis using the as.data.frame() function.
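
The list-to-data-frame step described above can be sketched without reading a file. The list below is written by hand and only assumed to have the shape that parsing input.json would give (a named list of equal-length character vectors); it uses the first three records of the data.

```r
# Assumption: a hand-written stand-in for the list that a JSON parser returns.
parsed <- list(
  ID = c("1", "2", "3"),
  Name = c("Rick", "Dan", "Michelle"),
  Salary = c("623.3", "515.2", "611")
)

# Each list element becomes one column of the data frame.
df <- as.data.frame(parsed, stringsAsFactors = FALSE)

# The JSON values are quoted, so they arrive as text and need converting.
df$ID <- as.integer(df$ID)
df$Salary <- as.numeric(df$Salary)
str(df)
print(mean(df$Salary))
```

Converting the columns matters because arithmetic such as mean() fails on character data.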

Source Code:
# Load the package required to read JSON files.
library("rjson")
# Give the input file name to the function.
result <- fromJSON(file = "input.json")
# Print the result.
print(result)
# Convert JSON file to a data frame.
json_data_frame <- as.data.frame(result)
print(json_data_frame)

Output:

Practical No: 6
Aim: Perform program based on Return value from Function
a. Write a program to check whether number is Palindrome or not using user
defined function with return statement.
b. Write a program to check Factorial, Prime number using user defined
function with return statement.

Description:
A function is a set of statements organized together to perform a specific task.
R has a large number of built-in functions, and users can create their own. In R
a function is an object, so the interpreter can pass control to the function
along with any arguments it needs. The function performs its task and returns
control to the interpreter, together with any result, which may be stored in
other objects. Functions are useful when a task has to be performed multiple
times. A function accepts input arguments and produces its output by executing
the valid R commands inside it. The function name and the name of the file in
which it is defined need not be the same, and a single R file can hold one or
more function definitions. Often a function has to do some processing and hand
back the result. This is accomplished with the return() function in R.
Syntax: return(expression)
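
As a minimal sketch of return(), the hypothetical function below (describe_number is an invented name) exits early for invalid input and otherwise returns several values bundled in a list.

```r
# Hypothetical helper used only to illustrate return().
describe_number <- function(n) {
  if (!is.numeric(n) || length(n) != 1) {
    # Early exit: nothing after this line runs for invalid input.
    return("not a single number")
  }
  parity <- if (n %% 2 == 0) "even" else "odd"
  # A list lets one return() hand back more than one value.
  return(list(value = n, parity = parity))
}

print(describe_number(7)$parity)
print(describe_number("seven"))
```

A function can contain several return() calls, but only the first one reached is executed.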
(a) Write a program to check whether number is Palindrome or not using
user defined function with return statement.
Source Code:
#PALINDROME
# Create a function that checks whether the input number is a palindrome
newFn.palindrome <- function(n) {
  rev = 0
  num = n
  # Build the reversed number one digit at a time
  while (n > 0) {
    r = n %% 10
    rev = rev * 10 + r
    n = n %/% 10
  }
  if (rev == num) {
    result <- paste("Number is palindrome :", rev)
  } else {
    result <- paste("Number is Not palindrome :", rev)
  }
  return(result)
}
newFn.palindrome(4568)

Output:

(b) Write a program to check Factorial, Prime number using user defined
function with return statement.
Source Code:
#FACTORIAL
findfactorial <- function(n){
  factorial <- 1
  if ((n==0)|(n==1)) {
    factorial <- 1
  } else {
    for( i in 1:n)
      factorial <- factorial * i
  }
  return (factorial)
}
findfactorial(4)

#PRIME
prime_numbers <- function(n) {
if (n >= 2) {
x = seq(2, n)
prime_nums = c()
for (i in seq(2, n)) {
if (any(x == i)) {
prime_nums = c(prime_nums, i)
x = c(x[(x %% i) != 0], i)
}
}
return(prime_nums)
}
else
{
stop("Input number should be at least 2.")
}
}
prime_numbers(12)
Output:

Practical No: 7
Aim: Write a program to use data set "mtcars" available in the R environment
to create a basic scatterplot.
Data: mtcars: A built-in dataset in R that contains measurements on 11
different attributes for 32 different cars.
Description:
A data set is a collection of data, often presented in a table. There is a popular
built-in data set in R called "mtcars" (Motor Trend Car Road Tests), which is
taken from the 1974 Motor Trend US magazine. It contains measurements on 11
different attributes for 32 different cars. A scatter plot is a set of points,
each representing one observation, positioned by the values of two variables
along the X-axis and Y-axis. The pattern of the resulting points reveals any
correlation between the two variables.
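
The correlation that a scatter plot reveals can also be checked numerically. The sketch below does so for the wt and mpg columns of mtcars; the fitted straight line and the temporary output file are additions for illustration, not part of the original program.

```r
# Pearson correlation between car weight and mileage in the built-in data set.
r <- cor(mtcars$wt, mtcars$mpg)
print(r)   # a negative value: heavier cars tend to have lower mileage

# A straight-line fit summarises the same trend.
fit <- lm(mpg ~ wt, data = mtcars)

# Assumption: a temporary PNG file is used for the output.
scatter_file <- tempfile(fileext = ".png")
png(filename = scatter_file)
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "Mileage",
     main = "Weight vs Mileage")
abline(fit, col = "red")   # overlay the fitted line on the points
dev.off()
```

Points scattered closely around a falling line correspond to a correlation near -1.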
Source Code:
# Get the input values.
input <- mtcars[,c('wt','mpg')]
# Give the chart file a name.
png(file = "scatterplot.png")
# Plot the chart for cars with weight between 2.5 and 5 and mileage between 15 and 30.
plot(x = input$wt,y = input$mpg, xlab = "Weight",
ylab = "Mileage",
xlim = c(2.5,5),
ylim = c(15,30),
main = "Weight vs Mileage" )
# Save the file.
dev.off()
# Use the built-in data set directly; no CSV file is needed.
input <- mtcars
#Identifying NAs in specific column or location
is.na(input)
#Identifying count of NAs in data frame
colSums(is.na(input))

Output:

Practical No: 8

Aim: Use Covid data and perform following:
a. Compare the total covid cases of one month with 5 district/states and
show the best recoverable percentage of the state
b. Compare cases raised from previous month to current month.
c. Plot a Pie chart/ Histogram to get total cases, recovered cases and total
tests [Maharashtra Only]
d. Get the month where maximum patients were recovered & what is the test
ratio.
e. Compare country – wise the Covid cases in terms of total case
Data:
a. covid_state.csv
c_date State Confirmed Deceased Recovered tot_cases tot_tests State_Name
6/4/2020 AN 0 0 0 0 3 Andaman and Nicobar Islands
5/17/2020 AP 67 0 1 68 842 Andhra Pradesh
4/13/2020 AR 0 0 0 0 530 Arunachal Pradesh
8/9/2020 AS 15 0 0 15 480 Assam
5/16/2020 BR 3 0 0 3 147 Bihar
6/17/2020 CH 2 0 0 2 672 Chandigarh
4/17/2020 CT 0 0 0 0 318 Chhattisgarh
8/22/2020 DD 0 0 0 0 615 Daman and Diu
8/23/2020 DL 32 0 0 32 284 Delhi

b. covid19_countries.csv
c_date Country Confirmed Deceased Recovered tot_cases tot_tests c_codes
5/24/2020 Vatican City 2448 127 237 2812 6366 VC
6/5/2020 Anguilla 4679 871 1278 6828 9445 ANG
7/3/2020 Western Saha 1452 1881 1178 4511 10485 WS
9/9/2020 India 1879 1357 1781 5017 6725 IND
10/16/2020 Montserrat 4500 1449 1724 7673 15703 MON
7/27/2020 Vatican City 1784 883 113 2780 11323 VC
5/28/2020 Vatican City 2895 1658 181 4734 8064 VC
4/30/2020 Russia 294 391 2170 2855 5368 RUSS
8/5/2020 Montserrat 1361 131 1586 3078 7778 MON
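
The c_date values in both files are text, and the rows shown (for example 5/17/2020 and 8/22/2020) suggest a month/day/year layout. Under that assumption, the sketch below shows how such strings can be converted to Date values before filtering by month. The four dates are copied from covid_state.csv above.

```r
# Four c_date values from the covid_state.csv sample.
c_date <- c("6/4/2020", "5/17/2020", "4/13/2020", "8/9/2020")

# Assumption: month/day/year order, so the format must be given explicitly.
parsed <- as.Date(c_date, format = "%m/%d/%Y")
print(parsed)

# Rows that fall in April 2020.
in_april <- parsed >= as.Date("2020-04-01") & parsed <= as.Date("2020-04-30")
print(c_date[in_april])

# A year-month key is convenient for grouping by month.
print(format(parsed, "%Y-%m"))
```

Without the format argument, as.Date() expects year-month-day text and would misread these strings.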

Description:
R offers a set of built-in libraries that help build visualizations with minimal
code and a lot of flexibility. Data visualization is the technique of delivering
insights from data using visual cues such as graphs, charts and maps. It makes
large quantities of data easier to understand intuitively and so supports better
decisions. Popular data visualization tools include Tableau, Plotly, R, Google
Charts, Infogram and Kibana. These platforms differ in capabilities,
functionality and use cases, and they require different skill sets. This
practical uses R for data visualization. R is a language designed for
statistical computing, graphical data analysis and scientific research. It is
usually preferred for data visualization because its packages offer flexibility
with little code. Graphics play an important role in bringing out the important
features of the data. They are used to examine marginal distributions,
relationships between variables and summaries of very large data sets, and they
are an important complement to many statistical and computational techniques.
(a) Compare the total covid cases of one month with 5 district/states and
show the best recoverable percentage of the state
Source Code:
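#Comparing the total covid cases of one month with 5 states and showing the best recoverable percentage of the state
rm(list=ls())
data1<-read.csv("covid_state.csv");
#c_date is stored as month/day/year text, so convert it to Date once with an explicit format.
data1$c_date<-as.Date(data1$c_date, format="%m/%d/%Y")
print(data1)
sumfun<-function(y){
sum<-0
#seq_along() also handles an empty vector, unlike 1:length(y).
for(i in seq_along(y)){
sum=sum+y[i]
}
return(sum)
}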
findindex<-function(v,a){
val<-0
for(i in seq_along(a)){
if(a[i]==v){
val=i
break
}}
#Return the matching position, or 0 when v is not found.
return(val)
}
mh<-subset(data1, State=="MH" & as.Date(c_date) >= as.Date("2020-04-01") &
as.Date(c_date) <= as.Date("2020-04-30"))
ap<-subset(data1, State=='AP' & as.Date(c_date) >= as.Date("2020-04-01") &
as.Date(c_date) <= as.Date("2020-04-30"))
ar<-subset(data1, State=='AR' & as.Date(c_date) >= as.Date("2020-04-01") &
as.Date(c_date) <= as.Date("2020-04-30"))
as<-subset(data1, State=='AS' & as.Date(c_date) >= as.Date("2020-04-01") &
as.Date(c_date) <= as.Date("2020-04-30"))
br<-subset(data1, State=='BR' & as.Date(c_date) >= as.Date("2020-04-01") &
as.Date(c_date) <= as.Date("2020-04-30"))
print(ap)
print(ar)
print(as)
print(br)
rec_mh<-sumfun(mh$Recovered)
con_mh<-sumfun(mh$tot_cases)
print(rec_mh)
rec_ap<-sumfun(ap$Recovered)
con_ap<-sumfun(ap$tot_cases)
rec_ar<-sumfun(ar$Recovered)
con_ar<-sumfun(ar$tot_cases)
rec_as<-sumfun(as$Recovered)
con_as<-sumfun(as$tot_cases)
rec_br<-sumfun(br$Recovered)
con_br<-sumfun(br$tot_cases)
print(rec_br)
mh_rec_per<-(rec_mh/con_mh)*100

ap_rec_per<-(rec_ap/con_ap)*100
ar_rec_per<-(rec_ar/con_ar)*100
as_rec_per<-(rec_as/con_as)*100
br_rec_per<-(rec_br/con_br)*100
labels1<-c("Maharashtra","Andhra-Pradesh","Arunachal Pradesh","Assam","Bihar")
perc_list<-c(mh_rec_per,ap_rec_per,ar_rec_per,as_rec_per,br_rec_per)
print(perc_list)
max_perc_list<-max(perc_list)
labelindex<-findindex(max_perc_list,perc_list)
print(labels1[labelindex])
state1<-labels1[labelindex]
print(max_perc_list)
cat(state1," is the best state with a recovery rate of ",max_perc_list,".")
#Visualizing the recovery percentages of all five states in a pie chart.
x<-c(mh_rec_per,ap_rec_per,ar_rec_per,as_rec_per,br_rec_per)
x<-round(x,2)
labels <- c("Maharashtra","Andhra Pradesh","Arunachal Pradesh","Assam","Bihar")
jpeg(file = "State Recovery Pie Chart 1.jpg")
pie(x, labels = x, main = "State Recovery Pie Chart",col = rainbow(length(x)))
legend("topright", labels,
cex = 0.6,
fill = rainbow(length(x)))
dev.off()

Output:
(b) Compare cases raised from previous month to current month.
Source Code:
#Comparing cases raised from previous month to current month.
#The workspace is not cleared here because sumfun() and findindex() from part (a) are reused in parts (c) and (d).
data1<-read.csv("covid_state.csv");
#Convert the month/day/year text in c_date to Date values.
data1$c_date<-as.Date(data1$c_date, format="%m/%d/%Y")
print(data1)
mh_april<-subset(data1, State=='MH' & as.Date(c_date) >= as.Date("2020-04-01") &
as.Date(c_date) <= as.Date("2020-04-30"))
mh_may<-subset(data1, State=='MH' & as.Date(c_date) >= as.Date("2020-05-01") &
as.Date(c_date) <= as.Date("2020-05-31"))
mh_super<-subset(data1, State=='MH' & as.Date(c_date) >= as.Date("2020-04-01") &
as.Date(c_date) <= as.Date("2020-05-31"))
print(mh_super)
print(mh_april)
rec_mh_april<-sum(mh_april$tot_cases)
rec_mh_may<-sum(mh_may$tot_cases)
print(rec_mh_april)
print(rec_mh_may)
cases_raised<-rec_mh_may-rec_mh_april
print(cases_raised)
cat("Difference in cases from April to May is ",cases_raised,".")
#Line chart of the April (red) and May (blue) total cases.
# Give the chart file a name.
jpeg(file = "Cases comparison of April and May.jpg")
# Plot the line chart.
plot(mh_april$tot_cases,type = "o",col = "red", xlab = "Month", ylab = "Total Cases",
main = "Total Cases Chart month-wise (112)")
lines(mh_may$tot_cases, type = "o", col = "blue")
dev.off()
Output:

(c) Plot a Pie chart/ Histogram to get total cases, recovered cases and total
tests [Maharashtra Only]
Source Code:
#Plot a pie chart/ histogram to get total cases, recovered cases and total tests [Maharashtra Only]
data1<-read.csv("covid_state.csv");
#Convert the month/day/year text in c_date to Date values.
data1$c_date<-as.Date(data1$c_date, format="%m/%d/%Y")
print(data1)
mh_03<-subset(data1, State=='MH' &
as.Date(c_date) >= as.Date("2020-04-01") &
as.Date(c_date) <= as.Date("2020-04-30"))
print(mh_03)
rec03<-mh_03$Recovered
tot03<-mh_03$tot_tests
conf03<-mh_03$Confirmed
tot<-rec03+tot03+conf03
v <- c(sum(rec03),sum(tot03),sum(conf03))
print(v)
# Give the chart file a name.
png(file = "Covid-19 cases.png")
#Show numbers on legends
# Create the histogram of recovered cases.
hist(rec03,xlab = "Recovered cases",col = "cyan1",border = "navy", main="Covid-19 cases (112)")
# Save the file.
dev.off()
###################################
png(file="Covid19 PIE.png")
labels<-c("Recovered ","Total Tests","Confirmed Cases")
pie(v,labels,main="Covid Analysis (112)",col=topo.colors(length(v)))
legend("topright", labels, cex = 0.8, fill = topo.colors(length(v)))
dev.off()
Output:

(d) Get the month where maximum patients were recovered & what is the
test ratio.
Source Code:
#Get the month where maximum patients were recovered & what is the test ratio.
data1<-read.csv("covid_state.csv");
#Convert the month/day/year text in c_date to Date values.
data1$c_date<-as.Date(data1$c_date, format="%m/%d/%Y")
print(data1)
mh_mon<-subset(data1,State=="MH")
print(mh_mon)
#Each range ends on the last day of its month (31 for May, July, August and October).
apr<-subset(mh_mon, as.Date(c_date) >= as.Date("2020-04-01") &
as.Date(c_date) <= as.Date("2020-04-30"))
may<-subset(mh_mon, as.Date(c_date) >= as.Date("2020-05-01") &
as.Date(c_date) <= as.Date("2020-05-31"))
jun<-subset(mh_mon, as.Date(c_date) >= as.Date("2020-06-01") &
as.Date(c_date) <= as.Date("2020-06-30"))
jul<-subset(mh_mon, as.Date(c_date) >= as.Date("2020-07-01") &
as.Date(c_date) <= as.Date("2020-07-31"))
aug<-subset(mh_mon, as.Date(c_date) >= as.Date("2020-08-01") &
as.Date(c_date) <= as.Date("2020-08-31"))
sep<-subset(mh_mon, as.Date(c_date) >= as.Date("2020-09-01") &
as.Date(c_date) <= as.Date("2020-09-30"))
oct<-subset(mh_mon, as.Date(c_date) >= as.Date("2020-10-01") &
as.Date(c_date) <= as.Date("2020-10-31"))
recapr<-sumfun(apr$Recovered)
totapr<-sumfun(apr$tot_tests)
recmay<-sumfun(may$Recovered)
totmay<-sumfun(may$tot_tests)
recjun<-sumfun(jun$Recovered)
totjun<-sumfun(jun$tot_tests)
recjul<-sumfun(jul$Recovered)
totjul<-sumfun(jul$tot_tests)
recaug<-sumfun(aug$Recovered)
totaug<-sumfun(aug$tot_tests)
recsep<-sumfun(sep$Recovered)
totsep<-sumfun(sep$tot_tests)
recoct<-sumfun(oct$Recovered)
totoct<-sumfun(oct$tot_tests)
monthrec<-c(recapr,recmay,recjun,recjul,recaug,recsep,recoct)
print(monthrec)
maxrecmonth<-max(monthrec)
print(maxrecmonth)
labels2<-c("April","May","June","July","August","September","October")
testratio<-c(recapr/totapr,recmay/totmay,recjun/totjun,recjul/totjul,recaug/totaug,
recsep/totsep,recoct/totoct)
print(testratio)

labelindex2<-findindex(maxrecmonth,monthrec)
state3<-labels2[labelindex2]
cat(state3," is the month with the most recovered patients: ",maxrecmonth,".")
testratioframe<-data.frame(months=labels2,test_ratio=testratio,stringsAsFactors =
FALSE)
print("Test Ratio DataFrame=")
print(testratioframe)
x<-testratioframe$test_ratio
print(x)
# Give the chart file a name.
png(file = "Test Ratio Histogram.png")
# Create the histogram.
hist(x,xlab = "Test ratio",col = "aquamarine",border = "blue4", xlim = c(0,0.5), ylim =
c(0,5),breaks=5, main = "Histogram (112)")
# Save the file.
dev.off()
Output:

(e) Compare country – wise the Covid cases in terms of total case
Source Code:
#Compare Country – wise the Covid cases in terms of total case
rm(list=ls())
data1<-read.csv("covid_state.csv");
print(data1)
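sumfun<-function(y){
sum<-0
#seq_along() also handles an empty vector, unlike 1:length(y).
for(i in seq_along(y)){
sum=sum+y[i]
}
return(sum)
}
findindex<-function(v,a){
val<-0
for(i in seq_along(a)){
if(a[i]==v){
val=i
break
}}
#Return the matching position, or 0 when v is not found.
return(val)
}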
data2<-read.csv("covid19_countries.csv")
#print(data1)
ind<-subset(data2, c_codes=="IND")
us<-subset(data2, c_codes=="US")
bra<-subset(data2, c_codes=="BRA")
russ<-subset(data2, c_codes=="RUSS")
uk<-subset(data2, c_codes=="UK")
recind<-sumfun(ind$Recovered)
totind<-sumfun(ind$tot_cases)
recus<-sumfun(us$Recovered)
totus<-sumfun(us$tot_cases)
recbra<-sumfun(bra$Recovered)
totbra<-sumfun(bra$tot_cases)
recruss<-sumfun(russ$Recovered)
totruss<-sumfun(russ$tot_cases)
recuk<-sumfun(uk$Recovered)
totuk<-sumfun(uk$tot_cases)
percind<-(recind/totind)*100
percus<-(recus/totus)*100
percbra<-(recbra/totbra)*100
percruss<-(recruss/totruss)*100
percuk<-(recuk/totuk)*100
ctry_labels<-c("India","United States","Brazil","Russia","United Kingdom")

recperccountry<-c(percind,percus,percbra,percruss,percuk)
print(recperccountry)
maxperccrty<-max(recperccountry)
labelindex4<-findindex(maxperccrty,recperccountry)
print(ctry_labels[labelindex4])
state1<-ctry_labels[labelindex4]
print(state1)
cat(state1," is the best country with a recovery rate of ",maxperccrty,".")
#Visualizing the recovery percentages of all five countries in a pie chart with a legend.
x<-c(percind,percus,percbra,percruss,percuk)
x<-round(x,2)
labels <- c("India","United States","Brazil","Russia","United Kingdom")
jpeg(file = "International Recovery Pie Chart 1.jpg")
pie(x, labels = x, main = "International Recovery Pie Chart",col = rainbow(length(x)))
legend("topright", c("India","United States","Brazil","Russia","United Kingdom"),
cex = 0.6,fill = rainbow(length(x)))
dev.off()

Output:

Practical No: 9
Aim: Write a program to visualize and analyze the statistic of sample data using
plot chart, histogram, multi-line, Scattered plot.
Data:
Date Day Temperature Rainfall Flyer Price Sales
01/01/17 Sunday 27 2 15 0.3 10
01/02/17 Monday 28.9 1.3 15 0.3 13
01/03/17 Tuesday 34.5 1.3 27 0.3 15
01/04/17 Wednesday 44.1 1.1 28 0.3 17
01/05/17 Thursday 42.4 1 33 0.3 18
01/06/17 Friday 25.3 1.5 23 0.3 11
01/07/17 Saturday 32.9 1.5 99 0.5 13
01/08/17 Sunday 37.5 1.2 28 0.5 15
01/09/17 Monday 38.1 1.2 20 0.5 17
01/10/17 Tuesday 43.4 1.1 33 0.5 18
01/11/17 Wednesday 32.6 1.5 23 0.5 12
01/12/17 Thursday 38.2 1.3 16 0.5 14
01/13/2017 Friday 37.5 1.3 19 0.5 15
01/14/2017 Saturday 44.1 1.1 23 0.3 17
01/15/2017 Sunday 43.4 1.1 33 0.3 18
01/16/2017 Monday 30.6 1.7 24 0.3 12
01/17/2017 Tuesday 32.2 1.4 26 0.3 14

Description:
Data visualization is the technique of delivering insights from data using
visual cues such as graphs, charts and maps. It makes large quantities of data
easier to understand intuitively and so supports better decisions. Popular data
visualization tools include Tableau, Plotly, R, Google Charts, Infogram and
Kibana. These platforms differ in capabilities, functionality and use cases,
and they require different skill sets. This practical uses R for data
visualization. R is a language designed for statistical computing, graphical
data analysis and scientific research. It is usually preferred for data
visualization because its packages offer flexibility with little code. Types
of data visualization are:
Bar Plot: There are two types of bar plots, horizontal and vertical, which
represent data points as horizontal or vertical bars whose lengths are
proportional to the value of the data item. They are generally used for
continuous and categorical variable plotting. By setting the horiz parameter to
TRUE or FALSE, we get horizontal or vertical bar plots respectively. Bar plots
are used for the following scenarios:
• To perform a comparative study between the various data categories in the
data set.
• To analyze the change of a variable over time in months or years.

Histogram: A histogram is like a bar chart as it uses bars of varying height to
represent data distribution. However, in a histogram values are grouped into
consecutive intervals called bins. Continuous values are grouped and displayed
in these bins, whose size can be varied.

Scatter Plot: A scatter plot is composed of many points on a Cartesian plane.
Each point denotes the values taken by two parameters and helps us easily
identify the relationship between them.
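
The idea of bins can be seen by asking hist() for the counts without drawing anything. This sketch uses the Temperature column from the data table above; the bin width of 5 is an arbitrary choice made for illustration.

```r
# Temperature column from the data table of this practical.
temperature <- c(27, 28.9, 34.5, 44.1, 42.4, 25.3, 32.9, 37.5, 38.1,
                 43.4, 32.6, 38.2, 37.5, 44.1, 43.4, 30.6, 32.2)

# Assumption: bins of width 5 from 25 to 45; plot = FALSE returns the counts only.
h <- hist(temperature, breaks = seq(25, 45, by = 5), plot = FALSE)
print(h$breaks)   # bin edges
print(h$counts)   # number of readings in each bin

# Every reading falls into exactly one bin.
print(sum(h$counts) == length(temperature))
```

Narrower bins show more detail but leave fewer readings in each bar.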
Source Code:
#BAR PLOT
library(readxl)
input<-read_excel('RDATA.xlsx')
input
# Give the chart file a name.
jpeg(file = "BarPlot.jpg")
# One bar per day; the bar height is that day's temperature.
barplot(input$Temperature,xlab="Day",ylab = "Temperature",col="green")
dev.off()

Output:

Source Code:
#HISTOGRAM
library(readxl)
input<-read_excel('RDATA.xlsx')
input
# Give the chart file a name.
jpeg(file = "Histogram.jpg")
hist(input$Temperature,labels=TRUE,xlab="Temperature",
main="Histogram of Temperature-112", col = "gold", border = "firebrick1" )
dev.off()

Output:

Source Code:
#MULTI-LINE
data <- read.csv("RDATA2.csv")
print(data)
t <-data[, c('Temperature')]
s <-data[, c('Sales')]
print(t)
print(s)
# Give the chart file a name.
jpeg(file = "Sales_Temp_Line2.jpg")
# Plot the line chart: Temperature in red, Sales added in blue.
plot(t,type = "o",col = "red", xlab = "Day", ylab = "",
main = "Sales & Temperature (112)")
lines(s, type = "o", col = "blue")
# Save the file.
dev.off()

Output:

Source Code:
#SCATTERPLOT
library(readxl)
# Get the input values.
input<-read_excel('RDATA.xlsx')
print(head(input))
# Give the chart file a name.
png(file = "scatterplot.png")
# Plot Sales against Temperature for temperatures between 30 and 40 and sales between 10 and 20.
plot(x = input$Temperature,y = input$Sales,
xlab = "Temperature",
ylab = "Sales",
xlim = c(30,40),
ylim = c(10,20),
main = "TEMPERATURE vs SALES"
)
# Save the file.
dev.off()
# Give the chart file a name.
png(file = "scatterplot_matrices.png")
# Plot the matrices between 4 variables giving 12 plots.
# One variable with 3 others and total 4 variables.
pairs(~ Temperature+Rainfall+Price+Sales,data = input,
main = "SCATTERPLOT MATRIX (112)")
# Save the file.
dev.off()

Output:
