R Programming

This document discusses various input-output and file handling features in R. It covers functions for reading data from and writing data to files, keyboards, URLs and for manipulating strings. It also discusses regular expressions in R including metacharacters, quantifiers, sequences, character classes and how to use regular expression functions like grep, gsub and regexpr.

DATA VISUALIZATION USING R PROGRAMMING
21CSL481
Dr. Pushpalatha K
Professor
Sahyadri College of Engineering & Management
Input-Output Features in R
› Accessing the Keyboard
– scan(): reads data into a vector or a list from the console or from a file.
› inp = scan()
– readline(): reads a single line of text typed at the console. (readLines() is the function that reads one or more lines from a connection.)
› str = readline()
– print()/cat(): display the contents of their arguments.
› print("DataFlair")
› cat("DataFlair", "Big Data\n")
– Using scan() to read numeric input from a text file.
› scan("d:/program1.txt")
– Using scan() to read string input from a text file.
› scan('d:/program.txt',what = "")
– Reading a single file one line at a time
› lines <- file("d:/program.txt", "r") # file() opens the connection; without "r" every readLines() call would start again from the first line.
› readLines(lines,n=1)
› close(lines)
– Using readline() function with optional prompt message.
› W=readline("Type your name:")
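The file-based calls above can be tried without any particular drive layout. The following is a minimal sketch, assuming temporary files in place of the d:/ paths; the variable names (num_file, text_file, con) are made up for illustration.

```r
# Assumption: temporary files replace the "d:/..." paths so the code runs anywhere.
num_file  <- tempfile(fileext = ".txt")
text_file <- tempfile(fileext = ".txt")
writeLines(c("10 20 30", "40 50"), num_file)
writeLines(c("first line", "second line", "third line"), text_file)

nums  <- scan(num_file, quiet = TRUE)               # numeric by default
words <- scan(text_file, what = "", quiet = TRUE)   # what = "" reads strings

# An open connection remembers its position, so repeated readLines(n = 1)
# calls return successive lines.
con <- file(text_file, "r")
line1 <- readLines(con, n = 1)
line2 <- readLines(con, n = 1)
close(con)

print(nums)
print(words)
print(line1)
print(line2)
```

scan() splits on whitespace, which is why the text file comes back as six separate words, while readLines() keeps each line whole.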
› Reading and Writing Files
– read.table() reads a file (or inline text) into a data frame
– write.table() writes a data frame to a file in the form of a table
data <- read.table(header=TRUE, text='
slno Name Age
1 Rama 7
2 Bhima NA
3 Soma 9
4 Seema 11
')
write.table(data,"d:/hh.txt",row.names=F,col.names=F)
› Creating a matrix and storing it in a text file
m<-matrix(1:9,ncol=3)
write.table(m,"d:/hh.txt",row.names=F,col.names=F)
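To check that a written table can be read back, a round trip is a quick test. This is a sketch only, assuming a temporary file instead of d:/hh.txt and keeping the column names in the file (the slide examples drop them); students, path and round_trip are illustrative names.

```r
# Assumption: a temporary file stands in for the fixed "d:/hh.txt" path.
path <- tempfile(fileext = ".txt")

students <- read.table(header = TRUE, text = "
slno Name  Age
1    Rama  7
2    Bhima NA
3    Soma  9
")

# Keep the column names this time so the file can be read back unchanged.
write.table(students, path, row.names = FALSE, col.names = TRUE)

round_trip <- read.table(path, header = TRUE)
print(round_trip)
```

With col.names = FALSE, as in the slide examples, the file holds only the values, and read.table() would then need header = FALSE.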
› Accessing Files on Remote Machines via URLs
uci <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",header=FALSE)
head(uci)
– With column names
df<-c("age","workclass","fnlwgt","education","education-num",
"marital-status","occupation","relationship","race","sex",
"capital-gain","capital-loss","hours-per-week","native-country","income")
colnames(uci)<-df
head(uci)
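Column names can also be supplied while reading, rather than assigned afterwards. The sketch below assumes two made-up rows passed through the text argument, so it runs without a network connection; the three column names are a hypothetical subset chosen for illustration.

```r
# Made-up rows in a comma-separated layout, standing in for a remote file.
sample_text <- "30, Private, 1000\n45, Self-employed, 2000"

# Hypothetical column names, for illustration only.
cols <- c("age", "workclass", "fnlwgt")

small <- read.csv(text = sample_text, header = FALSE, col.names = cols,
                  strip.white = TRUE)   # strip.white drops the space after each comma
print(small)
```

The same col.names argument works when the first argument is a URL or a file path.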
Getting File and Directory Information
› getwd() - Used to determine the current working directory.
› setwd() - Used to change the current working directory
› file.info() - Gives the file size, timestamps, directory-versus-ordinary-file
status, and so on for each file whose name is in
the argument, a character vector.
› dir() - Returns a character vector listing the names of all the
files in the directory specified in its first argument.
› file.exists() - Returns a Boolean vector indicating whether the
given file exists for each name in the first argument, a
character vector.
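The directory functions listed above can be exercised together. This is a minimal sketch, assuming a scratch folder inside tempdir(); the folder name io_demo and the file name notes.txt are invented for the example.

```r
# Work inside a fresh temporary directory so nothing on disk is disturbed.
old_dir <- getwd()
tmp_dir <- file.path(tempdir(), "io_demo")   # hypothetical folder name
dir.create(tmp_dir, showWarnings = FALSE)
setwd(tmp_dir)

writeLines(c("first line", "second line"), "notes.txt")

files  <- dir()                                        # names of files in the directory
exists <- file.exists(c("notes.txt", "missing.txt"))   # one logical per name
info   <- file.info("notes.txt")                       # one row per file

print(files)
print(exists)
print(info[, c("size", "isdir")])

setwd(old_dir)                                         # restore the original directory
```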
R String Manipulation Functions
(Functions whose names begin with str_ belong to the stringr package and need library(stringr).)
Actions        Descriptions
nchar()        Counts the number of characters in a string or in each element of a vector. Its stringr substitute is str_length()
tolower()      Converts a string to lower case. Alternatively, you can use the str_to_lower() function
toupper()      Converts a string to upper case. Alternatively, you can use the str_to_upper() function
chartr()       Replaces characters one for one: each character in the first argument is replaced by the character at the same position in the second. To replace a complete string, use the str_replace() function
substr()       Extracts part of a string. Start and end positions need to be specified. Alternatively, you can use the str_sub() function
setdiff()      Returns the elements of the first vector that are not in the second
setequal()     Checks whether two vectors contain the same set of values
abbreviate()   Abbreviates strings. The minimum length of the abbreviated string needs to be specified
strsplit()     Splits a string based on a criterion and returns a list. Alternatively, you can use the str_split() function, which can return a character matrix instead of a list (simplify = TRUE)
sub()          Finds and replaces the first match in a string
gsub()         Finds and replaces all the matches in a string/vector. Alternatively, you can use the str_replace_all() function
paste()        Combines strings together
str_trim()     Removes leading and trailing whitespace
str_dup()      Duplicates (repeats) a string
str_pad()      Pads a string
str_wrap()     Wraps a string paragraph
Examples…
› library(stringr) #needed for the str_ functions
› string = "Sahyadri College of Engineering and Management"
› #count number of characters
– nchar(string)
– str_length(string)
› #convert to lower
– tolower(string)
– str_to_lower(string)
› #convert to upper
– toupper(string)
– str_to_upper(string)
› #replace strings
– chartr("and","for",x = string) #letters a,n,d get replaced by f,o,r
– str_replace_all(string = string, pattern = c("College"),replacement = "Institute") #this is case sensitive
› #extract parts of string
– substr(x = string,start = 5,stop = 11) #returns "adri Co"
– str_sub(string = string, start = 5, end = 11)
› #get difference between two vectors
– setdiff(c("monday","tuesday","wednesday"),c("monday","thursday","friday"))
› #check if two vectors contain the same values
– setequal(c("monday","tuesday","wednesday"),c("monday","tuesday","wednesday"))
– setequal(c("monday","tuesday","thursday"),c("monday","tuesday","wednesday"))
› #abbreviate strings
– abbreviate(c("monday","tuesday","wednesday"),minlength = 3)
› #split strings
– strsplit(x = c("ID-101","ID-102","ID-103","ID-104"),split = "-")
– str_split(string = c("ID-101","ID-102","ID-103","ID-104"),pattern = "-",simplify = T)
› #find and replace first match
– sub(pattern = "L",replacement = "B",x = string,ignore.case = T)
› #find and replace all matches
– gsub(pattern = "en",replacement = "EN",x = string,ignore.case = T)
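paste() appears in the function table but has no example above. The following short sketch uses base R only; the values are invented, and paste0() (paste with no separator) is an addition not mentioned in the slides.

```r
first <- "Sahyadri"
rest  <- c("College", "Campus")

# paste() is vectorised: the single value is recycled against the longer vector.
joined    <- paste(first, rest)               # default separator is a space
hyphened  <- paste(first, rest, sep = "-")    # custom separator
ids       <- paste0("ID-", 101:103)           # paste0() uses no separator
collapsed <- paste(rest, collapse = " and ")  # collapse a vector into one string

print(joined)
print(hyphened)
print(ids)
print(collapsed)
```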
R Regular Expression Commands
Function     Description
grep         returns the index (or, with value = TRUE, the value) of each matching element
grepl        returns a Boolean value (TRUE or FALSE) for each element, indicating whether it matches
regexpr      returns the position of the first match in each element
gregexpr     returns the positions of all matches in each element
regexec      like regexpr, but also returns the positions of any parenthesised sub-expressions
regmatches   extracts the matched strings from the match data returned by regexpr, gregexpr or regexec
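The differences between these functions are easiest to see on one small input. This is a sketch with an invented vector x; the patterns are kept deliberately simple.

```r
x <- c("apple pie", "banana", "cherry tart")

idx   <- grep("an", x)                 # index of matching elements
vals  <- grep("an", x, value = TRUE)   # the matching elements themselves
has_e <- grepl("e", x)                 # one logical per element

first_a <- regexpr("a", x)             # position of the first "a" (-1 if none)
all_a   <- gregexpr("a", "banana")     # positions of every "a"

# regmatches() turns match data into the matched text.
first_word <- regmatches(x, regexpr("[a-z]+", x))

# regexec() also reports the parenthesised groups.
m      <- regexec("([a-z]+) ([a-z]+)", "apple pie")
groups <- regmatches("apple pie", m)[[1]]

print(idx)
print(vals)
print(has_e)
print(as.integer(first_a))
print(as.integer(unlist(all_a)))
print(first_word)
print(groups)
```

Note that regexpr() and gregexpr() return positions with attributes attached (such as match.length), which is what regmatches() relies on.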

› Regular expressions in R can be divided into 5 categories:


– Metacharacters
– Sequences
– Quantifiers
– Character Classes
– POSIX character classes
› Metacharacters are characters with a special meaning in a pattern, so
regex does not match them literally unless they are escaped. These
characters include: . \ | ( ) [ ] { } ^ $ * + ?
– dt <- c("percent%","percent")
– grep(pattern = "percent\\%",x = dt,value = T)
› O/P: [1] "percent%"
– #detect all strings
– dt <- c("may?","money$","and&")
– grep(pattern = "[a-z][?$&]",x = dt,value = T)
› O/P:[1] "may?" "money$" "and&"
– #remove the special characters
– gsub(pattern = "[?$&]",replacement = "",x = dt)
› O/P:[1] "may" "money" "and"
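Escaping matters most for characters such as the dot, which otherwise matches anything. The sketch below contrasts an unescaped pattern, an escaped one, and the fixed = TRUE argument, which treats the whole pattern as literal text; the vector files is invented.

```r
files <- c("a.c", "abc", "a+c")

# Unescaped "." is a metacharacter and matches any single character.
any_char <- grepl("a.c", files)

# Escaped with a double backslash, "." only matches a literal dot.
literal_dot <- grepl("a\\.c", files)

# fixed = TRUE switches off regex interpretation entirely.
literal_plus <- grepl("a+c", files, fixed = TRUE)

print(any_char)
print(literal_dot)
print(literal_plus)
```

The backslash is doubled because R first processes the string literal and only then passes the result to the regex engine.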
› Quantifiers are mainly used to determine the length of the resulting match
Quantifier   Description
.            Strictly a metacharacter rather than a quantifier: it matches any single character (in many regex engines, any character except a newline).
?            The item to its left is optional and is matched at most once.
*            The item to its left will be matched zero or more times.
+            The item to its left is matched one or more times.
{n}          The item to its left is matched exactly n times, so it must be repeated consecutively, e.g. n{2} matches the "nn" in "anna".
{n,}         The item to its left is matched n or more times.
{n,m}        The item to its left is matched at least n times but not more than m times.

Examples:
names <- c("anna","crissy","puerto","cristian","garcia","steven","alex","rudy")
#doesn't matter if e is a match
grep(pattern = "e*",x = names,value = T)
o/p: "anna" "crissy" "puerto" "cristian" "garcia" "steven" "alex" "rudy"
#must match t one or more times
grep(pattern = "t+",x = names,value = T)
o/p: "puerto" "cristian" "steven"
#must match n two times
grep(pattern = "n{2}",x = names,value = T)
o/p: "anna"
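The examples above cover *, + and {n}. A sketch of the remaining quantifiers follows, using an invented vector; the ^ and $ characters are used here as start-of-string and end-of-string anchors so that each pattern has to match the whole element.

```r
words <- c("color", "colour", "colouur", "anna", "annna", "ana")

# "?" makes the preceding item optional.
optional_u <- grep("^colou?r$", words, value = TRUE)

# "{2,}" means two or more consecutive n's.
two_or_more <- grep("n{2,}", words, value = TRUE)

# "{1,2}" between anchors: one or two n's, no more.
one_or_two <- grep("^an{1,2}a$", words, value = TRUE)

print(optional_u)
print(two_or_more)
print(one_or_two)
```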
› Sequences contain special characters used to describe a pattern in a given string
Sequences   Description
\d          matches a digit character
\D          matches a non-digit character
\s          matches a space character
\S          matches a non-space character
\w          matches a word character
\W          matches a non-word character
\b          matches a word boundary
\B          matches a non-word boundary
› Examples:
– string <- "I have been to Paris 20 times"
– #match a digit
– gsub(pattern = "\\d+",replacement = "_",x = string)
– regmatches(string,regexpr(pattern = "\\d+",text = string))
– #match a non-digit
– gsub(pattern = "\\D+",replacement = "_",x = string)
– regmatches(string,regexpr(pattern = "\\D+",text = string))
– #match a space - returns positions
– gregexpr(pattern = "\\s+",text = string)
– #match a non space
– gsub(pattern = "\\S+",replacement = "app",x = string)
– #match a word character
– gsub(pattern = "\\w",replacement = "k",x = string)
– #match a non-word character
– gsub(pattern = "\\W",replacement = "k",x = string)
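The word-boundary sequence \b is listed in the table but not shown in the examples. The sketch below is one way to try it; perl = TRUE is an assumption made here to select the PCRE engine, and the sample sentence is invented.

```r
string <- "I have been to Paris 20 times"
text   <- "in the inn, within reach"

# Without boundaries "in" would also match inside "inn" and "within";
# with \\b on both sides only the standalone word is replaced.
standalone <- gsub("\\bin\\b", "IN", text, perl = TRUE)

# \\w+ between boundaries picks out whole words.
words <- regmatches(string, gregexpr("\\b\\w+\\b", string, perl = TRUE))[[1]]

print(standalone)
print(words)
```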
› Character classes refer to a set of characters enclosed in square
brackets [ ].
– These classes match only the characters enclosed in the brackets. These classes
can also be used in conjunction with quantifiers.
– A caret (^) at the start of a character class negates it, so the class matches
everything except the listed characters.
Characters     Description
[aeiou]        matches lower case vowels
[AEIOU]        matches upper case vowels
[0123456789]   matches any digit
[0-9]          same as the previous class
[a-z]          matches any lower case letter
[A-Z]          matches any upper case letter
[a-zA-Z0-9]    matches any letter or digit (the three classes above combined)
[^aeiou]       matches everything except lower case vowels
[^0-9]         matches everything except digits


– Examples:
› string <- "20 people got killed in the mob attack. 14 got severely injured"
› #extract numbers
› regmatches(x = string,gregexpr("[0-9]+",text = string))

› #extract without digits


› regmatches(x = string,gregexpr("[^0-9]+",text = string))
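Character classes combine naturally with quantifiers and with negation. This is a small sketch on an invented string s, not taken from the slides.

```r
s <- "Order 66 shipped on 2024-05-17"

# A class combined with a quantifier: exactly four digits in a row.
four_digits <- regmatches(s, gregexpr("[0-9]{4}", s))[[1]]

# A negated class: delete everything that is not a lower case vowel.
vowels_only <- gsub("[^aeiou]", "", s)

# A letters-only class: runs of upper or lower case letters.
letter_runs <- regmatches(s, gregexpr("[a-zA-Z]+", s))[[1]]

print(four_digits)
print(vowels_only)
print(letter_runs)
```

The capital "O" in "Order" is removed by the negated class because [aeiou] lists lower case vowels only.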
› POSIX character classes are written as a name between [: and :] and are
used inside a bracket expression, which gives the double square bracket form ([[ ]]).
– They work like character classes.
– A caret at the start of the bracket expression negates it, e.g. [^[:digit:]].
POSIX Characters   Description
[[:lower:]]        matches lower case letters
[[:upper:]]        matches upper case letters
[[:alpha:]]        matches letters
[[:digit:]]        matches digits
[[:space:]]        matches space characters, e.g. tab, newline, vertical tab, space, etc.
[[:blank:]]        matches blank characters only, i.e. space and tab
[[:alnum:]]        matches alphanumeric characters, e.g. AB12, ID101, etc.
[[:cntrl:]]        matches control characters. Control characters are non-printable characters such as \t (tab), \n (new line), \e (escape), \f (form feed), etc.
[[:punct:]]        matches punctuation characters
[[:xdigit:]]       matches hexadecimal digits (0-9, A-F, a-f)
[[:print:]]        matches printable characters ([[:alnum:]], [[:punct:]] and space)
[[:graph:]]        matches graphical characters. Graphical characters comprise [[:alnum:]] and [[:punct:]]
› Examples:
– string <- c("I sleep 16 hours\n, a day","I sleep 8 hours\n a day.","You sleep how many\t hours ?")
– #get digits
– unlist(regmatches(string,gregexpr("[[:digit:]]+",text = string)))
– #remove punctuations
– gsub(pattern = "[[:punct:]]+",replacement = "",x = string)
– #replace blanks (spaces and tabs) with a hyphen
– gsub(pattern = "[[:blank:]]",replacement = "-",x = string)
– #replace control characters with a space
– gsub(pattern = "[[:cntrl:]]+",replacement = " ",x = string)
– #remove non graphical characters
– gsub(pattern = "[^[:graph:]]+",replacement = "",x = string)
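The difference between [[:blank:]] and [[:space:]] shows up as soon as the text contains a newline. A minimal sketch, with an invented input string:

```r
x <- "one two\tthree\nfour"

# [[:blank:]] covers space and tab only, so the newline survives.
blank_replaced <- gsub("[[:blank:]]", "_", x)

# [[:space:]] also covers the newline.
space_replaced <- gsub("[[:space:]]", "_", x)

# Negation goes inside the outer brackets.
letters_only <- gsub("[^[:alpha:]]", "", x)

print(blank_replaced)
print(space_replaced)
print(letters_only)
```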
Assignment Questions:
1. Input your USN. Extract your course and year of joining. Also
confirm whether or not you are a Sahyadrian.
2. Write an R program to validate a Sahyadri email id.
3. Write an R program to find the number of a's in a given sentence.
Replace 'a' with 'b' if the count is more than 10.
4. Find the count of all characters in a given string.
5. Write an R program to count the number of vowels in a given
string.
6. Write an R program to find a sequence of one upper case
letter followed by lower case letters.
7. Write an R program that matches a string with a 't' followed by
anything, ending in 'n'.
GRAPHICS
› Creating Graphs
› Plot
– The plot() function is used to draw points (markers) in a diagram.
– The function takes parameters for specifying points in the diagram.
› Parameter 1 specifies points on the x-axis.
› Parameter 2 specifies points on the y-axis.
› Examples:
plot(1, 3) # Draw one point in the diagram, at position (1, 3)
plot(c(1, 8), c(3, 10)) # Draw two points in the diagram, (1, 3) and (8, 10)
plot(c(1, 2, 3, 4, 5), c(3, 7, 8, 9, 12))

x <- c(1, 2, 3, 4, 5)
y <- c(3, 7, 8, 9, 12)
plot(x, y)
plot(x,y,type="l")
plot(c(-3,3), c(-1,5), type = "n", xlab="x axis", ylab="y axis ")
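The type = "n" call above draws empty axes, which is useful as a canvas for adding elements one at a time. The sketch below assumes the base graphics helpers points() and lines(), which the slides do not mention, and draws to a null PDF device so that it runs without opening a window.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(3, 7, 8, 9, 12)

pdf(NULL)   # assumption: a null device, so no window or file is needed

# type = "n" sets up the axes only; points and lines are then added on top.
plot(x, y, type = "n", main = "Building a plot step by step",
     xlab = "x axis", ylab = "y axis")
points(x, y, pch = 19)      # filled circles
lines(x, y, lty = 2)        # dashed line through the same points

usr <- par("usr")           # plotting-region limits chosen by plot()
dev.off()

print(usr)
```

In an interactive session the pdf(NULL) and dev.off() lines can be dropped and the plot appears in the usual graphics window.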
Barplot        barplot(x)
Boxplot        boxplot(x)
Density Plot   plot(density(x))
Line Plot      plot(x, y, type = "l")
Histogram      hist(x)
Pie Chart:
X=c(23, 56, 20, 63)
Y=c("Mumbai", "Pune", "Chennai", "Bangalore")
# Plot the chart.
pie(X, Y)
Saving Graphs to Files in R
Saving graph as a pdf object
# Declaring a data frame
data_frame <- data.frame(col1=c(1:5), col2=c(20, 32, 12, 57, 33))
# Printing the data frame
print("Data Frame")
print(data_frame)
# Saving in pdf format
pdf("d:/graph_pdf.pdf")
# Plotting barplot of the data in blue color
barplot(data_frame$col2, col="blue")
# Closing the graphics device so that the file is written
dev.off()
Saving graph as a png object
# Declaring a data frame
data_frame <- data.frame(col1=c(1:5), col2=c(20, 32, 12, 57, 33))
# Printing the data frame
print("Data Frame")
print(data_frame)
# Saving in png format
png("d:/graph_png.png")
# Plotting barplot of the data in blue color
barplot(data_frame$col2, col="blue")
# Closing the graphics device so that the file is written
dev.off()
Saving graph as a jpeg object
# Declaring a data frame
data_frame <- data.frame(col1=c(1:5), col2=c(20, 32, 12, 57, 33))
# Printing the data frame
print("Data Frame")
print(data_frame)
# Saving in jpeg format
jpeg("d:/graph_jpg.jpeg")
# Plotting barplot of the data in blue color
barplot(data_frame$col2, col="blue")
# Closing the graphics device so that the file is written
dev.off()
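All three examples follow the same pattern: open a device, draw, close the device. The sketch below repeats that pattern with a temporary file and then checks that the file exists. The pdf() device is used here only because it is available in every R build; out_file and values are illustrative names.

```r
# Assumption: a temporary file replaces the fixed "d:/..." path.
out_file <- tempfile(fileext = ".pdf")

values <- c(20, 32, 12, 57, 33)

pdf(out_file, width = 6, height = 4)   # open the device (size in inches)
barplot(values, col = "blue", names.arg = 1:5)
dev.off()                              # the file is complete only after this call

created   <- file.exists(out_file)
non_empty <- file.info(out_file)$size > 0

print(created)
print(non_empty)
```

Swapping pdf() for png() or jpeg() changes the output format; those devices take width and height in pixels by default.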
Assignment Questions
› The given table represents the patient's body temperature
recorded every hour in a hospital. Draw the line graph for the
given information:
Time          9 am    10 am   11 am   12 noon   1 pm     2 pm
Temperature   34°C    35°C    38°C    37°C      34.5°C   35.5°C
› The table below shows the favourite color of 200 kids in a
class. Using the information provided, create a bar graph
with appropriate colors.
Favourite Colours    Red   Green   Blue   Yellow   Orange
Number of students   45    17      50     48       40
› A person spends his time on different activities daily (in
hours):
– Draw a pie chart for this information.
Activity          Office Work   Exercise   Travelling   Watching shows   Sleeping   Miscellaneous
Number of hours   9             1          2            3                7          2
Install a Package in R
install.packages("package_name")

E.g.:
install.packages("plotrix")
install.packages("plot3D")
3D Plots in R
› 3D Pie Chart
# Get the library.
library(plotrix)
# Create data for the graph.
x <- c(21, 62, 10,53)
lbl <- c("London","New York","Singapore","Mumbai")
# Give the chart file a name.
png(file = "d:/3d_pie_chart.png")
# Plot the chart.
pie3D(x,labels = lbl,explode = 0.1, main = "Pie Chart of Cities")
# Save the file.
dev.off()
Examples
# install.packages("plotrix")
library(plotrix)
data <- c(19, 21, 54, 12, 36, 12)
pie3D(data,
      col = hcl.colors(length(data), "Spectral"),
      labels = data)
-------------------------------------------------------------------------
library(plotrix)
data <- c(19, 21, 54, 12, 36, 12)
pie3D(data, mar = rep(1.75, 4),
      col = hcl.colors(length(data), "Spectral"),
      labels = data,
      explode = 0.2)
The ggplot2 Package for Graphics
› ggplot2 is a system for declaratively creating graphics, based on
The Grammar of Graphics.
– The gg in ggplot2 means Grammar of Graphics, a graphic concept which
describes plots by using a “grammar”.
› ggplot2 is an R package designed especially for data
visualization and exploratory data analysis.
› It produces polished plots with little effort, taking care of minute
details like drawing legends.
– The plots can be created iteratively and edited later.
› This package is designed to work in a layered fashion, starting
with a layer showing the raw data collected during exploratory
data analysis with R then adding layers of annotations and
statistical summaries.
› According to the ggplot2 concept, a plot can be divided into
different fundamental parts:
Plot = Data + Aesthetics + Geometry.
– Data is a data frame.
– Aesthetics indicate the x and y variables. They can also be used
to control the color, the size or the shape of points, the height of
bars, etc.
– Geometry corresponds to the type of graphic (histogram, box plot,
line plot, density plot, dot plot, ...)
› Two main functions for creating plots are available in the
ggplot2 package: qplot() and ggplot().
– qplot() is a quick plot function which is easy to use for simple plots
(it is deprecated in recent ggplot2 releases).
– The ggplot() function is more flexible and robust than qplot() for
building a plot piece by piece.
› The generated plot can be kept as a variable and then
printed at any time using the function print().
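The three parts and the "keep it in a variable" idea can be shown in a few lines. This sketch assumes the ggplot2 package is installed and uses the built-in iris data; the variable name p is arbitrary.

```r
library(ggplot2)

# Data: the built-in iris data frame.
# Aesthetics: which columns map to x, y and colour.
# Geometry: points, i.e. a scatter plot.
p <- ggplot(data = iris,
            mapping = aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point()

# Nothing is drawn until the object is printed.
print(p)
```

Because p is an ordinary R object, further layers can be added to it later with +, and it can be printed again at any point.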
Relationship between “Grammar of Graphics” and R
› The Grammar of Graphics tells us that a graphic maps the
data to the aesthetic attributes (colour, shape, size) of geometric objects
(points, lines, bars).
– The plot may also include statistical transformations of the data and information
about the plot's coordinate system.
– Faceting can be used to plot different subsets of the data.
› The combination of these independent components is what makes up a graphic.
› The grammar of graphics is a framework which follows a layered approach to
describe and construct visualizations or graphics in a structured manner.
› The layered grammar of graphics approach is implemented in ggplot2, a
widely used graphics library for R.
– All graphics in this library are built using a layered approach, building layers up to
create the final graphic.
Building Blocks of layers with the grammar of graphics
› Data: The data set itself
› Aesthetics: The data is mapped onto aesthetic attributes such as x-axis,
y-axis, color, fill, size, labels, alpha, shape, line width, line type
› Scale: Scales map values in the data space to values in the aesthetic space.
– This includes the use of colour, shape or size.
– Scales also draw the legend and axes, which make it possible to read the
original data values from the plot (an inverse mapping).
› Geometries: How the data is displayed, using points, lines, histograms, bars,
boxplots
› Facets: Display subsets of the data in columns and rows
› Statistics: Binning, smoothing, descriptive, intermediate
› Coordinates: The space onto which the data is displayed: Cartesian, fixed,
polar, limits
› Themes: A theme controls the finer points of display, like the font size and
background colour
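Each building block corresponds to one kind of function call added with +. The following is a sketch under the assumption that ggplot2 is installed; the particular colours, axis limits and theme are arbitrary choices made for illustration.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +        # data + aesthetics
  geom_point(aes(colour = Species)) +                               # geometry
  geom_smooth(method = "lm", se = FALSE) +                          # statistics: fitted line
  facet_wrap(~ Species) +                                           # facets: one panel per species
  scale_colour_manual(values = c("orange", "purple", "darkgreen")) +  # scale
  coord_cartesian(ylim = c(0, 8)) +                                 # coordinates
  theme_minimal()                                                   # theme

print(p)
```

Removing any one line (apart from the first) still leaves a valid plot, which is the practical meaning of the components being independent.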
› ggplot2 may be used to create different types of plots using the ggplot
function.

Example:
ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, col=Species,
shape=Species))+geom_point()
› The aes() function specifies the aesthetic mappings for a plot
› The difference between plots is the number of geometric objects (geoms)
they contain.
› Different geoms are available for plotting different kinds of graphs:
– Scatter Plot: To plot individual points, use geom_point
– Bar Charts: For drawing bars, use geom_bar
– Histograms: For drawing binned values, geom_histogram
– Line Charts: To plot lines, use geom_line
– Polygons: To draw arbitrary shapes, use geom_polygon
– Creating Maps: Use geom_map for drawing polygons in the shape of a map by using
the map_data() function
– Creating Patterns: Use the geom_smooth function for showing simple trends or
approximations
› Install and load ggplot2 package:
install.packages('ggplot2')
library(ggplot2)
Scatterplot With ggplot2
# Loading
library(ggplot2)
# Create simple example
data <- data.frame(x = 1:9,y = c(3, 1, 4, 3, 5, 2,1, 2, 3), group = rep(LETTERS[1:3],
each = 3))
data
ggplot(data,aes(x = x, y = y)) + geom_point()
------------
ggplot(data,aes(x = x, y = y)) + geom_point(size=5)
---------
ggplot(data,aes(x = x, y = y,col = group)) + geom_point(size=5)
-----------
With Iris Dataset
data(iris)
ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, col=Species))+geom_point()
data(iris)
head(iris)
ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, col=Species,
shape=Species))+geom_point()
Creating Histogram:
library(ggplot2)
data(iris)
ggplot(data = iris, aes( x = Sepal.Length)) + geom_histogram( )
#visualize various groups in histogram
ggplot(iris, aes(x=Sepal.Length, color=Species)) +
geom_histogram(fill="white", binwidth = 1)
Creating Density Plot
ggplot(iris, aes( x = Sepal.Length)) + geom_density( )
ggplot(iris, aes(x=Sepal.Length, color=Species)) + geom_density( )
ggplot(diamonds,aes(x = depth, fill = cut)) + geom_density()
› Creating Bar and Column Charts
data("mpg")
head(mpg)
ggplot(mpg, aes(x= class)) + geom_bar()
– Using coord_flip( ) one can interchange the x and y axes.
ggplot(mpg, aes(x= class)) + geom_bar() + coord_flip() # horizontal bars
-----------
diamonds_m_cl = aggregate(price ~ clarity, data = diamonds, FUN = mean)
diamonds_m_cl
ggplot(diamonds_m_cl, aes(x = clarity, y = price)) + geom_bar(stat = "identity")
----------
library(ggplot2)
ODI <- data.frame(match=c("M-1","M-2","M-3","M-4"),runs=c(67,37,74,10))
Perf=ggplot(data=ODI,aes(x=match,y=runs,fill=match))+geom_bar(stat="identity")
Perf
---------
ggplot(data=ODI, aes(x=match, y=runs))+geom_bar(stat="identity",fill="blue")+
theme_dark()
› Creating Line chart
df <- data.frame(store=c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'),
week=c(1, 2, 3, 1, 2, 3, 1, 2, 3), sales=c(9, 12, 15, 7, 9, 14, 10, 16, 19))
ggplot(df, aes(x=week, y=sales, group=store)) + geom_line(size=1)
----
ggplot(df, aes(x=week, y=sales, group=store, color=store)) + geom_line(size=2)
-----
ggplot(df, aes(x=week, y=sales, group=store, color=store)) +geom_line(size=2) +
scale_color_manual(values=c('orange', 'pink', 'red'))
----
ggplot(df, aes(x=week, y=sales, group=store, color=store)) +
geom_line(size=1)+ geom_point(size=2)
Add or Modify Main Title and Axis Labels
› The following functions can be used to add or alter the main title
and axis labels.
ggtitle("Main title"): Adds a main title above the plot
xlab("X axis label"): Changes the X axis label
ylab("Y axis label"): Changes the Y axis label
labs(title = "Main title", x = "X axis label", y = "Y axis label"): Sets all three at once
– Example:
p = ggplot(mpg, aes(x= class)) + geom_bar()
p + labs(title = "Number of Cars in each type", x = "Type of car", y = "Number of cars")
– Adding data labels
p = ggplot(mpg, aes(x= class)) + geom_bar()
p = p + labs(title = "Number of Cars in each type", x = "Type of car", y = "Number of cars")
p + geom_text(stat='count', aes(label=after_stat(count)), vjust=-0.25)
