R Programming
21CSL481
Dr. Pushpalatha K
Professor
Sahyadri College of Engineering & Management
Input-Output Features in R
› Accessing the Keyboard
– scan() : used for reading data into the input vector or an input list from the
environment console or file.
› inp = scan()
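– readline(): used to read a single line of text typed at the console in an interactive session (readLines() is the function that reads multiple lines from a connection).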
› str = readline()
– print()/cat(): displays the contents of its argument object.
› print("DataFlair")
› cat("DataFlair", "Big Data\n")
– Scan function to take numeric input from a text file.
› scan("d:/program1.txt")
– Scan function to take string input from a text file.
› scan('d:/program.txt',what = "")
– Reading a single file one line at a time
› lines <- file("d:/program.txt") # file() sets up the connection.
› readLines(lines,n=1)
– Using readline() function with optional prompt message.
› W=readline("Type your name:")
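The file-based calls above can be tried without touching d:/ at all. The following is a minimal sketch, assuming a temporary file (created with tempfile()) as a stand-in for d:/program.txt; the file contents are made up for illustration. It also opens the connection explicitly, so that successive readLines() calls continue from where the previous call stopped.

```r
# Hypothetical stand-in for d:/program.txt: a temporary file with three lines.
path <- tempfile(fileext = ".txt")
writeLines(c("10 20 30", "40 50", "60"), path)

# scan() reads whitespace-separated numbers into a numeric vector.
nums <- scan(path, quiet = TRUE)

# what = "" makes scan() return the same tokens as character strings.
words <- scan(path, what = "", quiet = TRUE)

# Open the connection explicitly so that each readLines() call
# picks up at the line after the one read previously.
con <- file(path, open = "r")
first <- readLines(con, n = 1)
second <- readLines(con, n = 1)
close(con)

print(nums)
print(words)
print(first)
print(second)

unlink(path)  # remove the temporary file
```

readline() is left out of the sketch because it only waits for input in an interactive session.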
› Reading and Writing Files
– read.table(): reads a file (or inline text) in table format and creates a data frame from it
– write.table(): writes a data frame to a file in the form of a table
data <- read.table(header=TRUE, text='
slno Name Age
1 Rama 7
2 Bhima NA
3 Soma 9
4 Seema 11
')
write.table(data,"d:/hh.txt",row.names=F,col.names=F)
› Creating a matrix and storing it in a text file
m<-matrix(1:9,ncol=3)
write.table(m,"d:/hh.txt",row.names=F,col.names=F)
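A round trip shows how the write.table() options affect what read.table() needs in order to read the file back. This is a minimal sketch, assuming temporary files in place of d:/hh.txt; the variable names mat_path, df_path, m_back and data_back are introduced only for this illustration.

```r
# Temporary files stand in for d:/hh.txt (hypothetical paths for this sketch).
mat_path <- tempfile(fileext = ".txt")
df_path <- tempfile(fileext = ".txt")

# Matrix: written without row or column names, so read it back with header = FALSE.
m <- matrix(1:9, ncol = 3)
write.table(m, mat_path, row.names = FALSE, col.names = FALSE)
m_back <- as.matrix(read.table(mat_path, header = FALSE))
dimnames(m_back) <- NULL  # drop the default V1, V2, V3 column names

# Data frame: keep the column names so that header = TRUE can restore them.
data <- read.table(header = TRUE, text = "
slno Name Age
1 Rama 7
2 Bhima NA
3 Soma 9
")
write.table(data, df_path, row.names = FALSE)
data_back <- read.table(df_path, header = TRUE)

print(m_back)
print(data_back)

unlink(c(mat_path, df_path))  # remove the temporary files
```

The missing Age value is written as NA and comes back as NA, because "NA" is the default missing-value string for both functions.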
› Accessing Files on Remote Machines via URLs
uci <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",header=FALSE)
head(uci)
– With column names
df<-c("age","workclass","fnlwgt","education","education-num","marital-status","occupation",
"relationship","race","sex","capital-gain","capital-loss","hours-per-week","native-country","income")
colnames(uci)<-df
head(uci)
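Column names that contain a hyphen, such as education-num, are not syntactically valid R names, so they must be wrapped in backticks when used with $. The sketch below illustrates this offline; the two inline rows are hypothetical sample values, not taken from the UCI file, and uci_small is a name introduced only for this example.

```r
# Two hypothetical rows in the same comma-separated layout (no header line).
csv_text <- "39, State-gov, 13\n50, Private, 9"

# text = reads from a string instead of a file or URL;
# strip.white = TRUE removes the blank that follows each comma.
uci_small <- read.csv(text = csv_text, header = FALSE, strip.white = TRUE)

# Assign column names, including one with a hyphen.
colnames(uci_small) <- c("age", "workclass", "education-num")

# Backticks are required because of the hyphen in the name.
avg_edu <- mean(uci_small$`education-num`)

print(uci_small)
print(avg_edu)
```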
Getting File and Directory Information
› getwd() - Used to determine the current working directory.
› setwd() - Used to change the current working directory
› file.info() - Gives file size, creation time, directory-versus-ordinary
file status, and so on for each file whose name is in the argument,
a character vector.
› dir() - Returns a character vector listing the names of all the
files in the directory specified in its first argument.
› file.exists() - Returns a Boolean vector indicating whether the
given file exists for each name in the first argument, a
character vector.
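The functions above can be combined as follows. This is a minimal sketch, assuming a scratch directory under tempdir(); the directory name io_demo and the file name a.txt are hypothetical.

```r
# Create a scratch directory so that the example leaves no files behind.
dir_path <- file.path(tempdir(), "io_demo")
dir.create(dir_path, showWarnings = FALSE)

# setwd() returns the previous working directory, which makes it easy to restore.
old_wd <- setwd(dir_path)
print(getwd())

writeLines("hello", "a.txt")

files <- dir()                                       # names of files in the current directory
exists_vec <- file.exists(c("a.txt", "missing.txt")) # one logical value per name
info <- file.info("a.txt")                           # one row per file

print(files)
print(exists_vec)
print(info[, c("size", "isdir", "mtime")])

# Restore the original working directory and clean up.
setwd(old_wd)
unlink(dir_path, recursive = TRUE)
```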
R String Manipulation Functions
Function Description
nchar() Counts the number of characters in each string of a vector. The stringr equivalent is str_length()
tolower() Converts a string to lower case. Alternatively, you can use the str_to_lower() function
toupper() Converts a string to upper case. Alternatively, you can use the str_to_upper() function
chartr() Replaces individual characters in a string, one for one. Alternatively, you can use the str_replace() function to replace a complete string
substr() Extracts part of a string. Start and end positions need to be specified. Alternatively, you can use the str_sub() function
setdiff() Determines the difference between two vectors
setequal() Checks whether two vectors contain the same set of string values
abbreviate() Abbreviates strings. The minimum length of the abbreviated string can be specified with minlength
strsplit() Splits a string based on a criterion. It returns a list. Alternatively, you can use the str_split() function, which lets you convert the list output to a character matrix (simplify = TRUE)
sub() Finds and replaces the first match in a string
gsub() Finds and replaces all the matches in a string/vector. Alternatively, you can use the str_replace_all() function
paste() Combines strings together
str_trim() Removes leading and trailing whitespace
str_dup() Duplicates strings
str_pad() Pads a string
str_wrap() Wraps a string paragraph
Examples…
› string = "Sahyadri College of Engineering and Management"
› #count number of characters
– nchar(string)
– str_length(string)
› #convert to lower
– tolower(string)
– str_to_lower(string)
› #convert to upper
– toupper(string)
– str_to_upper(string)
› #replace strings
– chartr("and","for",x = string) #letters a,n,d get replaced by f,o,r
– str_replace_all(string = string, pattern = "College", replacement = "Institute") #this is case sensitive
› #extract parts of string
– substr(x = string,start = 5,stop = 11)
– str_sub(string = string, start = 5, end = 11)
› #get difference between two vectors
– setdiff(c("monday","tuesday","wednesday"),c("monday","thursday","friday"))
› #check if strings are equal
– setequal(c("monday","tuesday","wednesday"),c("monday","tuesday","wednesday"))
– setequal(c("monday","tuesday","thursday"),c("monday","tuesday","wednesday"))
› #abbreviate strings
– abbreviate(c("monday","tuesday","wednesday"),minlength = 3)
› #split strings
– strsplit(x = c("ID-101","ID-102","ID-103","ID-104"),split = "-")
– str_split(string = c("ID-101","ID-102","ID-103","ID-104"),pattern = "-",simplify = T)
› #find and replace first match
– sub(pattern = "L",replacement = "B",x = string,ignore.case = T)
› #find and replace all matches
– gsub(pattern = "en",replacement = "EN",x = string,ignore.case = T)
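The base R functions from the table can be checked against the example string without installing stringr. This is a minimal sketch; the variable names (len, part, words, first_only, every, joined) are introduced only for this illustration.

```r
string <- "Sahyadri College of Engineering and Management"

len <- nchar(string)                           # number of characters
part <- substr(string, start = 10, stop = 16)  # characters 10 to 16
words <- strsplit(string, split = " ")[[1]]    # strsplit() returns a list; take element 1

# sub() changes only the first match, gsub() changes every match.
first_only <- sub("e", "E", string)
every <- gsub("e", "E", string)

# paste() joins strings with the given separator.
joined <- paste(toupper(words[1]), tolower(words[2]), sep = "-")

print(len)         # 46
print(part)        # "College"
print(words)
print(first_only)  # "Sahyadri CollEge of Engineering and Management"
print(every)       # "Sahyadri CollEgE of EnginEEring and ManagEmEnt"
print(joined)      # "SAHYADRI-college"
```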
R Regular Expression Commands
Function Description
grep returns the index or value of the matched string
grepl returns the Boolean value (TRUE or FALSE) of the matched string
regexpr returns the index of the first match
gregexpr returns the index of all matches
regexec returns the index of the first match together with the positions of any parenthesized sub-expressions
regmatches returns the matched string at a specified index. It is used in conjunction with regexpr, gregexpr and regexec.
Examples:
names <- c("anna","crissy","puerto","cristian","garcia","steven","alex","rudy")
#e* matches zero or more e's, so every string matches whether or not it contains e
grep(pattern = "e*",x = names,value = T)
o/p: "anna" "crissy" "puerto" "cristian" "garcia" "steven" "alex" "rudy"
#must match t one or more times
grep(pattern = "t+",x = names,value = T)
o/p: "puerto" "cristian" "steven"
#must match n two times
grep(pattern = "n{2}",x = names,value = T)
o/p: "anna"
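The remaining functions from the table can be tried on the same vector. This is a minimal sketch; the vector is called names_vec here (an illustrative name) so that the base function names() is not masked.

```r
names_vec <- c("anna", "crissy", "puerto", "cristian", "garcia", "steven", "alex", "rudy")

# grepl() gives one TRUE/FALSE per element; grep() gives the matching indices.
starts_cr <- grepl("^cr", names_vec)
idx <- grep("^cr", names_vec)

# regexpr() gives the position of the first match in each element (-1 if none).
pos <- regexpr("s+", names_vec)

# regmatches() extracts the matched text for the elements that did match.
matched <- regmatches(names_vec, pos)

print(starts_cr)
print(idx)              # 2 4
print(as.integer(pos))  # -1 4 -1 4 -1 1 -1 -1
print(matched)          # "ss" "s" "s"
```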
› Sequences contain special characters used to describe a pattern in a given string
Sequences Description
\d matches a digit character (written as "\\d" inside an R string)
Characters Description
[[:space:]] matches space characters, e.g. tab, newline, vertical tab, space
[[:cntrl:]] matches control characters. Control characters are non-printable characters such as \t (tab), \n (new line), \e (escape), \f (form feed), etc.
[[:graph:]] matches graphical characters. Graphical characters comprise [[:alnum:]] and [[:punct:]]
› Examples:
– string <- c("I sleep 16 hours\n, a day","I sleep 8 hours\n a day.","You sleep how many\t hours ?")
– #get digits
– unlist(regmatches(string,gregexpr("[[:digit:]]+",text = string)))
– #remove punctuations
– gsub(pattern = "[[:punct:]]+",replacement = "",x = string)
– #replace spaces and tabs with a hyphen
– gsub(pattern = "[[:blank:]]",replacement = "-",x = string)
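One detail worth seeing in code is that the backslash of a sequence such as \d has to be doubled inside an R string. This is a minimal sketch using a small made-up vector x; the values are chosen only for illustration.

```r
x <- c("room 101", "no digits", "a1b22")

# "\\d" in R source is the regular expression \d (a single digit).
masked <- gsub("\\d", "#", x)

# The POSIX class [[:digit:]] matches the same characters.
has_digit <- grepl("[[:digit:]]", x)

# gregexpr() + regmatches() return every run of digits, one list element per string.
runs <- regmatches(x, gregexpr("[[:digit:]]+", x))

# [[:space:]] covers spaces, tabs and newlines; collapse each run to one space.
squeezed <- gsub("[[:space:]]+", " ", "a \t b\n c")

print(masked)     # "room ###" "no digits" "a#b##"
print(has_digit)  # TRUE FALSE TRUE
print(runs)
print(squeezed)   # "a b c"
```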
x <- c(1, 2, 3, 4, 5)
y <- c(3, 7, 8, 9, 12)
plot(x, y)
plot(x,y,type="l")
plot(c(-3,3), c(-1,5), type = "n", xlab="x axis", ylab="y axis ")
Barplot barplot(x)
Boxplot boxplot(x)
Density Plot plot(density(x))
Line Plot plot(x, y, type = "l")
Histogram hist(x)
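The calls above can be combined into one short script. This is a minimal sketch that draws on a null PDF device (pdf(NULL)), an assumption made here so that the code runs without opening a window or writing a file. Remove the pdf() and dev.off() lines to see the plots on screen.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(3, 7, 8, 9, 12)

pdf(NULL)  # null device: nothing is written to disk

# type = "n" sets up the axes without drawing any data;
# lines() and points() then add to the empty plot.
plot(range(x), range(y), type = "n", xlab = "x axis", ylab = "y axis")
lines(x, y)
points(x, y, pch = 19)

barplot(y, names.arg = x)  # one bar per value of y
hist(y)                    # histogram of y
plot(density(y))           # kernel density estimate of y

# boxplot() also returns the statistics it draws; plot = FALSE skips the drawing.
box_stats <- boxplot(y, plot = FALSE)$stats

invisible(dev.off())
print(box_stats)
```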
E.g.:
install.packages("plotrix")
install.packages("plot3D")
3D Plots in R
› 3D Pie Chart
# Get the library.
library(plotrix)
› ggplot2 may be used to create different types of plots using the following
command:
ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, col=Species,
shape=Species))+geom_point()
› The aes() function specifies the aesthetic mappings for a plot, i.e. which variables map to x, y, colour, shape, and so on
› Plots differ in the geometric objects (geoms) they contain.
› Different geoms are available for plotting different kinds of graphs:
– Scatter Plot: To plot individual points, use geom_point
– Bar Charts: For drawing bars, use geom_bar
– Histograms: For drawing binned values, geom_histogram
– Line Charts: To plot lines, use geom_line
– Polygons: To draw arbitrary shapes, use geom_polygon
– Creating Maps: Use geom_map for drawing polygons in the shape of a map by using
the map_data() function
– Creating Patterns: Use the geom_smooth function for showing simple trends or
approximations
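Because geoms are layers, several of them can be added to the same plot with +. This is a minimal sketch, assuming the ggplot2 package is already installed; it adds a per-species linear trend line (geom_smooth with method = "lm") on top of the scatter plot of the iris data.

```r
library(ggplot2)

# Two geoms on one plot: points for the observations,
# plus a straight-line fit for each species.
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, col = Species)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)  # se = FALSE hides the confidence band

# Printing the object draws the plot; a null device keeps the sketch
# from opening a window or writing a file.
pdf(NULL)
print(p)
invisible(dev.off())
```

Swapping geom_point() for another geom from the list changes the type of chart while the data and aesthetic mappings stay the same.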
› Install and load ggplot2 package:
install.packages('ggplot2')
library(ggplot2)
Scatterplot With ggplot2
# Loading
library(ggplot2)
# Create simple example
data <- data.frame(x = 1:9, y = c(3, 1, 4, 3, 5, 2, 1, 2, 3),
group = rep(LETTERS[1:3], each = 3))
data
ggplot(data,aes(x = x, y = y)) + geom_point()
------------
ggplot(data,aes(x = x, y = y)) + geom_point(size=5)
---------
ggplot(data,aes(x = x, y = y,col = group)) + geom_point(size=5)
-----------
With Iris Dataset
data(iris)
ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, col=Species))+geom_point()
data(iris)
head(iris)
ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, col=Species,
shape=Species))+geom_point()
Creating Histogram:
library(ggplot2)
data(iris)
ggplot(data = iris, aes( x = Sepal.Length)) + geom_histogram( )
#visualize various groups in histogram
ggplot(iris, aes(x=Sepal.Length, color=Species)) +
geom_histogram(fill="white", binwidth = 1)
Creating Density Plot
ggplot(iris, aes( x = Sepal.Length)) + geom_density( )
ggplot(iris, aes(x=Sepal.Length, color=Species)) + geom_density( )
ggplot(diamonds,aes(x = depth, fill = cut)) + geom_density()
› Creating Bar and Column Charts
data("mpg")
head(mpg)
ggplot(mpg, aes(x= class)) + geom_bar()
– Using coord_flip( ) one can inter-change x and y axis.
ggplot(mpg, aes(x= class)) + geom_bar() + coord_flip() # column chart
-----------
diamonds_m_cl = aggregate(price ~ clarity, data = diamonds, FUN = mean) # mean price per clarity level
diamonds_m_cl
ggplot(diamonds_m_cl, aes(x = clarity, y = price)) + geom_bar(stat = "identity")
----------
library(ggplot2)
ODI <- data.frame(match=c("M-1","M-2","M-3","M-4"),runs=c(67,37,74,10))
Perf=ggplot(data=ODI,aes(x=match,y=runs,fill=match))+geom_bar(stat="identity")
Perf
---------
ggplot(data=ODI, aes(x=match, y=runs))+geom_bar(stat="identity",fill="blue")+
theme_dark()
› Creating Line chart
df <- data.frame(store=c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'),
week=c(1, 2, 3, 1, 2, 3, 1, 2, 3), sales=c(9, 12, 15, 7, 9, 14, 10, 16, 19))
ggplot(df, aes(x=week, y=sales, group=store)) + geom_line(size=1)
----
ggplot(df, aes(x=week, y=sales, group=store, color=store)) + geom_line(size=2)
-----
ggplot(df, aes(x=week, y=sales, group=store, color=store)) + geom_line(size=2) +
scale_color_manual(values=c('orange', 'pink', 'red'))
----
ggplot(df, aes(x=week, y=sales, group=store, color=store)) +
geom_line(size=1)+ geom_point(size=2)
Adding or Modifying the Main Title and Axis Labels
› The following functions can be used to add or alter the main title
and axis labels.
ggtitle("Main title"): Adds a main title above the plot
xlab("X axis label"): Changes the X axis label
ylab("Y axis label"): Changes the Y axis label
labs(title = "Main title", x = "X axis label", y = "Y axis label"): Sets the title and both axis labels in one call
– Example:
p = ggplot(mpg, aes(x= class)) + geom_bar()
p + labs(title = "Number of Cars in each type", x = "Type of car", y = "Number of cars")
– Adding data labels
p = ggplot(mpg, aes(x= class)) + geom_bar()
p = p + labs(title = "Number of Cars in each type", x = "Type of car", y = "Number of cars")
# after_stat(count) replaces the older ..count.. notation (ggplot2 3.3.0 and later)
p + geom_text(stat='count', aes(label=after_stat(count)), vjust=-0.25)