R Important Functions

File Access in R
getwd(): returns the current working directory, where files are read from and saved to
ls(): lists the objects in the current workspace (not the files on disk; use list.files() for those)
dir.create("testdir"): creates a directory called "testdir" in the current working directory
setwd("testdir"): changes the working directory to testdir, relative to the current path
file.create("mytest.R"): creates a new, empty file named mytest.R
list.files(): lists all files in the working directory
file.exists("mytest.R"): checks whether the file exists
file.info("mytest.R"): returns information about the file, such as its size and modification time
file.rename("mytest.R", "mytest2.R"): renames the file
file.copy("mytest2.R", "mytest3.R"): copies the file
dir.create(file.path("testdir2", "testdir3"), recursive = TRUE): creates testdir3 under testdir2, creating testdir2 first if needed
unlink("testdir2", recursive = TRUE): deletes testdir2 and everything inside it
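The file functions above can be tried out safely in a throwaway directory. The following is a minimal sketch, assuming a sandbox under tempdir() (the sandbox location and the helper variable names are illustrative, not part of the original notes):

```r
# Work inside a temporary sandbox so the real working directory is untouched.
old_wd <- getwd()
sandbox <- file.path(tempdir(), "testdir")   # hypothetical sandbox location
dir.create(sandbox, recursive = TRUE)
setwd(sandbox)

file.create("mytest.R")                      # creates an empty file
exists_before <- file.exists("mytest.R")     # TRUE once the file is there
file.rename("mytest.R", "mytest2.R")         # mytest.R is now mytest2.R
file.copy("mytest2.R", "mytest3.R")          # second copy under a new name
files_now <- list.files()                    # the two files created above

# Nested directories need recursive = TRUE.
dir.create(file.path("testdir2", "testdir3"), recursive = TRUE)
nested_ok <- dir.exists(file.path("testdir2", "testdir3"))
unlink("testdir2", recursive = TRUE)         # removes testdir2 and its contents
removed <- !dir.exists("testdir2")

# Restore the original working directory and clean up the sandbox.
setwd(old_wd)
unlink(sandbox, recursive = TRUE)
```

Restoring the working directory at the end keeps the sketch from changing where later code reads and writes files.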
Sequencing in R
seq(0, 10, by = 0.5): gives the sequence from 0 to 10 in steps of 0.5
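A small sketch of seq() in use. The length.out argument is not mentioned in the notes; it is shown here only as an alternative way to get the same spacing:

```r
half_steps <- seq(0, 10, by = 0.5)             # 0.0, 0.5, 1.0, ..., 10.0
n_values <- length(half_steps)                 # 21 values in total

# Instead of giving the step, give the number of values and let R work out the step.
evenly_spaced <- seq(5, 10, length.out = 11)   # step works out to 0.5
```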
Matrix and Data Frame
dim(my_vector) <- c(4, 5): turns a vector of length 20 into a matrix with four rows and five columns
class(my_vector): reports the class, which is now "matrix" (R 4.0.0 and later report "matrix" "array")
my_matrix2 <- matrix(data = 1:20, nrow = 4, ncol = 5): creates a matrix with 4 rows and 5 columns
cbind(patients, my_matrix): combines the two by column; implicit coercion takes place, so because patients is character and my_matrix is integer, the result is an all-character matrix
data.frame(patients, my_matrix): combines the two while keeping each column's type intact
colnames(my_data) <- cnames: assigns names to the columns of my_data
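The difference between cbind() and data.frame() is easiest to see side by side. This is a sketch only; the patient names and column labels are made up for illustration:

```r
my_vector <- 1:20
dim(my_vector) <- c(4, 5)                        # the vector becomes a 4 x 5 matrix
my_matrix <- my_vector
my_matrix2 <- matrix(data = 1:20, nrow = 4, ncol = 5)
same_matrix <- identical(my_matrix, my_matrix2)  # both routes build the same matrix

patients <- c("Bill", "Gina", "Kelly", "Sean")   # hypothetical names
combined <- cbind(patients, my_matrix)           # every cell coerced to character
my_data <- data.frame(patients, my_matrix)       # numeric columns stay numeric

cnames <- c("patient", "age", "weight", "bp", "rating", "test")  # hypothetical labels
colnames(my_data) <- cnames
```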

Boolean Operators
You can use the `&` operator to evaluate AND across a vector. The `&&` version of AND evaluates only the first member of a vector in older versions of R; since R 4.3.0 it raises an error if either operand is longer than one.
isTRUE(): takes one argument. If that argument evaluates to TRUE, the function will return TRUE.
identical(): returns TRUE if the two R objects passed to it as arguments are identical
xor(5 == 6, !FALSE): the xor() function stands for exclusive OR. If one argument evaluates to TRUE and one argument evaluates to FALSE, then this function will return TRUE.
ints <- sample(10): the vector `ints` is a random sampling of the integers from 1 to 10 without replacement
which(ints > 7): returns the positions (indices) of the elements of ints that are greater than 7; ints[ints > 7] returns the values themselves
any(): returns TRUE if one or more of the elements in the logical vector is TRUE
all(): returns TRUE if every element in the logical vector is TRUE
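The logical helpers above can be checked on a small vector. A minimal sketch, with an arbitrary seed added only so that the random draw is reproducible:

```r
set.seed(42)                  # arbitrary seed, not part of the original notes
ints <- sample(10)            # the integers 1 to 10 in random order

big_idx <- which(ints > 7)    # positions of the elements greater than 7
big_vals <- ints[big_idx]     # the values at those positions: 8, 9 and 10 in some order

elementwise <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)  # TRUE FALSE FALSE
exclusive <- xor(5 == 6, !FALSE)   # TRUE: exactly one side is TRUE
has_negative <- any(ints < 0)      # FALSE: no element is negative
all_positive <- all(ints > 0)      # TRUE: every element is positive
```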
Functions
Sys.Date(): returns today's date as a Date object, which prints in the form YYYY-MM-DD
mean(c(2, 4, 5)): the mean() function takes a vector of numbers as input and returns the average of all of the numbers in the input vector. Inputs to functions are often called arguments. Providing arguments to a function is also sometimes called passing arguments to that function. Arguments you want to pass to a function go inside the function's parentheses.
options(editor = "internal"): a setting to try if opening a script in the editor fails with an error
source('~/first function .R'): reads and runs the R code in the given file
args(remainder): shows the arguments of the function remainder
evaluate(function(x){x+1}, 6): passes a tiny anonymous function, which takes one argument `x` and returns `x+1`, to evaluate() together with the value 6
paste(..., sep = " ", collapse = NULL): the `...` is referred to as an ellipsis or simply dot-dot-dot. The ellipsis allows an indefinite number of arguments to be passed into a function.
# simon_says <- function(...){
#   paste("Simon says:", ...)
# }
#
# The simon_says function works just like the paste function, except the
# beginning of every string is prepended by the string "Simon says:"
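The commented-out simon_says() above runs as written once uncommented. The sketch below also includes one plausible definition of evaluate(); the notes never show its body, so that definition is an assumption:

```r
# The ellipsis forwards any number of arguments on to paste().
simon_says <- function(...) {
  paste("Simon says:", ...)
}
greeting <- simon_says("put your hands", "on your head")

# Hypothetical definition: apply the function `func` to the data `dat`.
evaluate <- function(func, dat) {
  func(dat)
}
plus_one <- evaluate(function(x) { x + 1 }, 6)   # anonymous function applied to 6
```

Here greeting is "Simon says: put your hands on your head", since paste() joins its arguments with a single space by default.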
lapply and sapply
Each of the *apply functions will SPLIT up some data into smaller pieces, APPLY a function to each piece, then COMBINE the results. A more detailed discussion of this strategy is found in Hadley Wickham's Journal of Statistical Software paper titled 'The Split-Apply-Combine Strategy for Data Analysis'.
viewinfo(): displays information about the dataset
head(flags): previews the first six rows
cls_list <- lapply(flags, class): returns a list containing the class of each column
as.character(cls_list): converts the list to a character vector
sum(flags$orange): flags is a data frame (a list of columns) and orange is one of its columns, so it can be accessed as flags$orange; sum() then adds up its values
flags[, 11:17]: selects all rows, but only columns 11 to 17
range(): returns the minimum and maximum of its first argument, which should be a numeric vector
sapply(flags[, 11:17], sum): each sum has length 1, so sapply() simplifies the result to a vector
unique(): when given a vector, returns a vector with all duplicate elements removed. In other words, unique() returns a vector of only the 'unique' elements.
lapply(unique_vals, function(elem){elem[2]}): here we define and use our own function right in the call to lapply(). Our function has no name and disappears as soon as lapply() is done using it.
lapply() and sapply(): both take a list as input, apply a function to each element of the list, then combine and return the result. lapply() always returns a list, whereas sapply() attempts to simplify the result.
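The flags dataset is not bundled with R, so the sketch below swaps in the built-in iris data frame to show the same lapply() and sapply() behaviour. The choice of iris and the variable names are assumptions made for illustration:

```r
cls_list <- lapply(iris, class)      # a list with one element per column
cls_vect <- sapply(iris, class)      # simplified to a named character vector
cls_chr <- as.character(cls_list)    # the list converted to a character vector

measurements <- iris[, 1:4]          # all rows, but only columns 1 to 4
col_sums <- sapply(measurements, sum)      # length-1 results: a numeric vector
col_ranges <- sapply(measurements, range)  # length-2 results: a 2 x 4 matrix

# An anonymous function defined right inside the call to lapply().
unique_vals <- lapply(iris, unique)
second_vals <- lapply(unique_vals, function(elem) elem[2])
```

When every result has the same length greater than one, sapply() simplifies to a matrix rather than a vector, which is why col_ranges has two rows.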
vapply and tapply
str() and summary(): str() gives a compact overview of an object's structure, and summary() gives a statistical summary
Whereas sapply() tries to 'guess' the correct format of the result, vapply() allows you to specify it explicitly. If the result doesn't match the format you specify, vapply() will throw an error, causing the operation to stop. This can prevent significant problems in your code that might be caused by getting unexpected return values from sapply().
You'll often wish to split your data up into groups based on the value of some variable, then apply a function to the members of each group. tapply() does exactly that.
table(flags$landmass): shows how many flags/countries fall into each group
tapply(flags$animate, flags$landmass, mean): applies the mean function to the 'animate' variable separately for each of the six landmass groups, thus giving the proportion of flags containing an animate image WITHIN each landmass group
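Since flags is not available outside the lesson, this sketch uses the built-in mtcars data frame instead (an assumption for illustration). Its am column is 0/1, so a group mean is a proportion, just like 'animate' above:

```r
# vapply(): like sapply(), but the shape of each result is stated up front.
col_means <- vapply(mtcars, mean, numeric(1))   # one number per column

# range() returns two numbers, so promising numeric(1) makes vapply() stop.
mismatch <- tryCatch(
  vapply(mtcars, range, numeric(1)),
  error = function(e) "error"
)

group_sizes <- table(mtcars$cyl)                  # cars per cylinder group
am_by_cyl <- tapply(mtcars$am, mtcars$cyl, mean)  # proportion of manual cars per group
```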
Looking at Data
It's very common for data to be stored in a data frame. It is the default class for data read into R using functions like read.csv() and read.table().
nrow(plants): gives the number of rows; ncol(plants) gives the number of columns
object.size(plants): shows how much space the dataset is occupying in memory
names(plants): returns a character vector of column (i.e. variable) names
head(plants, 10): shows the first 10 rows of the dataset
tail(plants, 15): shows the last 15 rows
summary() provides different output for each variable, depending on its class. For numeric data such as Precip_Min, summary() displays the minimum, 1st quartile, median, mean, 3rd quartile, and maximum. These values help us understand how the data are distributed.
For categorical variables (called 'factor' variables in R), summary() displays the number of times each value (or 'level') occurs in the data. For example, each value of Scientific_Name only appears once, since it is unique to a specific plant. In contrast, the summary for Duration (also a factor variable) tells us that our dataset contains 3031 Perennial plants, 682 Annual plants, etc.
The beauty of str() is that it combines many of the features of the other functions you've already seen, all in a concise and readable format. At the very top, it tells us that the class of plants is 'data.frame' and that it has 5166 observations and 10 variables. It then gives us the name and class of each variable, as well as a preview of its contents.
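The same inspection routine works on any data frame. The sketch below uses the built-in iris dataset as a stand-in for plants, which is an assumption made so the example runs without the lesson's data:

```r
n_rows <- nrow(iris)            # number of rows (observations)
n_cols <- ncol(iris)            # number of columns (variables)
size <- object.size(iris)       # memory used by the dataset
col_names <- names(iris)        # character vector of variable names

first_ten <- head(iris, 10)     # first 10 rows
last_fifteen <- tail(iris, 15)  # last 15 rows

summary(iris)                   # quartiles for numeric columns, counts for the factor
str(iris)                       # class, dimensions, and a preview of each variable
```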
Simulation
sample(1:6, 4, replace = TRUE): sample() takes a sample of the specified size from the elements of x, either with or without replacement. This call instructs R to randomly select four numbers between 1 and 6, WITH replacement. Sampling with replacement simply means that each number is "replaced" after it is selected, so that the same number can show up more than once.
LETTERS: a predefined variable in R containing a vector of all 26 uppercase letters of the English alphabet
When the 'size' argument to sample() is not specified, R takes a sample equal in size to the vector from which you are sampling.
flips <- sample(c(0, 1), 100, replace = TRUE, prob = c(0.3, 0.7)): simulates 100 flips of an unfair coin. Let the value 0 represent tails and the value 1 represent heads. The call draws a sample of size 100 from the vector c(0, 1), with replacement. Since the coin is unfair, specific probabilities are attached to the values 0 (tails) and 1 (heads) with a fourth argument, prob = c(0.3, 0.7), and the result is assigned to a new variable called flips.
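A short sketch of the sampling calls described above. The seed is arbitrary and was added only so that repeated runs give the same draws:

```r
set.seed(123)                             # arbitrary seed, not part of the original notes

rolls <- sample(1:6, 4, replace = TRUE)   # four dice rolls; repeats are possible
shuffled <- sample(LETTERS)               # no size given: a permutation of all 26 letters

# Unfair coin: 0 = tails with probability 0.3, 1 = heads with probability 0.7.
flips <- sample(c(0, 1), 100, replace = TRUE, prob = c(0.3, 0.7))
n_heads <- sum(flips)                     # number of heads; about 70 is expected
```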
rbinom(): a binomial random variable represents the number of 'successes' (heads) in a given number of independent 'trials' (coin flips). Therefore, we can generate a single random variable that represents the number of heads in 100 flips of our unfair coin using rbinom(1, size = 100, prob = 0.7). Note that you only specify the probability of 'success' (heads) and NOT the probability of 'failure' (tails).
Each probability distribution in R has an r*** function (for "random"), a d*** function (for "density"), a p*** function (for "probability"), and a q*** function (for "quantile").
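For the binomial distribution, the four function families look like this. A minimal sketch using the unfair coin from above (100 flips, probability of heads 0.7); the variable names are illustrative:

```r
draws <- rbinom(5, size = 100, prob = 0.7)             # r: five random counts of heads
p_exactly_70 <- dbinom(70, size = 100, prob = 0.7)     # d: P(exactly 70 heads)
p_at_most_70 <- pbinom(70, size = 100, prob = 0.7)     # p: P(70 heads or fewer)
median_heads <- qbinom(0.5, size = 100, prob = 0.7)    # q: the median number of heads
```

The p function is the running total of the d function, so pbinom(70, ...) equals the sum of dbinom(0:70, ...).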
rnorm(n, mean, sd): generates n random numbers from a normal distribution; the defaults are mean = 0 and sd = 1
rpois(n, lambda): generates n random numbers from a Poisson distribution with mean lambda
rpois(5, 10): draws five values from a Poisson distribution with mean 10
replicate(100, rpois(5, 10)): performs this operation 100 times
colMeans(): calculates the mean of each column of a matrix
hist(): plots a histogram
All of the standard probability distributions are built into R, including exponential (rexp()), chi-squared (rchisq()), and gamma (rgamma()).
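Putting replicate(), colMeans(), and hist() together gives a small simulation of sample means. This is a sketch; the seed and the variable names my_pois and cm are assumptions added for illustration:

```r
set.seed(1)                               # arbitrary seed for reproducibility

my_pois <- replicate(100, rpois(5, 10))   # 5 x 100 matrix: one column per repetition
cm <- colMeans(my_pois)                   # the mean of each column: 100 sample means

hist(cm)                                  # the means cluster around lambda = 10
```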
Dates and Times
Dates are represented by the 'Date' class and times are represented by the 'POSIXct' and 'POSIXlt' classes. Internally, dates are stored as the number of days since 1970-01-01 and times are stored as either the number of seconds since 1970-01-01 (for 'POSIXct') or a list of seconds, minutes, hours, etc. (for 'POSIXlt').
Sys.Date(): gives today's date
unclass(d1): gives the number of days since 1970-01-01, if d1 is a Date object
To reference a date prior to 1970-01-01, create it with as.Date(); the stored value is then negative:
> d2 <- as.Date("1969-01-01")
> unclass(d2)
[1] -365
Sys.time(): gives the current date and time. By default, Sys.time() returns an object of class POSIXct, but we can coerce the result to POSIXlt with as.POSIXlt(Sys.time()).
t1 <- Sys.time()
t2 <- as.POSIXlt(Sys.time())
t1 and t2 print the same way, but unclass() gives a different result for each: a single number of seconds for t1 and a list of components for t2.
weekdays(): returns the day of the week from any date or time object, for example d1, a Date object that contains today's date
months(): returns the month; it also works on any date or time object
quarters(): returns the quarter of the year (Q1-Q4) from any date or time object
strptime(): often, the dates and times in a dataset will be in a format that R does not recognize, and strptime() can be helpful in this situation. For example, for t3 <- "October 17, 1986 08:24", use t4 <- strptime(t3, "%B %d, %Y %H:%M").
Finally, there are a number of operations that you can perform on dates and times, including arithmetic operations (+ and -) and comparisons (<, ==, etc.).
Sys.time() - t1: gives the time difference
difftime(Sys.time(), t1, units = 'days'): gives the amount of time in DAYS that has passed since t1 was created
> difftime(Sys.time(), t1, units = 'days')
Time difference of 0.01397037 days
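The date and time functions above can be exercised together. A minimal sketch; note that the %B format in strptime() matches month names in the current locale, so the parse of t3 assumes an English locale:

```r
d1 <- Sys.Date()                    # today's date, class "Date"
d2 <- as.Date("1969-01-01")
days_since_epoch <- unclass(d2)     # negative, because d2 is before 1970-01-01

t1 <- Sys.time()                    # POSIXct: seconds since 1970-01-01
t2 <- as.POSIXlt(t1)                # POSIXlt: components such as t2$min and t2$hour

t3 <- "October 17, 1986 08:24"
t4 <- strptime(t3, "%B %d, %Y %H:%M")   # assumes English month names

quarter <- quarters(d2)             # "Q1"
gap <- as.Date("1970-01-10") - as.Date("1970-01-01")   # arithmetic on dates
elapsed <- difftime(Sys.time(), t1, units = "days")    # time passed since t1, in days
```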
Basic Graphs
plot(cars)
As always, R tries very hard to give you something sensible given the information that you have provided to it. First, R notes that the data frame you have given it has just two columns, so it assumes that you want to plot one column versus the other. Second, since we do not provide labels for either axis, R uses the names of the columns. Third, it creates axis tick marks at nice round numbers and labels them accordingly. Fourth, it uses the other defaults supplied in plot().
Do not type plot(cars$speed, cars$dist), although that will work. Instead, name the arguments explicitly:
plot(x = cars$speed, y = cars$dist)
Note that this produces a slightly different result than plot(cars). In this case, R is not sure what you want to use as the labels on the axes, so it just uses the arguments which you pass in, data frame name and dollar signs included.
plot(x = cars$speed, y = cars$dist, xlab = "Speed"): sets the label of the x-axis to "Speed"
plot(x = cars$speed, y = cars$dist, xlab = "Speed", ylab = "Stopping Distance"): also sets the label of the y-axis to "Stopping Distance"
plot(cars, main = "My Plot"): plots cars with a main title of "My Plot". Note that the argument for the main title is "main", not "title".
plot(cars, sub = "My Plot Subtitle"): plots cars with a subtitle of "My Plot Subtitle"
plot(cars, xlim = c(10, 15)): restricts the x-axis to the range 10 to 15
plot(cars, pch = 2): draws the points as triangles
Arguments like "col" and "pch" may not seem very intuitive. And that is because they aren't! So, many/most people use more modern packages, like ggplot2, for creating their graphics in R.
boxplot(), like many R functions, also takes a "formula" argument, generally an expression with a tilde ("~") which indicates the relationship between the input variables. This allows you to enter something like mpg ~ cyl to plot the relationship between cyl (number of cylinders) on the x-axis and mpg (miles per gallon) on the y-axis.
boxplot(formula = mpg ~ cyl, data = mtcars)
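The plotting arguments covered in this section can be combined in a single call. The sketch below uses only the built-in cars and mtcars datasets; the col = 2 setting (red points) is an added illustration rather than something from the notes:

```r
# Defaults: axis labels are taken from the column names of the data frame.
plot(cars)

# Explicit axes, labels, title, and subtitle in one call.
plot(x = cars$speed, y = cars$dist,
     xlab = "Speed", ylab = "Stopping Distance",
     main = "My Plot", sub = "My Plot Subtitle")

# Limit the x-axis, draw triangles (pch = 2), and colour them red (col = 2).
plot(cars, xlim = c(10, 15), pch = 2, col = 2)

# Formula interface: mpg on the y-axis, one box per value of cyl.
bp <- boxplot(formula = mpg ~ cyl, data = mtcars)
```

boxplot() invisibly returns the statistics behind each box, so bp$names holds the group labels and bp$n the number of cars in each group.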
