
R Important Functions


File Access in R

getwd(): returns the current working directory, where files are read and written by default

ls(): lists the objects in the current workspace (not files; use list.files() for files)

dir.create("testdir"): creates a directory called "testdir" in the current working directory

setwd("testdir"): changes the working directory to testdir, relative to the current path

file.create("mytest.R"): creates a new, empty file named mytest.R

list.files(): lists all files in the directory

file.exists("mytest.R"): checks whether the file exists

file.info("mytest.R"): returns information about the file, such as its size and modification time

file.rename("mytest.R", "mytest2.R"): renames the file

file.copy("mytest2.R", "mytest3.R"): copies the file

dir.create(file.path("testdir2", "testdir3"), recursive = TRUE): creates testdir3 under testdir2, creating testdir2 first if needed

unlink("testdir2", recursive = TRUE): deletes testdir2 and everything inside it
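
The file functions above can be tried end to end in one short session. This is only a sketch: it assumes a temporary directory (tempdir()) is an acceptable place to experiment, and the variable names (old_wd, renamed, copied, files, nested_ok) are made up for illustration.

```r
# Work inside a temporary directory so nothing in a real project is touched.
old_wd <- setwd(tempdir())

dir.create("testdir")                            # new directory
setwd("testdir")
file.create("mytest.R")                          # empty file
renamed <- file.rename("mytest.R", "mytest2.R")  # TRUE on success
copied  <- file.copy("mytest2.R", "mytest3.R")   # TRUE on success
files   <- sort(list.files())                    # the two files created so far

# Nested directories need recursive = TRUE.
dir.create(file.path("testdir2", "testdir3"), recursive = TRUE)
nested_ok <- dir.exists(file.path("testdir2", "testdir3"))

# Clean up and restore the original working directory.
unlink("testdir2", recursive = TRUE)
setwd(old_wd)
unlink(file.path(tempdir(), "testdir"), recursive = TRUE)
```

file.rename() and file.copy() report success as logical values, which is why the results are stored rather than ignored.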

Sequencing in R

seq(0, 10, by = 0.5): generates a sequence from 0 to 10 in steps of 0.5
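
A small sketch of seq() with a step size and, as an alternative, with a fixed number of points. The variable names are illustrative only.

```r
by_half  <- seq(0, 10, by = 0.5)          # 0.0, 0.5, 1.0, ..., 10.0
n_values <- length(by_half)               # 21 values
five_pts <- seq(0, 10, length.out = 5)    # 0.0 2.5 5.0 7.5 10.0
```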

Matrix and Data Frame

dim(my_vector) <- c(4, 5): turns a vector of length 20 into a matrix with four rows and five columns

class(my_vector): returns the class of the object; here it is "matrix" (R 4.0.0 and later report "matrix" "array")

my_matrix2 <- matrix(data = 1:20, nrow = 4, ncol = 5): creates a matrix with 4 rows and 5 columns

cbind(patients, my_matrix): combines the two by column. Implicit coercion takes place: since one is character and the other is integer, the result is an all-character matrix

data.frame(patients, my_matrix): combines the two while keeping the type of each column intact

colnames(my_data) <- cnames: assigns column names to my_data
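
The matrix and data frame steps above, put together as a sketch. The patient names and the column names in cnames are made-up values used only for illustration.

```r
my_vector <- 1:20
dim(my_vector) <- c(4, 5)                  # now a 4 x 5 matrix
my_matrix2 <- matrix(data = 1:20, nrow = 4, ncol = 5)
same <- identical(my_vector, my_matrix2)   # both routes give the same matrix

patients <- c("Bill", "Gina", "Kelly", "Sean")   # hypothetical names
as_chars <- cbind(patients, my_vector)     # everything coerced to character
my_data  <- data.frame(patients, my_vector)      # numbers stay numeric

cnames <- c("patient", "age", "weight", "bp", "rating", "test")  # hypothetical
colnames(my_data) <- cnames
```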


Boolean Operators

You can use the `&` operator to evaluate AND across a vector. The `&&` version of AND only evaluates the first member of a vector (from R 4.3.0 onward, passing a vector longer than one to `&&` is an error).

isTRUE(): takes one argument. If that argument evaluates to TRUE, the function
will return TRUE

identical(): will return TRUE if the two R objects passed to it as arguments are
identical

xor(5 == 6, !FALSE): The xor() function stands for exclusive OR. If one argument
evaluates to TRUE and one argument evaluates to FALSE, then this function will
return TRUE

ints <- sample(10) : The vector `ints` is a random sampling of integers from 1 to
10 without replacement.

which(ints > 7): returns the indices of the elements of ints that are greater than 7 (use ints[ints > 7] to get the values themselves)

The any() function will return TRUE if one or more of the elements in the logical vector is TRUE. The all() function will return TRUE if every element in the logical vector is TRUE.
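
A short sketch of the logical tools in this section. The seed is arbitrary and only makes the run repeatable; the variable names are illustrative.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
ints <- sample(10)                # the integers 1 to 10 in random order

elementwise <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)  # TRUE FALSE FALSE
single      <- TRUE && FALSE                                # FALSE

idx    <- which(ints > 7)         # positions of the matching elements
values <- ints[idx]               # the matching values themselves

any_big  <- any(ints > 7)         # TRUE: 8, 9 and 10 are always present
all_pos  <- all(ints > 0)         # TRUE
one_true <- xor(5 == 6, !FALSE)   # TRUE: exactly one argument is TRUE
```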

Functions

Sys.Date(): returns today's date as an object of class Date.

mean(c(2,4,5)): the mean() function takes a vector of numbers as input and returns the average of all of the numbers in the input vector. Inputs to functions are often called arguments. Providing arguments to a function is also sometimes called passing arguments to that function. Arguments you want to pass to a function go inside the function's parentheses.

Error : options(editor = "internal")

source('~/first function .R'): reads and runs the R code in the given file

args(remainder): shows the arguments of the function remainder

evaluate(function(x){x+1}, 6): passes a tiny anonymous function, which takes one argument `x` and returns `x+1`, to evaluate() together with the value 6
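
evaluate() is not part of base R. The sketch below assumes a definition that simply calls its first argument on its second; that definition is an assumption made for illustration, not something stated in these notes.

```r
# Assumed definition: apply the function `func` to the data `dat`.
evaluate <- function(func, dat) {
  func(dat)
}

plus_one <- evaluate(function(x) { x + 1 }, 6)           # 7
first    <- evaluate(function(x) { x[1] }, c(8, 4, 0))   # 8
```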

paste(..., sep = " ", collapse = NULL): the first argument is `...`, which is referred to as an ellipsis or simply dot-dot-dot. The ellipsis allows an indefinite number of arguments to be passed into a function.

simon_says <- function(...){
  paste("Simon says:", ...)
}

The simon_says function works just like the paste function, except that the beginning of every string is prepended by the string "Simon says:"
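
Besides forwarding `...` to another function, the arguments can be captured with list(...). The helper below, describe_args(), is a hypothetical function written only to illustrate both uses.

```r
# Hypothetical helper: counts its arguments and joins them with paste().
describe_args <- function(..., sep = " ") {
  dots <- list(...)                         # capture everything passed via ...
  paste0(length(dots), " args: ", paste(..., sep = sep))
}

out <- describe_args("a", "b", "c", sep = "-")   # "3 args: a-b-c"
```

Arguments that come after `...` in the definition, such as sep here, have to be named in the call.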

Lapply and Sapply

Each of the *apply functions will SPLIT up some data into smaller pieces, APPLY a function to each piece, then COMBINE the results. A more detailed discussion of this strategy is found in Hadley Wickham's Journal of Statistical Software paper titled 'The Split-Apply-Combine Strategy for Data Analysis'.

viewinfo(): displays information about the dataset

head(flags): previews the first six rows

cls_list <- lapply(flags, class): returns a list containing the class of each column

as.character(cls_list): converts the list to a character vector

sum(flags$orange): flags is a data frame (a list of columns), so the orange column can be accessed as flags$orange, and sum() adds up its values

flags[, 11:17]: selects all rows, but only columns 11 to 17

range() function returns the minimum and maximum of its first argument, which
should be a numeric vector.

sapply(flags[, 11:17], sum): sum() returns a vector of length 1 for each column, so sapply() simplifies the result into a single vector

When given a vector, the unique() function returns a vector with all duplicate elements removed. In other words, unique() returns a vector of only the 'unique' elements.

lapply(unique_vals, function(elem){elem[2]}): we are defining and using our own function right in the call to lapply(). Our function has no name and disappears as soon as lapply() is done using it.

lapply() and sapply(): both take a list as input, apply a function to each element of the list, then combine and return the result. lapply() always returns a list, whereas sapply() attempts to simplify the result.
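
The difference between the two is easiest to see on a small list. The data below is made up for illustration.

```r
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)   # made-up data

as_list   <- lapply(scores, sum)   # list of three length-one vectors
as_vector <- sapply(scores, sum)   # named numeric vector: a = 6, b = 30, c = 5

# When the results differ in length, sapply() cannot simplify and returns a list.
ragged <- sapply(scores, function(elem) elem[elem > 1])
```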

vapply and tapply

str() and summary(): display the structure of an object and a summary of it, respectively.

Whereas sapply() tries to 'guess' the correct format of the result, vapply() allows you to specify it explicitly. If the result doesn't match the format you specify, vapply() will throw an error, causing the operation to stop. This can prevent significant problems in your code that might be caused by getting unexpected return values from sapply().
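
A sketch of vapply() both succeeding and failing. The list is made up, and the error is caught with tryCatch() only so that the example keeps running.

```r
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)   # made-up data

# Each call to sum() must return exactly one number.
totals <- vapply(scores, sum, numeric(1))

# range() returns two numbers, so promising numeric(1) triggers an error.
failed <- tryCatch(
  vapply(scores, range, numeric(1)),
  error = function(e) "vapply stopped: result had the wrong length"
)
```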

You'll often wish to split your data up into groups based on the value of some variable, then apply a function to the members of each group. The next function we'll look at, tapply(), does exactly that.

table(flags$landmass): shows how many flags/countries fall into each group

tapply(flags$animate, flags$landmass, mean): applies the mean function to the 'animate' variable separately for each of the six landmass groups, thus giving us the proportion of flags containing an animate image WITHIN each landmass group.
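
The same idea on a tiny stand-in for the flags data. The data frame toy and its values are invented for illustration; 1 means the flag shows an animate image.

```r
# Made-up stand-in for the flags data.
toy <- data.frame(
  landmass = c(1, 1, 2, 2, 2, 3),
  animate  = c(1, 0, 0, 0, 1, 1)
)

counts <- table(toy$landmass)                       # flags per landmass: 2, 3, 1
props  <- tapply(toy$animate, toy$landmass, mean)   # 0.5, 1/3, 1
```

Because animate holds only 0s and 1s, the mean within each group is the proportion of 1s in that group.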

Looking at DATA

It's very common for data to be stored in a data frame. It is the default class for data read into R using functions like read.csv() and read.table(), which you'll learn about in another lesson.

nrow(plants) returns only the number of rows; ncol(plants) returns the number of columns.

To see how much space the dataset is occupying in memory, you can use object.size(plants).

names(plants) will return a character vector of column (i.e. variable) names.

head(plants, 10) will show you the first 10 rows of the dataset.

tail(plants, 15) will show you the last 15 rows of the dataset.

summary() provides different output for each variable, depending on its class. For numeric data such as Precip_Min, summary() displays the minimum, 1st quartile, median, mean, 3rd quartile, and maximum. These values help us understand how the data are distributed.

For categorical variables (called 'factor' variables in R), summary() displays the number of times each value (or 'level') occurs in the data. For example, each value of Scientific_Name only appears once, since it is unique to a specific plant. In contrast, the summary for Duration (also a factor variable) tells us that our dataset contains 3031 Perennial plants, 682 Annual plants, etc.

The beauty of str() is that it combines many of the features of the other functions you've already seen, all in a concise and readable format. At the very top, it tells us that the class of plants is 'data.frame' and that it has 5166 observations and 10 variables. It then gives us the name and class of each variable, as well as a preview of its contents.
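
The plants data is not included with these notes, so the sketch below uses mtcars, a data frame that ships with R, as a stand-in. The variable names are illustrative.

```r
# mtcars ships with R and stands in for the plants data here.
dims  <- c(nrow(mtcars), ncol(mtcars))   # 32 rows, 11 columns
vars  <- names(mtcars)                   # column (variable) names
first <- head(mtcars, 10)                # first 10 rows
last  <- tail(mtcars, 15)                # last 15 rows
size  <- object.size(mtcars)             # memory footprint in bytes

summary(mtcars$mpg)           # min, quartiles, median and mean for numeric data
summary(factor(mtcars$cyl))   # counts per level for a factor
str(mtcars)                   # class, dimensions and a preview of each column
```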

Simulation

sample(1:6, 4, replace = TRUE): sample() takes a sample of the specified size from the elements of x, either with or without replacement. This call instructs R to randomly select four numbers between 1 and 6, WITH replacement. Sampling with replacement simply means that each number is "replaced" after it is selected, so that the same number can show up more than once.

LETTERS is a predefined variable in R containing a vector of all 26 letters of the English alphabet.

When the 'size' argument to sample() is not specified, R takes a sample equal in size to the vector from which you are sampling.

Let the value 0 represent tails and the value 1 represent heads. Use sample() to draw a sample of size 100 from the vector c(0,1), with replacement. Since the coin is unfair, we must attach specific probabilities to the values 0 (tails) and 1 (heads) with a fourth argument, prob = c(0.3, 0.7). Assign the result to a new variable called flips:

flips <- sample(c(0,1), 100, replace = TRUE, prob = c(0.3, 0.7))
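
A quick way to check the simulated coin is to count the heads. The seed below is arbitrary and only makes the run repeatable; heads and share are illustrative names.

```r
set.seed(42)   # arbitrary seed so the run is repeatable
flips <- sample(c(0, 1), 100, replace = TRUE, prob = c(0.3, 0.7))

heads <- sum(flips)    # number of heads (1s) in this run
share <- mean(flips)   # proportion of heads; near 0.7 but rarely exactly 0.7
```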

rbinom(): A binomial random variable represents the number of 'successes' (heads) in a given number of independent 'trials' (coin flips). Therefore, we can generate a single random variable that represents the number of heads in 100 flips of our unfair coin using rbinom(1, size = 100, prob = 0.7). Note that you only specify the probability of 'success' (heads) and NOT the probability of 'failure' (tails).

Each probability distribution in R has an r*** function (for "random"), a d*** function (for "density"), a p*** (for "probability"), and q*** (for "quantile").
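
For the binomial distribution, the four prefixes give rbinom(), dbinom(), pbinom() and qbinom(). A sketch using the unfair coin from above; the variable names are illustrative.

```r
draws <- rbinom(5, size = 100, prob = 0.7)     # r: five random head counts
dens  <- dbinom(70, size = 100, prob = 0.7)    # d: P(exactly 70 heads)
cum   <- pbinom(70, size = 100, prob = 0.7)    # p: P(at most 70 heads)
med   <- qbinom(0.5, size = 100, prob = 0.7)   # q: smallest k with P(X <= k) >= 0.5
```

pbinom() is the running total of dbinom(), so sum(dbinom(0:70, 100, 0.7)) matches cum up to rounding.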

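rnorm(n, mean, sd): generates n random numbers from a normal distribution; the defaults are mean = 0 and sd = 1

rpois(n, lambda): generates n random numbers from a Poisson distribution with mean lambda

rpois(5, 10): generates five Poisson values with mean 10. Use replicate(100, rpois(5, 10)) to perform this operation 100 times; the result is a matrix with one column per repetition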

colMeans(): calculates the mean of each column of a matrix

hist(): plots a histogram
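
replicate(), colMeans() and hist() fit together as follows. The seed is arbitrary, and plot = FALSE is used here only so that the histogram counts can be inspected without opening a graphics window; drop it to draw the plot.

```r
set.seed(7)                                # arbitrary seed
my_pois <- replicate(100, rpois(5, 10))    # 5 x 100 matrix, one column per run
cm      <- colMeans(my_pois)               # 100 sample means, each near 10

h <- hist(cm, plot = FALSE)                # bin the means without drawing
h$counts                                   # number of means in each bin
```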

All of the standard probability distributions are built into R, including exponential (rexp()), chi-squared (rchisq()), gamma (rgamma())

Dates and Times

Dates are represented by the 'Date' class and times are represented by the 'POSIXct' and 'POSIXlt' classes. Internally, dates are stored as the number of days since 1970-01-01 and times are stored as either the number of seconds since 1970-01-01 (for 'POSIXct') or a list of seconds, minutes, hours, etc. (for 'POSIXlt').

Sys.Date(): gives today's date

unclass(d1): gives the number of days since 1970-01-01, if d1 is a Date object

What if we need to reference a date prior to 1970-01-01? Create a variable d2 containing as.Date("1969-01-01").

> d2 <- as.Date("1969-01-01")

> unclass(d2)
[1] -365
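
A sketch of the day-count representation using fixed dates, so the numbers do not depend on when it is run. The date chosen for d1 here is arbitrary.

```r
d1 <- as.Date("1970-01-10")   # arbitrary date shortly after the epoch
d2 <- as.Date("1969-01-01")

days1 <- unclass(d1)          # 9: days after 1970-01-01
days2 <- unclass(d2)          # -365: dates before the epoch are negative
gap   <- d1 - d2              # difference between two dates, in days
```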

You can access the current date and time using Sys.time().

By default, Sys.time() returns an object of class POSIXct, but we can coerce the result to POSIXlt with as.POSIXlt(Sys.time())

t1 <- Sys.time()

t2 <- as.POSIXlt(Sys.time())

t1 and t2 print in the same format, but unclass() shows that they are stored differently: a number of seconds for t1 (POSIXct) and a list of components for t2 (POSIXlt).

The weekdays() function will return the day of week from any date or time object. Try it out on d1, which is the Date object that contains today's date.

The months() function also works on any date or time object

The quarters() function returns the quarter of the year (Q1-Q4) from any date or
time object

Often, the dates and times in a dataset will be in a format that R does not recognize. The strptime() function can be helpful in this situation.

For t3 <- "October 17, 1986 08:24", use t4 <- strptime(t3, "%B %d, %Y %H:%M")
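
The result of strptime() is a POSIXlt object, so its components can be read directly. This sketch assumes an English locale (so that %B matches "October"), and tz = "UTC" is an added assumption to keep the result independent of the local time zone.

```r
t3 <- "October 17, 1986 08:24"
t4 <- strptime(t3, "%B %d, %Y %H:%M", tz = "UTC")   # tz is an assumption

class(t4)    # "POSIXlt" "POSIXt"
t4$hour      # 8
t4$min       # 24
t4$mon       # 9: months are counted from 0, so October is 9
t4$year      # 86: years are counted from 1900
```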

Finally, there are a number of operations that you can perform on dates and times, including arithmetic operations (+ and -) and comparisons (<, ==, etc.)

Sys.time() - t1: gives the time elapsed since t1 was created

Use difftime(Sys.time(), t1, units = 'days') to find the amount of time in DAYS that has passed since you created t1.

> difftime(Sys.time(),t1,units = 'days')
Time difference of 0.01397037 days
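
With fixed timestamps the arithmetic is easy to verify by hand. The two times below are made up for illustration.

```r
start <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")   # made-up timestamps
end   <- as.POSIXct("2024-01-02 12:00:00", tz = "UTC")

gap_auto  <- end - start                             # units chosen automatically
gap_days  <- difftime(end, start, units = "days")    # 1.5 days
gap_hours <- as.numeric(gap_days, units = "hours")   # 36
later     <- end > start                             # TRUE
```

Subtraction lets R pick the units, while difftime() fixes them, which matters when the result is used in further calculations.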

Basic Graphs

plot(cars)


As always, R tries very hard to give you something sensible given the information that you have provided to it. First, R notes that the data frame you have given it has just two columns, so it assumes that you want to plot one column versus the other.

Second, since we do not provide labels for either axis, R uses the names of the columns. Third, it creates axis tick marks at nice round numbers and labels them accordingly. Fourth, it uses the other defaults supplied in plot().

Do not type plot(cars$speed, cars$dist), although that will work. Instead, use plot(x = cars$speed, y = cars$dist).

plot(x = cars$speed, y = cars$dist)


Note that this produces a slightly different answer than plot(cars). In this case, R is not sure what you want to use as the labels on the axes, so it just uses the arguments which you pass in, data frame name and dollar signs included.

Type plot(x = cars$speed, y = cars$dist, xlab = "Speed") to create the plot.

> plot(x = cars$speed, y = cars$dist,xlab = "Speed")


Recreate the plot with the label of the y-axis set to "Stopping Distance".

> plot(x = cars$speed, y = cars$dist, xlab = "Speed", ylab = "Stopping Distance")

Plot cars with a main title of "My Plot". Note that the argument for the main title is "main" not "title".


> plot(cars, main = "My Plot")

Plot cars with a sub title of "My Plot Subtitle".


> plot(cars, sub = "My Plot Subtitle")

plot(cars, xlim = c(10, 15)): limits the x-axis to the range 10 through 15

plot(cars, pch = 2): draws the points as triangles
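
The arguments covered so far can be combined in a single call. In this sketch the plot is written to a temporary PDF file instead of the screen; that choice, and the name out_file, are assumptions made so the example runs the same way in any session.

```r
out_file <- tempfile(fileext = ".pdf")   # temporary file, an added assumption
pdf(out_file)                            # open a PDF graphics device

plot(x = cars$speed, y = cars$dist,
     xlab = "Speed", ylab = "Stopping Distance",
     main = "My Plot", sub = "My Plot Subtitle",
     xlim = c(10, 15),                   # show only speeds from 10 to 15
     pch = 2)                            # draw the points as triangles

invisible(dev.off())                     # close the device and write the file
```

Leaving out the pdf() and dev.off() lines draws the same plot in the active graphics window.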


Arguments like "col" and "pch" may not seem very intuitive. And that is because they aren't! So, many/most people use more modern packages, like ggplot2, for creating their graphics in R.

boxplot(), like many R functions, also takes a "formula" argument, generally an expression with a tilde ("~") which indicates the relationship between the input variables. This allows you to enter something like mpg ~ cyl to plot the relationship between cyl (number of cylinders) on the x-axis and mpg (miles per gallon) on the y-axis.

boxplot(formula = mpg ~ cyl, data = mtcars)
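
boxplot() also returns the numbers behind the picture. The sketch below uses plot = FALSE to skip the drawing and look only at those numbers; the name stats is illustrative.

```r
# Same formula as above, but return the statistics instead of drawing.
stats <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

stats$names   # "4" "6" "8": one box per cylinder count
stats$n       # 11 7 14: number of cars in each group
stats$stats   # one column per group: whisker ends, hinges and median
```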
