Important R Functions
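file.info("mytest.R"): returns information about the file, such as its size,
modification time, and permissions
file.rename("mytest.R", "mytest2.R"): renames a file; both the old and the new name
are required (the new name "mytest2.R" is only an example)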
Sequencing in R
dim(my_vector) <- c(4, 5): turns a vector of length 20 into a matrix with four rows
and five columns
my_matrix2 <- matrix(data = 1:20, nrow = 4, ncol = 5): creates a matrix with 4 rows
and 5 columns directly
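A small sketch of the two approaches above, to check that they give the same matrix. The name my_vector follows the notes; its contents (1:20) are an assumption.

```r
# A vector of length 20 (contents assumed for illustration)
my_vector <- 1:20

# Assigning a dim attribute reshapes the vector into a 4 x 5 matrix
dim(my_vector) <- c(4, 5)

# Building the same matrix directly with matrix()
my_matrix2 <- matrix(data = 1:20, nrow = 4, ncol = 5)

# Both approaches fill the matrix column by column, so the results match
same_matrix <- identical(my_vector, my_matrix2)
print(same_matrix)
```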
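You can use the `&` operator to evaluate AND across a vector (element by element).
The `&&` version of AND is meant for single values: older versions of R evaluated
only the first member of a vector, while R 4.3.0 and later raise an error if either
side has length greater than one.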
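A minimal sketch of the difference between `&` and `&&`. The vectors x and y are made up for illustration, and `&&` is only applied to single values so that the code runs on any R version.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

# `&` compares the vectors element by element
elementwise <- x & y        # TRUE FALSE FALSE

# `&&` is used with single logical values
single <- x[1] && y[1]      # TRUE

print(elementwise)
print(single)
```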
isTRUE(): takes one argument. If that argument evaluates to TRUE, the function
will return TRUE
identical(): will return TRUE if the two R objects passed to it as arguments are
identical
xor(5 == 6, !FALSE): The xor() function stands for exclusive OR. If one argument
evaluates to TRUE and one argument evaluates to FALSE, then this function will
return TRUE
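The three functions above can be tried out with a few simple values. The variable names are only for illustration.

```r
# isTRUE(): TRUE only when the argument evaluates to a single TRUE
is_true <- isTRUE(6 > 4)                      # TRUE

# identical(): TRUE when both objects are exactly the same
same <- identical(c(1, 2, 3), c(1, 2, 3))     # TRUE
not_same <- identical(c(1, 2, 3), c(1, 2))    # FALSE

# xor(): TRUE when exactly one of the two arguments is TRUE
one_true <- xor(5 == 6, !FALSE)               # FALSE xor TRUE -> TRUE
both_true <- xor(TRUE, TRUE)                  # FALSE
```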
ints <- sample(10) : The vector `ints` is a random sampling of integers from 1 to
10 without replacement.
which(ints > 7): returns the indices (positions) of the elements of ints that are
greater than 7; use ints[ints > 7] to get the values themselves
The any() function will return TRUE if one or more of the elements in the logical
vector is TRUE. The all() function will return TRUE if every element in the logical
vector is TRUE.
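A short sketch of sample(), which(), any() and all() working together. The seed is an assumption, added only so that the run is repeatable.

```r
set.seed(42)                 # assumed seed, for a repeatable example
ints <- sample(10)           # the integers 1 to 10 in random order

idx <- which(ints > 7)       # positions of the elements greater than 7
vals <- ints[idx]            # the values at those positions (8, 9, 10 in some order)

any_negative <- any(ints < 0)    # FALSE: no element is below 0
all_positive <- all(ints > 0)    # TRUE: every element is above 0

print(idx)
print(vals)
```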
Functions
paste(..., sep = " ", collapse = NULL): the first argument is `...`, which is
referred to as an ellipsis or simply dot-dot-dot. The ellipsis allows an indefinite
number of arguments to be passed into a function.
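A few example calls to paste(), plus a tiny function that uses the ellipsis itself. The function count_args is hypothetical and exists only to show how `...` can be captured.

```r
# Any number of arguments can go into `...`; sep goes between them
greeting <- paste("Hello", "world")                  # "Hello world"
labels <- paste("a", 1:3, sep = "_")                 # "a_1" "a_2" "a_3"

# collapse joins the elements of the result into a single string
joined <- paste(c("x", "y", "z"), collapse = "+")    # "x+y+z"

# Hypothetical helper: list(...) captures whatever was passed in
count_args <- function(...) length(list(...))
n_args <- count_args(1, "two", TRUE)                 # 3
```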
Each of the *apply functions will SPLIT up some data into smaller pieces, APPLY a
function to each piece, then COMBINE the results. A more detailed discussion of this
strategy is found in Hadley Wickham's Journal of Statistical Software paper titled
'The Split-Apply-Combine Strategy for Data Analysis'.
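The three steps can also be written out by hand with base R functions. This is only a sketch of the idea, with made-up data; the *apply functions do these steps for you.

```r
values <- c(1, 2, 3, 4, 5, 6)
groups <- c("x", "y", "x", "y", "x", "y")

pieces <- split(values, groups)   # SPLIT: a list with one vector per group
sums <- lapply(pieces, sum)       # APPLY: sum each piece
combined <- unlist(sums)          # COMBINE: back into one named vector

print(combined)                   # x = 9, y = 12
```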
range() function returns the minimum and maximum of its first argument, which
should be a numeric vector.
When given a vector, the unique() function returns a vector with all duplicate
elements removed. In other words, unique() returns a vector of only the 'unique'
elements.
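A quick example of range() and unique() on a made-up numeric vector:

```r
x <- c(3, 7, 3, 1, 7, 9)

rng <- range(x)     # minimum and maximum: 1 9
uniq <- unique(x)   # duplicates dropped, first occurrences kept: 3 7 1 9

print(rng)
print(uniq)
```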
lapply() and sapply(): both take a list as input, apply a function to each element
of the list, then combine and return the result. lapply() always returns a list,
whereas sapply() attempts to simplify the result.
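A small comparison of lapply() and sapply() on the same input. The list scores is made up for illustration.

```r
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

as_list <- lapply(scores, mean)   # always a list: $a 2, $b 15, $c 5
as_vec <- sapply(scores, mean)    # simplified to a named numeric vector

print(class(as_list))             # "list"
print(as_vec)
```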
You'll often wish to split your data up into groups based on the value of some
variable, then apply a function to the members of each group. The tapply() function
does exactly that.
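A minimal tapply() example with made-up data: the values in weight are grouped by the labels in group, and the mean is taken within each group.

```r
weight <- c(10, 12, 20, 22, 30)
group <- c("a", "a", "b", "b", "c")

# Mean of weight within each level of group
group_means <- tapply(weight, group, mean)
print(group_means)   # a = 11, b = 21, c = 30
```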
Looking at DATA
It's very common for data to be stored in a data frame. It is the default class for
data read into R using functions like read.csv() and read.table().
To see how much space the dataset is occupying in memory, you can use
object.size(plants).
head(plants, 10) will show you the first 10 rows of the dataset.
summary() provides different output for each variable, depending on its class. For
numeric data such as Precip_Min, summary() displays the minimum, 1st quartile,
median, mean, 3rd quartile, and maximum. These values help us understand how the
data are distributed.
For categorical variables (called 'factor' variables in R), summary() displays the
number of times each value (or 'level') occurs in the data. For example, each value
of Scientific_Name only appears once, since it is unique to a specific plant. In
contrast, the summary for Duration (also a factor variable) tells us that our
dataset contains 3031 Perennial plants, 682 Annual plants, etc.
The beauty of str() is that it combines many of the features of the other functions
you've already seen, all in a concise and readable format. At the very top, it tells
us that the class of plants is 'data.frame' and that it has 5166 observations and 10
variables. It then gives us the name and class of each variable, as well as a
preview of its contents.
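The plants dataset is not part of these notes, so the sketch below uses the built-in iris data frame as a stand-in (an assumption made only so the code can run). The same functions apply to any data frame.

```r
dim(iris)                      # 150 rows and 5 columns
object.size(iris)              # memory used by the data frame
first_rows <- head(iris, 10)   # the first 10 rows
print(first_rows)

summary(iris)   # quartiles for numeric columns, counts for the factor Species
str(iris)       # class, dimensions, and a preview of every column
```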
Simulation
sample(1:6, 4, replace = TRUE): sample() takes a sample of the specified size from
the elements of x, either with or without replacement. This call instructs R to
randomly select four numbers between 1 and 6, WITH replacement. Sampling with
replacement simply means that each number is "replaced" after it is selected, so
that the same number can show up more than once.
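A short sketch contrasting sampling with and without replacement. The seed is an assumption, used only to make the run repeatable.

```r
set.seed(123)   # assumed seed, for a repeatable example

# Four rolls of a die: values may repeat because replace = TRUE
rolls <- sample(1:6, 4, replace = TRUE)

# Without replacement (the default), each value appears at most once,
# so sampling all six values gives a random reordering of 1:6
shuffled <- sample(1:6)

print(rolls)
print(shuffled)
```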
Simulating an unfair coin: let the value 0 represent tails and the value 1 represent
heads. Use sample() to draw a sample of size 100 from the vector c(0,1), with
replacement. Since the coin is unfair, we must attach specific probabilities to the
values 0 (tails) and 1 (heads) with a fourth argument, prob = c(0.3, 0.7). Assign
the result to a new variable called flips:
flips <- sample(c(0, 1), 100, replace = TRUE, prob = c(0.3, 0.7))
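rbinom(1, size = 100, prob = 0.7): simulates 100 flips of the same unfair coin as a
single binomial draw and returns the number of heads. Note that you only specify the
probability of 'success' (heads) and NOT the probability of 'failure' (tails).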
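The two ways of simulating the unfair coin can be put side by side. The seed and the names heads, heads_binom and flips2 are assumptions for illustration.

```r
set.seed(1)   # assumed seed, for a repeatable example

# 100 individual flips: 0 = tails (prob 0.3), 1 = heads (prob 0.7)
flips <- sample(c(0, 1), 100, replace = TRUE, prob = c(0.3, 0.7))
heads <- sum(flips)          # number of heads, since heads are coded as 1

# One binomial draw: the number of heads in 100 flips
heads_binom <- rbinom(1, size = 100, prob = 0.7)

# 100 binomial draws of size 1: individual flips again
flips2 <- rbinom(100, size = 1, prob = 0.7)

print(heads)
print(heads_binom)
```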
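rnorm(n, mean, sd): generates n random numbers from a normal distribution; the
defaults are mean = 0 and sd = 1
rpois(n, lambda): generates n random numbers from a Poisson distribution with mean
lambda
rpois(5, 10): five random values from a Poisson distribution with mean 10
replicate(100, rpois(5, 10)): performs this operation 100 times and returns the
results as a matrix with one column per repetition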
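A sketch of the random-number functions above. The seed, the mean of 100, the sd of 25, and the variable names are assumptions for illustration.

```r
set.seed(2024)   # assumed seed, for a repeatable example

z <- rnorm(10)                        # standard normal: mean 0, sd 1
w <- rnorm(10, mean = 100, sd = 25)   # normal with a chosen mean and sd

p <- rpois(5, 10)                     # five Poisson values with mean 10

# Repeat the Poisson draw 100 times: a matrix with 5 rows and 100 columns
my_pois <- replicate(100, rpois(5, 10))
col_means <- colMeans(my_pois)        # the mean of each repetition

print(dim(my_pois))
```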
Dates and Times
Dates are represented by the 'Date' class and times are represented by the 'POSIXct'
and 'POSIXlt' classes. Internally, dates are stored as the number of days since
1970-01-01 and times are stored as either the number of seconds since 1970-01-01
(for 'POSIXct') or a list of seconds, minutes, hours, etc. (for 'POSIXlt').
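For a date before 1970-01-01 the stored number is negative. For example, with
d2 <- as.Date("1969-01-01"):
unclass(d2)
[1] -365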
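The day count behind a Date object can be inspected with unclass(). The names d1, d_epoch and d_later are only for illustration.

```r
d1 <- Sys.Date()            # today's date
print(class(d1))            # "Date"
print(unclass(d1))          # days since 1970-01-01

d_epoch <- as.Date("1970-01-01")
d_later <- as.Date("1970-01-10")

days_epoch <- unclass(d_epoch)   # 0
days_later <- unclass(d_later)   # 9
```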
You can access the current date and time using Sys.time().
By default, Sys.time() returns an object of class POSIXct, but we can coerce the
result to POSIXlt with as.POSIXlt(Sys.time())
t1 <- Sys.time()
t2 <- as.POSIXlt(Sys.time())
t1 and t2 print in the same format, but unclass() gives different results: a single
number of seconds for t1 and a list of components (seconds, minutes, hours, etc.)
for t2
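A sketch that makes the difference between the two time classes visible. Here t2 is built from t1, rather than from a second call to Sys.time(), so that both hold the same instant.

```r
t1 <- Sys.time()
t2 <- as.POSIXlt(t1)

print(class(t1))      # "POSIXct" "POSIXt"
print(class(t2))      # "POSIXlt" "POSIXt"

secs <- unclass(t1)   # one number: seconds since 1970-01-01
parts <- unclass(t2)  # a list: sec, min, hour, mday, mon, year, ...
print(names(parts))

# Individual components of a POSIXlt object can be read with $
print(t2$min)
```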
The weekdays() function will return the day of week from any date or time object,
for example d1, a Date object that contains today's date.
The quarters() function returns the quarter of the year (Q1-Q4) from any date or
time object.
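A quick example with a fixed date, chosen here only for illustration. The output of weekdays() and months() depends on the locale, so the names shown in the comments assume an English locale.

```r
d <- as.Date("1986-10-17")   # assumed example date

day_name <- weekdays(d)      # "Friday" in an English locale
quarter <- quarters(d)       # "Q4"
month_name <- months(d)      # "October" in an English locale

print(day_name)
print(quarter)
```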
Often, the dates and times in a dataset will be in a format that R does not
recognize. The strptime() function can be helpful in this situation: it converts a
character string to POSIXlt using a format that you describe.
t3 <- "October 17, 1986 08:24"
t4 <- strptime(t3, "%B %d, %Y %H:%M")
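A sketch of strptime() with the format codes spelled out. The second string (t5) and the use of tz = "UTC" are assumptions added to show a format that does not depend on the locale.

```r
t3 <- "October 17, 1986 08:24"

# %B = full month name, %d = day of month, %Y = four-digit year,
# %H:%M = hours and minutes.
# %B depends on the locale; English month names are assumed here.
t4 <- strptime(t3, "%B %d, %Y %H:%M")
print(class(t4))     # "POSIXlt" "POSIXt"

# A purely numeric format does not depend on the locale
t5 <- strptime("1986-10-17 08:24", "%Y-%m-%d %H:%M", tz = "UTC")
print(t5$hour)       # 8
print(t5$min)        # 24
```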
Finally, there are a number of operations that you can perform on dates and times,
including arithmetic operations (+ and -) and comparisons (<, ==, etc.)
Use difftime(Sys.time(), t1, units = 'days') to find the amount of time in DAYS that
has passed since you created t1.
> difftime(Sys.time(), t1, units = 'days')
Time difference of 0.01397037 days
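The arithmetic, comparison and difftime() operations above can be tried on two fixed times. The times and the names start and end are made up so that the results are predictable.

```r
start <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
end <- as.POSIXct("2024-01-03 12:00:00", tz = "UTC")

is_later <- end > start                            # comparison: TRUE
one_hour_on <- start + 3600                        # adding seconds to a time

in_days <- difftime(end, start, units = "days")    # 2.5 days
in_hours <- difftime(end, start, units = "hours")  # 60 hours

print(end - start)   # subtraction also returns a time difference
print(in_days)
print(in_hours)
```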
Basic Graphs
plot(cars)
As always, R tries very hard to give you something sensible given the information
that you have provided to it. First, R notes that the data frame you have given it
has just two columns, so it assumes that you want to plot one column versus the
other. Second, since we do not provide labels for either axis, R uses the names of
the columns. Third, it creates axis tick marks at nice round numbers and labels them
accordingly. Fourth, it uses the other defaults supplied in plot().
Note that calling plot() on the columns directly, for example
plot(x = cars$speed, y = cars$dist), produces a slightly different result than
plot(cars). In this case, R is not sure what you want to use as the labels on the
axes, so it just uses the arguments which you pass in, data frame name and dollar
signs included.
Recreate the plot with the label of the y-axis set to "Stopping Distance" (the ylab
argument).
Plot cars with a main title of "My Plot". Note that the argument for the main title
is "main" not "title".
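The plotting calls discussed in this section could look like the sketch below. The x-axis label "Speed" is an assumption; the other labels come from the notes.

```r
# Default plot: R picks the two columns and uses their names as axis labels
plot(cars)

# Passing the columns explicitly: the axis labels become cars$speed and cars$dist
plot(x = cars$speed, y = cars$dist)

# Setting the axis labels by hand with xlab and ylab
plot(x = cars$speed, y = cars$dist, xlab = "Speed", ylab = "Stopping Distance")

# Setting the main title with main (not title)
plot(cars, main = "My Plot")
```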
Arguments like "col" and "pch" may not seem very intuitive. And that is because they
aren't! So, many/most people use more modern packages, like ggplot2, for creating
their graphics in R.
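For reference, a small sketch of what "col" and "pch" do in base graphics. The particular values are chosen only for illustration.

```r
# col sets the colour of the points; a number picks from the current palette
plot(cars, col = 2)

# pch sets the plotting symbol; 2 draws triangles
plot(cars, pch = 2)

# Colours can also be given by name, and arguments can be combined
plot(cars, col = "blue", pch = 19, main = "My Plot")
```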