R Complete Notes
UNIT - 1
Introduction To R Programming
1. What is R Programming?
2. List Features of R?
You can start a variable name with a letter or a period, but not with digits.
If a variable name starts with a period, the next character cannot be a digit.
R is case sensitive. This means that age and Age are treated as different
variables.
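The naming rules above can be tried out directly. This is a minimal sketch; the variable names are made up for illustration.

```r
# Valid names: start with a letter, or with a period not followed by a digit.
total_2 <- 10
.hidden <- 5

# R is case sensitive, so age and Age are two separate variables.
age <- 30
Age <- 31
print(age == Age)          # FALSE

# make.names() shows how R repairs an invalid name such as "2nd".
print(make.names("2nd"))   # "X2nd"
```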
1. Boolean Variables - A Boolean variable stores a logical value, which is either TRUE or FALSE
1. First way -
> x <- 6 # assignment operator: a less-than character (<) and a hyphen (-) with no space between them
2. Second way -
3. Third way -
4. Fourth way -
> 5 -> fun #A rightward assignment operator (->) can be used anywhere
5. Fifth way -
6. Sixth way -
1. logical -
The logical data type in R is also known as boolean data type. It can only
have two values: TRUE and FALSE.
2. numeric -
In R, the numeric data type represents all real numbers with or without
decimal values.
3. integer -
The integer data type represents whole numbers, that is, values without a
decimal part. We use the suffix L to specify integer data (for example, 5L).
4. complex -
6. raw -
9. What is coercion?
The process of altering the data type of an object to another type is referred to as
coercion or data type conversion.
Types of Coercion -
Syntax:-
as.data_type(object)
Example:-
print(as.numeric(TRUE))
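A few more conversions, as a minimal sketch. The first group uses the as.* functions (explicit coercion); the c() call shows R converting values on its own (implicit coercion). The variable names mixed and bad are made up for illustration.

```r
# Explicit coercion with the as.* functions.
print(as.numeric(TRUE))     # 1
print(as.integer("42"))     # 42
print(as.character(3.5))    # "3.5"
print(as.logical(0))        # FALSE

# Implicit coercion: c() converts mixed elements to the most general type.
mixed <- c(1, "a", TRUE)
print(class(mixed))         # "character"

# A string that is not a number becomes NA (R also issues a warning).
bad <- suppressWarnings(as.numeric("abc"))
print(is.na(bad))           # TRUE
```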
Arithmetic Operators
Logical Operators
Relational Operators
Assignment Operators
Miscellaneous Operator
1. Vector -
2. List -
Lists are R objects that can contain elements of different types, such as
numbers, strings, vectors, and even other lists.
3. Array -
An array is a data structure that stores data of a single type in one or
more dimensions. It is typically used when more than two dimensions are needed.
Syntax -
Here,
4. Matrices -
Example -
# create a 2 by 3 matrix (the values 1 to 6 are illustrative)
matrix1 <- matrix(1:6, nrow = 2, ncol = 3)
print(matrix1)
5. Data Frame -
Data frames are tabular data objects. A data frame has rows and columns,
analogous to an Excel spreadsheet. Unlike a matrix, each column of a data
frame can contain a different mode of data.
6. Factors-
Factors are data structures used to categorize data and store it as levels.
The values of a factor can be given as integers or as character strings;
internally R stores them as integer codes with a label for each level.
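The six data structures above can be created as follows. This is a minimal sketch; all values and variable names are made up for illustration.

```r
v  <- c(10, 20, 30)                       # vector: elements of one type
l  <- list(1, "two", c(3, 4), list(5))    # list: elements of mixed types
a  <- array(1:24, dim = c(2, 3, 4))       # array: here three dimensions
m  <- matrix(1:6, nrow = 2, ncol = 3)     # matrix: two dimensions
df <- data.frame(name = c("Asha", "Ravi"),
                 age  = c(21, 25),
                 stringsAsFactors = FALSE) # data frame: columns of different modes
f  <- factor(c("low", "high", "low"))     # factor: categorical data

print(dim(a))          # 2 3 4
print(dim(m))          # 2 3
print(sapply(df, class))
print(levels(f))       # "high" "low"
print(as.integer(f))   # 2 1 2 (the internal integer codes)
```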
Class System in R -
1. S3 Class -
S3 classes implement generic-function object orientation: a generic
function looks at the class of its argument and calls the matching method.
2. S4 Class -
3. Reference Class -
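As an illustration of the S3 system described above, here is a minimal sketch. The class name student and the generic describe() are hypothetical names chosen for this example.

```r
# Create an object and attach an S3 class to it.
s <- list(name = "Asha", marks = 82)
class(s) <- "student"

# A method for the built-in generic print(), used for "student" objects.
print.student <- function(x, ...) {
  cat("Student:", x$name, "- marks:", x$marks, "\n")
  invisible(x)
}
print(s)

# A user-defined generic with a class-specific and a default method.
describe <- function(obj, ...) UseMethod("describe")
describe.student <- function(obj, ...) paste(obj$name, "scored", obj$marks)
describe.default <- function(obj, ...) "unknown object"

print(describe(s))   # "Asha scored 82"
print(describe(1))   # "unknown object"
```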
You can either treat each vector as a row (by using the command rbind) or treat
each vector as a column (using the command cbind).
rbind() -
> rbind(1:3,4:6)
Output:-
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
cbind() -
> cbind(c(1,4),c(2,5),c(3,6))
Output:-
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
2. main, xlab, ylab - Options to include the plot title, the horizontal axis
label, and the vertical axis label, respectively.
3. col - Color (or colors) to use for plotting points and lines.
4. pch - Stands for point character. This selects which character to use for
plotting individual points.
5. cex - Stands for character expansion. This controls the size of plotted
point characters.
6. lty - Stands for line type. This specifies the type of line to use to connect
the points (for example, solid, dotted, or dashed).
7. lwd - Stands for line width. This controls the thickness of plotted lines.
8. xlim, ylim - This provides limits for the horizontal range and vertical
range (respectively) of the plotting region.
UNIT - 2
R Programming Structures
1. For loop -
A for loop is used to iterate over the elements of a vector, list, or
other sequence.
Syntax -
for (value in sequence) {
# block of code
}
2. While loop -
while loops are used when you don't know the exact number of times a
block of code is to be repeated.
Syntax -
while ( condition ) {
statement
}
3. Repeat loop -
We use the R repeat loop to execute a code block multiple times. However,
the repeat loop has no condition of its own to terminate. You need to put
an exit condition explicitly, with a break statement inside the loop.
Syntax -
repeat {
# statements
if(stop_condition) {
break
}
}
1. Break -
A break statement is used inside a loop (repeat, for, while) to stop the iterations
and transfer control to the first statement after the loop.
2. Next -
A next statement is useful when we want to skip the current iteration of a loop
without terminating it. On encountering next, R skips the rest of the current
iteration and starts the next iteration of the loop.
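The three loops and the two control statements can be seen together in the following minimal sketch. All values and variable names are made up for illustration.

```r
# for: iterate over the elements of a vector.
total <- 0
for (value in c(1, 2, 3, 4)) {
  total <- total + value
}
print(total)   # 10

# while: repeat as long as the condition is TRUE.
n <- 1
while (n < 100) {
  n <- n * 2
}
print(n)       # 128

# repeat: runs until break is reached.
count <- 0
repeat {
  count <- count + 1
  if (count >= 3) {
    break
  }
}
print(count)   # 3

# next: skip the even numbers and keep the odd ones.
odds <- c()
for (i in 1:6) {
  if (i %% 2 == 0) next
  odds <- c(odds, i)
}
print(odds)    # 1 3 5
```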
R does not directly support iteration over nonvector sets, but there are a couple of
indirect yet easy ways to accomplish it.
Apply() Family -
1. apply() -
This function is the most basic form of implicit looping—it takes a function
and applies it to each margin of an array.
Syntax -
apply(x,MARGIN,FUN,…)
2. lapply() -
The lapply() function is used to apply a function to each element of the list.
It collects the returned values into a list, and then returns that list.
Syntax-
lapply(x,FUN,…)
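A short illustration of apply() and lapply(), as a minimal sketch. The matrix m and the other variable names are made up for illustration. In apply(), a margin of 1 means rows and 2 means columns.

```r
m <- matrix(1:6, nrow = 2)   # row 1 is 1 3 5, row 2 is 2 4 6

# apply(): call a function on each row (margin 1) or column (margin 2).
row_sums <- apply(m, 1, sum)
col_max  <- apply(m, 2, max)
print(row_sums)   # 9 12
print(col_max)    # 2 4 6

# lapply(): call a function on each element of a list; the result is a list.
squares <- lapply(list(1, 2, 3), function(v) v^2)
print(squares)
print(unlist(squares))   # 1 4 9
```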
4. Explain operators in R?
1. Arithmetic operators -
Operators - + , - , * , / , %% , %/% , ^
2. Relational operators -
3. Logical operators -
4. Assignment operators -
1. Left assignment operator -
<- or = or <<-
2. Right assignment operator -
-> or ->>
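The operators listed above in action, as a minimal sketch. The names a, b, c_val, counter and increment are made up for illustration.

```r
# Arithmetic operators.
print(7 %% 3)     # 1 (remainder)
print(7 %/% 3)    # 2 (integer division)
print(2 ^ 3)      # 8 (exponent)

# Relational and logical operators.
print(5 > 3)          # TRUE
print(TRUE & FALSE)   # FALSE

# Left and right assignment.
a <- 1
b = 2
3 -> c_val
print(a + b + c_val)  # 6

# <<- assigns to a variable outside the function's own environment.
counter <- 0
increment <- function() {
  counter <<- counter + 1
}
increment()
print(counter)    # 1
```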
Syntax -
#statements
Arguments to functions are evaluated lazily, which means they are evaluated
only when needed by the function body.
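Lazy evaluation can be observed with an argument that would fail if it were evaluated. This is a minimal sketch; the function f is made up for illustration.

```r
f <- function(a, b) {
  if (a > 0) {
    return("b was never evaluated")
  }
  b   # b is evaluated only if this line is reached
}

# The second argument would raise an error, but it is never needed.
result <- f(1, stop("this error is never raised"))
print(result)   # "b was never evaluated"
```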
Functions are generally used for computing some value, so they need a
mechanism to supply that value back to the caller. This is called returning.
R functions are first-class objects (of the class "function"), meaning that they can
be used for the most part just like other objects.
g <- function(x) {
return(x+1)
}
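Because functions are objects, they can be assigned, passed as arguments, and stored in lists. This is a minimal sketch built on the function g above; apply_twice and ops are hypothetical names used only for this example.

```r
g <- function(x) {
  return(x + 1)
}

print(class(g))   # "function"

# Assign the function to another name.
h <- g
print(h(2))       # 3

# Pass the function as an argument to another function.
apply_twice <- function(f, x) f(f(x))
print(apply_twice(g, 5))   # 7

# Store functions in a list.
ops <- list(inc = g, dbl = function(x) 2 * x)
print(ops$dbl(4))          # 8
```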
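For Example -
In Python, sort() changes the list itself:
>>> x = [13,5,12]
>>> x.sort()
>>> x
[5, 12, 13]
In R, sort() returns a sorted copy and leaves x unchanged:
> x <- c(13,5,12)
> sort(x)
[1] 5 12 13
> x
[1] 13 5 12
To change x, reassign the result:
> x <- sort(x)
> x
[1] 5 12 13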
return(c(sv1,pivot,sv2))
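The return statement above looks like the last line of a recursive quicksort. The following is a minimal sketch of such a function, assuming the first element is used as the pivot. The function name qs and the helper variable therest are assumptions; only sv1, pivot and sv2 come from the line above.

```r
qs <- function(x) {
  # A vector of length 0 or 1 is already sorted.
  if (length(x) <= 1) return(x)
  pivot <- x[1]
  therest <- x[-1]
  # Split the remaining elements around the pivot.
  sv1 <- therest[therest < pivot]
  sv2 <- therest[therest >= pivot]
  # Sort each part recursively and combine the results.
  sv1 <- qs(sv1)
  sv2 <- qs(sv2)
  return(c(sv1, pivot, sv2))
}

print(qs(c(5, 4, 12, 13, 3, 8, 88)))   # 3 4 5 8 12 13 88
```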
UNIT - 3
Doing Math And Simulation In R
1. Cumulative Product -
Example -
> x <- c(2,4,3)
> cumprod(x)
[1] 2 8 24
2. Cumulative Sum -
Example -
> x <- c(2,4,3)
> cumsum(x)
[1] 2 6 9
1. sort() - Ordinary numerical sorting of a vector is done with the sort() function.
2. order() - If you want the indices of the sorted values in the original vector, use
order() function.
3. rank() - This function specifies the rank of every single element present in a
vector.
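The difference between the three functions is easiest to see on one vector. This is a minimal sketch; the values are made up for illustration.

```r
x <- c(13, 5, 12)

print(sort(x))    # 5 12 13  (the values in increasing order)
print(order(x))   # 2 3 1    (positions of the sorted values in x)
print(rank(x))    # 3 1 2    (rank of each element of x)

# order() can be used to sort: x[order(x)] is the same as sort(x).
print(x[order(x)])   # 5 12 13
```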
1. t() -
Syntax - t(x)
2. det() -
3. diag() -
4. sweep() -
5. qr() - QR decomposition
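The five matrix functions can be tried on a small matrix. This is a minimal sketch; the matrix m and the other variable names are made up for illustration.

```r
m <- matrix(c(1, 2, 3, 4), nrow = 2)   # row 1 is 1 3, row 2 is 2 4

print(t(m))      # transpose: row 1 is 1 2, row 2 is 3 4
print(det(m))    # determinant: 1*4 - 3*2 = -2
print(diag(m))   # diagonal elements: 1 4

# sweep(): here, subtract each column's mean from that column.
centered <- sweep(m, 2, colMeans(m))
print(centered)

# qr(): QR decomposition; Q %*% R reconstructs m.
decomp <- qr(m)
rebuilt <- qr.Q(decomp) %*% qr.R(decomp)
print(all.equal(rebuilt, m))   # TRUE
```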
1. union(x,y) -
The union of two sets A and B is the set of all the elements that are
members of A, of B, or of both. It is denoted by A∪B, read as A union B.
A ∪ B = {1,2,3,4,5,a,b} ∪ {a,b,c,d,e}
= {1,2,3,4,5,a,b,c,d,e}
2. intersect(x,y) -
The intersection of any two sets A and B is the set of all the elements
that belong to both A and B. It is denoted by A∩B, read as A
intersection B.
A ∩ B = {1,2,3,4,5,a,b} ∩ {a,b,c,d,e}
= {a,b}
3. setdiff(x,y) -
The set difference of any two sets A and B is the set of elements that
belong to A but not to B. It is denoted by A-B and read as A difference B.
A = {1,2,3,4,5,6} B = {3,5,7,9}
A-B = {1,2,4,6}
B-A = {7,9}
4. setequal(x,y) -
Test for equality between x and y. If both x and y are equal it returns TRUE
otherwise returns FALSE.
Syntax: setequal(x, y)
Example -
x1 <- c(1, 2, 3, 4, 5, 6)
x2 <- c(1:6)
x3 <- c(2, 3, 4, 5, 6)
setequal(x1, x2)
setequal(x1, x3)
5. c %in% y -
6. choose(n,k) -
This function computes the number of combinations (n choose k, also
written nCk) directly, so the formula does not have to be coded by hand.
Syntax: choose(n, k)
> c32
[1,] 1 1 2
[2,] 2 3 3
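The set functions of this unit can be checked against the worked examples above. This is a minimal sketch that reuses the sets A and B from the setdiff example. The call to combn() is an assumption about how a matrix like c32 could be produced.

```r
A <- c(1, 2, 3, 4, 5, 6)
B <- c(3, 5, 7, 9)

print(union(A, B))       # 1 2 3 4 5 6 7 9
print(intersect(A, B))   # 3 5
print(setdiff(A, B))     # 1 2 4 6
print(setdiff(B, A))     # 7 9
print(setequal(c(1, 2, 3), c(3, 2, 1)))   # TRUE (order does not matter)
print(3 %in% B)          # TRUE

# choose(n, k): number of ways to pick k items from n.
print(choose(3, 2))      # 3

# combn() lists the combinations themselves, one per column.
c32 <- combn(1:3, 2)
print(c32)
```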
UNIT - 4
Probability Distributions
1. Define probability?
Functions that generate random values always begin with ‘r’ (for example, rbinom and rnorm).
qbinom(p, size, prob) :- This function takes a probability value p and gives the
smallest number of successes whose cumulative probability is at least p.
1. x is a vector of numbers.
2. p is a vector of probabilities.
3. n is number of observations.
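The arguments x, p and n above belong to the four binomial functions dbinom(), pbinom(), qbinom() and rbinom(). This is a minimal sketch, assuming 10 trials with success probability 0.5; the numbers are made up for illustration.

```r
# dbinom(x, size, prob): probability of exactly x successes.
print(dbinom(3, size = 10, prob = 0.5))   # 0.1171875

# pbinom(x, size, prob): probability of at most x successes.
print(pbinom(3, size = 10, prob = 0.5))   # 0.171875

# qbinom(p, size, prob): smallest x whose cumulative probability is >= p.
print(qbinom(0.5, size = 10, prob = 0.5)) # 5

# rbinom(n, size, prob): n random values (set.seed makes them repeatable).
set.seed(1)
draws <- rbinom(5, size = 10, prob = 0.5)
print(draws)
```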
The normal distribution is a probability distribution used in statistics that
describes how data values are spread symmetrically around the mean.
Functions -
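R provides dnorm(), pnorm(), qnorm() and rnorm() for the normal distribution. This is a minimal sketch; the mean of 50 and standard deviation of 10 in the last call are made up for illustration.

```r
# dnorm(x): height of the density curve at x (standard normal by default).
print(dnorm(0))        # 0.3989423

# pnorm(q): probability of a value less than or equal to q.
print(pnorm(1.96))     # 0.9750021

# qnorm(p): the value below which a proportion p of the distribution lies.
print(qnorm(0.975))    # 1.959964

# rnorm(n, mean, sd): n random values from a normal distribution.
set.seed(1)
values <- rnorm(5, mean = 50, sd = 10)
print(values)
```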
A discrete variable, on the other hand, may take on only distinct numeric
values—and if the range is restricted, then the number of possible values is finite.
1. Univariate data -
2. Multivariate data -
When it’s necessary to consider data with respect to variables that exist in
more than one dimension, they are considered multivariate.
Syntax:
mean(x)
Syntax:
The mode is the value that has the highest number of occurrences in a set of data.
Unlike the mean and median, the mode can be found for both numeric and character data.
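The mean and median have built-in functions, but R has no built-in function for the statistical mode (the built-in mode() returns the storage mode of an object). This is a minimal sketch; get_mode is a hypothetical helper written for this example, and on a tie it returns the value that appears first.

```r
x <- c(2, 4, 4, 7, 9)

print(mean(x))     # 5.2
print(median(x))   # 4

# Hypothetical helper: the most frequent value of a vector.
get_mode <- function(v) {
  u <- unique(v)
  u[which.max(tabulate(match(v, u)))]
}

print(get_mode(x))                    # 4
print(get_mode(c("a", "b", "b")))     # "b"
```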
The covariance expresses how much two numeric variables “change together” and
the nature of that relationship, whether it is positive or negative.
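The sign of the covariance can be seen with cov(). This is a minimal sketch; the two vectors are made up for illustration.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)

# y increases with x, so the covariance is positive.
print(cov(x, y))    # 5

# -y decreases as x increases, so the covariance is negative.
print(cov(x, -y))   # -5

# cor() rescales the covariance to the range -1 to 1.
print(cor(x, y))    # 1
```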
The one-way ANOVA is used to test two or more means for equality. Those means
are split by a categorical group or factor variable.
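A one-way ANOVA can be run with aov(). This is a minimal sketch that uses the PlantGrowth data set shipped with R, in which the numeric variable weight is split by the factor group. The choice of this data set is an assumption made for illustration.

```r
# group is a factor with three levels; weight is numeric.
print(levels(PlantGrowth$group))

# Test whether the mean weight is equal across the groups.
fit <- aov(weight ~ group, data = PlantGrowth)
print(summary(fit))

# The p-value of the test is in the "Pr(>F)" column of the summary table.
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]
print(p_value)
```

A small p-value suggests that at least one group mean differs from the others.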
UNIT - 5
Simple Linear & Non-Linear Regression
1. What is regression?
Linear regression is used to predict the value of an outcome variable y on the basis
of one or more input predictor variables x.
y = ax + b
Where,
A straight line showing the relationship between the dependent and independent
variables is called a regression line.
If the dependent variable (on the Y-axis) increases as the independent variable
(on the X-axis) increases, then such a relationship is termed a positive linear
relationship.
This function creates the relationship model between the predictor and the
response variable.
Syntax -
lm(formula,data)
where,
Syntax -
predict(object, newdata)
where,
object is the model which is already fitted using the lm() function.
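The two functions work together as follows. This is a minimal sketch; the data are made up so that y = 2x + 1 exactly, and the names df and model are chosen for illustration.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(3, 5, 7, 9, 11)
df <- data.frame(x = x, y = y)

# Fit the model: y is the response, x is the predictor.
model <- lm(y ~ x, data = df)
print(coef(model))   # intercept 1, slope 2

# Predict y for a new value of x.
new_y <- predict(model, newdata = data.frame(x = 6))
print(new_y)         # 13
```

Here the intercept corresponds to b and the slope to a in y = ax + b.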
The typical base R commands such as plot, hist, boxplot, and so on will
automatically open a device for plotting and draw the desired plot, if nothing is
currently open.
13. Write an R program for the visual representation of an object by creating
graphs using graphic functions.
# graphic functions: plot(), hist(), pie(), boxplot(); line charts and scatterplots are drawn with plot().
x <- c(1, 2, 3, 4, 5)
y <- c(3, 5, 7, 2, 8)
dev.new()
# Create a plot
plot(x, y, type = "o", main = "Line Chart", xlab = "X-axis", ylab = "Y-axis", col = "blue")
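dev.new()
# Create a histogram (call reconstructed; arguments are illustrative)
hist(y, main = "Histogram", xlab = "Values", col = "lightblue")
dev.new()
# Create a pie chart (call reconstructed; arguments are illustrative)
pie(y, labels = x, main = "Pie Chart")
dev.new()
# Create a boxplot (call reconstructed; arguments are illustrative)
boxplot(y, main = "Boxplot", ylab = "Values", col = "orange")
dev.new()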
# Create a scatterplot
plot(x, y, main = "Scatterplot", xlab = "X-axis", ylab = "Y-axis", col = "green", pch = 19)