R-Programming Notes
Data analytics
Data analytics is the collection, transformation, and organization of data in order to draw
conclusions, make predictions, and drive informed decision making.
Data analytics: Key concepts
• Descriptive analytics tell us what happened.
• Diagnostic analytics tell us why something happened.
• Predictive analytics tell us what will likely happen in the future.
• Prescriptive analytics tell us how to act.
Tools used for Data Analytics
• R Programming
• Tableau
• Python
• SAS
• Apache Spark
• Excel
Introduction
• R is a powerful programming language widely used in the data science world.
• R analytics is data analytics using the R programming language, an open-source language used
for statistical computing and graphics.
History of R Programming
1. Founders: Ross Ihaka and Robert Gentleman initiated the development of R at the University
of Auckland, New Zealand, in the early 1990s.
2. Development Begins (Early 1990s): Ross Ihaka and Robert Gentleman started working on R
with the aim of creating a language and environment for statistical computing and graphics.
3. Initial Release (1995): The first version of R was publicly released in 1995 under the GNU
General Public License.
4. Growing Popularity (Late 1990s-2000s): R gained attention and popularity among statisticians,
researchers, and academics due to its robust statistical analysis capabilities, data manipulation
functions, and graphical tools.
5. Community Contributions: The open-source nature of R encouraged collaboration and
contributions from a growing community of statisticians and developers. This led to the creation
of numerous packages, expanding R's functionalities.
6. Expansion into Various Industries (2000s-2010s): R found extensive use in data science,
bioinformatics, finance, and other industries due to its ability to handle large datasets, perform
complex analyses, and create compelling visualizations.
7. Continued Evolution: The R language continues to evolve with regular updates, new packages,
and improvements, overseen by the R Core Team, ensuring its relevance and usefulness in
statistical computing and data analysis.
Features of R:
• R is an interpreted language, which means code can be run interactively, one statement at a time.
• R is free and open-source statistical software. Copyright for the primary source code of R
is held by the R Foundation, and it is published under the GNU General Public License.
• Today R runs on almost any standard computing platform and operating system.
• R has state-of-the-art graphics capabilities and is a strong choice for data visualization tasks.
• By 2017, CRAN (the Comprehensive R Archive Network) had more than 10,000 packages with
tens of thousands of functions.
• Community support is extensive, with numerous forums where you can get help.
Limitations of R Language
• As R is an interpreted language, R is slow as compared to C, C++ and other compiled
languages.
• Another major challenge data scientists face while using R is running out of memory, because R
generally holds the data it works on in RAM.
• Packages are not independently reviewed for quality before being published, which is why the
quality of some packages is less than perfect.
Basic Syntax in R Programming
Variables in R:
In R programming, a variable is a named container that holds data or values, allowing you to
store, manipulate, and retrieve information within a program.
• In R, the assignment can be denoted in three ways:
= (Simple Assignment)
<- (Leftward Assignment)
-> (Rightward Assignment)
Example
a=10
b<-a+10
print(c(a,b))
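The three assignment forms listed above can be tried side by side. This is a minimal sketch; the variable names `p`, `q`, and `r` are illustrative only.

```r
# Three equivalent ways to bind a value to a name
p = 10        # simple assignment
q <- p + 10   # leftward assignment
p + q -> r    # rightward assignment: the value on the left is stored in r

print(c(p, q, r))  # [1] 10 20 30
```

Most R code uses `<-`; the rightward form is rarely seen but behaves the same way.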
Vector:
In R programming, a vector is a fundamental data structure that represents a sequence of
elements of the same data type. It can hold numeric values, character strings, logical values, etc.
Vectors allow for the efficient storage and manipulation of data in a one-dimensional array-like
structure.
Syntax for Creating Vectors:
Using `c()` function:
vector_name <- c(element1, element2, ..., elementN)
Explanation of Terms:
- `vector_name`: The name given to the vector being created. It serves as the identifier for
accessing and manipulating the vector's elements.
- `c()` function: The `c()` function stands for "combine" and is used to concatenate elements
together into a vector. It creates a vector by combining individual elements.
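- `element1, element2, ..., elementN`: These are the individual elements that constitute the
vector, separated by commas within the `c()` function. If elements of different data types are
supplied, R converts them all to a single common type (for example, mixing numbers and strings
yields a character vector).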
Examples:
Numeric Vector Example:
# Creating a numeric vector
numbers <- c(1, 2, 3, 4, 5)
- `numbers` is the name assigned to the numeric vector.
- `c(1, 2, 3, 4, 5)` combines these five numeric elements into the vector named `numbers`.
Character Vector Example:
# Creating a character vector
names <- c("Alice", "Bob", "Charlie", "David")
- `names` is the name assigned to the character vector.
- `c("Alice", "Bob", "Charlie", "David")` combines these four character elements into the vector
named `names`.
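Because a vector holds a single data type, `c()` converts mixed inputs to one common type. The following sketch illustrates this; the names `mixed` and `num_lgl` are made up for the example.

```r
# Mixing numbers, strings, and logicals: everything becomes character
mixed <- c(1, "two", TRUE)
print(mixed)          # [1] "1"    "two"  "TRUE"
print(class(mixed))   # [1] "character"

# Mixing numbers and logicals: TRUE becomes 1 and FALSE becomes 0
num_lgl <- c(1.5, TRUE, FALSE)
print(num_lgl)        # [1] 1.5 1.0 0.0
```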
Creating vectors using sequence-generating functions:
Using `:` Operator:
The `:` operator generates a sequence of numbers from a starting value to an ending value,
incrementing by 1.
Syntax:
sequence_vector <- start_value:end_value
Example:
# Creating a sequence using the ':' operator
sequence1 <- 1:10 # Generates a sequence from 1 to 10
- The `:` operator is a way to create sequences of consecutive integers in R.
- It generates a sequence starting from the `start_value` to the `end_value`, inclusive,
incrementing by 1 each time.
- This method is convenient for generating simple integer sequences.
Using `seq()` Function:
The `seq()` function allows more control over sequence generation by specifying the starting
point, ending point, and the increment value.
Syntax:
sequence_vector <- seq(from = start_value, to = end_value, by = increment)
Example:
# Creating a sequence using the 'seq()' function
sequence2 <- seq(1, 20, by = 2) # Generates 1, 3, 5, ..., 19 (step of 2; stops at the last value not exceeding 20)
- The `seq()` function is versatile and allows for more flexibility in generating sequences.
- It generates a sequence starting from `start_value` to `end_value` with a specified increment
(`by`) value.
- The `from`, `to`, and `by` arguments help create sequences that aren't limited to consecutive
integers and allow for sequences of different lengths and steps.
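A few more sequences may help show how `from`, `to`, and `by` interact. This is a small sketch with illustrative variable names (`evens`, `countdown`, `quarters`).

```r
# Even numbers from 2 to 10
evens <- seq(from = 2, to = 10, by = 2)
print(evens)        # [1]  2  4  6  8 10

# A negative step counts downwards
countdown <- seq(10, 1, by = -3)
print(countdown)    # [1] 10  7  4  1

# The step does not have to be a whole number
quarters <- seq(0, 1, by = 0.25)
print(quarters)     # [1] 0.00 0.25 0.50 0.75 1.00

# The ':' operator also counts down when the start exceeds the end
print(5:1)          # [1] 5 4 3 2 1
```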
Vector indexing:
In R, you can extract elements from vectors using indexing and slicing methods. Indexing allows
you to access specific elements of a vector. Here are the methods for extracting elements from
vectors:
Indexing Elements:
Single Element:
You can extract a single element from a vector by specifying its index within square brackets `[ ]`.
Indexing in R starts from 1.
Syntax:
vector_name[index]
Example:
# Extracting the third element from a numeric vector
numbers <- c(10, 20, 30, 40, 50)
element <- numbers[3] # Extracts the third element (30)
Multiple Elements:
To extract multiple elements, you can provide a vector of indices within the square brackets.
Syntax:
vector_name[c(index1, index2, ..., indexN)]
Example:
# Extracting multiple elements from a character vector
names <- c("Alice", "Bob", "Charlie", "David")
selected_names <- names[c(2, 4)] # Extracts elements at indices 2 and 4 ("Bob" and "David")
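Slicing, mentioned at the start of this section, is the same idea with a range of indices produced by the `:` operator. This is a minimal sketch; the vector `scores` is an assumed example.

```r
scores <- c(10, 20, 30, 40, 50)

# Slice the first three elements
first_three <- scores[1:3]
print(first_three)   # [1] 10 20 30

# Slice from the fourth element to the end
last_two <- scores[4:length(scores)]
print(last_two)      # [1] 40 50
```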
Arithmetic operations on Vectors:
R applies the arithmetic operators for addition, subtraction, multiplication, and division to
vectors element by element. Each operation is shown below with an example and its output.
- Addition of Vectors (`+`): Adding two vectors of the same length performs element-wise
addition.
Example:
# Adding two numeric vectors
vec1 <- c(1, 2, 3)
vec2 <- c(4, 5, 6)
result_addition <- vec1 + vec2 # Element-wise addition: (1+4, 2+5, 3+6) = (5, 7, 9)
print(result_addition)
Output:
[1] 5 7 9
- Subtraction of Vectors (`-`): Subtracting two vectors of the same length performs element-wise
subtraction.
Example:
# Subtracting two numeric vectors
vec3 <- c(10, 20, 30)
vec4 <- c(4, 5, 6)
result_subtraction <- vec3 - vec4 # Element-wise subtraction: (10-4, 20-5, 30-6) = (6, 15, 24)
print(result_subtraction)
Output:
[1] 6 15 24
- Multiplication of Vectors (`*`): Multiplying two vectors of the same length performs
element-wise multiplication.
Example:
# Multiplying two numeric vectors
vec5 <- c(2, 4, 6)
vec6 <- c(3, 2, 1)
result_multiplication <- vec5 * vec6 # Element-wise multiplication: (2*3, 4*2, 6*1) = (6, 8, 6)
print(result_multiplication)
Output:
[1] 6 8 6
- Division of Vectors (`/`): Dividing two vectors of the same length performs element-wise
division.
Example:
# Dividing two numeric vectors
vec7 <- c(10, 20, 30)
vec8 <- c(2, 5, 3)
result_division <- vec7 / vec8 # Element-wise division: (10/2, 20/5, 30/3) = (5, 4, 10)
print(result_division)
Output:
[1] 5 4 10
List: A list in R is a collection of elements that can be of different data types or structures. It's a
versatile data structure used to store heterogeneous data.
Using `list()` function:
# Creating a list with different types of elements
my_list <- list(element1, element2, ..., elementN)
- Elements in a list can be vectors, matrices, data frames, other lists, scalars, or even functions.
- Each element in the list can be accessed using its index within double square brackets `[[ ]]`.
Example:
# Creating a list with different elements
my_list <- list("John", c(1, 2, 3), matrix(1:9, nrow = 3), data.frame(Name = c("Alice", "Bob"), Age =
c(25, 30)))
- `my_list` is a list containing:
- Element 1: Character string "John".
- Element 2: Numeric vector `c(1, 2, 3)`.
- Element 3: 3x3 matrix created by `matrix(1:9, nrow = 3)`.
- Element 4: Data frame with columns `Name` and `Age`.
Accessing Elements in a List:
- Elements in a list are accessed by position using double square brackets `[[ ]]`; elements that
have names can also be accessed with the dollar sign `$` notation.
Example:
# Accessing elements in the list
element2_list <- my_list[[2]] # Accessing the second element of the list
age_column <- my_list[[4]]$Age # Accessing the 'Age' column in the fourth element of the list
- `my_list[[2]]` retrieves the second element (a numeric vector).
- `my_list[[4]]$Age` retrieves the `Age` column from the fourth element (a data frame) using the
`$` notation.
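The `$` notation also works directly on a list when its elements are named. A short sketch, assuming a hypothetical list called `person`:

```r
# A list whose elements have names
person <- list(name = "John", scores = c(1, 2, 3))

print(person$name)            # [1] "John"
print(person[["scores"]])     # [1] 1 2 3

# Index into the retrieved vector as usual
second_score <- person$scores[2]
print(second_score)           # [1] 2
```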
Converting List to Vector:
Using `unlist()` function:
The `unlist()` function in R is used to convert a list to a vector by concatenating its elements
together into a single vector.
Syntax:
vector_from_list <- unlist(my_list)
- `my_list` is the list that you want to convert to a vector.
- `unlist()` function concatenates the elements of the list into a single vector.
Example:
# Creating a list with two types of elements
my_list <- list(c(1, 2, 3), c("apple", "orange", "banana"))
# Converting the list to a vector
vector_from_list <- unlist(my_list)
print(vector_from_list)
Output:
[1] "1" "2" "3" "apple" "orange" "banana"
Matrix:
In R, a matrix is a two-dimensional array that contains elements of the same data type. It
has rows and columns, forming a rectangular structure.
Creating Matrices:
Using `matrix()` function:
The `matrix()` function is used to create matrices in R by arranging elements into rows and
columns.
Syntax:
matrix_name <- matrix(data, nrow = number_of_rows, ncol = number_of_columns, byrow =
FALSE)
- `data`: The vector or sequence of elements used to fill the matrix.
- `nrow`: The number of rows in the matrix.
- `ncol`: The number of columns in the matrix.
- `byrow`: Specifies whether the matrix should be filled by rows (`TRUE`) or by columns (`FALSE`).
Example:
# Creating a 3x3 matrix filled by columns
mat <- matrix(1:9, nrow = 3, ncol = 3)
print(mat)
Output:
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
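For comparison, setting `byrow = TRUE` fills the same values row by row instead of column by column. A minimal sketch; the name `mat_by_row` is illustrative.

```r
# Creating a 3x3 matrix filled by rows
mat_by_row <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
print(mat_by_row)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# [3,]    7    8    9
```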
Accessing Elements in a Matrix:
- Elements in a matrix are accessed using square brackets `[row_index, column_index]`.
Example:
# Accessing an element in the matrix
element <- mat[2, 3] # Accessing element in the second row and third column (8)
print(element)
Output:
[1] 8
Matrix Operations and Functions:
Creating Matrices Using `cbind()` and `rbind()`:
- `cbind()` function combines vectors as columns to create a matrix.
- `rbind()` function combines vectors as rows to create a matrix.
Example:
# Creating vectors
vector1 <- c(1, 2, 3)
vector2 <- c(4, 5, 6)
# Creating a matrix by column binding (cbind)
mat_cbind <- cbind(vector1, vector2)
print(mat_cbind)
# Creating a matrix by row binding (rbind)
mat_rbind <- rbind(vector1, vector2)
print(mat_rbind)
Output:
     vector1 vector2
[1,]       1       4
[2,]       2       5
[3,]       3       6
        [,1] [,2] [,3]
vector1    1    2    3
vector2    4    5    6
[1] 2 5
[1] 3.333333 4.000000 5.666667
2. `as.integer()`:
- Converts data to integer type.
- Truncates decimal points (doesn't round).
Example:
x <- 3.7
y <- "123"
print(as.integer(x)) # Output: 3
print(as.integer(y)) # Output: 123
3. `as.numeric()`:
- Converts data to numeric data type.
Example:
x <- "3.14"
y <- TRUE
print(as.numeric(x)) # Output: 3.14
print(as.numeric(y)) # Output: 1 (TRUE becomes 1 in numeric)
4. `as.character()`:
- Converts data to character type.
- Numbers are converted to their character representations.
Example:
x <- 123
y <- TRUE
print(as.character(x)) # Output: "123"
print(as.character(y)) # Output: "TRUE"
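The conversion functions above do not always succeed. The sketch below shows what happens with input that cannot be converted, and confirms that `as.integer()` truncates toward zero; the variable names are illustrative.

```r
# Text that is not a number converts to NA (R also issues a warning,
# suppressed here to keep the output clean)
bad <- suppressWarnings(as.integer("abc"))
print(bad)            # [1] NA
print(is.na(bad))     # [1] TRUE

# Truncation drops the fractional part, also for negative numbers
neg <- as.integer(-3.7)
print(neg)            # [1] -3

# Scientific notation in a string is understood by as.numeric()
big <- as.numeric("1e3")
print(big)            # [1] 1000
```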
Functions:
In R programming, you can define functions using the `function()` keyword and then call those
functions to perform specific tasks or computations. Here's how you can define and call functions
in R:
Defining Functions:
To define a function in R, you use the `function()` keyword followed by a set of parentheses that
can contain input parameters, and then you specify the code block for the function's body. Here's
the basic structure of a function:
function_name <- function(parameter1, parameter2, ...) {
  # Function body
  # Code to perform a specific task
  # Optionally, return a value using 'return()'
}
Here's a simple example of a function that adds two numbers:
add_numbers <- function(a, b) {
  result <- a + b
  return(result)
}
Calling Functions:
To call a function, you use its name and provide the required arguments (if any) within
parentheses. You can assign the result to a variable if needed.
# Call the 'add_numbers' function
sum_result <- add_numbers(5, 3)
# Print the result
print(sum_result) # Output: [1] 8
In this example, we called the `add_numbers` function with arguments `5` and `3`, and it
returned the sum, which we assigned to the variable `sum_result` and then printed.
Example:
# Function to calculate factorial
factorial_func <- function(n) {
  result <- 1
  # seq_len(n) is empty when n is 0, so factorial_func(0) correctly returns 1
  for (i in seq_len(n)) {
    result <- result * i
  }
  result
}
number <- 5
result <- factorial_func(number)
print(paste("Factorial of", number, "is", result))
Output:
[1] "Factorial of 5 is 120"
Conditions and looping:
In R programming, you can use conditions and loops to control the flow of your code and
perform repetitive tasks. Here's an overview of conditions and loops in R:
Conditions (Control Structures):
1. `if` Statements:
The `if` statement allows you to execute a block of code only if a specified condition is `TRUE`.
The basic structure is:
if (condition) {
  # Code to execute if the condition is TRUE
}
Example:
x <- 10
if (x > 5) {
  print("x is greater than 5")
}
2. `if-else` Statements:
The `if-else` statement allows you to execute one block of code if a condition is `TRUE` and
another block of code if the condition is `FALSE`.
if (condition) {
  # Code to execute if the condition is TRUE
} else {
  # Code to execute if the condition is FALSE
}
Example:
x <- 3
if (x > 5) {
  print("x is greater than 5")
} else {
  print("x is not greater than 5")
}
3. `if-else if-else` Statements:
You can use `if-else if-else` statements when you have multiple conditions to check. It allows you
to specify a series of conditions to test.
if (condition1) {
  # Code to execute if condition1 is TRUE
} else if (condition2) {
  # Code to execute if condition2 is TRUE
} else {
  # Code to execute if no conditions are TRUE
}
Example:
x <- 7
if (x < 5) {
  print("x is less than 5")
} else if (x == 5) {
  print("x is equal to 5")
} else {
  print("x is greater than 5")
}
Break and Next statements in R:
Break Statement:
The break keyword is a jump statement that is used to terminate the loop at a particular
iteration.
Syntax:
if (test_expression) {
  break
}
Example 1:
# R program for break statement in a for loop
no <- 1:10
for (val in no) {
  if (val == 5) {
    print(paste("Coming out from for loop where val =", val))
    break
  }
  print(paste("Values are:", val))
}
Output:
[1] "Values are: 1"
[1] "Values are: 2"
[1] "Values are: 3"
[1] "Values are: 4"
[1] "Coming out from for loop where val = 5"
Next Statement:
The next statement is used to skip the current iteration in the loop and move to the next
iteration without exiting from the loop itself.
Syntax:
if (test_condition) {
  next
}
Example:
for (i in 1:5) {
  if (i %% 2 == 0) {
    next # Skip even numbers
  }
  print(i)
}
Loops (Control Structures):
1. `for` Loop:
A `for` loop is used to iterate over a sequence and perform a block of code for each element in
the sequence.
for (variable in sequence) {
  # Code to execute for each element in the sequence
}
Example:
for (i in 1:5) {
  print(i)
}
2. `while` Loop:
A `while` loop is used to repeatedly execute a block of code as long as a specified condition is
`TRUE`.
while (condition) {
  # Code to execute as long as the condition is TRUE
}
Example:
x <- 1
while (x <= 5) {
  print(x)
  x <- x + 1
}
Exceptions:
In R programming, you can handle exceptions (errors) that occur during the execution of your
code using the `tryCatch()` function and related constructs. Exception handling is essential for
gracefully managing errors and ensuring that your code can handle unexpected situations
without crashing. Here's an overview of exception handling in R:
`tryCatch()` Function:
The primary mechanism for handling exceptions in R is the `tryCatch()` function. It allows you to
specify code to be executed within a "try" block, and you can also specify how to handle errors in
a "catch" block. The basic syntax is as follows:
result <- tryCatch({
  # Code that may produce an error
}, error = function(err) {
  # Code to handle the error
})
- The code within the `tryCatch()` block is the code that you want to execute, and it might
produce an error.
- The `error` argument specifies a function to be executed if an error occurs. This function takes
the error object as an argument, which you can inspect for information about the error.
Example:
# The character element '0' cannot be used in division, so 5/i raises an error for it
v <- list(1, 2, 4, '0', 5)
for (i in v) {
  tryCatch(
    print(5 / i),
    error = function(e) {
      print("Non-numeric argument")
    }
  )
}
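The error object passed to the handler can be inspected, and `tryCatch()` also accepts a `finally` expression that runs whether or not an error occurred. The sketch below wraps a division in a hypothetical helper, `safe_divide()`, which is not part of base R.

```r
# Hypothetical helper: returns x / y, or NA if the division raises an error
safe_divide <- function(x, y) {
  tryCatch({
    x / y                                   # errors if x or y is not numeric
  }, error = function(err) {
    # conditionMessage() extracts the text of the error
    message("Caught: ", conditionMessage(err))
    NA                                      # value returned when an error occurs
  }, finally = {
    message("safe_divide() finished")       # always runs
  })
}

ok <- safe_divide(10, 2)        # 5
failed <- safe_divide(10, "0")  # NA, after the handler reports the error
print(ok)
print(failed)
```

Returning `NA` from the handler is one design choice among several; the handler could equally stop with a clearer message or return a default value.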
Reading and writing files
1. Reading Text Files:
Syntax:
read.table(file, header = FALSE, sep = "", ...)
Explanation:
- `file`: The file path or connection where the data is located.
- `header`: Boolean, specifying if the first row contains column names.
- `sep`: The separator used in the file (e.g., "," for CSV, "\t" for tab-delimited).
Example:
data <- read.table("data.txt", header = TRUE, sep = "\t")
2. Reading CSV Files:
Syntax:
read.csv(file, header = TRUE, sep = ",", ...)
Explanation:
- `file`: The path or connection where the CSV data is located.
- `header`: Boolean, indicating if the first row contains column names.
- `sep`: The delimiter used in the CSV file (default is "," for comma-separated files).
Example:
data <- read.csv("data.csv")
Writing Files in R:
1. Writing Text Files:
Syntax:
write.table(x, file, sep = " ", ...)
Explanation:
- `x`: The data object (e.g., data frame) to be written.
- `file`: The file path or connection where the data will be written.
- `sep`: The separator to use between entries.
Example:
write.table(data, "output.txt", sep = "\t")
2. Writing CSV Files:
Syntax:
write.csv(x, file, row.names = TRUE, ...)
Explanation:
- `x`: The data object (e.g., data frame) to be written.
- `file`: The file path or connection where the CSV data will be written.
- `row.names`: Logical value indicating if row names should be included (the default is `TRUE`;
set it to `FALSE` to leave them out of the file).
Example:
write.csv(data, "output.csv", row.names = FALSE)
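To see writing and reading work together, the sketch below writes a small data frame to a temporary CSV file and reads it back. The data frame `df` and its contents are made up for illustration, and `tempfile()` is used so that no real file path is assumed.

```r
# A small example data frame
df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))

# Write it to a temporary CSV file without row names
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# Read it back; the first row is treated as column names by default
df_back <- read.csv(path)
print(df_back)

# Remove the temporary file
unlink(path)
```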
Stacking statements:
In R programming, "stacking statements" typically refers to writing multiple statements on a
single line, separated by semicolons (;). Stacking statements is a way to condense multiple
operations into a single line of code. While this can be convenient for concise code, it can also
make the code less readable and harder to maintain, so it should be used judiciously.
Here is an example of stacking statements:
x <- 5; y <- 10; z <- x + y; print(z)
In this single line of code, four statements are stacked together: two assignments, an addition
stored in `z`, and a `print()` call.
While stacking statements can save space and make your code more compact, it's generally
recommended to use separate lines for each statement, especially in cases where the code might
become more complex or when readability is a priority. For example:
x <- 5
y <- 10
z <- x + y
print(z)
Standalone statements with illustrations:
A standalone statement in R is a line of code that performs a specific task or operation
independently of other code.
1. Assignment Statement:
An assignment statement assigns a value to a variable. It's a common standalone statement in R.
x <- 10
In this example, the value `10` is assigned to the variable `x`.
2. Print Statement:
A print statement is used to display the value of a variable or expression in the console.
x <- 10
print(x)
This code assigns `10` to `x` and then prints the value of `x`.
3. Comment Statement:
A comment statement is used to add comments or notes to your code. It does not perform any
computation; it is purely for documentation.
# This
Visibility:
Visibility in R refers to the scope or accessibility of variables and objects within your code. R has a
specific set of rules for how variables can be accessed based on where they are defined.
1. Global Scope: Variables defined in the global scope are accessible from anywhere within the R
environment. They are not confined to a specific function or block of code.
2. Function Scope: Variables defined within a function are only accessible within that function's
scope. They are considered local variables and are not visible outside of the function.
Here is a simple example illustrating visibility:
x <- 10 # Global variable
my_function <- function() {
  y <- 5 # Function-level variable
  z <- x + y # Accesses the global variable 'x'
  print(z)
}
my_function()
In this example, `x` is defined globally and can be accessed within the `my_function()` function,
while `y` is a local variable within the function.
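The other direction can be checked as well: a variable created inside a function does not exist outside it, and assigning to a global name inside a function creates a local copy instead of changing the global one. A minimal sketch with illustrative names (`counter`, `increment_local`, `local_only`, `temp_value`):

```r
counter <- 0  # Global variable

increment_local <- function() {
  counter <- counter + 1  # Reads the global value, then creates a LOCAL 'counter'
  counter
}

inside <- increment_local()
print(inside)    # [1] 1
print(counter)   # [1] 0  (the global variable is unchanged)

local_only <- function() {
  temp_value <- 42  # Exists only while the function runs
  temp_value
}
local_only()
print(exists("temp_value"))  # [1] FALSE
```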