R-Programming Notes

This document provides an introduction to R programming for data analytics. It discusses key concepts in data analytics and tools used, including R. It then covers the history of R programming, features of R, and basic syntax used in R like variables, data types, vectors and arithmetic operations on vectors.

Uploaded by

nivn26393
Copyright
© © All Rights Reserved

Introduction to R-Programming

Data analytics
Data analytics is the collection, transformation, and organization of data in order to draw
conclusions, make predictions, and drive informed decision making.
Data analytics: Key concepts
• Descriptive analytics tell us what happened.
• Diagnostic analytics tell us why something happened.
• Predictive analytics tell us what will likely happen in the future.
• Prescriptive analytics tell us how to act.
Tools used for Data Analytics
• R Programming
• Tableau
• Python
• SAS
• Apache Spark
• Excel
Introduction
• R is a very powerful programming language widely used in the data science world.
• R analytics is data analytics using the R programming language, an open-source language used
for statistical computing and graphics.
History of R Programming
1. Founders: Ross Ihaka and Robert Gentleman initiated the development of R at the University
of Auckland, New Zealand, in the early 1990s.
2. Development Begins (Early 1990s): Ross Ihaka and Robert Gentleman started working on R
with the aim of creating a language and environment for statistical computing and graphics.
3. Initial Release (1995): The first version of R was publicly released in 1995 under the GNU
General Public License.
4. Growing Popularity (Late 1990s-2000s): R gained attention and popularity among statisticians,
researchers, and academics due to its robust statistical analysis capabilities, data manipulation
functions, and graphical tools.
5. Community Contributions: The open-source nature of R encouraged collaboration and
contributions from a growing community of statisticians and developers. This led to the creation
of numerous packages, expanding R's functionalities.
6. Expansion into Various Industries (2000s-2010s): R found extensive use in data science,
bioinformatics, finance, and other industries due to its ability to handle large datasets, perform
complex analyses, and create compelling visualizations.
7. Continued Evolution: The R language continues to evolve with regular updates, new packages,
and improvements, overseen by the R Core Team, ensuring its relevance and usefulness in
statistical computing and data analysis.

Features of R:
• R is an interpreted language, which means R allows coding in an interactive manner.
• R is free and open-source statistical software. Copyright for the primary source code of R
is held by the R Foundation, and it is published under the GNU General Public License.
• Today R runs on almost any standard computing platform and operating system.
• R has state-of-the-art graphics capabilities and is widely used for data visualization tasks.
• By 2017, CRAN (Comprehensive R Archive Network) had more than 10,000 packages with
tens of thousands of functions.
• The community support is overwhelming. There are numerous forums to help you out.

Limitations of R Language
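• As R is an interpreted language, R is slow compared to C, C++ and other compiled
languages.
• Another major challenge data scientists face while using R is running out of memory, because R
keeps the objects it works with in RAM.
• Packages are contributed by the community, and although CRAN runs automated checks on
submissions, nobody reviews the statistical quality of a new package before it is published. That is
why the quality of some packages is less than perfect.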
Basic Syntax in R Programming
Variables in R:
In R programming, a variable is a named container that holds data or values, allowing you to
store, manipulate, and retrieve information within a program.
• In R, the assignment can be denoted in three ways:
= (Simple Assignment)
<- (Leftward Assignment)
-> (Rightward Assignment)
Example
a=10
b<-a+10
print(c(a,b))
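The example above uses only `=` and `<-`. As a minimal sketch, the three assignment forms can be placed side by side; the variable names `p`, `q` and `r` are illustrative and not part of the original notes.

```r
# Three ways to assign a value in R (variable names are illustrative)
p = 10        # simple assignment
q <- p + 10   # leftward assignment
p + q -> r    # rightward assignment: the value on the left is stored in r
print(c(p, q, r))  # prints: [1] 10 20 30
```

All three forms store a value under a name; `<-` is the form seen most often in R code.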

Basic Data Types in R


1. Numeric: It represents numerical values.
Example:
# Numeric values
10.5
55
787
2. Integer: Whole numbers in R. The "L" suffix denotes that the number is explicitly an integer.
Example:
# Integer values
1L
55L
100L
3. Complex: Numbers with both real and imaginary parts. "i" represents the imaginary unit.
Example:
# Complex values
9 + 3i
4. Character (String): Textual data enclosed in quotes. It can include letters, numbers, symbols, or
spaces.
Example:
# Character values
"k"
"R is exciting"
"FALSE" # Even though it resembles a logical value, it's a string due to the quotes
"11.5" # A string as it's enclosed in quotes
5. Logical (Boolean): Represents TRUE or FALSE values, indicating true or false conditions.
Example:
# Logical values
TRUE
FALSE
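To check which type a value has, the built-in functions `class()` and `typeof()` can be used. The sketch below applies them to values like the ones listed above.

```r
# Inspecting the basic data types with class() and typeof()
print(class(10.5))       # "numeric"
print(typeof(10.5))      # "double": how a numeric value is stored
print(class(55L))        # "integer"
print(class(9 + 3i))     # "complex"
print(class("FALSE"))    # "character": the quotes make it a string
print(class(TRUE))       # "logical"
print(is.numeric(55L))   # TRUE: integers also count as numeric
```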
Arithmetic Operators
• Addition (+): Adds two or more numbers together.
result <- 5 + 3
• Subtraction (-): Subtracts one number from another.
result <- 10 - 7
• Multiplication (*): Multiplies two or more numbers.
result <- 4 * 6
• Division (/): Divides one number by another.
result <- 8 / 2
• Exponentiation (^ or **): Raises a number to a power.
result <- 2^3 # 2 raised to the power of 3 is 8
• Modulus (%%): Computes the remainder when one number is divided by another.
result <- 10 %% 3 # The remainder when 10 is divided by 3 is 1
• Increment and Decrement: R has no `+=` or `-=` operators. To increase or decrease the
value of a variable by a specific amount, reassign the variable.
x <- 5
x <- x + 2 # Increment x by 2, making it 7
x <- x - 3 # Decrement x by 3, making it 4
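As a small sketch of how division-related operators relate to each other, the block below applies them to one pair of numbers. It also uses integer division (`%/%`), which is not listed above but is part of base R; the values 17 and 5 are chosen only for illustration.

```r
# Division-related operators applied to the same pair of numbers
a <- 17
b <- 5
print(a / b)     # ordinary division: 3.4
print(a %/% b)   # integer division: 3
print(a %% b)    # remainder: 2
print(2 ** 3)    # same as 2^3: 8
# Quotient and remainder together rebuild the original number
print((a %/% b) * b + (a %% b) == a)   # TRUE
```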

Vector:
In R programming, a vector is a fundamental data structure that represents a sequence of
elements of the same data type. It can hold numeric values, character strings, logical values, etc.
Vectors allow for the efficient storage and manipulation of data in a one-dimensional array-like
structure.
Syntax for Creating Vectors:
Using `c()` function:
vector_name <- c(element1, element2, ..., elementN)
Explanation of Terms:
- `vector_name`: The name given to the vector being created. It serves as the identifier for
accessing and manipulating the vector's elements.
- `c()` function: The `c()` function stands for "combine" and is used to concatenate elements
together into a vector. It creates a vector by combining individual elements.
- `element1, element2, ..., elementN`: These are the individual elements that constitute the
vector, separated by commas within the `c()` function. If elements of different data types are
supplied, R coerces them to a single common type.
Examples:
Numeric Vector Example:
# Creating a numeric vector
numbers <- c(1, 2, 3, 4, 5)
- `numbers` is the name assigned to the numeric vector.
- `c(1, 2, 3, 4, 5)` combines these five numeric elements into the vector named `numbers`.
Character Vector Example:
# Creating a character vector
names <- c("Alice", "Bob", "Charlie", "David")
- `names` is the name assigned to the character vector.
- `c("Alice", "Bob", "Charlie", "David")` combines these four character elements into the vector
named `names`.
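Because a vector holds a single data type, it helps to see what `c()` does when it is given mixed inputs. The following is a minimal sketch; the variable names `mixed_num` and `mixed_chr` are illustrative.

```r
# c() converts mixed inputs to one common type
mixed_num <- c(1, TRUE, FALSE)   # logical values become numbers: 1, 1, 0
mixed_chr <- c(1, "two", TRUE)   # everything becomes character: "1", "two", "TRUE"
print(mixed_num)
print(class(mixed_num))   # "numeric"
print(mixed_chr)
print(class(mixed_chr))   # "character"
```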
Creating Vectors Using Sequence-Generating Functions:
Using `:` Operator:
The `:` operator generates a sequence of numbers from a starting value to an ending value,
incrementing by 1.
Syntax:
sequence_vector <- start_value:end_value
Example:
# Creating a sequence using the ':' operator
sequence1 <- 1:10 # Generates a sequence from 1 to 10
- The `:` operator is a way to create sequences of consecutive integers in R.
- It generates a sequence starting from the `start_value` to the `end_value`, inclusive,
incrementing by 1 each time.
- This method is convenient for generating simple integer sequences.
Using `seq()` Function:
The `seq()` function allows more control over sequence generation by specifying the starting
point, ending point, and the increment value.
Syntax:
sequence_vector <- seq(from = start_value, to = end_value, by = increment)
Example:
# Creating a sequence using the 'seq()' function
sequence2 <- seq(1, 20, by = 2) # Generates 1, 3, 5, ..., 19 (step of 2, stopping before it would pass 20)
- The `seq()` function is versatile and allows for more flexibility in generating sequences.
- It generates a sequence starting from `start_value` to `end_value` with a specified increment
(`by`) value.
- The `from`, `to`, and `by` arguments help create sequences that aren't limited to consecutive
integers and allow for sequences of different lengths and steps.
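A few more sequence patterns may make the arguments clearer. This sketch uses `length.out`, a negative `by`, and `seq_len()`, which are base R features not mentioned in the notes above; the values are chosen only for illustration.

```r
# More sequence patterns (values chosen for illustration)
print(seq(1, 20, by = 2))         # 1 3 5 7 9 11 13 15 17 19
print(seq(0, 1, length.out = 5))  # 0.00 0.25 0.50 0.75 1.00
print(seq(10, 1, by = -3))        # descending: 10 7 4 1
print(5:1)                        # the ':' operator also counts down: 5 4 3 2 1
print(seq_len(4))                 # 1 2 3 4
```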
Vector indexing:
In R, you can extract elements from vectors using indexing and slicing methods. Indexing allows
you to access specific elements of a vector. Here are the methods for extracting elements from
vectors:
Indexing Elements:
Single Element:
You can extract a single element from a vector by specifying its index within square brackets `[ ]`.
Indexing in R starts from 1.
Syntax:
vector_name[index]
Example:
# Extracting the third element from a numeric vector
numbers <- c(10, 20, 30, 40, 50)
element <- numbers[3] # Extracts the third element (30)
Multiple Elements:
To extract multiple elements, you can provide a vector of indices within the square brackets.
Syntax:
vector_name[c(index1, index2, ..., indexN)]
Example:
# Extracting multiple elements from a character vector
names <- c("Alice", "Bob", "Charlie", "David")
selected_names <- names[c(2, 4)] # Extracts elements at indices 2 and 4 ("Bob" and "David")
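The text mentions slicing as well as indexing. A minimal sketch of slicing, together with negative and logical indexing (standard R features that the notes do not cover), reusing the `numbers` vector from the example above:

```r
numbers <- c(10, 20, 30, 40, 50)
print(numbers[2:4])              # slice of positions 2 to 4: 20 30 40
print(numbers[-1])               # all elements except the first: 20 30 40 50
print(numbers[numbers > 25])     # logical indexing keeps matching values: 30 40 50
print(numbers[length(numbers)])  # last element: 50
```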
Arithmetic operations on Vectors:
- Addition of Vectors (`+`): Adding two vectors of the same length performs element-wise
addition.
Example:
# Adding two numeric vectors
vec1 <- c(1, 2, 3)
vec2 <- c(4, 5, 6)
result_addition <- vec1 + vec2 # Element-wise addition: (1+4, 2+5, 3+6) = (5, 7, 9)
print(result_addition)
Output:
[1] 5 7 9
- Subtraction of Vectors (`-`): Subtracting two vectors of the same length performs element-wise
subtraction.
Example:
# Subtracting two numeric vectors
vec3 <- c(10, 20, 30)
vec4 <- c(4, 5, 6)
result_subtraction <- vec3 - vec4 # Element-wise subtraction: (10-4, 20-5, 30-6) = (6, 15, 24)
print(result_subtraction)
Output:
[1] 6 15 24
- Multiplication of Vectors (`*`): Multiplying two vectors of the same length performs element-
wise multiplication.
Example:
# Multiplying two numeric vectors
vec5 <- c(2, 4, 6)
vec6 <- c(3, 2, 1)
result_multiplication <- vec5 * vec6 # Element-wise multiplication: (2*3, 4*2, 6*1) = (6, 8, 6)
print(result_multiplication)
Output:
[1] 6 8 6
- Division of Vectors (`/`): Dividing two vectors of the same length performs element-wise
division.
Example:
# Dividing two numeric vectors
vec7 <- c(10, 20, 30)
vec8 <- c(2, 5, 3)
result_division <- vec7 / vec8 # Element-wise division: (10/2, 20/5, 30/3) = (5, 4, 10)
print(result_division)
Output:
[1] 5 4 10
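The examples above all use vectors of the same length. When the lengths differ, R reuses ("recycles") the shorter vector; the sketch below illustrates this with values chosen for the purpose.

```r
vec <- c(1, 2, 3, 4)
print(vec * 10)            # the single value 10 is reused for every element: 10 20 30 40
print(vec + c(100, 200))   # the shorter vector is reused in order: 101 202 103 204
```

If the longer length is not a multiple of the shorter one, for example `c(1, 2, 3) + c(1, 2)`, R still computes a result but issues a warning.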
List: A list in R is a collection of elements that can be of different data types or structures. It's a
versatile data structure used to store heterogeneous data.
Using `list()` function:
# Creating a list with different types of elements
my_list <- list(element1, element2, ..., elementN)
- Elements in a list can be vectors, matrices, data frames, other lists, scalars, or even functions.
- Each element in the list can be accessed using its index within double square brackets `[[ ]]`.
Example:
# Creating a list with different elements
my_list <- list("John", c(1, 2, 3), matrix(1:9, nrow = 3), data.frame(Name = c("Alice", "Bob"), Age =
c(25, 30)))
- `my_list` is a list containing:
- Element 1: Character string "John".
- Element 2: Numeric vector `c(1, 2, 3)`.
- Element 3: 3x3 matrix created by `matrix(1:9, nrow = 3)`.
- Element 4: Data frame with columns `Name` and `Age`.
Accessing Elements in a List:
- Elements in a list are accessed using double square brackets `[[ ]]`, or the dollar sign `$` notation when the elements are named.
Example:
# Accessing elements in the list
element2_list <- my_list[[2]] # Accessing the second element of the list
age_column <- my_list[[4]]$Age # Accessing the 'Age' column in the fourth element of the list
- `my_list[[2]]` retrieves the second element (a numeric vector).
- `my_list[[4]]$Age` retrieves the `Age` column from the fourth element (a data frame) using the
`$` notation.
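The difference between the access styles is easiest to see on a list with named elements. This is a small sketch; the list `person` and its element names are hypothetical.

```r
# A list with named elements (names are illustrative)
person <- list(name = "John", scores = c(1, 2, 3))
print(person$name)             # "John"
print(person[["scores"]])      # 1 2 3
print(person[["scores"]][2])   # second score: 2
# Single brackets return a smaller list; double brackets return the element itself
print(class(person["name"]))    # "list"
print(class(person[["name"]]))  # "character"
```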
Converting List to Vector:
Using `unlist()` function:
The `unlist()` function in R is used to convert a list to a vector by concatenating its elements
together into a single vector.
Syntax:
vector_from_list <- unlist(my_list)
- `my_list` is the list that you want to convert to a vector.
- `unlist()` function concatenates the elements of the list into a single vector.
Example:
# Creating a list with two types of elements
my_list <- list(c(1, 2, 3), c("apple", "orange", "banana"))
# Converting the list to a vector
vector_from_list <- unlist(my_list)
print(vector_from_list)
Output:
[1] "1" "2" "3" "apple" "orange" "banana"
Matrix:
In R, a matrix is a two-dimensional array that contains elements of the same data type. It
has rows and columns, forming a rectangular structure.
Creating Matrices:
Using `matrix()` function:
The `matrix()` function is used to create matrices in R by arranging elements into rows and
columns.
Syntax:
matrix_name <- matrix(data, nrow = number_of_rows, ncol = number_of_columns, byrow =
FALSE)
- `data`: The vector or sequence of elements used to fill the matrix.
- `nrow`: The number of rows in the matrix.
- `ncol`: The number of columns in the matrix.
- `byrow`: Specifies whether the matrix should be filled by rows (`TRUE`) or by columns (`FALSE`).
Example:
# Creating a 3x3 matrix filled by columns
mat <- matrix(1:9, nrow = 3, ncol = 3)
print(mat)
Output:
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
Accessing Elements in a Matrix:
- Elements in a matrix are accessed using square brackets `[row_index, column_index]`.
Example:
# Accessing an element in the matrix
element <- mat[2, 3] # Accessing element in the second row and third column (8)
print(element)
Output:
[1] 8
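The matrix above is filled by columns, which is the default. As a sketch of the `byrow` argument and of selecting a whole row or column, the same data can be arranged by rows; the name `mat_by_row` is illustrative.

```r
# Filling by rows instead of by columns
mat_by_row <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
print(mat_by_row)
print(mat_by_row[1, ])      # leaving the column index empty selects the whole first row: 1 2 3
print(mat_by_row[, 2])      # leaving the row index empty selects the whole second column: 2 5 8
print(mat_by_row[2, 3])     # 6
print(t(mat_by_row)[2, 3])  # t() transposes, so this is the original [3, 2]: 8
```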
Matrix Operations and Functions:
Creating Matrices Using `cbind()` and `rbind()`:
- `cbind()` function combines vectors as columns to create a matrix.
- `rbind()` function combines vectors as rows to create a matrix.
Example:
# Creating vectors
vector1 <- c(1, 2, 3)
vector2 <- c(4, 5, 6)
# Creating a matrix by column binding (cbind)
mat_cbind <- cbind(vector1, vector2)
print(mat_cbind)
# Creating a matrix by row binding (rbind)
mat_rbind <- rbind(vector1, vector2)
print(mat_rbind)
Output:
     vector1 vector2
[1,]       1       4
[2,]       2       5
[3,]       3       6
        [,1] [,2] [,3]
vector1    1    2    3
vector2    4    5    6
Functions:
- `dim()` function returns the dimensions (rows and columns) of a matrix.
- `rownames()` and `colnames()` functions return row and column names, respectively.
- `rowMeans()` and `colMeans()` functions calculate row-wise and column-wise means,
respectively.
Example:
# Dimensions of the matrix
dimensions <- dim(mat_cbind)
print(dimensions)
# Row means and column means of the matrix
row_means <- rowMeans(mat_rbind)
col_means <- colMeans(mat_rbind)
print(row_means)
print(col_means)
# Adding row and column names to the matrix
rownames(mat_cbind) <- c("A", "B", "C")
colnames(mat_cbind) <- c("Col1", "Col2")
print(rownames(mat_cbind))
print(colnames(mat_cbind))
Output:
[1] 3 2
vector1 vector2 
      2       5 
[1] 2.5 3.5 4.5
[1] "A" "B" "C"
[1] "Col1" "Col2"
Arrays:
An array in R is a multi-dimensional data structure that can store elements of the same data type.
It extends the matrices to more than two dimensions.
Creating Arrays:
Using `array()` function:
The `array()` function is used to create arrays in R by arranging elements in multiple dimensions.
Syntax:
array_name <- array(data, dim = c(dim1, dim2, ..., dimN))
- `data`: The vector or sequence of elements used to fill the array.
- `dim`: A vector specifying the dimensions of the array in each dimension.
Example:
# Creating a 3x3x2 array
arr <- array(1:18, dim = c(3, 3, 2))
print(arr)
Output:
, , 1

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

, , 2

     [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18
Accessing Elements in an Array:
- Elements in an array are accessed using square brackets `[,,]` for multiple dimensions.
Example:
# Accessing an element in the array
element <- arr[2, 3, 2] # Accessing element at second row, third column, and second layer (17)
print(element)
Output:
[1] 17
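Leaving an index empty selects everything along that dimension. The sketch below recreates the same 3x3x2 array and extracts a layer and a row; it also uses `apply()`, a base R function not covered in the notes, to compute one sum per layer.

```r
arr <- array(1:18, dim = c(3, 3, 2))
print(arr[, , 2])          # the whole second layer, a 3x3 matrix
print(arr[1, , 1])         # first row of the first layer: 1 4 7
print(apply(arr, 3, sum))  # sum over each layer (dimension 3): 45 126
```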
Using Functions on an Array:
`dim()` Function:
The `dim()` function in R returns the dimensions of the array.
# Dimensions of the array
dimensions <- dim(arr)
print(dimensions)
Output:
[1] 3 3 2
`sum()` Function:
The `sum()` function in R calculates the sum of all elements in the array.
# Sum of all elements in the array
total_sum <- sum(arr)
print(total_sum)
Output:
[1] 171
`mean()` Function
The `mean()` function in R calculates the mean (average) value of all elements in the array.
# Mean of all elements in the array
average_value <- mean(arr)
print(average_value)
Output:
[1] 9.5
`min()` Function:
The `min()` function in R finds the minimum value among all elements in the array.
# Minimum value in the array
minimum_value <- min(arr)
print(minimum_value)
Output:
[1] 1
`max()` Function:
The `max()` function in R finds the maximum value among all elements in the array.
# Maximum value in the array
maximum_value <- max(arr)
print(maximum_value)
Output:
[1] 18
Data Frames:
- DataFrame: A DataFrame is a two-dimensional, heterogeneous data structure similar to a table
or spreadsheet where columns can have different data types. It organizes data into rows and
columns.
Using `data.frame()` function:
The `data.frame()` function creates DataFrames in R by combining vectors or lists into a
structured tabular format.
Syntax:
df_name <- data.frame(column_name1 = vector1, column_name2 = vector2, ...)
- `column_name1, column_name2, ...`: Names for each column in the DataFrame.
- `vector1, vector2, ...`: Vectors containing data for each column.
Example:
# Creating a DataFrame
names <- c("Alice", "Bob", "Charlie")
ages <- c(25, 30, 28)
is_student <- c(TRUE, FALSE, TRUE)
# Combining vectors into a DataFrame
df <- data.frame(Name = names, Age = ages, IsStudent = is_student)
print(df)
Output:
     Name Age IsStudent
1   Alice  25      TRUE
2     Bob  30     FALSE
3 Charlie  28      TRUE
Accessing Elements in a DataFrame:
- Elements in a DataFrame can be accessed using column names or numerical indexing
`[row_index, column_index]`.
Example:
# Accessing elements in the DataFrame
name_value <- df$Name # Accessing the 'Name' column
age_at_index_2 <- df[2, "Age"] # Accessing the age at row 2 and column 'Age'
Output:
# Output for name_value
[1] "Alice" "Bob" "Charlie"
# Output for age_at_index_2
[1] 30
Using Functions on the DataFrame:
`head()`: `head()` function returns the initial rows of the DataFrame (by default, the first 6 rows if
'n' is not specified).
# Displaying the first 3 rows using head()
head_rows <- head(df, n = 3)
print(head_rows)
Output (head):
     Name Age IsStudent
1   Alice  25      TRUE
2     Bob  30     FALSE
3 Charlie  28      TRUE
`tail()`: `tail()` function returns the final rows of the DataFrame (by default, the last 6 rows if 'n' is
not specified).
# Displaying the last 2 rows using tail()
tail_rows <- tail(df, n = 2)
print(tail_rows)
Output (tail):
     Name Age IsStudent
2     Bob  30     FALSE
3 Charlie  28      TRUE
`mean()`: Calculates the mean (average) value.
`min()`: Returns the minimum value.
`max()`: Returns the maximum value.
Example:
# Calculating mean, min, and max for numeric column 'Age'
average_age <- mean(df$Age)
min_age <- min(df$Age)
max_age <- max(df$Age)
print(average_age)
print(min_age)
print(max_age)
Output
[1] 27.66667
[1] 25
[1] 30
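Beyond these functions, a few other base R tools are commonly used to explore a DataFrame. The following sketch rebuilds the same `df` so that it runs on its own; the column name `AgeNextYear` is hypothetical and added only for illustration.

```r
df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                 Age = c(25, 30, 28),
                 IsStudent = c(TRUE, FALSE, TRUE))
print(nrow(df))                 # number of rows: 3
print(ncol(df))                 # number of columns: 3
print(df[df$Age > 26, "Name"])  # names of rows where Age is above 26: Bob and Charlie
df$AgeNextYear <- df$Age + 1    # add a derived column (hypothetical name)
str(df)                         # compact overview of column names and types
```

Filtering rows with a logical condition, as in `df[df$Age > 26, ]`, follows the same `[row_index, column_index]` pattern described earlier.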
Non-numeric Values:
In R programming, non-numeric values typically refer to data that is not represented as numbers,
but rather as characters, logical values, or other non-numeric data types.
1. Character vectors: Character vectors are used to store text data. You can create character
vectors using single or double quotes. For example:
my_string <- "Hello, World!"
2. Logical values: R has two logical values, `TRUE` and `FALSE`, which are often used for
conditional operations and logical comparisons. You can also use `NA` to represent missing or
undefined values.
my_logical <- TRUE
3. Character matrices/data frames: In data frames, columns can contain character data, and the
data frame itself can be a combination of different data types, including characters, factors, and
numerics.
my_data_frame <- data.frame(Name = c("Alice", "Bob", "Charlie"),
Age = c(25, 30, 22))
4. Lists: Lists are a versatile data structure in R that can hold elements of different types,
including non-numeric types. Lists can contain a mix of characters, factors, logical values, and
more.
my_list <- list(name = "John", age = 35, is_student = FALSE)
Special Values: In R programming, special values are often used to represent certain unique or
specific data conditions. These special values are typically used for handling missing data,
representing infinity, and denoting not-a-number (NaN) situations.
1. NA (Not Available):
- `NA` is used to represent missing or undefined data. It indicates that a value is not available or
is missing in the data.
- You can use `NA` in vectors, matrices, data frames, and other data structures to indicate
missing values.
- Example:
x <- c(1, 2, NA, 4, 5)
2. NaN (Not-a-Number):
- `NaN` represents a result that is undefined or not a valid number. It is often returned when
mathematical operations involve invalid values, such as dividing zero by zero.
- Example:
result <- 0/0
3. `Inf` (Infinity) and `-Inf` (Negative Infinity):
- `Inf` represents positive infinity, while `-Inf` represents negative infinity. These values are used
when calculations result in values that are beyond the range of representable numbers.
- Example:
positive_inf <- Inf
negative_inf <- -Inf
4. NULL:
- `NULL` is used to represent an empty or undefined object. It is often used to remove objects
or reset variables.
- Example:
x <- 10
x <- NULL
5. TRUE and FALSE:
- `TRUE` and `FALSE` are used to represent logical values in R. They are used in conditional
statements, logical operations, and comparisons.
- Example:
is_true <- TRUE
is_false <- FALSE
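Special values usually have to be detected before a calculation. As a minimal sketch, base R provides test functions such as `is.na()`, `is.nan()`, `is.infinite()` and `is.null()`, and many summary functions accept `na.rm = TRUE` to skip missing values.

```r
x <- c(1, 2, NA, 4, 5)
print(is.na(x))               # FALSE FALSE TRUE FALSE FALSE
print(mean(x))                # NA: a missing value makes the mean unknown
print(mean(x, na.rm = TRUE))  # 3: the NA is dropped before averaging
print(is.nan(0 / 0))          # TRUE
print(is.na(NaN))             # TRUE: NaN also counts as missing
print(1 / 0)                  # Inf
print(is.infinite(-1 / 0))    # TRUE
print(is.null(NULL))          # TRUE
print(length(NULL))           # 0: NULL has no elements
```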
Basic Plotting: Basic plotting in R is typically accomplished using the built-in functions provided
by the base graphics system. You can create a wide variety of plots, including scatterplots, bar
plots, histograms, line plots, and more. Here are some basic examples of how to create common
types of plots in R
Basic Syntax for Plotting in R:
plot(x, y, type = "p", col = "blue", main = "Plot Title", xlab = "X-axis Label", ylab = "Y-axis Label")
- `x` and `y`: These are input vectors or datasets used for plotting. They represent the data points
to be visualized on the x-axis and y-axis, respectively.
- `type`: Defines the type of plot to be created. It specifies whether to create a scatter plot, line
plot, both lines and points, or other types of plots.
- `col`: Determines the color of the plotted elements. It can be specified as a color name
(`"blue"`, `"red"`, etc.)
- `main`: Represents the title of the plot. It provides a descriptive title for the entire plot.
- `xlab` and `ylab`: These parameters specify labels for the x-axis and y-axis, respectively. They
provide information about what the axes represent.
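To make the `type` argument concrete, the sketch below draws the same data three times with different `type` values. The data values and titles are illustrative; when run non-interactively, R typically writes the plots to a file instead of opening a window.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 1, 6, 3)
plot(x, y, type = "p", main = "Points")              # points only
plot(x, y, type = "l", main = "Lines")               # lines only
plot(x, y, type = "b", main = "Both", col = "red")   # points joined by lines
```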
1. Scatter Plot:
A scatter plot is a graphical representation of data that displays individual data points as dots on
a two-dimensional coordinate system. It is used to visually examine and show the relationship or
correlation between two variables, typically one on the x-axis and the other on the y-axis.
Scatter Plot in R:
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 1, 6, 3)
plot(x, y, type = "p", col = "blue", main = "Scatter Plot", xlab = "X-axis Label", ylab = "Y-axis Label")
Terms in the Syntax:
- `x` and `y`: These are numerical vectors or datasets representing the data points to be plotted
on the x-axis and y-axis, respectively.
- `type`: The `"p"` type is used for a scatter plot. It indicates that the plot will display points.
- `col`: Determines the color of the plotted points. It can be specified by color names or
hexadecimal color codes.
- `main`: Provides a title for the scatter plot.
- `xlab` and `ylab`: These parameters specify the labels for the x-axis and y-axis, respectively.
2. Bar Plot:
A bar plot is a graphical representation of data that uses rectangular bars to display the values of
different categories or groups. The height or length of each bar corresponds to the quantity or
value of the category it represents. Bar plots are often used to compare and visualize data across
different categories or groups.
# Create sample data
categories <- c("A", "B", "C", "D")
values <- c(10, 5, 12, 8)
# Create a bar plot
barplot(values, names.arg = categories, main = "Bar Plot Example", xlab = "Categories", ylab =
"Values")
- `categories` is a vector that represents the category labels for the bars.
- `values` is a vector that represents the values or heights of the bars corresponding to each
category.
The `barplot` function creates the bar plot, and it is customized with the following
arguments:
- `values` specifies the heights of the bars.
- `names.arg` is used to specify the labels for the categories on the X-axis.
- `main` sets the title of the plot.
- `xlab` and `ylab` are used to label the X-axis and Y-axis.
3. Histogram:
A histogram is a graphical representation of data that divides a range of values into intervals or
"bins" and shows how many data points fall into each interval. It provides a visual summary of
the distribution of data
# Manually create a dataset and create a histogram
data <- c(3, 4, 2, 5, 3, 4, 4, 6, 5, 3, 4, 5, 2, 3, 4, 5, 6, 4, 3, 5)
hist(data, main = "Simple Histogram Example", xlab = "Values", ylab = "Frequency", col = "blue")
`data`: The dataset you want to visualize in the histogram.
`hist(data, ...)`: The function to create the histogram.
`main`: The title of the histogram.
`xlab`: The label for the X-axis.
`ylab`: The label for the Y-axis.
`col`: The color of the bars in the histogram.
4. Line Plot:
A line plot, also known as a line chart or line graph, is a type of data visualization used to display data
points over a continuous interval or time period. In a line plot, data points are connected by
straight lines, forming a line that represents the trend or pattern in the data. Line plots are
commonly used to show trends, changes, and relationships between variables.
Here's a simple example of creating a line plot in R using a hypothetical dataset:
# Sample data
time <- c(1, 2, 3, 4, 5, 6)
values <- c(10, 12, 8, 15, 11, 13)
# Create a line plot
plot(time, values, type = "l", main = "Line Plot Example", xlab = "Time", ylab = "Values")
In this example:
- `time` represents the time points.
- `values` represents the corresponding values at those time points.
- We use the `plot` function with the `type` argument set to "l" to create a line plot, connecting
the data points with lines. The `main` argument sets the title of the plot.
- `xlab` and `ylab` specify labels for the X-axis and Y-axis, respectively.
5. Box Plot:
A box plot is a graphical representation of a dataset that displays key statistical information, including the
median and quartiles. It helps visualize the distribution and spread of the data, making it easy to
compare multiple datasets or identify unusual data points.
Here's a simple example of creating a box plot in R with a hypothetical dataset:
# Sample data for three groups
group1 <- c(25, 30, 32, 35, 38, 40, 42, 45, 48, 50)
group2 <- c(20, 22, 24, 26, 28, 30, 32, 34, 36, 38)
group3 <- c(15, 18, 20, 22, 24, 26, 28, 30, 32, 35)
# Create a box plot
boxplot(group1, group2, group3, names = c("Group 1", "Group 2", "Group 3"),
main = "Box Plot Example", ylab = "Values")
In this simple example:
- `group1`, `group2`, and `group3` represent three hypothetical groups or datasets with different
values.
- We use the `boxplot` function to create a box plot. In this function:
- `group1`, `group2`, and `group3` are the data vectors to be plotted.
- `names` is used to label the different groups on the X-axis.
- `main` sets the title of the plot as "Box Plot Example."
- `ylab` is used to label the Y-axis as "Values."
Coercion:
In R programming, coercion refers to the automatic or explicit conversion of data from one data
type to another. This conversion occurs when operations or functions expect data of a particular
type different from the one provided.
Types of Coercion in R:
1. Implicit Coercion (Automatic Coercion):
- Implicit coercion occurs automatically when R needs to perform an operation or calculation
involving different data types.
Example 1 - Implicit Coercion:
# Implicit coercion of 'x' from integer to numeric
x <- 10L
y <- 3.5
result <- x + y # Implicit coercion of 'x' from integer to numeric
print(result) # The result will be a double (numeric) value
In this case, `x` is an integer, and `y` is a numeric. When adding these two variables, `x` gets
implicitly coerced into a numeric to match the data type of `y`, resulting in a numeric value.
# Implicit Coercion in Comparison Operations:
x <- 5
y <- "10"
result <- x > y # Implicit coercion of 'x' from numeric to character, so "5" is compared with "10" as text
print(result) # The result will be a logical value: TRUE (as text, "5" sorts after "10")
2. Explicit Coercion:
- Explicit coercion involves manually converting data from one type to another using conversion
functions like `as.numeric`, `as.character`, `as.integer` and `as.logical`.
1. `as.logical()`:
- Converts data to logical (Boolean) type (`TRUE`/`FALSE`).
- Numeric zero becomes `FALSE`; non-zero values become `TRUE`.
Example:
x <- 0
y <- 5
print(as.logical(x)) # Output: FALSE
print(as.logical(y)) # Output: TRUE

2. `as.integer()`:
- Converts data to integer type.
- Truncates decimal points (doesn't round).
Example:
x <- 3.7
y <- "123"
print(as.integer(x)) # Output: 3
print(as.integer(y)) # Output: 123
3. `as.numeric()`:
- Converts data to numeric data type.
Example:
x <- "3.14"
y <- TRUE
print(as.numeric(x)) # Output: 3.14
print(as.numeric(y)) # Output: 1 (TRUE becomes 1 in numeric)
4. `as.character()`:
- Converts data to character type.
- Numbers are converted to their character representations.
Example:
x <- 123
y <- TRUE
print(as.character(x)) # Output: "123"
print(as.character(y)) # Output: "TRUE"
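Explicit coercion does not always succeed. As a hedged sketch of what happens in those cases, the block below converts values that cannot be interpreted; `suppressWarnings()` is used only to keep the output clean, and the variable name `bad_number` is illustrative.

```r
# Conversions that cannot succeed give NA instead of stopping the script
bad_number <- suppressWarnings(as.numeric("abc"))  # without suppressWarnings() R also prints a warning
print(bad_number)                 # NA
print(as.integer(-3.7))           # -3: truncation is toward zero
print(as.logical("yes"))          # NA: "yes" is not a recognized logical string
print(as.logical("TRUE"))         # TRUE
print(as.numeric("10") + 5)       # 15: convert text to a number before doing arithmetic
```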
Functions:
In R programming, you can define functions using the `function()` keyword and then call those
functions to perform specific tasks or computations. Here's how you can define and call functions
in R:
Defining Functions:
To define a function in R, you use the `function()` keyword followed by a set of parentheses that
can contain input parameters, and then you specify the code block for the function's body. Here's
the basic structure of a function:
function_name <- function(parameter1, parameter2, ...) {
# Function body
# Code to perform a specific task
# Optionally, return a value using 'return()'
}
Here's a simple example of a function that adds two numbers:
add_numbers <- function(a, b) {
result <- a + b
return(result)
}
Calling Functions:
To call a function, you use its name and provide the required arguments (if any) within
parentheses. You can assign the result to a variable if needed.
# Call the 'add_numbers' function
sum_result <- add_numbers(5, 3)
# Print the result
print(sum_result) # Output: [1] 8
In this example, we called the `add_numbers` function with arguments `5` and `3`, and it
returned the sum, which we assigned to the variable `sum_result` and then printed.
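Arguments can also be passed by name, and a parameter can be given a default value. When `return()` is omitted, a function returns the value of its last evaluated expression. A small sketch, where `scale_value` and its parameters are hypothetical names chosen only for illustration:

```r
# Hypothetical helper: 'factor' has a default value of 2
scale_value <- function(value, factor = 2) {
  value * factor   # last expression is returned implicitly
}

r1 <- scale_value(5)                       # uses the default factor
r2 <- scale_value(5, factor = 3)           # named argument overrides the default
r3 <- scale_value(factor = 10, value = 1)  # named arguments can appear in any order

print(r1)  # [1] 10
print(r2)  # [1] 15
print(r3)  # [1] 10
```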
Example:
# Function to calculate factorial
factorial_func <- function(n) {
  result <- 1
  # seq_len(n) is empty when n is 0, so factorial_func(0) returns 1
  for (i in seq_len(n)) {
    result <- result * i
  }
  result
}
number <- 5
result <- factorial_func(number)
print(paste("Factorial of", number, "is", result))
Output:
[1] "Factorial of 5 is 120"
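Base R also ships a built-in `factorial()` function, which makes a convenient cross-check for a hand-written loop. The sketch below defines its own copy of the loop under the hypothetical name `factorial_loop` so that it is self-contained; it iterates with `seq_len()` so that an input of 0 yields 1:

```r
# Hypothetical loop-based factorial, defined here to keep the example self-contained
factorial_loop <- function(n) {
  result <- 1
  for (i in seq_len(n)) {   # seq_len(0) is empty, so the loop is skipped for n = 0
    result <- result * i
  }
  result
}

inputs <- 0:6
loop_values <- sapply(inputs, factorial_loop)
builtin_values <- factorial(inputs)

# all.equal() compares with a small numeric tolerance
print(isTRUE(all.equal(loop_values, builtin_values)))  # [1] TRUE
```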
Conditions and looping:
In R programming, you can use conditions and loops to control the flow of your code and
perform repetitive tasks. Here's an overview of conditions and loops in R:
Conditions (Control Structures):
1. `if` Statements:
The `if` statement allows you to execute a block of code only if a specified condition is `TRUE`.
The basic structure is:
if (condition) {
# Code to execute if the condition is TRUE
}
Example:
x <- 10
if (x > 5) {
print("x is greater than 5")
}
2. `if-else` Statements:
The `if-else` statement allows you to execute one block of code if a condition is `TRUE` and
another block of code if the condition is `FALSE`.
if (condition) {
# Code to execute if the condition is TRUE
} else {
# Code to execute if the condition is FALSE
}
Example:
x <- 3
if (x > 5) {
print("x is greater than 5")
} else {
print("x is not greater than 5")
}
3. `if-else if-else` Statements:
You can use `if-else if-else` statements when you have multiple conditions to check. It allows you
to specify a series of conditions to test.
if (condition1) {
# Code to execute if condition1 is TRUE
} else if (condition2) {
# Code to execute if condition2 is TRUE
} else {
# Code to execute if no conditions are TRUE
}
Example:
x <- 7
if (x < 5) {
print("x is less than 5")
} else if (x == 5) {
print("x is equal to 5")
} else {
print("x is greater than 5")
}
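The same chain of conditions can be wrapped in a function so that it is reusable for different inputs. A sketch, where `compare_to_five` is a hypothetical name; note that `if` expects a single `TRUE`/`FALSE` value, so the function is applied to one number at a time:

```r
# Hypothetical helper that returns a label instead of printing it
compare_to_five <- function(x) {
  if (x < 5) {
    "less than 5"
  } else if (x == 5) {
    "equal to 5"
  } else {
    "greater than 5"
  }
}

# Apply the function to each value in turn
labels <- sapply(c(3, 5, 7), compare_to_five)
print(labels)
```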
Break and Next statements in R:
Break Statement:
The break keyword is a jump statement that is used to terminate the loop at a particular
iteration.
Syntax:
if (test_expression) {
break
}
Example 1:
# R program for break statement in a for loop
no <- 1:10
for (val in no) {
  if (val == 5) {
    print(paste("Coming out of the for loop where val =", val))
    break
  }
  print(paste("Values are:", val))
}
Output:
[1] "Values are: 1"
[1] "Values are: 2"
[1] "Values are: 3"
[1] "Values are: 4"
[1] "Coming out of the for loop where val = 5"
Next Statement:
The next statement is used to skip the current iteration in the loop and move to the next
iteration without exiting from the loop itself.
Syntax:
if (test_condition)
{
next
}
Example:
for (i in 1:5) {
if (i %% 2 == 0) {
next # Skip even numbers
}
print(i)
}
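`break` and `next` can be combined in one loop. The following sketch (the limit of 30 is an arbitrary value chosen for illustration) adds up odd numbers and stops as soon as the running total passes the limit:

```r
# Sum the odd numbers in 1:20, stopping once the running total exceeds 30
total <- 0
for (i in 1:20) {
  if (i %% 2 == 0) {
    next        # skip even numbers and continue with the next i
  }
  total <- total + i
  if (total > 30) {
    break       # leave the loop early
  }
}
print(total)  # [1] 36
print(i)      # [1] 11
```

The loop variable `i` keeps the value it had when `break` ran, which is why it prints 11 rather than 20.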
Loops (Control Structures):
1. `for` Loop:
A `for` loop is used to iterate over a sequence and perform a block of code for each element in
the sequence.
for (variable in sequence) {
# Code to execute for each element in the sequence
}
Example:
for (i in 1:5) {
print(i)
}
2. `while` Loop:
A `while` loop is used to repeatedly execute a block of code as long as a specified condition is
`TRUE`.
while (condition) {
# Code to execute as long as the condition is TRUE
}
Example:
x <- 1
while (x <= 5) {
print(x)
x <- x + 1
}
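A `while` loop is a natural fit when the number of iterations is not known in advance. As a small sketch with made-up starting values:

```r
# Count how many times 100 must be halved before it drops below 1
value <- 100
steps <- 0
while (value >= 1) {
  value <- value / 2
  steps <- steps + 1
}
print(steps)  # [1] 7
print(value)  # [1] 0.78125
```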
Exceptions:
In R programming, you can handle exceptions (errors) that occur during the execution of your
code using the `tryCatch()` function and related constructs. Exception handling is essential for
gracefully managing errors and ensuring that your code can handle unexpected situations
without crashing. Here's an overview of exception handling in R:
`tryCatch()` Function:
The primary mechanism for handling exceptions in R is the `tryCatch()` function. It allows you to
specify code to be executed within a "try" block, and you can also specify how to handle errors in
a "catch" block. The basic syntax is as follows:
result <- tryCatch({
# Code that may produce an error
}, error = function(err) {
# Code to handle the error
})
- The code within the `tryCatch()` block is the code that you want to execute, and it might
produce an error.
- The `error` argument specifies a function to be executed if an error occurs. This function takes
the error object as an argument, which you can inspect for information about the error.
Example
v <- list(1, 2, 4, '0', 5)
for (i in v) {
  tryCatch(
    print(5 / i),
    error = function(e) {
      # 5 / '0' fails because '0' is a character string, not a number
      print("Non-numeric argument")
    }
  )
}
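Besides `error`, `tryCatch()` also accepts a `warning` handler and a `finally` expression that runs whether or not a condition was raised, and `conditionMessage()` extracts the message text from the condition object. A minimal sketch, where `safe_number` is a hypothetical helper written for this illustration:

```r
# Hypothetical helper: converts text to a number, returning NA if anything goes wrong
safe_number <- function(text) {
  tryCatch({
    as.numeric(text)
  }, warning = function(w) {
    # as.numeric("abc") signals a warning; report it and return NA explicitly
    message("Warning caught: ", conditionMessage(w))
    NA_real_
  }, error = function(e) {
    message("Error caught: ", conditionMessage(e))
    NA_real_
  }, finally = {
    # Runs in every case, after the main code or a handler has finished
    message("Finished processing: ", text)
  })
}

ok <- safe_number("3.5")
bad <- safe_number("abc")
print(ok)   # [1] 3.5
print(bad)  # [1] NA

# The error object carries the message passed to stop()
err_text <- tryCatch(stop("custom failure"), error = function(e) conditionMessage(e))
print(err_text)  # [1] "custom failure"
```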
Reading and writing files
1. Reading Text Files:
Syntax:
read.table(file, header = FALSE, sep = "", ...)
Explanation:
- `file`: The file path or connection where the data is located.
- `header`: Boolean, specifying if the first row contains column names.
- `sep`: The separator used in the file (e.g., "," for CSV, "\t" for tab-delimited).
Example:
data <- read.table("data.txt", header = TRUE, sep = "\t")
2. Reading CSV Files:
Syntax:
read.csv(file, header = TRUE, sep = ",", ...)
Explanation:
- `file`: The path or connection where the CSV data is located.
- `header`: Boolean, indicating if the first row contains column names.
- `sep`: The delimiter used in the CSV file (default is "," for comma-separated files).
Example:
data <- read.csv("data.csv")
Writing Files in R:
1. Writing Text Files:
Syntax:
write.table(x, file, sep = " ", ...)
Explanation:
- `x`: The data object (e.g., data frame) to be written.
- `file`: The file path or connection where the data will be written.
- `sep`: The separator to use between entries.
Example:
write.table(data, "output.txt", sep = "\t")
2. Writing CSV Files:
Syntax:
write.csv(x, file, row.names = FALSE, ...)
Explanation:
- `x`: The data object (e.g., data frame) to be written.
- `file`: The file path or connection where the CSV data will be written.
- `row.names`: Boolean indicating if row names should be included. The default is `TRUE`; pass `row.names = FALSE` to omit them.
Example:
write.csv(data, "output.csv")
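The read and write functions above can be checked with a quick round trip. The sketch below uses a small made-up data frame and `tempfile()` paths (both are assumptions for illustration) so that no real file is overwritten:

```r
# Small illustrative data frame (made-up values)
scores <- data.frame(name = c("A", "B", "C"), marks = c(90, 75, 82))

# Write and read back a CSV file; row.names = FALSE avoids an extra index column
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)

# Write and read back a tab-delimited text file
txt_path <- tempfile(fileext = ".txt")
write.table(scores, txt_path, sep = "\t", row.names = FALSE)
from_txt <- read.table(txt_path, header = TRUE, sep = "\t")

print(from_csv)
print(from_txt)
```

Both calls should give back three rows with the columns `name` and `marks`. If `row.names = FALSE` is left out of `write.csv()`, the file gains an extra unnamed first column holding the row names.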
Stacking statements:
In R programming, "stacking statements" typically refers to writing multiple statements on a
single line, separated by semicolons (;). Stacking statements is a way to condense multiple
operations into a single line of code. While this can be convenient for concise code, it can also
make the code less readable and harder to maintain, so it should be used judiciously.
Here is an example of stacking statements:
x <- 5; y <- 10; z <- x + y; print(z)
In this single line of code, four statements are stacked together: two assignments, an addition, and a `print()` call.
While stacking statements can save space and make your code more compact, it's generally
recommended to use separate lines for each statement, especially in cases where the code might
become more complex or when readability is a priority. For example:
x <- 5
y <- 10
z <- x + y
print(z)
Standalone statements with illustrations:
A standalone statement in R is a line of code that performs a specific task or operation
independently of other code.
1. Assignment Statement:
An assignment statement assigns a value to a variable. It's a common standalone statement in R.
x <- 10
In this example, the value `10` is assigned to the variable `x`.
2. Print Statement:
A print statement is used to display the value of a variable or expression in the console.
x <- 10
print(x)
This code assigns `10` to `x` and then prints the value of `x`.
3. Comment Statement:
A comment statement is used to add comments or notes to your code. It does not perform any
computation; it is purely for documentation.
# This
Visibility:
Visibility in R refers to the scope or accessibility of variables and objects within your code. R has a
specific set of rules for how variables can be accessed based on where they are defined.
1. Global Scope: Variables defined in the global scope are accessible from anywhere within the R
environment. They are not confined to a specific function or block of code.
2. Function Scope: Variables defined within a function are only accessible within that function's
scope. They are considered local variables and are not visible outside of the function.
Here is a simple example illustrating visibility:
x <- 10 # Global variable
my_function <- function() {
y <- 5 # Function-level variable
z <- x + y # Accesses the global variable 'x'
print(z)
}
my_function()
In this example, `x` is defined globally and can be accessed within the `my_function()` function,
while `y` is a local variable within the function.
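A related point is what happens when a function assigns to a name that also exists globally. With `<-`, the assignment creates a local variable and the global one is left untouched; the `<<-` operator assigns in the enclosing environment instead. A short sketch, with hypothetical function names:

```r
counter <- 0   # global variable

increment_local <- function() {
  counter <- counter + 1    # creates a local 'counter'; the global is unchanged
  counter
}

increment_global <- function() {
  counter <<- counter + 1   # '<<-' updates 'counter' in the enclosing environment
  counter
}

local_result <- increment_local()
after_local <- counter       # still 0
global_result <- increment_global()
after_global <- counter      # now 1

print(local_result)   # [1] 1
print(after_local)    # [1] 0
print(after_global)   # [1] 1
```

Because `<<-` changes state outside the function, it is usually clearer to return a value and assign it at the call site.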
