
Mod1 R Programming


Module 1

STATISTICAL COMPUTING AND DATA TRANSFORMATION


MR Tool: Installing, loading and updating R packages, Creating objects,
Data types, Data structures, Sorting vectors and data frames, Directory
management commands, Direct data entry in R (for small data sets),
Importing data from other software, Decision structures (if, if-else, if-else
if-else), Repetitive structures (for and while loops), Other functions
(break, next, warn, stop)

Data Wrangling and Cleaning: Transform continuous variables to
categorical variables, Handling missing values, Sub-setting data frames,
Appending and merging data frames, Split data frames, Stack and unstack
data frames

Packages in R
Packages are collections of R functions, data, and compiled code in a well-
defined format. When you install a package it gives you access to a set of
commands that are not available in the base R set of functions. The directory
where packages are stored is called the library. R comes with a standard set of
packages. Others are available for download and installation. Once installed,
they have to be loaded into the session to be used.

The basic information about a package is provided in its DESCRIPTION file,
where you can find out what the package does, who the author is, which version
the documentation belongs to, the date, the type of license, and the
package dependencies. You can view this information from within R.
For example, for the “stats” package:

packageDescription("stats")

help(package = "stats")

What are R Repositories?

A repository is a place where packages are located so you can install them from it.
Although you or your organization might have a local repository, typically, they are
online and accessible to everyone. Three of the most popular repositories for R
packages are:

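CRAN: the official repository. It is a network of ftp and web servers maintained by the
R community around the world. The R foundation coordinates it, and for a package
to be published here, it needs to pass several tests that ensure the package is
following CRAN policies.

Bioconductor: a topic-specific repository of open-source software for bioinformatics.

GitHub: not specific to R, but a popular host for open-source projects, including
development versions of many R packages.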

Installing R Packages From CRAN

To install a package you have to know where to get the package. Most established packages are
available from "CRAN" or the Comprehensive R Archive Network.

Packages are downloaded from specific CRAN "mirrors" where the packages are saved (assuming that a
binary, or set of installation files, is available for your operating system). If you have not set a
preferred CRAN mirror in your options(), then a menu will pop up asking you to choose a location
from which you'd like to install your packages.

To install any package from CRAN, you use install.packages(). You only need to install a package
once (and again after upgrading to a new version of R); after that, you only load it in each session.

For example, the oldest package published in CRAN and still online and
being updated is the vioplot package, from Daniel Adler.

To install it from CRAN, you will need to use:


install.packages("vioplot")

This code installs the "vioplot" package in R:

• The install.packages() function is used to install packages in R.
• In this case, the "vioplot" package is being installed.
• This package provides a violin plot function for visualizing data distributions.

After running this, R prints some messages on the screen. Some of the messages
you might see are:

Installing package into ‘/home/username/R/x86_64-pc-linux-gnu-library/3.3’

(as ‘lib’ is unspecified)
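The install, load and version-check steps described above can be sketched together as follows. This is only an illustration: the variable name pkg is an assumption, and the base package stats is used so that the snippet runs without network access. With a CRAN package such as vioplot, the install.packages() branch would run the first time.

```r
# Name of the package to work with (assumed for illustration)
pkg <- "stats"

# Install only if the package is not already in the library
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package into the current session
library(pkg, character.only = TRUE)

# Show which version is installed
print(packageVersion(pkg))

# Updating all installed packages needs an internet connection,
# so it is left commented out here:
# update.packages(ask = FALSE)
```

Checking with requireNamespace() first avoids re-downloading a package every time the script is run.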


R – Objects



Every programming language has its own data types to store values
or any information so that the user can assign these data types to
the variables and perform operations respectively. Operations are
performed accordingly to the data types.
These data types can be character, integer, float, long, etc. Based
on the data type, memory/storage is allocated to the variable. For
example, in C language character variables are assigned with 1 byte
of memory, integer variable with 2 or 4 bytes of memory and other
data types have different memory allocation for them.
Unlike in many other programming languages, variables in R are not
declared with a data type. A variable is assigned an R object, and
the type is a property of that object.

Data Types
Data values in R come in several different types. We can begin by
considering three fundamental types of data

numeric values (5, 3.14)

character values (“abc”, “Wisconsin”)

logical values (TRUE, FALSE)

The distinction is fundamental because it is common for operators
(+, &) and functions to only work with specific types of data. When
you are creating or debugging an R script, getting the data type
right will be a common theme.

As a very simple example, we can add numbers, but not character
values.

5 + 3.14

[1] 8.14

"abc" + "Wisconsin"

Error in "abc" + "Wisconsin" : non-numeric argument to binary operator
Similarly, we can use the “and” operator (&) with logical values, but
not character values.

TRUE & FALSE

[1] FALSE

"abc" & "Wisconsin"


Error in "abc" & "Wisconsin": operations are possible only for
numeric, logical or complex types
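When an operation fails because of the data type, it helps to inspect the type and convert it explicitly. A small sketch using the base functions class(), is.numeric() and as.numeric(); the variable names are chosen only for illustration:

```r
a <- 5          # numeric
b <- "3.14"     # character, even though it looks like a number
flag <- TRUE    # logical

print(class(a))       # "numeric"
print(class(b))       # "character"
print(class(flag))    # "logical"

# b cannot be used in arithmetic until it is converted
print(is.numeric(b))  # FALSE
b_num <- as.numeric(b)

total <- a + b_num
print(total)          # 8.14
```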

Data structures
There are 6 basic types of data structures in the R language:
Vectors
Atomic vectors are one of the basic types of objects in R
programming. An atomic vector stores elements of a single data type,
such as character, double, integer, raw, logical, or complex. A
single value is also a vector (of length one).
Example:
# Create vectors

x <- c(1, 2, 3, 4)

y <- c("a", "b", "c", "d")

z <- 5

# Print vector and class of vector

print(x)

print(class(x))

print(y)

print(class(y))
print(z)

print(class(z))

Output:
[1] 1 2 3 4
[1] "numeric"
[1] "a" "b" "c" "d"
[1] "character"
[1] 5
[1] "numeric"

Vector Functions in R

• sort(my_vector): returns my_vector sorted
• rev(my_vector): reverses the order of my_vector
• table(my_vector): counts how often each value occurs in my_vector
• unique(my_vector): returns the distinct elements of my_vector
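The four functions listed above can be tried on a small vector. The values in my_vector are made up for illustration:

```r
my_vector <- c(3, 1, 2, 3, 1)

sorted <- sort(my_vector)                          # 1 1 2 3 3
sorted_desc <- sort(my_vector, decreasing = TRUE)  # 3 3 2 1 1
reversed <- rev(my_vector)                         # 1 3 2 1 3
counts <- table(my_vector)                         # 1 occurs twice, 2 once, 3 twice
distinct <- unique(my_vector)                      # 3 1 2 (order of first appearance)

print(sorted)
print(sorted_desc)
print(reversed)
print(counts)
print(distinct)
```

Note that sort() returns a new vector and leaves my_vector itself unchanged.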
Lists
A list is another type of object in R programming. A list can contain
heterogeneous data types, including vectors or other lists.
Example:
# Create list

ls <- list(c(1, 2, 3, 4), list("a", "b", "c"))

# Print

print(ls)

print(class(ls))

Output:
[[1]]
[1] 1 2 3 4

[[2]]
[[2]][[1]]
[1] "a"

[[2]][[2]]
[1] "b"

[[2]][[3]]
[1] "c"

[1] "list"
Matrices
To store values as 2-Dimensional array, matrices are used in R.
Data, number of rows and columns are defined in
the matrix() function.
Syntax:
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames
= NULL)
Example:
x <- c(1, 2, 3, 4, 5, 6)

# Create matrix

mat <- matrix(x, nrow = 2)

print(mat)

print(class(mat))

Output:
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

[1] "matrix" "array"
(R versions before 4.0.0 print only "matrix" for the class.)
Factors
A factor stores a categorical variable. It encodes the given data
vector using its set of unique values (levels), which are sorted
alphabetically by default.
Example:
# Create vector

s <- c("spring", "autumn", "winter", "summer",

"spring", "autumn")

print(factor(s))

print(nlevels(factor(s)))

Output:
[1] spring autumn winter summer spring autumn
Levels: autumn spring summer winter
[1] 4
Arrays
The array() function is used to create an n-dimensional array. Its
dim argument is a vector giving the length of each dimension, and the
data values are recycled if needed to fill the whole array.
Syntax:
array(data, dim = length(data), dimnames = NULL)
Example:
# Create 3-dimensional array

# and filling values by column

arr <- array(c(1, 2, 3), dim = c(3, 3, 3))

print(arr)

Output:
, , 1

     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    2
[3,]    3    3    3

, , 2

     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    2
[3,]    3    3    3

, , 3

     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    2
[3,]    3    3    3
Data Frames

A data frame is a 2-dimensional tabular data object in R
programming. A data frame consists of multiple columns, and each
column is a vector of the same length. Unlike in a matrix, different
columns of a data frame can hold different types of data.
Example:
# Create vectors

x <- 1:5

y <- LETTERS[1:5]

z <- c("Albert", "Bob", "Charlie", "Denver", "Elie")

# Create data frame of vectors

df <- data.frame(x, y, z)

# Print data frame

print(df)

Output:
  x y       z
1 1 A  Albert
2 2 B     Bob
3 3 C Charlie
4 4 D  Denver
5 5 E    Elie
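Sorting a data frame, which the module outline lists next to sorting vectors, is usually done by reordering its rows with order(). The sketch below rebuilds the data frame from the example above; the names df_desc and df_by_name are assumptions made for illustration:

```r
df <- data.frame(x = 1:5,
                 y = LETTERS[1:5],
                 z = c("Albert", "Bob", "Charlie", "Denver", "Elie"))

# Inspect the structure: each column keeps its own type
str(df)
print(dim(df))   # 5 rows, 3 columns

# order() returns row positions; use them to reorder the rows
df_desc <- df[order(df$x, decreasing = TRUE), ]
print(df_desc)

# Sort alphabetically by a character column
df_by_name <- df[order(df$z), ]
print(df_by_name)
```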

Directory management commands


The working directory is just a file path on your computer that sets
the default location of any files you read into R, or save out of R. In
other words, a working directory is like a little flag somewhere on
your computer which is tied to a specific analysis project. If you ask
R to import a dataset from a text file, or save a dataframe as a text
file, it will assume that the file is inside of your working directory.

You can only have one working directory active at any given time.
The active working directory is called your current working directory.

To see your current working directory, use getwd():

# Print my current working directory

getwd()

## [1] "/Users/nphillips/Dropbox/manuscripts/YaRrr/YaRrr_bd"

As you can see, when I run this code, it tells me that my working
directory is a folder called YaRrr_bd. This means that
when I try to read new files into R, or write files out of R, it will
assume that I want to put them in this folder.

If you want to change your working directory, use the setwd()
function. For example, if I wanted to change my working directory to
an existing Dropbox folder called yarrr, I’d run the following code:

# Change my working directory to the following path

setwd(dir = "/Users/nphillips/Dropbox/yarrr")
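A few other base R functions are commonly used together with getwd() and setwd(). The sketch below works inside R's temporary directory so that it can be run anywhere; the folder name demo_project and the file name demo.csv are made up for illustration:

```r
# Remember where we started so we can go back afterwards
old_wd <- getwd()

# Create a new folder inside R's temporary directory and move into it
project_dir <- file.path(tempdir(), "demo_project")
dir.create(project_dir, showWarnings = FALSE)
setwd(project_dir)

# Files are now written to and read from the new working directory
write.csv(data.frame(x = 1:3), "demo.csv", row.names = FALSE)

# List the files in the working directory and check that one exists
files <- list.files()
print(files)
print(file.exists("demo.csv"))

# Return to the original working directory
setwd(old_wd)
```

Building paths with file.path() instead of pasting strings keeps the code portable between operating systems.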

Repetitive structures (for and while loops)


Here is an example of a simple for loop:

# Create a vector filled with random normal values
u1 <- rnorm(30)

print("This loop calculates the square of the first 10 elements of vector u1")

# Initialize `usq`
usq <- 0

for (i in 1:10) {
  # i-th element of `u1` squared into `i`-th position of `usq`
  usq[i] <- u1[i] * u1[i]
  print(usq[i])
}

# After the loop, `i` still holds its last value (10)
print(i)

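A while loop repeats a block of code as long as a condition is true.
The format is while (cond) expr, where cond is the condition to test
and expr is the expression that is run on each iteration.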

Example

Print i as long as i is less than 6:

i <- 1

while (i < 6) {

print(i)

i <- i + 1

}
Other functions:

The word ‘looping’ means cycling or iterating. Jump statements are
used in loops to terminate the loop at a particular iteration or to
skip a particular iteration in the loop. The two most commonly used
jump statements in loops are:
• Break Statement
• Next Statement
Note: What other languages call the continue statement is called the
next statement in R.
Both statements alter the normal flow of a running loop: break moves
control to the first statement after the loop, while next jumps
straight to the next iteration. In R, for and while loops run a block
of statements repeatedly until the sequence is exhausted or the loop
condition becomes false, and a repeat loop runs until it is ended by
break.
The break Statement in R is a jump statement that is used to
terminate the loop at a particular iteration.
Syntax:
if (test_expression) {
break
}
Break Statement in R using For-loop
# R program for break statement in For-loop
no <- 1:10

for (val in no) {
  if (val == 5) {
    print(paste("Coming out from for loop Where i =", val))
    break
  }
  print(paste("Values are:", val))
}

Output:
[1] "Values are: 1"
[1] "Values are: 2"
[1] "Values are: 3"
[1] "Values are: 4"
[1] "Coming out from for loop Where i = 5"
Break statement in R using While-loop
# R Break Statement Example
a <- 1

while (a < 10) {
  print(a)
  if (a == 5) {
    break
  }
  a <- a + 1
}

Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
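A repeat loop has no condition of its own, so break is the only way to leave it. A minimal sketch (the variable name count is chosen only for illustration):

```r
count <- 0

repeat {
  count <- count + 1
  print(count)

  # Without this break the loop would run forever
  if (count >= 3) {
    break
  }
}
```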
Next Statement in R
The next statement in R is used to skip the current iteration in the
loop and move to the next iteration without exiting from the loop
itself.
Syntax:
if (test_condition)
{
next
}

Next statement in R using For-loop

# R Next Statement Example
no <- 1:10

for (val in no) {
  if (val == 6) {
    print(paste("Skipping for loop Where i =", val))
    next
  }
  print(paste("Values are:", val))
}

Output:
[1] "Values are: 1"
[1] "Values are: 2"
[1] "Values are: 3"
[1] "Values are: 4"
[1] "Values are: 5"
[1] "Skipping for loop Where i = 6"
[1] "Values are: 7"
[1] "Values are: 8"
[1] "Values are: 9"
[1] "Values are: 10"
Next statement in R using While-loop

# R Next Statement Example
x <- 1

while (x < 5) {
  x <- x + 1
  if (x == 3) {
    next
  }
  print(x)
}

Output:
[1] 2
[1] 4
[1] 5

message() vs. warning() vs. stop() Functions in R


Definitions: You can find the definitions of the message, warning, and stop functions below.

• The message R function generates a diagnostic message.
• The warning R function generates a warning message.
• The stop R function generates an error message and stops executing the current R code.

Basic R Syntaxes: You can find the basic R programming syntaxes of the message, warning,
and stop functions below.

message(any_string)    # Basic R syntax of message function
warning(any_string)    # Basic R syntax of warning function
stop(any_string)       # Basic R syntax of stop function

Example 1: Apply message() Function in R

Example 1 explains how to use the message function in the R programming language. Within
the message function, we need to specify a character string that should be returned as
diagnostic message to the RStudio console:

message("This is a message") # Using message function


# This is a message

Have a look at the previous output of the RStudio console: We returned a diagnostic message.

Example 2: Apply warning() Function in R


In this Example, I’ll show how to apply the warning function. Similar to the message
function, we need to give a character string as input for the warning command:

warning("This is a warning message") # Using warning function


# Warning message:
# This is a warning message

By comparing the previous RStudio console output with the output of Example 1, you can see
the major difference between the message and warning functions: The warning function
returns another line of output saying “Warning message”. This indicates that there might be a
problem with the code or its input, although execution continues.

Example 3: Apply stop() Function in R

The following R code illustrates how to generate error messages using the stop function.
Again, we need to pass a character string to the stop function:

stop("This is an error message") # Using stop function


# Error: This is an error message

As you can see, the stop function added the term “Error:” in front of our character string.

However, this is not the only difference of the stop function compared to the message and
warning functions. So keep on reading!

Example 4: Using stop() Function to Interrupt Process

The following R syntax shows another important feature of the stop function: The stop
function stops the execution of the currently running R code. It is illustrated with a
for-loop consisting of ten iterations:

for(i in 1:10) {
# For-loop containing error condition

if(i != 5) {
print(paste("Finished loop iteration No.", i))
}

if(i == 5) {
stop("i was equal to 5!")
}
}
# [1] "Finished loop iteration No. 1"
# [1] "Finished loop iteration No. 2"
# [1] "Finished loop iteration No. 3"
# [1] "Finished loop iteration No. 4"
# Error: i was equal to 5!
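The three functions are often combined inside a user-defined function to report different levels of severity. The sketch below is one possible way to do this; the function name safe_sqrt is hypothetical, and try() is used only so that the deliberate error does not halt the rest of the script:

```r
# Hypothetical helper: square root with input checks
safe_sqrt <- function(x) {
  # Wrong type: nothing sensible can be done, so raise an error
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }
  # Suspicious values: warn, but carry on
  if (any(x < 0, na.rm = TRUE)) {
    warning("negative values were replaced by NA")
    x[x < 0] <- NA
  }
  # Purely informational
  message("Computing square roots of ", length(x), " value(s)")
  sqrt(x)
}

ok <- safe_sqrt(c(4, 9))                          # message only
partial <- safe_sqrt(c(4, -1))                    # warning and message
failed <- try(safe_sqrt("four"), silent = TRUE)   # error, caught by try()

print(ok)
print(partial)
print(class(failed))
```

The first call returns 2 and 3, the second returns 2 and NA, and the third never reaches sqrt() because stop() ends the function immediately.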

Data wrangling and Cleaning


Data wrangling is the art of getting your data into R in a useful form for visualisation and
modelling.

We can divide data into two general categories: continuous and categorical. Continuous data
is numeric, has a natural order, and can potentially take on an infinite number of values.
Examples include age, income, and health care expenditures. In contrast, categorical data
takes on a limited number of values and may or may not have a natural order. Examples
without a natural order include race, state of residence, and political affiliation. Examples
with a natural order include Likert scale items (e.g., disagree, neutral, agree), socioeconomic
status, and educational attainment.

The distinction between continuous and categorical variables is fundamental to how we use
them in the analysis. For example, in a regression model, continuous variables give us slopes
while categorical variables give us intercepts.

In R, categorical data is managed as factors. We specify which variables are factors when we
create and store them, and then they are treated as categorical variables in a model without
any additional specification.

Transform continuous variables to categorical variables


cut() function in R

cut() function in R Programming Language is used to divide a numeric vector into different
ranges. It is particularly useful when we want to convert a numeric variable into a categorical
one by dividing it into intervals or bins.

Syntax:
cut.default(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE, dig.lab = 3)

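Parameters:
x: numeric vector to be divided into intervals
breaks: either a vector of break points or a single number giving the number of intervals
labels: labels for the levels of the resulting factor
include.lowest: logical value; if TRUE, a value equal to the lowest break (or the highest,
when right = FALSE) is included in the first (or last) interval
right: logical value; if TRUE, intervals are closed on the right and open on the left
dig.lab: number of digits used to format the break numbers when labels are not provided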


Create a numeric vector and apply cut() Function

# Create a numeric vector

ages <- c(18, 25, 35, 40, 50, 60, 70, 80, 90)

# Cut the vector into age groups

age_groups <- cut(ages, breaks = c(0, 25, 50, 75, 100),

labels = c("18-25", "26-50", "51-75", "76-100"))

# Print the result

print(table(age_groups))

Output:
age_groups
 18-25  26-50  51-75 76-100
     2      3      2      2
Cut Vector Using Specific Break Points and Labels
# Create a numeric vector

ages <- c(18, 25, 35, 40, 50, 60, 70, 80, 90)

# Cut the vector into age groups

age_groups <- cut(ages, breaks = c(0, 25, 50, 75, 100),

labels = c("18-25", "26-50", "51-75", "76-100"))

# Create a data frame with the result

result_df <- data.frame(AgeGroup = levels(age_groups),
                        Count = table(age_groups))

# Print the result
print(result_df)

Output:
  AgeGroup Count.age_groups Count.Freq
1    18-25            18-25          2
2    26-50            26-50          3
3    51-75            51-75          2
4   76-100           76-100          2
Create a data frame and apply cut() Function
# R program to divide vector into ranges

# Creating vectors

age <- c(40, 49, 48, 40, 67, 52, 53)

salary <- c(103200, 106200, 150200, 10606, 10390, 14070, 10220)

gender <- c("male", "male", "transgender",

"female", "male", "female", "transgender")

# Creating data frame named employee

employee<- data.frame(age, salary, gender)

# Creating a factor corresponding to age with labels

wfact = cut(employee$age, 3, labels = c('Young', 'Medium', 'Aged'))

table(wfact)

Output:
wfact
 Young Medium   Aged
     4      2      1
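The include.lowest and right parameters decide which interval a value falls into when it lies exactly on a break point. The sketch below illustrates this with made-up scores; the variable names are assumptions:

```r
scores <- c(0, 50, 50.5, 100)
brks <- c(0, 50, 100)

# Default (right = TRUE): intervals are (0,50] and (50,100],
# so the value 0 lies outside every interval and becomes NA
default_bins <- cut(scores, breaks = brks)

# include.lowest = TRUE closes the first interval on the left: [0,50]
closed_bins <- cut(scores, breaks = brks, include.lowest = TRUE)

# right = FALSE: intervals are [0,50) and [50,100),
# so 50 moves to the second interval and 100 becomes NA
left_bins <- cut(scores, breaks = brks, right = FALSE)

print(default_bins)
print(closed_bins)
print(left_bins)
```

Values that fall outside all intervals are returned as NA, so it is worth checking the result with table(..., useNA = "ifany") after binning.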
Missing values
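In R, missing values are represented by NA (Not Available). NA is a
recognized element of a vector, and most computations that involve an
NA also return NA.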

Finding missing values in a vector

# Create vector

x <- c(4, 2, 7, NA)

# Find missing values in vector:

is.na(x)

# Remove missing values (two alternative ways)

na.omit(x)
x[!is.na(x)]
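Besides removing missing values, it is common to count them, skip them inside a calculation, or replace them. The sketch below shows these options with base R only; the data frame df and the choice of mean imputation are assumptions made for illustration, not a recommendation:

```r
x <- c(4, 2, 7, NA)

# Count the missing values
n_missing <- sum(is.na(x))
print(n_missing)

# Most summary functions return NA unless told to skip missing values
print(mean(x))                 # NA
print(mean(x, na.rm = TRUE))   # mean of 4, 2 and 7

# Replace missing values, here with the mean of the observed values
x_filled <- x
x_filled[is.na(x_filled)] <- mean(x, na.rm = TRUE)
print(x_filled)

# In a data frame, keep only the rows without any missing value
df <- data.frame(id = 1:4, score = x)
complete <- df[complete.cases(df), ]
print(complete)
```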

Subsetting in R Programming



In R Programming Language, subsetting allows the user to access
elements from an object. It extracts a portion of the object
based on the indexes or conditions provided. There are 4 ways of
subsetting in R programming. Which method to use depends on the task
and on the type of object. For example, if there is a dataframe
with many columns such as states, country, and population and
the user wants to extract only the states column, then subsetting is
used to do this operation. This section discusses the
implementation of the different types of subsetting in R programming.
R – subsetting
Method 1: Subsetting in R Using [ ] Operator

Using the ‘[ ]’ operator, elements of vectors and observations from
data frames can be accessed. To exclude some indexes, prefix them
with ‘-’; all other elements of the vector or data frame are returned.
Example 1:
In this example, let us create a vector and perform subsetting using
the [ ] operator.

# Create vector

x <- 1:15

# Print vector

cat("Original vector: ", x, "\n")

# Subsetting vector

cat("First 5 values of vector: ", x[1:5], "\n")

cat("Without values present at index 1, 2 and 3: ",

x[-c(1, 2, 3)], "\n")

Output:
Original vector: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
First 5 values of vector: 1 2 3 4 5
Without values present at index 1, 2 and 3: 4 5 6 7 8 9 10 11 12 13 14 15
Example 2:
In this example, let us use the mtcars data frame that ships with R
(in the datasets package) for subsetting.

# Dataset

cat("Original dataset: \n")

print(mtcars)

# Subsetting data frame

cat("HP values of all cars:\n")

print(mtcars['hp'])
# First 10 cars

cat("Without mpg and cyl column:\n")

print(mtcars[1:10, -c(1, 2)])

Output:
Original dataset:
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

HP values of all cars:


hp
Mazda RX4 110
Mazda RX4 Wag 110
Datsun 710 93
Hornet 4 Drive 110
Hornet Sportabout 175
Valiant 105
Duster 360 245
Merc 240D 62
Merc 230 95
Merc 280 123
Merc 280C 123
Merc 450SE 180
Merc 450SL 180
Merc 450SLC 180
Cadillac Fleetwood 205
Lincoln Continental 215
Chrysler Imperial 230
Fiat 128 66
Honda Civic 52
Toyota Corolla 65
Toyota Corona 97
Dodge Challenger 150
AMC Javelin 150
Camaro Z28 245
Pontiac Firebird 175
Fiat X1-9 66
Porsche 914-2 91
Lotus Europa 113
Ford Pantera L 264
Ferrari Dino 175
Maserati Bora 335
Volvo 142E 109
Without mpg and cyl column:
disp hp drat wt qsec vs am gear carb
Mazda RX4 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 167.6 123 3.92 3.440 18.30 1 0 4 4

Method 2: Subsetting in R Using [[ ]] Operator

The [[ ]] operator is used for subsetting list objects. It works like
the [ ] operator, with one difference: [[ ]] selects exactly one
element and returns the element itself, whereas the [ ] operator can
select more than one element in a single command and returns a list.
Example 1: In this example, let us create a list and select the
elements using [[]] operator.
# Create list

ls <- list(a = 1, b = 2, c = 10, d = 20)

# Print list

cat("Original List: \n")

print(ls)

# Select first element of list

cat("First element of list: ", ls[[1]], "\n")

Output:
Original List:
$a
[1] 1

$b
[1] 2

$c
[1] 10

$d
[1] 20

First element of list: 1
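The difference between the two operators is easiest to see by comparing what each one returns for the same list. A small sketch; the list name scores and its contents are chosen only for illustration:

```r
scores <- list(a = 1, b = 2, c = 10, d = 20)

# [ ] can select several elements and always returns a list
sub_list <- scores[c("a", "c")]
print(class(sub_list))    # "list"
print(length(sub_list))   # 2

# Even with a single name, [ ] still returns a list of length one
one_as_list <- scores["c"]
print(class(one_as_list)) # "list"

# [[ ]] selects exactly one element and returns the element itself
single <- scores[["c"]]
print(class(single))      # "numeric"
print(single + 1)         # arithmetic works on the extracted value
```

Arithmetic such as scores["c"] + 1 fails, because the left-hand side is still a list; scores[["c"]] + 1 works.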

Method 3: Subsetting in R Using $ Operator

The $ operator can be used for lists and data frames in R. Unlike the
[ ] operator, it selects only a single element at a time. It can be
used to access an element in a named list or a column in a data frame.
The $ operator is only applicable to recursive (list-like) objects.
Example 1: In this example, let us create a named list and access
the elements using $ operator
# Create list

ls <- list(a = 1, b = 2, c = "Hello", d = "GFG")

# Print list

cat("Original list:\n")

print(ls)

# Print "GFG" using $ operator

cat("Using $ operator:\n")

print(ls$d)

Output:
Original list:
$a
[1] 1

$b
[1] 2

$c
[1] "Hello"

$d
[1] "GFG"

Using $ operator:
[1] "GFG"
Example 2: In this example, let us use the mtcars dataframe and
select a particular column using $ operator.
# Dataset

cat("Original data frame:\n")

print(mtcars)

# Access hp column

cat("Using $ operator:\n")

print(mtcars$hp)

Output:
Original data frame:
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Using $ operator:
 [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66  52
[20]  65  97 150 150 245 175  66  91 113 264 175 335 109

Method 4: Subsetting in R Using subset() Function

The subset() function in R programming is used to create a subset of
vectors, matrices, or data frames based on the conditions provided
in the parameters.
Syntax: subset(x, subset, select)
Parameters:
• x: indicates the object
• subset: indicates the logical expression on the basis of
which subsetting has to be done
• select: indicates columns to select

Example 1: In this example, let us use the airquality data frame that
ships with R (datasets package) and select Month where Temp < 65.
# Subsetting

airq <- subset(airquality, Temp < 65,

select = c(Month))

# Print subset

print(airq)

Output:
Month
4 5
5 5
8 5
9 5
15 5
16 5
18 5
20 5
21 5
23 5
24 5
25 5
26 5
27 5
144 9
148 9
Example 2: In this example, let us use the mtcars data frame that ships
with R (datasets package) and select the cars with 5 gears and hp > 200.
# Subsetting
mtc <- subset(mtcars, gear == 5 & hp > 200,

select = c(gear, hp))

# Print subset

print(mtc)

Output:
gear hp
Ford Pantera L 5 264
Maserati Bora 5 335

How to merge dataframes in R ?




This section shows how to perform inner, outer, left, and right joins
on two dataframes in R Programming Language.
Functions Used
merge() function is used to merge or join two tables. With
appropriate values provided to specific parameters, we can create
the desired join.
Syntax: merge(x, y, by, by.x, by.y, all, all.x, all.y, sort = TRUE)
Parameters:
x: one dataframe
y: another dataframe
by, by.x, by.y: The names of the columns used for matching. by is used
when the key columns have the same name in both dataframes (by default,
all columns common to x and y); by.x and by.y are used when the key
columns have different names.
all, all.x, all.y: Logical values that specify the type of merge.
all.x = TRUE keeps every row of x, all.y = TRUE keeps every row of y,
and all = TRUE sets both.
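When the key column has a different name in each data frame, merge() accepts the by.x and by.y arguments to name the key on each side. The sketch below uses two small made-up data frames (students and marks are hypothetical names) and shows how the all arguments change the number of rows returned:

```r
students <- data.frame(student_id = c(1, 2, 3),
                       name = c("A", "B", "C"))
marks <- data.frame(id = c(2, 3, 4),
                    marks = c(78, 98, 67))

# Inner join: only ids present in both data frames (2 and 3)
inner <- merge(students, marks, by.x = "student_id", by.y = "id")

# Left join: every row of students, NA marks where there is no match
left <- merge(students, marks, by.x = "student_id", by.y = "id",
              all.x = TRUE)

# Full outer join: every id from either data frame
full <- merge(students, marks, by.x = "student_id", by.y = "id",
              all = TRUE)

print(inner)
print(left)
print(full)
```

The key column in the result takes its name from the first data frame, here student_id.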

Inner join
An inner join, also known as a natural join, merges the two dataframes
into one that contains only the rows whose key values appear in both.
For this, the merge() function is simply given the two dataframes, and
the result is built on the basis of their common column.

Syntax:
merge(x = dataframe 1, y = data frame 2)
Example
# create data frame 1 with id ,

# name and address

df1=data.frame(id=c(7058,7059,7072,7075),

name=c("bobby","pinkey","harsha","deepika"),

address=c("kakumanu","hyd","tenali","chebrolu"))

# create data frame 2 with id ,

df2=data.frame(id=c(7058,7059,7072,7075,7062,7063),

marks=c(90,78,98,67,89,90))

# display dataframe1

print(df1)

# display dataframe2

print(df2)

print("Inner join")

# inner join

print(merge(x = df1, y = df2))


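Output of the merge() call:
    id    name  address marks
1 7058   bobby kakumanu    90
2 7059  pinkey      hyd    78
3 7072  harsha   tenali    98
4 7075 deepika chebrolu    67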

Outer Join
An outer join (full join) keeps all the rows of both data frames,
matching them where possible. For this, the data frames are passed to
the merge() function along with the all parameter set to TRUE.

Syntax:
merge(x = data frame 1, y = data frame 2, all = TRUE)
Example:
# create data frame 1 with id , name and address

df1=data.frame(id=c(7058,7059,7072,7075),

name=c("bobby","pinkey","harsha","deepika"),
address=c("kakumanu","hyd","tenali","chebrolu"))

# create data frame 2 with id , marks

df2=data.frame(id=c(7058,7059,7072,7075,7062,7063),

marks=c(90,78,98,67,89,90))

# display dataframe1

print(df1)

# display dataframe2

print(df2)

print("Outer join")

# outer join

print(merge(x = df1, y = df2,all=TRUE))

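Output of the merge() call:
    id    name  address marks
1 7058   bobby kakumanu    90
2 7059  pinkey      hyd    78
3 7062    <NA>     <NA>    89
4 7063    <NA>     <NA>    90
5 7072  harsha   tenali    98
6 7075 deepika chebrolu    67
Note: Rows without a match are filled with NA in the columns that come
from the other data frame.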
Left Join
A left join keeps all the rows of the first data frame and adds the
matching values from the second data frame. For this, the data frames
are passed to merge() with the all.x parameter, which refers to the
left table, set to TRUE.

Syntax:
merge(x = data frame 1, y = data frame 2, all.x = TRUE)
Example:
# create data frame 1 with id , name and address
df1=data.frame(id=c(7058,7059,7072,7075),

name=c("bobby","pinkey","harsha","deepika"),

address=c("kakumanu","hyd","tenali","chebrolu"))

# create data frame 2 with id , marks

df2=data.frame(id=c(7058,7059,7072,7075,7062,7063),

marks=c(90,78,98,67,89,90))

# display dataframe1

print(df1)

# display dataframe2

print(df2)

print("Left join")

# Left join

print(merge(x = df1, y = df2,all.x=TRUE))

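Output of the merge() call:
    id    name  address marks
1 7058   bobby kakumanu    90
2 7059  pinkey      hyd    78
3 7072  harsha   tenali    98
4 7075 deepika chebrolu    67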
Right Join
A right join keeps all the rows of the second data frame and adds the
matching values from the first data frame. For this, the data frames
are passed to merge() with the all.y parameter, which refers to the
right table, set to TRUE.

Syntax:
merge(x = data frame 1, y = data frame 2, all.y = TRUE)
Example:
# create data frame 1 with id , name and address

df1=data.frame(id=c(7058,7059,7072,7075),

name=c("bobby","pinkey","harsha","deepika"),

address=c("kakumanu","hyd","tenali","chebrolu"))

# create data frame 2 with id , marks

df2=data.frame(id=c(7058,7059,7072,7075,7062,7063),
marks=c(90,78,98,67,89,90))

# display dataframe1

print(df1)

# display dataframe2

print(df2)

print("Right join")

# Right join

print(merge(x = df1, y = df2,all.y=TRUE))

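Output of the merge() call:
    id    name  address marks
1 7058   bobby kakumanu    90
2 7059  pinkey      hyd    78
3 7062    <NA>     <NA>    89
4 7063    <NA>     <NA>    90
5 7072  harsha   tenali    98
6 7075 deepika chebrolu    67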

How to split DataFrame in R





In this section, we will discuss how to split a dataframe in R
programming language.
A dataframe can be split into subsets of adjacent or non-adjacent
rows and columns. The rows and columns of the dataframe can
be referenced by index as well as by name. Multiple rows and
columns can be selected at once using the c() function in base R.

Splitting dataframe by row


Splitting dataframe by row indexes
The dataframe cells can be referenced using the row and column
names and indexes.
Syntax:
data-frame[start-row-num:end-row-num,]
The row numbers are retained in the final output dataframe.
Example: Splitting dataframe by row
R
# create the data frame
data_frame1 <- data.frame(col1 = c(rep('Grp1', 2),
                                   rep('Grp2', 2),
                                   rep('Grp3', 2)),
                          col2 = rep(1:3, 2),
                          col3 = rep(1:2, 3))

print("Original DataFrame")
print(data_frame1)

# extracting first four rows
data_frame_mod <- data_frame1[1:4, ]

print("Modified DataFrame")
print(data_frame_mod)
Output:
[1] "Original DataFrame"
col1 col2 col3
1 Grp1 1 1
2 Grp1 2 2
3 Grp2 3 1
4 Grp2 1 2
5 Grp3 2 1
6 Grp3 3 2
[1] "Modified DataFrame"
col1 col2 col3
1 Grp1 1 1
2 Grp1 2 2
3 Grp2 3 1
4 Grp2 1 2
Example: Splitting dataframe by a single row index
R
# create the data frame
data_frame1 <- data.frame(col1 = c(rep('Grp1', 2),
                                   rep('Grp2', 2),
                                   rep('Grp3', 2)),
                          col2 = rep(1:3, 2),
                          col3 = rep(1:2, 3))

print("Original DataFrame")
print(data_frame1)

# extracting the sixth row
data_frame_mod <- data_frame1[6, ]

print("Modified DataFrame")
print(data_frame_mod)

Output:
[1] "Original DataFrame"
col1 col2 col3
1 Grp1 1 1
2 Grp1 2 2
3 Grp2 3 1
4 Grp2 1 2
5 Grp3 2 1
6 Grp3 3 2
[1] "Modified DataFrame"
col1 col2 col3
6 Grp3 3 2
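
Rows do not have to be contiguous. As a minimal sketch of a random
split, the code below draws row numbers with sample() and uses
negative indexing to collect the remaining rows. The seed value and
the names picked, part_a and part_b are illustrative choices, not
part of the original example.

```r
data_frame1 <- data.frame(col1 = rep(c("Grp1", "Grp2", "Grp3"), each = 2),
                          col2 = rep(1:3, 2),
                          col3 = rep(1:2, 3))

set.seed(42)  # arbitrary seed, only so the split is repeatable

# pick four row numbers at random, without replacement
picked <- sample(nrow(data_frame1), size = 4)

part_a <- data_frame1[picked, ]    # the randomly chosen rows
part_b <- data_frame1[-picked, ]   # all remaining rows

print(part_a)
print(part_b)
```

As with the contiguous case, the original row numbers are kept as
row names in both parts.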

Splitting dataframe by column


Splitting dataframe by column names
The dataframe can also be referenced using the column names.
Multiple columns can be selected by passing their names as strings
to c(). The selected columns do not need to be adjacent.

Syntax:
data-frame[,c("col1", "col2",...)]
Example: splitting dataframe by column names
R
# create the data frame
data_frame1 <- data.frame(col1 = c(rep('Grp1', 2),
                                   rep('Grp2', 2),
                                   rep('Grp3', 2)),
                          col2 = rep(1:3, 2),
                          col3 = rep(1:2, 3),
                          col4 = letters[1:6])

print("Original DataFrame")
print(data_frame1)

# extracting columns col2 and col4 by name
data_frame_mod <- data_frame1[, c("col2", "col4")]

print("Modified DataFrame")
print(data_frame_mod)

Output:
[1] "Original DataFrame"
col1 col2 col3 col4
1 Grp1 1 1 a
2 Grp1 2 2 b
3 Grp2 3 1 c
4 Grp2 1 2 d
5 Grp3 2 1 e
6 Grp3 3 2 f
[1] "Modified DataFrame"
col2 col4
1 1 a
2 2 b
3 3 c
4 1 d
5 2 e
6 3 f
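
One detail worth knowing when selecting by name: asking for a single
column with data-frame[, "name"] returns a plain vector, not a data
frame. The sketch below shows the drop = FALSE argument of the `[`
operator, which keeps the result as a one-column data frame. The
names as_vector and as_frame are illustrative.

```r
data_frame1 <- data.frame(col1 = rep(c("Grp1", "Grp2", "Grp3"), each = 2),
                          col2 = rep(1:3, 2),
                          col3 = rep(1:2, 3),
                          col4 = letters[1:6])

# a single column is simplified to a vector by default
as_vector <- data_frame1[, "col2"]

# drop = FALSE keeps the data frame structure
as_frame <- data_frame1[, "col2", drop = FALSE]

print(is.data.frame(as_vector))
print(is.data.frame(as_frame))
```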
Splitting dataframe by column indices
The dataframe can also be referenced using the column indices.
Individual, as well as multiple columns, can be extracted from the
dataframe by specifying the column position.
Syntax:
data-frame[,start-col-num:end-col-num]
Example: Split dataframe by column indices
R
# create the data frame
data_frame1 <- data.frame(col1 = c(rep('Grp1', 2),
                                   rep('Grp2', 2),
                                   rep('Grp3', 2)),
                          col2 = rep(1:3, 2),
                          col3 = rep(1:2, 3),
                          col4 = letters[1:6])

print("Original DataFrame")
print(data_frame1)

# extracting last two columns
data_frame_mod <- data_frame1[, 3:4]

print("Modified DataFrame")
print(data_frame_mod)

Output:
[1] "Original DataFrame"
col1 col2 col3 col4
1 Grp1 1 1 a
2 Grp1 2 2 b
3 Grp2 3 1 c
4 Grp2 1 2 d
5 Grp3 2 1 e
6 Grp3 3 2 f
[1] "Modified DataFrame"
col3 col4
1 1 a
2 2 b
3 1 c
4 2 d
5 1 e
6 2 f
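
Besides index-based selection, base R also has a split() function
that partitions the rows of a data frame by the values of a grouping
variable and returns a named list of data frames. The sketch below
applies it to the col1 column of the example data; the name groups
is an illustrative choice.

```r
data_frame1 <- data.frame(col1 = rep(c("Grp1", "Grp2", "Grp3"), each = 2),
                          col2 = rep(1:3, 2),
                          col3 = rep(1:2, 3),
                          col4 = letters[1:6])

# one list element per distinct value of col1
groups <- split(data_frame1, data_frame1$col1)

print(names(groups))
print(groups$Grp2)
```

Each element of the list is itself a data frame, so it can be
indexed by row or column in the same way as the examples above.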

Stack and unstack in R


stack() and unstack() are two useful functions in R. Stacking
concatenates multiple vectors into a single vector, together with
a factor indicating where each observation originated, using the
stack() function. Unstacking reverses this operation using the
unstack() function.

• The stack() function stacks a data set, i.e. it converts a
data set from unstacked form to stacked form.
• The unstack() function unstacks a data set, i.e. it
converts the data set from stacked form to unstacked
form.
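
As a small illustration of the two forms, the sketch below stacks a
named list of two vectors and then unstacks the result. The list
scores and its values are made up for this example.

```r
# two hypothetical measurement vectors, made up for illustration
scores <- list(a = c(1.5, 2.5, 3.5), b = c(10, 20, 30))

# stacked (long) form: one column of values, one factor naming the source
long <- stack(scores)
print(long)

# unstacked (wide) form: one column per original vector
wide <- unstack(long)
print(wide)
```

The stacked result always uses the column names values and ind,
whatever the original vectors were called.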

Syntax for stack and unstack function in R:


stack(dataframe)
unstack(dataframe)
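Stack and unstack function in R:
Let's use the built-in “PlantGrowth” data set to demonstrate the
unstack function in R. PlantGrowth is in stacked form: it has 30
rows and two columns, weight (numeric) and group (a factor with
levels ctrl, trt1 and trt2).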

Example of unstack function in R:
unstack() function takes up the dataframe as argument and
unstacks the dataframe as shown below.

# unstack function in R
df = PlantGrowth
unstacked_df = unstack(df)
unstacked_df

In the above example, the unstack() function converts the data
from stacked form to unstacked form, so the output will be

   ctrl trt1 trt2
1  4.17 4.81 6.31
2  5.58 4.17 5.12
3  5.18 4.41 5.54
4  6.11 3.59 5.50
5  4.50 5.87 5.37
6  4.61 3.83 5.29
7  5.17 6.03 4.92
8  4.53 4.89 6.15
9  5.33 4.32 5.80
10 5.14 4.69 5.26

unstack function in R by subsetting or selecting specific rows
Let's use the “PlantGrowth” data frame to demonstrate the unstack()
function in R. The code below selects the first twenty rows of
“PlantGrowth” and unstacks them.

# unstack function in R
df <- PlantGrowth
unstacked_list = unstack(df[c(1:20), ])
unstacked_list

The code above unstacks the subset. Because the first twenty rows
contain no “trt2” observations, the groups have unequal lengths and
unstack() returns a list instead of a data frame.
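
One possible way to get a data frame back from such a subset is to
remove the empty factor level with droplevels() before unstacking.
This is a sketch rather than part of the original example, and the
names subset20, as_list and as_df are illustrative.

```r
# first twenty rows: only the ctrl and trt1 groups are present
subset20 <- PlantGrowth[1:20, ]

# the unused trt2 level is still attached, so the groups differ in length
as_list <- unstack(subset20)

# dropping the unused level leaves two groups of equal length
as_df <- unstack(droplevels(subset20))

print(is.data.frame(as_list))
print(is.data.frame(as_df))
print(names(as_df))
```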

Example of stack function in R:
Let's use the unstacked data frame from above to demonstrate the
stack() function in R.

# stack function in R
stacked_df = stack(unstacked_df)
stacked_df

The code above stacks the data frame back into stacked form. The
result has 30 rows and two columns: values (the measurements) and
ind (a factor naming the column each value came from).
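
The round trip can be checked against the original data. The sketch
below rebuilds unstacked_df from PlantGrowth so that it runs on its
own, then compares the stacked result with the original columns.
Only the column names differ: stack() uses values and ind where
PlantGrowth uses weight and group.

```r
unstacked_df <- unstack(PlantGrowth)
stacked_df <- stack(unstacked_df)

print(names(stacked_df))

# same measurements, in the same order, as the original weight column
print(all.equal(stacked_df$values, PlantGrowth$weight))

# same group labels as the original group column
print(all(as.character(stacked_df$ind) == as.character(PlantGrowth$group)))
```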
Stack function in R by subsetting or selecting specific columns
Let's use the “unstacked_df” data frame to demonstrate the stack()
function with the select argument in R. Here stack() takes
“unstacked_df” and selects all the columns except the “ctrl” column.

# stack function in R
stacked_df1 = stack(unstacked_df, select = -ctrl)
stacked_df1

The code above stacks every column except “ctrl”, so the output
contains only the 20 values from “trt1” and “trt2”.
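
The select argument also accepts the columns to keep instead of the
column to drop. As a sketch, the code below rebuilds unstacked_df so
that it runs on its own and compares the two spellings; the names
by_exclusion and by_inclusion are illustrative.

```r
unstacked_df <- unstack(PlantGrowth)

# drop ctrl by negative selection
by_exclusion <- stack(unstacked_df, select = -ctrl)

# keep trt1 and trt2 by naming them
by_inclusion <- stack(unstacked_df, select = c(trt1, trt2))

print(nrow(by_exclusion))
print(unique(as.character(by_exclusion$ind)))
print(identical(by_exclusion, by_inclusion))
```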
Hands on:

1. Write an R program to get the first 10 Fibonacci numbers.

2. Write an R program to find the maximum and the minimum
value of a given vector.

3. Write an R program to get all prime numbers up to a given
number.

4. Write an R program to get the unique elements of a given
string and the unique numbers of a vector.
