Data Reshaping in R Programming
Last Updated: 01 Aug, 2023
Generally, in R Programming Language, data processing is done by taking data as input from a data frame, where the data is organized into rows and columns. Data frames are widely used because they make extracting data simple. However, the data frame we receive is not always in the format we need, so we have to reshape it. R provides various functions to split, merge and reshape data frames.
The various forms of reshaping data in a data frame are:
- Transpose of a Matrix
- Joining Rows and Columns
- Merging of Data Frames
- Melting and Casting
Why Is Data Reshaping Important?
The data produced by an experiment or study is often not in the format that an analysis or analytic function expects. Such data usually has one or more columns that identify a row, followed by a number of columns that hold the measured values. Together, the identifying columns act like a composite key of a table in a database.
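As a small illustration, the sketch below builds a hypothetical study table in which subject and visit together identify a row, while weight and height hold the measured values. The data frame study and its values are made up for this example. It uses base R's anyDuplicated() to check that the pair of identifying columns works as a composite key, while neither column does on its own.

```r
# Hypothetical study data (made up for illustration):
# subject + visit identify a row; weight and height are measured values
study <- data.frame(subject = c(1, 1, 2, 2),
                    visit = c(1, 2, 1, 2),
                    weight = c(70, 71, 80, 79),
                    height = c(170, 170, 182, 182))

# anyDuplicated() returns 0 when no row of the selected columns repeats
key_is_unique <- anyDuplicated(study[c("subject", "visit")]) == 0
print(key_is_unique)      # [1] TRUE

# subject alone repeats (rows 1 and 2), so it is not a key by itself
subject_is_unique <- anyDuplicated(study["subject"]) == 0
print(subject_is_unique)  # [1] FALSE
```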
Transpose of a Matrix
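We can easily calculate the transpose of a matrix in R with the t() function. It takes a matrix or data frame as input and returns its transpose, swapping rows and columns. The result is always a matrix. A data frame is first converted with as.matrix(), so if its columns have mixed types, every element of the transposed result becomes character.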
Syntax: t(x)
where x is a matrix or data frame.
Example:
R
# Create a 4 x 3 matrix filled row by row
first <- matrix(1:12, nrow = 4, byrow = TRUE)
print("Original Matrix")
print(first)

# t() swaps rows and columns, giving a 3 x 4 matrix
first <- t(first)
print("Transpose of the Matrix")
print(first)
Output:
[1] "Original Matrix"
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[4,] 10 11 12
[1] "Transpose of the Matrix"
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
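t() also accepts a data frame, but the result is a matrix rather than a data frame. The sketch below uses a small hypothetical data frame (df, with made-up values) that has one character column and one numeric column. Because as.matrix() has to find a single common type, the transposed result is a character matrix. A data frame with only numeric columns transposes to a numeric matrix.

```r
# Hypothetical data frame with mixed column types
df <- data.frame(name = c("a", "b"), score = c(90, 85))

# t() converts the data frame with as.matrix() before transposing,
# so the result is a matrix and the numeric scores become character
tdf <- t(df)
print(tdf)
print(class(tdf))

# With only numeric columns, the transposed matrix stays numeric
num_df <- data.frame(x = c(1, 2), y = c(3, 4))
tnum <- t(num_df)
print(tnum)
```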
Joining Rows and Columns in a Data Frame
In R, we can combine vectors, matrices or data frames by columns or by rows. Two base functions perform these tasks:
cbind():
The cbind() function combines vectors, matrices or data frames by columns.
Syntax: cbind(x1, x2, x3)
where x1, x2 and x3 can be vectors, matrices or data frames.
rbind():
The rbind() function combines vectors, matrices or data frames by rows.
Syntax: rbind(x1, x2, x3)
where x1, x2 and x3 can be vectors, matrices or data frames.
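When the arguments are data frames, both functions return a data frame and keep each column's type. The sketch below uses hypothetical data frames (scores and extra are made up for illustration). In it, cbind() appends a new column. rbind() then matches the columns of the two data frames by name rather than by position, so extra may list its columns in a different order.

```r
# Hypothetical data frame with a character and a numeric column
scores <- data.frame(name = c("esha", "soumi"), age = c(53, 29))

# cbind() with a data frame argument returns a data frame with a new column
scores <- cbind(scores, city = c("kolkata", "bangalore"))

# rbind() on data frames matches columns by name, not by position
extra <- data.frame(city = "delhi", age = 62, name = "soumitra")
scores <- rbind(scores, extra)
print(scores)
```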
Example:
R
name <- c("Shaoni", "esha", "soumitra", "soumi")
age <- c(24, 53, 62, 29)
address <- c("puducherry", "kolkata", "delhi", "bangalore")

# cbind() on plain vectors returns a matrix; since name and address are
# character, every element (including age) is stored as character
info <- cbind(name, age, address)
print("Combining vectors into a matrix using cbind")
print(info)

newd <- data.frame(name = c("sounak", "bhabani"),
                   age = c("28", "87"),
                   address = c("bangalore", "kolkata"))

# Because newd is a data frame, rbind() returns a data frame
new.info <- rbind(info, newd)
print("Combining a matrix and a data frame using rbind")
print(new.info)
Output:
[1] "Combining vectors into a matrix using cbind"
name age address
[1,] "Shaoni" "24" "puducherry"
[2,] "esha" "53" "kolkata"
[3,] "soumitra" "62" "delhi"
[4,] "soumi" "29" "bangalore"
[1] "Combining a matrix and a data frame using rbind"
name age address
1 Shaoni 24 puducherry
2 esha 53 kolkata
3 soumitra 62 delhi
4 soumi 29 bangalore
5 sounak 28 bangalore
6 bhabani 87 kolkata
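In the example above, age ends up stored as character. This happens because cbind() on plain vectors builds a matrix, and a matrix holds a single type. If numeric ages are wanted, a minimal alternative is to build the first table with data.frame() instead. Age then stays numeric after rbind() and can be used in calculations.

```r
name <- c("Shaoni", "esha", "soumitra", "soumi")
age <- c(24, 53, 62, 29)
address <- c("puducherry", "kolkata", "delhi", "bangalore")

# data.frame() keeps each vector's type, so age stays numeric
info <- data.frame(name, age, address)

newd <- data.frame(name = c("sounak", "bhabani"),
                   age = c(28, 87),
                   address = c("bangalore", "kolkata"))

# rbind() of two data frames keeps the numeric age column
all_info <- rbind(info, newd)
print(all_info)
print(mean(all_info$age))
```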
Merging two Data Frames
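In R, the merge() function joins two data frames on one or more key columns. The data frames do not need identical column names. By default, merge() uses the columns the two data frames have in common as keys. The by, by.x and by.y arguments let you choose the keys explicitly, while all, all.x and all.y control whether unmatched rows are kept (full outer, left and right joins).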
Syntax: merge(dfA, dfB, …)
Example:
R
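# Two data frames with the same column names (name and ID)
d1 <- data.frame(name = c("shaoni", "soumi", "arjun"),
                 ID = c("111", "112", "113"))
d2 <- data.frame(name = c("sounak", "esha"),
                 ID = c("114", "115"))

# all = TRUE keeps every row from both data frames (a full outer join)
total <- merge(d1, d2, all = TRUE)
print(total)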
Output:
name ID
1 arjun 113
2 esha 115
3 shaoni 111
4 soumi 112
5 sounak 114
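The key columns can also be chosen explicitly. The sketch below uses hypothetical data frames (emp, cities and cities2 and their values are made up for illustration). It shows three joins:

- an inner join on ID
- a left join with all.x = TRUE
- a join between differently named key columns using by.x and by.y

```r
# Hypothetical data frames sharing an ID column
emp <- data.frame(ID = c(111, 112, 113),
                  name = c("shaoni", "soumi", "arjun"))
cities <- data.frame(ID = c(111, 113, 114),
                     city = c("kolkata", "delhi", "pune"))

# Inner join (the default): only IDs present in both data frames
inner <- merge(emp, cities, by = "ID")
print(inner)

# Left join: keep every row of emp; unmatched cities become NA
left <- merge(emp, cities, by = "ID", all.x = TRUE)
print(left)

# Key columns with different names are matched with by.x and by.y
cities2 <- data.frame(emp_id = c(111, 113), city = c("kolkata", "delhi"))
renamed <- merge(emp, cities2, by.x = "ID", by.y = "emp_id")
print(renamed)
```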
Melting and Casting
Reshaping data into the required format can take several steps. A popular approach is to melt the data first. Melting converts it into a long format with one row per unique combination of id variables and measured variable. The molten data is then cast into the desired shape. The reshape2 package provides two functions for this process:
melt():
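It converts a data frame into a molten (long-format) data frame.
Syntax: melt(data, …, na.rm = FALSE, value.name = "value")
where,
data: data to be melted
…: further arguments passed to or from other methods (for a data frame, these include id.vars and measure.vars)
na.rm: whether NA values should be removed, which converts explicit missings into implicit missings
value.name: name of the column used to store the values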
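The sketch below shows the na.rm and value.name arguments on a small hypothetical data frame that contains one missing value (scores and its values are made up). It assumes the reshape2 package is installed. id.vars is an argument of reshape2's data frame method for melt().

```r
library(reshape2)

# Hypothetical wide data with one missing measurement
scores <- data.frame(id = c(1, 2), x1 = c(5, NA), x2 = c(6, 1))

# Keep the NA row and rename the value column to "score"
m_all <- melt(scores, id.vars = "id", value.name = "score")
print(m_all)

# na.rm = TRUE drops the NA, turning an explicit missing into an implicit one
m_dropped <- melt(scores, id.vars = "id", value.name = "score", na.rm = TRUE)
print(m_dropped)
```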
dcast():
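It casts a molten data frame into a new (wide) form, aggregating values where necessary.
Syntax: dcast(data, formula, fun.aggregate)
where,
data: molten data frame to be cast
formula: formula that defines how to cast; variables on the left-hand side become row identifiers and variables on the right-hand side become columns
fun.aggregate: aggregation function, needed when the formula does not identify a single observation for each output cell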
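When several rows of the molten data fall into the same output cell, fun.aggregate decides how they are combined. In this hypothetical sketch, the data frame long is made up and reshape2 is assumed to be installed. Id "a" has two values for variable "x". They are summed with sum in one cast and counted with length in another.

```r
library(reshape2)

# Hypothetical molten data: id "a" has two values for variable "x"
long <- data.frame(id = c("a", "a", "b"),
                   variable = c("x", "x", "x"),
                   value = c(1, 2, 5))

# Sum the values that fall into each id/variable cell
totals <- dcast(long, id ~ variable, fun.aggregate = sum)
print(totals)

# Count the values in each cell instead
counts <- dcast(long, id ~ variable, fun.aggregate = length)
print(counts)
```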
Example:
R
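library(reshape2)

# Two measured variables (x1, x2) for each id/points combination
a <- data.frame(id = c("1", "1", "2", "2"),
                points = c("1", "2", "1", "2"),
                x1 = c("5", "3", "6", "2"),
                x2 = c("6", "5", "1", "4"))

# Convert the measured columns to numeric so they can be averaged
a$x1 <- as.numeric(as.character(a$x1))
a$x2 <- as.numeric(as.character(a$x2))

print("Melting")
# id and points identify a row; x1 and x2 become variable/value pairs
m <- melt(a, id.vars = c("id", "points"))
print(m)

print("Casting")
# One row per id, one column per variable, averaging over points
idmn <- dcast(m, id ~ variable, mean)
print(idmn)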
Output:
[1] "Melting"
id points variable value
1 1 1 x1 5
2 1 2 x1 3
3 2 1 x1 6
4 2 2 x1 2
5 1 1 x2 6
6 1 2 x2 5
7 2 1 x2 1
8 2 2 x2 4
[1] "Casting"
id x1 x2
1 1 4 5.5
2 2 4 2.5
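Casting without an aggregation function can also reverse the melt. The sketch below assumes reshape2 is installed. It melts the same data as the example above and then casts it with id + points on the left-hand side of the formula. Each id/points/variable combination holds exactly one value, so no fun.aggregate is needed and the original wide layout comes back.

```r
library(reshape2)

# Same data as in the example above, with numeric measured columns
a <- data.frame(id = c("1", "1", "2", "2"),
                points = c("1", "2", "1", "2"),
                x1 = c(5, 3, 6, 2),
                x2 = c(6, 5, 1, 4))

m <- melt(a, id.vars = c("id", "points"))

# Each id/points/variable cell holds one value, so no aggregation is needed
wide <- dcast(m, id + points ~ variable)
print(wide)
```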