Machine Learning - Unit IV Notes
Unit IV
1. Manipulating Objects
2. Viewing Objects within Objects
3. Forms of Data Objects
4. Convert a Matrix to a Data Frame
5. Convert a Data Frame into a Matrix
6. Convert a Data Frame into a List
7. Convert a Matrix into a List
1. Manipulating Objects
filter() method
The filter() function from the dplyr package returns the subset of rows that
satisfy the condition given in the call. The condition can combine comparison
operators, logical operators, checks for NA values, range checks and so on.
The syntax of filter() is given below-
filter(dataframeName, condition)
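A minimal sketch of the kinds of condition described above. The players data frame and its columns are hypothetical, invented only for illustration, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical data, invented for illustration only
players <- data.frame(
  player  = c("A", "B", "C", "D"),
  runs    = c(120, 45, NA, 80),
  wickets = c(2, 10, 5, 0)
)

# Comparison operator: rows whose condition evaluates to NA are dropped
high_scorers <- filter(players, runs > 50)

# Logical operator combining two conditions
all_rounders <- filter(players, runs > 40 & wickets >= 5)

# NA check: keep only the rows where runs is missing
missing_runs <- filter(players, is.na(runs))

# Range check with dplyr's between(), inclusive at both ends
mid_range <- filter(players, between(runs, 40, 100))

print(high_scorers)
```

Note that filter() returns a new data frame; players itself is not modified.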
distinct() method
The distinct() function removes duplicate rows from a data frame, either
across all columns or based only on the specified columns. The syntax of
distinct() is given below-
distinct(dataframeName, col1, col2, …)
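The difference between de-duplicating whole rows and de-duplicating on chosen columns can be sketched as follows. The scores data frame is hypothetical, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical data, invented for illustration only
scores <- data.frame(
  player = c("A", "A", "B", "B"),
  team   = c("X", "X", "Y", "Z")
)

# Whole-row duplicates removed: the repeated ("A", "X") row appears once
unique_rows <- distinct(scores)

# Duplicates judged on player only; the result keeps just that column
unique_players <- distinct(scores, player)

# .keep_all = TRUE keeps the other columns, taken from the first matching row
first_per_player <- distinct(scores, player, .keep_all = TRUE)

print(unique_rows)
```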
arrange() method
The arrange() function orders the rows of a data frame by the values of a
specified column, in ascending order by default. Wrap the column in desc()
for descending order. The syntax of arrange() is given below-
arrange(dataframeName, columnName)
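A short sketch of ascending and descending ordering, using a hypothetical scores data frame and assuming dplyr is installed.

```r
library(dplyr)

# Hypothetical data, invented for illustration only
scores <- data.frame(
  player = c("A", "B", "C"),
  runs   = c(45, 120, 80)
)

# Ascending order is the default
ascending <- arrange(scores, runs)

# desc() reverses the order for that column
descending <- arrange(scores, desc(runs))

print(descending)
```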
select() method
The select() function extracts the required columns as a new table. The
columns to keep are listed by name in the call. The syntax of select() is
given below-
select(dataframeName, col1,col2,…)
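As an illustration, the sketch below keeps two columns by name and, alternatively, drops one column with a minus sign. The scores data frame is hypothetical, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical data, invented for illustration only
scores <- data.frame(
  player  = c("A", "B"),
  runs    = c(45, 120),
  wickets = c(2, 0)
)

# Keep only the named columns
kept <- select(scores, player, wickets)

# A leading minus drops a column and keeps the rest
without_runs <- select(scores, -runs)

print(kept)
```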
rename() method
The rename() function changes column names. The new name goes on the left of
the equals sign and the existing name on the right, as in the syntax below-
rename(dataframeName, newName=oldName)
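A small sketch of the newName = oldName order. The stats_demo data frame and its wkts column are hypothetical, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical data, invented for illustration only
stats_demo <- data.frame(
  player = c("A", "B"),
  wkts   = c(3, 7)
)

# New name on the left, existing name on the right
renamed <- rename(stats_demo, wickets = wkts)

print(names(renamed))     # column names of the returned data frame
print(names(stats_demo))  # the original data frame is unchanged
```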
summarize() method
The summarize() function collapses a data frame into summary values computed
with aggregate functions such as sum(), mean(), etc. The syntax of
summarize() is given below-
summarize(dataframeName, aggregate_function(columnName))
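The following sketch computes two aggregates over a hypothetical matches data frame. The group_by() step is not covered in these notes and is included only as an assumption about how per-group summaries are usually obtained; dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical data, invented for illustration only
matches <- data.frame(
  team = c("X", "X", "Y"),
  runs = c(50, 70, 30)
)

# One row of summary values for the whole data frame
totals <- summarize(matches, total_runs = sum(runs), mean_runs = mean(runs))

# One row per team when the data frame is grouped first
by_team <- summarize(group_by(matches, team), mean_runs = mean(runs))

print(totals)
print(by_team)
```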
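mutate() and transmute() methods
The mutate() function adds new variables computed from existing ones and
keeps the existing columns, while transmute() returns only the newly computed
variables. The syntax of both is given below-
mutate(dataframeName, newVariable=formula)
transmute(dataframeName, newVariable=formula)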
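The two calls differ only in which columns survive, as this sketch shows. The innings data frame and the strike_rate formula are hypothetical, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical data, invented for illustration only
innings <- data.frame(
  runs  = c(50, 30),
  balls = c(40, 30)
)

# mutate() keeps runs and balls and appends the new column
with_rate <- mutate(innings, strike_rate = runs / balls * 100)

# transmute() returns only the new column
only_rate <- transmute(innings, strike_rate = runs / balls * 100)

print(with_rate)
print(only_rate)
```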
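Example
library(dplyr)
# stats is a data frame of player statistics; its definition is not included in these notes
distinct(stats) # Remove duplicate rows
arrange(stats,desc(run)) # Descending
select(stats, player,wickets) # Keep only the player and wickets columns
Rename the heading and return the new data frame. The original will remain
the same.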
● Vectors
● Lists
● Arrays
● Matrices
● Factors
● Data Frames
2.1 Vectors
Atomic vectors are one of the basic object types in R. An atomic vector
stores elements of a single (homogeneous) data type: character, double,
integer, raw, logical, or complex. A variable holding a single element is
also a vector, of length one.
x <- c(1, 2, 3, 4)
y <- c("a", "b", "c", "d")
z <- 5
print(y)
print(class(y))
print(z)
print(class(z))
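Because an atomic vector holds a single type, mixing types in c() silently coerces the elements to a common type. A minimal sketch in base R (the variable names are illustrative):

```r
# Mixing a number, a string and a logical: everything becomes character
mixed <- c(1, "a", TRUE)
print(mixed)
print(class(mixed))

# Mixing double, integer and logical: everything becomes numeric (double)
nums <- c(1.5, 2L, TRUE)
print(nums)
print(class(nums))

# A single value is a vector of length one
single <- 5
print(is.vector(single))
print(length(single))
```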
2.2 Lists
A list is another type of object in R. Unlike an atomic vector, a list can
contain heterogeneous elements, such as vectors or other lists.
Example:
# Create list
ls <- list(c(1, 2, 3, 4), list("a", "b", "c"))
# Print
print(ls)
print(class(ls))
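To view an object stored inside a list, base R offers [[ ]], $ and [ ]. The sketch below uses a named list; the names numbers and chars are illustrative, not from the notes.

```r
# A named list holding a vector and another list
items <- list(numbers = c(1, 2, 3, 4), chars = list("a", "b", "c"))

# [[ ]] and $ extract the element itself
num_vec <- items[["numbers"]]
second  <- items$numbers[2]

# Nested access: third element of the inner list
inner_third <- items[["chars"]][[3]]

# Single brackets return a sub-list, not the element
sub_list <- items["numbers"]

# str() gives a compact view of the whole structure
str(items)
```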
2.3 Matrices
Matrices store values as a 2-dimensional array in R. The data and the number
of rows and columns are passed to the matrix() function.
Syntax:
matrix(data, nrow, ncol, byrow, dimnames)
Example:
x <- c(1, 2, 3, 4, 5, 6)
# Create matrix
mat <- matrix(x, nrow = 2)
print(mat)
print(class(mat))
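By default matrix() fills column by column; byrow = TRUE fills row by row instead. A brief base R sketch of the difference and of element access (variable names are illustrative):

```r
x <- 1:6

# Default: filled column by column
by_col <- matrix(x, nrow = 2)

# byrow = TRUE: filled row by row
by_row <- matrix(x, nrow = 2, byrow = TRUE)

print(by_col)
print(by_row)

# dim() reports rows and columns; [row, column] picks one element
print(dim(by_col))
print(by_col[1, 2])
print(by_row[1, 2])

# Leaving the row index empty selects a whole column
print(by_col[, 3])
```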
2.4 Factors
A factor encodes a data vector as a set of unique values (levels) plus, for
each element, an integer code pointing to its level.
Example:
# Create vector
s <- c("spring", "autumn", "winter", "summer",
"spring", "autumn")
print(factor(s))
print(nlevels(factor(s)))
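The levels and the underlying integer codes can be inspected directly. This base R sketch reuses the seasons vector from the example; the explicit level order in the second factor is an illustrative choice.

```r
s <- c("spring", "autumn", "winter", "summer",
       "spring", "autumn")
f <- factor(s)

# Levels are the unique values, sorted alphabetically by default
print(levels(f))

# Each element is stored as the integer position of its level
print(as.integer(f))

# table() counts how often each level occurs
print(table(f))

# The level order can be set explicitly
seasonal <- factor(s, levels = c("spring", "summer", "autumn", "winter"))
print(levels(seasonal))
```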
2.5 Arrays
The array() function creates an n-dimensional array. It takes a dim attribute
as an argument, a vector giving the required length of each dimension.
Syntax:
array(data, dim, dimnames)
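A minimal base R sketch of array(), building a 2 x 3 x 4 array from the values 1 to 24. The dimensions are chosen only for illustration.

```r
# dim = c(rows, columns, slices)
arr <- array(1:24, dim = c(2, 3, 4))

print(dim(arr))

# Elements are filled column by column, then slice by slice
first_slice <- arr[, , 1]
print(first_slice)

# Row 1, column 2 of the third slice
element <- arr[1, 2, 3]
print(element)
```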
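2.6 Data Frames
Example:
# Create vectors
x <- 1:5
y <- LETTERS[1:5]
z <- c("Albert", "Bob", "Charlie", "Denver", "Elie")
# Combine the vectors into a data frame
# (assumed completion: the original call is missing from these notes)
df <- data.frame(x, y, z)
print(df)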
Convert a Matrix to a Data Frame
The as.data.frame() function converts a matrix into a data frame.
Syntax:
as.data.frame(matrix_data)
Ex 1
matrix_data=matrix(c(1,2,3,4,5,6,7,8),nrow=4)
# display the data
print(matrix_data)
# convert the matrix into dataframe
dataframe_data=as.data.frame(matrix_data)
# print dataframe data
print(dataframe_data)
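Ex 2
# create the matrix with 8 rows
# with different elements
# (mixing strings and numbers coerces every element to character)
matrix_data=matrix(c(
"bobby","pinkey","rohith","gnanesh",5.3,6.6,7,8,11:18),nrow=8)
# display the data
print(matrix_data)
# convert the matrix into dataframe (same steps as Ex 1)
dataframe_data=as.data.frame(matrix_data)
# print dataframe data
print(dataframe_data)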
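Since a matrix holds a single type, a matrix that mixes names and numbers is stored as character, and so are the data frame columns made from it. The sketch below, in base R, shows one way to restore a numeric column afterwards. The column names name and score are illustrative, and stringsAsFactors = FALSE is passed explicitly so the result does not depend on the R version.

```r
# Strings and numbers together: the matrix becomes character
mixed <- matrix(c("bobby", "pinkey", 5.3, 6.6), nrow = 2)
print(class(mixed[1, 2]))

# Columns are named V1, V2 by default and are character
df <- as.data.frame(mixed, stringsAsFactors = FALSE)
print(sapply(df, class))

# Give the columns names and convert the numeric one back
names(df) <- c("name", "score")
df$score <- as.numeric(df$score)
print(sapply(df, class))
```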
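Convert a Data Frame into a Matrix
The data.matrix() function converts a data frame into a numeric matrix.
Character columns are first converted to factors and then to their integer
codes.
Syntax: data.matrix(df)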
# Creating a dataframe
df1 = data.frame(
"Name" = c("Amar", "Akbar", "Ronald"),
"Language" = c("R", "Python", "C#"),
"Age" = c(26, 38, 22)
)
# Converting the dataframe into a numeric matrix
print(data.matrix(df1))
Output
     Name Language Age
[1,]    2        3  26
[2,]    1        2  38
[3,]    3        1  22
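The integer codes in the Name and Language columns come from the alphabetical order of the values. When the text should be preserved instead, base R's as.matrix() is an alternative; it is not covered in these notes and is shown here only for contrast, on a reduced version of the data frame.

```r
df1 <- data.frame(
  Name = c("Amar", "Akbar", "Ronald"),
  Age  = c(26, 38, 22)
)

# data.matrix(): Name becomes integer codes (Akbar = 1, Amar = 2, Ronald = 3)
num_mat <- data.matrix(df1)
print(num_mat)

# as.matrix(): a mixed data frame becomes a character matrix
chr_mat <- as.matrix(df1)
print(chr_mat)

# as.matrix() on numeric columns only stays numeric
age_mat <- as.matrix(df1["Age"])
print(age_mat)
```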
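Convert a Data Frame into a List
df<-data.frame(c1=c(1:5),
c2=c(6:10),
c3=c(11:15),
c4=c(16:20))
print("Sample Dataframe")
print (df)
# convert the dataframe into a list: each column becomes one list element
# (the result is named df_list so that it does not mask the base list() function)
df_list=as.list(df)
print(df_list)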
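The list produced by as.list() has one named element per column, which can then be viewed with $ or [[ ]]. A short base R sketch on a smaller data frame (the sizes are chosen only for illustration):

```r
df <- data.frame(c1 = 1:3, c2 = 4:6)

# One list element per column, named after the column
cols <- as.list(df)
print(names(cols))
print(length(cols))

# Access a whole column, or one value inside it
print(cols$c1)
print(cols[["c2"]][2])
```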
5.1 Dataframe rows as a list of vectors
The split() function divides a data vector or data frame into groups defined
by the factor provided. With one group per row, each row of the data frame
becomes its own list element.
Syntax: split(x, f)
Parameters:
x: represents data vector or data frame
f: represents factor to divide the data
df<-data.frame(c1=c(1:5),
c2=c(6:10),
c3=c(11:15),
c4=c(16:20))
print("Sample Dataframe")
print (df)
print("Result after conversion")
split(df, 1:nrow(df))
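The grouping factor does not have to be the row number. The sketch below, in base R, splits a hypothetical data frame by a group column and also shows the row-wise split with each one-row data frame turned into a plain list.

```r
# Hypothetical data, invented for illustration only
df <- data.frame(group = c("a", "b", "a", "b"), value = 1:4)

# One data frame per distinct value of group
by_group <- split(df, df$group)
print(names(by_group))
print(by_group$a)

# One single-row data frame per row, named "1", "2", ...
rows <- split(df, seq_len(nrow(df)))

# Convert each single-row data frame into a plain list
rows_as_lists <- lapply(rows, as.list)
print(rows_as_lists[["2"]])
```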
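Convert a Matrix into a List
unlist(as.list(matrix))
# sample matrix (same definition as in the transpose example)
mat = matrix(1:12,nrow=3, ncol=4)
print("Sample matrix:")
print(mat)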
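Find the transpose of the matrix using the t() function, then convert it into
a list using as.list().
mat = matrix(1:12,nrow=3, ncol=4)
print("Sample matrix:")
print(mat)
# transpose, then convert to a list, as described above
# (assumed completion: the original lines are missing from these notes)
print(as.list(t(mat)))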
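Transposing matters because as.list() walks a matrix column by column. The base R sketch below compares the two orders and also shows split() with row() and col(), one possible way to get one list element per row or per column; that use of split() is an addition, not taken from the notes.

```r
mat <- matrix(1:12, nrow = 3, ncol = 4)

# Column-major order: 1, 2, 3, 4, ...
col_major <- as.list(mat)

# After transposing, the list follows the rows: 1, 4, 7, 10, 2, ...
row_major <- as.list(t(mat))

# One vector per row, and one vector per column
rows <- split(mat, row(mat))
cols <- split(mat, col(mat))

print(unlist(row_major[1:4]))
print(rows[["1"]])
print(cols[["2"]])
```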