Machine Learning - Unit IV Notes

Notes for ml

Uploaded by

Vaibhav Behera

Machine Learning - R Programming

Unit IV

1. Manipulating Objects
2. Viewing Objects within Objects
3. Convert a Matrix to a Data Frame
4. Convert a Data Frame into a Matrix
5. Convert a Data Frame into a List
6. Convert a Matrix into a List
7. Convert a List into a Matrix

1. Manipulating Objects

To manipulate data, R offers the dplyr package, which provides functions for
the most common operations on data frames. dplyr is an add-on package rather
than part of base R, so install it once with install.packages("dplyr") and
then load it with library(dplyr) before using the functions below.

Function Name   Description

filter()        Returns the rows of a data frame that satisfy a condition.
distinct()      Removes duplicate rows from a data frame.
arrange()       Reorders the rows of a data frame.
select()        Keeps only the required columns of a data frame.
rename()        Renames variables (columns).
mutate()        Creates new variables without dropping the old ones.
transmute()     Creates new variables and drops the old ones.
summarize()     Reduces the data to summary values such as the average or sum.

filter() method
The filter() function returns the subset of rows that satisfy the condition
passed to it. The condition can combine comparison operators (>, <=, ==),
logical operators (&, |, !), missing-value tests such as is.na(), and range
checks such as between(). Rows for which the condition evaluates to NA are
dropped. Syntax of filter() function is given below-

filter(dataframeName, condition)
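
The kinds of condition listed above can be sketched as follows. This is a minimal illustration that assumes dplyr is installed; it reuses the stats data frame from the example later in this section so that it runs on its own, and the result names (both, missing, in_range) are only illustrative.

```r
library(dplyr)

# Same data as the example further down, repeated so this block is self-contained
stats <- data.frame(player  = c("A", "B", "C", "D"),
                    runs    = c(100, 200, 408, 19),
                    wickets = c(17, 20, NA, 5))

# Logical operator: both comparisons must be TRUE.
# Player C is dropped because its wickets value is NA.
both <- filter(stats, runs > 100 & wickets >= 20)

# Missing-value test: rows where wickets is NA
missing <- filter(stats, is.na(wickets))

# Range check: runs between 50 and 250, both ends included
in_range <- filter(stats, between(runs, 50, 250))

both
missing
in_range
```

Conditions separated by commas inside filter() are combined with AND, so filter(stats, runs > 100, wickets >= 20) returns the same rows as the first call.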

distinct() method
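The distinct() method removes duplicate rows from a data frame, comparing
either all columns or only the specified columns. With .keep_all=TRUE the
remaining columns are kept as well; by default only the specified columns are
returned. The syntax of distinct() method is given below-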

distinct(dataframeName, col1, col2,.., .keep_all=TRUE)

arrange() method
The arrange() method orders the rows by the values of a specified column, in
ascending order by default; wrap the column in desc() for descending order.
The syntax of arrange() method is specified below-

arrange(dataframeName, columnName)

select() method
The select() method is used to extract the required columns as a table by
specifying the required column names in select() method. The syntax of
select() method is mentioned below-

select(dataframeName, col1,col2,…)

rename() method
The rename() function is used to change the column names. This can be done
by the below syntax-

rename(dataframeName, newName=oldName)

summarize() method
Using the summarize method we can summarize the data in the data frame by
using aggregate functions like sum(), mean(), etc. The syntax of summarize()
method is specified below-

summarize(dataframeName, aggregate_function(columnName))

mutate() & transmute() methods


These methods are used to create new variables. The mutate() function
creates new variables without dropping the old ones but transmute() function
drops the old variables and creates new variables. The syntax of both
methods is mentioned below-

mutate(dataframeName, newVariable=formula)

transmute(dataframeName, newVariable=formula)
Example

library(dplyr)

# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
                    runs=c(100, 200, 408, 19),
                    wickets=c(17, 20, NA, 5))

# fetch players who scored more than 100 runs
filter(stats, runs>100)

# remove rows that are duplicated across all columns
distinct(stats)

# remove duplicates based on a column
distinct(stats, player, .keep_all = TRUE)

# sort by runs in ascending order
arrange(stats, runs)

# sort by runs in descending order
arrange(stats, desc(runs))

# keep only the player and wickets columns
select(stats, player, wickets)

rename() changes the column heading and returns the new data frame. The
original will remain the same.

# change the heading of runs to runs_scored
NewData = rename(stats, runs_scored=runs)
NewData

# total and mean of runs
summarize(stats, sum(runs), mean(runs))

# add new column avg
# The original data frame will remain the same.
# mutate() returns a new data frame with the old columns plus the new column
NewColumnAdded = mutate(stats, avg=runs/4)
NewColumnAdded

# drop all the existing columns and create a new column.
# the data frame passed to the function will remain the same
# transmute() returns a new data frame with only the column computed
# from the given expression
DF_RemoveColumn = transmute(stats, avg=runs/4)
DF_RemoveColumn

2. Viewing Objects within Objects
There are 6 basic types of objects in the R language:

● Vectors
● Lists
● Arrays
● Matrices
● Factors
● Data Frames

2.1 Vectors
Atomic vectors are one of the basic types of objects in R programming. An
atomic vector holds elements of a single type: character, double, integer,
raw, logical, or complex. A single value, such as z below, is also a vector,
of length one.

x <- c(1, 2, 3, 4)
y <- c("a", "b", "c", "d")
z <- 5

# Print vector and class of vector


print(x)
print(class(x))

print(y)
print(class(y))

print(z)
print(class(z))
2.2 Lists
A list is another type of object in R programming. Unlike an atomic vector, a
list can hold elements of different types, including vectors and other lists.

Example:

# Create list (named lst so that it does not mask the built-in ls() function)
lst <- list(c(1, 2, 3, 4), list("a", "b", "c"))

# Print
print(lst)
print(class(lst))
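
To view an object stored inside a list, the sketch below uses str() for an overview and the [[ ]] and $ operators to pull out individual elements. The list and its element names (nested, numbers, chars) are made up for illustration.

```r
# A list holding a numeric vector and another list
nested <- list(numbers = c(1, 2, 3, 4), chars = list("a", "b", "c"))

# Compact overview of the structure, one line per element
str(nested)

# [[ ]] and $ return the element itself
nums <- nested$numbers           # the numeric vector
second <- nested[["chars"]][[2]] # the second item of the inner list

# Single brackets return a sub-list, not the element
sub <- nested[1]

print(nums)
print(second)
print(class(sub))
```

The difference between [ ] and [[ ]] matters in practice: nested[1] is still a list of length one, while nested[[1]] is the vector stored in it.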

2.3 Matrices
Matrices store values of a single type in a 2-dimensional layout. The data and
the number of rows and columns are passed to the matrix() function; values
fill the matrix column by column unless byrow = TRUE.

Syntax:

matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)

x <- c(1, 2, 3, 4, 5, 6)

# Create matrix
mat <- matrix(x, nrow = 2)

print(mat)
print(class(mat))

2.4 Factors
A factor represents categorical data. It stores the unique values of the given
data vector as levels and encodes each element as an integer code that refers
to one of those levels.

Example:

# Create vector
s <- c("spring", "autumn", "winter", "summer",
"spring", "autumn")

print(factor(s))
print(nlevels(factor(s)))
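
The relationship between levels and integer codes can be inspected directly. The following sketch uses the same seasons vector; the custom level order in the second half is an assumption added for illustration.

```r
s <- c("spring", "autumn", "winter", "summer",
       "spring", "autumn")

f <- factor(s)

# Levels are the unique values, sorted alphabetically by default
lv <- levels(f)

# Each element is stored as the position of its level
codes <- as.integer(f)

print(lv)
print(codes)

# Count how often each level occurs
print(table(f))

# An explicit level order can be supplied instead of the alphabetical one
f2 <- factor(s, levels = c("spring", "summer", "autumn", "winter"))
print(as.integer(f2))
```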

2.5 Arrays
The array() function is used to create an n-dimensional array. Its dim
argument is a vector that gives the length of each dimension; the data fills
the array in column-major order and is recycled if it is shorter than the
array.
Syntax:

array(data, dim = length(data), dimnames = NULL)
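
The notes give no example for array(), so here is a minimal sketch. The dimensions (2 rows, 3 columns, 4 layers) and the variable names are chosen only for illustration.

```r
# 24 values arranged as 2 rows x 3 columns x 4 layers
arr <- array(1:24, dim = c(2, 3, 4))

print(dim(arr))

# The first layer is an ordinary 2 x 3 matrix, filled column by column
first_layer <- arr[, , 1]
print(first_layer)

# A single element: row 1, column 2, layer 3
elem <- arr[1, 2, 3]
print(elem)
```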


2.6 Data Frames
A data frame is a 2-dimensional tabular data object in R programming. It
consists of columns of equal length, and each column is a vector. Unlike the
columns of a matrix, the columns of a data frame can hold different types of
data.

Example:

# Create vectors
x <- 1:5
y <- LETTERS[1:5]
z <- c("Albert", "Bob", "Charlie", "Denver", "Elie")

# Create data frame of vectors


df <- data.frame(x, y, z)

# Print data frame


print(df)
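
Individual columns, cells, and rows inside a data frame can be viewed with $, [[ ]], and [row, column] indexing. The sketch below assumes a smaller two-column data frame (people, with columns x and y) so that it runs on its own.

```r
people <- data.frame(x = 1:5, y = LETTERS[1:5], stringsAsFactors = FALSE)

# Overview of the columns and their types
str(people)

# A whole column as a vector
col_y <- people$y
col_x <- people[["x"]]

# A single cell: row 2 of column y
cell <- people[2, "y"]

# Rows that satisfy a condition (all columns kept)
big <- people[people$x > 3, ]

print(col_y)
print(cell)
print(big)
```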

3. Convert a Matrix to a Data Frame


A matrix can be converted to a data frame with the as.data.frame() function.
Each column of the matrix becomes a column of the data frame; if the matrix
has no column names, the columns are named V1, V2, and so on.

Syntax:
as.data.frame(matrix_data)

Ex 1
matrix_data=matrix(c(1,2,3,4,5,6,7,8),nrow=4)
# display the data
print(matrix_data)
# convert the matrix into dataframe
dataframe_data=as.data.frame(matrix_data)
# print dataframe data
print(dataframe_data)

Ex 2
# create a matrix with 8 rows from a mix of strings and numbers;
# a matrix holds a single type, so every value is coerced to character
matrix_data=matrix(c(
"bobby","pinkey","rohith","gnanesh",5.3,6.6,7,8,11:18),nrow=8)
# display the data
print(matrix_data)

# convert the matrix into dataframe
dataframe_data=as.data.frame(matrix_data)

# print dataframe data
print(dataframe_data)
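
Because a matrix that mixes strings and numbers is stored as character, the resulting data frame columns are character too. One possible way to recover numeric columns, sketched here on a smaller made-up matrix, is to apply type.convert() to each column; stringsAsFactors = FALSE is passed explicitly so the behaviour does not depend on the R version.

```r
# Mixed input: everything becomes character inside the matrix
m <- matrix(c("bobby", "pinkey", "5.3", "6.6"), nrow = 2)

converted <- as.data.frame(m, stringsAsFactors = FALSE)
before <- sapply(converted, class)   # both columns are character

# Re-guess the type of each column; as.is = TRUE keeps strings as character
converted[] <- lapply(converted, type.convert, as.is = TRUE)
after <- sapply(converted, class)    # V1 stays character, V2 becomes numeric

print(before)
print(after)
```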
4. Convert a Data Frame into a Matrix
data.matrix() function in R Language is used to create a matrix by converting
all the values of a Data Frame into numeric mode and then binding them as a
matrix.

Syntax: data.matrix(df)

# Creating a dataframe
df1 = data.frame(
"Name" = c("Amar", "Akbar", "Ronald"),
"Language" = c("R", "Python", "C#"),
"Age" = c(26, 38, 22)
)

# Printing data frame


print(df1)

# Converting into numeric matrix


df2 <- data.matrix(df1)
df2

Output
     Name Language Age
[1,]    2        3  26
[2,]    1        2  38
[3,]    3        1  22

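Each character column is first converted to a factor and then replaced by its
integer codes. The codes follow the alphabetical order of the values, which is
why Akbar becomes 1, Amar 2, and Ronald 3.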
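
To see where the numbers come from, and how data.matrix() differs from as.matrix(), consider this sketch. It uses a two-column version of the data frame above; the variable names char_mat, num_mat, and name_codes are illustrative.

```r
df1 <- data.frame(Name = c("Amar", "Akbar", "Ronald"),
                  Age  = c(26, 38, 22),
                  stringsAsFactors = FALSE)

# as.matrix() keeps the strings, so the whole matrix becomes character
char_mat <- as.matrix(df1)

# data.matrix() replaces the strings by factor codes, giving a numeric matrix
num_mat <- data.matrix(df1)

# The same codes, computed by hand
name_codes <- as.integer(factor(df1$Name))

print(char_mat)
print(num_mat)
print(name_codes)
```

The codes carry no numeric meaning beyond identifying a category, so data.matrix() is best suited to data frames whose text columns really are categorical.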

5. Convert a Data Frame into a List


as.list() function in R Language is used to convert an object to a list.
These objects can be vectors, matrices, factors, and data frames. For a data
frame, each column becomes one element of the list.

Syntax: as.list( object )

df<-data.frame(c1=c(1:5),
c2=c(6:10),
c3=c(11:15),
c4=c(16:20))

print("Sample Dataframe")
print (df)

# each column becomes one list element
# (named df_list so that it does not mask the built-in list() function)
df_list=as.list(df)

print("After Conversion of Dataframe into list of Vectors")
print(df_list)

Data Frame with String Value
df <- data.frame(name = c("Test", "for", "Mark"),
roll_no = c(10, 20, 30),
age=c(20,21,22)
)

print("Sample Dataframe")
print (df)

print("Our list after being converted from a dataframe: ")

df_list=as.list(df)
df_list
5.1 Dataframe rows as a list of vectors
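split() function in R Language is used to divide a data vector or data frame
into groups as defined by the factor provided. Splitting a data frame by
1:nrow(df) puts every row in its own group, so the result is a list with one
single-row data frame per row.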

Syntax: split(x, f)
Parameters:
x: represents data vector or data frame
f: represents factor to divide the data

df<-data.frame(c1=c(1:5),
c2=c(6:10),
c3=c(11:15),
c4=c(16:20))

print("Sample Dataframe")
print (df)
print("Result after conversion")

split(df, 1:nrow(df))
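
split() on its own returns one-row data frames. If plain vectors are wanted, one option (a sketch, not the only way) is to pass each piece through unlist(); the names row_frames and row_vectors are illustrative.

```r
df <- data.frame(c1 = 1:5,
                 c2 = 6:10,
                 c3 = 11:15,
                 c4 = 16:20)

# One single-row data frame per row
row_frames <- split(df, seq_len(nrow(df)))

# Flatten each single-row data frame into a named vector
row_vectors <- lapply(row_frames, unlist)

print(class(row_frames[["1"]]))
print(row_vectors[["1"]])
```

This works cleanly here because every column is an integer; with mixed column types unlist() would coerce the values of a row to a common type.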

6. Convert a Matrix into a List


as.list() is a built-in function that takes an R object as an argument and
converts it into a list. It accepts vectors, matrices, factors, and data
frames. Applied to a matrix, as.list() returns a list with one element per
cell, taken in column-major order (down the first column, then the second, and
so on). Wrapping the call in unlist() flattens that list into a plain vector:

unlist(as.list(matrix))

The default order is column-major. To get the elements in row-major order,
transpose the matrix before converting it.

Example: Column Major

mat = matrix(1:12,nrow=3, ncol=4)

print("Sample matrix:")
print(mat)

print("Matrix into a single list")


unlist(as.list(mat))

Example: Row Major

Find the transpose of the matrix using the t() function, then convert it into
a list using as.list().

mat = matrix(1:12,nrow=3, ncol=4)

print("Sample matrix:")
print(mat)

print("Result after conversion")


unlist(as.list ( t (mat)))
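
The calls above flatten the whole matrix into a single sequence. If the goal is instead a list with one vector per column or per row, a minimal sketch using lapply() over the column or row indices looks like this; cols and rows are illustrative names.

```r
mat <- matrix(1:12, nrow = 3, ncol = 4)

# One list element per column
cols <- lapply(seq_len(ncol(mat)), function(j) mat[, j])

# One list element per row
rows <- lapply(seq_len(nrow(mat)), function(i) mat[i, ])

print(cols[[1]])
print(rows[[1]])
```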

7. Convert a List into a Matrix


x<-list(1:25,26:50,51:75,76:100,101:125,126:150,151:175,176:200)

x <- matrix(unlist(x), ncol = 10, byrow = TRUE)


x
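
In the code above, unlist() joins the 8 vectors into one sequence of 200 values, and matrix() then fills 10 columns row by row, which gives 20 rows. The list elements (25 values each) therefore do not line up with the rows. When every list element should become exactly one row, a smaller sketch with made-up data shows two equivalent approaches:

```r
# Four list elements of equal length
parts <- list(1:3, 4:6, 7:9, 10:12)

# Flatten, then fill row by row with as many columns as each element has
by_row <- matrix(unlist(parts), ncol = 3, byrow = TRUE)

# Bind the elements together as rows
bound <- do.call(rbind, parts)

print(by_row)
print(bound)
```

Both results are 4 x 3 matrices whose first row is 1 2 3. do.call(rbind, ...) does not need the column count, but it assumes all elements have the same length.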
