
R Manual


LIST OF EXPERIMENTS

Week 1:
a) Installing R and RStudio
b)Basic functionality of R, variable, data types in R
Week 2:
a) Implement R script to show the usage of various operators available in R language.
b) Implement R script to read a person's age from the keyboard and display whether the person is
eligible for voting or not.
c) Implement R script to find biggest number between two numbers.
d) Implement R script to check the given year is leap year or not.
Week 3:
a) Implement R Script to create a list.
b) Implement R Script to access elements in the list.
c) Implement R Script to merge two or more lists.
d) Implement R Script to perform matrix operations.
Week 4:
Implement R script to perform following operations:
a) various operations on vectors
b) Finding the sum and average of given numbers using arrays.
c) To display elements of list in reverse order.
d) Finding the minimum and maximum elements in the array.
Week 5:
a) Implement R Script to perform various operations on matrices
b) Implement R Script to extract the data from dataframes.
c) Write R script to display file contents.
d) Write R script to copy file contents from one file to another.
Week 6:
a) Write an R script to find basic descriptive statistics using the summary, str, and quartile functions
on the mtcars & cars datasets.
b) Write an R script to find subset of dataset by using subset (), aggregate () functions on iris
dataset.
Week 7:
a) Reading different types of data sets (.txt, .csv) from the Web or disk and writing to a file in a
specific disk location.
b) Reading an Excel data sheet in R.
c) Reading an XML dataset in R.
Week 8:
a) Implement R Script to create a Pie chart, Bar Chart, scatter plot and Histogram
(Introduction to ggplot2 graphics)
b) Implement R Script to perform mean, median, mode, range, summary, variance, standard
deviation operations.
Week 9:
a) Implement R Script to perform Normal, Binomial distributions.
b) Implement R Script to perform correlation, Linear and multiple regression.
Week 10:
Introduction to Non-Tabular Data Types: Time series, spatial data, Network data. Data
Transformations: Converting Numeric Variables into Factors, Date Operations, String
Parsing, Geocoding
Week 11:
Introduction to dirty data problems: missing values, data manipulation, duplicates, forms of
data dates, outliers, spelling

Week 12:
Data sources: SQLite examples for relational databases, Loading SPSS and SAS files,
Reading from Google Spreadsheets, API and web scraping examples.

Week 1:
a) Installing R and R Studio
R is an open-source programming language that is widely used as statistical software and
a data analysis tool. It was designed by Ross Ihaka and Robert Gentleman at the University of
Auckland, New Zealand. The R programming language is an implementation of the S
programming language.
R and Python both play a major role in data science, and newcomers often find it hard to
choose the more suitable of the two.
Why R Programming Language?

• R is used as a leading tool for machine learning, statistics, and data
analysis. Objects, functions, and packages can easily be created in R.
• It is a platform-independent language. This means it can be used on all major operating
systems.
• It is a free, open-source language. That means anyone can install it in any organization
without purchasing a license.
• R is not only a statistics package; it can also be integrated
with other languages (C, C++). Thus, you can easily interact with many data sources
and statistical packages.
• R is currently one of the most requested programming languages in data science.
Installing R Console & RStudio
R is a programming language and free software environment, available under the GNU
license and supported by the R Foundation for Statistical Computing. The language is most widely
known for its powerful statistical and data interpretation capabilities.

Installing R Console:
• Open an internet browser and go to www.r-project.org.
• Click the "download R" link in the middle of the page under "Getting Started."
• Select a CRAN location (a mirror site) and click the corresponding link.
• Click on the "Download R for Windows" link at the top of the page.
• Click on the "install R for the first time" link at the top of the page.
• Click "Download R for Windows" and save the executable file somewhere on your
computer. Run the .exe file and follow the installation instructions.
• Now that R is installed, you need to download and install RStudio.
Installing RStudio:
Step 1: First, set up the R environment on your local machine. You can download it
from r-project.org.
Step 2: After downloading R for the Windows platform, install it by double-clicking the installer.
Step 3: Download RStudio from the RStudio Desktop downloads page.
Note: It is free of cost (under AGPL licensing).
Step 4: After downloading, you will get a file named “RStudio-1.x.xxxx.exe” in your
Downloads folder.
Step 5: Double-click the installer and install the software.
Step 6: Open RStudio in Windows. The following panels appear on the
screen:
• The interactive R console (entire left)
• Environment/History (tabbed in upper right)
• Files/Plots/Packages/Help/Viewer (tabbed in lower right)

Step 7: Test the RStudio installation
• Search for RStudio in the Windows search bar on the taskbar.
• Start the application.
• Insert the following code in the console.
Input:
print('Hello world!')
Output:
[1] "Hello world!"
Step 8: Your installation is successful.
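As an optional extra check, the console can also report which R version the session is running. The sketch below uses only base R; the variable name version_text is illustrative, and the exact text printed depends on your installation.

```r
# Report the R version that this session is running.
version_text <- R.version.string
print(version_text)

# The major and minor parts are also available separately.
print(paste("Major:", R.version$major, "Minor:", R.version$minor))
```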

b) Basic functionality of R, variable, data types in R


Generally, while doing programming in any programming language, you need to use
various variables to store various information. Variables are nothing but reserved
memory locations to store values. This means that, when you create a variable you
reserve some space in memory.
You may like to store information of various data types like character, wide
character, integer, floating point, double floating point, Boolean etc. Based on the
data type of a variable, the operating system allocates memory and decides what
can be stored in the reserved memory.
In contrast to other programming languages like C and Java, variables in R are
not declared with a data type. Variables are assigned R-objects, and
the data type of the R-object becomes the data type of the variable. There are many
types of R-objects. The frequently used ones are −
• Vectors
• Lists
• Matrices
• Arrays
• Factors
• Data Frames
The simplest of these objects is the vector object and there are six data types of
these atomic vectors, also termed as six classes of vectors. The other R-Objects
are built upon the atomic vectors.
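To illustrate that a variable takes its type from the object assigned to it, here is a minimal sketch; the variable names are only examples.

```r
# The same variable can hold objects of different types over time.
x <- 42L
type_before <- class(x)   # class of an integer value

x <- "forty-two"
type_after <- class(x)    # class after assigning a string

print(type_before)
print(type_after)
```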

Data Type: Logical
Example: TRUE, FALSE
Verify:
v <- TRUE
print(class(v))
it produces the following result −
[1] "logical"

Data Type: Numeric
Example: 12.3, 5, 999
Verify:
v <- 23.5
print(class(v))
it produces the following result −
[1] "numeric"

Data Type: Integer
Example: 2L, 34L, 0L
Verify:
v <- 2L
print(class(v))
it produces the following result −
[1] "integer"

Data Type: Complex
Example: 3 + 2i
Verify:
v <- 2+5i
print(class(v))
it produces the following result −
[1] "complex"

Data Type: Character
Example: 'a', "good", "TRUE", '23.4'
Verify:
v <- "TRUE"
print(class(v))
it produces the following result −
[1] "character"

Data Type: Raw
Example: "Hello" is stored as 48 65 6c 6c 6f
Verify:
v <- charToRaw("Hello")
print(class(v))
it produces the following result −
[1] "raw"
In R programming, the very basic data types are the R-objects called vectors which
hold elements of different classes as shown above. Please note in R the number of
classes is not confined to only the above six types. For example, we can use many
atomic vectors and create an array whose class will become array.
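As a small sketch of this last remark, the code below builds a three-dimensional array from an atomic vector and inspects its class. The variable names values and arr are illustrative.

```r
# An atomic integer vector...
values <- 1:8

# ...reshaped into a 2 x 2 x 2 array.
arr <- array(values, dim = c(2, 2, 2))

print(class(values))  # class of the plain vector
print(class(arr))     # class of the array
print(typeof(arr))    # the underlying atomic type is unchanged
```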

Vectors
When you want to create a vector with more than one element, you should
use the c() function, which combines the elements into a vector.
# Create a vector.
apple <- c('red','green',"yellow")
print(apple)

# Get the class of the vector.


print(class(apple))

When we execute the above code, it produces the following result −


[1] "red" "green" "yellow"
[1] "character"

Lists
A list is an R-object which can contain many different types of elements inside it like
vectors, functions and even another list inside it.
# Create a list.
list1 <- list(c(2,5,3),21.3)

# Print the list.


print(list1)

When we execute the above code, it produces the following result −


[[1]]
[1] 2 5 3

[[2]]
[1] 21.3

sin() in R: sin() is a built-in mathematical R function that takes a numeric value (an angle
in radians) as its argument and returns the sine of that value.

Matrices
A matrix is a two-dimensional rectangular data set. It can be created using a vector
input to the matrix function.
# Create a matrix.
M = matrix( c('a','a','b','c','b','a'), nrow =2, ncol =3, byrow =
TRUE)
print(M)

When we execute the above code, it produces the following result −


     [,1] [,2] [,3]
[1,] "a"  "a"  "b"
[2,] "c"  "b"  "a"

Arrays
While matrices are confined to two dimensions, arrays can be of any number of
dimensions. The array function takes a dim attribute which creates the required
number of dimension. In the below example we create an array with two elements
which are 3x3 matrices each.
# Create an array.
a <- array(c('green','yellow'),dim = c(3,3,2))
print(a)

When we execute the above code, it produces the following result −


, , 1

     [,1]     [,2]     [,3]
[1,] "green"  "yellow" "green"
[2,] "yellow" "green"  "yellow"
[3,] "green"  "yellow" "green"

, , 2

     [,1]     [,2]     [,3]
[1,] "yellow" "green"  "yellow"
[2,] "green"  "yellow" "green"
[3,] "yellow" "green"  "yellow"

Factors
Factors are R-objects that are created from a vector. A factor stores the vector along
with the distinct values of its elements as labels. The labels are
always character, irrespective of whether the input vector is numeric, character, Boolean, etc.
They are useful in statistical modeling.
Factors are created using the factor() function. The nlevels() function gives the
count of levels.
# Create a vector.
apple_colors <-
c('green','green','yellow','red','red','red','green')

# Create a factor object.


factor_apple <- factor(apple_colors)

# Print the factor.


print(factor_apple)
print(nlevels(factor_apple))

When we execute the above code, it produces the following result −


[1] green green yellow red red red green
Levels: green red yellow
[1] 3

Data Frames
Data frames are tabular data objects. Unlike in a matrix, each column of a data frame
can contain a different mode of data. The first column can be numeric while the
second column can be character and the third column can be logical. A data frame is a list of
vectors of equal length.
Data Frames are created using the data.frame() function.
# Create the data frame.
BMI <- data.frame(
  gender = c("Male","Male","Female"),
  height = c(152,171.5,165),
  weight = c(81,93,78),
  Age = c(42,38,26)
)
print(BMI)

When we execute the above code, it produces the following result −

  gender height weight Age
1   Male  152.0     81  42
2   Male  171.5     93  38
3 Female  165.0     78  26

Week 2:
a) Implement R script to show the usage of various operators available in R language.
An operator is a symbol that tells the interpreter to perform specific mathematical or
logical manipulations. The R language is rich in built-in operators and provides the following
types of operators.

Types of Operators
We have the following types of operators in R programming −

• Arithmetic Operators
• Relational Operators
• Logical Operators
• Assignment Operators
• Miscellaneous Operators

Arithmetic Operators
The arithmetic operators supported by the R language are listed below. The
operators act on each element of the vector.

Operator: +
Description: Adds two vectors
Example:
v <- c(2,5.5,6)
t <- c(8,3,4)
print(v+t)
it produces the following result −
[1] 10.0  8.5 10.0

Operator: -
Description: Subtracts second vector from the first
Example:
v <- c(2,5.5,6)
t <- c(8,3,4)
print(v-t)
it produces the following result −
[1] -6.0  2.5  2.0

Operator: *
Description: Multiplies both vectors
Example:
v <- c(2,5.5,6)
t <- c(8,3,4)
print(v*t)
it produces the following result −
[1] 16.0 16.5 24.0

Operator: /
Description: Divides the first vector by the second
Example:
v <- c(2,5.5,6)
t <- c(8,3,4)
print(v/t)
it produces the following result −
[1] 0.250000 1.833333 1.500000

Operator: %%
Description: Gives the remainder of the first vector divided by the second
Example:
v <- c(2,5.5,6)
t <- c(8,3,4)
print(v%%t)
it produces the following result −
[1] 2.0 2.5 2.0

Operator: %/%
Description: The result of division of first vector with second (quotient)
Example:
v <- c(2,5.5,6)
t <- c(8,3,4)
print(v%/%t)
it produces the following result −
[1] 0 1 1

Operator: ^
Description: The first vector raised to the exponent of second vector
Example:
v <- c(2,5.5,6)
t <- c(8,3,4)
print(v^t)
it produces the following result −
[1]  256.000  166.375 1296.000

Relational Operators
The relational operators supported by the R language are listed below. Each
element of the first vector is compared with the corresponding element of the
second vector. The result of comparison is a Boolean value.

Operator: >
Description: Checks if each element of the first vector is greater than the corresponding element of
the second vector.
Example:
v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v>t)
it produces the following result −
[1] FALSE  TRUE FALSE FALSE

Operator: <
Description: Checks if each element of the first vector is less than the corresponding element of the
second vector.
Example:
v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v < t)
it produces the following result −
[1]  TRUE FALSE  TRUE FALSE

Operator: ==
Description: Checks if each element of the first vector is equal to the corresponding element of the
second vector.
Example:
v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v == t)
it produces the following result −
[1] FALSE FALSE FALSE  TRUE

Operator: <=
Description: Checks if each element of the first vector is less than or equal to the corresponding
element of the second vector.
Example:
v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v<=t)
it produces the following result −
[1]  TRUE FALSE  TRUE  TRUE

Operator: >=
Description: Checks if each element of the first vector is greater than or equal to the corresponding
element of the second vector.
Example:
v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v>=t)
it produces the following result −
[1] FALSE  TRUE FALSE  TRUE

Operator: !=
Description: Checks if each element of the first vector is unequal to the corresponding element of
the second vector.
Example:
v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v!=t)
it produces the following result −
[1]  TRUE  TRUE  TRUE FALSE

Logical Operators
The logical operators supported by the R language are listed below. They are
applicable only to vectors of type logical, numeric or complex. All non-zero numbers
are treated as the logical value TRUE, and zero is treated as FALSE.
Each element of the first vector is combined with the corresponding element of the
second vector. The result is a Boolean value.

Operator: &
Description: It is called Element-wise Logical AND operator. It combines each element of the first
vector with the corresponding element of the second vector and gives an output TRUE if
both the elements are TRUE.
Example:
v <- c(3,1,TRUE,2+3i)
t <- c(4,1,FALSE,2+3i)
print(v&t)
it produces the following result −
[1]  TRUE  TRUE FALSE  TRUE

Operator: |
Description: It is called Element-wise Logical OR operator. It combines each element of the first
vector with the corresponding element of the second vector and gives an output TRUE if one of the
elements is TRUE.
Example:
v <- c(3,0,TRUE,2+2i)
t <- c(4,0,FALSE,2+3i)
print(v|t)
it produces the following result −
[1]  TRUE FALSE  TRUE  TRUE

Operator: !
Description: It is called Logical NOT operator. Takes each element of the vector and gives the
opposite logical value.
Example:
v <- c(3,0,TRUE,2+2i)
print(!v)
it produces the following result −
[1] FALSE  TRUE FALSE FALSE

The logical operators && and || work on single (length-one) values and give a single logical
value as output. Older versions of R silently used only the first element of longer vectors;
since R 4.3.0, passing a vector longer than one element is an error, so the examples below
select the first element explicitly.

Operator: &&
Description: Called Logical AND operator. Takes the first element of both the vectors and gives
TRUE only if both are TRUE.
Example:
v <- c(3,0,TRUE,2+2i)
t <- c(1,3,TRUE,2+3i)
print(v[1] && t[1])
it produces the following result −
[1] TRUE

Operator: ||
Description: Called Logical OR operator. Takes the first element of both the vectors and gives
TRUE if one of them is TRUE.
Example:
v <- c(0,0,TRUE,2+2i)
t <- c(0,3,TRUE,2+3i)
print(v[1] || t[1])
it produces the following result −
[1] FALSE
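The difference between the element-wise and the single-value logical operators can be sketched as follows. This is a minimal illustration with made-up vectors; all() and any() are shown as one common way to reduce an element-wise result to a single TRUE or FALSE.

```r
v <- c(3, 0, TRUE)
t <- c(1, 3, TRUE)

elementwise <- v & t        # compares position by position
first_only <- v[1] && t[1]  # single TRUE/FALSE from the first elements
every_pair <- all(v & t)    # TRUE only if every pair is TRUE
some_pair <- any(v & t)     # TRUE if at least one pair is TRUE

print(elementwise)
print(first_only)
print(every_pair)
print(some_pair)
```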

Miscellaneous Operators:

Operator: :
Description: Colon operator. It creates the series of numbers in sequence for a vector.
Example:
v <- 2:8
print(v)
it produces the following result −
[1] 2 3 4 5 6 7 8

Operator: %in%
Description: This operator is used to identify if an element belongs to a vector.
Example:
v1 <- 8
v2 <- 12
t <- 1:10
print(v1 %in% t)
print(v2 %in% t)
it produces the following result −
[1] TRUE
[1] FALSE

Operator: %*%
Description: This operator performs matrix multiplication; the example multiplies a matrix with its
transpose.
Example:
M = matrix( c(2,6,5,1,10,4), nrow = 2, ncol = 3, byrow = TRUE)
t = M %*% t(M)
print(t)
it produces the following result −
     [,1] [,2]
[1,]   65   82
[2,]   82  117

b) Implement R script to read a person's age from the keyboard and display whether the person is
eligible for voting or not.
In this program, you will learn how to check whether a user is old enough to
vote, using R.

if (age >= 18) {
  # statement
} else {
  # statement
}

Example: How to check whether a user is eligible for voting in R

age <- as.integer(readline(prompt = "Enter your age :"))

if (age >= 18) {
  print(paste("You are valid for voting :", age))
} else {
  print(paste("You are not valid for voting :", age))
}

Output:

Enter your age :48
[1] "You are valid for voting : 48"
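Because readline() only works in an interactive session, it can help to keep the decision in a small function that can be called directly. The sketch below is one possible arrangement, not part of the original exercise; the function name voting_message and the handling of invalid input are assumptions.

```r
# Hypothetical helper: returns the message for a given age.
voting_message <- function(age) {
  # as.integer() yields NA for non-numeric input, so guard against it.
  if (is.na(age) || age < 0) {
    return("Please enter a valid age")
  }
  if (age >= 18) {
    paste("You are valid for voting :", age)
  } else {
    paste("You are not valid for voting :", age)
  }
}

print(voting_message(48))
print(voting_message(12))
```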

c) Implement R script to find biggest number between two numbers.

In this program, you will learn how to find the greatest number among
three numbers in R; the same comparison logic applies to two numbers.

Find Greatest Among Three Numbers

Numbers are: 10 20 40 => 40
Numbers are: 30 20 10 => 30

Example: How to find the greatest number among three numbers in R

x <- as.integer(readline(prompt = "Enter first number :"))
y <- as.integer(readline(prompt = "Enter second number :"))
z <- as.integer(readline(prompt = "Enter third number :"))

if (x > y && x > z) {
  print(paste("Greatest is :", x))
} else if (y > z) {
  print(paste("Greatest is :", y))
} else {
  print(paste("Greatest is :", z))
}

Output:

Enter first number :2
Enter second number :22
Enter third number :4
[1] "Greatest is : 22"
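For the two-number case named in the experiment title, a minimal sketch could look like the following. The function name bigger_of_two is illustrative; the built-in max() gives the same result and is shown for comparison.

```r
# Return the larger of two numbers using an explicit comparison.
bigger_of_two <- function(a, b) {
  if (a >= b) {
    a
  } else {
    b
  }
}

print(paste("Biggest is :", bigger_of_two(12, 7)))
print(paste("Biggest is :", max(12, 7)))   # built-in alternative
```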

d) Implement R script to check the given year is leap year or not.

A leap year check can be implemented very simply using if conditions in R programming.

First, we ask the user to enter a year for leap year checking. R provides the readline() function
for taking the user's input, displaying an appropriate message to the user through its prompt
argument. Here the user is asked to enter a year, and the value is stored in the variable year.
Then we check whether the year is divisible by 4. If the remainder is not zero, it is not a leap
year. If the remainder is zero and the year is not a century, it is a leap year. If the year is a
century (e.g., 2000), that is, divisible by 100 without any remainder, we also divide it by 400:
if that remainder is 0 it is a leap year, and if not it is a normal year.

Let's understand with some examples.

Consider the year 2004: it is exactly divisible by 4 and is not a century, so it is a leap year.
The year 2005 is not exactly divisible by 4, so it is not a leap year.

Century years must satisfy an extra condition: a century is a leap year only if it is also exactly
divisible by 400. The year 2000 is divisible by 4, by 100 and by 400, so it is a leap year. The
year 1900 satisfies the first two conditions but is not divisible by 400, so it is not a leap year.

ALGORITHM

STEP 1: Read a year into the variable year using readline(), prompting the user with an
appropriate message

STEP 2: If the year is not divisible by 4 then print The year is not a leap year

STEP 3: If the year is divisible by 4 but not by 100 (the year is not a century) then print
The year is a leap year

STEP 4: If the year is divisible by 100 (the year is a century), use a nested if condition to
check whether it is also divisible by 400 with a remainder of 0;

• If yes print The year is a leap year
• else print The year is not a leap year

R Source Code
year = as.integer(readline(prompt="Enter a year: "))
if ((year %% 4) == 0) {
  if ((year %% 100) == 0) {
    if ((year %% 400) == 0) {
      print(paste(year, "is a leap year"))
    } else {
      print(paste(year, "is not a leap year"))
    }
  } else {
    print(paste(year, "is a leap year"))
  }
} else {
  print(paste(year, "is not a leap year"))
}

OUTPUT

Enter a year: 2011

[1] "2011 is not a leap year"


Enter a year: 2004

[1] "2004 is a leap year"
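The same rule can also be written as a single logical expression, which makes it easy to test several years at once. This is a sketch of an alternative formulation, not the manual's own listing; the function name is_leap_year is an assumption.

```r
# TRUE when the year is divisible by 4 but not by 100, or divisible by 400.
is_leap_year <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
}

# The function is vectorised, so several years can be checked together.
years <- c(1900, 2000, 2004, 2011)
result <- is_leap_year(years)
print(data.frame(year = years, leap = result))
```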

Week 3:
a) Implement R Script to create a list.

How to create a list in R programming?


List can be created using the list() function.

> x <- list("a" = 2.5, "b" = TRUE, "c" = 1:3)

Here, we create a list x of three components with data
types double, logical and integer vector respectively.
Its structure can be examined with the str() function.

> str(x)

List of 3

$ a: num 2.5

$ b: logi TRUE

$ c: int [1:3] 1 2 3

In this example, a, b and c are called tags which makes it easier to reference the
components of the list.
However, tags are optional. We can create the same list without the tags as follows. In
such scenario, numeric indices are used by default.

> x <- list(2.5,TRUE,1:3)

> x

[[1]]
[1] 2.5

[[2]]

[1] TRUE

[[3]]

[1] 1 2 3

b) Implement R Script to access elements in the list.

Access Elements of List


To access elements of an R List, we may use index, or names of elements.

In this tutorial, we will learn how to access elements of a list in R, in different


ways, with the help of example programs.

Examples
Access Elements using Index

In the following program, we will create a list with three elements, and read its
elements using index.

Example.R
x <- list(TRUE, 25, "Apple")
print(x[1])
print(x[2])
print(x[3])

Output
[[1]]
[1] TRUE

[[1]]
[1] 25
[[1]]
[1] "Apple"

We can also assign new values for elements in the list.

Example.R
x <- list(TRUE, 25, "Apple")
x[2] = 38
print(x)

Output
[[1]]
[1] TRUE

[[2]]
[1] 38

[[3]]
[1] "Apple"
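Access by name, mentioned at the start of this section, can be sketched as follows. The list person and its element names are made up for illustration. Note the difference between single brackets, which return a sub-list, and double brackets or $, which return the element itself.

```r
# A list with named elements (names are illustrative).
person <- list(name = "Asha", age = 25, scores = c(90, 85))

as_sublist <- person[1]        # a list of length 1
as_value <- person[[1]]        # the element itself
by_dollar <- person$age        # access by name with $
by_name <- person[["scores"]]  # access by name with [[ ]]

print(as_sublist)
print(as_value)
print(by_dollar)
print(by_name[2])
```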
c) Implement R Script to merge two or more lists.

Method 1: Using c() function
The c() function combines two or more lists into a single list.
Example:

# R Program to combine two lists

# Creating Lists using the list() function
List1 <- list(1, 2, 3)
List2 <- list('a', 'b', 'c')

# Combining lists using c() function
List3 = c(List1, List2)
print(List3)

Output:
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] "a"

[[5]]
[1] "b"

[[6]]
[1] "c"

Method 2: Using append() function
The append() function in R accepts two lists as parameters and
returns another list with the elements of both the lists.
Syntax:
append(list1, list2)
Example 1:

# R Program to combine two lists

# Creating Lists using the list() function
List1 <- list(1:5)
List2 <- list(6:10)

print(List1)
print(List2)

# Combining lists using append() function
List3 = append(List1, List2)
print(List3)

Output:
[[1]]
[1] 1 2 3 4 5

[[1]]
[1]  6  7  8  9 10

[[1]]
[1] 1 2 3 4 5

[[2]]
[1]  6  7  8  9 10

Here, you can see that the combined list has 2 elements, which shows that
the two lists are combined as one.

Example 2:

# R Program to combine two lists

# Creating Lists using the list() function
List1 <- list(1, 2, 3)
List2 <- list('a', 'b', 'c')

# Combining lists using append() function
List3 = append(List1, List2)
print(List3)

Output:
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] "a"

[[5]]
[1] "b"

[[6]]
[1] "c"
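Since the exercise asks for two or more lists, here is a minimal sketch that merges three. The list names are illustrative. c() accepts any number of lists directly, while append() takes two at a time, so Reduce() is used here to apply it repeatedly.

```r
first <- list(1, 2)
second <- list("x")
third <- list(TRUE, FALSE)

# c() can combine any number of lists in one call.
merged <- c(first, second, third)

# append() combines two lists at a time; Reduce() chains the calls.
merged_by_append <- Reduce(append, list(first, second, third))

print(length(merged))
print(identical(merged, merged_by_append))
```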

Week 4:
Implement R script to perform following operations:
a) various operations on vectors

Operations on Vectors in R
Vectors are the most basic data types in R. Even a single object created is also stored
in the form of a vector. Vectors are nothing but arrays as defined in other languages.
Vectors contain a sequence of homogeneous types of data. If mixed values are given
then it auto converts the data according to the precedence. There are various
operations that can be performed on vectors in R.
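The automatic conversion of mixed values can be seen in a short sketch. When types are mixed in c(), R converts everything to the most general type present (logical, then integer, then double, then character). The variable names are illustrative.

```r
mixed_num <- c(1L, 2.5)        # integer + double    -> double
mixed_lgl <- c(TRUE, 2)        # logical + double    -> double
mixed_chr <- c(1, TRUE, "a")   # anything + character -> character

print(typeof(mixed_num))
print(typeof(mixed_lgl))
print(typeof(mixed_chr))
print(mixed_chr)
```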

Creating a vector
Vectors can be created in many ways as shown in the following example. The most
usual is the use of ‘c’ function to combine different elements together.

# Use of 'c' function


# to combine the values as a vector.

# by default the type will be double

X <-c(1, 4, 5, 2, 6, 7)

print('using c function')

print(X)

# using the seq() function to generate

# a sequence of continuous values

# with different step-size and length.

# length.out defines the length of vector.

Y <-seq(1, 10, length.out =5)

print('using seq() function')

print(Y)

# using ':' operator to create
# a vector of continuous values.
Z <- 5:10
print('using colon')
print(Z)

Output:

[1] "using c function"
[1] 1 4 5 2 6 7
[1] "using seq() function"
[1]  1.00  3.25  5.50  7.75 10.00
[1] "using colon"
[1]  5  6  7  8  9 10

Accessing vector elements:

Vector elements can be accessed in many ways. The most basic is using the ‘[]’,
subscript operator. Following are the ways of accessing Vector elements:

Note: vectors in R are 1 based indexed, unlike the normal C, python, etc format where
indexing starts from 0.
# Accessing elements using the position number.

X <-c(2, 5, 8, 1, 2)

print('using Subscript operator')

print(X[2])

# Accessing specific values by passing

# a vector inside another vector.

Y <-c(4, 5, 2, 1, 7)

print('using c function')

print(Y[c(4, 1)])

# Logical indexing

Z <-c(5, 2, 1, 4, 4, 3)

print('Logical indexing')

print(Z[Z>3])

Output:

[1] "using Subscript operator"
[1] 5
[1] "using c function"
[1] 1 4
[1] "Logical indexing"
[1] 5 4 4

Modifying a vector
Vectors can be modified using different indexing variations which are mentioned in
the below code:

# Creating a vector

X <-c(2, 5, 1, 7, 8, 2)

# modify a specific element

X[3] <-11

print('Using subscript operator')


print(X)

# Modify using different logics.

X[X>9] <-0

print('Logical indexing')

print(X)

# Modify by specifying the position or elements.

X <-X[c(5, 2, 1)]

print('using c function')

print(X)

Output:

[1] "Using subscript operator"
[1]  2  5 11  7  8  2
[1] "Logical indexing"
[1] 2 5 0 7 8 2
[1] "using c function"
[1] 8 5 2

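Deleting a vector:

A vector can be emptied by reassigning it as NULL. Note that NULL is an object rather than an
operator, and the variable X still exists afterwards (holding NULL); to remove the variable
itself, use rm(X).

# Creating a vector
X <- c(5, 2, 1, 6)

# Deleting a vector
X <- NULL
print('Deleted vector')
print(X)

Output:

[1] "Deleted vector"
NULL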


Arithmetic operations

We can perform arithmetic operations between 2 vectors. These operations are
performed element-wise, so the two vectors should normally have the same length; if the
lengths differ, R recycles the elements of the shorter vector.

# Creating Vectors

X <-c(5, 2, 5, 1, 51, 2)

Y <-c(7, 9, 1, 5, 2, 1)

# Addition

Z <-X +Y

print('Addition')

print(Z)

# Subtraction

S <-X -Y

print('Subtraction')

print(S)

# Multiplication

M <-X *Y

print('Multiplication')

print(M)

# Division

D <-X /Y

print('Division')

print(D)

Output:

[1] "Addition"
[1] 12 11  6  6 53  3
[1] "Subtraction"
[1] -2 -7  4 -4 49  1
[1] "Multiplication"
[1]  35  18   5   5 102   2
[1] "Division"
[1]  0.7142857  0.2222222  5.0000000  0.2000000 25.5000000  2.0000000
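When the two vectors differ in length, R repeats (recycles) the shorter one to match the longer one. A minimal sketch with illustrative values:

```r
long_vec <- c(1, 2, 3, 4)
short_vec <- c(10, 20)

# short_vec is reused as 10, 20, 10, 20 to match the length of long_vec.
recycled_sum <- long_vec + short_vec
print(recycled_sum)
```

R issues a warning if the longer length is not a multiple of the shorter one, so matching lengths remain the safer habit.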

Sorting of Vectors

For sorting we use the sort() function which sorts the vector in ascending order by
default.

# Creating a Vector

X <-c(5, 2, 5, 1, 51, 2)

# Sort in ascending order

A <-sort(X)

print('sorting done in ascending order')

print(A)

# sort in descending order.

B <-sort(X, decreasing =TRUE)

print('sorting done in descending order')

print(B)

Output:

[1] "sorting done in ascending order"
[1]  1  2  2  5  5 51
[1] "sorting done in descending order"
[1] 51  5  5  2  2  1

b) Finding the sum and average of given numbers using arrays.


Given below are examples to help you understand better.
Example 1:
vec = c(1, 2, 3 , 4)
print("Sum of the vector:")

# inbuilt sum method


print(sum(vec))

# using inbuilt mean method


print("Mean of the vector:")
print(mean(vec))

# using inbuilt product method


print("Product of the vector:")
print(prod(vec))

Output
[1] "Sum of the vector:"
[1] 10
[1] "Mean of the vector:"
[1] 2.5
[1] "Product of the vector:"
[1] 24
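The example above uses a plain vector. Since the exercise mentions arrays, the following sketch applies the same functions to a multi-dimensional array; the array shape and the variable names are assumptions chosen for illustration.

```r
# Store the numbers 1 to 12 in a 2 x 3 x 2 array.
numbers <- array(1:12, dim = c(2, 3, 2))

total <- sum(numbers)      # sum of all 12 elements
average <- mean(numbers)   # arithmetic mean of all elements

# Sum of each 2 x 3 slice along the third dimension.
slice_sums <- apply(numbers, 3, sum)

print(total)
print(average)
print(slice_sums)
```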

c)To display elements of list in reverse order.

R – Reverse a List:
To reverse a list in R programming, call the rev() function and pass the given list as
an argument to it. The rev() function returns a new list with the contents of the
given list in reversed order.

The syntax to reverse a list x is

rev(x)

Return Value

The rev() function returns a list.

Examples
In the following program, we take a list in x, and reverse this list using rev().

example.R
x <- list("a", "b", "c")
result = rev(x)
print(result)

Now, let us take a list x with numeric values and reverse it.
example.R
x <- list(5, 25, 125)
result = rev(x)
print(result)

d) Finding the minimum and maximum elements in the array.


nums = c(10,20,30,40,50,60)

print('Original vector:')

print(nums)

print(paste("Maximum value of the said vector:",max(nums)))

print(paste("Minimum value of the said vector:",min(nums)))

Sample Output:
[1] "Original vector:"
[1] 10 20 30 40 50 60
[1] "Maximum value of the said vector: 60"
[1] "Minimum value of the said vector: 10"
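Two related base R helpers may also be useful here: range() returns the minimum and maximum together, and which.min() / which.max() return their positions. A short sketch using the same values:

```r
nums <- c(10, 20, 30, 40, 50, 60)

value_range <- range(nums)        # c(minimum, maximum)
min_position <- which.min(nums)   # index of the smallest element
max_position <- which.max(nums)   # index of the largest element

print(value_range)
print(paste("Minimum is at position", min_position))
print(paste("Maximum is at position", max_position))
```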

Week 5:
a) Implement R Script to perform various operations on matrices

Operations on Matrices in R
Matrices in R are a bunch of values, either real or complex numbers, arranged in a
group of fixed number of rows and columns. Matrices are used to depict the data in a
structured and well-organized format.
It is necessary to enclose the elements of a matrix in parentheses or brackets.
A matrix with 9 elements is shown below.
This Matrix [M] has 3 rows and 3 columns. Each element of matrix [M] can be
referred to by its row and column number. For example, a23 = 6
Order of a Matrix :
The order of a matrix is defined in terms of its number of rows and columns.
Order of a matrix = No. of rows × No. of columns
Therefore Matrix [M] is a matrix of order 3 × 3.

Operations on Matrices

There are four basic operations i.e. DMAS (Division, Multiplication, Addition,
Subtraction) that can be done with matrices. Both the matrices involved in the
operation should have the same number of rows and columns.
Matrices Addition
The addition of two same-ordered matrices yields a
matrix where every element is the sum of the corresponding elements of the input
matrices.

# R program to add two matrices

# Creating 1st Matrix
B = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# Creating 2nd Matrix
C = matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)

# Getting number of rows and columns
num_of_rows = nrow(B)
num_of_cols = ncol(B)

# Creating matrix to store results
sum = matrix(, nrow = num_of_rows, ncol = num_of_cols)

# Printing Original matrices
print(B)
print(C)

# Calculating sum of matrices
for (row in 1:num_of_rows) {
  for (col in 1:num_of_cols) {
    sum[row, col] <- B[row, col] + C[row, col]
  }
}

# Printing resultant matrix
print(sum)

Output:
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12
     [,1] [,2] [,3]
[1,]    8   12   16
[2,]   10   14   18
In the above code, nrow(B) gives the number of rows in B and ncol(B) gives the
number of columns. Here, sum is an empty matrix of the same size as B and C. The
elements of sum are the addition of the corresponding elements of B and C through
nested for loops.
Using ‘+’ operator for matrix addition:
Similarly, the following R script uses the in-built operator +:
# R program for matrix addition

# using '+' operator

# Creating 1st Matrix

B =matrix(c(1, 2+3i, 5.4, 3, 4, 5), nrow =2, ncol =3)

# Creating 2nd Matrix

C =matrix(c(2, 0i, 0.1, 3, 4, 5), nrow =2, ncol =3)

# Printing the resultant matrix

print(B +C)

Output:
[,1] [,2] [,3]
[1,] 3+0i 5.5+0i 8+0i
[2,] 2+3i 6.0+0i 10+0i
R provides the basic inbuilt operator to add the matrices. In the above code, all the
elements in the resultant matrix are returned as complex numbers, even if a single
element of a matrix is a complex number.
Properties of Matrix Addition:
• Commutative: B + C = C + B
• Associative: For n number of matrices A + (B + C) = (A + B) + C
• Order of the matrices involved must be same.
Matrices Subtraction
The subtraction of two same-ordered matrices yields a
matrix where every element is the difference between the corresponding elements of the
first input matrix and the second.

# R program to subtract two matrices

# Creating 1st Matrix

B =matrix(c(1, 2, 3, 4, 5, 6), nrow =2, ncol =3)

# Creating 2nd Matrix

C =matrix(c(7, 8, 9, 10, 11, 12), nrow =2, ncol =3)

# Getting number of rows and columns

num_of_rows =nrow(B)

num_of_cols =ncol(B)

# Creating matrix to store results

diff =matrix(, nrow =num_of_rows, ncol =num_of_cols)

# Printing Original matrices

print(B)

print(C)

# Calculating diff of matrices

for(row in1:num_of_rows)
{

for(col in1:num_of_cols)

diff[row, col] <-B[row, col] -C[row, col]

# Printing resultant matrix

print(diff)

Output:
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12
     [,1] [,2] [,3]
[1,]   -6   -6   -6
[2,]   -6   -6   -6

In the above code, each element of the diff matrix is obtained by subtracting the
corresponding element of C from that of B through nested for loops.
Using '-' operator for matrix subtraction:
Similarly, the following R script uses the in-built operator '-':
# R program for matrix subtraction
# using '-' operator

# Creating 1st Matrix
B = matrix(c(1, 2+3i, 5.4, 3, 4, 5), nrow = 2, ncol = 3)

# Creating 2nd Matrix
C = matrix(c(2, 0i, 0.1, 3, 4, 5), nrow = 2, ncol = 3)

# Printing the resultant matrix
print(B - C)

Output:
      [,1]   [,2] [,3]
[1,] -1+0i 5.3+0i 0+0i
[2,]  2+3i 0.0+0i 0+0i
Properties of Matrix Subtraction:
• Non-Commutative: in general, B - C != C - B
• Non-Associative: in general, for matrices A, B and C of the same order, A - (B - C) != (A - B) - C
• Order of the matrices involved must be same.
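A short check of these properties, as a sketch only: B and C reuse the values from the example above, while the matrix A of ones is an assumption added for illustration.

```r
B <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
C <- matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)
A <- matrix(1, nrow = 2, ncol = 3)   # hypothetical third matrix, all ones

# Non-commutative: swapping the operands changes the result ...
swapped_equal <- identical(B - C, C - B)
print(swapped_equal)                  # FALSE

# ... in fact it flips the sign of every element
sign_flipped <- identical(B - C, -(C - B))
print(sign_flipped)                   # TRUE

# Non-associative: the grouping of the subtractions matters
left  <- A - (B - C)
right <- (A - B) - C
regrouped_equal <- identical(left, right)
print(regrouped_equal)                # FALSE
```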
Matrices Multiplication
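The element-wise multiplication of two matrices of the same order yields a
matrix where every element is the product of the corresponding elements of the
input matrices. This is what the * operator computes in R; it is not the
linear-algebra matrix product, for which R provides the %*% operator.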

# R program to multiply two matrices element by element

# Creating 1st Matrix
B = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# Creating 2nd Matrix
C = matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)

# Getting number of rows and columns
num_of_rows = nrow(B)
num_of_cols = ncol(B)

# Creating matrix to store results
prod = matrix(NA, nrow = num_of_rows, ncol = num_of_cols)

# Printing Original matrices
print(B)
print(C)

# Calculating product of matrices
for (row in 1:num_of_rows) {
  for (col in 1:num_of_cols) {
    prod[row, col] <- B[row, col] * C[row, col]
  }
}

# Printing resultant matrix
print(prod)

Output:
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12
     [,1] [,2] [,3]
[1,]    7   27   55
[2,]   16   40   72
The elements of prod are the products of the corresponding elements of B and C,
computed through nested for loops.
Using '*' operator for matrix multiplication:
Similarly, the following R script uses the in-built operator *, which multiplies
element by element:
# R program for matrix multiplication
# using '*' operator

# Creating 1st Matrix
B = matrix(c(1, 2+3i, 5.4), nrow = 1, ncol = 3)

# Creating 2nd Matrix
C = matrix(c(2, 1i, 0.1), nrow = 1, ncol = 3)

# Printing the resultant matrix
print(B * C)

Output:
     [,1]  [,2]    [,3]
[1,] 2+0i -3+2i 0.54+0i
Properties of Matrix Multiplication (element-wise, as computed by *):
• Commutative: B * C = C * B
• Associative: For any matrices A, B and C of the same order, A * (B * C) = (A * B) * C
• Order of the matrices involved must be same.
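Because * works element by element, it differs from the linear-algebra matrix product, which R computes with the %*% operator. The sketch below contrasts the two; the 2 x 2 matrices are illustrative values chosen here, not taken from the manual.

```r
B <- matrix(c(1, 2, 3, 4), nrow = 2)   # rows: (1 3), (2 4)
C <- matrix(c(5, 6, 7, 8), nrow = 2)   # rows: (5 7), (6 8)

# Element-wise product: each element of B times the matching element of C
elementwise <- B * C
print(elementwise)
#      [,1] [,2]
# [1,]    5   21
# [2,]   12   32

# Linear-algebra product: rows of B combined with columns of C
matprod <- B %*% C
print(matprod)
#      [,1] [,2]
# [1,]   23   31
# [2,]   34   46

# Unlike *, the %*% product is generally not commutative
matprod_commutes <- isTRUE(all.equal(B %*% C, C %*% B))
print(matprod_commutes)   # FALSE
```

For %*% the requirement is also different: the number of columns of the first matrix must equal the number of rows of the second, rather than both matrices having the same order.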
Matrices Division
Dividing one matrix by another of the same order yields a
matrix where every element is the quotient of the corresponding elements:
the element of the first matrix divided by that of the second.

# R program to divide two matrices

# Creating 1st Matrix
B = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# Creating 2nd Matrix
C = matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)

# Getting number of rows and columns
num_of_rows = nrow(B)
num_of_cols = ncol(B)

# Creating matrix to store results
div = matrix(NA, nrow = num_of_rows, ncol = num_of_cols)

# Printing Original matrices
print(B)
print(C)

# Calculating quotient of matrices
for (row in 1:num_of_rows) {
  for (col in 1:num_of_cols) {
    div[row, col] <- B[row, col] / C[row, col]
  }
}

# Printing resultant matrix
print(div)

Output:
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12
          [,1]      [,2]      [,3]
[1,] 0.1428571 0.3333333 0.4545455
[2,] 0.2500000 0.4000000 0.5000000
The elements of the div matrix are the quotients of the corresponding elements of
B and C, computed through nested for loops.
Using '/' operator for matrix division:
Similarly, the following R script uses the in-built operator /:
# R program for matrix division
# using '/' operator

# Creating 1st Matrix
B = matrix(c(4, 6i, -1), nrow = 1, ncol = 3)

# Creating 2nd Matrix
C = matrix(c(2, 2i, 0), nrow = 1, ncol = 3)

# Printing the resultant matrix
print(B / C)

Output:
     [,1] [,2]      [,3]
[1,] 2+0i 3+0i -Inf+NaNi
Properties of Matrix Division:
• Non-Commutative: in general, B / C != C / B
• Non-Associative: in general, for matrices A, B and C of the same order, A / (B / C) != (A / B) / C
• Order of the matrices involved must be same.
Note: The time complexity of each of these element-wise matrix operations is O(r*c),
where r x c is the order of the matrices (r rows and c columns).
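As a sketch of how the loop version and the operator version relate, the following compares the two on the matrices used above and also shows how R handles division by zero for ordinary (non-complex) numbers. The variable names div_loop, same_result and zero_cases are introduced here for illustration.

```r
B <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
C <- matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)

# Loop version: visits each of the r * c elements exactly once
div_loop <- matrix(NA_real_, nrow = nrow(B), ncol = ncol(B))
for (row in seq_len(nrow(B))) {
  for (col in seq_len(ncol(B))) {
    div_loop[row, col] <- B[row, col] / C[row, col]
  }
}

# Operator version: the same element-wise division in one expression
same_result <- isTRUE(all.equal(div_loop, B / C))
print(same_result)    # TRUE

# Division by zero does not stop the script; it yields Inf, -Inf or NaN
zero_cases <- c(1, -1, 0) / 0
print(zero_cases)     # Inf -Inf NaN
```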

What is Descriptive Statistics?


Descriptive statistics is the branch of statistics that focuses
on describing and gaining more insight into the data in its present state. It
deals with what the data in its current state means. It makes the data easier
to understand and also gives us knowledge about the data which is
necessary to perform further analysis. Average measures like mean,
median, mode, etc. are a good example of descriptive statistics.

Descriptive Statistics in R

The R programming language provides us with many simple yet effective
functions to perform descriptive statistics and gain more knowledge about
our data. Summarizing the data, calculating average measures, finding
cumulative measures, and summarizing rows or columns of data structures
are all possible with short commands. Let's start simple with the
summarizing functions str() and summary().
Summarizing your Data
R provides two very simple functions that can instantly summarize our data
for us. These are the str() and the summary() functions.
Let us begin with the str function. The str() function takes a single object as
an argument and compactly shows us the structure of the input object. It
shows us details like length, data type, names and other specifics about the
components of the object. Here is an example of the str function.
Code:
str(mtcars)

The summary() function also takes a single object as an argument. It then
returns summary measures such as the minimum, 1st quartile, median, mean,
3rd quartile and maximum for each numeric component or variable in the object.
Here is an example of the summary function in action.
Code:
summary(mtcars)
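
The quartiles reported by summary() can also be obtained directly. The sketch below uses quantile() and IQR() from base R on the mpg column of mtcars; these two functions are not discussed in the text above and are shown here only as a related illustration.

```r
# Minimum, quartiles and maximum of miles per gallon in mtcars
mpg_quartiles <- quantile(mtcars$mpg)
print(mpg_quartiles)
#     0%    25%    50%    75%   100%
# 10.400 15.425 19.200 22.800 33.900

# Interquartile range: 3rd quartile minus 1st quartile
mpg_iqr <- IQR(mtcars$mpg)
print(mpg_iqr)   # 7.375

# The same six figures that summary() prints for a single column
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)
```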

Getting the Average Measures
R provides a number of functions that give us different average and related
summary measures for given data. These measures include:

Mean: The mean of a given set of numeric or logical values (it may be a
vector or a row or column of any other data structure) can be easily found
using the mean() function.
Median: Finding the median of a set of numeric or logical values is also very
easy by using the median() function.
Standard deviation: The standard deviation of a set of numerical values can
be found using the sd() function.
Variance: The var() function gives us the variance of a set of numeric or
logical values.
Median Absolute Deviation: The median absolute deviation of a set of
numeric or logical values can be found by using the mad() function.
Maximum: In a given set of numeric or logical values, we can use
the max() function to find the maximum or the largest value in the set.
Note: If the set contains an NA, the max() function returns NA unless its
na.rm argument is set to TRUE.
Minimum: The min() function is a very handy way to find out the smallest
value in a set of numeric values.
Note: Like the max() function, the min() function returns NA if the set
contains an NA, unless na.rm is set to TRUE.
Sum: The sum of a set of numerical values can be found by simply using
the sum() function.
Length: The length or the number of values in a set is given by
the length() function.
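The effect of NA values and of the na.rm argument can be seen on a small vector. This is a minimal sketch; the vectors x and y are made-up values used only for illustration.

```r
x <- c(4, 8, NA, 15, 16)   # illustrative values with one missing entry

max_with_na <- max(x)                 # NA: the missing value propagates
max_no_na   <- max(x, na.rm = TRUE)   # 16
min_no_na   <- min(x, na.rm = TRUE)   # 4
mean_no_na  <- mean(x, na.rm = TRUE)  # 10.75
med_no_na   <- median(x, na.rm = TRUE) # 11.5
sum_no_na   <- sum(x, na.rm = TRUE)   # 43
n_values    <- length(x)              # 5: length() counts the NA as well

print(c(max_with_na, max_no_na, min_no_na, mean_no_na, med_no_na, sum_no_na, n_values))

# mad() scales the median absolute deviation by 1.4826 by default
y <- c(1, 2, 3, 4, 100)
mad_default <- mad(y)                 # 1.4826
mad_raw     <- mad(y, constant = 1)   # 1
print(c(mad_default, mad_raw))
```

The mtcars columns used in the manual contain no missing values, so na.rm makes no difference there; it matters as soon as real data has gaps.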
Code:
mean(mtcars$mpg)

median(mtcars$mpg)

sd(mtcars$mpg)

var(mtcars$mpg)

mad(mtcars$mpg)

max(mtcars$mpg, na.rm = TRUE)

min(mtcars$mpg, na.rm = TRUE)

sum(mtcars$mpg)

length(mtcars$mpg)

Cumulative measures in R
Cumulative measures are statistical measures that are
calculated sequentially: each result takes into account all values up to that
position in the data. They provide insight into the progression and growth of
the data. R provides a few functions that calculate cumulative measures with
ease. These functions are:
Cumulative sum: The cumsum() function calculates the cumulative sum of a
given vector.
Cumulative max: To find the cumulative maximum value of an input vector,
you can use the cummax() function.
Cumulative min: You can find the cumulative minimum values in a vector
by using the cummin() function.
Cumulative product: Using the cumprod() function, you can find the
cumulative product of a vector.
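On a short vector the results are easy to follow by hand. The vector v below is an illustrative assumption, chosen so that each running result can be checked mentally.

```r
v <- c(3, 1, 4, 1, 5)   # illustrative values

running_sum  <- cumsum(v)    # 3  4  8  9 14
running_max  <- cummax(v)    # 3  3  4  4  5
running_min  <- cummin(v)    # 3  1  1  1  1
running_prod <- cumprod(v)   # 3  3 12 12 60

print(running_sum)
print(running_max)
print(running_min)
print(running_prod)
```

Each result has the same length as the input: position i holds the measure computed over the first i values.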
Code:
a <- c(1:9,4,2,4,5:2)

cumsum(a)

cummax(a)

cummin(a)

cumprod(a)

Row and Column Summary Functions in R
There are certain functions in R that give summary statistics for
the rows or columns of data frames, matrices or any other data structure
with two or more dimensions. To summarize only selected rows or columns,
subset the data first and pass the result to the function.
These functions are:

rowMeans: The rowMeans() function, as the name suggests, returns the mean
of each row of a data structure.
rowSums: The rowSums() function finds the sum of each row of a data
structure.
colMeans: The colMeans() function returns the mean of each column of
a data structure.
colSums: The colSums() function calculates the sum of each column of a
data structure.
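A small matrix makes the row and column results easy to verify. This is a sketch with made-up values; the matrix m is not part of the manual.

```r
# 2 x 3 matrix filled column by column: rows are (1 3 5) and (2 4 6)
m <- matrix(1:6, nrow = 2, ncol = 3)

row_means <- rowMeans(m)   # 3 4
row_sums  <- rowSums(m)    # 9 12
col_means <- colMeans(m)   # 1.5 3.5 5.5
col_sums  <- colSums(m)    # 3 7 11

print(row_means)
print(row_sums)
print(col_means)
print(col_sums)

# To summarize one selected row of a matrix, keep it two-dimensional
# with drop = FALSE; m[2, ] alone would collapse to a plain vector
second_row_mean <- rowMeans(m[2, , drop = FALSE])   # 4
print(second_row_mean)
```

A data frame such as mtcars stays a data frame when a single row is selected with mtcars[2, ], so drop = FALSE is only needed for matrices.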
Code:
rowMeans(mtcars[2,])

rowSums(mtcars[2,])

colMeans(mtcars)

colSums(mtcars)

Subsetting Datasets in R

Tom Jeon • October 8, 2018

Subsetting datasets is a crucial skill for any data
professional. Learn and practice subsetting data in this
quick interactive tutorial!
Whether you're comparing how different demographics respond to
marketing campaigns, zooming in on a specific time frame, or pulling
information about a select few products from the inventory, subsetting
datasets enables you to extract useful observations in your dataset. R is a
great tool that makes subsetting data easy and intuitive. By the end of this
tutorial, you'll have the know-how to extract the information you want from
your dataset.
Subsetting your data does not change the content of your data, but simply
selects the portion most relevant to the goal you have in mind. In general,
there are three ways to subset the rows and columns of your dataset—by
index, by name, and by value.
Subsetting rows and columns by index

One way to subset your rows and columns is by your dataset's indices.
This is the same as describing your rows and columns as "the first row", "all
rows in second and fifth columns", or "the first row in second to fifth
columns". Let's specify such phrases using a dataset called iris in R. From
its documentation, "[t]his famous (Fisher's or Anderson's) iris dataset
gives the measurements in centimeters of the variables sepal length and
width and petal length and width, respectively, for 50 flowers from each of 3
species of iris. The species are Iris setosa, versicolor, and virginica."

# "The first row":
iris[1, ]

# "All rows in second and fifth columns":
iris[, c(2, 5)]

# "The first row in second to fifth columns":
iris[1, 2:5]

To subset your data, square brackets are used after your dataset object.
The rows of your dataset are specified as the first element inside the
square brackets, and the columns of your dataset are specified as the
second, separated by a comma:
data[rows, columns]
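
The shape of each result can be checked with dim(). The sketch below applies the data[rows, columns] pattern to iris; the negative index and the drop argument are standard base R features that go slightly beyond the text above.

```r
first_row <- iris[1, ]          # one row, all five columns
two_cols  <- iris[, c(2, 5)]    # all rows, second and fifth columns
block     <- iris[1:3, 2:5]     # first three rows, columns two to five

print(dim(first_row))   # 1 5
print(dim(two_cols))    # 150 2
print(dim(block))       # 3 4

# A negative index drops columns instead of selecting them
no_species <- iris[, -5]
print(ncol(no_species)) # 4

# Selecting a single column returns a plain vector by default;
# drop = FALSE keeps the result as a one-column data frame
width_vector <- iris[, 2]
width_frame  <- iris[, 2, drop = FALSE]
print(is.data.frame(width_vector))   # FALSE
print(is.data.frame(width_frame))    # TRUE
```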
Subsetting rows and columns by name

In R, the rows and columns of your dataset have name attributes. Row
names are rarely used and by default are simply indices (integers numbering
from 1 to the number of rows of your dataset), just like what you saw in the
previous section. In fact, if you call rownames() on the iris dataset, you
will see that the rows are just numbered from 1 to 150:

> rownames(iris)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14"
[15] "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28"
[29] "29" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42"
[43] "43" "44" "45" "46" "47" "48" "49" "50" "51" "52" "53" "54" "55" "56"
[57] "57" "58" "59" "60" "61" "62" "63" "64" "65" "66" "67" "68" "69" "70"
[71] "71" "72" "73" "74" "75" "76" "77" "78" "79" "80" "81" "82" "83" "84"
[85] "85" "86" "87" "88" "89" "90" "91" "92" "93" "94" "95" "96" "97" "98"
[99] "99" "100" "101" "102" "103" "104" "105" "106" "107" "108" "109" "110" "111" "112"
[113] "113" "114" "115" "116" "117" "118" "119" "120" "121" "122" "123" "124" "125" "126"
[127] "127" "128" "129" "130" "131" "132" "133" "134" "135" "136" "137" "138" "139" "140"
[141] "141" "142" "143" "144" "145" "146" "147" "148" "149" "150"

> nrow(iris)
[1] 150

Row names are more common in smaller datasets and are used to make
observations in your dataset easily identifiable. For example, for a small
dataset containing health information of a doctor's patients, the row names
of this dataset could be the full names of the patients.
Column names on the other hand, are ubiquitous to almost any dataset.
You can access these with the colnames() function or the names() function:

> colnames(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

> names(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

To subset your dataset by the names of your rows and columns, simply use
the square brackets again, prefixed by your dataset object:

# Sepal width of the fifth observation
iris["5", "Sepal.Width"]

# Sepal width and petal width
iris[, c("Sepal.Width", "Petal.Width")]

It's important to note that both the row and column names are characters,
so using single or double quotes is absolutely necessary!
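
As a quick check, selecting by name and selecting by index return the same value for the default row names of iris. The sketch below uses only base R; the variable names are introduced here for illustration.

```r
# Fifth observation, Sepal.Width column, selected by name and by index
by_name  <- iris["5", "Sepal.Width"]
by_index <- iris[5, 2]

print(by_name)                        # 3.6
print(identical(by_name, by_index))   # TRUE

# Several columns by name: the result keeps the requested names and order
widths <- iris[, c("Sepal.Width", "Petal.Width")]
print(names(widths))                  # "Sepal.Width" "Petal.Width"

# Single and double quotes are interchangeable for names
same_quotes <- identical(iris[, "Species"], iris[, 'Species'])
print(same_quotes)                    # TRUE
```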
Subsetting rows and columns by value

Subsetting your rows and columns by value often allows the most flexibility.
For example, you can extract the data on Iris setosa using a conditional
statement like this:

> iris[iris$Species == "setosa", ]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
...
47          5.1         3.8          1.6         0.2  setosa
48          4.6         3.2          1.4         0.2  setosa
49          5.3         3.7          1.5         0.2  setosa
50          5.0         3.3          1.4         0.2  setosa

Conditional statements like iris$Species == "setosa" belong in the row
element in the square brackets (i.e., the first element before the comma). In
addition to the conditional statement in the first element, you can specify
columns by index or name in the second element.
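
A condition in the row element can be combined with a column selection, and base R's subset() and aggregate() functions express similar operations. The following is a sketch under the assumption that these two functions are what is wanted here; they are base R functions but are not covered in the text above, and the threshold 5.5 is an arbitrary illustrative value.

```r
# All setosa rows: iris holds 50 flowers of each species
setosa <- iris[iris$Species == "setosa", ]
print(nrow(setosa))   # 50

# Two conditions joined with &, plus two columns selected by name
long_setosa <- iris[iris$Species == "setosa" & iris$Sepal.Length > 5.5,
                    c("Sepal.Length", "Species")]
print(long_setosa)

# subset() expresses the same selection without repeating iris$
long_setosa2 <- subset(iris, Species == "setosa" & Sepal.Length > 5.5,
                       select = c(Sepal.Length, Species))
print(isTRUE(all.equal(long_setosa, long_setosa2)))   # TRUE

# aggregate() computes a summary per group, here the mean sepal length
mean_by_species <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
print(mean_by_species)
#      Species Sepal.Length
# 1     setosa        5.006
# 2 versicolor        5.936
# 3  virginica        6.588
```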
