R Manual
Week 1:
a) Installing R and RStudio
b) Basic functionality of R, variables, data types in R
Week 2:
a) Implement R script to show the usage of various operators available in R language.
b) Implement R script to read a person's age from the keyboard and display whether he or she is eligible
for voting or not.
c) Implement R script to find the bigger of two numbers.
d) Implement R script to check whether the given year is a leap year or not.
Week 3:
a) Implement R Script to create a list.
b) Implement R Script to access elements in the list.
c) Implement R Script to merge two or more lists.
d) Implement R Script to perform matrix operations.
Week 4:
Implement R script to perform following operations:
a) various operations on vectors
b) Finding the sum and average of given numbers using arrays.
c) To display elements of list in reverse order.
d) Finding the minimum and maximum elements in the array.
Week 5:
a) Implement R Script to perform various operations on matrices
b) Implement R Script to extract the data from dataframes.
c) Write R script to display file contents.
d) Write R script to copy file contents from one file to another.
Week 6:
a) Write an R script to find basic descriptive statistics using the summary(), str() and quantile() functions
on the mtcars & cars datasets.
b) Write an R script to find subsets of a dataset by using the subset() and aggregate() functions on the iris
dataset.
Week 7:
a) Reading different types of data sets (.txt, .csv) from the Web or disk and writing them to a file in a
specific disk location.
b) Reading Excel data sheets in R.
c) Reading XML datasets in R.
Week 8:
a) Implement R Script to create a Pie chart, Bar Chart, scatter plot and Histogram
(Introduction to ggplot2 graphics)
b) Implement R Script to perform mean, median, mode, range, summary, variance, standard
deviation operations.
Week 9:
a) Implement R Script to perform Normal, Binomial distributions.
b) Implement R Script to perform correlation, Linear and multiple regression.
Week 10:
Introduction to Non-Tabular Data Types: Time series, spatial data, Network data. Data
Transformations: Converting Numeric Variables into Factors, Date Operations, String
Parsing, Geocoding
Week 11:
Introduction to dirty data problems: missing values, data manipulation, duplicates, forms of
data, dates, outliers, spelling
Week 12:
Data sources: SQLite examples for relational databases, Loading SPSS and SAS files,
Reading from Google Spreadsheets, API and web scraping examples.
Week 1:
a) Installing R and R Studio
R is an open-source programming language that is widely used as statistical software and a
data analysis tool. It was designed by Ross Ihaka and Robert Gentleman at the University of
Auckland, New Zealand. The R programming language is an implementation of the S
programming language.
R and Python both play a major role in data science, and newcomers often find it hard to
choose between the two.
Why R Programming Language?
R is a leading tool for machine learning, statistics, and data analysis. Objects,
functions, and packages can easily be created in R.
It is a platform-independent language, so it can be used on all major operating
systems.
It is an open-source, free language, so anyone can install it in any organization
without purchasing a license.
R is not only a statistics package; it also integrates with other languages (C, C++).
Thus, you can easily interact with many data sources and statistical packages.
R is currently one of the most requested programming languages in data science.
Installing R Console & RStudio
R is a language and free software environment, available under the GNU General Public
License and supported by the R Foundation for Statistical Computing. The language is best
known for its powerful statistical and data interpretation capabilities.
Installing R Console:
Open an internet browser and go to www.r-project.org.
Click the "download R" link in the middle of the page under "Getting Started."
Select a CRAN location (a mirror site) and click the corresponding link.
Click on the "Download R for Windows" link at the top of the page.
Click on the "install R for the first time" link at the top of the page.
Click "Download R for Windows" and save the executable file somewhere on your
computer. Run the .exe file and follow the installation instructions.
Now that R is installed, you need to download and install RStudio.
Installing RStudio:
Step 1: First, set up the R environment on your local machine. R can be downloaded
from r-project.org.
Step 2: After downloading R for the Windows platform, install it by double-clicking the installer.
Complex: 3 + 2i
v <- 2+5i
print(class(v))
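R has six basic (atomic) data types, and the Complex entry above follows the same pattern for each of them: an example value plus a line that verifies its class. A minimal sketch that checks one value per type; the values are illustrative choices, not taken from the original table:

```r
# One illustrative value for each of R's six basic (atomic) data types
values <- list(TRUE, 23.5, 2L, 3 + 2i, "TRUE", charToRaw("Hello"))
for (v in values) {
  # class() reports the data type of each value
  print(class(v))
}
```

The printed classes are logical, numeric, integer, complex, character and raw, in that order.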
Vectors
To create a vector with more than one element, use the c() function, which combines its
arguments into a vector.
# Create a vector.
apple <- c('red', 'green', 'yellow')
print(apple)
Lists
A list is an R object that can contain elements of different types, such as vectors,
functions and even other lists.
# Create a list containing a vector, a number and a function.
list1 <- list(c(2,5,3), 21.3, sin)
# Print the list.
print(list1)
[[2]]
[1] 21.3
sin() in R: sin() is a built-in mathematical function that takes a numeric value (an angle
in radians) as its argument and returns its sine.
Matrices
A matrix is a two-dimensional rectangular data set. It can be created by passing a vector
to the matrix() function.
# Create a matrix.
M <- matrix(c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)
Arrays
While matrices are confined to two dimensions, arrays can have any number of
dimensions. The array() function takes a dim attribute that sets the required
number of dimensions. The example below creates an array of two elements,
each of which is a 3x3 matrix.
# Create an array.
a <- array(c('green','yellow'),dim = c(3,3,2))
print(a)
, , 2
Factors
Factors are R objects created from a vector. A factor stores the vector together with
the distinct values of its elements as labels (levels). The labels are always character,
whether the input vector is numeric, character, Boolean, etc. Factors are useful in
statistical modeling.
Factors are created using the factor() function. The nlevels() function gives the
number of levels.
# Create a vector.
apple_colors <- c('green','green','yellow','red','red','red','green')
# Create a factor object, then print it and its number of levels.
factor_apple <- factor(apple_colors)
print(factor_apple)
print(nlevels(factor_apple))
Data Frames
Data frames are tabular data objects. Unlike a matrix, each column of a data frame
can contain a different mode of data: the first column can be numeric, the second
character and the third logical. A data frame is a list of vectors of equal length.
Data Frames are created using the data.frame() function.
# Create the data frame.
BMI <- data.frame(
gender = c("Male","Male","Female"),
height = c(152,171.5,165),
weight = c(81,93,78),
Age = c(42,38,26)
)
print(BMI)
Week 2:
a) Implement R script to show the usage of various operators available in R language.
An operator is a symbol that tells the interpreter to perform specific mathematical or
logical manipulations. R is rich in built-in operators and provides the following
types of operators.
Types of Operators
We have the following types of operators in R programming:
Arithmetic Operators
Relational Operators
Logical Operators
Assignment Operators
Miscellaneous Operators
Arithmetic Operators
The following table shows the arithmetic operators supported by R. The
operators act on each element of the vector.
Operator    Description    Example
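A short sketch applying each arithmetic operator to two vectors; v and t are illustrative values chosen for this sketch, and every operator works element by element:

```r
v <- c(2, 5.5, 6)
t <- c(8, 3, 4)
print(v + t)    # addition
print(v - t)    # subtraction
print(v * t)    # multiplication
print(v / t)    # division
print(v %% t)   # remainder of the first vector divided by the second
print(v %/% t)  # integer (floor) division
print(v ^ t)    # first vector raised to the exponent of the second
```

For example, v + t gives 10.0 8.5 10.0 and v ^ t gives 256.000 166.375 1296.000.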
Relational Operators
The following table shows the relational operators supported by R. Each
element of the first vector is compared with the corresponding element of the
second vector. The result of each comparison is a Boolean value.
==    Checks if each element of the first vector is equal to the corresponding element of
the second vector.
v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v == t)
It produces the following result:
[1] FALSE FALSE FALSE TRUE
!=    Checks if each element of the first vector is unequal to the corresponding element of
the second vector.
v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v != t)
It produces the following result:
[1] TRUE TRUE TRUE FALSE
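The remaining relational operators follow the same element-wise pattern. A sketch reusing the v and t vectors from the == and != examples:

```r
v <- c(2, 5.5, 6, 9)
t <- c(8, 2.5, 14, 9)
print(v > t)   # greater than
print(v < t)   # less than
print(v >= t)  # greater than or equal to
print(v <= t)  # less than or equal to
```

Because the last elements are equal (9 and 9), >= and <= are TRUE there while > and < are FALSE.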
Logical Operators
The following table shows the logical operators supported by R. They apply
only to vectors of type logical, numeric or complex. Zero is treated as FALSE
and all non-zero numbers are treated as TRUE.
Each element of the first vector is compared with the corresponding element of the
second vector. The result of the comparison is a Boolean value.
|    It is called the element-wise logical OR operator. It combines each element of the first
vector with the corresponding element of the second vector and gives the output TRUE if
either of the elements is TRUE.
v <- c(3,0,TRUE,2+2i)
t <- c(4,0,FALSE,2+3i)
print(v|t)
It produces the following result:
[1] TRUE FALSE TRUE TRUE
!    It is called the logical NOT operator. It takes each element of the vector and gives the
opposite logical value.
v <- c(3,0,TRUE,2+2i)
print(!v)
It produces the following result:
[1] FALSE TRUE FALSE FALSE
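The element-wise logical AND operator & works like |, but gives TRUE only where both corresponding elements are TRUE (non-zero). A small sketch with illustrative vectors:

```r
v <- c(3, 1, TRUE, 2+3i)
t <- c(4, 1, FALSE, 2+3i)
# Element-wise AND: TRUE only where both elements are non-zero
print(v & t)
```

The third pair is TRUE and FALSE, so only that position gives FALSE.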
The logical operators && and || consider only a single element on each side and give
a single TRUE or FALSE as output. In R 4.3.0 and later, using a vector with more than
one element as an operand of && or || is an error, so the first elements are selected
explicitly in the example.
Operator    Description    Example
||    Called the logical OR operator. It takes the first element of both vectors and gives
TRUE if either of them is TRUE.
v <- c(0,0,TRUE,2+2i)
t <- c(0,3,TRUE,2+3i)
print(v[1] || t[1])
It produces the following result:
[1] FALSE
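A sketch of && alongside ||, assuming R 4.3.0 or later, where each operand must be a single value; the vectors are illustrative:

```r
v <- c(3, 0, 1)
t <- c(1, 3, 0)
# Each call compares exactly one element from each vector
print(v[1] && t[1])  # 3 and 1 are both non-zero, so TRUE
print(v[2] || t[2])  # 0 is FALSE but 3 is TRUE, so TRUE
print(v[3] && t[3])  # 0 is FALSE, so FALSE
```

&& stops as soon as the left side is FALSE, and || stops as soon as it is TRUE, which is why these operators are typically used in if conditions.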
Miscellaneous Operators:
:    Colon operator. It creates a series of numbers in sequence for a vector.
v <- 2:8
print(v)
It produces the following result:
[1] 2 3 4 5 6 7 8
%in%    This operator is used to identify if an element belongs to a vector.
v1 <- 8
v2 <- 12
t <- 1:10
print(v1 %in% t)
print(v2 %in% t)
It produces the following result:
[1] TRUE
[1] FALSE
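The list of operator types also includes assignment operators, and %*% (matrix multiplication) is another miscellaneous operator. A minimal sketch of both with illustrative values; <<- and ->> also exist, but they assign in an enclosing environment and are mainly used inside functions:

```r
# Leftward assignment with <- and =
a <- c(3, 1, TRUE, 2+3i)
b = c(3, 1, TRUE, 2+3i)
# Rightward assignment with ->
c(3, 1, TRUE, 2+3i) -> d
print(identical(a, b) && identical(b, d))
# Matrix multiplication: M (2 x 3) times its transpose (3 x 2) gives a 2 x 2 matrix
M <- matrix(c(2, 6, 5, 1, 10, 4), nrow = 2, ncol = 3, byrow = TRUE)
print(M %*% t(M))
```

Unlike *, the %*% operator performs the algebraic matrix product, so the result here is 2 x 2 rather than 2 x 3.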
b) Implement R script to read a person's age from the keyboard and display whether he or she is
eligible for voting or not.
In this program, you will learn how to check whether a user is old enough to vote, using an
if ... else statement in R.
Syntax:
if (condition) {
  # statement
} else {
  # statement
}
# readline() waits for keyboard input only in an interactive R session (e.g. the RStudio console)
age <- as.integer(readline(prompt = "Enter your age :"))
if (age >= 18) {
  print(paste("You are eligible to vote :", age))
} else {
  print(paste("You are not eligible to vote :", age))
}
Output:
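Because readline() only waits for input in an interactive session, the eligibility check can also be wrapped in a function and tried on fixed ages. voting_message() is a hypothetical helper name used only for this sketch, and it applies the same threshold of 18:

```r
# Hypothetical helper: returns the eligibility message for a given age
voting_message <- function(age) {
  if (age >= 18) {
    paste("You are eligible to vote :", age)
  } else {
    paste("You are not eligible to vote :", age)
  }
}
print(voting_message(20))
print(voting_message(15))
```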
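# Read three numbers (readline() needs an interactive R session)
x <- as.integer(readline(prompt = "Enter first number :"))
y <- as.integer(readline(prompt = "Enter second number :"))
z <- as.integer(readline(prompt = "Enter third number :"))
# Keep the largest value seen so far
largest <- x
if (y > largest) largest <- y
if (z > largest) largest <- z
print(paste("The largest number is:", largest))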
Output:
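The Week 2 task list asks for the bigger of two numbers, which needs only one comparison. biggest_of_two() is a hypothetical helper name for this sketch; the built-in max(x, y) gives the same result:

```r
# Hypothetical helper comparing two numbers with if/else
biggest_of_two <- function(x, y) {
  if (x > y) {
    x
  } else {
    y
  }
}
print(biggest_of_two(12, 7))
print(biggest_of_two(3, 9))
```

When both numbers are equal, the else branch returns y, which is the same value.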
A leap-year check can be implemented simply using if conditions in R.
First, we ask the user to enter a year. R provides the readline() function for taking the
user's input, displaying an appropriate message through its prompt argument. Here the user
is asked to enter a year, which is stored in the variable year. Then we check whether the
year is divisible by 4; if the remainder is not zero, it is not a leap year. If the year is
also a century (e.g., 2000), that is, divisible by 100 without any remainder, we additionally
divide it by 400 and check whether the remainder is 0. If that condition is also satisfied,
it is a leap year, and if not, it is not a leap year.
Consider the year 2004: it is exactly divisible by 4 and is not a century, so it is a leap
year. The year 2005 is not exactly divisible by 4, so it is not a leap year.
Century years must satisfy an extra condition: to be a leap year, a century must also be
divisible by 400 with no remainder.
Consider the year 2000: it is divisible by 4, and because it is divisible by 100 it is a
century, so we also divide it by 400. 2000 is divisible by 400, so it is a leap year. The
year 1900, however, satisfies the first two conditions but is not divisible by 400, so it is
not a leap year.
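The rule described above can be written as a single vectorised condition and checked against the four example years. is_leap_year() is a hypothetical helper name used for this sketch:

```r
# Divisible by 4 but not a century, or a century divisible by 400
is_leap_year <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
}
years <- c(2004, 2005, 2000, 1900)
print(is_leap_year(years))
```

Because & and | are element-wise, the function checks a whole vector of years at once instead of one year per if statement.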
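ALGORITHM
STEP 1: Read a year into the variable year using readline(), prompting the user with an
appropriate message.
STEP 2: Check whether year is exactly divisible by 4. If it is, use a nested if condition to
check whether it is a century, that is, exactly divisible by 100.
STEP 3: If it is a century, print that it is a leap year only when it is also exactly divisible
by 400; otherwise print that it is not a leap year. If it is not a century, print that it is a
leap year.
STEP 4: If a year is not divisible by 4, print that the year is not a leap year.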
R Source Code
year = as.integer(readline(prompt="Enter a year: "))
if((year %% 4) == 0) {
if((year %% 100) == 0) {
if((year %% 400) == 0) {
print(paste(year,"is a leap year"))
} else {
print(paste(year,"is not a leap year"))
}
} else {
print(paste(year,"is a leap year"))
}
} else {
print(paste(year,"is not a leap year"))
}
OUTPUT
Week 3:
a) Implement R Script to create a list.
> x <- list("a" = 2.5, "b" = TRUE, "c" = 1:3)
> str(x)
List of 3
$ a: num 2.5
$ b: logi TRUE
$ c: int [1:3] 1 2 3
In this example, a, b and c are called tags, which make it easier to reference the
components of the list.
However, tags are optional. We can create the same list without the tags as follows. In
that case, numeric indices are used by default.
> x <- list(2.5, TRUE, 1:3)
> x
[[1]]
[1] 2.5
[[2]]
[1] TRUE
[[3]]
[1] 1 2 3
Examples
Access Elements using Index
In the following program, we will create a list with three elements, and read its
elements using index.
Example.R
x <- list(TRUE, 25, "Apple")
print(x[1])
print(x[2])
print(x[3])
Output
[[1]]
[1] TRUE
[[1]]
[1] 25
[[1]]
[1] "Apple"
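Single brackets, as in x[1], return a sub-list, which is why each result above prints with [[1]]. A sketch of the alternatives, using a list with illustrative tags (flag, count and fruit are made-up names):

```r
x <- list(flag = TRUE, count = 25, fruit = "Apple")
print(class(x[2]))    # single brackets return a sub-list: "list"
print(class(x[[2]]))  # double brackets return the element itself: "numeric"
print(x$fruit)        # $ accesses an element by its tag
print(x[["count"]])   # [[ ]] also accepts the tag as a string
```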
Example.R
x <- list(TRUE, 25, "Apple")
x[2] = 38
print(x)
Output
[[1]]
[1] TRUE
[[2]]
[1] 38
[[3]]
[1] "Apple"
c) Implement R Script to merge two or more lists.
Here, you can see that the second list has 2 elements, which shows that
there are two lists combined as one.
Example 2:
Output:
Method 2: Using append() function
The append() function in R accepts two lists (x and values) as parameters and returns a new
list with the elements of both lists; an optional after argument sets the position at which
the second list is inserted.
Syntax:
append(list1, list2)
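A minimal sketch of merging with both c() and append(), using two small illustrative lists. It also shows that wrapping the lists in list() nests them instead of merging their elements:

```r
list1 <- list(1, "a", TRUE)
list2 <- list(10, "b")
# c() concatenates the elements of both lists into one list
merged_c <- c(list1, list2)
# append() adds the elements of list2 after those of list1
merged_append <- append(list1, list2)
print(length(merged_c))                    # 5 elements
print(identical(merged_c, merged_append))  # TRUE: same result
# list() keeps each list as a single element: a list of 2 lists
nested <- list(list1, list2)
print(length(nested))                      # 2 elements
```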
Example 1:
print(List1)
print(List2)
Output:
Example 2:
Output:
Week 4:
Implement R script to perform following operations:
a) various operations on vectors
Operations on Vectors in R
Vectors are the most basic data types in R. Even a single value is stored as a vector
of length one. Vectors play the role that arrays play in other languages.
Vectors contain a sequence of elements of a single (homogeneous) type. If values of mixed
types are given, R automatically converts them to a common type according to the precedence
logical < integer < double < complex < character. Various operations can be performed on
vectors in R.
Creating a vector
Vectors can be created in many ways, as shown in the following example. The most
common is the c() function, which combines different elements together.
X <- c(1, 4, 5, 2, 6, 7)
print('using c function')
print(X)
Y <- seq(1, 10, length.out = 5)
print('using seq function')
print(Y)
Z <- 5:10
print('using colon')
print(Z)
Output:
using c function 1 4 5 2 6 7
using seq function 1.00 3.25 5.50 7.75 10.00
using colon 5 6 7 8 9 10
Vector elements can be accessed in many ways. The most basic is using the ‘[]’,
subscript operator. Following are the ways of accessing Vector elements:
Note: vectors in R are 1-indexed, unlike C, Python and similar languages, where
indexing starts at 0.
# Accessing elements using the position number.
X <-c(2, 5, 8, 1, 2)
print(X[2])
Y <-c(4, 5, 2, 1, 7)
print('using c function')
print(Y[c(4, 1)])
# Logical indexing
Z <-c(5, 2, 1, 4, 4, 3)
print('Logical indexing')
print(Z[Z>3])
Output:
Modifying a vector
Vectors can be modified using different indexing variations which are mentioned in
the below code:
# Creating a vector
X <-c(2, 5, 1, 7, 8, 2)
X[3] <-11
X[X>9] <-0
print('Logical indexing')
print(X)
X <-X[c(5, 2, 1)]
print('using c function')
print(X)
Output:
Deleting a vector:
A vector can be deleted by assigning NULL to it. NULL is not an operator but R's empty
object, so printing X afterwards shows NULL; rm(X) removes the variable itself.
# Creating a vector
X <-c(5, 2, 1, 6)
# Deleting a vector
X <-NULL
print('Deleted vector')
print(X)
# Creating Vectors
X <-c(5, 2, 5, 1, 51, 2)
Y <-c(7, 9, 1, 5, 2, 1)
# Addition
Z <-X +Y
print('Addition')
print(Z)
# Subtraction
S <-X -Y
print('Subtraction')
print(S)
# Multiplication
M <-X *Y
print('Multiplication')
print(M)
# Division
D <-X /Y
print('Division')
print(D)
Output:
Addition 12 11 6 6 53 3
Subtraction -2 -7 4 -4 49 1
Multiplication 35 18 5 5 102 2
Division 0.7142857 0.2222222 5.0000000 0.2000000 25.5000000 2.0000000
Sorting of Vectors
For sorting we use the sort() function which sorts the vector in ascending order by
default.
# Creating a Vector
X <-c(5, 2, 5, 1, 51, 2)
A <- sort(X)
print(A)
# Sort in descending order
B <- sort(X, decreasing = TRUE)
print(B)
Output:
Output
[1] "Sum of the vector:"
[1] 10
[1] "Mean of the vector:"
[1] 2.5
[1] "Product of the vector:"
[1] 24
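The script that produced the output above is not shown. A sketch along these lines reproduces it, assuming the input values 1, 2, 3 and 4 (consistent with the printed sum, mean and product) stored in a one-dimensional array, as the Week 4 task asks:

```r
# Assumed input: a 1-dimensional array holding 1, 2, 3, 4
arr <- array(c(1, 2, 3, 4))
print("Sum of the vector:")
print(sum(arr))
print("Mean of the vector:")
print(mean(arr))
print("Product of the vector:")
print(prod(arr))
```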
R – Reverse a List:
To reverse a list in R, call the rev() function and pass the list as its argument.
rev() returns a new list with the contents of the given list in reversed order.
rev(x)
Return Value
A list with the elements of x in reverse order; x itself is not modified.
Examples
In the following program, we take a list in x, and reverse this list using rev().
example.R
x <- list("a", "b", "c")
result = rev(x)
print(result)
Now, let us take a list x with numeric values and reverse it.
example.R
x <- list(5, 25, 125)
result = rev(x)
print(result)
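nums <- c(10, 20, 30, 40, 50, 60)
print('Original vector:')
print(nums)
print(paste("Maximum value of the said vector:", max(nums)))
print(paste("Minimum value of the said vector:", min(nums)))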
Sample Output:
[1] "Original vector:"
[1] 10 20 30 40 50 60
[1] "Maximum value of the said vector: 60"
[1] "Minimum value of the said vector: 10"
Week 5:
a) Implement R Script to perform various operations on matrices
Operations on Matrices in R
Last Updated : 21 Apr, 2020
A matrix in R is a collection of values, either real or complex numbers, arranged in a
fixed number of rows and columns. Matrices are used to depict data in a
structured and well-organized format.
In mathematical notation, the elements of a matrix are enclosed in parentheses or brackets.
A matrix with 9 elements is shown below.
This Matrix [M] has 3 rows and 3 columns. Each element of matrix [M] can be
referred to by its row and column number. For example, a23 = 6 (the element in row 2, column 3).
Order of a Matrix :
The order of a matrix is defined in terms of its number of rows and columns.
Order of a matrix = No. of rows × No. of columns
Therefore Matrix [M] is a matrix of order 3 × 3.
Operations on Matrices
There are four basic operations i.e. DMAS (Division, Multiplication, Addition,
Subtraction) that can be done with matrices. Both the matrices involved in the
operation should have the same number of rows and columns.
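The operations in this section are element-wise. For comparison, a sketch using the 2 x 3 matrices B and C whose values appear in the outputs of this section, contrasting element-wise * with the algebraic matrix product %*%, which instead requires the number of columns of the first matrix to equal the number of rows of the second:

```r
# The 2 x 3 matrices whose values appear in the outputs of this section
B <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
C <- matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)
# Element-wise product: both operands have the same order (2 x 3)
print(B * C)
# Algebraic matrix product: B (2 x 3) times t(C) (3 x 2) gives a 2 x 2 matrix
print(B %*% t(C))
```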
Matrices Addition
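Adding two matrices of the same order yields a matrix where every element is the sum of
the corresponding elements of the input matrices.
B <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
C <- matrix(c(7, 8, 9, 10, 11, 12), nrow = 2)
num_of_rows <- nrow(B)
num_of_cols <- ncol(B)
# sum starts as an empty (all-zero) matrix of the same size as B and C
sum <- matrix(0, nrow = num_of_rows, ncol = num_of_cols)
for (row in 1:num_of_rows) for (col in 1:num_of_cols) sum[row, col] <- B[row, col] + C[row, col]
print(B); print(C); print(sum)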
Output:
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12
[,1] [,2] [,3]
[1,] 8 12 16
[2,] 10 14 18
In the above code, nrow(B) gives the number of rows in B and ncol(B) gives the
number of columns. Here, sum is an empty matrix of the same size as B and C. The
elements of sum are the addition of the corresponding elements of B and C through
nested for loops.
Using ‘+’ operator for matrix addition:
Similarly, the following R script uses the in-built operator +:
# R program for matrix addition
print(B +C)
Output:
[,1] [,2] [,3]
[1,] 3+0i 5.5+0i 8+0i
[2,] 2+3i 6.0+0i 10+0i
R provides the built-in + operator for matrix addition. In the above example, the
matrices contain a complex element, so every element of the resultant matrix is returned
as a complex number: if even a single element of a matrix is complex, the whole result is complex.
Properties of Matrix Addition:
Commutative: B + C = C + B
Associative: For n number of matrices A + (B + C) = (A + B) + C
Order of the matrices involved must be same.
Matrices Subtraction
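Subtracting one matrix from another of the same order yields a matrix where every
element is the corresponding element of the first matrix minus that of the second.
B <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
C <- matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)
num_of_rows <- nrow(B)
num_of_cols <- ncol(B)
# diff starts as an empty (all-zero) matrix of the same size as B and C
diff <- matrix(0, nrow = num_of_rows, ncol = num_of_cols)
print(B)
print(C)
# Subtract corresponding elements with nested for loops
for (row in 1:num_of_rows)
  for (col in 1:num_of_cols)
    diff[row, col] <- B[row, col] - C[row, col]
print(diff)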
Output:
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12
[,1] [,2] [,3]
[1,] -6 -6 -6
[2,] -6 -6 -6
Here in the above code, the elements of diff matrix are the subtraction of the
corresponding elements of B and C through nested for loops.
Using ‘-’ operator for matrix subtraction:
Similarly, the following R script uses the in-built operator ‘-’:
# R program for matrix subtraction
print(B -C)
Output:
[,1] [,2] [,3]
[1,] -1+0i 5.3+0i 0+0i
[2,] 2+3i 0.0+0i 0+0i
Properties of Matrix Subtraction:
Non-Commutative: B - C != C - B
Non-Associative: For n number of matrices A - (B - C) != (A - B) - C
Order of the matrices involved must be same.
Matrices Multiplication
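The element-wise multiplication of two matrices of the same order yields a matrix where
every element is the product of the corresponding elements of the input matrices. (The
algebraic matrix product uses the %*% operator instead.)
B <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
C <- matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)
num_of_rows <- nrow(B)
num_of_cols <- ncol(B)
# prod starts as an empty (all-zero) matrix of the same size as B and C
prod <- matrix(0, nrow = num_of_rows, ncol = num_of_cols)
print(B)
print(C)
# Multiply corresponding elements with nested for loops
for (row in 1:num_of_rows)
  for (col in 1:num_of_cols)
    prod[row, col] <- B[row, col] * C[row, col]
print(prod)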
Output:
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12
[,1] [,2] [,3]
[1,] 7 27 55
[2,] 16 40 72
The elements of prod are the products of the corresponding elements of B and C,
computed through nested for loops.
Using ‘*’ operator for matrix multiplication:
Similarly, the following R script uses the in-built operator *:
# R program for matrix multiplication
# using '*' operator
print(B *C)
Output:
[,1] [,2] [,3]
[1,] 2+0i -3+2i 0.54+0i
Properties of Matrix Multiplication:
Commutative: B * C = C * B
Associative: For n number of matrices A * (B * C) = (A * B) * C
Order of the matrices involved must be same.
Matrices Division
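Dividing one matrix by another of the same order yields a matrix where every element is
the quotient of the corresponding element of the first matrix divided by that of the second.
B <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
C <- matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)
num_of_rows <- nrow(B)
num_of_cols <- ncol(B)
# div starts as an empty (all-zero) matrix of the same size as B and C
div <- matrix(0, nrow = num_of_rows, ncol = num_of_cols)
print(B)
print(C)
for (row in 1:num_of_rows)
  for (col in 1:num_of_cols)
    div[row, col] <- B[row, col] / C[row, col]
print(div)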
Output:
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12
[,1] [,2] [,3]
[1,] 0.1428571 0.3333333 0.4545455
[2,] 0.2500000 0.4000000 0.5000000
The elements of div matrix are the division of the corresponding elements of B and C
through nested for loops.
Using ‘/’ operator for matrix division:
Similarly, the following R script uses the in-built operator /:
# R program for matrix division
print(B /C)
Output:
[,1] [,2] [,3]
[1,] 2+0i 3+0i -Inf+NaNi
Properties of Matrix Division:
Non-Commutative: B / C != C / B
Non-Associative: For n number of matrices A / (B / C) != (A / B) / C
Order of the matrices involved must be same.
Note: Time Complexity of all the matrix operations = O(r*c) where r*c is the order of
the matrix.
Descriptive Statistics in R
Output:
The summary() function also takes a single object as an argument. It returns
summary measures such as the minimum, 1st quartile, median, mean, 3rd quartile
and maximum for each component or variable in the object.
Here is an example of the summary function in action.
Code:
summary(mtcars)
Output:
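The Week 6 task also lists str() and the quartiles, as well as the cars dataset. A sketch applying them to the built-in cars and mtcars datasets (both ship with R in the datasets package):

```r
# cars and mtcars are built-in datasets from R's datasets package
str(cars)                     # column names and types of cars
print(summary(cars))          # min, quartiles, median, mean and max per column
print(quantile(cars$speed))   # 0%, 25%, 50%, 75% and 100% quantiles of speed
# probs selects only the first and third quartiles of mpg
print(quantile(mtcars$mpg, probs = c(0.25, 0.75)))
```

quantile() with its default probs returns the five values that also appear, together with the mean, in the summary() output.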
Getting the Average Measures
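R provides a number of functions that give different summary measures
for given data. These include the median, standard deviation, variance, median
absolute deviation, sum and number of values, shown here for the mpg column of mtcars:
median(mtcars$mpg)   # median
sd(mtcars$mpg)       # standard deviation
var(mtcars$mpg)      # variance
mad(mtcars$mpg)      # median absolute deviation
sum(mtcars$mpg)      # sum of all values
length(mtcars$mpg)   # number of values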
Output:
Cumulative measures in R
Cumulative measures are statistical measures that are calculated sequentially.
These measures evolve with the data and provide insight into its progression and
growth. R provides a few functions that calculate cumulative measures with ease.
These functions are:
Cumulative sum: The cumsum() function calculates the cumulative sum of a
given vector.
Cumulative max: To find the cumulative maximum value of an input vector,
you can use the cummax() function.
Cumulative min: You can find the cumulative minimum values in a vector
by using the cummin() function.
Cumulative product: Using the cumprod() function, you can find the
cumulative product of a vector.
Code:
a <- c(1:9,4,2,4,5:2)
cumsum(a)
cummax(a)
cummin(a)
cumprod(a)
Output:
Row and Column Summary Functions in R
There are certain functions in R that give summary statistics for
selected rows or columns of data frames, matrices or any other data structure
with two or more dimensions.
These functions are:
rowMeans: The rowMeans() function, as the name suggests, returns the mean
of each selected row of a data structure.
rowSums: The rowSums() function finds the sum of each selected row of a data
structure.
colMeans: The colMeans() function returns the mean of each selected column of
a data structure.
colSums: The colSums() function calculates the sum of each selected column of a
data structure.
Code:
rowMeans(mtcars[2,])
rowSums(mtcars[2,])
colMeans(mtcars)
colSums(mtcars)
Output:
Subsetting Datasets in R
One way to subset your rows and columns is by your dataset's indices.
This is the same as describing your rows and columns as "the first row", "all
rows in second and fifth columns", or "the first row in second to fifth
columns". Let's specify such phrases using a dataset called iris in R. From
its documentation, "[t]his famous (Fisher's or Anderson's) iris dataset
gives the measurements in centimeters of the variables sepal length and
width and petal length and width, respectively, for 50 flowers from each of 3
species of iris. The species are Iris setosa, versicolor, and virginica."
script.R
# "the first row":
iris[1, ]
# "all rows in second and fifth columns":
iris[, c(2, 5)]
# "the first row in second to fifth columns":
iris[1, 2:5]
To subset your data, square brackets are used after your dataset object.
The rows of your dataset are specified as the first element inside the
square brackets, and the columns of your dataset are specified as the
second, separated by a comma:
data[rows, columns]
Subsetting rows and columns by name
In R, the rows and columns of your dataset have name attributes. Row
names are rarely used and by default are just indices: integers numbering
from 1 to the number of rows of your dataset, as in the previous section. In
fact, if you call rownames() on the iris dataset, you will see that they are
indexed from 1 to 150:
> rownames(iris)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14"
[15] "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28"
[29] "29" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42"
[43] "43" "44" "45" "46" "47" "48" "49" "50" "51" "52" "53" "54" "55" "56"
[57] "57" "58" "59" "60" "61" "62" "63" "64" "65" "66" "67" "68" "69" "70"
[71] "71" "72" "73" "74" "75" "76" "77" "78" "79" "80" "81" "82" "83" "84"
[85] "85" "86" "87" "88" "89" "90" "91" "92" "93" "94" "95" "96" "97" "98"
[99] "99" "100" "101" "102" "103" "104" "105" "106" "107" "108" "109" "110" "111" "112"
[113] "113" "114" "115" "116" "117" "118" "119" "120" "121" "122" "123" "124" "125" "126"
[127] "127" "128" "129" "130" "131" "132" "133" "134" "135" "136" "137" "138" "139" "140"
[141] "141" "142" "143" "144" "145" "146" "147" "148" "149" "150"
> nrow(iris)
[1] 150
Row names are more common in smaller datasets and are used to make
observations in your dataset easily identifiable. For example, for a small
dataset containing health information of a doctor's patients, the row names
of this dataset could be the full names of the patients.
Column names, on the other hand, are present in almost every dataset.
You can access them with the colnames() function or the names() function:
> colnames(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
> names(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
To subset your dataset by the names of your rows and columns, simply use
the square brackets again, prefixed by your dataset object:
script.R
iris["5", "Sepal.Width"]
It's important to note that both the row and column names are character strings,
so they must be enclosed in single or double quotes.
Subsetting rows and columns by value
Subsetting your rows and columns by value often allows the most flexibility.
For example, you can extract the data on Iris setosa using a conditional
statement like this:
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
...
47 5.1 3.8 1.6 0.2 setosa
48 4.6 3.2 1.4 0.2 setosa
49 5.3 3.7 1.5 0.2 setosa
50 5.0 3.3 1.4 0.2 setosa
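The conditional statement that selects these rows is not shown above; a likely form is iris[iris$Species == "setosa", ], which this sketch uses as an assumption. It then shows the same selection with subset() and per-species means with aggregate(), the two functions named in the Week 6 task:

```r
# Rows where Species is "setosa", using a logical condition inside [ ]
setosa <- iris[iris$Species == "setosa", ]
print(nrow(setosa))
# The same rows with subset(), keeping only the two sepal columns
setosa_sepals <- subset(iris, Species == "setosa",
                        select = c(Sepal.Length, Sepal.Width))
print(head(setosa_sepals))
# Mean of every measurement for each species with aggregate()
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)
```

In the formula . ~ Species, the dot stands for every other column, so aggregate() returns one row per species with the mean of each of the four measurements.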