
R Tutorial


Running R Program

To run R on Windows, open the folder "R\R3.2.2\bin\i386" under the Windows Program Files
directory and launch Rgui.exe.

Double-clicking its icon brings up the R GUI, which provides the R console used for R programming.

Creating the first "Hello World!" program


Depending on your needs, you can work either at the R command prompt or write your program in
an R script file. Let's look at both in turn.

Using Command prompt


Type the following lines at the command prompt:

> myString <- "Hello, World!"

> print(myString)

[1] "Hello, World!"

Here the first statement defines a string variable myString and assigns it the string "Hello, World!".
The next statement uses print() to display the value stored in myString.

Using R-Script File


Usually, you will write your programs in script files and then execute those scripts at the
command prompt with the help of the R interpreter called Rscript. Let's start by writing the
following code in a text file called test.R:

# My first program in R Programming


myString <- "Hello, World!"

print ( myString)

Save the above code in a file test.R and execute it at the Linux command prompt as given below.
Even if you are using Windows or another system, the syntax remains the same.

$ Rscript test.R

When we run the above program, it produces the following result.

[1] "Hello, World!"


R Data Types
In contrast to programming languages like C and Java, variables in R are not declared with a
data type. Variables are assigned R-objects, and the data type of the R-object becomes the data
type of the variable. There are many types of R-objects. The frequently used ones are

Vectors

Lists

Matrices

Arrays

Factors

Data Frames

The simplest of these objects is the vector, and there are six data types of atomic vectors,
also termed the six classes of vectors. The other R-objects are built upon atomic vectors.

Variables

Introduction
A variable provides us with named storage that our programs can manipulate. A variable in R
can store an atomic vector, a group of atomic vectors or a combination of many R-objects. A valid
variable name consists of letters, numbers and the dot or underscore characters. The variable
name starts with a letter, or with a dot not followed by a number.

Variable Name          Validity   Reason

var_name2.             valid      Has letters, numbers, dot and underscore.
var_name%              invalid    Has the character '%'. Only dot (.) and underscore are allowed.
2var_name              invalid    Starts with a number.
.var_name, var.name    valid      Can start with a dot (.), but the dot (.) should not be followed by a number.
.2var_name             invalid    The starting dot is followed by a number, making it invalid.
_var_name              invalid    Starts with _, which is not valid.
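
The rules in the table can be checked from within R. The sketch below is one possible way, assuming that a name is syntactically valid exactly when the base function make.names() returns it unchanged. The helper is_valid_name is a hypothetical name introduced here for illustration; it is not part of R.

```r
# Hypothetical helper: treat a name as valid if make.names() leaves it unchanged.
is_valid_name <- function(x) make.names(x) == x

candidates <- c("var_name2.", "var_name%", "2var_name",
                ".var_name", "var.name", ".2var_name", "_var_name")

# Build a named logical vector: TRUE for valid names, FALSE for invalid ones.
validity <- is_valid_name(candidates)
names(validity) <- candidates
print(validity)
```

The result should agree with the Validity column above: TRUE for var_name2., .var_name and var.name, and FALSE for the others.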

Creating Variables (Assigning Value to Variables)


The variables can be assigned values using leftward, rightward and equal to operator.
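
A minimal sketch of the three assignment forms mentioned above. The variable names a, b and c_val are chosen here only for illustration.

```r
# Leftward assignment: the value on the right goes into the name on the left.
a <- 10

# Rightward assignment: the value on the left goes into the name on the right.
20 -> b

# Equal-to operator: at the top level it behaves like leftward assignment.
c_val = 30

print(c(a, b, c_val))   # [1] 10 20 30
```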

Multiple Assignment
> name <- "Carmen"; n1 <- 10; n2 <- 100; m <- 0.5

Displaying Contents of a Variable
The value of a variable can be printed simply by typing the name of the variable at the command
prompt.

~ From E. Paradis
> n <- 15
> n
[1] 15
> 5 -> n
> n
[1] 5
> x <- 1
> X <- 10
> x
[1] 1
> X
[1] 10

> n <- 10 + 2
> n
[1] 12
> n <- 3 + rnorm(1)
> n
[1] 2.208807

> name <- "Carmen"

> name    # or print(name)


Data can also be displayed using the print(var_name) or cat(var_name) function.
The cat() function concatenates multiple items into one continuous output.
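
The difference between the two functions can be sketched as follows. The values are illustrative only.

```r
name <- "Carmen"
n <- 15

# print() shows one object, with the index prefix and quotes for strings.
print(name)                            # [1] "Carmen"

# cat() joins several items with spaces and prints them without quotes.
# It does not add a newline by itself, so "\n" is passed explicitly.
cat("Name:", name, "- n =", n, "\n")   # Name: Carmen - n = 15
```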

Displaying Data Type of Variables


In R, a variable itself is not declared with any data type; rather, it gets the data type of the
R-object assigned to it. R is therefore called a dynamically typed language, which means that the
data type of the same variable can change again and again while it is used in a program.

The data type of a variable can be extracted using class(var_name) function, as

var_x <- "Hello"

cat("The class of var_x is", class(var_x), "\n")

var_x <- 34.5

cat("Now the class of var_x is", class(var_x), "\n")

var_x <- 27L

cat("Next the class of var_x becomes", class(var_x), "\n")

When we execute the above code, it produces the following result

The class of var_x is character
Now the class of var_x is numeric
Next the class of var_x becomes integer

Displaying names of all variables stored in memory


(Not revised)
> ls()
[1] "m" "n1" "n2" "name"

Note the use of the semicolon to separate distinct commands on the same line.
If we want to list only the objects which contain a given character in their name, the option
pattern (which can be abbreviated with pat) can be used:
> ls(pat = "m")
[1] "m" "name"

To restrict the list to objects whose names start with this character:
> ls(pat = "^m")
[1] "m"

The function ls.str displays some details on the objects in memory:
> ls.str()
m : num 0.5
n1 : num 10
n2 : num 100
name : chr "Carmen"

The option pattern can be used in the same way as with ls. Another useful option of ls.str is
max.level, which specifies the level of detail for the display of composite objects. By default,
ls.str displays the details of all objects in memory, including the columns of data frames,
matrices and lists, which can result in a very long display. We can avoid displaying all these
details with the option max.level = -1:
> M <- data.frame(n1, n2, m)
> ls.str(pat = "M")
M : `data.frame': 1 obs. of 3 variables:
$ n1: num 10
$ n2: num 100
$ m : num 0.5
> ls.str(pat="M", max.level=-1)
M : `data.frame': 1 obs. of 3 variables:

From Web Site (Not revised yet)

To list all the variables currently available in the workspace, we use the ls() function. The
ls() function can also use patterns to match the variable names.

print(ls())

When we execute the above code, it produces the following result

[1] "my var" "my_new_var" "my_var" "var.1"
[5] "var.2" "var.3" "var.name" "var_name2."
[9] "var_x" "varname"

Note: This is sample output; yours depends on what variables are declared in your environment.

The pattern argument of ls() restricts the listing to matching variable names.

# List the variables whose names contain the pattern "var".

print(ls(pattern = "var"))

When we execute the above code, it produces the following result

[1] "my var" "my_new_var" "my_var" "var.1"
[5] "var.2" "var.3" "var.name" "var_name2."
[9] "var_x" "varname"

Variables whose names start with a dot (.) are hidden; they can be listed by passing the
all.names = TRUE argument to the ls() function.

print(ls(all.names = TRUE))

When we execute the above code, it produces the following result

[1] ".cars" ".Random.seed" ".var_name" ".varname" ".varname2"
[6] "my var" "my_new_var" "my_var" "var.1" "var.2"
[11] "var.3" "var.name" "var_name2." "var_x"

To delete variables in memory
later

Online Help
The online help of R gives very useful information on how to use the functions.
Help is available directly for a given function, for instance:
> ?lm
will display, within R, the help page for the function lm() (linear model). The
commands help(lm) and help("lm") have the same effect.

Some remained

R Operators

We have the following types of operators in R programming

Arithmetic Operators

Relational Operators

Logical Operators

Assignment Operators

Miscellaneous Operators
Later

Data Objects of R

Introduction
R works with objects which are, of course, characterized by their names and their
content, but also by attributes which specify the kind of data represented by an
object.
All objects have two intrinsic attributes: mode and length.

The mode is the basic type of the elements of the object.

There are four main modes: numeric, character, complex, and logical (FALSE or
TRUE).

Other modes exist but they do not represent data, for instance function or
expression.

The length is the number of elements of the object.

To display the mode and the length of an object


To display the mode and the length of an object, one can use the functions mode
and length, respectively:

> x <- 1
> mode(x)
[1] "numeric"
> length(x)
[1] 1
> A <- "Gomphotherium"; compar <- TRUE; z <- 1i
> mode(A); mode(compar); mode(z)
[1] "character"
[1] "logical"
[1] "complex"

In contrast to programming languages like C and Java, variables in R are not
declared with a data type. Variables are assigned R-objects, and the data type
of the R-object becomes the data type of the variable. There are many types of
R-objects. The frequently used ones are

Vectors

Lists

Matrices

Arrays

Factors

Data Frames

The simplest of these objects is the vector, and there are six data types of
atomic vectors, also termed the six classes of vectors. The other R-objects
are built upon atomic vectors.

Missing Data
Whatever the mode, missing data are represented by NA (not available).
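
A short sketch of how NA behaves in practice, using the base functions is.na() and mean(). The vector x is an illustrative value, not data from this tutorial.

```r
# A numeric vector with one missing value.
x <- c(1.5, NA, 3)

is.na(x)                # FALSE  TRUE FALSE
mean(x)                 # NA, because one element is missing
mean(x, na.rm = TRUE)   # 2.25, missing values are removed first
```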

Exponential Notation
A very large numeric value can be specified with an exponential notation:
> N <- 2.1e23
> N
[1] 2.1e+23

Non-finite Numbers
R correctly represents non-finite numeric values, such as ±∞ with Inf and -Inf, or
values which are not numbers with NaN (not a number).

> x <- 5/0
> x
[1] Inf
> exp(x)
[1] Inf
> exp(-x)
[1] 0
> x - x
[1] NaN

Character Mode Data


A value of mode character is input with double quotes ("). It is possible to include this
latter character in the value if it follows a backslash (\). The two characters
together (\") will be treated in a specific way by some functions, such as cat for
display on screen or write.table to write to disk (p. 14, the option qmethod of
this function).

> x <- "Double quotes \" delimitate R's strings."
> x
[1] "Double quotes \" delimitate R's strings."
> cat(x)
Double quotes " delimitate R's strings.

Alternatively, variables of mode character can be delimited with single quotes ('); in
this case it is not necessary to escape double quotes with backslashes (but single
quotes must be!):

> x <- 'Double quotes " delimitate R\'s strings.'
> x
[1] "Double quotes \" delimitate R's strings."

The following sections give an overview of the types of objects representing data.

Vector
The simplest of these objects is the vector, and there are six data types of atomic vectors,
also termed the six classes of vectors. The other R-objects are built upon atomic vectors.
To create a vector with more than one element, use the c() function, which combines the
elements into a vector.

Lists
A list is an R-object which can contain many different types of elements inside it like vectors,
functions and even another list inside it.
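
A minimal sketch of such a list. The contents of my_list are illustrative only.

```r
# A list holding a numeric vector, a string, a function and a nested list.
my_list <- list(c(2, 5, 3), "Hello", sin, list(TRUE, 21.3))

print(length(my_list))        # [1] 4
print(class(my_list[[3]]))    # [1] "function"

# Double brackets extract an element; they can be chained for nested lists.
print(my_list[[4]][[2]])      # [1] 21.3
```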

Matrices
A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the
matrix function.
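
A minimal sketch of creating a matrix from a vector. The values 1 to 6 are illustrative, and byrow = TRUE is used here so that the vector fills the matrix row by row (the default is column by column).

```r
# Create a 2 x 3 matrix from the vector 1:6, filled row by row.
M <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

print(M)
print(dim(M))    # [1] 2 3
print(M[2, 1])   # [1] 4  (second row, first column)
```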

Arrays

While matrices are confined to two dimensions, arrays can have any number of dimensions. The
array() function takes a dim argument which sets the required number of dimensions. For
example, dim = c(3, 3, 2) creates an array with two elements, each of which is a 3x3 matrix.
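
A sketch of an array made of two 3x3 matrices. The input vectors vector1 and vector2 are illustrative values.

```r
# Two input vectors of nine elements each.
vector1 <- 1:9
vector2 <- 10:18

# dim = c(3, 3, 2): 3 rows, 3 columns, 2 matrices.
result <- array(c(vector1, vector2), dim = c(3, 3, 2))

print(dim(result))        # [1] 3 3 2
print(result[, , 2])      # the second 3x3 matrix, filled column by column with 10 to 18
print(result[1, 3, 2])    # [1] 16
```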

Factors

Factors are R-objects created from a vector. A factor stores the vector along
with the distinct values of its elements as labels. The labels are always
character, irrespective of whether the input vector is numeric, character,
Boolean, etc. Factors are useful in statistical modeling.
Factors are created using the factor() function. The nlevels() function gives
the count of levels.
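
A minimal sketch of factor() and nlevels(). The vector apple_colors is an illustrative input.

```r
# A character vector with repeated values.
apple_colors <- c("green", "green", "yellow", "red", "red", "red", "green")

# Create a factor; by default the levels are the sorted distinct values.
factor_apple <- factor(apple_colors)

print(factor_apple)
print(levels(factor_apple))    # [1] "green"  "red"    "yellow"
print(nlevels(factor_apple))   # [1] 3
```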

Data Frames

Data frames are tabular data objects. Unlike in a matrix, each column of a
data frame can contain a different mode of data: the first column can be
numeric while the second column is character and the third is logical. A data
frame is a list of vectors of equal length.

Data Frames are created using the data.frame() function.
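
A sketch of a small data frame with columns of different modes. The column names and values are illustrative. stringsAsFactors = FALSE is passed explicitly because the default for this argument differs between R versions.

```r
# Each argument becomes one column; all columns have the same length.
BMI <- data.frame(
  gender = c("Male", "Male", "Female"),
  height = c(152, 171.5, 165),
  weight = c(81, 93, 78),
  age = c(42L, 38L, 26L),
  stringsAsFactors = FALSE
)

print(BMI)

# Show the class of every column.
print(sapply(BMI, class))
```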

Reading Data in File


Later
Saving Data
Later

Generating Data

Regular Sequence

To Generate sequence of natural numbers

Using ':' operator

> x <- 1:30

The resulting vector x has 30 elements.
> 1:(25-1)

This creates a vector containing the elements 1 through 24.

The operator ':' has priority over the arithmetic operators within an expression, so the
command
> 1:10-1
generates the numbers from 0 to 9:

[1] 0 1 2 3 4 5 6 7 8 9

Using 'seq()' function

> seq(1, 5, 0.5)


[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

where the first number indicates the beginning of the sequence, the second one the
end, and the third one the increment to be used to generate the sequence.

One can also use:

> seq(length=9, from=1, to=5)

[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

Using 'c' function


One can also type the values directly using the function c:
> c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5)
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

Generating Data from Keyboard Input

It is also possible, if one wants to enter some data on the keyboard, to use the
function scan with simply the default options:
> z <- scan()
1: 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
10:
Read 9 items
> z
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

Using 'rep' function

The function rep creates a vector with all its elements identical:
> rep(1, 30)
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Using 'sequence' function

The function sequence creates a series of sequences of integers each ending by the
numbers given as arguments:
> sequence(4:5)
[1] 1 2 3 4 1 2 3 4 5

> sequence(4:7)
[1] 1 2 3 4 1 2 3 4 5 1 2 3 4 5 6 1 2 3 4 5 6 7

> sequence(c(10,5))
[1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5

Using 'gl()' function

It is used to generate regular series of factors.

The syntax is:

gl(number of levels, number of replications in each level)

Example
> gl(3, 5)
[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
Levels: 1 2 3

Functions

Creating a function

Syntax:

function_name <- function(arg_1, arg_2, ...) {

Function body

}

Example

# Create a function to print squares of numbers in sequence.

new.function <- function(a) {
   for(i in 1:a) {
      b <- i^2
      print(b)
   }
}

Calling a Function
Example

To call the function named new.function with 6 as its argument:

new.function(6)

Calling a function with multiple arguments by position and by name


The arguments to a function call can be supplied in the same sequence as defined in the function or
they can be supplied in a different sequence but assigned to the names of the arguments.

# Create a function with arguments.

new.function <- function(a,b,c) {

result <- a * b + c
print(result)

}
# Call the function by position of arguments.

new.function(5,3,11)

# Call the function by names of the arguments.

new.function(a = 11, b = 5, c = 3)

When we execute the above code, it produces the following result

[1] 26
[1] 58

Calling a Function with Default Arguments


We can define default values for the arguments in the function definition and call the function without
supplying any argument to get the default result. We can also call such functions with new argument
values to get a non-default result.

# Create a function with default arguments.

new.function <- function(a = 3, b = 6) {

result <- a * b
print(result)

}

# Call the function without giving any argument.

new.function()

# Call the function giving new values for the arguments.

new.function(9,5)

When we execute the above code, it produces the following result

[1] 18
[1] 45

R Strings

Any value written within a pair of single quotes or double quotes in R is treated as a string. Internally, R
stores every string within double quotes, even when you create it with single quotes.
Notes:

The quotes at the beginning and end of a string should be both double quotes or both single
quotes. They cannot be mixed.

Double quotes can be inserted into a string starting and ending with single quotes.

Single quotes can be inserted into a string starting and ending with double quotes.

Double quotes cannot be inserted into a string starting and ending with double quotes unless they are
escaped with a backslash (\").

Single quotes cannot be inserted into a string starting and ending with single quotes unless they are
escaped with a backslash (\').
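
The quoting rules can be sketched as follows. The variable names and the string contents are illustrative only.

```r
a <- 'Start and end with single quotes'
b <- "Start and end with double quotes"

# The other kind of quote can be used inside without escaping.
c_str <- "single quote ' inside double quotes"
d <- 'double quote " inside single quotes'

# The same kind of quote must be escaped with a backslash.
e <- "escaped double quote \" inside double quotes"

print(d)        # [1] "double quote \" inside single quotes"
cat(e, "\n")    # escaped double quote " inside double quotes
```

Note that print() shows the string as R stores it, with double quotes and the escape, while cat() shows the actual characters.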

String Manipulation

Concatenating Strings Using the 'paste()' function

Many strings in R are combined using the paste() function. It can take any number of arguments to be
combined together.

The basic syntax for paste function is

paste(..., sep = " ", collapse = NULL)

... represents any number of arguments to be combined.

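sep is the separator placed between the arguments. It is optional and defaults to a single space.

collapse is an optional string used to join the elements of the resulting vector into a single
string. It has no visible effect when every argument is a single string, as in the example below.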

Example

a <- "Hello"

b <- 'How'

c <- "are you? "

print(paste(a,b,c))

print(paste(a,b,c, sep = "-"))

print(paste(a,b,c, sep = "", collapse = ""))

When we execute the above code, it produces the following result


[1] "Hello How are you? "
[1] "Hello-How-are you? "
[1] "HelloHoware you? "
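
The effect of collapse becomes visible when an argument has more than one element. A minimal sketch, with the vector fruits as an illustrative input:

```r
fruits <- c("apple", "banana", "cherry")

# sep joins the arguments element by element: the result has three strings.
pieces <- paste("fruit", fruits, sep = "-")
print(pieces)   # [1] "fruit-apple"  "fruit-banana" "fruit-cherry"

# collapse then joins those three strings into a single string.
joined <- paste("fruit", fruits, sep = "-", collapse = ", ")
print(joined)   # [1] "fruit-apple, fruit-banana, fruit-cherry"
```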

Formatting Strings and Numbers Using the 'format()' Function

Numbers and strings can be formatted to a specific style using the format() function.

The basic syntax for format function is

format(x, digits, nsmall, scientific, width, justify = c("left", "right", "centre", "none"))

Following is the description of the parameters used

x is the vector input.

digits is the number of significant digits displayed.

nsmall is the minimum number of digits to the right of the decimal point.

scientific is set to TRUE to display scientific notation.

width indicates the minimum width of the output; numbers are padded with blanks at the beginning.

justify is the alignment of strings: left, right or centre.

Example

# Number of significant digits displayed. The remaining digits are rounded off.

result <- format(23.123456789, digits = 9)


print(result)

Output

[1] "23.1234568"

# The minimum number of digits to the right of the decimal point.

result <- format(23.47, nsmall = 5)


print(result)

Output

[1] "23.47000"

# Display numbers in scientific notation.

result <- format(c(6, 13.14521), scientific = TRUE)


print(result)

Output

[1] "6.000000e+00" "1.314521e+01"

# Format treats everything as a string.

result <- format(6)

print(result)

Output

[1] "6"

# Numbers are padded with blanks at the beginning to reach the width.

result <- format(13.7, width = 6)


print(result)

Output

[1] "  13.7"

# Left justify strings.

result <- format("Hello", width = 8, justify = "l")

print(result)

# Justify string at the centre.

result <- format("Hello", width = 8, justify = "c")

print(result)

Output

[1] "Hello   "
[1] " Hello  "

Counting Number of Characters in String

The basic syntax for nchar() function is

nchar(x)

x is the vector input.
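
A minimal usage sketch for nchar(). The input strings are illustrative.

```r
# Count the characters in one string, spaces included.
result <- nchar("Count the number of characters")
print(result)    # [1] 30

# nchar() is vectorised: one count per element.
counts <- nchar(c("apple", "kiwi"))
print(counts)    # [1] 5 4
```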

Changing the Case of Strings


The following functions are used

toupper(x)
tolower(x)

x is the vector input.
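
A minimal usage sketch for toupper() and tolower(). The input strings are illustrative.

```r
# Change every letter to upper case.
upper <- toupper("Changing To Upper")
print(upper)    # [1] "CHANGING TO UPPER"

# Change every letter to lower case.
lower <- tolower("Changing To Lower")
print(lower)    # [1] "changing to lower"
```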

Extracting Part of a String


The function 'substring()' is used. Its basic syntax is

substring(x,first,last)

x is the character vector input.

first is the position of the first character to be extracted.

last is the position of the last character to be extracted.

Example

# Extract characters from 5th to 7th position.

result <- substring("Extract", 5, 7)

print(result)

When we execute the above code, it produces the following result

[1] "act"

R Vectors
Vectors are the most basic R data objects and there are six types of atomic vectors. They are
logical, integer, double, complex, character and raw.

Creating Vectors
Creating Single Element Vectors
Even when you write just one value in R, it becomes a vector of length 1 and belongs to one of
the above vector types.

# Atomic vector of type character.

print("abc");

Output

[1] "abc"

# Atomic vector of type double.

print(12.5)

Output
[1] 12.5

# Atomic vector of type integer.

print(63L)

Output

[1] 63

# Atomic vector of type logical.

print(TRUE)

Output

[1] TRUE

# Atomic vector of type complex.

print(2+3i)

Output

[1] 2+3i

# Atomic vector of type raw.

print(charToRaw('hello'))

Output

[1] 68 65 6c 6c 6f

Creating Multiple Element Vector

Using colon (:) Operator


> x <- 1:30
The resulting vector x has 30 elements.
> 1:(25-1)

This creates a vector containing the elements 1 through 24.

The operator ':' has priority over the arithmetic operators within an expression, so the
command
> 1:10-1
generates the numbers from 0 to 9:

[1] 0 1 2 3 4 5 6 7 8 9

# Creating a sequence from 5 to 13.

v <- 5:13

print(v)
Output

[1] 5 6 7 8 9 10 11 12 13

# Creating a sequence from 6.6 to 12.6.

v <- 6.6:12.6

print(v)

Output

[1] 6.6 7.6 8.6 9.6 10.6 11.6 12.6

# If the final element specified does not belong to the sequence then it is discarded.

v <- 3.8:11.4

print(v)

Output

[1] 3.8 4.8 5.8 6.8 7.8 8.8 9.8 10.8

Using 'seq()' function

> seq(1, 5, 0.5)


[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

where the first number indicates the beginning of the sequence, the second one the
end, and the third one the increment to be used to generate the sequence.

One can also use:

> seq(length=9, from=1, to=5)

[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

# Create vector with elements from 5 to 9 incrementing by 0.4.

print(seq(5, 9, by = 0.4))

Output

[1] 5.0 5.4 5.8 6.2 6.6 7.0 7.4 7.8 8.2 8.6 9.0

Using 'c' function


One can also type the values directly using the function c:
> c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5)

[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

The non-character values are coerced to character type if one of the elements is a character.

# The logical and numeric values are converted to characters.


s <- c('apple','red',5,TRUE)
print(s)
Output

[1] "apple" "red" "5" "TRUE"

Accessing Vector Elements


Elements of a vector are accessed using indexing. The [ ] brackets are used for indexing.
Indexing starts at position 1. A negative index drops that element from the result. Logical
values (TRUE, FALSE) can also be used for indexing. A numeric index of 0 selects nothing.

# Accessing vector elements using position.


t <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")

u<-t[2]

print(u)

u <- t[c(2,3,6)]

print(u)

Output

[1] "Mon"
[1] "Mon" "Tue" "Fri"

# Accessing vector elements using logical indexing.


v <- t[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)]

print(v)

Output

[1] "Sun" "Fri"

# Accessing vector elements using negative indexing.


x <- t[c(-2,-5)]

print(x)

Output

[1] "Sun" "Tue" "Wed" "Fri" "Sat"

# Numeric indices of 0 are ignored, so only index 1 (the first element) is selected.


y <- t[c(0,0,0,0,0,0,1)]

print(y)

Output

[1] "Sun"
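
The result "Sun" above comes from numeric indexing, not from a logical mask. The sketch below contrasts the two; the names days and mask are chosen here for illustration.

```r
days <- c("Sun", "Mon", "Tue", "Wed", "Thurs", "Fri", "Sat")
mask <- c(0, 0, 0, 0, 0, 0, 1)

# Numeric indexing: zeros select nothing and 1 selects the first element.
by_number <- days[mask]
print(by_number)     # [1] "Sun"

# Logical indexing: convert 0/1 to FALSE/TRUE to pick by position in the mask.
by_logical <- days[as.logical(mask)]
print(by_logical)    # [1] "Sat"
```

To use a 0/1 vector as a mask, convert it with as.logical() or compare it with mask == 1.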
Vector Manipulation

Vector Arithmetic
