R Module 1
Semester 3
Course Code BDS306C
• R is a programming language.
• R also refers to the software that is used to run R programs.
• Ross Ihaka and Robert Gentleman at the University of Auckland created the R language in the 1990s.
• The R language is based on the S language.
• The S language was developed by John Chambers and colleagues at Bell Laboratories in the 1970s.
• R is free, open-source software and part of the GNU Project. (GNU, short for “GNU’s Not Unix” and pronounced “g-noo”, is an operating system made up entirely of free software.)
• R (Language and Software) is developed by the R Core Team.
• R has evolved over several decades, as its history goes back to the S language of the 1970s.
If the existing packages are not sufficient for a user’s needs, the user can write a new package in R.
R is a high-level scripting language; it is interpreted, so programs need not be compiled.
R is an imperative language (programs are composed of step-by-step instructions), and it also supports object-oriented programming.
R is a free, open-source language with cross-platform compatibility.
R is one of the most advanced statistical programming languages, and it can produce outstanding graphical output.
R is extremely flexible and comprehensive, yet approachable even for beginners.
R integrates easily with other programming languages such as C, C++, Java and Python, and with big-data frameworks such as Hadoop.
R can handle large amounts of data stored in flat files, including semi-structured and unstructured data.
The R language allows the user to program loops to successively analyze several data sets.
It is also possible to combine different statistical functions in a single program to perform more complex analyses.
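As an illustration of looping over several data sets, the sketch below stores three small made-up data sets in a list (the names `datasets`, `first`, `second` and `third` are hypothetical) and applies mean() and sd() to each one in turn.

```r
# Hypothetical data: three small numeric data sets stored in a list
datasets <- list(
  first  = c(4, 8, 15),
  second = c(16, 23, 42),
  third  = c(1, 2, 3, 4)
)

# Loop over the data sets and analyse each one in turn
for (name in names(datasets)) {
  x <- datasets[[name]]
  # Combine several statistical functions in a single step
  cat(name, ": mean =", mean(x), ", sd =", round(sd(x), 2), "\n")
}
```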
R displays the results of an analysis immediately, and these results can be stored in “objects” so that further analysis can be done on them.
The user can also extract only the part of a result that is of interest.
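A minimal sketch of storing a result and extracting part of it, assuming a one-sample t-test as the analysis; the measurements are made up purely for illustration.

```r
# Made-up measurements, used only for illustration
x <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)

# The full result of the test is stored in an object called 'result'
result <- t.test(x, mu = 5)
result            # display the whole result

# Extract only the parts of interest
result$p.value    # the p-value
result$estimate   # the sample mean
```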
R is an interpreted language, not a compiled one. This means that commands typed at the keyboard are executed directly, without the need to build a complete program first, as in C, C++ or Java.
R’s syntax is very simple and intuitive.
In R, a function is always called with parentheses, e.g. ls().
If only the name of the function is typed, R displays the content of the function.
When R is running, variables, data, functions, results, etc. are stored in the active memory
of the computer in the form of objects which have a name.
The user can do actions on these objects with operators and functions.
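The short session below sketches these ideas: objects are created in memory, ls() lists them, typing a function name without parentheses shows its code, and rm() removes an object. The function `square` is a hypothetical example.

```r
x <- 10                      # a numeric object named x
y <- c(1, 2, 3)              # a vector object named y
square <- function(n) n^2    # functions are objects too

ls()        # list the names of the objects in memory
square      # name without parentheses: R displays the function's code
square(4)   # name with parentheses: R calls the function, giving 16
rm(x)       # remove the object x from memory
ls()
```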
Installing R
R is available in several forms: as source code, mainly for Unix and Linux machines, or as pre-compiled binaries for Windows, Linux and Macintosh.
The files needed to install R, either from the source or from the pre-compiled binaries are
distributed from the internet site of the Comprehensive R Archive Network (CRAN) where
the instructions for installation are also available.
R can be downloaded from http://www.r-project.org (an internet connection is required).
Use the “Download R” link on the web page to download the R executable.
Choose the version of R that is suitable for your operating system.
R scripts can be run from the R Console without installing an IDE such as RStudio.
Once the R installation is complete, install RStudio.
To install RStudio on the Windows operating system, download the latest precompiled binary distribution from the RStudio website http://www.rstudio.org.
Once the installation is complete, launch the RStudio IDE from Start → All Programs → RStudio → RStudio.exe or from your custom installation directory.
The default installation path for the RStudio IDE is “C:\Program Files\RStudio\bin\rstudio.exe”.
RStudio is an Integrated Development Environment (IDE) whose GUI consists of four parts: 1) a text editor, 2) a command-line interpreter (console), 3) a place to display files, plots, packages and help information, and 4) a place to display the data and variables used in the program (Environment/History).
Initiating R
First Program
Open the R GUI, find the command prompt, type the command below and press Enter to run it.
> sum(1:5)
[1] 15
o The output above shows that the command returns 15.
o That is, the command has taken the integers from 1 to 5 as input and performed the sum operation on them.
o In this command, sum() is a function that takes the argument 1:5, which is a vector consisting of the sequence of integers from 1 to 5.
o Like any other command prompt, R lets you use the up arrow key to recall previous commands.
Help in R
If the name of a function or dataset is known, type ? followed by the name.
If the name is not known, type ?? followed by a term related to what you are searching for.
Keywords, special characters and search terms made of two separate words need to be enclosed in double or single quotes.
The symbol # is used to write comments in an R program: everything after # on a line is ignored by R.
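The help commands and comments can be tried as below. This is a sketch: the search term "linear model" is only an illustrative choice, and the pages that are found depend on the packages installed.

```r
# Known name: open the help page of mean()
?mean
help("mean")   # the same request written as a function call

# Special characters and keywords must be enclosed in quotes
?"+"
?"if"

# Unknown name: search the help system for a related term;
# a two-word term must be quoted
??"linear model"

total <- sum(1:5)   # a comment can also follow code on the same line
total
```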
Assigning Variables
The results of the operations in R can be stored for reuse.
Values can be assigned to variables using the symbol “<-” or “=”; of the two, “<-” is preferred.
There is no concept of variable declaration in R.
The type of a variable is determined by the value assigned to it.
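A small sketch of both assignment symbols and of the type following the assigned value; the variable names `a`, `b` and `v` are arbitrary.

```r
a <- 5        # preferred assignment operator
b = 3         # also valid, but "<-" is preferred
a + b         # 8

# No declaration: the type follows the value that is assigned
v <- 42
class(v)      # "numeric"
v <- "hello"
class(v)      # "character"
v <- TRUE
class(v)      # "logical"
```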
Variable names consist of letters, numbers, dots and underscores,
but a variable name must start with a letter (or with a dot that is not followed by a number).
Variable names must not be reserved words (such as if, else, TRUE or function).
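Some valid and invalid names, as a quick sketch (the names themselves are hypothetical). The invalid lines are commented out because R would reject them with a syntax error.

```r
my.data <- 1      # dots are allowed
my_data2 <- 2     # underscores and digits are allowed after the first character

# The lines below are invalid and are therefore commented out:
# 2data <- 4      # cannot start with a digit
# _data <- 5      # cannot start with an underscore
# if <- 6         # "if" is a reserved word
```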
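To assign global variables (variables available everywhere), we use the symbol “<<-”. Inside a function, “<<-” searches the enclosing environments for the variable and, if it is not found, creates it in the global environment.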
X <<- exp(exp(1))
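At the top level, as in the line above, “<<-” behaves like “<-”; the difference shows up inside a function. The sketch below uses a hypothetical `counter` variable and `increment()` function to contrast the two operators.

```r
counter <- 0                 # a variable in the global environment

increment <- function() {
  counter <<- counter + 1    # '<<-' updates the global 'counter'
  local_value <- counter     # '<-' creates a variable local to the function
  invisible(NULL)
}

increment()
increment()
counter                 # 2
exists("local_value")   # FALSE: the local variable disappears after the call
```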
Assignment can also be done using the assign() function.
For global assignment, the same assign() function can be used by passing an extra argument, envir = globalenv().
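A sketch of assign() in both forms; the names `z`, `g` and `make_global()` are hypothetical.

```r
assign("z", 25)            # same as z <- 25
z

make_global <- function() {
  # without envir, assign() would create 'g' only inside the function
  assign("g", 100, envir = globalenv())
}
make_global()
g                          # 100, now available in the global environment
```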
To see the value of a variable, simply type its name at the command prompt.
The same can be done using the print() function.
If a value has to be assigned and printed in one line, this can be done in two ways:
the first method separates the two statements with a semicolon, and the second wraps the assignment in parentheses ().
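The ways of displaying a value can be compared in a few lines; `w` is an arbitrary variable name.

```r
w <- 7.5
w               # typing the name displays the value
print(w)        # the same, using print()

w <- 2.5; w     # method 1: two statements separated by a semicolon
(w <- 9)        # method 2: parentheses assign and print in one step
```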
Basic Mathematical Operations
The “+” plus operator is used to perform addition.
It can be used to add two numbers or two vectors.
A vector represents an ordered set of values.
Vectors are mainly used to analyse statistical data.
The “:” colon operator creates a sequence.
A sequence is a series of numbers within the given limits.
The “c()” function concatenates (combines) the values given within the parentheses “(” and “)”.
Variable names in R are case sensitive.
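The operations above can be sketched at the prompt as follows; the variable names are arbitrary.

```r
2 + 3                     # adds two numbers: 5
1:5                       # the sequence 1 2 3 4 5

v1 <- c(1, 2, 3)          # c() combines values into a vector
v2 <- c(10, 20, 30)
v1 + v2                   # element-wise addition: 11 22 33

total <- 1
Total <- 2                # a different variable: names are case sensitive
total == Total            # FALSE
```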
Vectors and the c() function in R help us to avoid loops.
The statistical functions in R can take vectors as input and produce results.
The sum() function accepts a vector, and it also accepts the numbers as separate arguments.
But the median() function expects a single vector, so the values must first be combined with c(); passing them as separate arguments does not compute the median.
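A sketch contrasting the two functions, using an arbitrary vector of numbers.

```r
values <- c(2, 3, 5, 7, 11)

sum(values)            # 28: sum() takes a vector
sum(2, 3, 5, 7, 11)    # 28: sum() also accepts separate numbers
median(values)         # 5: median() needs the numbers combined by c()
```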
Similar to the “+” plus operator, all other operators in R take vectors as inputs and produce results.
The subtraction (“-”) and multiplication (“*”) operators work element by element on vectors in the same way.
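For example, with two vectors of the same length (values chosen arbitrarily):

```r
a <- c(10, 20, 30)
b <- c(1, 2, 3)
a - b    # 9 18 27
a * b    # 10 40 90
```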
The division operator is of three types.
Ordinary division is represented by the “/” symbol,
integer division by the “%/%” symbol,
and modulo division (the remainder) by the “%%” symbol.
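The three division operators can be compared on the same arbitrary vector:

```r
x <- c(7, 8, 9)
x / 2      # ordinary division: 3.5 4.0 4.5
x %/% 2    # integer division:  3 4 4
x %% 2     # modulo (remainder): 1 0 1
```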
Other mathematical functions include the trigonometric functions sin(), cos(), tan(), asin(), acos() and atan(), and the logarithmic and exponential functions log(), exp(), log1p() and expm1().
All these mathematical functions can operate on vectors as well as on individual elements.
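A few of these functions in use, as a sketch; note that floating-point results such as sin(pi) are only approximately zero.

```r
angles <- c(0, pi / 2, pi)
sin(angles)        # applied element-wise: 0, 1 and (almost) 0
cos(0)             # a single value: 1
atan(1) * 4        # equals pi
log(exp(2))        # log() undoes exp(): 2
log1p(1e-10)       # log(1 + x), accurate for very small x
expm1(1e-10)       # exp(x) - 1, accurate for very small x
```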
Relational Operators