UNIT 1 - 2023-24 Part 1
Introduction
Features of R :
1. An effective data handling and storage facility.
2. R is an interpreted programming language, which means it allows coding in an interactive
manner.
3. R is a well-developed, simple and effective programming language which includes
conditionals, loops, user-defined recursive functions, and input and output facilities.
4. A large set of operators for calculations on arrays, in particular matrices.
5. A large collection of intermediate tools for data analysis.
6. Excellent graphical facilities for analysis of data and for displaying the results directly on
the computer or as a hard copy.
7. R has an excellent built-in help system.
8. R is compatible with S-plus.
Thus R is useful software for interactive data analysis.
• Data Science
• Statistical computing (9100+ packages)
• Machine Learning (ML tasks like linear and non-linear regression, decision trees, linear and
non-linear classification and many more)
1.2 R
R - Local Environment Setup
Windows Installation
You can download the Windows installer version of R from R-3.2.2 for Windows (32/64 bit)
and save it in a local directory.
The download is a Windows installer (.exe) with a name of the form "R-version-win.exe". You can
just double-click it and run the installer, accepting the default settings. If your Windows is a
32-bit version, it installs the 32-bit version. If your Windows is 64-bit, it installs both the
32-bit and 64-bit versions.
After installation you can locate the icon to run the program in the directory structure
"R\R-3.2.2\bin\i386\Rgui.exe" under the Windows Program Files. Clicking this icon brings up the
R-GUI, which is the R console for R programming.
KLS GCC BCA V- Semester Statistical Computing & R Programming
R Studio: R Studio is an Integrated Development Environment (IDE) for the R language with an
advanced and more user-friendly GUI. It includes a console, a syntax-highlighting editor that
supports direct code execution, and tools for plotting, history, debugging and workspace
management. R Studio allows the user to run R in a more user-friendly environment. It is
open-source (i.e. free) and available at http://www.rstudio.com/.
The fig shows the GUI of R Studio. The R Studio screen has four windows:
1. Console.
2. Workspace and history.
3. Files, plots, packages and help.
4. The R script(s) and data view.
The R script is where you keep a record of your work.
Create a new R script file in any of the following ways:
1) File -> New -> R Script.
2) Click on the icon with the “+” sign and select “R Script”.
3) Use the shortcut Ctrl+Shift+N.
Comments
Comments are helping text in your R program; they are ignored by the interpreter while
executing your actual program. A single-line comment is written using # at the beginning of the
statement, as follows:
# My first program in R Programming
R Data Types:
Variables are nothing but reserved memory locations to store values. This means that, when
you create a variable you reserve some space in memory.
You may like to store information of various data types like character, wide character, integer,
floating point, double floating point, Boolean etc. Based on the data type of a variable, the
operating system allocates memory and decides what can be stored in the reserved memory.
In contrast to other programming languages like C and Java, in R the variables are not
declared as some data type. The variables are assigned R-objects, and the data type of
the R-object becomes the data type of the variable.
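As a minimal sketch of this behavior (the variable name v and the helper names class_int, class_chr and class_lgl are illustrative and not part of the original notes), the same name can refer to objects of different types over time, and class() reports the type of whatever object is currently assigned:

```r
# Illustrative names only; not from the original notes.
v <- 42L                  # v refers to an integer object
class_int <- class(v)

v <- "forty-two"          # the same name now refers to a character object
class_chr <- class(v)

v <- TRUE                 # and now to a logical object
class_lgl <- class(v)

# Show the three types that v has taken in turn
print(c(class_int, class_chr, class_lgl))
```

No declaration is needed at any point; the type simply follows the object.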
Examples:
> x <- 6 # assignment operator: a less-than character (<) and a hyphen (-) with no space
> x
[1] 6
> y = 3 # assignment operator = is used.
> y
[1] 3
> z <<- 9 # assignment to a global variable rather than a local variable.
> z
[1] 9
> 5 -> a # a rightward assignment operator (->) assigns the value on its left to the name on its right
> a
[1] 5
> a <- b <- 7 # the same value can be assigned to several variables at once
> a
[1] 7
> b
[1] 7
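Variable (Object) Names: Certain names are already used by R for particular purposes and are
best avoided as variable names. Some of these symbols are: c q t C D F I T
c combines values into a vector, q quits the R session, t transposes a matrix, C sets
contrasts for a factor, D computes symbolic derivatives, F and T are shorthand for FALSE and
TRUE, and I marks an object to be used "as is".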
1) Numeric: The most commonly used data type is numeric. This is similar to float or
double in other languages. It handles integers and decimals, both positive and negative, and
also zero.
Example: 14.3, 23.5, 60
x <- 23.5
print(class(x))
it produces the following result:
[1] "numeric"
2) Integers:
> i <- 5L # To store a value as an integer, append an 'L' to it.
> i
[1] 5
> is.integer(i) # Testing whether a variable is integer or not
[1] TRUE
3) Complex: The complex data type is used to specify values with an imaginary part in R. We use
the suffix i to specify the imaginary part.
Example
# 2i represents imaginary part
x <- 3 + 2i
# print class of x
print(class(x))
Output
[1] "complex"
4) Character
The character data type is used to specify character or string values in a variable.
In programming, a string is a sequence of characters. For example, 'A' is a single character and
"Apple" is a string. You can use single quotes ' ' or double quotes " " to represent strings; R
treats both forms identically and stores both as the character type.
As a matter of convention, we use:
' ' for single characters
" " for strings
Example: 'a', "BCA", "TRUE", '23.4'
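The example values above can be checked with class(), in the same way as the other data types. This is a small sketch; the variable names s1 to s4 are illustrative and not from the original notes:

```r
# Illustrative names only; the values are the ones listed in the example above.
s1 <- 'a'
s2 <- "BCA"
s3 <- "TRUE"    # quoted, so this is text and not a logical value
s4 <- '23.4'    # quoted, so this is text and not a number

print(class(s1))
print(class(s3))
print(class(s4))

# Single and double quotes produce the same object
same_quotes <- identical('BCA', s2)
print(same_quotes)

# nchar() counts the characters in a string
n_chars <- nchar(s2)
print(n_chars)
```

All four values are of class "character", regardless of which quote style is used or what the text looks like.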
5) Logical
The logical data type holds one of the two values TRUE or FALSE.
Example
y <- FALSE
print(y)
print(class(y))
Output
[1] FALSE
[1] "logical"
Arithmetic in R:
In R, standard mathematical rules apply throughout and follow the usual left-to-right order of
operations: parentheses, exponents, multiplication, division, addition, subtraction (PEMDAS).
Here are some examples in the console:
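A few illustrative expressions, sketched with values chosen only for demonstration (the names r1 to r5 are not from the original notes), show how these precedence rules play out:

```r
# Illustrative values only; each comment gives the result of the expression.
r1 <- 2 + 3 * 4      # multiplication before addition: 14
r2 <- (2 + 3) * 4    # parentheses first: 20
r3 <- 2^3 * 2        # exponent before multiplication: 16
r4 <- 10 - 4 - 3     # same precedence, evaluated left to right: 3
r5 <- 12 / 4 * 3     # same precedence, evaluated left to right: 9

print(c(r1, r2, r3, r4, r5))
```

When in doubt, adding parentheses makes the intended order explicit.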
You can find the square root of any non-negative number with the sqrt
function. You simply provide the desired number to the argument x, as shown here:
R> sqrt(x=9)
[1] 3
R> sqrt(x=5.311)
[1] 2.304561
R has a wide variety of objects for holding data, including scalars, vectors, matrices, arrays,
data frames, and lists. They differ in terms of the type of data they can hold, how they are
created, their structural complexity, and the notation used to identify and access individual
elements.
Vectors are one-dimensional arrays that can hold numeric data, character data, or logical data.
Vectors must be homogeneous, i.e., the data in a given vector must all be of the same type. One
important point is that in R the indexing of a vector starts from 1 and not from 0.
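Both points can be seen in a short sketch (the names mixed, nums, first and at_zero are illustrative and not from the original notes). When values of different types are combined, R converts them to a single common type, and index 1 returns the first element:

```r
# Illustrative names only.
mixed <- c(1, "two", TRUE)   # mixing types: everything is converted to character
print(mixed)
print(class(mixed))

nums <- c(10, 20, 30)
first <- nums[1]             # indexing starts at 1, so this is 10
print(first)

at_zero <- nums[0]           # index 0 selects nothing (an empty vector)
print(length(at_zero))
```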
Creating a Vector:
The function for creating a vector is the single letter c, with the desired entries in parentheses
separated by commas.
Example:
> myvec <- c(1,3,11,42)
> myvec
[1] 1 3 11 42
Vector entries can be calculations or previously stored items (including vectors themselves).
Example:
> foo <- 32.1
> myvec2 <- c(3,-3,2,3.4,45+5,foo)
> myvec2
[1] 3.0 -3.0 2.0 3.4 50.0 32.1
This code created a new vector assigned to the object myvec2. Some of the entries are defined as
arithmetic expressions, and it's the result of the expression that's stored in the vector. The last
element, foo, is an existing numeric object defined as 32.1.
Let's create an equally spaced sequence of increasing or decreasing numeric values. This is
something you'll need often, for example when programming loops or when plotting data points. The
easiest way to create such a sequence, with numeric values separated by intervals of 1, is to use the
colon operator.
Example:
> 3:10
[1] 3 4 5 6 7 8 9 10
The example 3:10 should be read as “from 3 to 10 (by 1).” The result is a numeric vector just as if
you had listed each number manually in parentheses with c.
For intervals other than 1, the seq function takes the arguments from, to, and by.
Example:
> seq(from=3,to=20,by=3)
[1] 3 6 9 12 15 18
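Decreasing sequences work the same way. The sketch below is illustrative (the names dec_colon, dec_seq and len_seq are not from the original notes); it also uses the length.out argument of seq, which fixes the number of values instead of the step size:

```r
# Illustrative names only.
dec_colon <- 10:3                                   # decreasing by 1: 10 9 8 7 6 5 4 3
dec_seq <- seq(from = 20, to = 3, by = -3)          # decreasing by 3: 20 17 14 11 8 5
len_seq <- seq(from = 3, to = 20, length.out = 4)   # 4 equally spaced values from 3 to 20

print(dec_colon)
print(dec_seq)
print(len_seq)
```

Note that a decreasing sequence built with seq needs a negative value for by.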
Examples:
1.
> myvec_2<-rep(x=5,times=3)
> cat(myvec_2)
5 5 5
2.
> vec_r<-rep(5,times=4)
> cat(vec_r)
5 5 5 5
3.
> y<-1:5
> vec_reach<-rep(y,each=2)
> cat(vec_reach)
1 1 2 2 3 3 4 4 5 5
4.
> x<-1:4
> vec_rlen<-rep(x,length.out=3)
> cat(vec_rlen)
1 2 3
5.
> rep(x=c(3,62,8.3),each=2)
[1] 3.0 3.0 62.0 62.0 8.3 8.3
The rep function is given a single value or a vector of values as its argument x, as well as
values for the arguments times and each.
The value for times provides the number of times to repeat x, and each provides the number of
times to repeat each element of x.
In the first example above, rep simply repeats a single value three times. With times, a whole
vector is repeated end to end; with each, every member of the vector is repeated in place
before moving on to the next. Supplying both times and each does both at once.
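A minimal sketch of combining the two arguments, reusing the vector from example 5 (the name both is illustrative and not from the original notes): each is applied first, and the result is then repeated times times.

```r
# Illustrative name only; the input vector is the one used in example 5.
both <- rep(x = c(3, 62, 8.3), times = 2, each = 2)

# each = 2 gives 3 3 62 62 8.3 8.3, and times = 2 repeats that whole result twice
print(both)
print(length(both))   # 3 values * 2 (each) * 2 (times) = 12
```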
Examples:
1.
> sort(x=c(1, 4, 5, 2), decreasing = FALSE)
[1] 1 2 4 5
2.
> sort(x=c(1, 4, 5, 2), decreasing = TRUE)
[1] 5 4 2 1
3.
> v1<-c(1,4,5,2,3)
> v2<-c(6, 9, 8, 7)
> sort(x=c(v1,v2),decreasing = TRUE)
[1] 9 8 7 6 5 4 3 2 1
The sort function is pretty straightforward. You supply a vector to the function as the argument x, and
a second argument, decreasing, indicates the order in which you want to sort.
Examples:
1.
> length(x=c(3,2,8,1))
[1] 4
2.
> length(x=5:13)
[1] 9
Note that if you include entries that depend on the evaluation of other functions (for example, calls
to rep and seq), length tells you the number of entries after those inner functions have been executed.
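As a sketch of this point (the names foo, bar and n_bar and the values used are illustrative and not from the original notes), the vector below is built from two plain values, a call to rep, and a call to seq. The length counts every entry those calls produce:

```r
# Illustrative names and values only.
foo <- 4
bar <- c(3, 8.3,
         rep(x = 32, times = foo),                      # contributes 4 entries
         seq(from = -2, to = 1, length.out = foo + 1))  # contributes 5 entries

n_bar <- length(x = bar)   # 2 + 4 + 5 = 11
print(n_bar)
```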
Indexes allow you to retrieve specific elements from a vector, which is known as subsetting.
Example:
> myvec1<-c(11:17)
> length(myvec1)
[1] 7
> myvec1[length(x=myvec1)]
[1] 17
> cat(myvec1)
11 12 13 14 15 16 17
Because length(x=myvec1) results in the final index of the vector (in this case, 7), entering this
phrase in the square brackets extracts the final element, 17.
Similarly, you could extract the second-to-last element by subtracting 1 from the length; let's try that,
and also assign the result to a new object:
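A minimal sketch of that step, using the same myvec1 as above (the object name second_last is illustrative and not from the original notes):

```r
myvec1 <- c(11:17)

# The length is 7, so length - 1 is index 6, the second-to-last position
second_last <- myvec1[length(x = myvec1) - 1]
print(second_last)   # 16
```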
You can also exclude individual elements by using negative versions of the indexes supplied in the
square brackets. The original vector itself is left unchanged.
Example:
> myvec2<-c(15:20)
> print(myvec2)
[1] 15 16 17 18 19 20
> myvec2[-4]
[1] 15 16 17 19 20
The index in the square brackets can be the result of an appropriate calculation:
> myvec2[-(length(x=myvec2))]
[1] 15 16 17 18 19
> myvec2
[1] 15 16 17 18 19 20
#excludes element 20 which is present at index 6, which is
#equivalent to the length of myvec2
As with most operations in R, you are not restricted to doing things one by one. You can also subset
objects using vectors of indexes, rather than individual indexes.
> myvec<-c(10,20,30,40,50)
> myvec[c(1,3,5)]
[1] 10 30 50
This returns the first, third, and fifth elements of myvec in one go.
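Index vectors can also be ranges, negative values, or repeated positions. The sketch below is illustrative (the names middle, without_ends and repeated are not from the original notes) and reuses the same myvec:

```r
myvec <- c(10, 20, 30, 40, 50)

middle <- myvec[2:4]              # a range of indexes: 20 30 40
without_ends <- myvec[-c(1, 5)]   # drop the first and last elements: 20 30 40
repeated <- myvec[c(1, 1, 5)]     # an index may be used more than once: 10 10 50

print(middle)
print(without_ends)
print(repeated)
```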
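You can also visit the elements one at a time with a loop:
n <- length(myvec) # number of elements in myvec
i <- 1
while (i <= n)
{
cat('elements in myvec are', myvec[i], '\n')
i <- i + 1
}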
Vector-Oriented Behavior:
Vector-oriented, vectorized, or element-wise behavior is a key feature of the language.
R matches up the elements according to their respective positions and performs the operation
on each corresponding pair of elements.
> v1<-c(1,2,3,4,5)
> v2<-c(6,7,8,9,10)
> v3<-c(v1+v2)
> v3
[1] 7 9 11 13 15
The situation is made more complicated when using vectors of different lengths, which can
happen in two distinct ways:
1) The first is when the length of the longer vector can be evenly divided by the length of the
shorter vector.
2) The second is when the length of the longer vector cannot be evenly divided by the length of
the shorter vector. This is usually unintentional on the user's part.
In both cases, R attempts to replicate, or recycle, the shorter vector as many times as needed
to match the length of the longer vector before completing the specified operation.
Example:
> a <- c(4, 5, 6, 1)
> b <- c(2, 4, 7)
> res<-c(a+b)
Warning message:
In a + b : longer object length is not a multiple of shorter object
length
> cat(res)
6 9 13 3
Here you see that R has matched the first three elements of a with the elements of b, but it is not
able to fully repeat b a second time, so only the first element of b is reused for the fourth
element of a. R issues the warning message shown above.
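For comparison, here is a sketch of the first case, where the longer length is an exact multiple of the shorter one (the names long_vec, short_vec, recycled and explicit are illustrative and not from the original notes). The shorter vector is recycled in full and no warning is issued:

```r
# Illustrative names and values only.
long_vec <- c(1, 2, 3, 4, 5, 6)
short_vec <- c(10, 20)

# short_vec is reused three times: 10 20 10 20 10 20
recycled <- long_vec + short_vec
print(recycled)   # 11 22 13 24 15 26

# The same result, with the recycling written out explicitly using rep
explicit <- long_vec + rep(x = short_vec, times = 3)
print(identical(recycled, explicit))
```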
Another benefit of vector-oriented behavior is that you can use vectorized functions to
complete potentially laborious tasks.
For example, if you want to sum or multiply all the entries in a numeric vector, you can just
use a built-in function.
> v1<-c(1,2,3)
> v2<-c(4,5,6)
> #You can find the sum of v1 elements with
> sum(v1)
[1] 6
>
> #You can find the product of v2 elements with
> prod(v2)
[1] 120