R - Chapter 1
UNIT I - Chapter 1
Introduction
R is a programming language and software environment for statistical analysis, graphical
representation and reporting. R was created by Ross Ihaka and Robert Gentleman at the
University of Auckland, New Zealand, and is currently developed by the R Core Team.
The core of R is an interpreted computer language which allows branching and looping as well
as modular programming using functions. For efficiency, R allows integration with procedures
written in C, C++, .NET, Python or FORTRAN.
R is free software, distributed under the GNU General Public License (a copyleft license), and
pre-compiled binary versions are provided for various operating systems such as Linux, Windows
and macOS. R is an official part of the GNU project and is sometimes called GNU S.
Features of R
The following are the important features of R:
• R is a well-developed, simple and effective programming language which includes
conditionals, loops, user-defined recursive functions and input and output facilities.
• R has an effective data handling and storage facility.
• R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
• R provides a large, coherent and integrated collection of tools for data analysis.
• R provides graphical facilities for data analysis and display, either directly on the
screen or printed on paper.
Comments
In R, you can annotate your code with comments. Just preface the line with a hash mark (#),
and anything that comes thereafter will be ignored by the interpreter. For example, executing
the following in the console does nothing but return you to the prompt:
R> # This is a comment in R...
R script
An R script is simply a text file containing (almost) the same commands that you would enter
on the command line of R.
All common arithmetic operations and mathematical functionality are ready to use at the
console prompt. You can perform addition, subtraction, multiplication, and division with the
symbols +, -, *, and /, respectively. You can create exponents (also referred to as powers or
indices) using ^, and you control the order of the calculations in a single command using
parentheses, ().
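The operators described above can be tried out with a few small calculations. This is only an illustrative sketch; the object names (add_result, no_parens, and so on) are made up for the example and are not part of the original notes.

```r
# Basic arithmetic; each result is stored so that it can be inspected later.
add_result <- 2 + 3         # addition
sub_result <- 14 - 6        # subtraction
mul_result <- 4 * 2.5       # multiplication
div_result <- 14 / 6        # division
pow_result <- 2^3           # exponentiation (2 to the power of 3)

# Parentheses control the order of the calculations.
no_parens   <- 3 + 4 * 2    # multiplication is carried out first: 3 + 8
with_parens <- (3 + 4) * 2  # the sum is carried out first: 7 * 2

print(c(add_result, sub_result, mul_result, div_result, pow_result))
print(c(no_parens, with_parens))
```

Without parentheses R follows the usual mathematical precedence, so no_parens and with_parens differ even though they use the same numbers.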
Assigning Objects
In R, you can assign objects (variables) using the assignment operator <- or the equal sign =.
Both methods are commonly used for variable assignment. Here's how you can assign objects
in R:
R> x <- -5
R> x
[1] -5
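As a small sketch of both assignment styles, the following stores values with <- and with =, reuses them in a calculation, and then overwrites one of them. The object names y and z are illustrative additions.

```r
x <- -5       # assignment with the arrow operator
y = 2.5       # assignment with the equal sign
z <- x * y    # stored objects can be used in later expressions
x <- x + 1    # an existing object can be overwritten with a new value

print(x)
print(z)
```

The arrow form is the more conventional choice in R code, which is why most examples in these notes use it.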
Vectors
A vector in R is a fundamental data structure that is used to store a collection of elements of
the same data type. It is one of the most basic and versatile data structures in R, and it forms
the building block for many other data structures and operations in the language. Understanding
vectors is crucial for working effectively with R because many R operations are inherently
vectorized, allowing you to perform operations on entire vectors at once.
Here are the key characteristics and details of vectors in R:
Homogeneous Elements: All elements within a vector must be of the same data type. R has
several atomic data types, including:
• Numeric: Real (double-precision) numbers; the default type for numbers typed at the console.
• Character: Used for text data.
• Logical: Represents Boolean values (TRUE or FALSE).
• Integer: Whole numbers stored explicitly as integers (written with an L suffix, such as 1L).
• Complex: For complex numbers.
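The types listed above can be checked with the class function. The sketch below builds one vector of each type; the object names are illustrative. The last line relies on R's coercion rule: when values of different types are combined with c, they are all converted to one common type, which is how a vector stays homogeneous.

```r
num_vec <- c(1.5, 2, 42)        # numeric (double)
chr_vec <- c("a", "b")          # character
lgl_vec <- c(TRUE, FALSE)       # logical
int_vec <- c(1L, 2L, 3L)        # integer (note the L suffix)
cpx_vec <- c(1 + 2i, 3 - 1i)    # complex

print(class(num_vec))
print(class(chr_vec))
print(class(lgl_vec))
print(class(int_vec))
print(class(cpx_vec))

# Mixing types does not produce a mixed vector: everything is coerced
# to a single type, here character.
mixed <- c(1, "two", TRUE)
print(mixed)
print(class(mixed))
```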
The function for creating a vector is the single letter c, with the desired entries in parentheses
separated by commas.
R> myvec <- c(1,3,1,42)
R> myvec
[1] 1 3 1 42
In R, you can work with sequences, repetitions, sorting, and calculate lengths of objects to
perform various data manipulation and analysis tasks. Here's how you can perform these
operations:
Sequences: In R, you can create sequences of numbers using the seq() function or the colon
operator (:). Both produce arithmetic (evenly spaced) sequences; geometric or other custom
sequences can be built by transforming one, for example 2^(0:5).
Arithmetic Sequences:
You can create an arithmetic sequence using the seq() function. It takes arguments for the
starting value (from), the ending value (to), and the increment (by).
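A brief sketch of the colon operator and seq follows. The particular numbers and object names are chosen only for illustration. The length.out argument, which also appears later in these notes, asks for a given number of evenly spaced values instead of a fixed increment.

```r
colon_seq <- 3:8                                  # 3 4 5 6 7 8
by_seq    <- seq(from = 3, to = 27, by = 3)       # 3 6 9 ... 27
down_seq  <- seq(from = 10, to = 0, by = -2.5)    # a negative step counts down
len_seq   <- seq(from = -2, to = 1, length.out = 5)  # five evenly spaced values

print(colon_seq)
print(by_seq)
print(down_seq)
print(len_seq)
```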
Repetition: The rep function is given a single value or a vector of values as its argument x,
as well as a value for the arguments times and each. The value for times provides the number of
times to repeat x, and each provides the number of times to repeat each element of x. In the
simplest case, you repeat a single value a fixed number of times, for example four. You can also
use rep with times on a vector to repeat the entire vector, use each to repeat each member of
the vector, or use both times and each to do both at once.
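The four uses of rep described above might look like the following. The values and object names are illustrative and are not necessarily those of the original example.

```r
single    <- rep(x = 1, times = 4)                        # 1 1 1 1
whole_vec <- rep(x = c(3, 62, 8.3), times = 3)            # whole vector, three times
each_elem <- rep(x = c(3, 62, 8.3), each = 2)             # 3 3 62 62 8.3 8.3
both      <- rep(x = c(3, 62, 8.3), times = 3, each = 2)  # each first, then times

print(single)
print(whole_vec)
print(each_elem)
print(both)
```

When both arguments are given, each is applied first and the resulting vector is then repeated times times, so both has 3 * 2 * 3 = 18 elements.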
If neither times nor each is specified, R’s default is to treat the values of times and each as 1 so
that a call of rep(x=c(3,62,8.3)) will just return the originally supplied x with no changes. As
with seq, you can include the result of rep in a vector of the same data type, as shown in the
following example:
R> foo <- 4
R> c(3,8.3,rep(x=32,times=foo),seq(from=-2,to=1,length.out=foo+1))
[1] 3.00 8.30 32.00 32.00 32.00 32.00 -2.00 -1.25 -0.50 0.25 1.00
You can access individual elements of a vector using square brackets []. R uses 1-based
indexing, meaning the first element is accessed with [1], the second with [2], and so on.
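As a short sketch of indexing, the following uses the myvec vector created earlier in these notes. Beyond single positions, it also shows two related forms that are standard R behavior: a vector of positions selects several elements at once, and a negative index drops an element. The object names on the left are illustrative.

```r
myvec <- c(1, 3, 1, 42)

first_elem <- myvec[1]              # the first element
last_elem  <- myvec[length(myvec)]  # the last element, whatever the length
some_elems <- myvec[c(1, 3)]        # several positions at once
middle     <- myvec[2:3]            # a range of positions
no_first   <- myvec[-1]             # everything except the first element

print(first_elem)
print(last_elem)
print(some_elems)
print(middle)
print(no_first)
```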
Suppose you want to piece myvec back together from qux and bar. You can do so with a call of
the form c(qux[-length(x=qux)],bar,qux[length(x=qux)]). This call uses c to reconstruct the
vector in three parts. The first part, qux[-length(x=qux)], is every element of qux except the
last. The second part is the object bar. The third part, qux[length(x=qux)], is the last element
of qux.
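The reconstruction can be sketched as follows. The definitions of qux and bar are not shown in this section, so the ones below are assumptions made purely for illustration: bar is taken to be the second-to-last element of myvec, and qux is myvec with that element removed.

```r
myvec <- c(1, 3, 1, 42)

# Assumed definitions (hypothetical, for illustration only):
bar <- myvec[length(x = myvec) - 1]      # the second-to-last element
qux <- myvec[-(length(x = myvec) - 1)]   # myvec without that element

part1 <- qux[-length(x = qux)]   # all of qux except its last element
part3 <- qux[length(x = qux)]    # the last element of qux

rebuilt <- c(part1, bar, part3)  # same as c(qux[-length(x=qux)],bar,qux[length(x=qux)])
print(rebuilt)
print(identical(rebuilt, myvec))
```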
Vectors are so useful because they allow R to carry out operations on multiple elements
simultaneously with speed and efficiency. This vector-oriented, vectorized, or element-wise
behavior is a key feature of the language, one that you will briefly examine here through some
examples of rescaling measurements.
Suppose foo holds a sequence of six values from 5.5 down to 0.5, in steps of 1. From this
vector, you subtract another vector containing 2, 4, 6, 8, 10, and 12. What does this do? Quite
simply, R matches up the elements according to their respective positions and performs the
operation on each corresponding pair of elements. The resulting vector is obtained by
subtracting the first element of c(2,4,6,8,10,12) from the first element of foo (5.5 - 2 = 3.5),
then by subtracting the second element of c(2,4,6,8,10,12) from the second element of foo
(4.5 - 4 = 0.5), and so on.
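The subtraction described above can be reproduced as follows. This sketch assumes foo is created with the colon operator; the name result is illustrative.

```r
foo <- 5.5:0.5                           # 5.5 4.5 3.5 2.5 1.5 0.5
result <- foo - c(2, 4, 6, 8, 10, 12)    # element-by-element subtraction

print(foo)
print(result)                            # 3.5 0.5 -2.5 -5.5 -8.5 -11.5
```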
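When the two vectors differ in length, R recycles the shorter one. If a shorter vector bar is
combined with foo and the length of foo is a multiple of the length of bar, bar is applied
repeatedly throughout the length of foo until completion. When the vector lengths are not
evenly divisible, R still recycles the shorter vector as far as needed, but it issues a warning
that the longer object length is not a multiple of the shorter object length.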
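Both recycling cases can be sketched as below. The values of bar and baz are assumptions chosen for illustration. The handler around the second multiplication only records the warning text so that the example runs quietly; at the console you would simply see the warning printed.

```r
foo <- 5.5:0.5             # six elements
bar <- c(1, -1)            # two elements: 6 is a multiple of 2
even <- foo * bar          # bar is recycled three times
print(even)                # 5.5 -4.5 3.5 -2.5 1.5 -0.5

baz <- c(1, -1, 0.5, -0.5) # four elements: 6 is not a multiple of 4
warning_text <- NULL
uneven <- withCallingHandlers(
  foo * baz,
  warning = function(w) {
    warning_text <<- conditionMessage(w)  # remember what R warned about
    invokeRestart("muffleWarning")        # then continue with the result
  }
)
print(uneven)              # 5.5 -4.5 1.75 -1.25 1.5 -0.5
print(warning_text)
```

In the uneven case, the first four elements of foo use all of baz, and the remaining two reuse only its first two values.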
Lastly, this vector-oriented behavior applies in the same way to overwriting multiple
elements. When several positions of foo are assigned at once, a shorter vector of replacement
values is recycled across those positions.
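A minimal sketch of overwriting several elements at once follows. The positions and replacement values are assumptions chosen for illustration; foo is again taken to be the sequence from 5.5 down to 0.5.

```r
foo <- 5.5:0.5                      # 5.5 4.5 3.5 2.5 1.5 0.5

# Four positions are overwritten with only two replacement values,
# so c(-99, 99) is recycled to -99, 99, -99, 99.
foo[c(1, 3, 5, 6)] <- c(-99, 99)

print(foo)                          # -99 4.5 99 2.5 -99 99
```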
__________