R Programming

R is a statistical programming language used for statistical analysis and graphics. It provides many statistical and graphical techniques and is highly extensible. R was initially developed at the University of Auckland. It is free and open-source software that runs on various platforms. Some key benefits of R include its extensive built-in help system, excellent graphing capabilities, and a syntax that is easy to extend. However, it has a limited graphical user interface and requires learning programming concepts.

Uploaded by

Kunal Dutta
Introduction to R Programming

Debjit Konai
Department of Statistics
The University of Burdwan
What is R?
 R is a free, open-source statistical programming language based on the S
language developed at Bell Labs.
 R plays a key role in a wide variety of research and data analysis projects because it
makes many modern statistical methods, both simple and advanced, readily available and
easy to use. However, a beginner to R is often new to programming in
general. As a beginner, you must not only learn to use R for your specific data analysis
goals but also learn to think like a programmer.
 R is a language and environment for statistical computing and graphics.
 R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical
tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is
highly extensible. The S language is often the vehicle of choice for research in statistical
methodology, and R provides an open-source route to participation in that activity.
 R can be used by anyone for any purpose.
 R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the
University of Auckland, New Zealand, during the 1990s.
Contributors of R
The current R is the result of a collaborative effort with
contributions from all over the world. R was initially
written by Robert Gentleman and Ross Ihaka, also
known as "R & R", of the Statistics Department of the
University of Auckland.

1. Douglas Bates
2. John Chambers
3. Robert Gentleman
4. Kurt Hornik
5. Stefano Iacus
6. Ross Ihaka
7. Friedrich Leisch
8. Thomas Lumley
9. Martin Maechler
10. Duncan Murdoch
11. Paul Murrell
12. Martyn Plummer
13. Brian Ripley
14. Duncan Temple Lang
15. Luke Tierney
16. Simon Urbanek
The benefits of R
 R is free and open source and runs on UNIX, Windows, and Macintosh.
 R has an excellent built-in help system.
 R has excellent graphing capabilities.
 R's language has a powerful, easy-to-learn syntax with many built-in
statistical functions.
 The language is easy to extend with user-written functions.
 R is a computer programming language.

What is R lacking compared to other software solutions?
 It has a limited graphical interface.
 There is no commercial support.
 The command language is a programming language, so
users must learn to appreciate syntax issues etc.
 There is no spreadsheet view of data, but R connects to Excel/MS
Office.
 A main disadvantage of base R is its limited support
for dynamic or 3D graphics.
Where to get R?
Download link: https://cran.r-project.org/bin/windows/base/
The R GUI
Where to get RStudio?
Download link: https://www.rstudio.com/products/rstudio/download/
How to start R?
 Opening a script.
 This gives you a script window.
Typical R session
 Start up R via the GUI or your favorite text editor.
 Two windows:
1. One or more new or existing scripts (text files); these are always saved.
2. The terminal, for output and temporary input; this is usually unsaved.
 R sessions are interactive:
1. Write down the code in the script window.
2. Output appears in the terminal.
3. Did you get what you wanted? If errors are found, adjust your syntax
in the script depending on the answer.
Getting Started
 Basic assignment and operations.
 Arithmetic operations.
+, -, *, /, ^ are the standard arithmetic operations.
 Matrix multiplication.
* is element-wise multiplication.
%*% is matrix multiplication.
 Assignment.
To assign a value to a variable, use "<-" or "=".
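The difference between the two multiplication operators is easiest to see on a small example. The sketch below is illustrative only; the names a, b, A and B are made up for this example.

```r
# Scalar arithmetic with the standard operators
a <- 7
b <- 2
print(a + b)
print(a ^ b)

# Two 2 x 2 matrices (matrix() fills column by column by default)
A <- matrix(c(1, 2, 3, 4), nrow = 2)
B <- matrix(c(0, 1, 1, 0), nrow = 2)

elementwise <- A * B      # multiplies matching entries
product     <- A %*% B    # row-by-column matrix product

print(elementwise)
print(product)
```

Here A * B keeps only the entries where B is 1, while A %*% B swaps the columns of A.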
Basic program control

 Two important "control flow" constructs.

Conditional execution (use if, if-else, and nested if-else statements):

1. if (condition) {
expression
}
2. if (condition) {
expression 1
} else {
expression 2
}
3. if (condition 1) {
expression 1
} else if (condition 2) {
expression 2
} else if (condition 3) {
expression 3
...
} else {
expression n
}

Repetitive execution (use the for, repeat, and while loops; R has no
do-while loop, but repeat with break serves the same purpose):

1. for (i in vector) {
expression
}
2. repeat {
commands
if (condition) {
break
}
}
3. while (condition) {
expression
}
Some applications:
1. > a = 1
> if (a == 1) {print("Hello Students")}
[1] "Hello Students"
2. > x = -2
> if (x >= 0) {
print("The given number is Non-negative")
} else {
print("The given number is Negative")
}
[1] "The given number is Negative"
3. > k = 100
> if (k > 100) {
print("Greater than 100")
} else if (k < 100) {
print("Less than 100")
} else {
print("Equal to 100")
}
[1] "Equal to 100"
4. > fruit = c('Apple', 'Orange', 'Passion fruit', 'Banana')
> for (i in fruit) {
print(i)
}
Output:
[1] "Apple"
[1] "Orange"
[1] "Passion fruit"
[1] "Banana"

5. > Result = c("Hello Students")
> i = 1
> repeat {
print(Result)
i = i + 1
if (i > 5) {
break
}
}
Output:
[1] "Hello Students"
[1] "Hello Students"
[1] "Hello Students"
[1] "Hello Students"
[1] "Hello Students"
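The applications above cover if, for and repeat; the while loop can be sketched in the same spirit. This is a minimal illustration with made-up variable names (count, total): it keeps adding 1, 2, 3, ... until the running total exceeds 20.

```r
count <- 0
total <- 0

# The condition is checked before every pass through the body
while (total <= 20) {
  count <- count + 1
  total <- total + count
}

print(count)  # 6
print(total)  # 21
```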
Getting Started
 How to use help in R?
R has an extensive built-in help system.
If you know which function you want help with, simply
use ?_______ with the function name in the blank.
Ex: ?hist
If you don't know which function to use, then use
help.search("_______") with a keyword in the blank.
R as a calculator
> log2(12)
[1] 3.584963
> sqrt(3)
[1] 1.732051
> exp(3)
[1] 20.08554
> seq(0, 5, length = 6)
[1] 0 1 2 3 4 5
> plot(sin(seq(0, 2 * pi, length = 200)))

Each result is prefaced with a funny-looking [1]. This indicates that
the value is a vector; [1] is the index of the first element printed on that line. More on that later.
Entering data with the c function
 The most useful R command for quickly entering small data sets is the
c function. This function combines, or concatenates, terms together. As an
example, suppose we have the following count of the number of typos per page
of these notes:
4 3 1 3 1 0 5 1
 To enter this into an R session we do so with
> typos = c(4, 3, 1, 3, 1, 0, 5, 1)
> typos
[1] 4 3 1 3 1 0 5 1

 Notice a few things:
 We assigned the values to a variable called typos.
 The value of typos doesn't automatically print out. It does when we type just
the name, though, as the last input line indicates.
Applying a function
R comes with many built-in functions that one can apply to data
such as typos. One of them is the mean function for finding the
mean or average of the data. It is easy to use:
> mean(typos)
[1] 2.25
> sd(typos)
[1] 1.752549
> median(typos)
[1] 2
How to find Mean, Median, Mode?
First create a vector using the c function; then use the basic syntax for
calculating the mean in R.
R-code for finding the Mean:
Basic syntax for calculating the mean:
 mean(X, trim = 0, na.rm = FALSE, ...)
a. X is the input vector.
b. trim is the fraction (0 to 0.5) of observations dropped from each end of the sorted vector before the mean is computed.
c. na.rm is used to remove the missing values from the input vector.

X <- c(12, 7, 3, 4.2, 8, 9, 2.8)       # enter the data
print(X)                               # print the data
Result_Mean1 <- mean(X)                # calculate the mean
print(Result_Mean1)                    # print the result
Result_Mean2 <- mean(X, trim = 0.3)    # trimmed mean
print(Result_Mean2)
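To see what trim and na.rm actually do, it may help to try them on a tiny vector. The data below (scores) are hypothetical, chosen so that one value is extreme and one is missing.

```r
# Hypothetical data with one extreme value and one missing value
scores <- c(1, 2, 3, 4, 100, NA)

plain_mean   <- mean(scores)                            # the missing value propagates
clean_mean   <- mean(scores, na.rm = TRUE)              # (1 + 2 + 3 + 4 + 100) / 5
trimmed_mean <- mean(scores, trim = 0.2, na.rm = TRUE)  # drops 1 and 100, averages 2, 3, 4

print(plain_mean)    # NA
print(clean_mean)    # 22
print(trimmed_mean)  # 3
```

With trim = 0.2 and five non-missing values, one observation is removed from each end of the sorted vector, which is why the extreme value 100 no longer affects the result.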
R-code for finding the Median:
Syntax:
median(X, na.rm = FALSE)
Where X is the input vector and na.rm is used to remove the missing values from
the input vector.
Code:
median_result1 <- median(X)
print(median_result1)

median_result2 <- median(X, na.rm = FALSE)
print(median_result2)

R-code for finding the Mode:

The mode is the value that has the highest number of occurrences in a set of data.
Unlike the mean and median, the mode can be computed for both numeric and character data.
R does not have a standard built-in function to calculate the mode, so we create a user
function to calculate the mode of a data set in R. This function takes a vector as
input and gives the mode value as output. When several values tie for the highest
count, it returns the one that appears first in the vector.
First create the function:
Get_mode <- function(X) {
uniqv <- unique(X)
uniqv[which.max(tabulate(match(X, uniqv)))]
}
# Create the vector of numbers.
X <- c(2, 5, 7, 4, 6, 7, 9, 1, 5, 3)
# Calculate the mode using the user function.
Result <- Get_mode(X)
print(Result)
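Since the mode is also defined for character data, the same user function can be tried on a character vector. The sketch below repeats the function definition so that it runs on its own; the vector colours is made up for illustration.

```r
# Same user-defined mode function as above
Get_mode <- function(X) {
  uniqv <- unique(X)                           # distinct values, in order of appearance
  uniqv[which.max(tabulate(match(X, uniqv)))]  # value with the highest count
}

# Hypothetical character data
colours <- c("red", "blue", "red", "green", "blue", "red")

print(Get_mode(colours))  # "red"
print(table(colours))     # full frequency table, useful for spotting ties
```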
Types of Data in Statistics
Whenever we are working with statistics, it is very important to recognize the
different types of data:
1. Numerical (discrete and continuous)
2. Categorical
3. Ordinal
Data is information gathered through observation, for example as the result of a survey. Data can
be either numerical or categorical in nature.
1. Numerical Data:
 It contains data that can be measured. A person's height, weight, IQ, or blood
pressure are examples of numerical data.
Numerical data is again of two types:
a. Discrete
b. Continuous
a. Discrete data: It represents items that can be counted. The possible values
can be listed out. The list of possible values may be fixed or it may
go to infinity.
b. Continuous data: It represents measurements. The possible values cannot
be counted; they can only be described using intervals on the real number line.
2. Categorical Data:
 Categorical data is used to represent characteristics that are present in the data,
such as a person's gender, marital status, or hometown.
 For example, in a given group of males and females, males can be represented as 0
and females can be represented as 1. Therefore, we have two classes of distinct
characteristics.
3. Ordinal data:
 In this form of data, the variables have categories with a natural order, but the
distance between the categories is not known. Ordinal data is similar to
categorical data, with the only difference that the data is ordered.
For example, rating a restaurant on a scale of 0 to 4 gives us ordinal data.
Ordinal variables are often treated as categorical. We have to order the groups whenever
graphs and charts are created.
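In R, categorical and ordinal data are usually stored as factors. The following is a minimal sketch, with made-up vectors gender and rating, of how the two kinds differ in practice.

```r
# Categorical data: an unordered factor
gender <- factor(c("male", "female", "female", "male"))
print(levels(gender))

# Ordinal data: restaurant ratings on a scale of 0 to 4, stored as an ordered factor
rating <- factor(c(3, 0, 4, 2, 3), levels = 0:4, ordered = TRUE)
print(rating)

# Comparisons are meaningful only for ordered factors
print(rating[1] > rating[2])

# Frequency table, listed in the natural order of the levels
print(table(rating))
```

Declaring the levels explicitly keeps the categories in their natural order when tables and charts are produced.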
Low-level plotting commands in R
Low-level plotting functions (these add to an existing plot):
Function                   Outcome
points(x, y)               Adds points
abline(), segments()       Adds lines or segments
arrows()                   Adds arrows
curve(expr, add = TRUE)    Adds a curve representing a function
rect(), polygon()          Adds a rectangle or arbitrary shape
text(), mtext()            Adds text within the plot, or to plot margins
legend()                   Adds a legend
axis()                     Adds an axis
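A short sketch of how these functions are layered on top of a high-level plot follows. The data (x, y) and the annotation positions are arbitrary choices for illustration; pdf(NULL) is used here only so that the code also runs without a display, and can be dropped in an interactive session.

```r
pdf(NULL)  # draw to a null device so the sketch also runs without a display

x <- 1:10
y <- x^2

plot(x, y, type = "n", main = "Low-level additions")  # empty high-level plot
points(x, y, pch = 16)                                # add the points
abline(h = 50, lty = 2)                               # add a horizontal reference line
segments(2, 4, 8, 64, col = "grey")                   # add a segment between two points
text(3, 80, "y = x^2")                                # add text inside the plot
mtext("added with mtext()", side = 3, line = 0.2)     # add text in the top margin
legend("topleft", legend = "observations", pch = 16)  # add a legend

dev.off()
```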
R-Charts and Graphs
Pie Charts:
Syntax:
pie(x, labels, radius, main, col, clockwise)
Where, 1. x is a vector containing the numeric values used in the pie chart.
2. labels is used to give descriptions to the slices.
3. radius indicates the radius of the circle of the pie chart (a value between -1
and +1).
4. main indicates the title of the pie chart.
5. col indicates the color palette.
6. clockwise is a logical value indicating whether the slices are drawn clockwise or
anticlockwise.
Example:
 Pie chart R-Code:
x <- c(21, 45, 56, 23)
labels <- c("London", "Singapore", "India", "Mumbai")
pie(x, labels)    # plot the chart
Output:
# Plot the chart with a title and the rainbow color palette.
pie(x, labels, main = "City pie chart", col = rainbow(length(x)))
Output:
# Plot the chart with percentage labels and a legend.
piepercent <- round(100 * x / sum(x), 1)
pie(x, labels = piepercent, main = "City pie chart", col = rainbow(length(x)))
legend("topright", c("London", "Singapore", "India", "Mumbai"), cex = 0.8, fill = rainbow(length(x)))
Output:
Bar-Chart Using R
A bar chart represents data in rectangular bars with the length of each bar proportional
to the value of the variable. R uses the function barplot() to create bar charts. R
can draw both vertical and horizontal bars in the bar chart. In a bar chart each of
the bars can be given a different color.
Syntax:
The basic syntax to create a bar chart in R is:
barplot(X, xlab, ylab, main, names.arg, col)
where, X is a vector or matrix containing the numeric values used in the bar
chart.
xlab is the label for the x axis.
ylab is the label for the y axis.
main is the title of the bar chart.
names.arg is a vector of names appearing under each bar.
col is used to give colors to the bars in the graph.
Example1:
X <- c(7, 9, 12, 8, 23)
barplot(X)
Output:
Example2:
V <- c(7, 9, 12, 8, 23)
H <- c("March", "April", "May", "June", "July")
barplot(V, names.arg = H, xlab = "Month", ylab = "Revenue", col = "blue",
main = "Revenue Chart", border = "red")
Output:
Example3:
Colors <- c("red", "blue", "orange")
Months <- c("March", "April", "May", "June", "July")
Regions <- c("East", "West", "North")
Values <- matrix(c(2,9,3,11,9,4,8,7,3,12,5,2,8,10,11), nrow = 3, ncol = 5, byrow = TRUE)
barplot(Values, main = "Total revenue", names.arg = Months, xlab = "month",
ylab = "revenue", col = Colors)
legend("topleft", Regions, cex = 1.3, fill = Colors)
Output:
Histogram Using R
A histogram represents the frequencies of values of a variable bucketed into ranges.
A histogram is similar to a bar chart, but the difference is that it groups the values into
continuous ranges. The height of each bar in a histogram represents the number of
values present in that range.
R creates a histogram using the hist() function. This function takes a vector as input
and uses some more parameters to plot histograms.
The basic syntax for creating a histogram using R is:
hist(x, main, xlab, xlim, ylim, breaks, col, border)
Where, x is a vector containing the numeric values used in the histogram.
main indicates the title of the chart.
col is used to set the color of the bars.
border is used to set the border color of each bar.
xlab is used to give a description of the x-axis.
xlim is used to specify the range of values on the x-axis.
ylim is used to specify the range of values on the y-axis.
breaks controls how the values are divided into bars: either a suggested number of bars or a vector of breakpoints.
Example1:
R-Code:
x <- c(10, 12, 21, 13, 34, 9, 11, 14, 23)
hist(x, xlab = "Weight", col = "blue", border = "red")
Output:
Example2:
R-Code:
x <- c(10, 12, 21, 13, 34, 9, 11, 14, 23)
hist(x, xlab = "Weight", col = "blue", xlim = c(0, 50), ylim = c(0, 5), border = "red",
breaks = 5)
Output:
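To inspect how hist() has grouped the values without drawing anything, the function can be called with plot = FALSE. The sketch below reuses the weights from the examples; the object name h is arbitrary.

```r
x <- c(10, 12, 21, 13, 34, 9, 11, 14, 23)

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(x, breaks = 5, plot = FALSE)

print(h$breaks)  # boundaries of the ranges R actually chose
print(h$counts)  # number of values falling into each range
```

Note that a single number given to breaks is only a suggestion; R picks "pretty" boundaries near that number of bars.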
Line graph Using R
A line chart is a graph that connects a series of points by drawing line segments
between them. These points are ordered by one of their coordinates (usually the
x-coordinate). Line charts are usually used for identifying trends in data.
The plot() function in R is used to create the line graph.
The basic syntax to create a line chart in R is:
plot(x, type, col, xlab, ylab, main)
Where, x is a vector containing the numeric values.
type takes the value "p" to draw only the points, "l" to draw only the lines
and "o" to draw both points and lines.
xlab is the label for the x axis.
ylab is the label for the y axis.
main is the title of the chart.
col is used to give colors to both the points and lines.
Example1:
R-Code:
x <- c(8, 9, 6, 10, 13, 15)
plot(x, type = "o")
Output:
Example2:
R-Code:
x <- c(8, 9, 6, 10, 13, 15)
plot(x, type = "o", col = "red", xlab = "Month", ylab = "Rainfall",
main = "Rainfall Chart")
Output:
Example3:
R-Code:
x <- c(8, 9, 6, 13, 15, 11)
y <- c(4, 12, 8, 11, 9)
plot(x, type = "o", col = "red", xlab = "Month", ylab = "Rainfall", main = "Rainfall Chart")
lines(y, type = "o", col = "blue")
Output:
Scatterplots Using R
Scatterplots show many points plotted in the Cartesian plane. Each
point represents the values of two variables. One variable is plotted on
the horizontal axis and the other on the vertical axis.
A simple scatterplot is created using the plot() function.
The basic syntax to create a scatterplot in R is:
plot(x, y, main, xlab, ylab, xlim, ylim, axes)
Where, x is the data set whose values are the horizontal coordinates.
y is the data set whose values are the vertical coordinates.
main is the title of the graph.
xlab is the label on the horizontal axis.
ylab is the label on the vertical axis.
xlim is the limits of the values of x used for plotting.
ylim is the limits of the values of y used for plotting.
axes indicates whether both axes should be drawn on the plot.
Example1:
R-Code:
x <- c(1, 3, 2, 6, 7, 4, 8, 5)
y <- c(2, 4, 1, 7, 6, 1, 9, 10)
plot(x, y, main = "X vs Y", xlab = "X", ylab = "Y", xlim = c(0, 9), ylim = c(0, 11), axes = TRUE)
Output:
Boxplot
The box plot is a standardized way to display the distribution of data based on
the following five-number summary:
1. Minimum, 2. 1st Quartile, 3. 2nd Quartile or Median, 4. 3rd Quartile,
5. Maximum
 In a box plot, the central rectangle
spans the first quartile to the third quartile (the interquartile range, IQR). A
line inside the rectangle shows the median, and "whiskers" above and below the
box show the locations of the minimum and maximum values. Such a box plot
displays the full range of variation from min to max, the likely range of variation
(the IQR), and the median. By default, R's boxplot() draws points lying more than
1.5 times the IQR beyond the box separately, as outliers.
Example:
 The given data set is 120, 123, 56, 67, 100, 111, 120, 89, 90, 99, 130, 110. Draw the
boxplot using R.
Answer:
R-Code:
data <- c(120, 123, 56, 67, 100, 111, 120, 89, 90, 99, 130, 110)
boxplot(data)
abline(h = min(data), col = "Blue")
abline(h = max(data), col = "Red")
abline(h = median(data), col = "Green")
abline(h = quantile(data, c(0.25, 0.75)), col = "Yellow")
Output:
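The numbers behind the box plot can also be printed directly. The sketch below uses the same data; fivenum() and quantile() are base R functions, and their quartiles can differ slightly because they use different conventions.

```r
data <- c(120, 123, 56, 67, 100, 111, 120, 89, 90, 99, 130, 110)

five <- fivenum(data)                   # minimum, lower hinge, median, upper hinge, maximum
quartiles <- quantile(data, c(0.25, 0.75))
iqr <- IQR(data)                        # third quartile minus first quartile

print(five)
print(quartiles)
print(iqr)
print(summary(data))                    # adds the mean to the five-number summary
```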
Discuss Boxplot with graphics:
Matrices using R
Matrices are R objects in which the elements are arranged in a two-dimensional
rectangular layout. They contain elements of the same atomic type. Though we
can create a matrix containing only characters or only logical values, they are not of
much use. We use matrices containing numeric elements in
mathematical calculations.
A matrix is created using the matrix() function.
Syntax:
The basic syntax for creating a matrix in R is:
matrix(data, nrow, ncol, byrow, dimnames)
Where, data is the input vector which becomes the data elements of the matrix.
nrow is the number of rows to be created.
ncol is the number of columns to be created.
byrow is a logical value. If TRUE then the input vector elements are arranged
by row.
dimnames is the names assigned to the rows and columns.
 Create a matrix taking a vector of numbers as input:
R-Code:
# Elements are arranged sequentially by row.
# 24 values are used so that they fill the 4 x 6 matrix exactly.
m = matrix(c(2:25), nrow = 4, byrow = TRUE)
print(m)
Output:

# Elements are arranged sequentially by column.
m = matrix(c(2:25), nrow = 4, byrow = FALSE)
print(m)
Output:
R-Code:
First define the row names and column names:
rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3", "col4", "col5", "col6")
m = matrix(c(2:25), nrow = 4, byrow = FALSE, dimnames = list(rownames, colnames))
print(m)
Output:
# Access only the 2nd row, and only the 3rd column:
R-Code:
print(m[2,])
print(m[,3])
Output:
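Single elements can be accessed in the same way, by position or by name. The sketch below assumes a 4 x 6 matrix built from the values 2 to 25 (an assumption made so that the values fill the matrix exactly), with row and column names generated by paste0().

```r
# 4 x 6 matrix filled column by column, with row and column names
m <- matrix(2:25, nrow = 4,
            dimnames = list(paste0("row", 1:4), paste0("col", 1:6)))

print(dim(m))              # number of rows and columns
print(m[2, 3])             # element in row 2, column 3, by position
print(m["row2", "col3"])   # the same element, by name
print(m[1:2, c(1, 6)])     # sub-matrix: rows 1-2, columns 1 and 6
print(t(m))                # transpose: rows become columns
```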
Matrix Computations in R
 Various mathematical operations are performed on matrices using the R operators. The result of
the operation is also a matrix.
 The dimensions (number of rows and columns) should be the same for the matrices involved in
these element-wise operations.
 Matrix Addition, Subtraction, Multiplication & Division:
R-Code (all four operations are applied element by element):
Matrix1 = matrix(c(-2, 1, 4, 5, 1, 6), nrow = 2)
print(Matrix1)
Matrix2 = matrix(c(-4, 2, -4, -3, 1, -2), nrow = 2)
print(Matrix2)
Result1 = Matrix1 + Matrix2
print(Result1)
Result2 = Matrix1 - Matrix2
print(Result2)
Result3 = Matrix1 * Matrix2
print(Result3)
Result4 = Matrix1 / Matrix2
print(Result4)
Output:
Reading a CSV File in R
 The function read.csv() is used to read a csv file. Given a file name, it looks in your current
working directory; file.choose() opens a dialog to pick the file interactively instead.
 R-Code:
data <- read.csv(file.choose(), sep = ",", header = TRUE)
print(data)
Summary of the data set after reading CSV file in R
R-Code:
data <- read.csv(file.choose(), sep = ",", header = TRUE)
print(data)
summary(data)
Output:
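Because file.choose() needs a user to pick a file, a non-interactive sketch can be useful for practice. The example below writes a small made-up data frame (sample_data) to a temporary CSV file and reads it back; the file name comes from tempfile(), so nothing is left behind.

```r
# Write a small, made-up data set to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
sample_data <- data.frame(name = c("A", "B", "C"), score = c(10, 20, 30))
write.csv(sample_data, csv_path, row.names = FALSE)

# Read it back, exactly as one would read any other CSV file
data <- read.csv(csv_path, sep = ",", header = TRUE)
print(data)
print(summary(data))

unlink(csv_path)  # remove the temporary file
```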
Reading Bigger Data Files
The scan() command reads data from simple files. In R, we can also read
large amounts of data with a more complicated structure. There are various functions
for reading such data stored in a variety of text formats.
We can read from a csv file with: > read.csv() or read.csv2()
From tables with: > read.table()
From files that contain values separated by tabs with: > read.delim()
read.csv() and read.csv2() are both used to read csv files, but the former uses
',' as the separator while the latter uses ';'.
Random number generation in R
R has functions to generate random numbers from many standard distributions, such as the
uniform distribution, binomial distribution, normal distribution etc.
The full list of standard distributions available can be seen using ?Distributions
Functions that generate random deviates start with the letter r.
 For example, runif() generates random numbers from a uniform distribution and
rnorm() generates from a normal distribution.
Random number generation from Uniform Distribution:
 Random numbers from a uniform distribution can be generated using the runif()
function. We need to specify how many numbers we want to generate.
 Additionally, we can specify the range of the uniform distribution using the max and
min arguments. If not provided, the default range is between 0 and 1.
 R-Code (generate 1, 2, 3 random numbers):
r_1 = runif(1)
print(r_1)
r_2 = runif(2)
print(r_2)
r_3 = runif(3)
print(r_3)
3 random numbers between 7 and 12 & 10 random numbers between 1 and 15 from the
uniform distribution:
r_4 = runif(3, min = 7, max = 12)
print(r_4)
r_5 = runif(10, min = 1, max = 15)
print(r_5)
Output:
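Random numbers differ from run to run. When a result needs to be reproduced, the generator can be seeded with set.seed() first. A minimal sketch, where the seed value 123 is an arbitrary choice:

```r
# Seeding the generator makes the "random" draws repeatable
set.seed(123)
first_draw <- runif(3, min = 7, max = 12)

set.seed(123)
second_draw <- runif(3, min = 7, max = 12)

print(first_draw)
print(identical(first_draw, second_draw))        # TRUE: same seed, same numbers
print(all(first_draw >= 7 & first_draw <= 12))   # TRUE: draws stay inside [min, max]
```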
Random number generation from Normal Distribution:
 Random numbers from a normal distribution can be generated using the rnorm() function.
 We need to specify the number of samples to be generated.
 We can also specify the mean and standard deviation of the distribution.
 If not provided, the distribution defaults to mean 0 and standard deviation 1.
 R-Code (generate 1, 2, 3 random numbers):
r_6 = rnorm(1)
print(r_6)
r_7 = rnorm(2)
print(r_7)
r_8 = rnorm(3)
print(r_8)
 rnorm() has no min and max arguments; a normal distribution is specified by its mean and standard deviation instead.
3 random numbers with mean 7 and standard deviation 12 & 10 random numbers with mean 1 and standard deviation 15 from the normal distribution:
r_9 = rnorm(3, mean = 7, sd = 12)
print(r_9)
r_10 = rnorm(10, mean = 1, sd = 15)
print(r_10)
Output:
Linear Regression in R
Regression analysis is a very widely used statistical tool to establish a relationship
model between two variables. One of these variables is called the predictor variable,
whose value is gathered through experiments. The other variable is called the response
variable, whose value is derived from the predictor variable.
In linear regression these two variables are related through an equation in which the
exponent (power) of both variables is 1. Mathematically, a linear relationship
represents a straight line when plotted as a graph. A non-linear relationship, in which
the exponent of any variable is not equal to 1, creates a curve.
The general mathematical equation for a linear regression is:
y = a + bx
Where, y is the response variable.
x is the predictor variable.
a and b are constants which are called the coefficients.
 In R, create the relationship model using the lm() function.
Syntax:
The basic syntax for the lm() function in linear regression is:
Relation <- lm(formula, data)
print(Relation)
Here formula describes the relation between the response and the predictor (for example y ~ x),
and data is the data frame containing the variables.
Example1: The sample data are given by:
Height (cm)   151  174  138  186  128  136  179  152  131
Weight (kg)    63   81   56   91   47   57   76   62   48

Find the linear relationship between Height and Weight using R and, using this relation, find the Height
when the Weight is 102 kg. Also find the summary of the relationship.
Solution:
Let y = a + bx be the linear relationship between height and weight,
where a and b are constants, x is the explanatory variable (weight) and y is the response
variable (height). In this problem, a and b are to be determined using R from the sample values.
R-Code:
Height <- c(151, 174, 138, 186, 128, 136, 179, 152, 131)
Weight <- c(63, 81, 56, 91, 47, 57, 76, 62, 48)
Relation <- lm(Height ~ Weight)
print(summary(Relation))
Output(Summary of the Linear Regression Model):
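The example asks for the height when the weight is 102 kg, which the code above does not compute. One possible way, sketched here with the base R function predict(), is to pass the new weight in a data frame whose column name matches the predictor; the names new_person and predicted_height are made up for this illustration.

```r
Height <- c(151, 174, 138, 186, 128, 136, 179, 152, 131)
Weight <- c(63, 81, 56, 91, 47, 57, 76, 62, 48)
Relation <- lm(Height ~ Weight)

# Estimated coefficients: a (intercept) and b (slope for Weight)
print(coef(Relation))

# Predict the height for a weight of 102 kg
new_person <- data.frame(Weight = 102)
predicted_height <- predict(Relation, newdata = new_person)
print(predicted_height)
```

Since 102 kg lies outside the observed weights (47 to 91 kg), this prediction is an extrapolation and should be read with caution.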
Visualize this Relationship graphically:
R-Code:
Height <- c(151, 174, 138, 186, 128, 136, 179, 152, 131)
Weight <- c(63, 81, 56, 91, 47, 57, 76, 62, 48)
Relation <- lm(Height ~ Weight)
# Weight is the predictor (x axis) and Height is the response (y axis).
plot(Weight, Height, col = "blue", main = "Relationship between Height & Weight",
cex = 1.3, pch = 16, xlab = "Weight in Kg", ylab = "Height in cm")
abline(Relation)    # add the fitted regression line
Output:
Packages in R
R packages are collections of functions and data sets developed by the community.
They increase the power of R by improving existing base R functionalities, or by
adding new ones. For example, if you usually work with data frames,
you have probably heard about dplyr or data.table, two of the most popular R
packages.
But imagine that you'd like to do some natural language processing of Korean
texts, extract weather data from the web, or even estimate actual evapotranspiration
using land surface energy balance models: R packages have got you covered! The
official repository (CRAN) has passed 10,000 published packages, and many more
are publicly available through the internet.
If you are starting with R, this section covers the basics of R packages and how
to use them, through the following 11 frequently asked user
questions:
1. The basics of R packages: what are packages and why should you incorporate
their use into your R experience?
2. Where can you find packages?
3. The installation and usage: how can you install packages from CRAN, CRAN
mirrors, Bioconductor or GitHub?
4. What are some functions that are related to install.packages() and that you can
use to update, remove, ... packages?
5. How can you use the user interface to install packages?
6. How do you load packages?
7. What is the difference between a package and a library in R?
8. How do I load multiple packages at the same time?
9. How do I unload an R package?
10. The documentation: what are, besides the DESCRIPTION file, other sources of
documentation and how can you use them?
11. Choosing between R packages: how do you find the right package for your
analysis?
What is a Package?
 Let's start with some definitions. A package is a suitable way to organize your own
work and, if you want to, share it with others. Typically, a package will include
code (not only R code!), documentation for the package and the functions inside,
some tests to check everything works as it should, and data sets.
 The basic information about a package is provided in the DESCRIPTION file,
where you can find out what the package does, who the author is, what version the
documentation belongs to, the date, the type of license it uses, and the package
dependencies.
 Besides finding the DESCRIPTION files on sites such as cran.r-project.org or stat.ethz.ch,
you can also access the description file inside R with the command
packageDescription("package"), via the documentation of the package with
help(package = "package"), or online in the repository of the package.
For example, for the "stats" package, these ways will be:
R-Code:
packageDescription("stats")
help(package = "stats")
Output:
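Individual fields of the DESCRIPTION file can also be extracted programmatically. A minimal sketch using the "stats" package, which ships with every R installation; the object name desc is arbitrary.

```r
# packageDescription() returns the DESCRIPTION file as a list-like object
desc <- packageDescription("stats")

print(desc$Package)   # package name
print(desc$Version)   # installed version
print(desc$License)   # license under which the package is distributed

# packageVersion() gives the version in a form that can be compared
print(packageVersion("stats") >= "3.0.0")
```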
What are Repositories?
 A repository is a place where packages are located so you can install
them from it. Although you or your organization might have a local
repository, typically they are online and accessible to everyone. Three
of the most popular repositories for R packages are:
 CRAN: the official repository; it is a network of ftp and web servers
maintained by the R community around the world. The R Foundation
coordinates it, and for a package to be published here, it needs to pass
several tests that ensure the package is following CRAN policies.
 Bioconductor: this is a topic-specific repository, intended for open
source software for bioinformatics. Like CRAN, it has its
own submission and review processes, and its community is very
active, having several conferences and meetings per year.
 GitHub: although this is not R specific, GitHub is probably the most popular
repository for open source projects. Its popularity comes from the unlimited space
for open source, the integration with git, a version control software, and the ease of
sharing and collaborating with others. But be aware that there is no review process
associated with it.
How to install R packages?
 Installing Packages From CRAN
 How you can install a package will depend on where it is located. So, for publicly
available packages, this means the repository it belongs to. The most common
way is to use the CRAN repository; then you just need the name of the package and
the command
R-code:
install.packages("package")
 Example: Installing the "VIF" (Variance Inflation Factor) package from CRAN.
R-Code:
install.packages("VIF")
Output:
Installing packages from CRAN Mirrors:
 Remember that CRAN is a network of servers (each of them called a "mirror"), so
you can specify which one you would like to use. If you are using R through the
RGui interface, you can do it by selecting it from the list which appears just after
you use the install.packages() command. In RStudio, the mirror is already
selected by default.
 You can also select your mirror by using chooseCRANmirror(), or directly
inside the install.packages() function by using the repos parameter. You can see
the list of available mirrors with getCRANmirrors().
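Putting these pieces together, a typical pattern is to install a package only when it is missing and then load it. The sketch below is one possible way to do this; it assumes the mirror URL https://cloud.r-project.org and uses the MASS package purely as an example.

```r
# Mirror to install from (assumption: the CRAN cloud mirror)
cran_mirror <- "https://cloud.r-project.org"

# Install the package only if it is not already available
if (!requireNamespace("MASS", quietly = TRUE)) {
  install.packages("MASS", repos = cran_mirror)
}

# Load the package so that its functions and data sets can be used
library(MASS)
print("package:MASS" %in% search())
```

Checking with requireNamespace() first avoids downloading the package again on every run.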
 Installing packages via devtools:
 A more efficient way is probably to use the devtools package to simplify this
process, because it contains specific functions for each repository, including
CRAN.
 You can install the devtools package (install.packages("devtools")), but you might also
need to install Rtools on Windows, the Xcode command line tools on Mac, or
r-base-dev and r-devel on Linux.
After devtools is installed, you will be able to use the utility functions to install
other packages. The options are:
1. install_bioc() from Bioconductor.
2. install_bitbucket() from Bitbucket.
3. install_cran() from CRAN.
4. install_git() from a git repository.
5. install_github() from GitHub.
6. install_local() from a local file.
7. install_svn() from a SVN repository.
8. install_url() from a URL.
9. install_version() from a specific version of a CRAN package.
How to update, remove, and check installed packages:
R-Code for checking installed packages:
installed.packages()
R-Code for updating installed packages:
update.packages()
R-Code for removing (uninstalling) a package:
remove.packages("package")
Package Lists
Some important R-Packages
The list of major packages in the R programming language is as follows…….
1. ggplot2:
With ggplot2, you can create graphics declaratively. ggplot2 is known for its
elegant, high-quality graphs, which set it apart from other visualization packages.
The ggplot2 package, created by Hadley Wickham, offers a powerful graphics
language for creating elegant and complex plots. Based on Leland
Wilkinson's The Grammar of Graphics, ggplot2 allows you to create graphs that
represent both univariate and multivariate numerical and categorical data in a
straightforward manner. Grouping can be represented by color, symbol, size, and
transparency.
2. ggraph:
 ggraph is an extension of ggplot2. It removes one limitation of ggplot2, namely
its dependency on tabular data.
3. dplyr:
 We use this library for data wrangling and data analysis. The dplyr
library provides several functions for working with data frames in R.
4. ggmap:
This is a mapping package that is used for creating spatial visualizations. It also
includes various tools for geolocating and routing.
5. MASS:
MASS provides a large number of statistical functions. It also provides the datasets that
accompany the book “Modern Applied Statistics with S”.
6. VIF:
This function is a simple port of vif() from the car package. The VIF (variance
inflation factor) of a predictor is a measure of how easily it is predicted from a
linear regression using the other predictors. Taking the square root of the VIF tells
you how much larger the standard error of the estimated coefficient is compared with
the case in which that predictor is independent of the other predictors.
A general guideline is that a VIF larger than 5 or 10 is large, indicating that the
model has problems estimating the coefficient. However, this in general does not
degrade the quality of predictions. If the VIF is larger than 1/(1 - R^2), where R^2 is
the multiple R-squared of the regression, then that predictor is more related to the
other predictors than it is to the response.
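As a rough illustration of this definition, the VIF of one predictor can be computed by hand in base R as 1/(1 - R_j^2), where R_j^2 comes from regressing that predictor on the other predictors. The built-in mtcars data and the predictors wt, hp and disp are assumptions made only for this sketch.

```r
# Model of interest (illustrative): mpg ~ wt + hp + disp.
# Auxiliary regression: predict 'wt' from the other predictors.
aux_fit <- lm(wt ~ hp + disp, data = mtcars)
r2_wt <- summary(aux_fit)$r.squared

vif_wt <- 1 / (1 - r2_wt)   # variance inflation factor of 'wt'
vif_wt
sqrt(vif_wt)                # inflation of the standard error of the 'wt' coefficient

# Guideline threshold 1 / (1 - R^2), with R^2 taken from the full regression.
full_fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
threshold <- 1 / (1 - summary(full_fit)$r.squared)
vif_wt > threshold
```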
7. utf8:
utf8 is an R package for manipulating and printing UTF-8 text that fixes multiple
bugs in R's UTF-8 handling.
8. nlme (Linear and Nonlinear Mixed Effects Models):
 In R, nlme is a package containing many useful functions and datasets for fitting
mixed-effects models.
Some problems discussed using R
Q1. Generate random numbers from the standard normal distribution and the uniform distribution.
Answer:
R-Code:
n1 <- rnorm(12, mean = 0, sd = 1)
print(n1)
n2 <- runif(20, -1, 1)
print(n2)
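Because the numbers are random, every run prints different values. A minimal sketch of making the draws reproducible with set.seed(); the seed value 42 is an arbitrary choice.

```r
set.seed(42)                          # fix the random number generator state
n1 <- rnorm(12, mean = 0, sd = 1)

set.seed(42)                          # same seed, so the same draws again
n1_again <- rnorm(12, mean = 0, sd = 1)
identical(n1, n1_again)

n2 <- runif(20, min = -1, max = 1)    # uniform draws between -1 and 1
range(n2)
```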
Q2. The given data set is 12, 23, 21, 34, 11, 45, 32, 43, 67, 22, 55, 44, 33, 54. Find the
mean, median, minimum and maximum value of the given data set.
Answer:
 The given data set is 12, 23, 21, 34, 11, 45, 32, 43, 67, 22, 55, 44, 33, 54.
R-Code:
data <- c(12, 23, 21, 34, 11, 45, 32, 43, 67, 22, 55, 44, 33, 54)
print(data)
Mean <- mean(data)
print(Mean)
Median <- median(data)
print(Median)
Minimum <- min(data)
print(Minimum)
Maximum <- max(data)
print(Maximum)
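The results can be checked by hand: the 14 values sum to 496, so the mean is 496/14 (about 35.43); the two middle values of the sorted data are 33 and 34, so the median is 33.5; the minimum is 11 and the maximum is 67. The sketch below collects the same four statistics in one named vector (the name stats_q2 is only illustrative).

```r
data <- c(12, 23, 21, 34, 11, 45, 32, 43, 67, 22, 55, 44, 33, 54)

# Collect the four requested statistics in a single named vector.
stats_q2 <- c(mean = mean(data), median = median(data),
              min = min(data), max = max(data))
print(stats_q2)

# summary() reports the same values together with the quartiles.
summary(data)
```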
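Q3. If $A = \begin{pmatrix} 2 & 4 & 1 \\ 3 & 5 & 7 \\ 4 & 0 & 3 \end{pmatrix}$ and $B = \begin{pmatrix} -2 & 5 & 0 \\ 3 & 6 & 3 \\ -4 & 3 & -4 \end{pmatrix}$, then find A+B, A-B and AB using R.
Answer:
The given two matrices are $A = \begin{pmatrix} 2 & 4 & 1 \\ 3 & 5 & 7 \\ 4 & 0 & 3 \end{pmatrix}$ and $B = \begin{pmatrix} -2 & 5 & 0 \\ 3 & 6 & 3 \\ -4 & 3 & -4 \end{pmatrix}$.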
R-Code:
data1 <- c(2, 4, 1, 3, 5, 7, 4, 0, 3)
A <- matrix(data1, nrow = 3, ncol = 3, byrow = TRUE)
print(A)
data2 <- c(-2, 5, 0, 3, 6, 3, -4, 3, -4)
B <- matrix(data2, nrow = 3, ncol = 3, byrow = TRUE)
print(B)
Add <- A + B
print(Add)
Subs <- A - B
print(Subs)
Multip <- A %*% B  # matrix product; A * B would multiply element by element
print(Multip)
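In R, the operator * multiplies matrices element by element, while %*% gives the matrix product AB asked for here. A short self-contained sketch of the difference, using the two matrices from the question:

```r
A <- matrix(c(2, 4, 1, 3, 5, 7, 4, 0, 3), nrow = 3, byrow = TRUE)
B <- matrix(c(-2, 5, 0, 3, 6, 3, -4, 3, -4), nrow = 3, byrow = TRUE)

elementwise <- A * B     # entry (i, j) is A[i, j] * B[i, j]
product <- A %*% B       # entry (i, j) is sum(A[i, ] * B[, j])

# Entry (1, 1) of AB by hand: 2 * (-2) + 4 * 3 + 1 * (-4)
sum(A[1, ] * B[, 1])

print(elementwise)
print(product)
```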
Q4. Suppose we have the following data set: 88, 95, 92, 97, 96, 97, 94, 86, 91, 95, 97, 88,
85, 76, 68. Draw the histogram and also find the skewness and kurtosis using R.
Answer:
The given data set is 88, 95, 92, 97, 96, 97, 94, 86, 91, 95, 97, 88, 85, 76, 68.
R-Code:
data <- c(88, 95, 92, 97, 96, 97, 94, 86, 91, 95, 97, 88, 85, 76, 68)
print(data)
hist(data)
library(moments)  # install once with install.packages("moments")
kurtosis(data)
skewness(data)
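If the moments package is not available, both quantities can be sketched in base R from the central moments: skewness = m3 / m2^(3/2) and kurtosis = m4 / m2^2, where m_k is the k-th central moment with divisor n. These are assumed here to be the definitions the moments package follows, so treat agreement with skewness() and kurtosis() as something to verify rather than a guarantee.

```r
data <- c(88, 95, 92, 97, 96, 97, 94, 86, 91, 95, 97, 88, 85, 76, 68)

n <- length(data)
dev <- data - mean(data)      # deviations from the mean

m2 <- sum(dev^2) / n          # second central moment
m3 <- sum(dev^3) / n          # third central moment
m4 <- sum(dev^4) / n          # fourth central moment

skew <- m3 / m2^(3 / 2)       # negative: the long tail is on the left
kurt <- m4 / m2^2
c(skewness = skew, kurtosis = kurt)
```

The two low values 76 and 68 pull the left tail out, which is why the skewness comes out negative.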
Q5. Find the quartiles and the interquartile range of the given data set 120, 123, 56, 67, 100, 111,
120, 89, 90, 99, 130, 110. (Using R)
Answer: The given data set is 120, 123, 56, 67, 100, 111, 120, 89, 90, 99, 130, 110.
R-code:
data <- c(120, 123, 56, 67, 100, 111, 120, 89, 90, 99, 130, 110)
print(data)
summary(data)
IQR(data)
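The quartiles can also be requested directly with quantile(). With R's default method (type 7), the sorted data 56, 67, 89, 90, 99, 100, 110, 111, 120, 120, 123, 130 give Q1 = 89.75, median = 105 and Q3 = 120, so the interquartile range is 30.25. A minimal sketch:

```r
data <- c(120, 123, 56, 67, 100, 111, 120, 89, 90, 99, 130, 110)

# First quartile, median and third quartile (default quantile type 7).
q <- quantile(data, probs = c(0.25, 0.5, 0.75))
print(q)

# The interquartile range is Q3 - Q1; IQR() computes the same value.
iqr_manual <- unname(q["75%"] - q["25%"])
iqr_manual
```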
Q6. Every month you measure the amount of weight your dog has gained and
get these results:
0.5 0.5 0.3 -0.2 1.6 0 0.1 0.6 0.4
Draw a histogram showing how much the dog is growing.
Answer:
 Monthly weight gain varies from -0.2 (the dog lost weight that month) to 1.6.
Putting the values in order from lowest to highest weight gain:
-0.2 0 0.1 0.3 0.4 0.5 0.5 0.6 1.6
We decide to put the results into groups of 0.5.
R-Code:
data <- c(0.5, 0.5, 0.3, -0.2, 1.6, 0, 0.1, 0.6, 0.4)
hist(data, xlab = "Weight", col = "Blue", xlim = c(-1, 3), ylim = c(0, 6), border = "Red")
Output:
There are no values from 1 to just below 1.5, but we still show the space.
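The grouping into classes of width 0.5 can also be made explicit as a frequency table. This is a minimal sketch, assuming the classes are closed on the left (so 0.5 belongs to the class from 0.5 to just below 1), as the wording "from 1 to just below 1.5" suggests; the breaks vector is an assumption, not part of the original answer.

```r
data <- c(0.5, 0.5, 0.3, -0.2, 1.6, 0, 0.1, 0.6, 0.4)

# Class limits of width 0.5 covering the whole range of the data (assumed).
breaks <- seq(-0.5, 2, by = 0.5)

# right = FALSE makes each class include its lower limit and exclude its upper one.
classes <- cut(data, breaks = breaks, right = FALSE)
counts <- table(classes)
print(counts)
```

The same breaks and right = FALSE can be passed to hist() so that the bars match this table, including the empty class from 1 to 1.5.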
R-Quiz
Q7. Which of the following is a base package for R language?
a. util b. lang c. tools d. All of the above
Q8. R comes with a ________ to help you optimize your code and improve its
performance.
a. Debugger b. Monitor c. Profiler d. None of the above.
Q9. debug() flags a function for ______ mode in R.
a. debug b. run c. compile d. None of the above.
Q10. ______ suspends the execution of a function wherever it is called and puts the
function in debug mode.
a. recover() b. browser() c. Both of the above.
Q11. A matrix is a ___-dimensional rectangular data set.
a. 5 b. 4 c. 3 d. 2
Q12. The _____ function takes a vector or other objects and splits it into groups
determined by a factor or list of factors.
a. apply() b. split() c. isplit() d. mapply()
Q13. The lapply function takes ___ arguments in R language.
a. 1 b. 3 c. 4 d. 5
Q14. ____ is used to apply a function over subsets of a vector.
a. apply b. lapply c. mapply d. tapply
Q15. _______ applies a function over the margins of an array.
a. apply b. lapply c. tapply d. mapply
Q16. ____ function is the same as lapply() in R.
a. apply b. lapply c. sapply d. tapply
Q17. _______ loops over a list and evaluates a function on each element.
a. apply b. lapply c. sapply d. tapply
Q18. __________ is a proprietary tool for predictive analytics.
a. R b. SAS c. SSAS d. SPSS
Q19. Data frames can be converted to a matrix by calling data._______
a. matr() b. mat() c. matrix() d. None of the above
Q20. Which of the following methods makes a vector of repeated values?
a. rep() b. data() c. view() d. None of the above
Q21. R objects can have attributes, which are like ________ for the object.
a. metadata b. features c. expressions
Q22. Attributes of an object (if any) can be accessed using the ______ function.
a. objects() b. attrib() c. attributes()
Q23. _________ involves predicting a response with meaningful magnitude, such as
quantity sold, stock price, or return on investment.
a. Regression b. Clustering c. Summarization
Q24. ________ provides needed string operators in R.
a. str b. forcast c. stringr
Q25. ______ splits a data frame and results in an array (hence the da).
a. apply b. daply c. stats
Q26. The system.time function returns an object of class _______ which contains two
useful bits of information.
a. debug_time b. procedure_time c. proc_time
Q26. Which of the following will start the R program?
a. $R b. &R c. *R
Q27. Which of the following is used for Statistical analysis in R language?
a. Studio b. RStudio c. Heck
Q28. R functionality is divided into a number of ________.
a. Packages b. Functions c. Domains
Q29. Which of the following is an example of a vectorized operation as far as
subtraction is concerned?
> x <- 1:4
> y <- 6:9
a. x + y b. x - y c. x / y d. x * y
Q30. What would be the output of the following code?
> x <- 1:4
> y <- 6:9
> z <- x + y
> z
a. 7 9 11 13 b. 7 9 11 13 14 c. 9 11 13 d. Null
Q31. What would be the output of the following code?
> x <- 1:4
> x > 2
a. FALSE FALSE TRUE TRUE b. 1 2 3 4 c. 1 2 3 4 5
Q32. What would be the value of the following expression?
log(-1)
a. Warning in log(-1): NaNs produced b. 1 c. Null
Q33. What will be the output of the following code?
> g <- function(x) {
+ a <- 3
+ x + a + y
+ ## ‘y’ is a free variable
+ }
> g(2)
a. 8 b. 9 c. 42 d. Error
Q34. What will be the output of the following code?
function(p) {
params[!fixed] <- p
mu <- params[1]
sigma <- params[2]
## Calculate the Normal density
a <- -0.5*length(data)*log(2*pi*sigma^2)
b <- -0.5*sum((data-mu)^2) / (sigma^2)
-(a + b)
}
> ls(environment(nLL))
a. “data” “fixed” “param” b. “data” “variable” “params” c. “data” “fixed”
“params” d. None of the above
Q35. Which of the following is a principle of analytic graphics?
a. Don’t plot more than two variables at a time b. Make judicious use of color
in your scatterplots c. Show box plots (univariate summaries) d. Show
causality, mechanism, explanation
Q36. R is an __________ programming language?
a. Close source b. GPL c. Open source d. Definite source
Q37. Who developed R?
a. Dennis Ritchie b. John Chambers c. Bjarne Stroustrup
Q38. R was named partly after the first names of ____ R authors?
a. One b. Two c. Three d. Four
Q39. Packages are useful in collecting sets into a _____ unit?
a. Single b. Multiple
Q40. R is an interpreted language, so it can be accessed through _____________?
a. Disk operating system b. User Interface operating system c. Operating
system d. Command line interpreter
Q41. Many quantitative analysts use R as their ____ tool?
a. Leading tool b. Programming tool c. Both of the above
Q42. Predictive analysis is the branch of __________ analysis?
a. Advanced b. Core c. Both of the above
Q43. ___________ is used to make predictions about unknown future events?
a. Descriptive analysis b. Predictive analysis c. Both the above
Q44. How many steps does the predictive analysis process contain?
a. 5 b. 6 c. 7 d. 8
Q45. Descriptive analysis tells about ________?
a. Past b. Present c. Future
Q46. How many types of R objects are present in R data type?
a. 4 b. 5 c. 6 d. 7
Q47. How many types of data types are present in R?
a. 4 b. 5 c. 6 d. 7
Q48. Which of the following is a primary tool for debugging?
a. debug() b. trace c. browser d. None of the above
Q49. Which function is used to create a vector with more than one element?
a. Library() b. plot() c. c() d. par()
Q50. In R every operation has a ______ call?
a. System b. Function c. None of the above
Q51. The ____________ in R is a vector.
a. Basic data structure b. Basic datatypes c. both
Q52. Vectors come in two parts _____ and _____.
a. Atomic vectors and matrix b. Atomic vectors and array c. Atomic vectors
and list
Q53. How many types of atomic vectors are present?
a. 3 b. 4 c. 5 d. 6
Q54. How many types of vertices functions are present?
a. 1 b. 2 c. 3 d. 4
Q55. _________ and _________ are types of matrices functions?
a. apply and sapply b. apply and lapply c. Both
Q56. How many control statements are present in R?
a. 6 b. 7 c. 8 d. 9
Q57. Which of the following finds the maximum value in the vector x, excluding
missing values?
a. rm() b. all(x) c. max(x, na.rm=TRUE) d. x%in%y
Q58. Which of the following sorts a data frame by the order of the elements in B?
a. x[rev(order(x$B)),] b. x[ordersort(x$B),] c. x[order(x$B),]
Q59. _________ initiates an infinite loop right from the start.
a. Never b. Repeat c. Break d. Set
Q60. _______ is used to skip an iteration of a loop.
a. Next b. Skip c. Group
Q61. _____ programming language is a dialect of S.
a. B b. B c. C d. S
Q62. In 1991, R was created by Ross Ihaka and Robert Gentleman in the Department
of Statistics at the University of _________.
a. Auckland b. Harvard c. California d. Johns Hopkins
Q63. Finally, in _________ R version 1.0.0 was released to the public.
a. 2000 b. 2005 c. 2010 d. 2012
Q64. R is technically much closer to the Scheme language than it is to the original
_____ language.
a. B b. S c. C d. C++
Q65. R functionality is divided into a number of ________
a. Packages b. Functions c. Domains
Q66. What are the data structures in R that are used to perform statistical analyses and
create graphs?
Answer: R has data structures like ………
1. Vectors
2. Matrices
3. Arrays
4. Data frames
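A brief sketch that creates one object of each of the four structures listed above. The object names and values are illustrative only.

```r
v <- c(1.5, 2.5, 3.5)                    # vector: one dimension, one type
m <- matrix(1:6, nrow = 2, ncol = 3)     # matrix: two dimensions, one type
a <- array(1:24, dim = c(2, 3, 4))       # array: here three dimensions
df <- data.frame(id = 1:3, score = v)    # data frame: columns may differ in type

length(v)
dim(m)
dim(a)
str(df)
```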
Q67. Explain the general format of matrices in R.
Answer: The general format is ………
Mymatrix <- matrix(vector, nrow = r, ncol = c, byrow = FALSE,
dimnames = list(char_vector_rownames, char_vector_colnames))
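A concrete instance of this general format, as a small sketch. The cell values and the row and column names are made up for illustration.

```r
cells <- c(1, 26, 24, 68)      # values to place in the matrix
rnames <- c("R1", "R2")        # row names
cnames <- c("C1", "C2")        # column names

# byrow = TRUE fills the matrix row by row: R1 = (1, 26), R2 = (24, 68).
mymatrix <- matrix(cells, nrow = 2, ncol = 2, byrow = TRUE,
                   dimnames = list(rnames, cnames))
print(mymatrix)

# Elements can then be addressed by name.
mymatrix["R2", "C1"]
```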
Q68. How are missing values represented in R?
Ans: In R, missing values are represented by NA (Not Available), while impossible
values are represented by the symbol NaN (not a number).
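A short sketch of how NA and NaN behave in practice; the vector x is illustrative.

```r
x <- c(1, NA, 3)

is.na(x)                              # flags the missing element
mean_with_na <- mean(x)               # any NA makes the result NA
mean_without_na <- mean(x, na.rm = TRUE)   # drop missing values first

impossible <- 0 / 0                   # an impossible value
is.nan(impossible)
is.na(impossible)                     # NaN also counts as missing
is.nan(NA)                            # but NA is not NaN
```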
Q69. Explain what transpose is.
Ans: R provides various methods for reshaping data before analysis, and transposing is
the simplest of them. To transpose a matrix or a data frame, the
t() function is used.
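A minimal sketch of t() on a matrix and on a data frame; the objects are illustrative.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
tm <- t(m)           # rows become columns
dim(m)
dim(tm)

df <- data.frame(a = 1:2, b = 3:4)
t_df <- t(df)        # note: t() on a data frame returns a matrix
is.matrix(t_df)
```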
Q70. Explain how data is aggregated in R.
Ans: Data is aggregated (collapsed) in R by using one or more BY variables. When
using the aggregate() function, the BY variables must be supplied in a list.
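A small sketch of aggregate() with the BY variables given as a list. The built-in mtcars data set is used here only as an example.

```r
# Mean miles per gallon for each number of cylinders.
agg <- aggregate(mtcars$mpg, by = list(cyl = mtcars$cyl), FUN = mean)
print(agg)

# Two BY variables: one row per observed combination of cyl and gear.
agg2 <- aggregate(mtcars$mpg,
                  by = list(cyl = mtcars$cyl, gear = mtcars$gear),
                  FUN = mean)
print(agg2)
```

Naming the list elements (cyl, gear) gives readable column names in the result.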
Q71. What is the function used for adding datasets in R?
Ans: The rbind function can be used to join two data frames (datasets). The two data
frames must have the same variables, but they do not have to be in the same order.
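A short sketch of rbind() on two data frames whose variables are in a different order; for data frames, columns are matched by name. The data are made up for illustration.

```r
df1 <- data.frame(id = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)
df2 <- data.frame(name = c("c", "d"), id = 3:4, stringsAsFactors = FALSE)

# Same variables, different column order: rbind() lines them up by name.
combined <- rbind(df1, df2)
print(combined)
```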
Q72. What is the use of the subset() function and the sample() function in R?
Ans: In R, the subset() function helps you to select variables and observations, while
the sample() function lets you choose a random sample of size n from a dataset.
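A minimal sketch of both functions on the built-in mtcars data; the cut-off of 25 mpg, the sample size of 5 and the seed are arbitrary choices.

```r
# subset(): keep rows with mpg above 25 and only the columns mpg and cyl.
high_mpg <- subset(mtcars, mpg > 25, select = c(mpg, cyl))
print(high_mpg)

# sample(): draw 5 row numbers at random (without replacement by default).
set.seed(1)
rows <- sample(nrow(mtcars), size = 5)
my_sample <- mtcars[rows, ]
print(my_sample)
```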
Q73. Explain how you can create a table in R without an external file.
Ans: Use the code
myTable = data.frame()
edit(myTable)
This code will open an Excel-like spreadsheet where you can easily enter your data.
Q74. Explain what R is.
Ans: R is data analysis software which is used by analysts, quants, statisticians, data
scientists and others.
Q75. List some of the functions that R provides.
Ans: The functions that R provides include mean, median, distributions, covariance,
regression, non-linear models, mixed effects, GLM, GAM, etc…
Q76. Mention what the R language does not do.
Ans: a. Though R can easily connect to a DBMS, it is not a database.
b. R has only a limited graphical user interface.
c. Though it connects to Excel/Microsoft Office easily, the R language does not
provide any spreadsheet view of data.
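Q77. Explain how R commands are written.
Ans: In R, commands are typed at the prompt or in a script, one per line. Anywhere in the
program, a note can be attached to a line of code by prefacing it with a # sign; R ignores
everything after the #, for example
1. # subtraction
2. # division
3. # note order of operations exists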
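The three comments above could accompany commands such as the following. The numbers and variable names are illustrative, since the original commands are not shown.

```r
diff_result <- 5 - 3      # subtraction
quot_result <- 6 / 2      # division
mixed <- 2 + 3 * 4        # note order of operations exists: 3 * 4 is done first
grouped <- (2 + 3) * 4    # parentheses change the order

c(diff_result, quot_result, mixed, grouped)
```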
Q78. How can you save your data in R?
Ans: There are many ways to save data in R, but the easiest (in a menu-driven
interface such as R Commander) is to go to Data > Active Data Set > Export Active
Data Set. A dialogue box will appear; when you click OK, the dialogue box lets you
save your data in the usual way.
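Data can also be saved from code rather than from a menu. The following is a minimal sketch using base R functions; the data frame and the temporary file paths are illustrative.

```r
df <- data.frame(x = c(1.5, 2.5), label = c("a", "b"), stringsAsFactors = FALSE)

# Plain-text export and re-import as CSV.
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

# Binary export of a single R object, preserving its exact structure.
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)
from_rds <- readRDS(rds_path)

identical(df, from_rds)
```

CSV files can be opened by other programs, while the .rds format is specific to R.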
Q79. Mention how you can produce correlations and covariances.
Ans: You can use the cor() function to produce correlations and the
cov() function to produce covariances.
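A short sketch of both functions on two columns of the built-in mtcars data (an illustrative choice). It also shows that the correlation is the covariance divided by the product of the two standard deviations.

```r
r <- cor(mtcars$mpg, mtcars$wt)   # correlation between mpg and weight
s <- cov(mtcars$mpg, mtcars$wt)   # covariance between mpg and weight

# Correlation = covariance / (sd of x * sd of y).
r_manual <- s / (sd(mtcars$mpg) * sd(mtcars$wt))
c(cor = r, manual = r_manual)

# Given several columns, cor() returns a correlation matrix.
cor(mtcars[, c("mpg", "wt", "hp")])
```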
Q80. Explain what t-tests are in R.
Ans: In R, the t.test() function produces a variety of t-tests. The t-test is one of the most
common tests in statistics and is used to determine whether the means of two
groups are equal to each other.
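A minimal sketch of two common uses of t.test(). The mtcars data, the grouping variable am and the reference value 20 are illustrative choices.

```r
# Two-sample test: does mean mpg differ between automatic (am = 0)
# and manual (am = 1) cars? By default this is the Welch t-test.
two_sample <- t.test(mpg ~ am, data = mtcars)
two_sample$statistic
two_sample$p.value

# One-sample test: is the mean mpg different from 20?
one_sample <- t.test(mtcars$mpg, mu = 20)
one_sample$p.value
```

A small p-value in the first test suggests that the two group means are not equal.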
Q81. Who created the R programming language and when?
Ans: The R programming language was created in the early 1990s by Ross Ihaka and Robert
Gentleman of the statistics department of the University of Auckland, and it is
currently maintained by the R core development team.
Q82. What are some popular GUIs for R?
Ans: Some free GUIs are available for R. Among them the most popular
are 1. RStudio 2. StatET 3. Deducer
Q83. How can we remove objects from R?
Ans: The functions remove() and rm() are used to remove objects from the workspace
(the current environment).
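A brief sketch of rm() in use; the object names are made up for illustration.

```r
tmp_value <- 42
before <- exists("tmp_value")     # the object is there
rm(tmp_value)
after <- exists("tmp_value")      # and now it is gone
c(before = before, after = after)

# Several objects can be removed at once by name.
tmp_a <- 1
tmp_b <- 2
rm(list = c("tmp_a", "tmp_b"))
```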
Q84. How to write comments in R?
Ans: R supports single-line comments in the same style as the shell, using # at the
beginning of the comment.
Q85. What is a factor?
Ans: A factor is a special variable type for storing categorical variables. A factor can
be built from integers or strings, and it is generally used in statistical modeling.
The factor() function is used to create a factor.
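A small sketch of creating and inspecting a factor; the category labels are illustrative.

```r
sizes <- c("low", "high", "medium", "low")

f <- factor(sizes)
levels(f)          # the distinct categories, sorted alphabetically by default
as.integer(f)      # the integer codes stored behind the labels
table(f)           # counts per category

# An ordered factor with an explicit level order allows comparisons.
f_ord <- factor(sizes, levels = c("low", "medium", "high"), ordered = TRUE)
f_ord < "high"
```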
Q86. What is the difference between a vector and a list?
Ans: A vector represents a set of elements of the same mode, which can be integer,
floating-point number, character, complex number and so on, while a list may contain
elements of different types.
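A short sketch of the difference: mixing types in c() silently converts everything to one common type, while list() keeps each element as it is. The values are illustrative.

```r
v <- c(1, "a", TRUE)       # a vector: all elements are coerced to character
class(v)
print(v)

l <- list(1, "a", TRUE)    # a list: each element keeps its own type
element_classes <- sapply(l, class)
print(element_classes)
```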
Q87. What is a matrix?
Ans: A matrix is a rectangular array of numbers, symbols, or expressions, arranged in
rows and columns. It has a two-dimensional structure. All elements of a matrix are
of the same type, and all columns are of the same length. In R a matrix is created by
using the matrix() function.
Q88. What is the colon operator?
Ans: The colon operator generates a regular sequence (from:to, in steps of 1). When
applied to two factors, a:b is instead equivalent to interaction(a, b), but the levels
are ordered and labelled differently.
Example: > v1 <- 1:10
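A few more uses of the colon operator, as a sketch. One point worth noticing is that : binds more tightly than + and -, so parentheses matter; the variable n is illustrative.

```r
v1 <- 1:10
v2 <- 10:1           # sequences can also count down

n <- 5
a <- 1:n - 1         # (1:n) - 1, that is 0 1 2 3 4
b <- 1:(n - 1)       # 1 2 3 4

print(a)
print(b)
```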
Q89. What is the repeat loop in R?
Ans: The repeat loop is the simplest loop in R. It runs its body again and again until
a break statement is reached. It is similar to the do-while loop of other languages when
the conditional check (with break) is written at the end of the loop body, so the
statements are executed once before the condition is tested.
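A minimal sketch of a repeat loop with the check placed at the end of the body; the counter and the limit of 5 are illustrative.

```r
i <- 0
repeat {
  i <- i + 1            # the body runs at least once
  if (i >= 5) break     # without a break the loop would never stop
}
print(i)
```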
Q90. What is the use of the abline() function?
Ans: The abline() function is a low-level plotting function, which adds one or more
straight lines (vertical, horizontal or regression lines) to the current plot.
This command adds a green horizontal line at y=5 to the plot …
> abline(h = 5, col = "Green")
