Basics of R Programming
Unit 1
What Is R?
● R refers to two things. There is R, the programming language, and R, the
piece of software that you use to run programs written in R.
● R (the language) was created in the early 1990s by Ross Ihaka and Robert
Gentleman, then both working at the University of Auckland.
● R (the software) is a GNU project, reflecting its status as open-source
software.
● R is an interpreted language (sometimes called a scripting language),
which means that your code doesn’t need to be compiled before you run
it.
● R is often used for statistical computing and graphical presentation to
analyze and visualize data.
Why Use R?
● It is a great resource for data analysis, data visualization, data science
and machine learning
● It provides many statistical techniques (such as statistical tests,
classification, clustering and data reduction)
● It is easy to draw graphs in R, such as pie charts, histograms, box plots,
scatter plots, etc.
● It works on different platforms (Windows, Mac, Linux)
● It is open-source and free
● It has large community support
● It has many packages (libraries of functions) that can be used to solve
different problems
Installing R and RStudio
● R (the software) is a GNU project, reflecting its status as important free and
open source software
● RStudio is an R-specific IDE. That means that you lose the ability to code
(easily) in multiple languages, but you do get some features especially for
R.
● Go to the website https://www.rstudio.com/categories/rstudio-ide/
RStudio
● The RStudio IDE Layout is divided into 4 parts
1. The R console
● This is the “interpreter” and runs your code in real time (as opposed to
needing to compile your code and then run it).
● It interprets whatever you write into the console to
➔ Perform basic calculations such as 2 + 2
➔ Assign values to a variable
➔ Apply a function
● This is where code is executed
● Code is not saved on your disk when entered into the console
● If you want to save your code, then use an R script.
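The three console uses listed above can be sketched as follows. This is a minimal illustration; the variable names total and x are assumptions chosen for the example.

```r
# Basic calculation: the console evaluates the expression
total <- 2 + 2
total

# Assign a value to a variable
x <- 16

# Apply a function to the variable
sqrt(x)
```

Lines typed this way are run immediately but are lost when the session ends unless they are copied into a script.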
2. The R Scripts
● An R analysis script allows you to store your code in a static document that
you can save.
● A script allows you to
➔ Save your code and share it with others (reproducibility)
➔ Try things out interactively and then add to or modify the code in the script
● You can add R code and comments to script files
● You can run the code from your script by highlighting the code and
pressing CMD+Enter (Mac) or Ctrl+Enter (Windows).
● In .R files (R scripts), code is saved on your disk.
3. Environments
● In this section, you can find
➔ Workspace/environment tab, which tells you what objects are in R and what
exists in memory/what is loaded/what you have read in.
➔ History tab which shows previous commands you have run. This is useful
for debugging your code, but don’t rely on it as a script.
4. In the bottom right hand corner, there are several tabs which include
➔ Files - shows the files on your computer in the directory you are working in
➔ Viewer - can view data or R objects
➔ Help - shows help documentation for R commands
➔ Plots - shows plots generated in your R sessions. Can see current and
previous plots, save, and export them to png/pdf formats.
➔ Packages - list of R packages you have installed
Basics of R programming (using the console)
Displaying output:
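The original example for this slide is not available, so the following is only a sketch of common ways to display output in the console, assuming base R's print() and cat(). The variable name msg is hypothetical.

```r
# Typing an expression at the console prints its value automatically
"Hello, R"

# print() displays a value explicitly; it is needed inside loops and functions
msg <- "Hello, R"
print(msg)

# cat() writes text without quotes or the [1] index prefix
cat("The answer is", 6 * 7, "\n")
```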
R Data Structures
● A data structure is a specialized format for organizing, processing,
retrieving and storing data.
● Some data structures in R are as follows:
1. Vectors
2. Lists
3. Matrices
4. Arrays
5. Data frames
R Data Structures - Vectors => Creating a vector
1. Vectors
● A vector is simply a list of items that are of the same type.
● To combine a list of items into a vector, use the c() function and separate the items with
commas.
● Vectors can be created using c(), vector() and scan() functions.
● Vectors cannot
● Eg1. A vector of characters
● To make bigger or smaller steps in a sequence, use the seq() function. The seq()
function has three parameters: from is where the sequence starts, to is where the
sequence stops, and by is the interval of the sequence.
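A minimal sketch of creating vectors with c() and seq(), as described above. The names fruits, numbers and steps are illustrative assumptions.

```r
# A vector of characters built with c()
fruits <- c("banana", "apple", "orange")

# A numeric vector; all items share one type
numbers <- c(1, 2, 3)

# seq(from, to, by): start at 0, stop at 100, in steps of 20
steps <- seq(from = 0, to = 100, by = 20)
steps
```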
Vectors=> length of vector and sorting a vector
● To find out how many items a vector has, use the length() function.
● Lists are often called “recursive vectors” as you can store a list inside
another list.
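The operations named on this slide can be sketched as below: length() and sort() on a vector, plus a list stored inside another list. All variable names are assumptions made for the example.

```r
fruits <- c("banana", "apple", "orange")
length(fruits)   # number of items: 3
sort(fruits)     # alphabetical order

numbers <- c(13, 3, 5, 7, 20, 2)
sort(numbers)    # ascending numeric order

# A list can hold another list, which is why lists are called recursive
nested <- list(name = "outer", inner = list(1, "a"))
length(nested)   # 2 top-level items
```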
R Data Structures - LISTS => Creating lists
LISTS => Access & Change Lists items
● Access Lists
Access a list item by referring to its index number inside brackets. The first item has index
1, the second item has index 2, and so on.
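A short sketch of creating a list with list() and accessing an item by index. The list name thislist and its contents are illustrative assumptions.

```r
# Create a list of strings
thislist <- list("apple", "banana", "cherry")

# Single brackets return a list containing the first item
thislist[1]

# Double brackets return the item itself
first_item <- thislist[[1]]
first_item
```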
LISTS => Access List Item
● To access multiple items, specify a range of index numbers using the : operator
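Several list items can be selected at once by giving a range of index numbers with the : operator. A minimal sketch, assuming a hypothetical five-item list:

```r
thislist <- list("apple", "banana", "cherry", "orange", "kiwi")

# Items 2 through 4, returned as a list of three items
middle <- thislist[2:4]
middle
```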
LISTS => Change Item Value & Get length
● To change the value of a specific item, refer to the index number:
● To find out how many items a list has, use the length() function
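Both operations on this slide can be sketched together. The list contents are assumptions for illustration only.

```r
thislist <- list("apple", "banana", "cherry")

# Change the first item by assigning to its index
thislist[1] <- "blackcurrant"

# Count the items in the list
n_items <- length(thislist)
n_items
```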
LISTS => Add Item
● To add an item to the end of the list, use the append() function
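A minimal sketch of append() on a list. The second call uses append()'s after argument, which is not mentioned on the slide, to show insertion at a chosen position. Names are hypothetical.

```r
thislist <- list("apple", "banana", "cherry")

# append() returns a new list with the item added at the end
longer <- append(thislist, "orange")

# after = 2 places the new item after the second item instead
inserted <- append(thislist, "orange", after = 2)
```

Note that append() does not modify thislist in place; the result must be assigned to keep it.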
Matrices=> Creating
● You can also create a matrix with strings
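A sketch of creating matrices with the matrix() function, first with numbers and then with strings. The nrow and ncol arguments set the shape, and values fill the matrix column by column. Variable names are assumptions.

```r
# Numeric matrix with 3 rows and 2 columns
nums <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)
nums

# Matrix of strings with 2 rows and 2 columns
fruit_matrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)
fruit_matrix
```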
Matrices=> Access Items
● You can access the items by using [ ] brackets. In [1, 2], for example, the first
number "1" specifies the row position, while the second number "2" specifies the
column position
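A minimal sketch of accessing a single matrix item by row and column. The matrix thismatrix is a hypothetical example.

```r
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

# Row 1, column 2
item <- thismatrix[1, 2]
item   # "cherry", because values fill the matrix column by column
```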
● The whole row can be accessed if you specify a comma after the number in the
bracket:
● The whole column can be accessed if you specify a comma before the number
in the bracket
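Whole-row and whole-column access can be sketched as follows, again with a hypothetical 2 x 2 matrix of strings.

```r
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

# Comma after the number: the whole second row
second_row <- thismatrix[2, ]

# Comma before the number: the whole second column
second_col <- thismatrix[, 2]
```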
Matrices=> Access More Than One Row & Column
● More than one row and column can be accessed if you use the c() function
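A sketch of selecting several rows or columns at once with c(). The 3 x 3 matrix below is an assumption made for the example.

```r
thismatrix <- matrix(
  c("apple", "banana", "cherry", "orange", "grape", "pineapple", "pear", "melon", "fig"),
  nrow = 3, ncol = 3
)

# Rows 1 and 2, all columns
rows_1_2 <- thismatrix[c(1, 2), ]

# Columns 1 and 2, all rows
cols_1_2 <- thismatrix[, c(1, 2)]
```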
Matrices=> Add Rows and Columns
● Use the cbind() function to add additional columns in a Matrix. The new
column must have as many cells as the existing matrix has rows.
● Use the rbind() function to add additional rows in a Matrix. The new row
must have as many cells as the existing matrix has columns.
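Adding a column with cbind() and a row with rbind() can be sketched as below. The matrix m and the added values are illustrative assumptions.

```r
m <- matrix(1:6, nrow = 3, ncol = 2)

# New column: 3 values, one per existing row
wider <- cbind(m, c(7, 8, 9))

# New row: 2 values, one per existing column
taller <- rbind(m, c(10, 11))
```

Both functions return a new matrix and leave m unchanged.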
Matrices=> Remove Rows and Columns
● Use the c() function with negative index numbers to remove rows and columns in a Matrix
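Rows and columns are dropped by passing negative index numbers, here wrapped in c(). A minimal sketch with a hypothetical 3 x 3 matrix:

```r
m <- matrix(1:9, nrow = 3, ncol = 3)

# Drop the first row and the first column
smaller <- m[-c(1), -c(1)]
smaller
```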
Matrices=> Check if an Item Exists, length of matrix
● To find out if a specified item is present in a matrix, use the %in% operator
Length of matrix
● Use the length() function to find the total number of cells in a Matrix (the
number of rows multiplied by the number of columns)
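A short sketch of the %in% operator and length() applied to a matrix. The matrix contents are assumptions.

```r
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

has_apple <- "apple" %in% thismatrix   # TRUE: "apple" is one of the cells
has_kiwi <- "kiwi" %in% thismatrix     # FALSE: "kiwi" is not present

# length() counts every cell: 2 rows x 2 columns = 4
n_cells <- length(thismatrix)
```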
Matrices=> Number of Rows and Columns, Loop through Matrix
● Use the dim() function to find the number of rows and columns in a Matrix:
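The sketch below shows dim() on a matrix and one possible way to loop through every cell with nested for loops, as the slide title suggests. The loop variable names are hypothetical.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)

# dim() returns the number of rows, then the number of columns
dim(m)

# Visit each cell row by row
for (row_idx in seq_len(nrow(m))) {
  for (col_idx in seq_len(ncol(m))) {
    print(m[row_idx, col_idx])
  }
}
```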
Matrices=> Combine two Matrices
● Use the rbind() or cbind() function to combine two or more matrices
together
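Combining two matrices can be sketched as follows. The matrices a and b are illustrative; rbind() needs matching column counts and cbind() needs matching row counts.

```r
a <- matrix(c("apple", "banana", "cherry", "grape"), nrow = 2, ncol = 2)
b <- matrix(c("orange", "mango", "pineapple", "watermelon"), nrow = 2, ncol = 2)

# Stack b below a: 4 rows, 2 columns
stacked <- rbind(a, b)

# Place b to the right of a: 2 rows, 4 columns
side_by_side <- cbind(a, b)
```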
Arrays=> Creating Arrays
● An array is a collection of values of the same data type
● Compared to matrices, arrays can have more than two dimensions.
● Use the array() function to create an array, and the dim parameter to
specify the dimensions
● Syntax:
array_name <- array(data, dim = c(row_size, column_size, matrices), dimnames = dim_names)
1. data - The first argument of the array() function. It is the input
vector that supplies the values of the array.
2. dim - A vector giving the size of each dimension. For a three-dimensional
array it takes 3 values.
- row_size: defines the number of rows in each matrix.
- column_size: defines the number of columns in each matrix.
- matrices: defines the number of matrices. In R, a three-dimensional array
is a stack of matrices.
3. dimnames - An optional list (dim_names above) used to change the default
names of rows and columns.
● Eg1. An array with one dimension with values ranging from 1 to 10
● Use the dim() function to find the number of rows and columns in an array
● You can also use the ncol() function to find the number of columns and nrow()
to find the number of rows
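A minimal sketch of array(), dim(), nrow() and ncol(). The array names, sizes and dimension labels below are assumptions chosen for illustration.

```r
# One-dimensional array with values from 1 to 10
one_d <- array(1:10)

# 24 values arranged as 4 rows x 3 columns x 2 matrices
multi <- array(1:24, dim = c(4, 3, 2))
dim(multi)    # rows, columns, matrices
nrow(multi)   # number of rows
ncol(multi)   # number of columns

# Row 2, column 3 of the second matrix
cell <- multi[2, 3, 2]

# dimnames replaces the default row, column and matrix labels
named <- array(1:8, dim = c(2, 2, 2),
               dimnames = list(c("r1", "r2"), c("c1", "c2"), c("m1", "m2")))
named_cell <- named["r1", "c2", "m2"]
```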
Data Frames=> Data Frame Length
● Use the length() function to find the number of columns in a Data Frame
(similar to ncol())
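A short sketch of length() on a data frame. The data frame is built with data.frame(), and its column names and values are hypothetical.

```r
df <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# length() counts columns, which matches ncol()
n_cols <- length(df)
n_cols
```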
Data Frames=> Combining Data Frames
● Use the rbind() function to combine two or more data frames in R vertically
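Vertical combination with rbind() can be sketched as follows. Both data frames are illustrative and must share the same column names.

```r
df1 <- data.frame(Training = c("Strength", "Stamina"), Pulse = c(100, 150))
df2 <- data.frame(Training = c("Other", "Stamina"), Pulse = c(120, 140))

# Rows of df2 are added below the rows of df1
stacked_df <- rbind(df1, df2)
stacked_df
```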
● Use the cbind() function to combine two or more data frames in R
horizontally
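Horizontal combination with cbind() can be sketched as follows. The two data frames are hypothetical and must have the same number of rows.

```r
df1 <- data.frame(Training = c("Strength", "Stamina"), Pulse = c(100, 150))
df3 <- data.frame(Duration = c(60, 30), Steps = c(3000, 6000))

# Columns of df3 are added to the right of the columns of df1
wide_df <- cbind(df1, df3)
wide_df
```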