R Programming For NGS Data Analysis
1. R programming basics
getwd() Function:
The working directory is the directory where R reads and writes files by default. The getwd() function returns the
absolute file path of the current working directory of the R process.
getwd()
Output:
[1] "C:/Users/bioc/Documents"
setwd() Function:
setwd() changes the working directory to the given path.
setwd("D:/bioc/R/")
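A minimal sketch of changing the working directory and restoring it afterwards. The use of tempdir() as the target is an assumption made only so that the example runs on any machine; in practice you would pass your own project path.

```r
# Remember the current working directory so it can be restored later.
old_wd <- getwd()

# tempdir() is used here only because it is a directory that always exists.
setwd(tempdir())
new_wd <- getwd()
print(new_wd)

# Switch back to the original directory when done.
setwd(old_wd)
```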
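dir() Function:
dir() lists the files and folders in the working directory, or in a path passed as its first argument.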
ls() Function:
ls() lists all the objects in the current environment.
rm() Function:
rm() removes objects from the environment. It is useful when you want to clean the environment before
running code. The command below removes all objects from the R environment.
rm(list = ls())
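Often only some objects need to be removed rather than the whole environment. The sketch below shows ls() and rm() working together on two throwaway objects; the object names are illustrative only.

```r
# Create two throwaway objects (the names are illustrative).
tmp_counts <- c(10, 20, 30)
tmp_label <- "sample_A"

# ls() returns the object names as a character vector.
print(c("tmp_counts", "tmp_label") %in% ls())

# Remove a single object by name, leaving everything else in place.
rm(tmp_counts)
print(exists("tmp_counts", inherits = FALSE))

# Remove objects whose names are held in a character vector.
rm(list = c("tmp_label"))
```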
Help:
Use help() or the ? operator to open the documentation for a function, for example ?mean.
help(rlm, package="MASS")
demo(ggplot2)
Packages:
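Information about the packages available on CRAN can be retrieved with the available.packages() function.
a <- available.packages()
A package is installed with install.packages(); the optional lib argument sets the library directory to install into.
install.packages("ggplot2")
install.packages("ggplot2", lib="/data/Rpackages/")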
You can install multiple R packages at once with a single call to install.packages(). Place the names of the R
packages in a character vector.
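As a sketch of the multiple-package case, the code below builds a character vector of package names and works out which of them are not yet installed. The package list is hypothetical, and the actual install.packages() call is left commented out because it needs internet access.

```r
# Hypothetical list of packages wanted for an analysis.
wanted <- c("ggplot2", "dplyr", "MASS")

# installed.packages() returns a matrix whose row names are package names.
already_installed <- rownames(installed.packages())
to_install <- setdiff(wanted, already_installed)
print(to_install)

# A single call installs every package named in the vector.
# install.packages(to_install)
```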
Load the package to make it available. The library() function is used to load packages into R. The following
code loads the ggplot2 package. The package name is normally written without quotes, although library("ggplot2") also works.
library(ggplot2)
Attributes:
R objects can have attributes, which act as metadata that help to describe the object. Some common
attributes are:
names, dimnames
dimensions (e.g. matrices, arrays)
class (e.g. integer, numeric)
length
other user-defined attributes/metadata
Output:
$names
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
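The output above matches the $names element that attributes() reports for the built-in iris data set. Assuming that was the call, a minimal sketch of inspecting and setting attributes looks like this (the "units" attribute is an invented example of user-defined metadata):

```r
# iris is a data set that ships with R.
a <- attributes(iris)
print(names(a))   # which attributes are set on the data frame

# Common metadata can also be read with helper functions.
print(names(iris))
print(class(iris))
print(dim(iris))

# User-defined attributes can be attached with attr().
x <- 1:10
attr(x, "units") <- "reads"   # "units" is an arbitrary, illustrative name
print(attributes(x))
```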
Vectors: A vector is a sequence of data elements of the same basic type. Members of a vector are officially called
components. Vectors are the most basic R data objects, and there are six types of atomic vectors: logical,
integer, double, complex, character and raw.
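The six atomic types can be checked with typeof(). The following sketch creates one small vector of each type (all values are illustrative) and also shows that mixing types in c() coerces everything to a single type.

```r
# One example vector for each atomic type.
v_logical   <- c(TRUE, FALSE)
v_integer   <- c(1L, 2L)
v_double    <- c(1.5, 2.5)
v_complex   <- c(1+2i, 3-1i)
v_character <- c("A", "C", "G", "T")
v_raw       <- as.raw(c(1, 255))

# typeof() reports the underlying type of each vector.
types <- sapply(
  list(v_logical, v_integer, v_double, v_complex, v_character, v_raw),
  typeof
)
print(types)

# A vector holds one type only, so mixed input is coerced (here to character).
mixed <- c(1, "a", TRUE)
print(typeof(mixed))
```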
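# Example vectors of equal length (illustrative values).
v1 <- c(3, 8, 4, 5, 1, 11)
v2 <- c(4, 11, 2, 8, 1, 2)
# Vector addition (element by element).
add.result <- v1+v2
print(add.result)
# Vector subtraction.
sub.result <- v2-v1
print(sub.result)
# Vector multiplication.
multi.result <- v1*v2
print(multi.result)
# Vector division.
divi.result <- v2/v1
print(divi.result)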
Factors:
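A factor stores categorical data as integer codes together with a set of levels. A minimal sketch, using made-up sample labels:

```r
# Illustrative sample conditions (invented for this example).
condition <- c("control", "treated", "control", "treated", "treated")

# factor() finds the distinct values and sorts them to form the levels.
f <- factor(condition)
print(levels(f))
print(table(f))        # number of observations per level
print(as.integer(f))   # the underlying integer codes

# The level order can be set explicitly with the levels argument.
f2 <- factor(condition, levels = c("treated", "control"))
print(levels(f2))
```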
Matrices:
A matrix can be created using the matrix() function, or by binding vectors together with cbind() and rbind().
Matrices have rows and columns and contain a single data type. R can also be used for matrix calculations.
x<-1:3
y<-10:12
z<-30:32
cbind(x,y,z)
rbind(x,y,z)
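The matrix() function and matrix calculations mentioned above can be sketched as follows; the values are arbitrary.

```r
# matrix() fills column by column unless byrow = TRUE is given.
m <- matrix(1:6, nrow = 2, ncol = 3)
m_byrow <- matrix(1:6, nrow = 2, byrow = TRUE)
print(m)
print(m_byrow)
print(dim(m))

# t() transposes; %*% is matrix multiplication (2x3 times 3x2 gives 2x2).
mm <- m %*% t(m)
print(mm)

# Ordinary arithmetic operators work element by element.
print(m * 2)
```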
Arrays: Arrays can store data of a single type in more than two dimensions. An
array can be created using the array() function. It takes vectors as input and uses the values in the dim
parameter to set the size of each dimension.
# Create two vectors of different lengths.
v1 <- c(1,2,3)
v2 <- 100:110
# Take these vectors as input to create an array.
arr1 <- array(c(v1,v2))
arr1
arr2 <- array(c(v1,v2), dim=c(2,7))
arr2
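The example above stays within two dimensions. A short sketch of a three-dimensional array, with arbitrary values, shows how the dim parameter and indexing extend to a third dimension:

```r
# 24 values arranged as 2 rows x 3 columns x 4 layers.
arr3 <- array(1:24, dim = c(2, 3, 4))
print(dim(arr3))

# Elements are addressed as [row, column, layer].
print(arr3[1, 2, 3])

# Leaving an index empty selects everything along that dimension,
# so this is the first layer as a 2 x 3 matrix.
print(arr3[, , 1])

# If the input is shorter than the array, its values are recycled.
arr_recycled <- array(1:3, dim = c(2, 3))
print(arr_recycled)
```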
Data frames: Data frames are used to store tabular data in R. They are an important type of object in R
and are used in a variety of statistical modeling applications. Data frames are represented as a special type
of list where every element of the list has to have the same length. Each element of the list can be thought
of as a column and the length of each element of the list is the number of rows. Unlike matrices, data
frames can store different classes of objects in each column. A data frame can also be created by reading a file.
employee <- c('Ram','Sham','Jadu')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2016-11-1','2015-3-25','2017-3-14'))
employ_data <- data.frame(employee, salary, startdate)
employ_data
View(employ_data)
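The list-of-columns structure described above can be inspected directly. The sketch below rebuilds the same employee table so that it runs on its own, then looks at the column classes and selects rows; the salary threshold is arbitrary.

```r
employ_data <- data.frame(
  employee  = c("Ram", "Sham", "Jadu"),
  salary    = c(21000, 23400, 26800),
  startdate = as.Date(c("2016-11-01", "2015-03-25", "2017-03-14"))
)

# str() summarises the structure: one line per column with its class.
str(employ_data)
print(nrow(employ_data))
print(ncol(employ_data))

# Each column is a list element and may have a different class.
print(sapply(employ_data, class))

# Columns are accessed with $, rows are selected with a logical condition.
print(employ_data$salary)
high <- employ_data[employ_data$salary > 22000, ]
print(high)
```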
Missing values:
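# Missing values are represented by NA.
x <- c(100, 200, NA, 300, NA, 400)
# is.na() returns TRUE for each missing element.
b <- is.na(x)
# Keep only the non-missing values.
x[!b]
# complete.cases() returns TRUE for each element (or row) without missing values.
complete.cases(x)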
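Missing values also affect summary functions and data frames. A minimal sketch, with an invented data frame, of the na.rm argument, complete.cases() on rows, and na.omit():

```r
x <- c(100, 200, NA, 300, NA, 400)

# Count the missing values.
print(sum(is.na(x)))

# Most summary functions return NA unless na.rm = TRUE is given.
print(mean(x))
print(mean(x, na.rm = TRUE))

# Illustrative data frame with missing entries in two different columns.
df <- data.frame(
  id    = 1:4,
  depth = c(30, NA, 25, 40),
  gene  = c("A", "B", NA, "D")
)

# complete.cases() flags rows with no missing value in any column.
print(complete.cases(df))

# na.omit() drops every row that contains at least one NA.
clean <- na.omit(df)
print(clean)
```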
R operators:
R has many built-in operators for arithmetic and logical
operations. There are mainly five types of operators in R.
1. Arithmetic Operators
2. Relational Operators
3. Logical Operators
4. Assignment Operators
5. Miscellaneous Operators: :, %in%, %*%
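One example of each operator type listed above, as a short sketch with arbitrary values:

```r
a <- 7
b <- 2

# 1. Arithmetic: addition, integer division, remainder, power.
arith <- c(sum = a + b, int_div = a %/% b, remainder = a %% b, power = a ^ b)
print(arith)

# 2. Relational: comparisons return logical values.
rel <- c(a > b, a == b, a != b)
print(rel)

# 3. Logical: element-wise AND, OR and NOT.
logi <- c((a > 5) & (b > 5), (a > 5) | (b > 5), !(a > b))
print(logi)

# 4. Assignment: leftward, rightward and equals forms.
x <- 5
10 -> y
z = 15

# 5. Miscellaneous: sequence, membership test, matrix multiplication.
s <- 1:5
is_member <- 3 %in% s
m <- matrix(1:4, nrow = 2)
mm <- m %*% m
print(mm)
```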
Files in R:
There are a few very useful functions for reading data into R.
1. read.table() and read.csv() are two popular functions used for reading tabular data into R.
2. readLines() is used for reading lines from a text file.
3. source() is a very useful function for reading in R code files from another R program.
4. dget() function is also used for reading in R code files (it is the inverse of dput()).
5. load() function is used for reading in saved workspaces.
6. unserialize() function is used for reading single R objects in binary format.
There are similar functions for writing data to files
1. write.table() is used for writing tabular data to text files (e.g. CSV).
2. writeLines() function is useful for writing character data line-by-line to a file or connection.
3. dump() is a function for dumping a textual representation of multiple R objects.
4. dput() function is used for outputting a textual representation of an R object.
5. save() is useful for saving an arbitrary number of R objects in binary format to a file.
6. serialize() is used for converting an R object into a binary format for outputting to a connection (or
file).
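The reading and writing functions come in pairs. The sketch below writes a small invented data frame to temporary files and reads it back with several of these pairs; the file names come from tempfile() and the data values are illustrative only.

```r
df <- data.frame(gene = c("BRCA1", "TP53"), count = c(120L, 87L))

# write.table() / read.csv(): tabular data as comma-separated text.
csv_file <- tempfile(fileext = ".csv")
write.table(df, csv_file, sep = ",", row.names = FALSE, quote = FALSE)
df_back <- read.csv(csv_file)
print(df_back)

# writeLines() / readLines(): character data, one element per line.
txt_file <- tempfile(fileext = ".txt")
writeLines(c("line one", "line two"), txt_file)
lines_back <- readLines(txt_file)
print(lines_back)

# dput() / dget(): textual representation of a single R object.
dput_file <- tempfile(fileext = ".R")
dput(df, file = dput_file)
df_dget <- dget(dput_file)

# save() / load(): binary format; load() restores objects under their saved names.
rda_file <- tempfile(fileext = ".RData")
save(df, file = rda_file)
rm(df)
load(rda_file)
print(exists("df"))
```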
The read.table() function is one of the most commonly used functions for reading data in R. To get the
help file for read.table(), just type ?read.table in the R console.
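A minimal sketch of a typical read.table() call. The tab-separated sample sheet is invented and written to a temporary file first so that the example is self-contained; header, sep and stringsAsFactors are standard read.table() arguments.

```r
# Write a small tab-separated file to read back (contents are illustrative).
tsv_file <- tempfile(fileext = ".tsv")
writeLines(
  c("sample\tcondition\treads",
    "S1\tcontrol\t1500",
    "S2\ttreated\t2300",
    "S3\ttreated\t1800"),
  tsv_file
)

# header = TRUE uses the first line as column names;
# sep = "\t" sets the field separator to a tab;
# stringsAsFactors = FALSE keeps text columns as character vectors.
reads <- read.table(tsv_file, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
str(reads)
print(sum(reads$reads))
```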