
1.1.4 Introduction To R FW


Data Mining and Statistical Learning
Introduction

Professor Yajun Mei


School of Industrial and Systems Engineering

Lesson 4: Introduction to R
Learning Objectives

• Execute Installation of R
• Manipulate Data With R
• Review Programming
What is R
• R is free software for statistical computing and graphics

• Operating systems: Windows, Mac, or Linux

• Homepage: http://www.r-project.org
Installing R Under Windows
Suppose you have a laptop running Windows (32- or 64-bit). How do you install R on it?
• Go to any CRAN site (see http://cran.r-project.org for a list) and click “Download R for Windows.”
• Follow the instructions and click “install R for the first time.”
• Download the R installation file, double-click its icon, and follow the instructions to install.
As of January 2020, the file is “R-3.6.2-win.exe” (size: 83 MB).
Data With R
• Objects: vector, matrix, data.frame, ts, list, factor, array
• Mode (numeric, character, complex, or logical); length
• Read data stored in text (ASCII) files
read.table(), scan(), and read.fwf()
• Saving data
write(x, file="data.txt") writes a vector or matrix x to a file; write.table() writes a data.frame to a file
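The read and write functions above can be tried as a round trip through a text file. This is only a minimal sketch: the data frame `df`, its column names, and the temporary file paths are made up for illustration and are not part of the lesson.

```r
# Illustrative data frame; the column names are invented for this sketch
df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# Write to a temporary text file so the working directory is left untouched
path <- tempfile(fileext = ".txt")
write.table(df, file = path, row.names = FALSE)

# Read the table back; header = TRUE because the first line holds column names
df2 <- read.table(path, header = TRUE)

# write() and scan() handle a plain vector of values instead of a table
vec_path <- tempfile(fileext = ".txt")
write(c(10, 20, 30), file = vec_path)
v <- scan(vec_path, quiet = TRUE)
```

read.fwf() follows the same pattern but needs a `widths` argument describing the fixed column widths of the file.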
Generating Data in R
# Generating data
x1 <- 1:10 # x1 = c(1,2,3,4,5,6,7,8,9,10)
x2 <- seq(1,4,0.5) # x2 = c(1.0,1.5, 2.0, 2.5, 3.0, 3.5, 4.0)
x3 <- rep(1,5) # x3 = c(1,1,1,1,1)
x4 <- gl(2,3,10) # generate factor levels: 2 levels, each repeated 3 times, length 10

#Check x4 values in R
> x4
[1] 1 1 1 2 2 2 1 1 1 2
Levels: 1 2
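To connect the generated objects with the mode and length attributes mentioned earlier, the following sketch inspects two of them. It only uses base R functions; the extra vectors passed to mode() at the end are illustrative.

```r
x1 <- 1:10
x4 <- gl(2, 3, 10)

mode(x1)            # "numeric"
length(x1)          # 10
class(x4)           # "factor"
levels(x4)          # "1" "2"
table(x4)           # counts per level: six 1s and four 2s

mode(c("a", "b"))   # "character"
mode(x1 > 5)        # "logical"
```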
Matrix in R
# matrix in R
y <- matrix(c(1,2,3,4,5,6),2,3)
# This is a 2x3 matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

w <- matrix(c(1,2),2,1) # This is 2x1 Matrix/vector


t(w) %*% y #matrix product w^T y= c(5, 11, 17)
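As a small complement to the matrix example, the sketch below contrasts the default column-wise fill with a row-wise fill, and the matrix product with the element-wise product. The names `z` and `p` are introduced only for illustration.

```r
# Default: filled column by column
y <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# byrow = TRUE fills row by row instead
z <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)

dim(y)             # 2 3

w <- matrix(c(1, 2), 2, 1)
p <- t(w) %*% y    # matrix product; the result is a 1x3 matrix
dim(p)             # 1 3

y * z              # element-wise product, not a matrix product
```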
Matrix Operations in R (cont.)
# matrix in R
y <- matrix(c(1,2,3,4,5,6),2,3)

y[1,3] # the element in row 1, column 3, which is 5
y[1,] # the first row, c(1,3,5)
y[,2] # the second column, c(3,4)
apply(y,2,mean) # column means, c(1.5, 3.5, 5.5)
apply(y,1,sd) # standard deviation of each row, c(2, 2)
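For means and sums specifically, base R also provides dedicated helpers that give the same values as the corresponding apply() calls. A brief sketch:

```r
y <- matrix(c(1, 2, 3, 4, 5, 6), 2, 3)

col_means <- colMeans(y)   # same values as apply(y, 2, mean)
row_sums  <- rowSums(y)    # row totals: 9 and 12
```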
Data Frame
# Generate 100 standard normal random variables
x <- rnorm(100)
# w will be used as a 'weight' vector
w <- 1 + x/2
# Model to generate y from x and w; rnorm(x) draws length(x) = 100 values
y <- x + w * rnorm(x)
# Make a data frame with three columns named x, y, and w, and look at it.
dum <- data.frame(x,y,w)
dum
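Printing all 100 rows is rarely convenient, so a few common ways to inspect a data frame are sketched below. The seed is an arbitrary choice made here only so that the sketch is reproducible; it is not part of the lesson.

```r
set.seed(1)            # arbitrary seed, assumed only for reproducibility
x <- rnorm(100)
w <- 1 + x / 2
y <- x + w * rnorm(x)
dum <- data.frame(x, y, w)

head(dum, 5)           # first five rows
dim(dum)               # 100 3
names(dum)             # "x" "y" "w"
str(dum)               # compact overview of column types
summary(dum$y)         # five-number summary and mean of one column

pos <- dum[dum$x > 0, ]   # subset: rows where x is positive
```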
Linear Regression in R
# Fit a simple ordinary least squares regression of y on x
fm1 <- lm(y ~ x, data = dum)
summary(fm1)

# Fit a weighted least squares regression with weights 1/w^2
fm2 <- lm(y ~ x, data = dum, weights = 1 / w^2)
summary(fm2)

# Plots
plot(x,y); abline(fm1); abline(fm2, col="red")
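To see what the two fits compute, the coefficients can be reproduced with matrix operations: ordinary least squares solves (X'X) b = X'y, and weighted least squares solves (X'WX) b = X'Wy with W a diagonal matrix of weights. The sketch below is a check under these textbook formulas, not a description of how lm() works internally; the seed and the names `X`, `W`, `b_ols`, and `b_wls` are assumptions made for illustration.

```r
set.seed(1)                      # arbitrary seed, assumed only for reproducibility
x <- rnorm(100)
w <- 1 + x / 2
y <- x + w * rnorm(x)
dum <- data.frame(x, y, w)

fm1 <- lm(y ~ x, data = dum)
fm2 <- lm(y ~ x, data = dum, weights = 1 / w^2)

# Design matrix: a column of ones for the intercept, then x
X <- cbind(1, dum$x)

# Ordinary least squares via the normal equations
b_ols <- solve(t(X) %*% X, t(X) %*% dum$y)

# Weighted least squares with a diagonal weight matrix
W <- diag(1 / dum$w^2)
b_wls <- solve(t(X) %*% W %*% X, t(X) %*% W %*% dum$y)

coef(fm1)   # compare with b_ols
coef(fm2)   # compare with b_wls
```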
Programming Language
• You can write your own function in R and call it later.
• Loops in R are rather slow; avoid them where possible by using vectorized or matrix operations.

Example: suppose we have a vector x = (-2, 3, 5, -8). For each element of x equal to 3, we want to assign 0 to the corresponding element of another variable y, and 1 otherwise.
• R code: x <- c(-2,3,5,-8); y <- (x != 3)

• R output: > y
[1]  TRUE FALSE  TRUE  TRUE
(TRUE and FALSE behave as 1 and 0 in arithmetic; as.numeric(y) gives 1 0 1 1.)
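The two points above, writing a function and preferring vectorized code over loops, can be combined in one sketch. The function name `flag_not_equal` is hypothetical and introduced only for illustration.

```r
# Hypothetical helper: 0 where an element equals `target`, 1 otherwise
flag_not_equal <- function(x, target) {
  as.numeric(x != target)
}

x <- c(-2, 3, 5, -8)

# Vectorized call: the comparison is applied to all elements at once
y_vec <- flag_not_equal(x, 3)    # 1 0 1 1

# The same result with an explicit loop, shown only for comparison
y_loop <- numeric(length(x))
for (i in seq_along(x)) {
  y_loop[i] <- if (x[i] == 3) 0 else 1
}

# ifelse() is another vectorized alternative
y_ifelse <- ifelse(x == 3, 0, 1)
```

All three approaches give the same vector; the vectorized forms are shorter and usually faster on long vectors.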


More R
• RStudio: an integrated development environment for R.
https://www.rstudio.com

• Books related to R:
https://www.r-project.org/doc/bib/R-books.html

• Search online for more R materials and packages.


Summary

• R Installation
• Data with R
• Programming in R
