1.1.4 Introduction To R FW


Data Mining and Statistical Learning
Introduction

Professor Yajun Mei


School of Industrial and Systems Engineering

Lesson 4: Introduction to R
Learning Objectives

• Execute Installation of R
• Manipulate Data With R
• Review Programming
What is R
• R is free software for statistical computing and graphics

• Operating systems: Windows, Mac, or Linux

• Homepage: http://www.r-project.org
Installing R Under Windows
Suppose you have a laptop running Windows (32- or 64-bit). How do you install R on it?
• Go to any CRAN site (see http://cran.r-project.org for a list) and click “Download R for Windows.”
• Follow the instructions and click “install R for the first time.”
• Download the R installation file, double-click its icon, and follow the instructions to install.
As of January 2020, the file is “R-3.6.2-win.exe” (size: 83 MB).
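Once the installer finishes, a quick check from the R console can confirm which version is running. The following is a minimal sketch using base R only; the version threshold is an arbitrary value chosen for illustration, and the printed output depends on the installed version.

```r
# Print the version string of the running R installation.
R.version.string

# Compare the running version against a minimum version.
# "3.6.0" is an illustrative threshold, not a course requirement.
new_enough <- getRversion() >= "3.6.0"
new_enough
```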
Data With R
• Objects: vector, matrix, data.frame, ts, list, factor, array
• Attributes: mode (numeric, character, complex, or logical) and length
• Reading data stored in text (ASCII) files:
read.table(), scan(), and read.fwf()
• Saving data:
write(x, file="data.txt") writes a vector or matrix x to a file; write.table() writes a data.frame to a file
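The functions listed above can be tried out with a small round trip: inspect an object's mode and length, write it to a text file, and read it back. This is a sketch under stated assumptions: the data frame `df`, the vector `v`, and the temporary file paths are made up for illustration.

```r
# A small data frame and vector used only for illustration.
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)
v  <- c(1.5, 2.5, 3.5)

mode(v)         # "numeric"
length(v)       # 3
mode(df$name)   # "character"

# Write the data frame to a temporary text file, then read it back.
path <- tempfile(fileext = ".txt")
write.table(df, file = path, row.names = FALSE, quote = FALSE)
df2 <- read.table(path, header = TRUE, stringsAsFactors = FALSE)

# write() saves a plain vector; scan() reads it back as numbers.
vec_path <- tempfile(fileext = ".txt")
write(v, file = vec_path)
v2 <- scan(vec_path, quiet = TRUE)
```

Using `tempfile()` keeps the example from overwriting any existing file; in practice you would pass your own file name.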
Generating Data in R
# Generating data
x1 <- 1:10 # x1 = c(1,2,3,4,5,6,7,8,9,10)
x2 <- seq(1,4,0.5) # x2 = c(1.0,1.5, 2.0, 2.5, 3.0, 3.5, 4.0)
x3 <- rep(1,5) # x3 = c(1,1,1,1,1)
x4 <- gl(2,3,10) # generate factor levels: 2 levels, in blocks of 3, total length 10

#Check x4 values in R
> x4
[1] 1 1 1 2 2 2 1 1 1 2
Levels: 1 2
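To make the recycling pattern behind `gl()` explicit, the sketch below rebuilds the same factor by hand from `rep()`. The name `by_hand` is introduced here only for illustration.

```r
# gl(n, k, length): n levels, each repeated k times in a block,
# with the pattern recycled until the result has `length` elements.
x4 <- gl(2, 3, 10)

# The same factor built by hand, to show the recycling explicitly.
by_hand <- factor(rep(rep(1:2, each = 3), length.out = 10))

identical(as.integer(x4), as.integer(by_hand))   # TRUE

# seq() and rep() also accept named arguments, which can be easier to read.
x2 <- seq(from = 1, to = 4, by = 0.5)   # 1.0 1.5 2.0 2.5 3.0 3.5 4.0
x3 <- rep(1, times = 5)                 # 1 1 1 1 1
```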
Matrix in R
# matrix in R
y <- matrix(c(1,2,3,4,5,6),2,3)
# This is a 2x3 matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

w <- matrix(c(1,2),2,1) # This is a 2x1 matrix (a column vector)

t(w) %*% y # matrix product w^T y, a 1x3 matrix with entries 5, 11, 17
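The product works because the inner dimensions agree: t(w) is 1x2 and y is 2x3, so the result is 1x3. A short sketch that checks the dimensions and shows an equivalent base R call, `crossprod()`:

```r
y <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
w <- matrix(c(1, 2), nrow = 2, ncol = 1)

dim(t(w))         # 1 2
dim(y)            # 2 3
p <- t(w) %*% y   # inner dimensions (2 and 2) match
dim(p)            # 1 3

# crossprod(w, y) computes t(w) %*% y without forming the transpose explicitly.
all.equal(p, crossprod(w, y))   # TRUE

# byrow = TRUE fills the matrix row by row instead of column by column.
y_by_row <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
```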
Matrix Operations in R (cont.)
# matrix in R
y <- matrix(c(1,2,3,4,5,6),2,3)

y[1,3] # the element in row 1, column 3, which is 5
y[1,] # the first row, c(1,3,5)
y[,2] # the second column, c(3,4)
apply(y,2,mean) # column means, c(1.5, 3.5, 5.5)
apply(y,1,sd) # standard deviation of each row, c(2, 2)
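In `apply()`, the second argument selects the margin: 1 for rows, 2 for columns. The sketch below repeats the calls above and adds a few related base R idioms (`colMeans()`, logical indexing, and `drop = FALSE`) that are not in the original slides.

```r
y <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

col_means <- apply(y, 2, mean)   # margin 2 = columns
row_sds   <- apply(y, 1, sd)     # margin 1 = rows

# colMeans() is a dedicated base R function for column means.
all.equal(col_means, colMeans(y))   # TRUE

# Logical indexing selects the elements that satisfy a condition.
big <- y[y > 3]                     # 4 5 6

# drop = FALSE keeps a single row as a 1 x 3 matrix instead of a vector.
first_row <- y[1, , drop = FALSE]
dim(first_row)                      # 1 3
```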
Data Frame
# Generate 100 standard normal random variables
x <- rnorm(100)
# w will be used as a 'weight' vector
w <- 1 + x/2
# Model to generate y from x and w: standard normal noise scaled by w
y <- x + w * rnorm(length(x))
# Make a data frame with three columns named x, y, and w, and look at it.
dum <- data.frame(x,y,w)
dum
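Printing all 100 rows is rarely useful, so a few base R functions for inspecting a data frame may help here. This is a sketch: the seed value is arbitrary and added only so the random draws are reproducible, and `positive` is a name introduced for illustration.

```r
set.seed(1)   # arbitrary seed, added so the example is reproducible

x <- rnorm(100)
w <- 1 + x / 2
y <- x + w * rnorm(length(x))
dum <- data.frame(x, y, w)

dim(dum)       # 100 rows, 3 columns
names(dum)     # "x" "y" "w"
head(dum, 5)   # first five rows
str(dum)       # column types
summary(dum)   # per-column summary statistics

# Columns are accessed with $, and rows can be filtered with a logical condition.
positive <- dum[dum$x > 0, ]
```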
Linear Regression in R
# Fit a simple ordinary least squares regression of y on x
fm1 <- lm(y ~ x, data = dum)
summary(fm1)

# Fit a weighted least squares regression with weights 1/w^2
fm2 <- lm(y ~ x, data = dum, weights = 1/w^2)
summary(fm2)

# Plot the data and both fitted lines
plot(x, y); abline(fm1); abline(fm2, col = "red")
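Weighted least squares minimizes the weighted sum of squared residuals, and its coefficients solve the normal equations (X' W X) b = X' W y. The sketch below recomputes the weighted fit by hand and compares it with `lm()`. It regenerates the data with an arbitrary seed so it runs on its own; `X`, `wt`, and `b` are names introduced for illustration.

```r
set.seed(1)   # arbitrary seed for reproducibility
x <- rnorm(100)
w <- 1 + x / 2
y <- x + w * rnorm(length(x))
dum <- data.frame(x, y, w)

fm1 <- lm(y ~ x, data = dum)
fm2 <- lm(y ~ x, data = dum, weights = 1 / w^2)

coef(fm1)   # intercept and slope of the ordinary fit
coef(fm2)   # intercept and slope of the weighted fit

# Weighted least squares by hand: solve (X' W X) b = X' W y.
X  <- cbind(1, dum$x)     # design matrix with an intercept column
wt <- 1 / dum$w^2         # observation weights
b  <- solve(t(X) %*% (wt * X), t(X) %*% (wt * dum$y))

# Should agree with lm() up to numerical tolerance.
all.equal(as.vector(b), unname(coef(fm2)), tolerance = 1e-6)
```

Multiplying `wt * X` scales each row of X by its weight, which avoids building the full 100x100 diagonal weight matrix.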
Programming Language
• You can write your own function in R and call it later.
• Loops are rather slow in R; avoid them where possible by using vectorized or matrix operations.

Example: suppose we have a vector x = (-2, 3, 5, -8). For each element of x equal to 3, we want another variable y to take the value 0, and otherwise 1.
• R code: x <- c(-2,3,5,-8); y <- (x != 3)

• R output: [1] TRUE FALSE TRUE TRUE (a logical vector; TRUE and FALSE behave as 1 and 0 in arithmetic, and as.numeric(y) gives 1 0 1 1)
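To illustrate both points, writing a function and avoiding loops, here is a sketch of the same task done with an explicit loop and with a single vectorized comparison. The function names `flag_loop` and `flag_vec` are hypothetical helpers, not part of R.

```r
# A hypothetical helper written with an explicit loop.
flag_loop <- function(x, target) {
  y <- numeric(length(x))
  for (i in seq_along(x)) {
    if (x[i] == target) y[i] <- 0 else y[i] <- 1
  }
  y
}

# The same result from one vectorized comparison;
# as.numeric() turns TRUE/FALSE into 1/0.
flag_vec <- function(x, target) {
  as.numeric(x != target)
}

x <- c(-2, 3, 5, -8)
flag_loop(x, 3)   # 1 0 1 1
flag_vec(x, 3)    # 1 0 1 1

# ifelse() is another vectorized option when the two values are not 0 and 1.
labels <- ifelse(x == 3, "three", "other")
```

The vectorized version is shorter and, on long vectors, typically much faster than the loop.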


More R
• RStudio: an integrated development environment for R.
https://www.rstudio.com

• Books related to R:
https://www.r-project.org/doc/bib/R-books.html

• Search online for more R materials and packages.


Summary

• R Installation
• Data with R
• Programming in R
