
R Short Tutorial


R Tutorial

Starting and Quitting R

Stat 698 Fall 2005

R is available on the following undergraduate machines: bacon, agnesi, fenchel, fitch, maddison, magnus, merrill and mgc2000, and on the graduate machine beta. First log in to one of these machines; then, to start R, type the following at the UNIX prompt:

bacon[101]% R

To quit R, type

> q()

Help Commands
To display the help file for log (or any other command), type either of

> help(log)
> ?log

Type help.start() to open a help window. This is a way to list all the R commands and is very useful.

Simple Tasks in R
R is an interactive computing environment which you will use for data analysis. It can also be used as a calculator to perform simple tasks:

> 4*6
[1] 24
> log(1000)
[1] 6.907755

Creating Variables and Vectors


There are two assignment operators in R: <- and =. At the top level they can be used interchangeably.

> x <- 12                (storing a scalar value; x=12 works as well)
> x
[1] 12

> a <- c(1,2,3,4,5)      (creating a five-element vector)
> a
[1] 1 2 3 4 5

> y <- 1:20              (creating a vector with the sequence of integers 1 through 20)
> y
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> z <- seq(1,2,0.1)      (using the seq command to modify the step value in a sequence)
> z
 [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
> z[5]                   (returns the 5th observation)
[1] 1.4
> z[-5]                  (returns all observations except the 5th)
 [1] 1.0 1.1 1.2 1.3 1.5 1.6 1.7 1.8 1.9 2.0
> z[2:4]                 (returns the 2nd, 3rd, and 4th observations)

Operations on Vectors


Create two vectors x and y of the same length and see what happens when you do each of the following operations:

> x + y
> x - y
> x / y      #division element by element
> x * y      #multiplication element by element
> sqrt(y)
> x^3
> x^y
> log(x)
> cos(x)
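As a small illustration of the exercise above, the sketch below applies a few of these operations to two vectors whose values are arbitrary and chosen only so the results are easy to check by hand. Every operation works element by element and returns a vector of the same length.

```r
# Two illustrative vectors of equal length (the values are arbitrary).
x <- c(1, 2, 3, 4)
y <- c(4, 9, 16, 25)

sum_xy  <- x + y      # element-wise sum
prod_xy <- x * y      # element-wise product
root_y  <- sqrt(y)    # square root of each element of y
cube_x  <- x^3        # each element of x raised to the third power

print(sum_xy)
print(prod_xy)
print(root_y)
print(cube_x)
```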

Subsetting by Value
Sometimes you want to extract a subset of a vector based on its values. For instance, if you want to extract just those elements of x that are less than 0, type

> x=c(1,4,3,-2,-3,6,-5)
> i=x<0
> x[i]

This returns the vector

[1] -2 -3 -5

A shortcut is

> x[x<0]

Also look at the output of x[x<6 & x>3].
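To see why this works, it may help to look at the intermediate logical vector. The sketch below reuses the vector x from the text; the names neg and mid are introduced here only for illustration.

```r
x <- c(1, 4, 3, -2, -3, 6, -5)

# The comparison produces one TRUE/FALSE value per element of x.
neg <- x < 0
print(neg)

# Indexing with a logical vector keeps the elements where it is TRUE.
print(x[neg])

# which() gives the positions of the TRUE values instead of the values.
print(which(neg))

# Conditions can be combined element by element with & (and) and | (or).
mid <- x[x < 6 & x > 3]
print(mid)
```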

Making a Matrix and Having Access to it


> b <- c(11,18,13,14,15)
> A=cbind(a,b)           (column binding)
> A
     a  b
[1,] 1 11
[2,] 2 18
[3,] 3 13
[4,] 4 14
[5,] 5 15

(There is also a command called rbind that will bind vectors together as the rows of a matrix.)

> Y=matrix(1,2,3)
> Y
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    1    1    1

(The first argument is the data (can be a vector), the second is the number of rows and the third is the number of columns. Use help(matrix) for more information.)

> A[3,2]                 (returns the element at row 3, column 2)
 b 
13 
> A[2,]                  (returns the 2nd row)
 a  b 
 2 18 
> A[,2]                  (returns the 2nd column)
[1] 11 18 13 14 15

Matrix Multiplication
> A%*%Y                  (matrix multiplication)
> S=matrix(1:4,2,2)
> solve(S)               (inverse of the matrix S)
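The following sketch rebuilds A, Y and S as defined in this tutorial and checks the two operations. The names P and S_inv are introduced here for illustration. Because Y is a matrix of ones, each entry in row i of the product is simply a[i] + b[i], and multiplying S by its inverse should give (approximately) the identity matrix.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(11, 18, 13, 14, 15)
A <- cbind(a, b)          # 5 x 2 matrix
Y <- matrix(1, 2, 3)      # 2 x 3 matrix of ones

# The number of columns of A must equal the number of rows of Y.
P <- A %*% Y              # 5 x 3 result
print(dim(P))
print(P)

S <- matrix(1:4, 2, 2)    # filled column by column: rows are (1, 3) and (2, 4)
S_inv <- solve(S)
print(S %*% S_inv)        # approximately the 2 x 2 identity matrix
```

Note that A * Y (with a plain *) would attempt element-by-element multiplication instead and fails here because the dimensions differ.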

Read a file: read.table("filepath")


The function read.table is very useful for reading data from an external ASCII file into R and storing it in a data frame. For example, the following command reads the file abc.dat, whose first line holds the column names, into a data frame called data:

> data=read.table("abc.dat",header=T)
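Since abc.dat is not supplied with this tutorial, the sketch below first writes a small, made-up file to a temporary location and then reads it back. The column names and values are hypothetical and serve only to show the mechanics.

```r
# Write a hypothetical three-column data file to a temporary location.
tmp <- tempfile(fileext = ".dat")
writeLines(c("id height weight",
             "1 70 180",
             "2 75 210",
             "3 68 165"), tmp)

# header = TRUE tells read.table that the first line holds the column names.
dat <- read.table(tmp, header = TRUE)
print(dat)

# Columns of a data frame are accessed by name with $.
print(dat$height)

# Rows can be selected with a logical condition, as with vectors.
print(dat[dat$weight > 170, ])

unlink(tmp)   # remove the temporary file
```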

Probability functions
R has a family of probability functions for each of several distributions. For instance, for the normal distribution, there are rnorm, pnorm, qnorm and dnorm.

> rnorm(10, 1, 2)        (generates 10 independent values from a normal distribution with mean 1 and standard deviation 2)
> dnorm(1.5, 1, 2)       (returns the value of the normal(1,2) pdf at x=1.5)
> pnorm(1.5, 1, 2)       (returns the value of the normal(1,2) cdf at x=1.5)
> qnorm(0.2, 1, 2)       (returns x such that P(X<=x) = 0.2, where X is a normal(1,2) random variable)

Other distributions are chisq, f, t, binom, pois (Poisson), etc. The corresponding functions for the t distribution are rt, pt, qt and dt.
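A short sketch of how the four functions relate to each other, using the same mean 1 and standard deviation 2 as above. The seed value is arbitrary and only makes the random draws reproducible. The quantile function q... and the cdf p... are inverses of each other, and d... matches the density formula written out by hand.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
draws <- rnorm(10, 1, 2)          # 10 random values, mean 1, sd 2
print(length(draws))

q <- qnorm(0.2, 1, 2)             # the 20% quantile
p <- pnorm(q, 1, 2)               # cdf evaluated at that quantile
print(p)                          # recovers the probability passed to qnorm

# dnorm compared with the normal density formula, with mu = 1 and sigma = 2
d_builtin <- dnorm(1.5, 1, 2)
d_manual  <- exp(-(1.5 - 1)^2 / (2 * 2^2)) / (2 * sqrt(2 * pi))
print(c(d_builtin, d_manual))
```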

How to Write your own functions


Apart from built-in functions, you can write your own functions.

Example: standardize the observations

> std=function(x){
+ m=mean(x)
+ s=sqrt(var(x))
+ result=(x-m)/s
+ return(result)
+ }

Here, the function std takes one argument, the vector x. The original vector is transformed by subtracting the mean and dividing by the standard deviation. Create a new vector and try calling your function.
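One possible way to try the function is sketched below. The definition of std is repeated so that the block runs on its own, and the test vector v is arbitrary. A standardized vector should have mean 0 and standard deviation 1, up to rounding error.

```r
# Same computation as std above, repeated so this block is self-contained.
std <- function(x) {
  m <- mean(x)
  s <- sqrt(var(x))     # equivalent to sd(x)
  result <- (x - m) / s
  return(result)
}

v <- c(2, 4, 4, 4, 5, 5, 7, 9)   # arbitrary test vector
z <- std(v)

print(round(z, 3))
print(mean(z))   # approximately 0
print(sd(z))     # approximately 1
```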

Simple Linear Regression


> weight=c(45,60,45,65,80,77,55,70)
> height=c(5.5,6.0,4.5,5.5,6.5,6.5,5.5,6.0)
> reg=lm(height~weight)
> summary(reg)
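Beyond summary(), the fitted object can be queried directly. The sketch below refits the same model and shows two common follow-ups: extracting the estimated coefficients with coef() and computing a fitted value with predict(). The weight of 60 used for the prediction is an arbitrary choice for illustration.

```r
weight <- c(45, 60, 45, 65, 80, 77, 55, 70)
height <- c(5.5, 6.0, 4.5, 5.5, 6.5, 6.5, 5.5, 6.0)
reg <- lm(height ~ weight)

# Estimated intercept and slope
b <- coef(reg)
print(b)

# Fitted height at weight = 60, computed by hand and with predict().
# predict() expects a data frame whose column name matches the predictor.
by_hand    <- b[1] + b[2] * 60
by_predict <- predict(reg, newdata = data.frame(weight = 60))
print(c(by_hand, by_predict))
```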

Built-in functions
The definitions of some built-in R functions can be viewed in the same way as your own: type the function name without parentheses. Try looking at the functions hist, mean, quantile, sum and prod. R also provides many predefined functions for topics such as regression, glm, ANOVA models, survival analysis, data analysis, bootstrap, experimental designs and calculus. Use the help menu (help.start()) for the exact syntax of the functions.

Iteration
Avoid iteration if you can; take advantage of R's vectorized math and of functions such as apply, which successively applies the function of your choice to each row (or column) of a matrix. Let's create a simple matrix and use apply to find the mean of each row/column.

> x = matrix(1:12,3,4)
> apply(x,2,mean)        #returns the mean of each column
> apply(x,1,mean)        #returns the mean of each row

However, sometimes iteration cannot be avoided. Then you can use the R commands for or while. Here is an example of using for inside a function:



jsum <- function(x){
  jsum <- 0
  for(i in seq_along(x)) {
    jsum <- jsum + x[i]
  }
  return(jsum)
}

Note that R has its own function that performs this task, called sum. It will work MUCH faster than this one, especially on large vectors.
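The sketch below puts the two approaches from this section side by side. It first checks the apply results on the 3 x 4 matrix from the text, then compares an explicit loop with the built-in sum. The function name loop_sum and the vector v are introduced here only for illustration; the loop follows the same idea as jsum.

```r
x <- matrix(1:12, 3, 4)          # filled column by column
col_means <- apply(x, 2, mean)   # one mean per column
row_means <- apply(x, 1, mean)   # one mean per row
print(col_means)
print(row_means)

# An explicit loop that adds up the elements of a vector one at a time.
loop_sum <- function(v) {
  total <- 0
  for (i in seq_along(v)) {
    total <- total + v[i]
  }
  total
}

v <- 1:1000
print(loop_sum(v))
print(sum(v))        # same value, computed by the built-in function
```

For row and column means specifically, R also offers rowMeans() and colMeans(), which give the same results as the apply calls above.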

Commands for Plotting


Before making any graphs in a Unix environment, you should first open a graphics device.

> x11()                  (open a graphics device)
> dev.off()              (close a graphics device)

The following commands will create a scatterplot of the height and weight data and add to it the regression (best-fit) line.

> plot(weight,height,xlab="weight",ylab="height")
> title("Plot of weight and height")
> abline(reg)

Other graphical functions include lines(), boxplot() and hist(). To control the layout and number of graphs put on one page (useful for reducing the number of pages you need to print for an assignment), use the mfrow argument of the function par(). Here we'll have two down and one across.

> par(mfrow=c(2,1))

To save or print a plot in Windows, move the cursor to the figure, right-click the mouse and choose the save as postscript / print option. To save a plot in Unix, you must use the following command before creating the plot, and call dev.off() once the plot is finished so that the file is written:

> postscript(filename)
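Putting these pieces together, a complete save-to-file session might look like the sketch below. The output file is a temporary file chosen here for illustration (in practice you would pass your own file name), and the histogram in the second panel is just an example of a second graph on the same page.

```r
weight <- c(45, 60, 45, 65, 80, 77, 55, 70)
height <- c(5.5, 6.0, 4.5, 5.5, 6.5, 6.5, 5.5, 6.0)
reg <- lm(height ~ weight)

out_file <- tempfile(fileext = ".ps")   # hypothetical output location
postscript(out_file)                    # open the PostScript device first

par(mfrow = c(2, 1))                    # two plots down, one across

# First panel: scatterplot with the fitted regression line
plot(weight, height, xlab = "weight", ylab = "height")
title("Plot of weight and height")
abline(reg)

# Second panel: histogram of the heights
hist(height, main = "Histogram of height")

dev.off()                               # close the device so the file is written
print(file.exists(out_file))
```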

Listing and Removing Objects from your Workspace


R saves all the objects (matrices, vectors etc.) you create. To save memory you will want to occasionally clean up your workspace.

> ls()                   (list all the objects in your workspace)
[1] "a" "x" "y" "z"
> rm(x)                  (remove the object x)
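A minimal sketch of this cycle is given below: create two objects, confirm that one is listed, remove it, and confirm that it is gone. The names had_x and has_x are introduced here only to record the checks.

```r
x <- 12
a <- c(1, 2, 3, 4, 5)

# ls() returns the object names as a character vector.
had_x <- "x" %in% ls()
print(had_x)

rm(x)                                   # remove the object x

# exists() reports whether an object of that name can still be found here.
has_x <- exists("x", inherits = FALSE)
print(has_x)

print("a" %in% ls())                    # a is untouched
```

To remove several objects at once, pass them all to rm, for example rm(a, y, z).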

Other Reminders: R is case sensitive, so be careful!

Installing R on a Windows PC


R is a software environment for statistical analysis which is similar to S-PLUS but free. If you want to download R and install it on a PC running Windows 95 or later:

1. Go to http://cran.us.r-project.org/
2. Click on "Windows (95 and later)" under "Precompiled Binary Distributions".
3. In the subdirectory base/ you will find rw2011.exe. Download this self-installing binary to your machine, double-click it, and follow the instructions.

Stat 698: R Software Assignment


NOTE: All students who are not exempted from the R software requirement must complete this assignment, which is due by Friday October 14, 2004. You may submit the completed assignment in class or put it in my mail box (Asokan Mulayath Variyath). You should submit a printout that shows the answers / plots to all the questions together with the R code. (Soft-copy submission via email is not acceptable.) If you need any help, please email.

1. Find a short way to calculate
   (a) 70!
   (b) the product of the integers from 21 to 50 (i.e. multiply the numbers 21, 22, ..., 50)
   (Hint: see the documentation for the function prod.)

2. a) Generate a random sample of 10 observations from the exponential distribution with parameter equal to 1 and store it in a vector named "expobs". Type the command set.seed(698) and press Enter before generating the random numbers.
   b) Use the rep function to create a new vector "newexpobs" that consists of the first value of "expobs" repeated 1 time, the second value of "expobs" repeated 2 times, the third value of "expobs" repeated 3 times, ..., the 10th value of "expobs" repeated 10 times. (Hint: look at the documentation on rep.)
   c) Calculate the mean, the standard deviation, the 25% quantile and the 75% quantile of your new data "newexpobs".

3. a) For this question, use the data in the file nba.txt. The data are online at http://www.student.math.uwaterloo.ca/~stat698/stat698_09_05/nba.txt The first column of this dataset represents the id number of some basketball players. The second column represents their height (in inches) and the third one represents their weight (in pounds). Show how you can use the read.table function to read this data set into R.
   b) Identify all the players (by their number) whose height is >83 and weight is <=240.
   c) Using the above data, fit a linear regression model, $\text{height} = \beta_0 + \beta_1\,\text{weight} + \varepsilon$.
   d) Construct 90% confidence intervals for the predicted mean height, as well as for the predicted individual height, when weight = 201, 220 and 245 pounds. (Hint: see the documentation for the command predict or predict.lm.)
   e) Suppose that the first half of the players are women (numbers 1 to 205) and the second half are men. Plot height vs weight for men. ON THE SAME PLOT, plot height vs weight for women, but use a different plotting symbol (be sure they both have the same scale). Add a legend and a title to the plot. Be sure to include a printout of your functions and the final plots.

4. Write a function that takes a single vector as argument and returns Hotelling's $T^2$ statistic,

$$T^2 = (x-\mu)^{\top}\,\Sigma^{-1}\,(x-\mu),$$

where $\mu = (1.5, 3, 2)^{\top}$ and

$$\Sigma = \begin{pmatrix} 2 & 1.5 & 0.5 \\ 1.5 & 3 & 0.8 \\ 0.5 & 0.8 & 4 \end{pmatrix}.$$

Use this function to get the $T^2$ value for the vectors (1,2,3), (1.2, 2.8, 2.3) and (2.3, 1.9, 4).
