R Short Tutorial
R is available on the following undergraduate machines: bacon, agnesi, fenchel, fitch, maddison, magnus, merrill and mgc2000, and on the graduate machine beta. First log in to one of these machines; then, to start R, type the following at the UNIX prompt:

bacon[101]% R

To quit R, type q().
Help Commands
To display the help file for the log command (or any other command), type either of

> help(log)
> ?log

Type help.start() to open a help window. It lets you browse all the R commands and is very useful.
Simple Tasks in R
R is an interactive computing environment which you will use for data analysis. It can also be used as a calculator to perform simple tasks:

> 4*6
[1] 24
> log(1000)
[1] 6.907755
> y <- 1:20   (creates a vector holding the integers 1 through 20)
> y
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> z <- seq(1,2,0.1)   (uses the seq command to change the step size of a sequence)
> z
[1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
> z[5]   (returns the 5th observation)
[1] 1.4
> z[-5]   (returns all observations except the 5th)
[1] 1.0 1.1 1.2 1.3 1.5 1.6 1.7 1.8 1.9 2.0
> z[2:4]   (returns the 2nd, 3rd, and 4th observations)
Subsetting by Value
Sometimes you want to extract a subset of a vector based on its values. For instance, if you want to extract just those elements of x that are less than 0, type

> x=c(1,4,3,-2,-3,6,-5)
> i=x<0
> x[i]

This returns the vector

[1] -2 -3 -5

A shortcut is

> x[x<0]

Also look at the output of x[x<6 & x>3].
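The sketch below walks through the same idea step by step, using the vector x from the text. The variable names neg and mid are illustrative and not part of the original handout.

```r
# Logical subsetting with the vector x from the text
x <- c(1, 4, 3, -2, -3, 6, -5)

neg <- x < 0              # logical vector: TRUE where the element is negative
x[neg]                    # keeps only the TRUE positions: -2 -3 -5

which(x < 0)              # positions (not values) of the negative elements: 4 5 7

mid <- x[x < 6 & x > 3]   # & combines two conditions element by element
mid                       # only 4 lies strictly between 3 and 6
```

Note that & works element by element on whole vectors, which is what subsetting needs; && is meant for single TRUE/FALSE values.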
In a call to matrix(), the first argument is the data (which can be a vector), the second is the number of rows and the third is the number of columns. Use help(matrix) for more information. For a matrix A, A[3,2] returns the element at row 3, column 2; A[2,] returns the 2nd row; and A[,2] returns the 2nd column.
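A minimal sketch of creating and indexing a matrix follows. The matrix name A and its values are assumptions chosen for illustration; they are not taken from the original handout.

```r
# Hypothetical example matrix: matrix(data, number of rows, number of columns).
# R fills the matrix column by column.
A <- matrix(1:12, 3, 4)
A

A[3, 2]    # element at row 3, column 2 (here 6)
A[2, ]     # the 2nd row (here 2 5 8 11)
A[, 2]     # the 2nd column (here 4 5 6)
dim(A)     # number of rows and columns (here 3 4)
```

Leaving an index empty, as in A[2, ], means "take everything along that dimension".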
Matrix Multiplication
> A%*%Y   (matrix multiplication)
> S=matrix(1:4,2,2)
> solve(S)   (inverse of the matrix S)
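The sketch below checks these operations on small matrices. S is the matrix from the text; A and Y are hypothetical stand-ins, since their original definitions are not shown here.

```r
# S is the 2 x 2 matrix from the text; its columns are (1, 2) and (3, 4)
S <- matrix(1:4, 2, 2)
S_inv <- solve(S)        # inverse of S
S_inv
S %*% S_inv              # the identity matrix, up to rounding error

# Hypothetical A (2 x 3) and Y (3 x 1), chosen only for illustration
A <- matrix(1:6, 2, 3)
Y <- matrix(1:3, 3, 1)
A %*% Y                  # matrix product: a 2 x 1 matrix
A * A                    # by contrast, * multiplies element by element
```

For %*% to work, the number of columns of the left matrix must equal the number of rows of the right one.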
Probability functions
R has a family of probability functions for each of several distributions. For instance, for the normal distribution, there are rnorm, pnorm, qnorm and dnorm.

> rnorm(10, 1, 2)   (generates 10 independent values from a normal distribution with mean 1 and standard deviation 2)
> dnorm(1.5, 1, 2)   (returns the value of the normal(1,2) pdf at x=1.5)
> pnorm(1.5, 1, 2)   (returns the value of the normal(1,2) cdf at x=1.5)
> qnorm(0.2, 1, 2)   (returns x such that P(X<=x) = 0.2 where X is a normal(1,2) random variable)

Here normal(1,2) means mean 1 and standard deviation 2. Other distributions include chisq, f, t, binom, pois (Poisson), etc. The corresponding functions for the t distribution are rt, pt, qt and dt.
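As a hedged illustration of how the four functions relate, the sketch below compares dnorm with the normal density formula and shows that qnorm inverts pnorm. The seed and the variable names are assumptions made for this example.

```r
set.seed(1)                        # hypothetical seed, only for reproducibility
draws <- rnorm(10, 1, 2)           # 10 values, mean 1, standard deviation 2
length(draws)

dens <- dnorm(1.5, 1, 2)           # pdf at x = 1.5
# The same value from the normal density formula with mu = 1, sigma = 2
manual <- exp(-(1.5 - 1)^2 / (2 * 2^2)) / (2 * sqrt(2 * pi))
all.equal(dens, manual)

p <- pnorm(1.5, 1, 2)              # P(X <= 1.5)
q <- qnorm(0.2, 1, 2)              # x such that P(X <= x) = 0.2
pnorm(q, 1, 2)                     # recovers 0.2, since qnorm inverts pnorm

qt(0.975, df = 10)                 # same naming pattern for the t distribution
```

The prefix selects the task (r: random draws, d: density, p: cdf, q: quantile) and the suffix selects the distribution.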
Built-in functions
Some built-in R functions can be viewed in the same way, by typing the function name without parentheses. Try looking at the functions hist, mean, quantile, sum and prod. You can also use many predefined functions for the following topics: regression, glm, ANOVA models, survival analysis, data analysis, bootstrap, experimental designs, calculus, and so on. Use the help menu (help.start()) for the exact syntax of the functions.
Iteration
Avoid iteration if you can; take advantage of R's vectorized math and of functions such as apply. The apply function successively applies the function of your choice to each row (or column) of a matrix. Let's create a simple matrix and use apply to find the mean of each row and column.

> x = matrix(1:12,3,4)
> apply(x,2,mean)   #returns the mean of each column
> apply(x,1,mean)   #returns the mean of each row

However, sometimes iteration cannot be avoided. Then you can use the R commands for or while. Here is an example of using for inside a function.
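As a hedged illustration of a for loop inside a function, the sketch below computes row means by explicit iteration and compares the result with apply. The function name row_means_loop is hypothetical and not from the handout; a short while loop is included for comparison.

```r
# Hypothetical helper: row means computed with an explicit for loop
row_means_loop <- function(m) {
  out <- numeric(nrow(m))            # pre-allocate the result vector
  for (i in seq_len(nrow(m))) {      # seq_len() is safe even when nrow(m) is 0
    out[i] <- mean(m[i, ])
  }
  out
}

x <- matrix(1:12, 3, 4)
row_means_loop(x)                    # 5.5 6.5 7.5
apply(x, 1, mean)                    # same result without an explicit loop

# A while loop: add 1, 2, 3, ... until the running total reaches at least 20
n <- 0
total <- 0
while (total < 20) {
  n <- n + 1
  total <- total + n
}
c(n, total)                          # 6 21
```

Pre-allocating out before the loop avoids growing a vector one element at a time, which is slow in R.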
The following commands will create a scatterplot of the height and weight data and add to it the regression (best-fit) line.

> plot(weight,height,xlab="weight",ylab="height")
> title("Plot of weight and height")
> abline(reg)

Other graphical functions include lines(), boxplot() and hist(). To control the layout and number of graphs put on one page (useful for reducing the number of pages you need to print for an assignment), use the mfrow argument in the function par(). Here we'll have two down and one across.

> par(mfrow=c(2,1))

To save or print a plot in Windows, move the cursor to the figure, right-click and choose the "save as postscript" or "print" option. To save a plot in Unix, you must use the following command before creating the plot:

> postscript(filename)
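The sketch below puts these plotting steps together. It assumes that reg is the result of lm(height ~ weight), which the surviving text does not show, and it uses simulated stand-in data and a hypothetical output file name rather than the course data.

```r
# Simulated stand-in data (hypothetical; not the course data set)
set.seed(1)
weight <- runif(50, 160, 260)
height <- 65 + 0.06 * weight + rnorm(50, 0, 2)

# Assumption: reg is a simple linear regression of height on weight
reg <- lm(height ~ weight)

# Open the postscript device BEFORE plotting; the file name is hypothetical
outfile <- file.path(tempdir(), "height_weight.ps")
postscript(outfile)

par(mfrow = c(2, 1))                 # two plots down, one across
plot(weight, height, xlab = "weight", ylab = "height")
title("Plot of weight and height")
abline(reg)                          # adds the fitted regression line
hist(height)                         # second panel on the same page

dev.off()                            # closes the device and finishes the file
```

The file is only complete once dev.off() has been called, so do not skip that last step.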
NOTE: All students who are not exempted for R software must complete this assignment, which is due by Friday October 14, 2004. You may submit the completed assignment in class or put it in my mail box (Asokan Mulayath Variyath). You should submit a printout that shows the answers and plots for all the questions together with the R code. (Soft copy submission via email is not acceptable.) If you need any help, please email.

1. Find a short way to calculate
(a) 70!
(b) the product of the integers from 21 to 50 (i.e. multiply the numbers 21, 22, ..., 50)
(Hint: see the documentation for the function prod.)

2. a) Generate a random sample of 10 observations from an exponential distribution with $\lambda = 1$ and store it in a vector named "expobs". Type the command set.seed(698) and press Enter before generating the random numbers.
b) Use the rep function to create a new vector "newexpobs" that consists of the first value of "expobs" repeated 1 time, the second value of "expobs" repeated 2 times, the third value of "expobs" repeated 3 times, ..., the 10th value of "expobs" repeated 10 times. (Hint: look at the documentation on rep.)
c) Calculate the mean, the standard deviation, the 25% quantile and the 75% quantile of your new data "newexpobs".

3. a) For this question, use the data in the file nba.txt. The data are online at http://www.student.math.uwaterloo.ca/~stat698/stat698_09_05/nba.txt The first column of this dataset represents the id number of some basketball players. The second column represents their height (in inches) and the third one represents their weight (in pounds). Show how you can use the read.table function to read this data set into R.
b) Identify all the players (by their number) whose height is >83 and weight is <=240.
c) Using the above data, fit a linear regression model, $\text{height} = \beta_0 + \beta_1\,\text{weight} + \varepsilon$.
d) Construct 90% confidence intervals for the predicted mean height, as well as 90% intervals for the predicted individual value of height, when weight = 201, 220 and 245 pounds. (Hint: see the documentation for the command predict or predict.lm.)
e) Suppose that the first half of the players are women (number 1 to 205) and the second half are men. Plot height vs weight for men. ON THE SAME PLOT, plot height vs weight for women, but use a different plotting symbol (be sure they both have the same scale). Add a legend and a title to the plot. Be sure to include a printout of your functions and the final plots.

4. Write a function that takes a single vector x as argument and returns the Hotelling's $T^2$ statistic,

$$T^2 = (x - \mu)^{\top}\, \Sigma^{-1}\, (x - \mu),$$

where $\mu = (1.5, 3, 2)$ and

$$\Sigma = \begin{pmatrix} 2 & 1.5 & 0.5 \\ 1.5 & 3 & 0.8 \\ 0.5 & 0.8 & 4 \end{pmatrix}.$$

Use this function to get the $T^2$ value for the vectors (1,2,3), (1.2, 2.8, 2.3) and (2.3, 1.9, 4).