Introduction To R Exercise 1
Exercise 1
1. Start R. Select R from the Start -> All Programs -> R menu.
2. On the command line, type demo(graphics). Follow the prompts. You may want to resize the
windows.
3. On the command line, type demo(image). This demonstration is concerned with representations
of 3D data.
It will be useful to create a location for the files used in this course. Your course account includes a network
drive, H:. Create a new directory on this drive called, say, rcourse. Follow these instructions to
create a desktop shortcut that starts R in this location.
4. Click Next.
2. Change the Start in entry to the network directory you created, and click Ok.
Now, start R using this new shortcut. Type shell("dir"). Which directory is referred to? Finally,
quit this R session.
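The same question can also be answered from inside R without the Windows-only shell() call. The lines below are a minimal sketch of that alternative check, using getwd() and list.files(); they are not part of the original instructions, and the name wd is only illustrative.

```r
# Report the directory R is currently working in.
wd <- getwd()
print(wd)

# List the files in that directory (roughly what shell("dir") displays).
print(list.files(wd))
```

If the shortcut was set up as intended, the printed path should be the directory created on drive H:.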
Exercise 2
1. Download the file
http://stats.ma.ic.ac.uk/das01/public_html/RCourse/hills.txt
and save it in the directory created earlier. This file contains the Scottish hill races data set. Now
enter the following commands:
hills
pairs(hills)
How many variables do you think such plots are suitable for?
attach(hills)
5. Construct a scatter plot. When plot is called in this way, the first argument is plotted on the
horizontal axis and the second on the vertical axis
plot(dist,time)
identify(dist,time,row.names(hills))
lm(time~dist)
summary(lm(time ~ dist))
9. Add the least squares regression line. Note that the result of lm is passed directly to abline
without being assigned to a name
abline(lm(time~dist))
10. Obtain some diagnostic plots. Note that plot behaves differently when given a fitted model as
its argument. Watch for the prompt in the Console between plots.
plot(lm(time~dist))
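The commands above refit the model each time lm is called. A minimal sketch of the alternative, fitting once and reusing the stored object, is given below. It assumes that the copy of the hill races data shipped in the MASS package has the same dist and time columns as hills.txt; the name fit is illustrative.

```r
# Assumption: MASS::hills stands in for the downloaded hills.txt.
data(hills, package = "MASS")

# Fit once and keep the result, instead of calling lm() anonymously each time.
fit <- lm(time ~ dist, data = hills)
print(coef(fit))                 # intercept and slope
print(summary(fit)$r.squared)    # proportion of variance explained

# Scatter plot with the fitted line; data= avoids the need for attach().
plot(time ~ dist, data = hills)
abline(fit)
```

Passing data = hills keeps the variables inside the data frame, so nothing has to be attached to the search path.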
11. There are many pre-defined system objects. Display the value of pi. Note that pi is a built-in
constant rather than a reserved word, so it can be masked by assigning to the same name
pi
ls()
ls
ls()
rm(hill.cp)
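Typing ls without parentheses displays the function itself, whereas ls() calls it. The sketch below illustrates the difference and the effect of rm. It assumes that hill.cp is a copy of a data object; the small data frame used here is made up for illustration.

```r
# Hypothetical stand-in object; in the exercise this would be a copy of hills.
hill.cp <- data.frame(dist = c(1, 2), time = c(10, 20))

print(is.function(ls))        # ls without () is the function object itself
print("hill.cp" %in% ls())    # ls() calls it and lists the workspace objects

rm(hill.cp)                   # remove the copy from the workspace
print(exists("hill.cp"))      # FALSE once it is gone
```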
Exercise 3
1. Create a vector of coefficients for a quadratic equation, using the sample function. Here, we draw
a sample of size 3 from −20, −19, ..., 19, 20 with replacement.
class(coeffs)
length(coeffs)
names(coeffs)
5. Assign some names. Note the function call occurring on the right-hand side of the assignment operator.
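The steps so far can be sketched as follows. This is one possible reading rather than the sheet's own code: the sample call follows the description in step 1, and the names a, b and c are an assumption chosen to match the usual notation for a quadratic.

```r
# Draw 3 integer coefficients from -20..20, with replacement.
coeffs <- sample(-20:20, 3, replace = TRUE)

print(class(coeffs))     # "integer"
print(length(coeffs))    # 3
print(names(coeffs))     # NULL until names are assigned

# Assign names; c() is the function call on the right of <-.
names(coeffs) <- c("a", "b", "c")   # illustrative names
print(coeffs)
print(coeffs["b"])                  # elements can now be picked by name
```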
6. Prepare to plot the equation, by constructing a regularly spaced vector for the horizontal axis
x <- seq(-3,3,length=200)
y <- coeffs[1]*x^2+coeffs[2]*x+coeffs[3]
plot(x,y)
plot(y=y,x=x)
9. Does the equation have real roots? Compute the discriminant
coeffs[2]^2-4*coeffs[1]*coeffs[3]
10. Oops, we didn’t retain the value! R stores the value of the last evaluated expression in the system object
.Last.value
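Because .Last.value is replaced as soon as another expression is evaluated, assigning the result to a name is the more robust habit. A minimal sketch, using fixed illustrative coefficients (not the random ones drawn earlier) so that the result is known in advance; the name disc is hypothetical.

```r
# Illustrative coefficients for a*x^2 + b*x + c (fixed so the result is known).
coeffs <- c(a = 1, b = -3, c = 2)

# Store the discriminant b^2 - 4ac instead of relying on .Last.value.
disc <- coeffs[["b"]]^2 - 4 * coeffs[["a"]] * coeffs[["c"]]
print(disc)        # 1
print(disc >= 0)   # TRUE: this particular equation has real roots
```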
11. Create a vector of type character, and display the second element
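One way this might look is sketched below; the vector name and its values are made up for illustration.

```r
# A character vector; the values are illustrative.
colours <- c("red", "green", "blue")
print(class(colours))   # "character"
print(colours[2])       # "green"
```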
Problems
1. Compute the real roots of the quadratic equation
x^2 + x + 1 = 0
x <- c(1,2,3)
x[1]/x[2]^3-1+2*x[3]-x[2-1]
3. Generate a regular grid between -50 and 50. Construct separate plots of log(x), exp(x), sin(x),
sin(2x), sin(x)/cos(x). Examine the cumulative sum of the final function. Experiment with the
argument type of the plot function.
Exercise 4
!x
1:3 + c(T,F,T)
intersect(1:10,5:15)
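The commands above rely on two ideas: logical values are coerced to 1 and 0 in arithmetic, and vectors can be treated as sets. The sketch below makes both explicit. The logical vector x is an assumption, since its definition is not shown here, and the names s and common are illustrative.

```r
# Assumed definition of x; any logical vector would do.
x <- c(TRUE, FALSE, TRUE)
print(!x)                            # FALSE TRUE FALSE

# Logicals are coerced to 1/0 in arithmetic.
s <- 1:3 + c(TRUE, FALSE, TRUE)
print(s)                             # 2 2 4
print(sum(x))                        # 2: counts the TRUE values

# Set intersection of two integer ranges.
common <- intersect(1:10, 5:15)
print(common)                        # 5 6 7 8 9 10
```

Writing TRUE and FALSE in full is safer than T and F, which are ordinary variables that can be reassigned.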
6. Create a factor
unclass(drinks)
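A factor stores a categorical variable as integer codes plus a set of level labels, which is what unclass reveals. The sketch below uses a made-up drinks vector; the actual values used in the course are not known.

```r
# Hypothetical data: this drinks vector is made up for illustration.
drinks <- factor(c("beer", "wine", "beer", "cider", "wine", "beer"))

print(levels(drinks))    # levels are sorted alphabetically by default
print(table(drinks))     # counts per level
print(unclass(drinks))   # the underlying integer codes plus a levels attribute
```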
Problems
1. Compute the truth table for logical OR. The R function xor computes the logical EXCLUSIVE-OR.
What is the difference between the two?
2. Consider the vector 1:K, where K is a positive integer. Write an R command that determines how
many elements in the vector are exactly divisible by 3.
3. Write an R command to evaluate the proportion of beer in the drinks factor object.
Exercise 5
This sheet is concerned with data frames and matrices.
row.names(hills)
names(hills)
mean(x2.df$X1)
4. Create a matrix
x.mat <-matrix(1:12,nrow=3,ncol=4)
dimnames(x.mat)
7. Combine matrices
xx <- cbind(x.mat,x.mat)
xxx <- rbind(x.mat,x.mat)
rbind(xx,xxx)
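Checking dimensions shows what each combination does. In the sketch below the final rbind call fails because the two matrices have different numbers of columns; whether that error is the intended lesson of the step is an assumption. The name res is illustrative.

```r
x.mat <- matrix(1:12, nrow = 3, ncol = 4)

xx  <- cbind(x.mat, x.mat)   # side by side: the column count doubles
xxx <- rbind(x.mat, x.mat)   # stacked: the row count doubles
print(dim(xx))               # 3 8
print(dim(xxx))              # 6 4

# rbind() needs equal column counts, so combining xx and xxx fails.
res <- try(rbind(xx, xxx), silent = TRUE)
print(inherits(res, "try-error"))   # TRUE
```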
8. Explore indexing
x <- 1:10
names(x) <- letters[x]
x[1:3]
x[c(1,10)]
x[c(-1,-2)]
x[ x > 5]
x[c("a","d")]
x[]
jj1 <- matrix(1:100,ncol=10)
jj1[1:5,]
jj1[1:4,x[x <3]]
9. Compute row and column sums of a data frame
x <- matrix(1:10,ncol=2)
lapply(x,sum)
sapply(log(x),sum)
apply(x,1,sum)
apply(x,2,sum)
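The commands above behave differently depending on whether x is a matrix or a data frame. The sketch below contrasts the two cases; the names row.tot, col.tot and x.df are illustrative, and rowSums and colSums are shown as a common shortcut rather than as part of the original exercise.

```r
x <- matrix(1:10, ncol = 2)

row.tot <- apply(x, 1, sum)   # MARGIN = 1: one sum per row
col.tot <- apply(x, 2, sum)   # MARGIN = 2: one sum per column
print(row.tot)                # 7 9 11 13 15
print(col.tot)                # 15 40

# rowSums() and colSums() give the same totals.
print(all(row.tot == rowSums(x)))
print(all(col.tot == colSums(x)))

# On a matrix, lapply() visits each of the 10 elements, not each column.
print(length(lapply(x, sum)))   # 10

# On a data frame, sapply() visits the columns.
x.df <- as.data.frame(x)
print(sapply(x.df, sum))        # V1 = 15, V2 = 40
```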
Problems
1. Construct a 2×2 data frame, X say. Experiment with X^(1:K), where K takes values 1:4. How
does the recycling rule behave? What happens if you remove the brackets from the command?
2. The function system.time returns timings for R operations. Examine the help system about this
function. For a 10^7 × 2 matrix, X, and a vector y of length 10^7/2, compute (a number of times) X^T y
using matrix multiplication and the function crossprod. Which is quicker?
Exercise 6
ls()
objects()
search()
attach(hills)
search()
write.table(hills,"test.dat")
Where is this file situated on the file system? Examine the default output format by viewing the
file with your favourite text editor.
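write.table takes the object to be written as its first argument and the file name as its second. The sketch below writes a small made-up data frame (standing in for hills) to a temporary file, prints the raw lines to show the default format, and reads the file back. The names df, path and back are illustrative.

```r
# Hypothetical small data frame standing in for hills.
df <- data.frame(dist = c(2.5, 6.0), time = c(16.1, 48.4),
                 row.names = c("raceA", "raceB"))

# Write to a temporary file so nothing is left in the working directory.
path <- tempfile(fileext = ".dat")
write.table(df, file = path)     # data first, then the file name

print(readLines(path))           # default format: quoted names, space-separated
back <- read.table(path)         # the header line is detected automatically
print(all.equal(df, back))
```

A file written with a relative name such as "test.dat" ends up in the directory reported by getwd().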
?mean
help.start()
sin(matrix(0,nrow=5000,ncol=5000))
Use <ESC> to interrupt. Note that it might take some time for control to return to the console.
Problems
1. Generate a matrix of size n × p. Use the function as.data.frame to coerce the matrix to a data
frame. Which object requires more storage space?
Exercise 7
1. Generate a sample of random normal deviates, and a sample of random exponential deviates.
x <- rnorm(50)
y <- rnorm(50,0,1)
mean(x)
sqrt(var(x))
cor(x,y)
cor(cbind(x,y))
summary(x)
summary(cbind(x,y))
4. Let X ∼ N (0, 1) and Y ∼ Exp(2). Compute P (X > 1.644) and find q such that P (Y < q) = 0.75.
1-pnorm(1.644)
qexp(0.75,2)
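The p and q functions are inverses of each other, which gives an easy check on results like these. The sketch below repeats the two calculations, uses the lower.tail argument as an alternative to subtracting from 1, and verifies the quantile by passing it back through pexp. The names p1, p2 and q are illustrative.

```r
# P(X > 1.644) for X ~ N(0, 1), computed two equivalent ways.
p1 <- 1 - pnorm(1.644)
p2 <- pnorm(1.644, lower.tail = FALSE)
print(c(p1, p2))                 # both close to 0.05

# Quantile q with P(Y < q) = 0.75 for Y ~ Exp(rate = 2).
q <- qexp(0.75, rate = 2)
print(q)                         # equals log(4)/2
print(pexp(q, rate = 2))         # recovers 0.75
```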
5. Use the sample function to obtain a random sample of 10 realisations in a biased coin experiment
sample(c("Head","Tail"),10,prob=c(0.3,0.7),replace=TRUE)
help(package="SuppDists")
set.seed(1)
runif(10)
set.seed(1)
runif(10)
runif(10)
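Storing the draws makes the effect of set.seed explicit: resetting the seed restarts the random number stream, whereas a further call without resetting continues it. A minimal sketch, with illustrative names a, b and c2:

```r
set.seed(1)
a <- runif(10)

set.seed(1)          # resetting the seed restarts the stream
b <- runif(10)
c2 <- runif(10)      # continues the stream, so these values differ

print(identical(a, b))    # TRUE
print(identical(b, c2))   # FALSE
```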
plot(qexp(ppoints(x),1),sort(x))
abline(0,1)
9. Compare the two samples with a QQ plot
qqplot(x,y)
abline(0,1)
boxplot(x,y)
plot(c(x,y),rep(0:1,c(length(x),length(y))),xlab="",ylab="")
par(mfrow=c(2,1))
hist(x)
boxplot(y)
13. Consider the Pima Indians data: a collection of variables observed on a particular group of Native
Americans who are either healthy or diabetic. The data include measurements of triceps
skinfold thickness and blood glucose level.
First, load the mlbench package.
library(mlbench)
data(PimaIndiansDiabetes)
attach(PimaIndiansDiabetes)
plot(triceps,glucose,type="n",xlab="Triceps",ylab="Glucose")
points(triceps[diabetes=="neg"],glucose[diabetes=="neg"])
points(triceps[diabetes=="pos"],glucose[diabetes=="pos"],col=2,pch=2)
legend(40,50,c("Healthy","Diabetes"),pch=1:2,col=1:2)
Problems
1. Examine the built in ChickWeight data (the help gives background about the data). The function
split will prove useful to do the following (as will a script)
(a) Construct a plot of weight against time for chick number 34.
(b) For chicks in diet group 4, display box plots for each time point.
(c) Compute the mean weight for chicks in group 4, for each time point. Plot this mean value
against time.
(d) Repeat the previous computation for group 2. Add the mean for group 2 to the existing plot.
(e) Add a legend and a title.
(f) Copy and paste the graph into Word.
Note that some of these steps will be easier once we have some more programming expertise.
Exercise 8
1. Write an R expression to determine if two sets, A and B, represented as integer vectors are disjoint.
If they are disjoint, display elements of set A otherwise display elements of set B. (Examine the
help for functions print and cat).
2. Write R code that takes the coefficients of a quadratic equation, and outputs an appropriate
message for the cases of (i) two distinct roots (b^2 − 4ac > 0), (ii) coincident roots (b^2 = 4ac) or
(iii) complex roots (b^2 < 4ac).
3. Let vector y be the logarithm of a random sample from a standard normal distribution, N (0, 1).
Use the ifelse function to replace missing values with the value 9999.
4. Let n be a large integer. Compute a vector x containing n random uniform deviates. Embed the
following code in the system.time function
y <- sin(x)
Which is faster?
5. The compound interest formula is
A = P × (1 + R/100)^n
where P is the original money lent, A is what it amounts to in n years at R percent per year
interest.
Write R code to calculate the amount of money owed after n years, where n changes from 1 to
15 in yearly increments, if the money lent originally is 5000 pounds and the interest rate remains
constant throughout the period at 11.5%.
6. Write a loop structure to scan through an integer vector to determine the index of the maximum
value. The loop should terminate as soon as the index is obtained. (Don’t worry about ties).
Of course, this is not a good way to do this! Examine the help for the rank, sort and order
functions.
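As a pointer for the compound interest problem, the formula can be evaluated for several values of n at once, because arithmetic in R is vectorised. The sketch below uses illustrative values that are deliberately different from those in the problem.

```r
# Illustrative values, deliberately different from the exercise.
P <- 100          # original amount lent
R <- 10           # interest rate in percent per year
n <- 1:3          # years, held as a vector

# The formula is applied to every element of n in one step.
A <- P * (1 + R / 100)^n
print(A)          # 110 121 133.1
```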