
Math 1320: Useful R Commands

Storing Data:
x <- c(100,200,300,400) c stands for concatenate
x = c(100,200,300,400) same as above
x = 1:100 makes the list x = c(1,2,3,...,100)

Miscellany:
sort(x) sorts list of data x
length(x) gives the number of points in dataset (i.e., the sample size) of x
sum(x) sums the values in the list of data x
x*y Given two lists of data, x and y, each the same length, it produces a
new list of data consisting of the products of the two lists.
table(x) Makes a frequency table for the dataset x
stem(x) Creates a stem-and-leaf plot for x
format(x,scientific = F) Prints the number x in decimal format
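A quick demonstration of these commands, run on a small made-up dataset:

```r
x <- c(3, 1, 4, 1, 5)      # small made-up dataset
sort(x)                    # 1 1 3 4 5
length(x)                  # 5 (the sample size)
sum(x)                     # 14
y <- c(10, 10, 10, 10, 10)
x * y                      # 30 10 40 10 50 (element-wise products)
table(x)                   # frequency table: 1 occurs twice; 3, 4, 5 once each
format(0.000001, scientific = FALSE)   # "0.000001" rather than 1e-06
```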

Generating Random Numbers:


runif(5,0,3) Prints 5 random real numbers, uniformly chosen from [0,3)
runif(5) Prints 5 random real numbers, uniformly chosen from [0,1)
sample(1:1000,10) Prints 10 random integers from 1 to 1000 (without replacement)
#Add “replace=TRUE” to allow replacement
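For reproducible examples you can fix the random seed first (set.seed is not listed above, but it is standard base R):

```r
set.seed(1)                           # makes the "random" results repeatable
u <- runif(5, 0, 3)                   # 5 uniform reals in [0, 3)
s <- sample(1:1000, 10)               # 10 distinct integers from 1 to 1000
d <- sample(1:6, 10, replace = TRUE)  # 10 die rolls, repeats allowed
```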

Using R as a calculator:
+-*/ 4 basic arithmetic operations
2**3 or 2^3 gives 2 to the third power (exponentiation)
sqrt(9) gives square root of 9
abs(-3) gives absolute value of -3
factorial(4) gives 4! = (4)(3)(2)(1) = 24
choose(10,3) gives C(10,3) = 10!/(7!3!) = 120
To produce P(10,3), use that P(10,3) = C(10,3)*factorial(3)
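For example, the counting commands above give:

```r
factorial(4)                   # 24
choose(10, 3)                  # 120, the number of 3-element subsets of 10 items
choose(10, 3) * factorial(3)   # 720 = P(10,3), ordered selections of 3 from 10
```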

Measures of Central tendency:


mean(x) gives mean of x
median(x) gives median of x

Measures of Spread:
sd(x) sample standard deviation of x
var(x) sample variance of x
range(x) Outputs max and min of data set
fivenum(x) produces 5-number summary
**Boxplots have too many quirks to be useful for this course.
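A worked example of the center and spread commands, on a small made-up dataset:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up data
mean(x)       # 5
median(x)     # 4.5
sd(x)         # sample standard deviation (divides by n - 1), about 2.14
range(x)      # 2 9
fivenum(x)    # 2 4 4.5 6 9 (min, lower hinge, median, upper hinge, max)
```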
Discrete Random Variables:
sum(x*p) Computes expected value of a discrete random variable whose
outputs are x and corresponding probabilities p
sqrt(sum((x -m)^2*p)) Computes the standard deviation of a random variable whose
outputs are x and corresponding probabilities p
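For instance, for a fair six-sided die (a made-up example, not from the handout):

```r
x <- 1:6                      # possible outcomes
p <- rep(1/6, 6)              # each outcome has probability 1/6
m <- sum(x * p)               # expected value: 3.5
s <- sqrt(sum((x - m)^2 * p)) # standard deviation, about 1.71
```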

Binomial Distribution B(t,p)


dbinom(x,t,p) Gives probability of x successes in t trials, where a success has
probability p. You can use x = a:b if you want to compute a range of
probabilities. Note: dbinom(x,t,p) = choose(t,x)*p^x*(1-p)^(t-x) = P(X = x)
pbinom(x,t,p) Gives the cumulative probability of x or fewer successes in t trials. In
other words, this command gives the cumulative probability P(X ≤ x). To
find P(X ≥ x), use 1 - pbinom(x-1,t,p).
qbinom(r,t,p) Gives the smallest integer k such that P(X ≤ k) ≥ r. In other words,
qbinom(r,t,p) = k if pbinom(k-1,t,p) < r and pbinom(k,t,p) ≥ r.
rbinom(n,t,p) Gives n random numbers from a binomial distribution of t trials,
where a success has probability p.
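For example, with t = 10 coin flips and success probability p = 0.5 (a made-up illustration):

```r
dbinom(5, 10, 0.5)         # P(X = 5), about 0.246
sum(dbinom(3:5, 10, 0.5))  # P(3 <= X <= 5)
pbinom(4, 10, 0.5)         # P(X <= 4), about 0.377
1 - pbinom(6, 10, 0.5)     # P(X >= 7)
qbinom(0.5, 10, 0.5)       # smallest k with P(X <= k) >= 0.5; here 5
rbinom(3, 10, 0.5)         # 3 random success counts out of 10 trials
```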

Normal Distribution N(µ,σ²)


**The mean is assumed to be 0 and the standard deviation assumed to be 1 if no values are specified.
**The functions pnorm and qnorm assume you want cumulative probabilities. If instead you want the
probabilities that lie on or to the right of a given value, include “lower.tail = FALSE” in the function.
dnorm(x,mean,sd) Gives the y-value (height) of the normal curve at the value x.
pnorm(x,mean,sd) Gives the cumulative probability P(X ≤ x) (i.e., the area under the normal
curve to the left of x). With no mean or sd specified, pnorm(x) gives the area to the left of x
under the standard normal curve, i.e., the z-table value for x.
qnorm(α,mean,sd) Gives the number z such that P(X ≤ z) = α. Thus,
qnorm(α) = z, if and only if pnorm(z) = α.
rnorm(n,mean,sd) Gives n random numbers from a normal distribution with specified mean
and standard deviation.
qqnorm(x) Gives a normal probability plot of the dataset x.
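For example (the mean of 100 and sd of 15 below are made up for illustration):

```r
pnorm(1.96)                     # area left of z = 1.96, about 0.975
pnorm(1.96, lower.tail = FALSE) # area to the right, about 0.025
pnorm(110, 100, 15)             # P(X <= 110) for N(100, 15^2)
qnorm(0.975)                    # z with area 0.975 to its left, about 1.96
rnorm(5, 100, 15)               # 5 random values from N(100, 15^2)
```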

The Mean and Standard Deviation of the Sample Mean


combn(x,2) Lists the pairs of elements from x (without replacement, so no item gets
chosen twice). **The value 2 can be changed to any integer k with 1 ≤ k ≤ length(x).
combn(x,2,mean) Lists the mean of all pairs of elements from x
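A small made-up example showing why this is useful: the mean of all the sample means equals the population mean.

```r
x <- c(1, 3, 5, 7)          # a tiny made-up population
combn(x, 2)                 # 2-by-6 matrix; each column is one pair
xbar <- combn(x, 2, mean)   # means of the 6 samples of size 2: 2 3 4 4 5 6
mean(xbar)                  # 4, the same as mean(x)
```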

Student’s t-Distribution
dt(x,df) Gives y-value (height) of the t-distribution at x with df degrees of freedom.
pt(x,df) Gives the cumulative probability at x in a t-distribution with df degs of freedom.
qt(α,df) Gives the x-value such that pt(x,df) = α.
qt(α,df,lower.tail=FALSE) Returns the critical value t_α, the value with area α to its right.
rt(n,df) Gives n random values from a t-distribution with df degrees of freedom.
Ch. 9: One Sample Student’s t-test (determining one population mean)
t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu=mu_0, conf.level = 1-α)
Performs the one-sample t-test. The variable x is the data, mu = mu_0 (H_0), alternative
gives the type of alternative hypothesis, and conf.level gives the confidence level (1-α).
This test will automatically compute p-values, but it is only valid for the t-test (and not the
z-test). The default value of alternative is "two.sided", the default of mu is 0, and the default
of conf.level is 0.95 (i.e. α = 0.05).
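A sketch with made-up data, testing H_0: mu = 50:

```r
x <- c(48.2, 51.3, 49.8, 50.9, 47.5, 52.1, 49.4, 50.2)  # made-up sample
t.test(x, mu = 50)                        # two-sided test with 95% CI (the defaults)
t.test(x, mu = 50, alternative = "less")  # left-tailed version
t.test(x, mu = 50, conf.level = 0.99)     # 99% confidence interval
```

The printed output includes the t statistic, df, p-value, confidence interval, and sample mean.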

Ch. 9: P-Values
1. If the population standard deviation is known and z is your test statistic:
a. Two-Sided: p = 2*pnorm(-abs(z))
b. Left-Tailed: p = pnorm(z)
c. Right-Tailed: p = 1 - pnorm(z)
2. If the population st. deviation is unknown and t is your test statistic (with df deg of freedom):
a. Two-Sided: p = 2*pt(-abs(t),df)
b. Left-Tailed: p = pt(t,df)
c. Right-Tailed: p = 1 - pt(t,df)
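For instance, suppose a right-tailed z-test gives z = 2.1 and a two-sided t-test gives t = -1.8 with df = 14 (made-up numbers):

```r
1 - pnorm(2.1)   # right-tailed z p-value, about 0.018
2 * pt(-1.8, 14) # two-sided t p-value, about 0.093
```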

Ch. 10: Two Sample Student’s t-test (a.k.a. Welch’s t-test)


t.test(x, y, alternative = c("two.sided", "less", "greater"), paired=FALSE, conf.level = 1-α, var.equal =
TRUE)
Performs the two-sample t-test. The sets x and y are the data,
alternative gives the type of alternative hypothesis, and conf.level gives the confidence
level (1-α). This test will automatically compute p-values, but it is only valid for the t-test
(and not the z-test). The default value of alternative is "two.sided", the default of mu is 0
(and since we always assume mu =0, we can always omit it), and the default of conf.level is
0.95 (i.e. α = 0.05). The command paired = FALSE is set by default, and if paired = TRUE is
used, it must be used for datasets x and y of the same length.
***NOTE: If you set var.equal = FALSE, R treats the two data sets as unpooled and computes
df by the Welch–Satterthwaite approximation. If you set var.equal = TRUE, then R runs this as
a pooled t-test, so that df = n_1 + n_2 - 2. The default value is var.equal=FALSE.
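A sketch with made-up scores from two groups:

```r
x <- c(78, 85, 90, 72, 88, 81)   # made-up group 1
y <- c(70, 75, 83, 69, 77, 74)   # made-up group 2
t.test(x, y)                     # Welch's test (var.equal = FALSE is the default)
t.test(x, y, var.equal = TRUE)   # pooled test, df = n1 + n2 - 2 = 10
```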
Ch. 12: Proportions
prop.test(x, n, p, alternative = c("two.sided", "less", "greater"), conf.level = 1-α, correct=FALSE)
Performs the one and two proportion test. The default value for alternative is “two.sided”,
the default value for conf.level is 0.95, and the default for p is 0.5. The term ‘correct =
FALSE’ has to do with Yates’ continuity correction, and it is meant to handle the errors that
arise in the difference between a discrete variable and a continuous (smooth) variable when
n is small. We did not discuss this, so we will always set correct=FALSE. Note that if you
want to conduct the two proportions test, you will need to enter your data as lists such as x =
c(data1, data2) and n = c(n1, n2).
***DISCLAIMER: The prop.test should not be used for computing confidence
intervals on small samples (say less than 500), as it does not use the confidence
interval we discuss in class, but rather uses the Wilson method for CI.
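For example (made-up counts):

```r
# One proportion: 540 heads in 1000 flips, testing p = 0.5
prop.test(540, 1000, p = 0.5, correct = FALSE)
# Two proportions: 40 successes out of 200 versus 55 out of 250
prop.test(c(40, 55), c(200, 250), correct = FALSE)
```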
Making Tables/Matrices in R:
matrix(x,nrow=m,ncol=n,byrow=TRUE)
This creates an m-by-n matrix/table (i.e. with m rows and n columns). The byrow
command dictates whether the inputs from the dataset x are inserted into the matrix by
rows or by columns.
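For instance:

```r
M <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
M   # row 1 is 1 2 3, row 2 is 4 5 6; with byrow = FALSE the columns fill first
```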

Ch. 13: Chi-Square Tests


chisq.test(x,p,correct=FALSE)
If x is a dataset or list (that is, if x is a vector), and p is a list of probabilities (adding to 1),
this function runs Pearson’s goodness-of-fit test.
chisq.test(M,correct=FALSE)
If M is a matrix with at least two rows and columns, the function runs a test for association
or a test for homogeneity. See the R documentation (?chisq.test) for details.
qchisq(α,df) Gives the value x such that the area under the χ² distribution to the left of x equals
α; use qchisq(α,df,lower.tail=FALSE) to get the value with area α to its right.
chisq.test(M)$expected gives the expected values for a matrix/table M.
rowSums(M) outputs the sums of the rows of a matrix
colSums(M) outputs the sums of the columns of a matrix
prop.table(M,2) builds a conditional distribution (column proportions) from a contingency table;
prop.table(M,1) gives row proportions. (M/colSums(M) does not do this reliably, because R
recycles the vector of column sums down the columns.)
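Two made-up examples, one of each test:

```r
# Goodness of fit: are 60 die rolls consistent with a fair die? (made-up counts)
obs <- c(8, 12, 9, 11, 10, 10)
chisq.test(obs, p = rep(1/6, 6), correct = FALSE)
# Test for association on a made-up 2-by-2 contingency table
M <- matrix(c(30, 10, 20, 40), nrow = 2, byrow = TRUE)
chisq.test(M, correct = FALSE)
chisq.test(M)$expected   # expected counts under independence
prop.table(M, 2)         # conditional distribution: each column sums to 1
```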

Ch. 14: Linear Regression and Correlation Coefficient


data.frame(x,y)
Makes a dataframe out of lists x and y. A dataframe is an organized way to group all data.
Note that x and y must have the same length for a dataframe to be defined. You can make a
dataframe with any number of components. For example, data.frame(x,y,z) produces triples
from the datasets x, y, and z.
plot(x,y,cex=1.3,pch=16,col="blue4")
Gives a scatterplot for data (x,y). The argument cex dictates the point size, pch is the point
character (#16 is filled-in dots, #1 is the default and gives empty dots), and col changes the color.
lm(y~x) (Note: this also works if you type lm(df), where df = data.frame(x,y).)
Finds the coefficients b0 and b1 such that ŷ = b0 + b1x.
summary(lm(y~x))
Gives a summary of the lm function including the r^2 value (called Multiple R-squared)
abline(b0,b1)
Plots the regression line y = b0 + b1x on top of the points used to run the regression. You
must plot the points (x,y) first for this to show up correctly in R.
print(x,digits=n)
Prints the value x with the precision of n digits. For example, typing “pi” in R returns
3.141593, but print(pi,digits=10) returns 3.141592654. This “print” code is useful for
getting more decimal places from a linear regression model.
cor(x,y)
Returns the linear correlation coefficient r
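Putting the regression commands together on made-up data:

```r
x <- c(1, 2, 3, 4, 5)            # made-up predictor
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)  # made-up response, roughly linear in x
fit <- lm(y ~ x)
coef(fit)                 # b0 = 0.14 and b1 = 1.96, so y-hat = 0.14 + 1.96x
summary(fit)              # includes Multiple R-squared
plot(x, y, pch = 16)      # scatterplot first...
abline(fit)               # ...then overlay the line (abline also accepts the fitted model)
cor(x, y)                 # r, close to 1 here
```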
Ch. 16: F-distribution and ANOVA
qf(alpha,df1,df2)
Produces the critical value x in the F-distribution so that the area under the curve to the
right of x is equal to alpha. Here df1 is the numerator degrees of freedom and df2 is the
denominator degrees of freedom.
pf(x,df1,df2)
Gives the cumulative probability (i.e. area under the curve from 0 to x) of the F distribution
with df = (df1, df2)
rep(x, times=n)
Repeats the value x (x can be a string, vector, matrix, …) n times. If x = c(x1, x2, x3) and if n
= c(t1,t2,t3), then rep(x,times=n) is the vector consisting of x1 repeated t1 times, x2 repeated
t2 times, and x3 repeated t3 times.
anova(aov(X~Group))
Runs the ANOVA test on the Analysis of Variance Model
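A minimal sketch with made-up scores from three groups (rep is handy for building the Group labels):

```r
X <- c(80, 85, 78, 90, 88, 92, 70, 75, 72)          # made-up scores
Group <- rep(c("A", "B", "C"), times = c(3, 3, 3))  # group label for each score
anova(aov(X ~ Group))   # ANOVA table: F statistic, df = (2, 6), and p-value
```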
