R Programming

The document provides tips for R programming, including saving scripts, using comments to document code, avoiding hardcoded values by using variables, and how to use if/else statements, loops, vectors, and define custom functions. It also discusses exploring and summarizing data frames to understand the key variables and their distributions.

Uploaded by

Sebastian Bejarano

R programming

University of Trento - FBK

19 February, 2015

Hints on programming

1 Save all your commands in a SCRIPT FILE; they will be useful in the future.
2 Save your script file as often as you can! You worked hard writing those instructions; you don’t want to lose them!
3 Give meaningful names to variables and functions (avoid “pippo”, “pluto”, “a”, “b”, etc.)
4 Use comments to define sections in your script and describe what each section does
If you read the code again after 2 months you won’t remember what it does unless you read every instruction. That is not worth the time: use COMMENTS instead
5 If a value is used in more than one instruction, avoid code repetition and hardcoded (static) values.
BAD:
sum(a[a>0])

GOOD:
thr <- 0
sum(a[a>thr])

Programming with R

The if then else statement

Check whether a condition is TRUE or FALSE
Syntax:
if (expr){
  do something ## runs when expr is TRUE
} else {
  do something else ## runs when expr is FALSE
}
expr can be any logical expression, as seen before

A simple if statement:

If the instruction is on one line and there is no else, there is no need for curly brackets
x <- 5
y <- 2
## if (y!=0) xy <- x/y
## xy

A more complex if statement:

x <- 5
y <- 3
if (x > 5){
  xy <- x - y ## expr = TRUE
} else {
  xy <- x + y ## expr = FALSE
}
xy
## [1] 8

Testing conditions using combinations of expressions (& |)
a<-2
b<-3
d<-4
# Using & to test two conditions, both true
if(a<b & b<d)
  x<-a+b+d
x
## [1] 9

# Using & to test two conditions, one is false
if(a>b & b<d)
  y<-a-b-d
y ## y was never assigned because the condition is FALSE
## Error in eval(expr, envir, enclos): object ’y’ not found
# Using | to test two conditions, both false
if(a==b | a>d)
  z<-a*b*d
z ## z was never assigned because the condition is FALSE
## Error in eval(expr, envir, enclos): object ’z’ not found
# Using | to test two conditions, one true
if(a<b | a>d)
  z<-a*b*d
z
## [1] 24
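
When a condition is FALSE and there is no else, the variable is simply never created. A minimal sketch of one way to avoid the "object not found" error is to add an else branch; the placeholder value NA is an assumption made for this example, not part of the original slide.

```r
a <- 2
b <- 3
d <- 4

# With an else branch the result exists whichever way the test goes
if (a > b & b < d) {
  y <- a - b - d
} else {
  y <- NA  # placeholder value chosen for this sketch
}
y
## [1] NA
```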

Looping

The while() statement

Syntax:
while( expr ){
do something
}
An example
x <- 0 ## set the counter to 0
while( x<5 ){ ## repeat the operation as long as x < 5
x <- x + 1 ## update x
}
x
## [1] 5

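Pay attention to the condition: here x is never updated, so x < 5 stays TRUE and the loop would never end (this is why it is commented out)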

x <- 0
y <- 0
## while (x < 5){
## y <- y + 1
## }
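
One possible safeguard against a loop that never ends is an upper limit on the number of iterations. The sketch below keeps the slide's mistake (x is not updated) on purpose; max_iter is a hypothetical name and value introduced only for this example.

```r
x <- 0
y <- 0
max_iter <- 1000  # hypothetical safety limit, not in the original slide

while (x < 5 && y < max_iter) {
  y <- y + 1  # x is still never updated, as in the slide
}
y
## [1] 1000
```

The loop stops because of the limit, not because the original condition became FALSE, which is a useful hint that the condition needs fixing.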

Looping II

The for() statement

Syntax:
for (i in start:stop ){
do something
}
An example
y <- vector(mode="numeric") ## Allocating an empty vector of mode "numeric"
for (i in 1:5){
y[i] <- i + 2
}
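
The same loop can also be written with a vector allocated at its final length. This is a sketch of a common alternative, not the slide's own version; n is a name introduced here.

```r
n <- 5
y <- numeric(n)          # vector of n zeros, allocated once
for (i in seq_len(n)) {  # seq_len(n) is 1, 2, ..., n (and empty when n is 0)
  y[i] <- i + 2
}
y
## [1] 3 4 5 6 7
```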

Nested Loops
mat <- matrix(nrow=2,ncol=4)
for (i in 1:2){
for (j in 1:4){
mat[i,j] <- i + j
}
}
mat
## [,1] [,2] [,3] [,4]
## [1,] 2 3 4 5
## [2,] 3 4 5 6

Vectors II
Indexing

Use the square brackets [] to access an element of a vector

a[2] ## Extract the second element
## [1] 89

R starts counting from 1

a[0] ## Index 0 does not exist: the result is an empty vector
## integer(0)

We can pass multiple indexes using a range (2:3) or the c() function

a[2:3]
## [1] 89 54
## a[2,3] ## What happens here?

What happens when I use a negative number as an index?

b[-1] ## All but the first element
## [1] 2 3 4 5 6 7 8 9 10

b[-c(1,4)] ## All but the first and the fourth elements
## [1] 2 3 5 6 7 8 9 10

NB: Do not use c as a variable name
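
The vectors used on this slide were defined in an earlier lesson. A self-contained sketch of the same indexing rules follows; the values of a are hypothetical, chosen only so that a[2] and a[3] match the output above.

```r
a <- c(12, 89, 54, 7)  # hypothetical values; only a[2] and a[3] match the slide
a[2]           # second element
a[c(1, 4)]     # first and fourth elements
a[-c(1, 4)]    # everything except the first and fourth
length(a[0])   # 0: index 0 selects nothing
```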

Subsetting using logical operators

Using logical operators inside indexes

Logical operators can be used to subset a vector
Only the elements for which the condition is TRUE are selected
x <- 5:15
y <- 10
x[x > y]
## [1] 11 12 13 14 15
x[x==y]
## [1] 10

Logical operators can also be used on matrices

mymat <- matrix(3:9, ncol=3) ## 3:9 has only 7 values, so R recycles 3 and 4 to fill the 3x3 matrix
## Warning in matrix(3:9, ncol = 3): data length [7] is not a sub-multiple or multiple of the number of rows [3]
mymat > 7 ## Get TRUE where mymat is bigger than 7
## [,1] [,2] [,3]
## [1,] FALSE FALSE TRUE
## [2,] FALSE FALSE FALSE
## [3,] FALSE TRUE FALSE
mymat[mymat>7] ## Get the actual values where mymat is bigger than 7

## [1] 8 9

Subsetting using logical operators II
Getting indexes

The which() function

Syntax: which(expr)
works on logical vectors and arrays, such as the result of a comparison on a vector, matrix or data.frame
returns the indexes where expr is TRUE
expr can be any logical expression; combinations of AND and OR are accepted

mymat > 7
## [,1] [,2] [,3]
## [1,] FALSE FALSE TRUE
## [2,] FALSE FALSE FALSE
## [3,] FALSE TRUE FALSE

## Get the indexes where mymat > 7
which(mymat>7)
## [1] 6 7

which(mymat>7, arr.ind=TRUE)
## row col
## [1,] 3 2
## [2,] 1 3
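
A short sketch of which() with combined conditions, reusing the vector x <- 5:15 from the previous slide:

```r
x <- 5:15
which(x > 7 & x < 11)     # positions of 8, 9, 10
## [1] 4 5 6
which(x < 6 | x > 14)     # positions of 5 and 15
## [1]  1 11
x[which(x > 7 & x < 11)]  # the values themselves
## [1]  8  9 10
```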

Exercises I

1 Given an integer number x, find all its divisors.

2 Given an integer number x, compute the sum of all its divisors.
3 A perfect number is a number equal to the sum of its divisors (apart from itself). For example, 6 is perfect because 1 + 2 + 3 (its divisors) = 6.
  1 Given an integer number, check if it is perfect.
  2 Given an integer number x, find all perfect numbers i < x.

Functions I

Define your own function

We have seen many functions such as:
sum(mymat)
## [1] 49
mean(mymat)
## [1] 5.4444

Now you can define your own custom function

myfunction <- function(arg1, arg2){
  ## do something with arg1 and arg2
  return(results)
}
Define a function to convert Fahrenheit to Celsius
FtoC <- function(F){
cels <- (F - 32) * (5/9)
return(cels)
}
FtoC(212)

## [1] 100
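
Because arithmetic in R works element by element, the same function also converts a whole vector. A small sketch with the function defined above (the input temperatures are chosen for illustration):

```r
FtoC <- function(F){
  cels <- (F - 32) * (5/9)
  return(cels)
}
FtoC(c(32, 98.6, 212))  # freezing point, body temperature, boiling point
## [1]   0  37 100
```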

Functions II

Define a function to raise a number/vector to a power

Use a default argument
mypow <- function(x, exponent=2){
res <- x^exponent
return(res)
}
mypow(2)
## [1] 4
mypow(3,5)
## [1] 243

Variables defined inside a function will be valid only inside the function
res
## Error in eval(expr, envir, enclos): object ’res’ not found
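
A sketch of the same point without triggering an error: the value leaves the function only through return(), so it has to be stored in a new variable (out is a name introduced for this example).

```r
mypow <- function(x, exponent=2){
  res <- x^exponent  # res is local to this call
  return(res)
}
out <- mypow(3)  # keep the returned value in the calling environment
out
## [1] 9
exists("res")    # FALSE in a fresh session, assuming no other object is named res
```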

Use debug() for debugging a function

It runs the function line by line
It lets you see the values of the variables inside the function
Each time the function is redefined, the debug mode is removed
To leave the debugger type c (continue to the end of the function) or Q (quit); use undebug() to remove the debug mode

debug(mypow)

Functions II

Function arguments can be matched by position

bt <- read.table("../Lesson1/example1/BodyTemperature.txt",TRUE, " ") ## This will assign the f
## Gender Age HeartRate Temperature
## 1 M 33 69 97.0
## 2 M 32 72 98.8
## 3 M 42 68 96.2
## 4 F 33 75 97.8
## 5 F 26 68 98.8
## 6 M 37 79 101.3

Function arguments can be called by name

## Call arguments by name (position does not count)
bt <- read.table("../Lesson1/example1/BodyTemperature.txt",sep=" ", header=TRUE)
## Gender Age HeartRate Temperature
## 1 M 33 69 97.0
## 2 M 32 72 98.8
## 3 M 42 68 96.2
## 4 F 33 75 97.8
## 5 F 26 68 98.8
## 6 M 37 79 101.3
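
The difference between the two calling styles can be checked without any data file. A minimal sketch using the mypow function defined earlier:

```r
mypow <- function(x, exponent=2){
  res <- x^exponent
  return(res)
}
mypow(3, 5)             # by position: x = 3, exponent = 5
## [1] 243
mypow(exponent=5, x=3)  # by name: the order does not matter
## [1] 243
mypow(5, 3)             # by position, swapped: x = 5, exponent = 3
## [1] 125
```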

Data Exploration and summary statistics

Develop a high-level understanding of the data

Given a data.frame, let’s understand the data inside.
What variables do we have?
Do they have meaningful names?
What are the variable types? (numeric, boolean, categorical)
What is the distribution of the data?
Are there any categorical variables?

The aim is to reduce the amount of information and focus only on the key aspects of the data

Working with data objects

As an example let’s work on the BodyTemperature dataset.

bt <- read.table("BodyTemperature.txt", header=TRUE, sep=" ", as.is=TRUE)
head(bt) ## Let's look only at the first rows of the data.frame
##   Gender Age HeartRate Temperature
## 1      M  33        69        97.0
## 2      M  32        72        98.8
## 3      M  42        68        96.2
## 4      F  33        75        97.8
## 5      F  26        68        98.8
## 6      M  37        79       101.3

Working with data objects

Get the structure and some useful statistics

str(bt) ## See the structure of the data object
## 'data.frame': 100 obs. of 4 variables:
## $ Gender     : chr "M" "M" "M" "F" ...
## $ Age        : int 33 32 42 33 26 37 32 45 31 49 ...
## $ HeartRate  : int 69 72 68 75 68 79 71 73 77 81 ...
## $ Temperature: num 97 98.8 96.2 97.8 98.8 ...

summary(bt) ## Compute some statistics on each variable in the data.frame
## Gender            Age            HeartRate      Temperature
## Length:100        Min.   :21.0   Min.   :61.0   Min.   : 96.2
## Class :character  1st Qu.:33.8   1st Qu.:69.0   1st Qu.: 97.7
## Mode  :character  Median :37.0   Median :73.0   Median : 98.3
##                   Mean   :37.6   Mean   :73.7   Mean   : 98.3
##                   3rd Qu.:42.0   3rd Qu.:78.0   3rd Qu.: 98.9
##                   Max.   :50.0   Max.   :87.0   Max.   :101.3

names(bt) ## Get the variable names
## [1] "Gender" "Age" "HeartRate" "Temperature"
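
To try these functions without the data file, a sketch can rebuild the six rows shown by head(bt) by hand. The name bt_small is introduced here, and the six rows are only a fragment of the 100-row dataset, so the statistics differ from the ones above.

```r
# The six rows shown by head(bt), typed in by hand for this sketch
bt_small <- data.frame(
  Gender      = c("M", "M", "M", "F", "F", "M"),
  Age         = c(33, 32, 42, 33, 26, 37),
  HeartRate   = c(69, 72, 68, 75, 68, 79),
  Temperature = c(97.0, 98.8, 96.2, 97.8, 98.8, 101.3),
  stringsAsFactors = FALSE   # keep Gender as character, like as.is=TRUE
)
names(bt_small)          # variable names
sapply(bt_small, class)  # one class per column
dim(bt_small)            # 6 rows, 4 columns
mean(bt_small$Age)       # about 33.83 for these six rows
```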

Working with data objects I

Change the variable mode of the columns:

Check the variable modes
is.data.frame(bt) ## Check if the object is a data.frame

## [1] TRUE
is.numeric(bt$Age) ## Check if the mode of the column is numeric
## [1] TRUE
is.character(bt$Gender) ## Check if the mode of the variable Gender is character
## [1] TRUE

Look at the variable Gender, it is categorical, but it’s stored as character

as.factor(bt$Gender) ## Change variable mode Gender into factor (categorical)
## [1] M M M F F M F F F M M F F F F M F M F F F F F M F M M M M F F F M M M
## [36] F F M F F M M F M M M F F F F M F M M F F F M F F F M M F M M F M M M
## [71] F F M M M M F M F M M F F M F M M M F M F F M M F M F F F M
## Levels: F M
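
A minimal sketch of what the conversion does, on a short hypothetical vector: the distinct values become the levels (sorted alphabetically) and each element is stored as an integer code pointing to its level.

```r
gender <- c("M", "M", "F", "M", "F")  # hypothetical values for this sketch
gender_f <- as.factor(gender)
levels(gender_f)      # levels are sorted alphabetically
## [1] "F" "M"
as.integer(gender_f)  # internal codes: F = 1, M = 2
## [1] 2 2 1 2 1
table(gender_f)       # counts per level: F = 2, M = 3
```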

Working with data objects II

Store the changes on the data.frame and check the data.frame

bt$Gender <- as.factor(bt$Gender) ## Store the previous change
str(bt) ## Look at the structure
## 'data.frame': 100 obs. of 4 variables:
## $ Gender : Factor w/ 2 levels "F","M": 2 2 2 1 1 2 1 1 1 2 ...
## $ Age : int 33 32 42 33 26 37 32 45 31 49 ...
## $ HeartRate : int 69 72 68 75 68 79 71 73 77 81 ...
## $ Temperature: num 97 98.8 96.2 97.8 98.8 ...
summary(bt) ## Compute some statistic
## Gender Age HeartRate Temperature
## F:51 Min. :21.0 Min. :61.0 Min. : 96.2
## M:49 1st Qu.:33.8 1st Qu.:69.0 1st Qu.: 97.7
## Median :37.0 Median :73.0 Median : 98.3
## Mean :37.6 Mean :73.7 Mean : 98.3
## 3rd Qu.:42.0 3rd Qu.:78.0 3rd Qu.: 98.9
## Max. :50.0 Max. :87.0 Max. :101.3

Exercise II

1 Define a function that converts km to miles and vice versa.

2 Define a function that checks whether a number is perfect (see Exercise I).
3 Define a function that, given a numeric matrix, returns the log of each element that is > 0 and NA otherwise.
4 Get the dataset SAheart_sub.data from the website and check the type of each column.
Add a column of factor type with the value Alcoholic where alcohol consumption is > 13 and Non-Alcoholic otherwise.

Probability Distributions in R

Probability functions:
Every probability distribution in R has 4 functions, denoted by a root (e.g. norm for the normal distribution) and a prefix:
p for “probability”, the cumulative distribution function (c.d.f.)
$F(x) = P(X \le x)$

q for “quantile”, the inverse of the c.d.f.
$x = F^{-1}(p)$

d for “density”, the density function (p.d.f.); for the standard normal distribution
$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$

r for “random”, random values drawn from the specified distribution

Example:
For the normal distribution we have the functions: pnorm, qnorm, dnorm, rnorm
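
How the four prefixes relate can be seen in a few lines. This is a sketch for the standard normal distribution; the value 1.5 and the seed are arbitrary choices.

```r
p <- pnorm(1.5)  # P(X <= 1.5) for X = N(0,1)
qnorm(p)         # the quantile function inverts the c.d.f.
## [1] 1.5
dnorm(0)         # density at 0, equal to 1/sqrt(2*pi)
set.seed(1)      # seed chosen arbitrarily for this sketch
x <- rnorm(5)    # five random draws from N(0,1)
```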

Probability distributions in R
Available functions

Distribution   p (c.d.f.)   q (quantile)   d (density)   r (random)
Binomial       pbinom       qbinom         dbinom        rbinom
Chi-Square     pchisq       qchisq         dchisq        rchisq
Exponential    pexp         qexp           dexp          rexp
Log Normal     plnorm       qlnorm         dlnorm        rlnorm
Normal         pnorm        qnorm          dnorm         rnorm
Poisson        ppois        qpois          dpois         rpois
Student t      pt           qt             dt            rt
Uniform        punif        qunif          dunif         runif

Check the help (?<function>) for further information on the parameters and the usage of each function.

The Normal Distribution in R
Cumulative Distribution Function

pnorm: computes the cumulative distribution function, where X is normally distributed

$F(x) = P(X \le x)$

## P(X<=2), X=N(0,1)
pnorm(2)
## [1] 0.97725

## P(X<=12), X=N(10,4)
pnorm(12, mean=10, sd=2)
## [1] 0.84134

[Figure: Normal Cumulative; pnorm (0 to 1) plotted for x from −4 to 4]

What is P(X > 19) where X = N(17.4, 375.67)?
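
Two details matter for questions of this kind: N(mean, variance) gives the variance while pnorm() expects the standard deviation, and P(X > x) is the complement of the c.d.f. A sketch on the N(10,4) example from this slide (mu and sigma are names introduced here):

```r
mu    <- 10
sigma <- sqrt(4)  # N(10,4) gives the variance, pnorm() wants the standard deviation

1 - pnorm(12, mean=mu, sd=sigma)                # P(X > 12), about 0.15866
pnorm(12, mean=mu, sd=sigma, lower.tail=FALSE)  # same value, computed directly
```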

The Normal Distribution in R
The quantiles

qnorm: computes the inverse of the c.d.f. Given a number 0 ≤ p ≤ 1 it returns the p-th quantile of the distribution.
$p = F(x)$
$x = F^{-1}(p)$

## X = F^-1(0.95), N(0,1)
qnorm(0.95)
## [1] 1.6449

## X = F^-1(0.95), N(100,625)
qnorm(0.95, mean=100, sd=25)
## [1] 141.12

[Figure: "Normal Density"; y axis pnorm (0 to 1), x axis from −3 to 3, marking p = 0.95 and qnorm(p) = 1.645]

What is the 85-th quantile of X = N(72, 68)?

The Normal Distribution in R
The Density Function

dnorm: computes the probability density function (p.d.f.) of the normal distribution.
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

## f(0.5), X = N(0,1)
dnorm(0.5)
## [1] 0.35207

## f(-2.5), X = N(-1.5,2)
dnorm(-2.5, mean=-1.5, sd=sqrt(2))
## [1] 0.2197

[Figure: Density Function; dnorm (0 to 0.4) plotted for x from −4 to 4]
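
The density formula can be checked against dnorm() by writing it out by hand. A minimal sketch; normal_density is a hypothetical helper name, not an R function.

```r
# Density of N(mu, sigma^2) written out by hand (helper name is hypothetical)
normal_density <- function(x, mu=0, sigma=1){
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}
normal_density(-2.5, mu=-1.5, sigma=sqrt(2))  # same call as on the slide
dnorm(-2.5, mean=-1.5, sd=sqrt(2))
all.equal(normal_density(0.5), dnorm(0.5))
## [1] TRUE
```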

The Normal Distribution in R
The Random Function

rnorm: simulates random variates having a specified normal distribution.

## Extract 1000 samples X = N(0,1)
x <- rnorm(1000)

## Extract 1000 samples X = N(100,225)
x <- rnorm(1000, mean=100, sd=15)

xx <- seq(min(x), max(x), length=100)
hist(x, probability=TRUE)
lines(xx, dnorm(xx, mean=100, sd=15))

[Figure: Histogram of x; Density (0 to 0.025) against x (60 to 140), with the N(100,225) density curve overlaid]

Exercise III

1 Compute the values for p = [0.01, 0.05, 0.1, 0.2, 0.25] given X = N (−2, 8)
2 What is P(X = 1) when X = Bin(25, 0.005)?
3 What is P(13 ≤ X ≤ 22) where X = N (17.46, 375.67)?

Plotting in R

High level plot functions

Function name   Plot produced

plot(x,y)       Plot vector x against vector y
boxplot(x)      "Box and whiskers" plot
hist(x)         Histogram of the frequencies of x
barplot(x)      Bar plot of the values of x
pairs(x)        For a matrix or data.frame, plots all bivariate pairs
image(x,y,z)    3D plot using colors instead of lines

Simple visualization on numeric variables

Visualizing two vectors

x <- 1:10
y <- 1:10
plot(x,y)

[Figure: scatter plot of y against x, both axes from 2 to 10]

Simple visualization on numeric variables

Visualizing two vectors, adding axis labels and changing the plot type
plot(x,y, xlab="X values", ylab="Y values", main="X vs Y", type="b")

[Figure: "X vs Y"; points joined by lines, x axis "X values", y axis "Y values", both from 2 to 10]

More graphical parameters can be found in the help of par (?par)

Additional parameter to graphical functions

Low level plotting functions

Adding points/lines to an existing graph using points(x,y) and lines(x,y)
Adding text to an existing plot using text(x,y,label="")
Adding a legend to a plot using legend(x,y,legend="")

plot(x,y)
abline(0,1)
points(2,3, pch=19)
lines(x,y)
text(4,6, label="Slope=1")

[Figure: scatter plot of y against x (2 to 10) with the line of slope 1, an extra filled point at (2,3) and the text "Slope=1"]

Barplot

The function barplot()

It plots the frequencies of the values of a variable
It is useful for looking at categorical values
It takes a vector or a matrix as input and uses the values as frequencies
barplot(1:10)

[Figure: bar plot of 1:10, y axis from 0 to 10]

Barplot

The function barplot()

Given a matrix as input (Death rates per 1000 population per year in Virginia)
VADeaths
## Rural Male Rural Female Urban Male Urban Female
## 50-54 11.7 8.7 15.4 8.4
## 55-59 18.1 11.7 24.3 13.6
## 60-64 26.9 20.3 37.0 19.3
## 65-69 41.0 30.9 54.6 35.1
## 70-74 66.0 54.3 71.1 50.0
barplot(VADeaths)

[Figure: stacked bar plot of VADeaths, one bar per column (Rural Male, Rural Female, Urban Male, Urban Female), y axis from 0 to 200]

Visualization of categorical variables
Summarize the counts for factors
table(bt$Gender) ## Count the occurrences of each level of the factor

##
## F M
## 51 49
Look at the summary in a bar plot
barplot(table(bt$Gender),
xlab="Gender", ylab="Frequency", main="Summarize Gender variable")

[Figure: "Summarize Gender variable"; bar plot of Frequency (0 to 50) by Gender (F, M)]

Histograms

The function hist()

Normally used to visualize numerical variables
It is similar to a barplot, but values are grouped into bins
For each interval the bar height corresponds to the frequency (count) of observations in that interval
The heights sum to the sample size
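
These properties can be checked on the object that hist() returns. A sketch on simulated data: the seed, mean and sd are arbitrary and only stand in for a real numeric variable.

```r
set.seed(123)                           # arbitrary seed for this sketch
x <- rnorm(100, mean=74, sd=6)          # simulated values standing in for a numeric variable
h <- hist(x, plot=FALSE)                # compute the bins without drawing
h$breaks                                # bin boundaries
h$counts                                # observations per bin
sum(h$counts) == length(x)              # the heights (counts) sum to the sample size
## [1] TRUE
sum(h$density * diff(h$breaks))         # on the density scale the bar areas sum to 1
```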

Look at the distribution of the data

How is the heart rate distributed in our dataset?

Histogram of the HeartRate variable using frequency on the Y axis
hist(bt$HeartRate, col="gray80")

[Figure: Histogram of bt$HeartRate; Frequency (0 to 30) against bt$HeartRate (60 to 90)]

Look at the distribution of the data

Density on the Y axis

hist(bt$HeartRate, col="gray80", freq=FALSE) ## Use parameter freq to change behaviour

[Figure: Histogram of bt$HeartRate; Density (0.00 to 0.06) against bt$HeartRate (60 to 90)]

Look at the distribution of the data

Changing the intervals

hist(bt$HeartRate, col="gray80", breaks=50) ## Use parameter breaks to change intervals

[Figure: Histogram of bt$HeartRate with narrower bins; Frequency (0 to 8) against bt$HeartRate (60 to 85)]

Look at the distribution of the data

Adding information to the histogram, mean and median

hist(bt$HeartRate, col="gray80", main="Histogram of Heart Rate")
abline(v=mean(bt$HeartRate), lwd=3)
abline(v=median(bt$HeartRate), lty=3, lwd=3)
legend("right", legend=c("Mean", "Median"), lty=c(1,3))

[Figure: Histogram of Heart Rate; Frequency (0 to 30) against bt$HeartRate (60 to 90), with vertical lines for the Mean (solid) and the Median (dotted)]

Boxplots

The function boxplot()

Visualize the 5-number summary, the range and the quartiles
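
The five numbers can also be computed directly. A sketch on hypothetical heart rates: fivenum() returns the minimum, lower hinge, median, upper hinge and maximum (the hinges are close to the quartiles), and boxplot.stats() returns the values that boxplot() actually draws.

```r
hr <- c(61, 68, 69, 72, 75, 79, 87)  # hypothetical heart rates for this sketch
fivenum(hr)              # minimum, lower hinge, median, upper hinge, maximum
## [1] 61.0 68.5 72.0 77.0 87.0
boxplot.stats(hr)$stats  # the five values boxplot() draws (the same here, no outliers)
```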

Boxplots

Look at the boxplot for the HeartRate variable

boxplot(bt$HeartRate, horizontal=TRUE, col="grey80")

[Figure: horizontal boxplot of bt$HeartRate, axis from 60 to 85]

Boxplots

Look at the boxplot for the HeartRate variable, with the data points overlaid

boxplot(bt$HeartRate, horizontal=TRUE, col="grey80")

points(bt$HeartRate, rep(1,length(bt$HeartRate)), pch=19) ## See where the data are
abline(h=1, lty=2)

[Figure: horizontal boxplot of bt$HeartRate with the individual observations drawn as points along a dashed line, axis from 60 to 85]

Using factors and formula objects

Using a factor as categorical variable to condition the plot

Conditioning a plot using the factor using the formula object:
bt$HeartRate ~ bt$Gender
The numeric values in bt$HeartRate will be divided according to categories in bt$Gender

boxplot(bt$HeartRate~bt$Gender, horizontal=TRUE, col="grey80")

[Figure: horizontal boxplots of bt$HeartRate for F and M, axis from 60 to 85]

Pairs

The pairs() function
It plots all the possible pairwise comparisons in a data.frame
It allows a fast visual exploration of the data

pairs(bt) ## Look at all possible comparisons at once

[Figure: scatter plot matrix of Gender, Age, HeartRate and Temperature]

Normal plot

Let’s look at the variable HeartRate vs Temperature

See the use of ~ in the plot command: in y ~ x the variable on the left goes on the y axis
## plot(bt$HeartRate, bt$Temperature)
plot(bt$HeartRate~bt$Temperature, main="Heart Rate vs Temperature")

[Figure: "Heart Rate vs Temperature"; scatter plot of bt$HeartRate (60 to 85) against bt$Temperature (96 to 101)]

Multiple plots in the same window
Put more information together in the same plot window
par(mfrow=c(2,1)) ## mfrow=c(2,1) defines 2 rows and 1 column, allowing 2 plots
hist(bt$HeartRate, col="grey80", main="HeartRate histogram")
abline(v=mean(bt$HeartRate), lwd=3)
abline(v=median(bt$HeartRate), lty=3, lwd=3)
legend("right", legend=c("Mean", "Median"), lty=c(1,3))
boxplot(bt$HeartRate~bt$Gender, horizontal=TRUE, col=c( "pink", "blue"))
title("Boxplot for different gender")
points(bt$HeartRate[bt$Gender=="F"], rep(1,length(bt$HeartRate[bt$Gender=="F"])), pch=19)
points(bt$HeartRate[bt$Gender=="M"], rep(2,length(bt$HeartRate[bt$Gender=="M"])), pch=19)

[Figure: top panel "HeartRate histogram", Frequency against bt$HeartRate (60 to 90) with Mean and Median lines; bottom panel "Boxplot for different gender", horizontal boxplots for F and M with the data points overlaid (60 to 85)]

Exporting graphs
It is possible to export graphs in different formats
Png, Jpg, Pdf, Eps, Tiff
Look at the help for the functions pdf, png
pdf("myfirstgraph.pdf") ## Start the pdf device
par(mfrow=c(2,1))
hist(bt$HeartRate, col="grey80", main="HeartRate histogram")
boxplot(bt$HeartRate, horizontal=TRUE, col="grey80", main="Boxplot")
dev.off() ## Switch off the device

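A self-contained sketch of the same export that runs without the data file. The simulated vector hr, the seed and the temporary file path are assumptions made for this example; everything drawn between pdf() and dev.off() goes into the file instead of the screen.

```r
out_file <- tempfile(fileext=".pdf")  # temporary path, chosen for this sketch
set.seed(42)                          # arbitrary seed
hr <- rnorm(100, mean=74, sd=6)       # simulated values standing in for bt$HeartRate

pdf(out_file)                         # open the pdf device
par(mfrow=c(2,1))
hist(hr, col="grey80", main="HeartRate histogram")
boxplot(hr, horizontal=TRUE, col="grey80", main="Boxplot")
dev.off()                             # close the device: the file is complete only after this

file.exists(out_file)                 # check that the file was written
```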
Looking at a probability distribution in a plot

What does a sample from a normal distribution look like?

Extract enough samples from a N(0, 1)
Use a histogram to look at the data
x <- seq(-3,3,by=0.1) ## Create a vector of x values
y <- dnorm(x) ## Compute the normal density function over the vector x
plot(x,y,type="l") ## Plot it
[Figure: normal density curve, y (0 to 0.4) against x (−3 to 3)]

Data in R

R comes with many datasets included

Look at all the available datasets with:
data() ## See all the available datasets
data(package = .packages(all.available = TRUE)) ## See all the available dataset in all the pav
## Warning in data(package = .packages(all.available = TRUE)): datasets have been moved from package ’base’ to package ’datasets’
## Warning in data(package = .packages(all.available = TRUE)): datasets have been moved from package ’stats’ to package ’datasets’

Get the VADeaths dataset from the datasets package

data(VADeaths, package="datasets") ## Load the dataset
## ls() ## Check whether the dataset has been loaded
## ?VADeaths ## Look at the documentation

Exercise I

1 Define a function that converts Celsius to Fahrenheit

Starting from the function defined before, think about using an argument to compute the inverse (Fahrenheit to Celsius)

2 Define a function that, given a number, computes the Fibonacci series

What can happen if a non-integer or a negative number is given?

3 Define a function that, given a number, checks whether it is a prime number

4 Two integer numbers are “friends” if the ratio between the sum of the divisors and the number itself is the same for both numbers. For example, the sum of the divisors of 6 is 1 + 2 + 3 + 6 = 12. The sum of the divisors of 28 is 1 + 2 + 4 + 7 + 14 + 28 = 56. Then 12 / 6 = 56 / 28 = 2, thus 6 and 28 are “friends”.
Define a function that, given 2 numbers as input, checks if the numbers are “friends”.

5 Fix the number of samples to 1000 and extract samples from at least 8 distributions N(m, 1) where m ∈ [−3, 3].
With the same number of samples, extract samples from at least 8 distributions N(0, s) where s ∈ [0.1, 2].
Plot the results in the same window with 3 different plots: one for N(m, 1), one for N(0, s) and one for N(m, 1) and N(0, s) together. Decide the color code for each line
suggestion: search for “R color charts” on Google and look at the function colors() in R

Plot the different distributions on the same plot

Exercise II

6 Extract from a normal distribution an increasing number of samples (10-10000) and look at the differences in the distribution between sample sizes

7 The dataset Pima.tr collects samples from the US National Institute of Diabetes and Digestive and Kidney Diseases. It includes 200 women of Pima Indian heritage living near Phoenix, Arizona.
Get the dataset from the MASS package or download it from the website.
Describe the dataset: how many variables, which types of variables, how many samples ...
What do the variables mean?
Get the frequencies of the women affected by diabetes.
Explore the dataset using histograms, barplots and plots. For each plot, describe what you see and why you made that plot.
Use the categorical variable type to see if there is any difference in the distributions of the age, bmi, and glu variables
