Đại Học Quốc Gia Đại Học Bách Khoa Tp Hồ Chí Minh: Subject: probability and statistics
Name Student ID
Từ Hữu Thịnh 1952471
For the calculations and data visualization we use the tidyverse package,
which we load with the library function.
>library(tidyverse)
Note: the above command may fail with an error if the tidyverse package has
not been installed. In that case, use the install.packages function to fix
it:
>if(!require(tidyverse)) install.packages("tidyverse")
To import a file from another source into RStudio, in this case a .csv file
from Excel, we use the readr package:
> library(readr)
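As a minimal sketch of the import step, the snippet below writes a tiny made-up table to a temporary .csv file and reads it back with read_csv. The column names follow the variables used later in this report; the values and the temporary file are placeholders and not the real data set.

```r
library(readr)

# Hypothetical stand-in for the real file: three made-up rows
tmp <- tempfile(fileext = ".csv")
writeLines(c("gender,Height,pre.weight,Diet,weight6weeks",
             "0,159,58,1,54.2",
             "1,192,60,2,56.0",
             "0,170,64,3,"), tmp)

# read_csv() returns a tibble; the empty last field is read as NA
Diet1 <- read_csv(tmp)
print(Diet1)
```

With the real file, only the path passed to read_csv would change.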
This gives the ages of the subjects by diet type and the mean age for each
type:
-Type 1: average 40.87500 years old
-Type 2: average 39.00000 years old
-Type 3: average 37.77778 years old
Similarly, we use the same two commands (table and tapply) for the other
variables:
table(Diet,Height)
tapply(Height,Diet,mean,na.rm=TRUE)
table(Diet,pre.weight)
tapply(pre.weight,Diet,mean,na.rm=TRUE)
table(Diet,weight6weeks)
tapply(weight6weeks,Diet,mean,na.rm=TRUE)
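The role of na.rm = TRUE in these commands can be seen in a small sketch. The vectors below are made up for illustration and are not the report's data.

```r
# Made-up values for illustration only
Diet   <- c(1, 1, 2, 2, 3, 3)
Height <- c(160, NA, 170, 180, 165, 175)

# Frequency table: how many subjects fall in each Diet/Height cell
print(table(Diet, Height))

# Mean height per diet; na.rm = TRUE is passed on to mean(),
# so the missing height in diet 1 is skipped instead of giving NA
means <- tapply(Height, Diet, mean, na.rm = TRUE)
print(means)

# Without na.rm the mean for diet 1 is NA
means_raw <- tapply(Height, Diet, mean)
print(means_raw)
```

Note that the argument is spelled na.rm (for "remove"). A misspelled name is simply passed through to mean and ignored, so missing values would not be removed.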
c) Boxplot for the weightlost variable
First, we create a new variable weightlost using the formula:
weightlost = pre.weight - weight6weeks
Command in R: Diet1$weightlost = (pre.weight)-(weight6weeks)
The boxplot is used to compare the distribution of weightlost (median and
spread) across the 3 diet types, which suggests which type of diet works best.
Command in R:
boxplot(weightlost~Diet,data=Diet1,col="light blue",
ylab = "Weight lost", xlab = "Diet type")
From the plot, diet type 3 appears to be the best.
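The two steps above can be sketched on a small made-up data frame. The values are hypothetical, and the columns are referenced through the data frame, so the sketch does not rely on attach().

```r
# Made-up data frame standing in for Diet1
Diet1 <- data.frame(
  Diet         = factor(c(1, 1, 2, 2, 3, 3)),
  pre.weight   = c(60, 72, 65, 80, 70, 68),
  weight6weeks = c(57, 70, 62, 77, 64, 63)
)

# Weight lost = weight before the diet minus weight after 6 weeks
Diet1$weightlost <- Diet1$pre.weight - Diet1$weight6weeks

# The thick line inside each box is the group median, not the mean
med <- tapply(Diet1$weightlost, Diet1$Diet, median)
print(med)

boxplot(weightlost ~ Diet, data = Diet1, col = "light blue",
        ylab = "Weight lost", xlab = "Diet type")
```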
3/ T-test between pre.weight and weight6weeks:
Here we use a paired t-test, because our data match its definition.
Definition: A paired t-test is used to compare two population means where you
have two samples in which observations in one sample can be paired with
observations in the other sample. Examples of where this might occur are:
-Before-and-after observations on the same subjects.
-A comparison of two different methods of measurement or two different
treatments where the measurements/treatments are applied to the same
subjects.
Our case matches the first example.
-First, we draw a boxplot of pre.weight and weight6weeks side by side to get a
general idea of whether the diet has an effect.
Command in R: boxplot(pre.weight,weight6weeks)
The plot shows lower weights after 6 weeks on the diet => the diet appears to
be effective for losing weight.
To quantify the mean difference between pre.weight and weight6weeks and test
whether it is significant, we use this command in R:
t.test(pre.weight,weight6weeks,paired = TRUE)
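To illustrate what the paired test does, the sketch below runs it on made-up before/after weights for six hypothetical subjects and checks that it gives the same result as a one-sample t-test on the within-subject differences.

```r
# Made-up before/after weights for six hypothetical subjects
before <- c(60, 72, 65, 80, 70, 68)
after  <- c(57, 70, 62, 77, 64, 63)

# Paired test: each subject is compared with itself
paired <- t.test(before, after, paired = TRUE)
print(paired)

# Equivalent view: one-sample t-test on the differences against 0
on_diff <- t.test(before - after, mu = 0)
print(all.equal(paired$p.value, on_diff$p.value))
```

The estimate reported by the paired test is the mean of before - after, so a positive value means weight was lost on average.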
4/ One-way ANOVA
For one-way ANOVA, the hypotheses for the test are the following:
The null hypothesis (H0) is that the group means are all equal.
The alternative hypothesis (HA) is that not all group means are equal.
If p < significance level (0.05 in this case) => we reject H0 => there is a
significant difference in mean weightlost between diet types.
The aim of the study was to see which diet was best for losing weight, so the
independent variable is Diet.
Command in R:
To run the ANOVA, we use aov() and summary() to see the result:
anova1 <- aov(weightlost~Diet)
summary(anova1)
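A minimal sketch of the same call on made-up data is given here. It assumes that Diet is stored as a number (codes 1, 2, 3), in which case it has to be converted to a factor first. The data frame toy and its values are hypothetical.

```r
# Made-up data: three hypothetical diet groups of four subjects each
toy <- data.frame(
  Diet       = rep(c(1, 2, 3), each = 4),
  weightlost = c(2.0, 3.1, 2.5, 3.4,
                 2.8, 3.0, 3.6, 2.2,
                 5.1, 4.4, 6.0, 5.5)
)

# Diet is numeric here, so convert it to a factor; otherwise aov()
# would fit a straight line through the codes 1, 2, 3 (1 df, not 2)
toy$Diet <- as.factor(toy$Diet)

fit <- aov(weightlost ~ Diet, data = toy)
print(summary(fit))

# Pull the degrees of freedom and p-value for Diet out of the table
df_diet <- summary(fit)[[1]][["Df"]][1]
p_diet  <- summary(fit)[[1]][["Pr(>F)"]][1]
print(p_diet < 0.05)
```

The numbers printed by this sketch come from the toy values only. The result discussed in the report is the one from the real data set.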
P = 0.0323 < significance level = 0.05 => there is a significant difference in
mean weightlost between diet types. To determine which diet type is the best
for weight loss, we run Tukey’s test.
Usage of Tukey’s test: The purpose of Tukey’s test is to figure out which groups
in your sample differ. It uses the “Honest Significant Difference,” a number that
represents the distance between groups, to compare every mean with every
other mean.
Command in R:
TukeyHSD(anova1)
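How to read the output of TukeyHSD can be sketched on made-up data. The data frame toy is hypothetical. The grouping variable must be a factor, since TukeyHSD only works on factor terms of the fitted model.

```r
# Made-up data: three hypothetical diet groups of four subjects each
toy <- data.frame(
  Diet       = factor(rep(c(1, 2, 3), each = 4)),
  weightlost = c(2.0, 3.1, 2.5, 3.4,
                 2.8, 3.0, 3.6, 2.2,
                 5.1, 4.4, 6.0, 5.5)
)

fit <- aov(weightlost ~ Diet, data = toy)
tk  <- TukeyHSD(fit)
print(tk)

# One row per pair of diets, with the difference in means (diff),
# the confidence bounds (lwr, upr) and the adjusted p-value (p adj)
pairs <- tk$Diet
signif_pairs <- rownames(pairs)[pairs[, "p adj"] < 0.05]
print(signif_pairs)
```

A row named "3-1" compares diet 3 with diet 1. A positive diff with an adjusted p-value below 0.05 means diet 3 led to significantly more weight lost than diet 1 in that toy sample.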
It indicates that diet type 3 is the best for weight loss, in line with what we
saw from the boxplot.
5/ Two-way ANOVA
A two-way ANOVA is used to estimate how the mean of a quantitative
variable changes according to the levels of two categorical variables. Use a two-
way ANOVA when you want to know how two independent variables, in
combination, affect a dependent variable.
Command in R:
anova2<-aov(weightlost~as.factor(gender)*as.factor(Diet),data=Diet1)
summary(anova2)
As mentioned above, there are 2 N/A values, so the rows containing them are
dropped by this command.
Since the interaction effect is significant (p = 0.049 < significance level
0.05) => the effect of Diet cannot be generalised for males and females
together. To see this more clearly, we draw an interaction plot.
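The handling of missing values can be checked with a small sketch. The data frame toy, its values and the 0/1 coding of gender are assumptions made for illustration. One gender value is set to NA to show that aov() leaves that row out.

```r
# Made-up data: 2 genders x 3 diets, two hypothetical subjects per cell
toy <- data.frame(
  gender     = rep(c(0, 1), each = 6),
  Diet       = rep(c(1, 2, 3), times = 4),
  weightlost = c(2.0, 3.0, 5.5, 2.4, 3.3, 6.1,
                 3.0, 3.2, 3.1, 3.4, 2.9, 3.5)
)
toy$gender[1] <- NA  # one missing value, as an example

# The * operator fits both main effects and their interaction
fit2 <- aov(weightlost ~ as.factor(gender) * as.factor(Diet), data = toy)
print(summary(fit2))

# Rows with NA are omitted by default: 12 rows in, 11 rows used
n_used <- length(residuals(fit2))
print(n_used)
```

The summary also prints a note on how many observations were deleted due to missingness, which is a quick way to confirm that rows were dropped.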
Command in R:
interaction.plot(Diet,gender,weightlost,
type="b",col=c(2:3),leg.bty="o",
leg.bg="beige",lwd=2,pch=c(18,24),
xlab="Diet",ylab="Weight lost",
main="Interaction plot")
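The points drawn by interaction.plot are the mean of the response in each Diet/gender cell. The sketch below computes those cell means directly on made-up data. The data frame toy and the 0/1 coding of gender are hypothetical.

```r
# Made-up data: 2 genders x 3 diets, two hypothetical subjects per cell
toy <- data.frame(
  gender     = factor(rep(c(0, 1), each = 6)),
  Diet       = factor(rep(c(1, 2, 3), times = 4)),
  weightlost = c(2.0, 3.0, 5.5, 2.4, 3.3, 6.1,
                 3.0, 3.2, 3.1, 3.4, 2.9, 3.5)
)

# Cell means: rows are diet types, columns are genders
cell_means <- with(toy, tapply(weightlost, list(Diet, gender), mean))
print(cell_means)

# Same means drawn as one line per gender
with(toy, interaction.plot(Diet, gender, weightlost, type = "b",
                           xlab = "Diet", ylab = "Weight lost"))
```

Lines that are roughly parallel suggest no interaction. Lines that diverge or cross, as for the two toy genders here, suggest that the effect of diet depends on gender.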
The means (or interaction) plot clearly shows a difference between males and
females in the way that diet affects weight lost. This agrees with the
statistically significant interaction between the effects of Diet and gender
on weightlost. For males, mean weightlost differs little between diet types.
For females, mean weightlost differs greatly between diet types.