
Đại Học Quốc Gia Đại Học Bách Khoa Tp Hồ Chí Minh: Subject: probability and statistics

1. This document describes a study that analyzed data from 78 participants on three different diets. 2. Descriptive statistics were calculated and graphs were created to visualize the data. A t-test found a significant difference between pre-weight and weight after 10 weeks. 3. A one-way ANOVA determined there was a significant difference in weight lost between the three diets, and a Tukey's test showed Diet 3 was the most effective. 4. A two-way ANOVA revealed a significant interaction between diet and gender - diet affected weight lost differently for males versus females.

Uploaded by

THUẬN LÊ MINH

ĐẠI HỌC QUỐC GIA

ĐẠI HỌC BÁCH KHOA TP HỒ CHÍ MINH




Subject: probability and statistics


Project 2
Teacher in charge: Nguyễn Tiến Dũng
Group: 2
Class: CC02

Name Student ID
Từ Hữu Thịnh 1952471

Thành phố Hồ Chí Minh – 2020


Project 2-Topic 3
This data set contains information on 78 people using one of three diets (The
University of Sheffield). Attribute Information:
• Person: Participant - number
• gender : Gender (1 = male, 0 = female) - Binary
• Age: Age (years) - Scale
• Height: Height (cm) - Scale
• preweight: Weight before the diet (kg) - Scale
• Diet: Diet (1, 2 or 3) - Categorical
• weight6weeks: Weight after 6 weeks (kg) - Scale
• weightLOST: Weight lost after 6 weeks (kg) - Scale
Steps:
1. Import data: Diet.csv
2. Data cleaning: NA (Not available)
3. Data visualization
(a) Descriptive statistics for each of the variables
(b) Graphs: boxplot.
4. t.test: between pre.weight and weight6weeks
5. One way ANOVA: What is the best diet for weight loss?
6. Two way ANOVA: How do Diet and gender affect weightLOST?
Solution
Step 1: Import data
1.1 Install RStudio

1.2 Install packages:

For the calculations and data illustrations we use the tidyverse package,
which is loaded with the library() function.

>library(tidyverse)

Note: the above command may fail if the tidyverse package has not been
installed yet. In that case, use the install.packages() function first:

>if(!require(tidyverse)) install.packages("tidyverse")

To import a file from another source into RStudio, here a .csv file exported
from Excel, we use the readr package

> library(readr)

1.3 Import dataset: Diet1.csv


library(readr)
Diet1 <- read_csv("D:/Download/Diet1.csv")
View(Diet1)
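
After importing, it can help to check how each column was read. The sketch below is only an illustration: it uses base R's read.csv() on a small inline text instead of readr, and the three rows are hypothetical values, not rows of Diet1.csv. It shows that a Diet column coded 1, 2, 3 is read as a number and has to be converted to a factor before it can be treated as a grouping variable.

```r
# Hypothetical three-row CSV; the values are made up for illustration
csv_text <- "Person,gender,Age,Height,pre.weight,Diet,weight6weeks
1,0,22,159,58,1,54.2
2,1,46,192,60,2,57.0
3,,55,170,64,3,59.6"

# Base R reader used here so the sketch needs no extra packages
toy <- read.csv(text = csv_text)

# Show the type of every column
str(toy)

# Diet is read as an integer, not as a category
diet_class_before <- class(toy$Diet)
print(diet_class_before)

# Convert it to a factor so that it is treated as a grouping variable
toy$Diet <- factor(toy$Diet)
print(levels(toy$Diet))

# An empty cell in a numeric column becomes NA
n_missing_gender <- sum(is.na(toy$gender))
print(n_missing_gender)
```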
2/ Data visualization
a) Name the columns:
pre.weight <- Diet1$pre.weight
weight6weeks <- Diet1$weight6weeks
Diet <- Diet1$Diet
gender <- Diet1$gender
Age <- Diet1$Age
Height <- Diet1$Height

b) Compute descriptive statistics


- Use the table() command to cross-tabulate Diet with the other variables.
- Use tapply(…, …, mean, na.rm = TRUE) to find the mean value for each
type of Diet. In this case we have type 1, type 2 and type 3.
 table(Diet,gender)

In this case we have


- Diet type 1: 14 female users and 10 male users
- Diet type 2: 14 female users and 11 male users
- Diet type 3: 15 female users and 12 male users
This gives a total of 76 users, although the dataset has 78 people, because
table() leaves out the 2 N/A values in gender. They can be located with the
commands is.na() and which(is.na()):
 is.na(gender)
 which(is.na(gender))
From the output above, the 2 N/A values are in the first and second cells of
the gender column.
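
The same check can be extended to every column at once, which covers the data cleaning step of the assignment. This is a minimal sketch on a hypothetical toy data frame (the values are not taken from Diet1.csv); dropping incomplete rows with complete.cases() is one possible cleaning choice, not necessarily the one used in this report.

```r
# Hypothetical toy data frame; values are illustrative only
toy <- data.frame(
  gender = c(NA, NA, 0, 1, 0, 1),
  Diet   = c(2, 2, 1, 1, 3, 3),
  Age    = c(25, 31, 22, 46, 55, 33)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Row positions of the missing gender values
na_rows <- which(is.na(toy$gender))
print(na_rows)

# One possible cleaning step: keep only the rows without any NA
toy_clean <- toy[complete.cases(toy), ]
print(nrow(toy_clean))
```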
 table(Diet,Age)
 tapply(Age,Diet,mean,na.rm=TRUE)

Here we get the ages of the people on each type of Diet and the mean age for
each type.
- Type 1: average 40.87500 years old
- Type 2: average 39.00000 years old
- Type 3: average 37.77778 years old
Similarly, we use the 2 commands above for the other variables.
 table(Diet,Height)
 tapply(Height,Diet,mean,na.rm=TRUE)
 table(Diet,pre.weight)
 tapply(pre.weight,Diet,mean,na.rm=TRUE)
 table(Diet,weight6weeks)
 tapply(weight6weeks,Diet,mean,na.rm=TRUE)
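
The spelling of the na.rm argument matters, because mean() silently ignores argument names it does not know. The following sketch, on hypothetical values, shows the effect of na.rm = TRUE inside tapply() and what happens when the argument is misspelled.

```r
# Hypothetical values for illustration only
diet_toy   <- c(1, 1, 2, 2, 3, 3)
height_toy <- c(160, NA, 170, 172, 165, 175)

# Without na.rm, a group that contains an NA has mean NA
means_with_na <- tapply(height_toy, diet_toy, mean)
print(means_with_na)

# na.rm = TRUE is forwarded to mean() and drops the NA first
means_no_na <- tapply(height_toy, diet_toy, mean, na.rm = TRUE)
print(means_no_na)

# A misspelled argument name is accepted without error and has no effect
misspelled <- mean(c(1, NA), na.rn = TRUE)
print(misspelled)
```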
c) Boxplot for the weightlost variable
First, we have to create a new variable weightlost by the formula:
 weightlost = pre.weight – weight6weeks
Command in R: Diet1$weightlost = (pre.weight)-(weight6weeks)

 Diet1 now has a weightlost column, which we also store under its own name:


weightlost <- Diet1$weightlost

The boxplot is used to compare the distribution of weightlost (median and
spread) across the 3 types of diet, and from that we can form a first
impression of which type of diet is best.
Command in R:
 boxplot(weightlost~Diet,data=Diet1,col="light blue",
 ylab = "Weight lost", xlab = "Diet type")
From the plot, diet type 3 appears to be the best.
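
Note that the thick line inside each box is the group median, not the group mean. The sketch below uses hypothetical values to show how the numbers behind a boxplot can be read without drawing it, and how the median and the mean can differ when a group contains an extreme value.

```r
# Hypothetical weight-loss values for two groups, for illustration only
loss  <- c(1, 2, 3, 10,   # group 1 has one extreme value
           3, 4, 5, 6)    # group 2 is symmetric
group <- factor(rep(c("1", "2"), each = 4))

# plot = FALSE returns the boxplot statistics instead of drawing them
bp <- boxplot(loss ~ group, plot = FALSE)

# Row 3 of bp$stats holds the median of each group
medians <- bp$stats[3, ]
print(medians)

# Group means for comparison
means <- tapply(loss, group, mean)
print(means)
```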
3/ T-test between pre.weight and weight6weeks:
Here we use a paired t-test, because the two measurements are taken on the
same people.
Definition: A paired t-test is used to compare two population means where you
have two samples in which observations in one sample can be paired with
observations in the other sample. Examples of where this might occur are:
 Before-and-after observations on the same subjects.
 A comparison of two different methods of measurement or two different
treatments where the measurements/treatments are applied to the same
subjects.
Our data correspond to the first example.
- First, we draw a boxplot of pre.weight and weight6weeks to get a general
idea of whether the diets have an effect.
Command in R: boxplot(pre.weight,weight6weeks)
The plot suggests that weight is lower after 6 weeks of dieting => the diets
appear to be effective for losing weight.
To quantify the difference between the mean of pre.weight and the mean of
weight6weeks, we use this command in R:
 t.test(pre.weight,weight6weeks,paired = TRUE)

So the mean of the differences is 3.844872 kg.
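
A paired t-test is a one-sample t-test on the differences: with d = before - after, the statistic is t = mean(d) / (sd(d) / sqrt(n)) with n - 1 degrees of freedom. The sketch below checks this against t.test() on hypothetical before/after weights, which are not values from Diet1.csv.

```r
# Hypothetical before/after weights (kg), for illustration only
before <- c(60, 72, 85, 64, 90, 78)
after  <- c(57, 70, 80, 63, 85, 75)

# Differences within each pair
d <- before - after
n <- length(d)

# t statistic by hand: mean difference divided by its standard error
t_manual <- mean(d) / (sd(d) / sqrt(n))

# The same test through t.test()
res <- t.test(before, after, paired = TRUE)

print(t_manual)
print(unname(res$statistic))  # should match t_manual
print(unname(res$estimate))   # mean of the differences
print(res$p.value)
```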


4/ One-way ANOVA: what is the best diet for weight loss?
One-way ANOVA is a hypothesis test that evaluates two mutually exclusive
statements about two or more population means. These two statements are called
the null hypothesis and the alternative hypothesis. A hypothesis test uses sample
data to determine whether to reject the null hypothesis.

For one-way ANOVA, the hypotheses for the test are the following:

 The null hypothesis (H0) is that the group means are all equal.
 The alternative hypothesis (HA) is that not all group means are equal.

H0: null hypothesis, the means are equal


H1: alternative hypothesis, the means are not all equal

P < significance level (0.05 in this case) => we reject H0 => there is a
significant difference in mean weightlost between the diet types.

The aim of the study was to see which diet was best for losing weight, so the
independent variable is Diet.
Command in R:
To run the ANOVA, we use aov(), and summary() to see the result. Diet is
converted with as.factor() so that it is treated as a grouping variable.
 anova1 <- aov(weightlost~as.factor(Diet))
 summary(anova1)

P = 0.0323 < significance level = 0.05 => there is a significant difference in
mean weightlost between the diet types. To conclude which diet type is best
for weight loss, we run Tukey's test.

Usage of Tukey's test: The purpose of Tukey's test is to figure out which groups
in your sample differ. It uses the "Honest Significant Difference", a number that
represents the distance between groups, to compare every mean with every
other mean.

Command in R:

 TukeyHSD(anova1)

It states that diet type 3 is the best for weight loss, in line with what we
saw from the boxplot.
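
The output of TukeyHSD() can also be read programmatically. The sketch below uses hypothetical values and a grouping variable stored as a factor, which TukeyHSD() requires; it lists the pairs of groups whose adjusted p-value is below 0.05.

```r
# Hypothetical weight-loss values for three groups, for illustration only
loss  <- c(2, 3, 4, 1, 3, 2, 5, 6, 7)
group <- factor(rep(c("1", "2", "3"), each = 3))

fit <- aov(loss ~ group)
tuk <- TukeyHSD(fit)

# One row per pair of groups; columns are diff, lwr, upr and "p adj"
pairs <- tuk$group
print(pairs)

# Pairs whose adjusted p-value is below the 0.05 significance level
significant <- rownames(pairs)[pairs[, "p adj"] < 0.05]
print(significant)
```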

5/ Two-way ANOVA
A two-way ANOVA is used to estimate how the mean of a quantitative
variable changes according to the levels of two categorical variables. Use a two-
way ANOVA when you want to know how two independent variables, in
combination, affect a dependent variable.
Command in R:
 anova2<-aov(weightlost~as.factor(gender)*as.factor(Diet),data=Diet1)
 summary(anova2)

As mentioned above, gender has 2 N/A values; aov() drops those rows by
default.
Since the interaction effect is significant (p = 0.049 < significance level
0.05) => the effect of Diet cannot be generalised to males and females
together. To see this more clearly, we can draw the interaction plot.
Command in R:
 interaction.plot(Diet,gender,weightlost,type="b",col=c(2:3),
 leg.bty="o",leg.bg="beige",lwd=2,pch=c(18,24),
 xlab="Diet",ylab="Weight lost",main="Interaction plot")
The means (or interaction) plot clearly shows that diet affects weight lost
differently for males and females, which agrees with the statistically
significant interaction between the effects of Diet and gender on weightlost.
For males, the mean weightlost differs little between the diet types. For
females, the mean weightlost differs greatly between the diet types.
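
The points drawn in an interaction plot are the mean responses in each gender and Diet cell. As a minimal sketch on a hypothetical data set (2 genders, 3 diets, 2 people per cell; none of the values come from Diet1.csv), the cell means and the interaction row of the ANOVA table can be obtained like this:

```r
# Hypothetical data, for illustration only
toy <- data.frame(
  gender = factor(rep(c("female", "male"), each = 6)),
  Diet   = factor(rep(rep(c("1", "2", "3"), each = 2), times = 2)),
  loss   = c(2, 3, 2, 2, 6, 7,   # female: diets 1, 2, 3
             4, 3, 4, 4, 4, 5)   # male:   diets 1, 2, 3
)

# Mean loss in every gender x Diet cell (the points of an interaction plot)
cell_means <- tapply(toy$loss, list(toy$gender, toy$Diet), mean)
print(cell_means)

# Two-way ANOVA with interaction; the gender:Diet row tests the interaction
fit2 <- aov(loss ~ gender * Diet, data = toy)
tab  <- summary(fit2)[[1]]
print(tab)
```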
