Đại Học Quốc Gia Đại Học Bách Khoa Tp Hồ Chí Minh: Subject: probability and statistics

1. This document describes a study that analyzed data from 78 participants on three different diets. 2. Descriptive statistics were calculated and graphs were created to visualize the data. A t-test found a significant difference between pre-weight and weight after 10 weeks. 3. A one-way ANOVA determined there was a significant difference in weight lost between the three diets, and a Tukey's test showed Diet 3 was the most effective. 4. A two-way ANOVA revealed a significant interaction between diet and gender - diet affected weight lost differently for males versus females.

Uploaded by

THUẬN LÊ MINH

ĐẠI HỌC QUỐC GIA

ĐẠI HỌC BÁCH KHOA TP HỒ CHÍ MINH



Subject: Probability and Statistics

Project 2
Teacher in charge: Nguyễn Tiến Dũng
Group: 2
Class: CC02

Name Student ID
Từ Hữu Thịnh 1952471

Ho Chi Minh City – 2020

Project 2-Topic 3
This data set contains information on 78 people, each following one of three
diets (source: The University of Sheffield). Attribute information:
• Person: Participant - number
• gender : Gender (1 = male, 0 = female) - Binary
• Age: Age (years) - Scale
• Height: Height (cm) - Scale
• preweight: Weight before the diet (kg) - Scale
• Diet: Diet (type 1, 2 or 3) - Nominal
• weight6weeks: Weight after 6 weeks (kg) - Scale
• weightLOST: Weight lost after 6 weeks (kg) - Scale
Steps:
1. Import data: Diet.csv
2. Data cleaning: NA (Not available)
3. Data visualization
(a) Descriptive statistics for each of the variables
(b) Graphs: boxplot.
4. t.test: between pre.weight and weight6weeks
5. One way ANOVA: What is the best diet for weight loss?
6. Two way ANOVA: How do Diet and gender affect weightLOST?
Solution
Step 1: Import data
1.1 Install RStudio

1.2 Install packages:

For the calculations and plots we use the tidyverse package, which is loaded
with the library() function.

>library(tidyverse)

Note: the command above fails if the tidyverse package has not been installed.
In that case, install it first with the install.packages() function:

>if(!require(tidyverse)) install.packages("tidyverse")

To import a file from an external source into RStudio, here a .csv file
exported from Excel, we use the readr package.

> library(readr)

1.3 Import dataset: Diet1.csv

library(readr)
Diet1 <- read_csv("D:/Download/Diet1.csv")  # adjust the path to where the file is stored
View(Diet1)                                 # open the data in the RStudio viewer
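Before moving on, it helps to confirm that the import produced the expected columns and to count the missing values per column. The sketch below uses base R's read.csv() on a small inline CSV so that it runs without the real file. The three rows and the name diet_demo are made up for illustration and are not taken from Diet1.csv.

```r
# Hypothetical three-row extract with the same column names as the data set.
# The empty gender field in the first row is read as NA.
csv_text <- "Person,gender,Age,Height,pre.weight,Diet,weight6weeks
1,,41,171,60,2,60.0
2,0,22,159,58,1,54.2
3,1,36,178,88,3,84.5"

diet_demo <- read.csv(text = csv_text)

str(diet_demo)                          # column names and types
na_per_column <- colSums(is.na(diet_demo))
print(na_per_column)                    # number of missing values per column
```

The same two checks, str() and colSums(is.na()), can be applied to Diet1 after the real import.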
2/ Data visualization
a) Assign each column to a variable:
pre.weight <- Diet1$pre.weight
weight6weeks <- Diet1$weight6weeks
Diet <- Diet1$Diet
gender <- Diet1$gender
Age <- Diet1$Age
Height <- Diet1$Height

b) Compute descriptive statistics

- Use the table() command to cross-tabulate Diet against the other variables.
- Use tapply(…,…,mean,na.rm=TRUE) to find the mean of a variable for each
type of Diet. In this case we have type 1, type 2 and type 3.
 table(Diet,gender)

The output shows:

- Diet type 1: 14 female users and 10 male users
- Diet type 2: 14 female users and 11 male users
- Diet type 3: 15 female users and 12 male users
This gives a total of 76 users, although the data set has 78 people, because
gender has 2 NA values. They can be located with the commands is.na() and
which(is.na()):
 is.na(gender)
 which(is.na(gender))
The output shows that the 2 NA values are in the first and second cells of the
gender column.
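As a minimal sketch of how these two commands behave, the toy vector below (made-up values, not the real gender column) has its missing entries in positions 1 and 2:

```r
# Toy gender column (hypothetical values); NA marks a missing entry.
gender_demo <- c(NA, NA, 0, 1, 0, 1)

missing_flags <- is.na(gender_demo)    # logical vector, TRUE where the value is missing
missing_rows  <- which(missing_flags)  # positions of the missing entries
n_missing     <- sum(missing_flags)    # how many entries are missing

print(missing_rows)
print(n_missing)

# table() drops NA by default; useNA = "ifany" adds a separate NA count.
print(table(gender_demo, useNA = "ifany"))
```

This also explains the total of 76: table() leaves out rows with a missing value unless useNA is set.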
 table(Diet,Age)
 tapply(Age,Diet,mean,na.rm=TRUE)

This gives the ages of the people on each type of Diet and the mean age for
each type.
-Type 1: average 40.87500 years old
-Type 2: average 39.00000 years old
-Type 3: average 37.77778 years old
Similarly, we apply the same 2 commands to the other variables.
 table(Diet,Height)
 tapply(Height,Diet,mean,na.rm=TRUE)
 table(Diet,pre.weight)
 tapply(pre.weight,Diet,mean,na.rm=TRUE)
 table(Diet,weight6weeks)
 tapply(weight6weeks,Diet,mean,na.rm=TRUE)
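The spelling of the missing-value argument matters: mean() accepts extra arguments silently, so a misspelled name such as na.rn is ignored and a group that contains NA still returns NA. The sketch below shows the difference on made-up values; age_demo and diet_demo are hypothetical and not taken from the data set.

```r
# Toy data (hypothetical): one missing age in diet group 1.
age_demo  <- c(40, NA, 30, 50, 20, 40)
diet_demo <- c(1, 1, 2, 2, 3, 3)

# Misspelled argument: silently ignored, so group 1 comes back as NA.
means_misspelled <- tapply(age_demo, diet_demo, mean, na.rn = TRUE)

# Correct argument: missing values are removed before averaging.
means_ok <- tapply(age_demo, diet_demo, mean, na.rm = TRUE)

print(means_misspelled)
print(means_ok)
```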
c) Boxplot for the weightlost variable
First, we create a new variable weightlost with the formula:
 weightlost = pre.weight - weight6weeks
Command in R: Diet1$weightlost = (pre.weight)-(weight6weeks)

 This adds a weightlost column to Diet1, which we then assign to a variable:

weightlost <- Diet1$weightlost

The boxplot compares the distribution of weightlost across the 3 types of diet
(the thick line in each box is the median, not the mean), which gives a first
indication of which type of diet is best.
Command in R:
 boxplot(weightlost~Diet,data=Diet1,col="light blue",
 ylab = "Weight lost", xlab = "Diet type")
From the plot, diet type 3 appears to be the best.
3/ T-test between pre.weight and weight6weeks:
Here we use a paired t-test, because both weights are measured on the same
participants.
Definition: A paired t-test is used to compare two population means where you
have two samples in which observations in one sample can be paired with
observations in the other sample. Examples of where this might occur are:
 Before-and-after observations on the same subjects.
 A comparison of two different methods of measurement or two different
treatments where the measurements/treatments are applied to the same
subjects.
Our data match the first example.
- First, we draw a boxplot of pre.weight and weight6weeks to get a general
impression of whether the diets have any effect.
Command in R: boxplot(pre.weight,weight6weeks)
The boxplot suggests that weight is lower after 6 weeks of dieting => the diets
appear to be effective for losing weight.
To quantify the difference between mean pre.weight and mean weight6weeks, we
use this command in R:
 t.test(pre.weight,weight6weeks,paired = TRUE)

So the mean of the differences is 3.844872 kg.
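A paired t-test is equivalent to a one-sample t-test on the per-person differences d = before - after, with test statistic t = mean(d) / (sd(d) / sqrt(n)). The sketch below checks this on five made-up before/after pairs; the values are hypothetical and not taken from the data set.

```r
# Hypothetical before/after weights (kg) for five people.
before <- c(60, 72, 85, 90, 68)
after  <- c(57, 70, 80, 88, 63)

d <- before - after                      # per-person differences

# Paired t-test as used in the report.
paired <- t.test(before, after, paired = TRUE)

# The same statistic computed by hand from the differences.
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))

print(paired)
print(t_manual)
```

The "mean of the differences" line in the t.test() output is simply mean(d).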

4/ 1-way ANOVA: what is the best diet for weightlost
One-way ANOVA is a hypothesis test that evaluates two mutually exclusive
statements about two or more population means. These two statements are called
the null hypothesis and the alternative hypothesis. A hypothesis test uses sample
data to determine whether to reject the null hypothesis.

For one-way ANOVA, the hypotheses for the test are the following:

 The null hypothesis (H0) is that the group means are all equal.
 The alternative hypothesis (H1) is that not all group means are equal.

If the p-value is below the significance level (0.05 in this case), we reject H0
and conclude that there is a significant difference in mean weightlost between
diet types.
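The decision rule can be sketched on a small made-up data set with three groups of three people. The data frame toy and its values are hypothetical; the group means are 3, 4 and 7, which gives F = 13 on 2 and 6 degrees of freedom.

```r
# Hypothetical weight loss (kg) for three diets, three people each.
toy <- data.frame(
  lost = c(2, 3, 4, 3, 4, 5, 6, 7, 8),
  diet = factor(rep(1:3, each = 3))   # factor, so aov() compares group means
)

fit <- aov(lost ~ diet, data = toy)
tab <- summary(fit)[[1]]              # ANOVA table as a data frame

f_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]

print(tab)
if (p_value < 0.05) {
  print("Reject H0: not all group means are equal")
} else {
  print("Do not reject H0")
}
```

The grouping variable is a factor here. A numeric 1/2/3 code would be fitted as a single slope with 1 degree of freedom instead of as three groups.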

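The aim of the study is to see which diet is best for losing weight, so the
independent variable is Diet and the dependent variable is weightlost.
Command in R:
To run the ANOVA, we use aov() and then summary() to see the result. Diet is
stored as a number, so it is converted to a factor so that aov() treats it as a
grouping variable and TukeyHSD() can compare the groups.
 anova1 <- aov(weightlost~as.factor(Diet))
 summary(anova1)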

P = 0.0323 < significance level = 0.05 => There is a significant difference in
mean weightlost between diet types. To determine which diet type is best for
weightlost, we run Tukey's test.

Usage of Tukey's test: The purpose of Tukey's test is to figure out which groups
in your sample differ. It uses the "Honest Significant Difference," a number that
represents the distance between groups, to compare every mean with every
other mean.

Command in R:

 TukeyHSD(anova1)

It shows that diet type 3 is the best for weightlost, in line with what we saw
from the boxplot.
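To show how the TukeyHSD() output is read, here is a minimal sketch on made-up data with group means 3, 4 and 7. The values and the names toy and pairs are hypothetical. Each row of the result is one pairwise comparison: diff is the difference in group means and p adj is the adjusted p-value.

```r
# Hypothetical weight loss (kg) for three diets, three people each.
toy <- data.frame(
  lost = c(2, 3, 4, 3, 4, 5, 6, 7, 8),
  diet = factor(rep(1:3, each = 3))
)

fit   <- aov(lost ~ diet, data = toy)
tukey <- TukeyHSD(fit)

pairs <- tukey$diet        # matrix with columns diff, lwr, upr, p adj
print(pairs)

# Comparisons whose adjusted p-value is below 0.05.
print(rownames(pairs)[pairs[, "p adj"] < 0.05])
```

A row named "3-1" with a positive diff means that group 3 lost more weight on average than group 1.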

5/ 2-way ANOVA
A two-way ANOVA is used to estimate how the mean of a quantitative
variable changes according to the levels of two categorical variables. Use a two-
way ANOVA when you want to know how two independent variables, in
combination, affect a dependent variable.
Command in R:
 anova2<-aov(weightlost~as.factor(gender)*as.factor(Diet),data=Diet1)
 summary(anova2)

As mentioned above, gender has 2 NA values; aov() drops those rows
automatically in this command.
Since the interaction effect is significant (p = 0.049 < significance level 0.05) =>
the effect of Diet cannot be generalised to males and females together. To see
this more clearly, we draw an interaction plot.
Command in R:
 interaction.plot(Diet,gender,weightlost,type="b",col=c(2:3),
 leg.bty="o",leg.bg="beige",lwd=2,pch=c(18,24),
 xlab="Diet",ylab="Weight lost",main="Interaction plot")
The interaction plot clearly shows that diet affects weight lost differently for
males and females, which agrees with the statistically significant interaction
between the effects of Diet and gender on weightlost. For males, mean
weightlost differs little between diet types. For females, mean weightlost
differs greatly between diet types.
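The cell means that an interaction plot draws can also be computed directly, which makes the pattern easier to check. The sketch below uses made-up data in which males lose about the same weight on every diet while females lose much more on diet 3. The values and the names toy2 and cell_means are hypothetical.

```r
# Hypothetical weight loss (kg): 2 genders x 3 diets, two people per cell.
toy2 <- data.frame(
  lost   = c(3, 4, 3, 4, 3, 4,  2, 3, 3, 4, 6, 7),
  gender = factor(rep(c("male", "female"), each = 6)),
  diet   = factor(rep(rep(1:3, each = 2), times = 2))
)

# Mean weight lost in each gender-by-diet cell (the points of the plot).
cell_means <- with(toy2, tapply(lost, list(gender, diet), mean))
print(cell_means)

# Two-way ANOVA with interaction; the third row is the gender:diet term.
fit2 <- aov(lost ~ gender * diet, data = toy2)
tab2 <- summary(fit2)[[1]]
print(tab2)

p_interaction <- tab2[["Pr(>F)"]][3]
print(p_interaction)
```

Flat cell means in one row and changing cell means in the other are what produce non-parallel lines in the interaction plot.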
