Assignment# 06

The document is an R Notebook detailing statistical analyses performed on two datasets: one concerning students' physical attributes and another related to mood assessments. It includes steps for loading data, checking dataset structures, calculating Pearson correlation coefficients, visualizing relationships, and conducting Shapiro-Wilk tests for normality. The findings indicate significant correlations between body weight and height, as well as between negative and positive moods, with both datasets showing non-normal distributions.

Uploaded by

shanza161199

11/29/24, 12:41 PM R Notebook

R Notebook
This is an R Markdown (http://rmarkdown.rstudio.com) Notebook. When you execute code within the notebook, the
results appear beneath the code.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and
pressing Ctrl+Shift+Enter.

plot(cars)

Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.

Assignment-06
Exercise-75
Step-01:Load the data

students <- read.delim("E:\\Statistics\\Datasets\\Students.txt", stringsAsFactors = FALSE)

Step-02:Check the dataset structure


file:///E:/Statistics/Exercises/Assignment-06.html 1/16

summary(students)

## ID Sex Sex_coded Blood_group


## Min. : 1.00 Length:82 Min. :0.0000 Length:82
## 1st Qu.:21.25 Class :character 1st Qu.:0.0000 Class :character
## Median :41.50 Mode :character Median :1.0000 Mode :character
## Mean :41.50 Mean :0.6585
## 3rd Qu.:61.75 3rd Qu.:1.0000
## Max. :82.00 Max. :1.0000
## Blood_group_coded Rhesus_factor Rhesus_factor_coded Smoking
## Min. :0.0000 Length:82 Min. :0.0000 Length:82
## 1st Qu.:0.0000 Class :character 1st Qu.:1.0000 Class :character
## Median :1.0000 Mode :character Median :1.0000 Mode :character
## Mean :0.9512 Mean :0.8415
## 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :3.0000 Max. :1.0000
## Smoking_coded Size_cm Weight_kg Points_exam
## Min. :0.0000 Min. :157.0 Min. :46.00 Min. : 1.000
## 1st Qu.:0.0000 1st Qu.:167.0 1st Qu.:56.25 1st Qu.: 6.250
## Median :0.0000 Median :170.0 Median :61.00 Median : 8.000
## Mean :0.3171 Mean :173.2 Mean :65.84 Mean : 7.988
## 3rd Qu.:1.0000 3rd Qu.:179.0 3rd Qu.:75.75 3rd Qu.:10.000
## Max. :1.0000 Max. :194.0 Max. :98.00 Max. :12.000
## Grade
## Min. :1.000
## 1st Qu.:2.000
## Median :3.000
## Mean :3.122
## 3rd Qu.:4.750
## Max. :5.000

Step-03:Calculate the Pearson correlation coefficient

correlation <- cor(students$Weight_kg, students$Size_cm, method = "pearson")


cat("Pearson Correlation Coefficient:", correlation, "\n")

## Pearson Correlation Coefficient: 0.7790491
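
As a rough illustration of what cor() computes here, Pearson's r is the covariance of the two variables divided by the product of their standard deviations. The sketch below uses a small made-up data set (the vectors weight and height are hypothetical and are not taken from Students.txt) and checks the manual calculation against cor().

```r
# Hypothetical toy data, NOT taken from Students.txt
weight <- c(50, 55, 61, 70, 82)
height <- c(160, 165, 170, 178, 185)

# Pearson's r: covariance scaled by both standard deviations
r_manual <- cov(weight, height) / (sd(weight) * sd(height))

# Built-in calculation, as used in the notebook
r_builtin <- cor(weight, height, method = "pearson")

# Both approaches should agree up to floating-point error
print(all.equal(r_manual, r_builtin))
```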

Conclusion


# **Is there a linear relationship between the variables?**

# Hypotheses for Pearson correlation:

# Null Hypothesis (H0): There is no linear relationship between body weight and body height (ρ = 0).
# Alternative Hypothesis (H1): There is a linear relationship between body weight and body height (ρ ≠ 0).

# As the p-value is below 0.05, we reject the null hypothesis. There is sufficient evidence to conclude that body weight and body height have a significant positive linear relationship, with a correlation coefficient of r = 0.7790491.
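
The chunk above only calls cor(), so the p-value behind this conclusion is not printed. One possible way to check it by hand is the usual t statistic for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below assumes n = 82 (the number of rows shown by summary(students)) and that neither column has missing values; it is an illustration, not part of the original analysis.

```r
r <- 0.7790491  # coefficient reported by cor() above
n <- 82         # assumed sample size: rows in students, no missing values

# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

cat("t =", round(t_stat, 2), " df =", n - 2,
    " p-value =", format.pval(p_value), "\n")
```

Running cor.test(students$Weight_kg, students$Size_cm, method = "pearson") on the real data should report the same test directly.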

Step-04:Visualize the Relationship (Scatter Plot)

library(ggpubr)

## Loading required package: ggplot2

ggscatter(
students, x = "Weight_kg", y = "Size_cm",
color = "#1f77b4",
add = "reg.line",
conf.int = TRUE,
add.params = list(color = "#ff7f0e"),
cor.coef = TRUE, cor.method = "pearson",
xlab = "Weight (kg)", ylab = "Height (cm)"
)


Conclusion

# The scatter plot reveals a strong positive linear relationship between body weight and body height in the students data set.
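
The regression line in a scatter plot like this one is typically an ordinary least-squares fit of y on x. The sketch below assumes that add = "reg.line" corresponds to lm(y ~ x), and uses hypothetical toy vectors rather than the students data. It shows how the slope of that line relates to the correlation coefficient.

```r
# Hypothetical toy data, NOT taken from Students.txt
weight <- c(50, 55, 61, 70, 82)
height <- c(160, 165, 170, 178, 185)

# Least-squares line of height on weight
fit <- lm(height ~ weight)
slope <- unname(coef(fit)["weight"])

# In simple linear regression, slope = r * sd(y) / sd(x)
slope_from_r <- cor(weight, height) * sd(height) / sd(weight)

print(all.equal(slope, slope_from_r))
```

A positive r therefore always goes with an upward-sloping line, which is what the plot shows.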

Step-05:Shapiro-Wilk tests

# Shapiro-Wilk test for body weight


shapiro_weight <- shapiro.test(students$Weight_kg)
cat("Shapiro-Wilk Test for Weight:\n")

## Shapiro-Wilk Test for Weight:

cat("W-statistic:", shapiro_weight$statistic, "\n")

## W-statistic: 0.9195322

cat("p-value:", shapiro_weight$p.value, "\n")

## p-value: 7.40539e-05


# Shapiro-Wilk test for body height


shapiro_height <- shapiro.test(students$Size_cm)
cat("Shapiro-Wilk Test for Height:\n")

## Shapiro-Wilk Test for Height:

cat("W-statistic:", shapiro_height$statistic, "\n")

## W-statistic: 0.958204

cat("p-value:", shapiro_height$p.value, "\n")

## p-value: 0.009213035

Step-06:Q-Q Plots

library(ggpubr)

# Q-Q plot for body weight


plot1 <- ggqqplot(students$Weight_kg, ylab = "Body Weight (kg)", color = "#FFA500")

# Q-Q plot for body height


plot2 <- ggqqplot(students$Size_cm, ylab = "Body Height (cm)", color = "#FFA500")

# Arrange the plots side by side


ggarrange(plot1, plot2, ncol = 2, nrow = 1,
labels = c("A", "B"), # Add labels to the plots
common.legend = TRUE, legend = "bottom") # Shared legend
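
For readers who want the numbers behind a Q-Q plot, base R's qqnorm() can return the plotted coordinates without drawing anything. The sketch below uses a short hypothetical vector (not the students data) and shows that the theoretical quantiles are standard normal quantiles evaluated at ppoints(n).

```r
# Hypothetical sample, NOT taken from Students.txt
x <- c(46, 53, 54, 55, 63, 70, 74, 98)

# Coordinates of the normal Q-Q plot, without drawing it
qq <- qqnorm(x, plot.it = FALSE)

# Theoretical quantiles are qnorm() evaluated at ppoints(n)
print(all.equal(sort(qq$x), qnorm(ppoints(length(x)))))

# Correlation between theoretical and sample quantiles;
# values close to 1 mean the points lie near a straight line
qq_cor <- cor(qq$x, qq$y)
print(qq_cor)
```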


Conclusion

# **Test for normality**

# Hypotheses for the Shapiro-Wilk test:

# Null Hypothesis (H0): The data are normally distributed.
# Alternative Hypothesis (H1): The data are not normally distributed.

# Shapiro-Wilk test for weight: W = 0.9195322, p-value = 7.40539e-05. As p < 0.05, we reject the null hypothesis: body weight is not normally distributed.
# Shapiro-Wilk test for height: W = 0.958204, p-value = 0.009213035. As p < 0.05, we reject the null hypothesis: body height is not normally distributed.
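
The same decision rule is applied to every variable in this notebook, so it could be wrapped in a small helper. The function report_shapiro() below is a hypothetical convenience wrapper, not something from the original notebook, and the example data are simulated from a skewed distribution purely for illustration.

```r
# Hypothetical helper: run shapiro.test() and apply the alpha decision rule
report_shapiro <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]            # drop missing values before testing
  res <- shapiro.test(x)
  data.frame(
    W = unname(res$statistic),
    p_value = res$p.value,
    reject_normality = res$p.value < alpha
  )
}

# Simulated, clearly right-skewed sample (NOT the students data)
set.seed(1)
skewed <- rexp(200)

out <- report_shapiro(skewed)
print(out)
```

With the real data the calls would be report_shapiro(students$Weight_kg) and report_shapiro(students$Size_cm).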

Exercise-76
Step 01:Load the data:

# Load the ICM dataset


ICM <- read.delim("E:\\Statistics\\Datasets\\ICM.txt", stringsAsFactors = FALSE)

# View the structure of the data to identify the columns for negative and positive mood
str(ICM)


## 'data.frame': 199 obs. of 23 variables:


## $ ID : int 75 90 173 189 100 155 63 48 76 165 ...
## $ Gender : chr "female" "female" "female" "female" ...
## $ Age : int 22 22 37 17 19 16 17 19 27 19 ...
## $ Englishfluent : chr "yes" "yes" "yes" "yes" ...
## $ Germanfluent : chr "no" "no" "yes" "yes" ...
## $ Transport : chr "PublicTransport" "PublicTransport" "Car" "Car" ...
## $ Highest_level_of_education: chr "College" "College" "University" "none" ...
## $ Do_you_smoke : chr "No" "No" "No" "No" ...
## $ Socialmediahours : chr "1.5-3hrs/day" "1.5-3hrs/day" "<1.5hrs/day" "1.5-3hrs/day" ...
## $ Timewithfriends : chr "2-5hrs/week" "2-5hrs/week" "5-10hrs/week" "10-20hrs/week" ...
## $ Pet : chr "No" "No" "Yes" "Yes" ...
## $ Siblings : chr "Yes" "Yes" "No" "Yes" ...
## $ Children : chr "No" "No" "Yes" "No" ...
## $ Relationshipstatus : chr "Relationship" "Relationship" "Relationship" "Single" ...
## $ Activitieshours : int 10 10 20 40 20 10 10 20 10 20 ...
## $ NegativeMood : num NA NA NA 4 2.82 ...
## $ PositiveMood : num NA NA NA 0 0.333 ...
## $ Mentalhealth : num 2.667 2.667 3.5 1 0.833 ...
## $ Socialization : num NA NA NA 1 2.5 ...
## $ Activity : num 2.8 2.8 3.4 3.2 1.2 2.6 1.6 1.8 1.2 0.4 ...
## $ SocialSupport : num 4 4 2.333 0.667 2.333 ...
## $ Communication_open_direct : num NA NA 3.38 3.62 3.15 ...
## $ OHS : num 4.59 4.59 5.1 3.14 2.76 ...

# View the first few rows to check the data


head(ICM)

ID Gen… A… Englishfluent Germanfluent Transport Highest_level_of_education


<int><chr> <int><chr> <chr> <chr> <chr>

1 75 female 22 yes no PublicTransport College

2 90 female 22 yes no PublicTransport College

3 173 female 37 yes yes Car University

4 189 female 17 yes yes Car none

5 100 female 19 yes yes Walk HighSchool

6 155 female 16 yes no Walk none

6 rows | 1-8 of 24 columns

Step 02:Check for missing values

# Check the number of missing values in both columns


sum(is.na(ICM$NegativeMood))


## [1] 5

sum(is.na(ICM$PositiveMood))

## [1] 3

ICM_clean <- na.omit(ICM[, c("NegativeMood", "PositiveMood")])


correlation <- cor(ICM_clean$NegativeMood, ICM_clean$PositiveMood, method = "pearson")
cat("Pearson Correlation Coefficient:", correlation, "\n")

## Pearson Correlation Coefficient: -0.6433565
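
na.omit() on a two-column data frame removes every row in which either column is missing, so the number of rows dropped can differ from the per-column NA counts. The sketch below illustrates this on a small hypothetical data frame (toy is made up and only borrows the column names). In this notebook the later test output reports df = 192, which corresponds to 194 complete pairs out of 199 rows.

```r
# Hypothetical toy data frame, NOT the ICM data
toy <- data.frame(
  NegativeMood = c(NA, 4.0, 2.8, 1.5, NA, 3.1),
  PositiveMood = c(NA, 0.0, 0.3, 2.5, 1.0, NA)
)

# Missing values per column in one call
na_counts <- colSums(is.na(toy))
print(na_counts)

# Row-wise removal: a row is dropped if ANY of its values is NA
toy_clean <- na.omit(toy)
rows_dropped <- nrow(toy) - nrow(toy_clean)
cat("Rows dropped:", rows_dropped, "\n")
```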

Conclusion

# **Is there a linear relationship between the variables?**

# Null Hypothesis (H0): There is no linear relationship between Negative Mood and Positive Mood (ρ = 0).
# Alternative Hypothesis (H1): There is a linear relationship between Negative Mood and Positive Mood (ρ ≠ 0).

# As the p-value of the correlation test in Step 03 is below 0.05, we reject the null hypothesis. There is a statistically significant negative linear relationship between Negative Mood and Positive Mood (r = -0.6433565).

Step 03:Test for Significance

cor_test <- cor.test(ICM_clean$NegativeMood, ICM_clean$PositiveMood, method = "pearson")


cat("Pearson Correlation Test:\n")

## Pearson Correlation Test:

print(cor_test)

##
## Pearson's product-moment correlation
##
## data: ICM_clean$NegativeMood and ICM_clean$PositiveMood
## t = -11.644, df = 192, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.7190609 -0.5525618
## sample estimates:
## cor
## -0.6433565
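
The t statistic and the confidence interval in this output can be reproduced from r and the sample size alone. The sketch below takes n = df + 2 = 194 from the output and uses the t formula together with the Fisher z-transformation, which is the standard construction for this interval; it is a check on the printed numbers, not additional analysis.

```r
r <- -0.6433565   # sample estimate from the output above
n <- 192 + 2      # df + 2 complete pairs

# t statistic for H0: rho = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)                # transform r to the z scale
se <- 1 / sqrt(n - 3)         # standard error on the z scale
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

cat("t =", round(t_stat, 3), "\n")
cat("95% CI:", round(ci, 7), "\n")
```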

Step 04:Visualize the Relationship (Scatter Plot)


library(ggpubr)
ggscatter(
ICM_clean, x = "NegativeMood", y = "PositiveMood",
color = "#1f77b4",
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson",
xlab = "Negative Mood", ylab = "Positive Mood"
)

Step-05:Shapiro-Wilk tests

# Shapiro-Wilk Test for Normality


shapiro_negative <- shapiro.test(ICM_clean$NegativeMood)
cat("Shapiro-Wilk Test for Negative Mood:\n")

## Shapiro-Wilk Test for Negative Mood:

print(shapiro_negative)


##
## Shapiro-Wilk normality test
##
## data: ICM_clean$NegativeMood
## W = 0.97664, p-value = 0.002498

shapiro_positive <- shapiro.test(ICM_clean$PositiveMood)


cat("Shapiro-Wilk Test for Positive Mood:\n")

## Shapiro-Wilk Test for Positive Mood:

print(shapiro_positive)

##
## Shapiro-Wilk normality test
##
## data: ICM_clean$PositiveMood
## W = 0.98441, p-value = 0.03015

Step-06:Q-Q Plot

# Q-Q plot for Negative Mood


ggqqplot(ICM_clean$NegativeMood, ylab = "Negative Mood", color = "#1f77b4")


# Q-Q plot for Positive Mood


ggqqplot(ICM_clean$PositiveMood, ylab = "Positive Mood", color = "#1f77b4")


Conclusion

# **Test for normality**

# Null Hypothesis (H0): The data are normally distributed.
# Alternative Hypothesis (H1): The data are not normally distributed.

# Shapiro-Wilk test for Negative Mood: W = 0.97664, p-value = 0.002498. As p < 0.05, we reject the null hypothesis: Negative Mood is not normally distributed.
# Shapiro-Wilk test for Positive Mood: W = 0.98441, p-value = 0.03015. As p < 0.05, we reject the null hypothesis: Positive Mood is not normally distributed.

Exercise-79
Step-01:Load the Dataset

# Load the students dataset


students <- read.delim("E:\\Statistics\\Datasets\\Students.txt", stringsAsFactors = FALSE)

# View the structure of the dataset to identify the columns for weight and height
str(students)


## 'data.frame': 82 obs. of 13 variables:


## $ ID : int 24 5 54 9 34 52 12 16 32 59 ...
## $ Sex : chr "M" "M" "F" "M" ...
## $ Sex_coded : int 0 0 1 0 1 1 0 0 1 1 ...
## $ Blood_group : chr "0" "0" "A" "0" ...
## $ Blood_group_coded : int 0 0 1 0 1 0 0 1 0 1 ...
## $ Rhesus_factor : chr "+" "+" "+" "+" ...
## $ Rhesus_factor_coded: int 1 1 1 1 1 1 1 1 1 0 ...
## $ Smoking : chr "no" "no" "no" "no" ...
## $ Smoking_coded : int 0 0 0 0 0 1 1 1 0 0 ...
## $ Size_cm : int 190 187 171 185 166 164 184 187 163 170 ...
## $ Weight_kg : int 98 81 54 70 53 55 74 75 46 63 ...
## $ Points_exam : int 1 2 2 3 3 3 4 4 4 4 ...
## $ Grade : int 5 5 5 5 5 5 5 5 5 5 ...

# View the first few rows of the dataset to check the data
head(students)

ID S… Sex_co… Blood_group Blood_group_coded Rhesus_factor Rhesus_factor_coded Sm


<int><chr> <int> <chr> <int> <chr> <int> <c

1 24 M 0 0 0 + 1 no

2 5 M 0 0 0 + 1 no

3 54 F 1 A 1 + 1 no

4 9 M 0 0 0 + 1 no

5 34 F 1 A 1 + 1 no

6 52 F 1 0 0 + 1 ye

6 rows | 1-9 of 14 columns

Step-02:Calculate Spearman’s rho

# Calculate Spearman's rank correlation coefficient between body weight and body height
spearman_corr <- cor(students$Weight_kg, students$Size_cm, method = "spearman")

# Display the Spearman correlation coefficient


cat("Spearman's rho:", spearman_corr, "\n")

## Spearman's rho: 0.7740172
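
Spearman's rho measures monotonic rather than strictly linear association. A minimal sketch of the difference, using made-up values that increase monotonically but not linearly (x and y are hypothetical and unrelated to the students data):

```r
# Hypothetical data: y grows monotonically but non-linearly with x
x <- 1:10
y <- exp(x)

pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")

# Spearman only looks at the ordering, so a perfectly monotone
# relationship gives rho = 1 even though Pearson's r is below 1
cat("Pearson r:", pearson, "\n")
cat("Spearman rho:", spearman, "\n")
```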

Step-03:Test for Significance

# Perform the Spearman correlation test


cor_test <- cor.test(students$Weight_kg, students$Size_cm, method = "spearman")


## Warning in cor.test.default(students$Weight_kg, students$Size_cm, method = "spearman"): Cannot compute exact p-value with ties

# Print the result of the correlation test


cat("Spearman's rank correlation test result:\n")

## Spearman's rank correlation test result:

print(cor_test)

##
## Spearman's rank correlation rho
##
## data: students$Weight_kg and students$Size_cm
## S = 20764, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.7740172

Conclusion

# **Test for significance of the correlation**

# Null Hypothesis (H0): There is no monotonic relationship between body weight and body height (ρ = 0).
# Alternative Hypothesis (H1): There is a monotonic relationship between body weight and body height (ρ ≠ 0).

# S = 20764, p-value < 2.2e-16. As the p-value is less than 0.05, we reject the null hypothesis and conclude that there is a statistically significant positive monotonic relationship between body weight and body height, with a Spearman's rho of ρ = 0.7740172.
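
Two relationships help in reading this output. Spearman's rho is Pearson's r computed on ranks (ties receive average ranks), and the reported S statistic follows from rho and the sample size as S = (n^3 - n)(1 - rho) / 6. The sketch below illustrates the first point on hypothetical toy vectors and the second using the values printed above, with n = 82 taken from the str() output.

```r
# Hypothetical toy data with one tie in x, NOT the students data
x <- c(46, 53, 54, 54, 63, 70, 74, 98)
y <- c(163, 166, 171, 164, 170, 185, 184, 190)

# Spearman's rho equals Pearson's r on the ranks
rho_builtin <- cor(x, y, method = "spearman")
rho_ranks   <- cor(rank(x), rank(y), method = "pearson")
print(all.equal(rho_builtin, rho_ranks))

# Recover the S statistic from the reported rho and sample size
n   <- 82
rho <- 0.7740172
S   <- (n^3 - n) * (1 - rho) / 6
cat("S =", S, "\n")
```

The recovered S should agree with the printed value of 20764 up to rounding.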

Exercise-80
Step-01:Load and view the Dataset

ICM <- read.delim("E:\\Statistics\\Datasets\\ICM.txt", stringsAsFactors = FALSE)

# View the structure of the data to identify the columns for NegativeMood and OHS
str(ICM)


## 'data.frame': 199 obs. of 23 variables:


## $ ID : int 75 90 173 189 100 155 63 48 76 165 ...
## $ Gender : chr "female" "female" "female" "female" ...
## $ Age : int 22 22 37 17 19 16 17 19 27 19 ...
## $ Englishfluent : chr "yes" "yes" "yes" "yes" ...
## $ Germanfluent : chr "no" "no" "yes" "yes" ...
## $ Transport : chr "PublicTransport" "PublicTransport" "Car" "Car" ...
## $ Highest_level_of_education: chr "College" "College" "University" "none" ...
## $ Do_you_smoke : chr "No" "No" "No" "No" ...
## $ Socialmediahours : chr "1.5-3hrs/day" "1.5-3hrs/day" "<1.5hrs/day" "1.5-3hrs/day" ...
## $ Timewithfriends : chr "2-5hrs/week" "2-5hrs/week" "5-10hrs/week" "10-20hrs/week" ...
## $ Pet : chr "No" "No" "Yes" "Yes" ...
## $ Siblings : chr "Yes" "Yes" "No" "Yes" ...
## $ Children : chr "No" "No" "Yes" "No" ...
## $ Relationshipstatus : chr "Relationship" "Relationship" "Relationship" "Single" ...
## $ Activitieshours : int 10 10 20 40 20 10 10 20 10 20 ...
## $ NegativeMood : num NA NA NA 4 2.82 ...
## $ PositiveMood : num NA NA NA 0 0.333 ...
## $ Mentalhealth : num 2.667 2.667 3.5 1 0.833 ...
## $ Socialization : num NA NA NA 1 2.5 ...
## $ Activity : num 2.8 2.8 3.4 3.2 1.2 2.6 1.6 1.8 1.2 0.4 ...
## $ SocialSupport : num 4 4 2.333 0.667 2.333 ...
## $ Communication_open_direct : num NA NA 3.38 3.62 3.15 ...
## $ OHS : num 4.59 4.59 5.1 3.14 2.76 ...

# View the first few rows of the dataset to check the data
head(ICM)

ID Gen… A… Englishfluent Germanfluent Transport Highest_level_of_education


<int><chr> <int><chr> <chr> <chr> <chr>

1 75 female 22 yes no PublicTransport College

2 90 female 22 yes no PublicTransport College

3 173 female 37 yes yes Car University

4 189 female 17 yes yes Car none

5 100 female 19 yes yes Walk HighSchool

6 155 female 16 yes no Walk none

6 rows | 1-8 of 24 columns

Step-02:Calculate Spearman’s rho


# Remove rows with missing values in either NegativeMood or OHS


cleaned_data <- na.omit(ICM[, c("NegativeMood", "OHS")])

# Calculate Spearman's correlation on the cleaned data


spearman_corr <- cor(cleaned_data$NegativeMood, cleaned_data$OHS, method = "spearman")

# Display the Spearman correlation coefficient


cat("Spearman's rho:", spearman_corr, "\n")

## Spearman's rho: -0.5725575
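
There is more than one way to handle the missing values here. Besides na.omit(), cor() accepts use = "complete.obs", and cor.test() discards incomplete pairs on its own. The sketch below checks on hypothetical toy vectors (neg and ohs are made up, not the ICM columns) that all three routes give the same rho.

```r
# Hypothetical toy vectors with missing values, NOT the ICM data
neg <- c(NA, 4.0, 2.8, 1.5, 3.3, 0.9, NA)
ohs <- c(4.6, 3.1, 2.8, 5.0, NA, 5.4, 4.1)

# Route 1: remove incomplete rows first, as in the notebook
pairs <- na.omit(data.frame(neg, ohs))
rho_omit <- cor(pairs$neg, pairs$ohs, method = "spearman")

# Route 2: let cor() restrict itself to complete observations
rho_use <- cor(neg, ohs, method = "spearman", use = "complete.obs")

# Route 3: cor.test() drops incomplete pairs internally
rho_test <- unname(
  cor.test(neg, ohs, method = "spearman", exact = FALSE)$estimate
)

print(c(rho_omit, rho_use, rho_test))
```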

Step-03:Test for Significance

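# Perform the Spearman correlation test. cor.test() drops incomplete pairs itself, so passing the raw ICM columns gives the same rho as cleaned_data.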


cor_test <- cor.test(ICM$NegativeMood, ICM$OHS, method = "spearman")

## Warning in cor.test.default(ICM$NegativeMood, ICM$OHS, method = "spearman"): Cannot compute exact p-value with ties

# Print the result of the correlation test


cat("Spearman's rank correlation test result:\n")

## Spearman's rank correlation test result:

print(cor_test)

##
## Spearman's rank correlation rho
##
## data: ICM$NegativeMood and ICM$OHS
## S = 1453320, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.5725575

Conclusion

# **Test for significance of the correlation**

# Null Hypothesis (H0): There is no monotonic relationship between negative mood and OHS (ρ = 0).
# Alternative Hypothesis (H1): There is a monotonic relationship between negative mood and OHS (ρ ≠ 0).

# S = 1453320, p-value < 2.2e-16. As the p-value is less than 0.05, we reject the null hypothesis and conclude that there is a statistically significant negative monotonic relationship between negative mood and OHS (ρ = -0.5725575).
