Assignment_STAT5002

The document outlines an individual assignment for the STAT5002 Introduction to Statistics course at the University of Sydney, due on November 1, 2024. It consists of five questions covering hypothesis testing, chi-squared tests, logistic regression, and multiple linear regression, with specific tasks and R code provided for each question. Students are instructed to submit their solutions as a single PDF file, ensuring anonymity by including only their student ID.

The University of Sydney

STAT5002 Introduction to Statistics

Semester 2, 2024 (18 Oct 2024)

Lecturers: Tiangang Cui, Mohammad Javad Davoudabadi
This individual assignment is due by 11:59pm Friday 1 Nov 2024, via Canvas. There are five questions
in this assignment, each worth 10 points. Your solution should be submitted as a single PDF file that
includes your written answers and screenshots of your code (and relevant outputs if necessary).
Your submitted file should include your SID. To ensure compliance with our anonymous marking
obligations, please do not under any circumstances include your name in any area of your assignment; only
your SID should be present. Please make sure you review your submission carefully. What you see is exactly
how the marker will see your assignment. Submissions can be overwritten until the due date.

1. We want to test the difference between two drugs A and B for treating high blood pressure. Twenty
pairs of patients are matched according to age. One patient in each pair is chosen at random to receive
drug A and the other receives drug B. The resulting drops in blood pressure are set out below:

Pair         1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Drug A       5   4   2   6   9   1   1   5   6   3   7  14   8   3   4  10   7  12   6   9
Drug B       2   5   1   2   6   3   0   2   5   2   7  12   5   2   3   6   5  10   4   7
Difference   3  -1   1   4   3  -2   1   3   1   1   0   2   3   1   1   4   2   2   2   2

The following R outputs will be used.

> # Drug A data as a vector
> drug_A = c(5, 4, 2, 6, 9, 1, 1, 5, 6, 3, 7, 14, 8, 3, 4, 10, 7, 12, 6, 9)
> # Drug B data as a vector
> drug_B = c(2, 5, 1, 2, 6, 3, 0, 2, 5, 2, 7, 12, 5, 2, 3, 6, 5, 10, 4, 7)
> diff = drug_A - drug_B
> mean(diff)
[1] 1.65
Assuming the difference has a known standard deviation of σ = 1.5 and using the mean of the difference,
we want to carry out the following steps of a hypothesis test.

(a) Introduce appropriate notation to state the null and alternative hypotheses.
(b) What distribution of the test statistic should be used here? Explain your answer.
(c) Compute the observed test statistic.
(d) What values of the test statistic will argue against the null hypothesis? Is this a one-sided or
two-sided test? Explain your answer.
(e) Applying the 68%-95%-99.7% rule, what is the smallest p-value you can find? Explain your answer.
(f) With a significance level α = 0.05, do we observe sufficient evidence to reject the null hypothesis?
Explain your answer.
(g) Now we want to find out if Drug A is more effective than Drug B (a larger drop in blood pressure
means the drug is more effective).
(1) What is your new alternative hypothesis?
(2) What is your new p-value? (the smallest you can estimate based on the 68%-95%-99.7% rule)
(3) With a significance level α = 0.05, do we observe sufficient evidence to reject the null
hypothesis? Explain your answer.
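The mechanics of a z-test with known σ, as used in Question 1, can be sketched in R as follows. This is only an illustration under the stated assumption σ = 1.5; the names d, sigma, z, p_two_sided and p_one_sided are hypothetical helpers, not part of the assignment's R output, and exact normal tail probabilities from pnorm() are shown alongside the 68%-95%-99.7% rule purely as a cross-check.

```r
# Sketch: one-sample z-test on the paired differences, sigma assumed known.
drug_A <- c(5, 4, 2, 6, 9, 1, 1, 5, 6, 3, 7, 14, 8, 3, 4, 10, 7, 12, 6, 9)
drug_B <- c(2, 5, 1, 2, 6, 3, 0, 2, 5, 2, 7, 12, 5, 2, 3, 6, 5, 10, 4, 7)

d     <- drug_A - drug_B   # paired differences (named d to avoid masking base::diff)
sigma <- 1.5               # known standard deviation given in the question
n     <- length(d)

# Observed test statistic under H0: mean difference = 0
z <- (mean(d) - 0) / (sigma / sqrt(n))

# Exact normal tail areas, for comparison with the empirical-rule bounds
p_two_sided <- 2 * pnorm(-abs(z))            # alternative: mean difference != 0
p_one_sided <- pnorm(z, lower.tail = FALSE)  # alternative: mean difference > 0

print(c(z = z, p_two_sided = p_two_sided, p_one_sided = p_one_sided))
```

Because the 99.7% band covers roughly three standard errors either side of zero, comparing |z| with 1, 2 and 3 gives the bound the rule can deliver; pnorm() simply refines it.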
2. We consider the same data table as in Question 1. The company producing the drugs claims that Drug A
is expected to reduce blood pressure by 7 units. Suppose the drop in blood pressure under Drug A follows
a normal distribution with unknown standard deviation. We want to test whether the company's claim is true.

We want to carry out the following steps of a hypothesis test. You can use the following R outputs.
> # Drug A data as a vector
> drug_A = c(5, 4, 2, 6, 9, 1, 1, 5, 6, 3, 7, 14, 8, 3, 4, 10, 7, 12, 6, 9)
> mean(drug_A)
[1] 6.1
> round(sd(drug_A),1)
[1] 3.5

(a) State the null and alternative hypotheses.
(b) What distribution of the test statistic should be used here? Explain your answer.
(c) Compute the observed test statistic.
(d) What values of the test statistic will argue against the null hypothesis? Is this a one-sided or
two-sided test? Explain your answer.
(e) What is the p-value? Write down the R code (in one line) that you used for calculating the p-value.
(f) With a significance level α = 0.05, do we observe sufficient evidence to reject the null hypothesis?
Explain your answer.
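When σ is unknown and estimated by the sample standard deviation, the statistic is referred to a t distribution with n - 1 degrees of freedom. The sketch below shows that calculation for the Drug A data and cross-checks it against R's built-in t.test(). It assumes a two-sided alternative; the names mu0, t_stat and p_value are illustrative. Note that it uses the unrounded sd(), so the result differs slightly from a hand calculation with the rounded value 3.5.

```r
# Sketch: one-sample t-test of H0: mu = 7 for Drug A (two-sided alternative assumed).
drug_A <- c(5, 4, 2, 6, 9, 1, 1, 5, 6, 3, 7, 14, 8, 3, 4, 10, 7, 12, 6, 9)

mu0 <- 7                   # value claimed by the company
n   <- length(drug_A)

# Observed t statistic using the unrounded sample standard deviation
t_stat <- (mean(drug_A) - mu0) / (sd(drug_A) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# Cross-check with the built-in test
check <- t.test(drug_A, mu = mu0)

print(c(t = t_stat, p = p_value))
print(check)
```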
3. Consider the table below, which counts the number of left- and right-handers in three random
samples of individuals, one drawn from each of three populations:

               Left-handed   Right-handed   Total
Population 1             4             36      40
Population 2             4             26      30
Population 3             2             28      30
Total                   10             90     100

It is of interest to test whether the proportions of hand dominance (left-handed and right-handed) are
the same across all three populations. Follow the steps below to perform a chi-squared test to
investigate this.

(a) State the null and alternative hypotheses.
(b) Set up the table of expected counts (you may also set up the table of expected probabilities for this).
(c) Compute the observed test statistic.
(d) What are the degrees of freedom of the distribution of the test statistic? Explain your answer.
(e) What values of the test statistic will argue against the null hypothesis? Explain your answer.
(f) With a significance level α = 0.05, do we observe sufficient evidence to reject the null hypothesis?
Explain your answer.

> qchisq(0.95, 4)
[1] 9.487729
> qchisq(0.95, 3)
[1] 7.814728
> qchisq(0.95, 2)
[1] 5.991465
> qchisq(0.95, 1)
[1] 3.841459
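The steps of the chi-squared test in Question 3 can be traced by hand in R, as in the sketch below. It builds the expected counts from the row and column totals, sums the standardised squared deviations, and looks up the critical value. The object names (observed, expected, chi_sq, df, critical) are illustrative. The built-in chisq.test() would give the same statistic, but it warns here because some expected counts are below 5, so the manual route is shown instead.

```r
# Sketch: chi-squared test of homogeneity computed step by step.
observed <- matrix(c(4, 36,
                     4, 26,
                     2, 28),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("Population 1", "Population 2", "Population 3"),
                                   c("Left-handed", "Right-handed")))

# Expected count = (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Test statistic: sum over all cells of (O - E)^2 / E
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Critical value at the 5% level and the corresponding upper-tail p-value
critical <- qchisq(0.95, df)
p_value  <- pchisq(chi_sq, df, lower.tail = FALSE)

print(expected)
print(c(chi_sq = chi_sq, df = df, critical = critical, p_value = p_value))
```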
4. A marketing company wants to understand the purchasing behaviour of customers for a new product
they have launched. To do this, they conducted a survey to collect data on customer demographics
and whether or not they purchased the product after receiving a promotional offer.

One key variable they are interested in is Age. The company hypothesises that older customers may
be more likely to purchase the product. The binary outcome of the study is whether the customer
purchased the product (Purchased = 1) or did not purchase the product (Purchased = 0).

You are given a dataset that includes the ages of 100 customers and whether they purchased the
product. The company asks you to perform logistic regression to analyse the relationship between Age and
the likelihood of purchasing the product.

A simulated data set will be used in this question. You should use the code below to simulate the data
in R to complete the following tasks.
> # Simulating a dataset in R
> set.seed(123)
> n <- 100
> # Age centered around 40 years with standard deviation 10
> Age <- rnorm(n, mean = 40, sd = 10)
> # Logistic function with Age as predictor
> Purchased <- rbinom(n, 1, prob = 1 / (1 + exp(-(0.1 * Age - 4))))
> data <- data.frame(Age, Purchased)
>
> # Display first few rows of the dataset
> head(data)
(a) Fit a logistic regression model to the data using Age as the predictor. Write down the logistic
regression equation using the estimated coefficients from your model.
(b) Interpret the estimated coefficient for Age. What does it tell you about how age affects the
probability of purchasing the product? Describe it in terms of the odds ratio as well.
(c) Based on your model, calculate the predicted probability that a 35-year-old person will purchase
the product. What does it tell you?
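One possible way to carry out the tasks in Question 4 is sketched below using glm() with a binomial family. The simulation lines repeat the code given in the question; the names fit, odds_ratio and p_35 are illustrative. The estimated coefficients will not equal the values 0.1 and -4 used in the simulation, since they are estimated from a random sample of 100.

```r
# Sketch: logistic regression of Purchased on Age for the simulated data.
set.seed(123)
n <- 100
Age <- rnorm(n, mean = 40, sd = 10)
Purchased <- rbinom(n, 1, prob = 1 / (1 + exp(-(0.1 * Age - 4))))
data <- data.frame(Age, Purchased)

# Fit the model: log-odds of purchase as a linear function of Age
fit <- glm(Purchased ~ Age, family = binomial, data = data)
print(summary(fit)$coefficients)

# Odds ratio for a one-year increase in Age
odds_ratio <- exp(coef(fit)["Age"])

# Predicted purchase probability for a 35-year-old
p_35 <- predict(fit, newdata = data.frame(Age = 35), type = "response")

# The same probability from the fitted equation, via the inverse logit
p_35_manual <- plogis(coef(fit)["(Intercept)"] + coef(fit)["Age"] * 35)

print(c(odds_ratio = unname(odds_ratio), p_35 = unname(p_35)))
```

Exponentiating the Age coefficient converts a change in log-odds into a multiplicative change in the odds, which is the form part (b) asks for.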
5. A real estate company is analysing the factors that influence the price of houses in a specific city. They
collected data on several features of 100 houses, including the number of bedrooms, size of the house
(in square feet), lot size, number of bathrooms, and age of the house.

The company is interested in using multiple linear regression to predict house prices based on these
features. However, they are unsure which variables to include in the final model. You are asked to help
them select the best model using forward and backward variable selection.

The dataset includes 100 observations with the following variables:

- Bedrooms: The number of bedrooms in the house.
- HouseSize: The size of the house in square feet.
- LotSize: The size of the lot in square feet.
- Bathrooms: The number of bathrooms in the house.
- Age: The age of the house (in years).
- Price: The price of the house (in dollars).

A simulated data set will be used in this question. You should use the code below to simulate the data
in R to complete the following tasks.
> # Set seed for reproducibility
> set.seed(123)
> # Simulating data for the 100 houses
> n <- 100
> Bedrooms <- sample(2:5, n, replace = TRUE)
> HouseSize <- rnorm(n, mean = 2000, sd = 500)
> LotSize <- rnorm(n, mean = 6000, sd = 2000)
> Bathrooms <- sample(1:4, n, replace = TRUE)
> Age <- sample(1:20, n, replace = TRUE)
>
> # Generate house prices based on a true underlying model
> Price <- 50000 + 30000 * Bedrooms + 120 * HouseSize + 50 * LotSize +
+     25000 * Bathrooms - 500 * Age + rnorm(n, mean = 0, sd = 50000)
>
> # Create a data frame
> house_data <- data.frame(Bedrooms, HouseSize, LotSize, Bathrooms, Age, Price)

(a) Perform forward variable selection using AIC to choose the best model for predicting house price
based on the features. Report the variables that were included in the final model, and if any
variables were removed, specify them. Additionally, provide the regression equation for the final
model, including the estimated coefficients.
(b) Perform backward variable selection using AIC to choose the best model for predicting house price
based on the features. Report the variables that were included in the final model, and if any
variables were removed, specify them. Additionally, provide the regression equation for the final
model, including the estimated coefficients.
(c) Compare the models obtained from forward and backward selection. Are they the same? Explain
any differences and suggest which model you would recommend to the real estate company.
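Forward and backward selection by AIC can be run with step() from base R, as in the sketch below. It is a minimal illustration rather than a prescribed solution: the names null_model, full_model, forward_fit and backward_fit are hypothetical, and the Price line assumes the intended model is a single expression continued across two lines. Set trace = 1 to see each step's AIC table.

```r
# Sketch: AIC-based forward and backward selection with step().
set.seed(123)
n <- 100
Bedrooms  <- sample(2:5, n, replace = TRUE)
HouseSize <- rnorm(n, mean = 2000, sd = 500)
LotSize   <- rnorm(n, mean = 6000, sd = 2000)
Bathrooms <- sample(1:4, n, replace = TRUE)
Age       <- sample(1:20, n, replace = TRUE)

# Assumption: the two lines of the Price definition form one expression
Price <- 50000 + 30000 * Bedrooms + 120 * HouseSize + 50 * LotSize +
  25000 * Bathrooms - 500 * Age + rnorm(n, mean = 0, sd = 50000)
house_data <- data.frame(Bedrooms, HouseSize, LotSize, Bathrooms, Age, Price)

# Smallest and largest candidate models
null_model <- lm(Price ~ 1, data = house_data)
full_model <- lm(Price ~ Bedrooms + HouseSize + LotSize + Bathrooms + Age,
                 data = house_data)

# Forward selection: start from the intercept-only model and add terms
forward_fit <- step(null_model,
                    scope = list(lower = ~ 1, upper = formula(full_model)),
                    direction = "forward", trace = 0)

# Backward selection: start from the full model and drop terms
backward_fit <- step(full_model, direction = "backward", trace = 0)

print(coef(forward_fit))
print(coef(backward_fit))
print(c(forward = AIC(forward_fit), backward = AIC(backward_fit)))
```

Comparing the two coefficient vectors and AIC values is one way to approach part (c); the two procedures search different paths and need not end at the same model.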
