Statistical Modelling Assignment II

The document outlines an assignment involving the analysis of two datasets: mosquito death data and graduate admission data. It details the use of logistic regression to analyze the graduate admission dataset, which includes variables such as GRE scores, GPA, and rank, and provides insights into how these factors affect admission odds. The results include coefficients for each predictor and their significance, along with confidence intervals and model fit measures.

Uploaded by

singhvasudha1095
Statistical Modelling Assignment II

Please analyse the following data and submit a report based on
the analysis.

• Data I: Mosquito death data
• Data II: Graduate admission data

Case I: Graduate Admission Data:


For the analysis below, we import the graduate admission data
set from the UCLA statistics website (stats.idre.ucla.edu).
data1 = read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
head(data1)

  admit gre  gpa rank
1     0 380 3.61    3
2     1 660 3.67    3
3     1 800 4.00    1
4     1 640 3.19    4
5     0 520 2.93    4
6     1 760 3.00    2

This dataset has a binary response variable called admit. There
are three predictor variables: gre, gpa and rank. We treat the
variables gre and gpa as continuous variables; rank takes the
values 1 to 4. We can get basic descriptive statistics for the
entire dataset using summary.
summary(data1)

     admit             gre             gpa             rank
 Min.   :0.0000   Min.   :220.0   Min.   :2.260   Min.   :1.000
 1st Qu.:0.0000   1st Qu.:520.0   1st Qu.:3.130   1st Qu.:2.000
 Median :0.0000   Median :580.0   Median :3.395   Median :2.000
 Mean   :0.3175   Mean   :587.7   Mean   :3.390   Mean   :2.485
 3rd Qu.:1.0000   3rd Qu.:660.0   3rd Qu.:3.670   3rd Qu.:3.000
 Max.   :1.0000   Max.   :800.0   Max.   :4.000   Max.   :4.000

Using the Logistic Regression Model:


The code below estimates a logistic regression model using
the glm (generalized linear model) function. First, we
convert rank to a factor to indicate that rank should be treated
as a categorical variable. Then to get the results, we use the
summary command.
data1$rank = factor(data1$rank)
logit = glm(admit ~ gre + gpa + rank, data = data1, family = 'binomial')
summary(logit)
Call:
glm(formula = admit ~ gre + gpa + rank, family = "binomial",
    data = data1)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6268  -0.8662  -0.6388   1.1490   2.0790

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.989979   1.139951  -3.500 0.000465 ***
gre          0.002264   0.001094   2.070 0.038465 *
gpa          0.804038   0.331819   2.423 0.015388 *
rank2       -0.675443   0.316490  -2.134 0.032829 *
rank3       -1.340204   0.345306  -3.881 0.000104 ***
rank4       -1.551464   0.417832  -3.713 0.000205 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 499.98  on 399  degrees of freedom
Residual deviance: 458.52  on 394  degrees of freedom
AIC: 470.52

Number of Fisher Scoring iterations: 4
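
The rank2, rank3 and rank4 rows in the coefficient table come from R's default treatment coding of a factor, with rank 1 as the reference level. As a minimal sketch, the six rows printed by head can be rebuilt by hand to inspect the design matrix that a model formula would produce. The names toy and design are illustrative and not part of the original analysis.

```r
# Toy data: the six rows shown by head(data1), typed in by hand.
toy <- data.frame(
  admit = c(0, 1, 1, 1, 0, 1),
  gre   = c(380, 660, 800, 640, 520, 760),
  gpa   = c(3.61, 3.67, 4.00, 3.19, 2.93, 3.00),
  rank  = c(3, 3, 1, 4, 4, 2)
)

# Treat rank as categorical; levels are listed explicitly so that
# rank 1 is the reference level.
toy$rank <- factor(toy$rank, levels = 1:4)

# Design matrix for the same right-hand side as the model formula.
design <- model.matrix(~ gre + gpa + rank, data = toy)

# One indicator column per non-reference level of rank.
print(colnames(design))
print(design[, c("rank2", "rank3", "rank4")])
```

The row for the rank 1 applicant has zeros in all three indicator columns, which is why each rank coefficient is read as a contrast against rank 1.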

The logistic regression coefficients give the change in the log
odds of the outcome for a one-unit increase in the predictor
variable, holding the other predictors fixed.
• For every one-unit increase in gre, the log odds of admission
  (versus non-admission) increase by 0.002.
• For a one-unit increase in gpa, the log odds of being
  admitted to graduate school increase by 0.804.
• Having attended an undergraduate institution with a rank of
  2, versus an institution with a rank of 1, changes the log
  odds of admission by -0.675.
• Having attended an undergraduate institution with a rank of
  3, versus an institution with a rank of 1, changes the log
  odds of admission by -1.34.
• Having attended an undergraduate institution with a rank of
  4, versus an institution with a rank of 1, changes the log
  odds of admission by -1.55.
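
Log odds are hard to read directly, so it can help to exponentiate the coefficients into odds ratios. The sketch below does this from the estimates printed in the summary output. The values are copied by hand so that the block runs on its own, and the names coefs, odds_ratios and gre_100 are illustrative.

```r
# Coefficient estimates copied from the summary(logit) output.
coefs <- c(
  "(Intercept)" = -3.989979,
  gre   =  0.002264,
  gpa   =  0.804038,
  rank2 = -0.675443,
  rank3 = -1.340204,
  rank4 = -1.551464
)

# Exponentiating a log-odds coefficient gives an odds ratio.
# The intercept is dropped because it is a baseline, not a ratio.
odds_ratios <- exp(coefs[-1])
print(round(odds_ratios, 4))

# A one-point change in gre is tiny; a 100-point change multiplies
# the odds of admission by exp(100 * beta_gre).
gre_100 <- exp(100 * coefs[["gre"]])
print(round(gre_100, 3))
```

With the fitted model object available, exp(coef(logit)) gives the same odds ratios without copying numbers by hand.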
We can use the confint function to obtain confidence intervals for
the coefficient estimates.
confint(logit) # Confidence intervals using the profiled log-likelihood

Waiting for profiling to be done...
                    2.5 %       97.5 %
(Intercept) -6.2716202334 -1.792547080
gre          0.0001375921  0.004435874
gpa          0.1602959439  1.464142727
rank2       -1.3008888002 -0.056745722
rank3       -2.0276713127 -0.670372346
rank4       -2.4000265384 -0.753542605
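
For comparison with the profile-likelihood intervals, the z values, p-values and Wald intervals can be rebuilt from the estimates and standard errors in the coefficient table. This is a sketch using hand-copied values; the names est, se, z, p and wald_ci are illustrative.

```r
# Estimates and standard errors copied from the summary(logit) output.
est <- c(gre = 0.002264, gpa = 0.804038,
         rank2 = -0.675443, rank3 = -1.340204, rank4 = -1.551464)
se  <- c(gre = 0.001094, gpa = 0.331819,
         rank2 = 0.316490, rank3 = 0.345306, rank4 = 0.417832)

z <- est / se               # Wald z statistics
p <- 2 * pnorm(-abs(z))     # two-sided p-values from the normal distribution

# 95% Wald interval: estimate plus or minus the normal quantile times the SE.
crit <- qnorm(0.975)
wald_ci <- cbind(lower = est - crit * se, upper = est + crit * se)

print(round(cbind(z = z, p = p), 4))
print(round(wald_ci, 4))
```

Wald intervals are symmetric about the estimate, whereas profile-likelihood intervals need not be. With the fitted model available, confint.default(logit) returns the Wald intervals directly.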
We can also assess how well the model fits. One measure of model
fit is the significance of the overall model. This test asks
whether the model with predictors fits significantly better than
a model with just an intercept (a null model). The test statistic
is the difference between the null deviance and the residual
deviance:
with(logit, null.deviance - deviance)

[1] 41.45903
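
The null deviance in this difference can be checked from the descriptive statistics alone, because the intercept-only model predicts the overall admission proportion for every applicant. The sketch below assumes n = 400 (the null degrees of freedom plus one) and uses the mean of admit from the summary output; all variable names are illustrative.

```r
n        <- 400           # df.null + 1 = 399 + 1
p_admit  <- 0.3175        # mean of admit from summary(data1)
n_admit  <- n * p_admit   # 127 admitted applicants
n_reject <- n - n_admit   # 273 applicants not admitted

# Log-likelihood of the intercept-only model: every applicant is
# assigned the same admission probability p_admit.
loglik_null <- n_admit * log(p_admit) + n_reject * log(1 - p_admit)

# For a 0/1 response the saturated log-likelihood is zero, so the
# deviance is simply -2 times the log-likelihood.
null_dev <- -2 * loglik_null
print(round(null_dev, 2))

# The null model's intercept is the log odds of the overall proportion.
null_intercept <- qlogis(p_admit)
print(round(null_intercept, 4))
```

The result should agree with the null deviance reported in the model summary up to rounding.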

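Under the null model this statistic is approximately chi-squared
distributed, with degrees of freedom equal to the difference in
degrees of freedom between the two models (399 - 394 = 5).
Finally, the p-value can be obtained as: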
with(logit, pchisq(null.deviance - deviance, df.null - df.residual,
lower.tail = FALSE))

[1] 7.578194e-08
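
The p-value is far below 0.05, so the model with predictors fits significantly better than the null model. As a rough check, the same test can be reproduced from the rounded figures in the summary output, along with the reported AIC. The numbers are hand-copied and the variable names are illustrative.

```r
# Deviances and degrees of freedom copied from the summary(logit) output.
null_dev  <- 499.98
resid_dev <- 458.52
df_null   <- 399
df_resid  <- 394

# Likelihood ratio statistic and its degrees of freedom.
lr_stat <- null_dev - resid_dev
lr_df   <- df_null - df_resid

# Upper-tail chi-squared probability.
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p.value = p_value))

# For a 0/1 response, AIC = residual deviance + 2 * (number of coefficients).
k   <- 6    # intercept, gre, gpa, rank2, rank3, rank4
aic <- resid_dev + 2 * k
print(aic)
```

Because the inputs are rounded to two decimals, the statistic differs slightly from the 41.45903 computed from the model object.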
