Baudm - Logistic Regression
Assumptions
When you choose to analyse your data using binomial logistic regression, part of the
process involves checking to make sure that the data you want to analyse can actually
be analysed using a binomial logistic regression. You need to do this because it is only
appropriate to use a binomial logistic regression if your data "passes" seven
assumptions that are required for binomial logistic regression to give you a valid result.
In practice, checking for these seven assumptions just adds a little bit more time to your
analysis, requiring you to click a few more buttons in SPSS when performing your
analysis, as well as think a little bit more about your data, but it is not a difficult task.
Before we introduce you to some of these assumptions, do not be surprised if, when
analysing your own data using SPSS, one or more of these assumptions is violated
(i.e., not met). This is not uncommon when working with real-world data rather than
textbook examples, which often only show you how to carry out binomial logistic
regression when everything goes well! However, don't worry. Even when your data fails
certain assumptions, there is often a solution to overcome this. First, let's take a look at
some of these assumptions:
Example
A health researcher wants to know whether the "incidence of heart disease"
can be predicted from "age", "weight", "gender" and "VO2max" (where VO2max
refers to maximal aerobic capacity, an indicator of fitness and health). To
this end, the researcher recruited 100 participants to perform a maximal
VO2max test and recorded their age, weight and gender. The participants were
also evaluated for the presence of heart disease. A binomial logistic
regression was then run to determine whether the presence of heart disease
could be predicted from their VO2max, age, weight and gender. Note: this
example and its data are fictitious.
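For reference, the model described in this example can be written in the standard logistic regression form. The coefficient symbols \beta_0 to \beta_4 are notation assumed here for illustration and do not come from the SPSS output; p is the probability that a participant has heart disease.

```latex
\ln\!\left(\frac{p}{1-p}\right)
  = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{weight}
  + \beta_3\,\text{gender} + \beta_4\,\text{VO2max}
```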
Setup in SPSS
In this example, there are six variables: (1) heart_disease, which is whether
the participant has heart disease: "yes" or "no" (i.e., the dependent
variable); (2) VO2max, which is the maximal aerobic capacity; (3) age, which
is the participant's age; (4) weight, which is the participant's weight
(technically, it is their 'mass'); (5) gender, which is the participant's
gender; and (6) caseno, which is the case number. Variables (2) to (5) are
the independent variables.
Note: The caseno variable is used to make it easy for you to eliminate cases
(e.g., "significant outliers", "high leverage points" and "highly influential
points") that you have identified when checking for assumptions. It is not
used directly in calculations for a binomial logistic regression analysis.
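Outside SPSS, the role of caseno can be illustrated with a minimal Python sketch. The `Case` record and the `drop_cases` helper are hypothetical names introduced here, the 1/0 coding of heart_disease is an assumption, and the rows are made-up values rather than the guide's data.

```python
from dataclasses import dataclass


@dataclass
class Case:
    caseno: int          # case number, used only to identify rows
    heart_disease: int   # assumed coding: 1 = "yes", 0 = "no"
    vo2max: float
    age: int
    weight: float
    gender: int          # 0 = female, 1 = male


def drop_cases(cases, flagged_casenos):
    """Return the cases whose caseno is not in flagged_casenos."""
    flagged = set(flagged_casenos)
    return [case for case in cases if case.caseno not in flagged]


# Made-up rows for illustration only; they are not the guide's data.
cases = [
    Case(1, 0, 55.8, 27, 70.5, 0),
    Case(2, 1, 35.0, 54, 96.2, 1),
    Case(3, 0, 42.6, 32, 80.1, 1),
]

# Suppose case 2 had been flagged during assumption checking.
kept = drop_cases(cases, flagged_casenos=[2])
print([case.caseno for case in kept])  # [1, 3]
```

Note that caseno only selects which rows stay in the analysis; it never enters the model as a predictor.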
In our enhanced binomial logistic regression guide, we show you how to
correctly enter data in SPSS to run a binomial logistic regression when you
are also checking for assumptions. You can learn about our enhanced data
setup content here. Alternatively, we have a generic, "quick start" guide to
show you how to enter data into SPSS, available here.
Click Analyze > Regression > Binary Logistic... on the main menu,
as shown below:
You will be presented with the Logistic Regression dialogue box, as shown
below:
Note: SPSS requires you to define all the categorical predictor values in the
logistic regression model. It does not do this automatically.
Note: Whether you choose Last or First will depend on how you set up your
data. In this example, males are to be compared to females, with females
acting as the reference category (who were coded "0"). Therefore, First is
chosen.
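The effect of choosing First or Last can be sketched in Python. This is only an illustration of indicator coding for a two-category predictor, assuming that "First" means the lowest coded value and "Last" the highest; `indicator_code` is a hypothetical helper, not an SPSS function.

```python
def indicator_code(values, reference="first"):
    """Indicator (dummy) coding for a two-category variable.

    Returns the reference category and a list in which the reference
    category is coded 0 and the other category is coded 1.
    """
    categories = sorted(set(values))
    if len(categories) != 2:
        raise ValueError("expected exactly two categories")
    ref = categories[0] if reference == "first" else categories[-1]
    return ref, [0 if value == ref else 1 for value in values]


# Coding from the example: 0 = female, 1 = male.
gender = [0, 1, 1, 0, 1]

ref_first, coded_first = indicator_code(gender, reference="first")
print(ref_first, coded_first)  # 0 [0, 1, 1, 0, 1]

ref_last, coded_last = indicator_code(gender, reference="last")
print(ref_last, coded_last)    # 1 [1, 0, 0, 1, 0]
```

With "first", females (coded 0) are the reference, so the gender coefficient describes males relative to females, as in this example. With "last", the comparison is reversed.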
In the Statistics and Plots area, select the Classification plots,
Hosmer-Lemeshow goodness-of-fit and Casewise listing of residuals options.
This table contains the Cox & Snell R Square and Nagelkerke R
Square values, which are both methods of calculating the explained
variation. These values are sometimes referred to as pseudo R² values (and
will have lower values than in multiple regression). They are interpreted in
the same manner, but with more caution. Therefore, the explained variation
in the dependent variable based on our model ranges from 24.0% to 33.0%,
depending on whether you reference the Cox & Snell R² or Nagelkerke R²
methods, respectively. Nagelkerke R² is a modification of Cox & Snell R²,
the latter of which cannot achieve a value of 1. For this reason, it is
preferable to report the Nagelkerke R² value.
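To make the relationship between the two measures concrete, here is a minimal Python sketch of the usual textbook formulas. It assumes you have the log-likelihoods of the intercept-only model and of the fitted model; the function names and the log-likelihood values are hypothetical and are not taken from the SPSS output.

```python
import math


def cox_snell_r2(ll_null, ll_model, n):
    """Cox & Snell R²: 1 - exp(2 * (LL_null - LL_model) / n)."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke R²: Cox & Snell R² divided by its maximum possible value."""
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_cox_snell


# Hypothetical log-likelihoods for a sample of 100 cases.
n = 100
ll_null = -65.0    # intercept-only model
ll_model = -50.0   # model with predictors

cs = cox_snell_r2(ll_null, ll_model, n)
nk = nagelkerke_r2(ll_null, ll_model, n)
print(f"Cox & Snell R2 = {cs:.3f}, Nagelkerke R2 = {nk:.3f}")

# A perfectly fitting model has a log-likelihood of 0. Even then Cox & Snell
# stays below 1, while Nagelkerke rescales it to exactly 1.
print(cox_snell_r2(ll_null, 0.0, n) < 1.0, nagelkerke_r2(ll_null, 0.0, n))
```

As a rough cross-check, and assuming the model chi-square equals twice the difference in log-likelihoods, the chi-square of 27.402 reported in the write-up for the 100 participants gives 1 - exp(-27.402/100), or about 0.240, in line with the 24.0% quoted above.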
Category prediction
Firstly, notice that the table has a subscript which states, "The cut value
is .500". This means that if the estimated probability of a case belonging to
the "yes" category is greater than .500, then that particular case is
classified into the "yes" category. Otherwise, the case is classified into
the "no" category (as mentioned previously). Whilst the classification table
appears to be very simple, it actually provides a lot of important
information about your binomial logistic regression result, including the
overall percentage of cases correctly classified, sensitivity, specificity,
positive predictive value and negative predictive value.
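The cut-value rule described above, and the summary measures usually read off a classification table, can be sketched in Python as follows. The helper names, the 1/0 coding ("yes" = 1, "no" = 0) and the probabilities are all hypothetical and serve only to illustrate the arithmetic.

```python
def classify(probabilities, cut_value=0.5):
    """Classify as 1 ("yes") only when the probability exceeds the cut value."""
    return [1 if p > cut_value else 0 for p in probabilities]


def _ratio(numerator, denominator):
    """Safe division: returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_summary(observed, predicted):
    """Accuracy, sensitivity, specificity, PPV and NPV from 1/0 outcomes."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)  # true positives
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)  # true negatives
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)  # false positives
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)  # false negatives
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Made-up observed outcomes and predicted probabilities.
observed = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
probabilities = [0.81, 0.62, 0.35, 0.20, 0.55, 0.10, 0.50, 0.30, 0.90, 0.45]

predicted = classify(probabilities)  # a probability of exactly .500 is a "no"
summary = classification_summary(observed, predicted)
print(predicted)
print(summary)
```

In this sketch a probability of exactly .500 falls into the "no" category, because the rule requires the probability to be greater than the cut value.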
General
A logistic regression was performed to ascertain the effects of age, weight, gender and
VO2max on the likelihood that participants have heart disease. The logistic regression
model was statistically significant, χ²(4) = 27.402, p < .0005. The model explained
33.0% (Nagelkerke R²) of the variance in heart disease and correctly classified 71.0% of
cases. The odds of exhibiting heart disease were 7.02 times as high for males as for
females. Increasing age was associated with an increased likelihood of exhibiting heart
disease, but increasing VO2max was associated with a reduction in the likelihood of
exhibiting heart disease.
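The significance statement in the write-up can be checked by hand. The sketch below uses the closed-form upper-tail probability of the chi-square distribution, which is available only for even degrees of freedom; the function name is hypothetical, and a statistics library would normally be used instead.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square with even df.

    For df = 2m this equals exp(-x/2) * sum_{i=0}^{m-1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


# Model chi-square and degrees of freedom from the write-up above.
p_value = chi2_sf_even_df(27.402, 4)
print(p_value < 0.0005)  # True
```

The resulting p-value is well below .0005, which is consistent with the way the result is reported.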
In addition to the write-up above, you should also include: (a) the results
from the assumptions tests that you have carried out; (b) the results from
the "Classification Table", including sensitivity, specificity, positive
predictive value and negative predictive value; and (c) the results from the
"Variables in the Equation" table, including which of the predictor
variables were statistically significant and what predictions can be made
based on the use of odds ratios. If you are unsure how to do this, we show
you in our enhanced binomial logistic regression guide. We also show you
how to write up the results from your assumptions tests and binomial logistic
regression output if you need to report this in a dissertation/thesis,
assignment or research report. We do this using the Harvard and APA styles.
You can learn more about our enhanced content here.
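When making predictions from odds ratios, it helps to remember that an odds ratio multiplies the odds, not the probability. The following Python sketch applies the odds ratio of 7.02 from the write-up to a baseline probability; the baseline of 0.20 is a made-up value and the function name is hypothetical.

```python
def probability_after_odds_ratio(baseline_probability, odds_ratio):
    """Apply an odds ratio to a baseline probability and return a probability."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical baseline: a 20% probability of heart disease for females.
baseline = 0.20
male_probability = probability_after_odds_ratio(baseline, 7.02)
print(round(male_probability, 3))  # 0.637
```

Under this assumed baseline, the probability rises from 0.20 to roughly 0.64, which is about three times the baseline rather than seven times.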