Multinomial Logistic Regression Basic Relationships

This document provides an overview of multinomial logistic regression. Key topics:
- Multinomial logistic regression compares multiple groups through a combination of binary logistic regressions.
- It predicts probabilities of group membership and compares predicted and actual groups to determine classification accuracy.
- For a model to be considered useful, its classification accuracy must be at least 25% higher than chance-level accuracy.
- Relationships between individual predictors and the outcome are evaluated through likelihood ratio and Wald tests.

SW388R7

Data Analysis &

Computers II

Slide 1
Multinomial Logistic Regression
Basic Relationships

Multinomial Logistic Regression

Describing Relationships

Classification Accuracy

Sample Problems

Slide 2
Multinomial logistic regression
Multinomial logistic regression is used to analyze relationships
between a non-metric dependent variable and metric or
dichotomous independent variables.

Multinomial logistic regression compares multiple groups
through a combination of binary logistic regressions.

The group comparisons are equivalent to the comparisons for a
dummy-coded dependent variable, with the group with the
highest numeric score used as the reference group.

For example, if we wanted to study differences in BSW, MSW,
and PhD students using multinomial logistic regression, the
analysis would compare BSW students to PhD students and MSW
students to PhD students. For each independent variable, there
would be two comparisons.

Slide 3
What multinomial logistic regression predicts
Multinomial logistic regression provides a set of coefficients for
each of the two comparisons. The coefficients for the
reference group are all zeros, similar to the coefficients for the
reference group for a dummy-coded variable.

Thus, there are three equations, one for each of the groups
defined by the dependent variable.

The three equations can be used to compute the probability
that a subject is a member of each of the three groups. A case
is predicted to belong to the group associated with the highest
probability.

Predicted group membership can be compared to actual group
membership to obtain a measure of classification accuracy.
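
The three equations can be sketched in Python. This is only an illustration: the coefficients are borrowed from the Parameter Estimates table of the highways-and-bridges example later in the deck, while the respondent values and the names `COEFFICIENTS` and `group_probabilities` are assumptions made for the example.

```python
import math

# Coefficients (B) copied from the deck's Parameter Estimates table for the
# highways-and-bridges example. Group 3 (TOO MUCH) is the reference group,
# so all of its coefficients are zero.
COEFFICIENTS = {
    1: {"intercept": 3.240, "age": 0.019, "educ": 0.071, "conlegis": -1.373},
    2: {"intercept": 3.639, "age": 0.003, "educ": 0.172, "conlegis": -1.657},
    3: {"intercept": 0.0, "age": 0.0, "educ": 0.0, "conlegis": 0.0},
}


def group_probabilities(case):
    """Return the predicted probability of each group for one case."""
    # One linear equation (logit) per group.
    logits = {
        group: b["intercept"] + sum(b[name] * value for name, value in case.items())
        for group, b in COEFFICIENTS.items()
    }
    # Exponentiate and normalize so the probabilities sum to 1.
    denominator = sum(math.exp(g) for g in logits.values())
    return {group: math.exp(g) / denominator for group, g in logits.items()}


# Hypothetical respondent (illustrative values, not a case from the data set).
respondent = {"age": 40, "educ": 12, "conlegis": 2}
probabilities = group_probabilities(respondent)

# The case is assigned to the group with the highest probability.
predicted_group = max(probabilities, key=probabilities.get)
print(probabilities, predicted_group)
```

Because the reference group's equation is all zeros, its exponentiated value is always 1, which is why only two sets of coefficients need to be estimated.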

Slide 4
Level of measurement requirements
Multinomial logistic regression analysis requires that the
dependent variable be non-metric. Dichotomous, nominal, and
ordinal variables satisfy the level of measurement requirement.

Multinomial logistic regression analysis requires that the
independent variables be metric or dichotomous. Nominal level
variables can also be included, because SPSS automatically
dummy-codes them into dichotomous variables for the analysis.

In SPSS, non-metric independent variables are included as
factors. SPSS will dummy-code non-metric IVs.

In SPSS, metric independent variables are included as
covariates. If an independent variable is ordinal, we will
attach the usual caution.
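
A rough Python sketch of what dummy-coding a factor produces. It assumes the convention visible in the deck's later SEX example, where the highest code is the reference category and receives no indicator; the function name `dummy_code` and the variable name `x` are hypothetical, and this is not SPSS's own implementation.

```python
def dummy_code(values):
    """Dummy-code a non-metric variable, using the highest code as reference."""
    categories = sorted(set(values))
    reference = categories[-1]          # highest code is the reference category
    indicator_codes = categories[:-1]   # one 0/1 indicator per remaining category
    rows = [
        {f"[x={code}]": int(value == code) for code in indicator_codes}
        for value in values
    ]
    return reference, rows


# Hypothetical three-category factor.
reference, rows = dummy_code([1, 2, 3, 2])
print(reference)
for row in rows:
    print(row)
```

A case in the reference category is coded 0 on every indicator, so each indicator's coefficient is read as a contrast with that category.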

Slide 5
Assumptions and outliers
Multinomial logistic regression does not make any assumptions
of normality, linearity, or homogeneity of variance for the
independent variables.

Because it does not impose these requirements, it is preferred
to discriminant analysis when the data does not satisfy these
assumptions.

SPSS does not compute any diagnostic statistics for outliers. To
evaluate outliers, the advice is to run multiple binary logistic
regressions and use those results to test the exclusion of
outliers or influential cases.

Slide 6
Sample size requirements
The minimum number of cases per independent variable is 10,
using a guideline provided by Hosmer and Lemeshow, authors of
Applied Logistic Regression, one of the main resources for
Logistic Regression.

For preferred case-to-variable ratios, we will use 20 to 1.
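
As a small illustration, the ratio check can be written out in Python. The counts come from the deck's highways-and-bridges example (167 valid cases; three independent variables: age, educ, conlegis); the function name `cases_per_variable` is an assumption made for the sketch.

```python
def cases_per_variable(valid_cases, independent_variables):
    """Ratio of valid cases to independent variables."""
    return valid_cases / independent_variables


# 167 valid cases and 3 independent variables, as in the deck's example.
ratio = cases_per_variable(167, 3)

meets_minimum = ratio >= 10     # minimum guideline of 10 to 1
meets_preferred = ratio >= 20   # preferred ratio of 20 to 1
print(round(ratio, 1), meets_minimum, meets_preferred)
```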

Slide 7
Methods for including variables
The only method for selecting independent variables in SPSS
multinomial logistic regression is simultaneous (direct) entry.


Slide 8
Overall test of relationship - 1
The overall test of relationship among the independent
variables and groups defined by the dependent variable is based
on the reduction in the -2 log likelihood value from a model
which does not contain any independent variables to the model
that contains the independent variables.

This difference in -2 log likelihood follows a chi-square
distribution, and is referred to as the model chi-square.

The significance test for the final model chi-square (after the
independent variables have been added) is our statistical
evidence of the presence of a relationship between the
dependent variable and the combination of the independent
variables.


Slide 9
Overall test of relationship - 2
Model Fitting Information

Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   284.429
Final            265.972             18.457       6    .005

The presence of a relationship between the dependent
variable and combination of independent variables is
based on the statistical significance of the final model
chi-square in the SPSS table titled "Model Fitting
Information".

In this analysis, the probability of the model chi-square
(18.457) was 0.005, less than or equal to the level of
significance of 0.05. The null hypothesis that there was
no difference between the model without independent
variables and the model with independent variables was
rejected. The existence of a relationship between the
independent variables and the dependent variable was
supported.
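
The model chi-square and its probability can be reproduced from the two -2 log likelihood values in the table. The sketch below uses only the Python standard library; the closed-form upper-tail probability it relies on holds for even degrees of freedom only, and the function name `chi_square_sf_even_df` is an assumption made for the example.

```python
import math


def chi_square_sf_even_df(statistic, df):
    """Upper-tail probability of a chi-square statistic (even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = statistic / 2.0
    # For df = 2m: P(X > x) = exp(-x/2) * sum_{i<m} (x/2)^i / i!
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


# Values from the Model Fitting Information table.
neg2ll_intercept_only = 284.429
neg2ll_final = 265.972

model_chi_square = neg2ll_intercept_only - neg2ll_final
p_value = chi_square_sf_even_df(model_chi_square, df=6)

print(round(model_chi_square, 3), round(p_value, 3))
print("relationship supported" if p_value <= 0.05 else "not supported")
```

The six degrees of freedom correspond to three independent variables in each of the two group comparisons.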

Slide 10
Strength of multinomial logistic regression
relationship
While multinomial logistic regression does compute correlation
measures to estimate the strength of the relationship (pseudo R
square measures, such as Nagelkerke's R²), these correlation
measures do not really tell us much about the accuracy or
errors associated with the model.

A more useful measure to assess the utility of a multinomial
logistic regression model is classification accuracy, which
compares predicted group membership based on the logistic
model to the actual, known group membership, which is the
value for the dependent variable.


Slide 11
Evaluating usefulness for logistic models
The benchmark that we will use to characterize a multinomial
logistic regression model as useful is a 25% improvement over
the rate of accuracy achievable by chance alone.

Even if the independent variables had no relationship to the
groups defined by the dependent variable, we would still
expect to be correct in our predictions of group membership
some percentage of the time. This is referred to as by chance
accuracy.

The estimate of by chance accuracy that we will use is the
proportional by chance accuracy rate, computed by squaring the
proportion of cases in each group and summing the squares. The
only difference between by chance accuracy for binary logistic
models and by chance accuracy for multinomial logistic models
is the number of groups defined by the dependent variable.

Slide 12
Computing by chance accuracy
The percentage of cases in each group defined by the dependent
variable is found in the Case Processing Summary table.
Case Processing Summary

                              N        Marginal Percentage
HIGHWAYS AND BRIDGES   1      62       37.1%
                       2      93       55.7%
                       3      12       7.2%
Valid                         167      100.0%
Missing                       103
Total                         270
Subpopulation                 153(a)

a. The dependent variable has only one value observed in 146 (95.4%) subpopulations.

The proportional by chance accuracy rate was
computed by calculating the proportion of cases for
each group based on the number of cases in each
group in the 'Case Processing Summary', and then
squaring and summing the proportion of cases in each
group (0.371² + 0.557² + 0.072² = 0.453).

The proportional by chance accuracy criterion is 56.6%
(1.25 x 45.3% = 56.6%).
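
The same arithmetic, as a short Python sketch that starts from the group counts in the Case Processing Summary. The variable names are assumptions made for the example.

```python
# Number of valid cases in each group of the dependent variable.
group_counts = {1: 62, 2: 93, 3: 12}

total = sum(group_counts.values())
proportions = {group: n / total for group, n in group_counts.items()}

# Proportional by chance accuracy: sum of the squared group proportions.
by_chance_accuracy = sum(p ** 2 for p in proportions.values())

# Criterion for a useful model: 25% better than chance.
criterion = 1.25 * by_chance_accuracy

print(round(by_chance_accuracy, 3), round(criterion, 3))
```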

Slide 13
Comparing accuracy rates
To characterize our model as useful, we compare the overall
percentage accuracy rate produced by SPSS at the last step in which
variables are entered to 25% more than the proportional by chance
accuracy. (Note: SPSS does not compute a cross-validated accuracy
rate for multinomial logistic regression.)
Classification

                      Predicted
Observed              1        2        3       Percent Correct
1                     15       47       0       24.2%
2                     7        86       0       92.5%
3                     5        7        0       .0%
Overall Percentage    16.2%    83.8%    .0%     60.5%

The classification accuracy rate was 60.5%,
which was greater than or equal to the
proportional by chance accuracy criterion of
56.6% (1.25 x 45.3% = 56.6%).

The criterion for classification accuracy is
satisfied in this example.
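
The overall accuracy rate can be recomputed from the counts in the Classification table. This is a minimal sketch; the 0.566 criterion is the rounded value reported in the deck, and the variable names are assumptions.

```python
# Rows are observed groups 1-3, columns are predicted groups 1-3.
classification = [
    [15, 47, 0],
    [7, 86, 0],
    [5, 7, 0],
]

# Correct predictions sit on the diagonal of the table.
correct = sum(classification[i][i] for i in range(len(classification)))
total = sum(sum(row) for row in classification)
accuracy = correct / total

criterion = 0.566  # 1.25 x the proportional by chance accuracy rate
is_useful = accuracy >= criterion

print(round(accuracy, 3), is_useful)
```

Note that no case is predicted to fall in group 3, so all of the accuracy comes from groups 1 and 2.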

Slide 14
Numerical problems
The maximum likelihood method used to calculate multinomial
logistic regression is an iterative fitting process that attempts
to cycle through repetitions to find an answer.

Sometimes, the method will break down and not be able to
converge or find an answer.

Sometimes the method will produce wildly improbable results,
reporting that a one-unit change in an independent variable
increases the odds of the modeled event by hundreds of
thousands or millions. These implausible results can be
produced by multicollinearity, categories of predictors having
no cases or zero cells, and complete separation whereby the
two groups are perfectly separated by the scores on one or
more independent variables.

The clue that we have numerical problems and should not
interpret the results is a standard error larger than 2.0 for
one or more of the independent variables.
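
A brief sketch of this check in Python, using the standard errors from the deck's highways-and-bridges Parameter Estimates table. Intercepts are left out on the reading that the rule applies to independent variables only; that reading, and the variable names, are assumptions.

```python
# Standard errors of the independent variables, keyed by (comparison, variable).
standard_errors = {
    ("TOO LITTLE", "AGE"): 0.020,
    ("TOO LITTLE", "EDUC"): 0.108,
    ("TOO LITTLE", "CONLEGIS"): 0.620,
    ("ABOUT RIGHT", "AGE"): 0.020,
    ("ABOUT RIGHT", "EDUC"): 0.110,
    ("ABOUT RIGHT", "CONLEGIS"): 0.613,
}

THRESHOLD = 2.0

# Any entry above the threshold signals a possible numerical problem.
flagged = [key for key, se in standard_errors.items() if se > THRESHOLD]

if flagged:
    print("Possible numerical problems:", flagged)
else:
    print("No standard error exceeds", THRESHOLD)
```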

Slide 15
Relationship of individual independent
variables and the dependent variable
There are two types of tests for individual independent
variables:
- The likelihood ratio test evaluates the overall relationship
  between an independent variable and the dependent variable.
- The Wald test evaluates whether or not the independent
  variable is statistically significant in differentiating between
  the two groups in each of the embedded binary logistic
  comparisons.

If an independent variable has an overall relationship to the
dependent variable, it might or might not be statistically
significant in differentiating between pairs of groups defined by
the dependent variable.

Slide 16
Relationship of individual independent
variables and the dependent variable
The interpretation for an independent variable focuses on its
ability to distinguish between pairs of groups and the
contribution which it makes to changing the odds of being in
one dependent variable group rather than the other.

We should not interpret the significance of an independent
variable's role in distinguishing between pairs of groups unless
the independent variable also has an overall relationship to the
dependent variable in the likelihood ratio test.

The interpretation of an independent variable's role in
differentiating dependent variable groups is the same as we
used in binary logistic regression. The difference in
multinomial logistic regression is that we can have multiple
interpretations for an independent variable in relation to
different pairs of groups.

Slide 17
Relationship of individual independent
variables and the dependent variable
Parameter Estimates

                                                                            95% Confidence Interval for Exp(B)
HIGHWAYS AND BRIDGES(a)          B        Std. Error   Wald    df   Sig.   Exp(B)   Lower Bound   Upper Bound
1 (TOO LITTLE)    Intercept      3.240    2.478        1.709   1    .191
                  AGE            .019     .020         .906    1    .341   1.019    .980          1.061
                  EDUC           .071     .108         .427    1    .514   1.073    .868          1.327
                  CONLEGIS       -1.373   .620         4.913   1    .027   .253     .075          .853
2 (ABOUT RIGHT)   Intercept      3.639    2.456        2.195   1    .138
                  AGE            .003     .020         .017    1    .897   1.003    .963          1.043
                  EDUC           .172     .110         2.463   1    .117   1.188    .958          1.474
                  CONLEGIS       -1.657   .613         7.298   1    .007   .191     .057          .635

a. The reference category is: 3 (TOO MUCH).

SPSS identifies the comparisons it makes for
groups defined by the dependent variable in
the table of Parameter Estimates, using either
the value codes or the value labels, depending
on the options settings for pivot table labeling.

The reference category is identified in the
footnote to the table.

In this analysis, two comparisons will be made:
- the TOO LITTLE group (coded 1, shaded blue) will be compared
  to the TOO MUCH group (coded 3, shaded purple)
- the ABOUT RIGHT group (coded 2, shaded orange) will be
  compared to the TOO MUCH group (coded 3, shaded purple).

The reference category plays the same role in
multinomial logistic regression that it plays in
the dummy-coding of a nominal variable: it is
the category that would be coded with zeros
for all of the dummy-coded variables that all
other categories are interpreted against.
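
The remaining columns of the Parameter Estimates table follow from B and its standard error. The sketch below uses the usual large-sample formulas (Wald = (B / SE)², Exp(B) = e^B, 95% interval = e^(B ± 1.96 SE)); because it starts from the rounded values printed in the table, results can differ slightly in the last digit. The function name `summarize_coefficient` is an assumption.

```python
import math


def summarize_coefficient(b, se, z=1.96):
    """Return Wald statistic, Exp(B), and the 95% interval for Exp(B)."""
    wald = (b / se) ** 2
    exp_b = math.exp(b)
    lower = math.exp(b - z * se)
    upper = math.exp(b + z * se)
    return wald, exp_b, lower, upper


# CONLEGIS in the TOO LITTLE versus TOO MUCH comparison.
wald, exp_b, lower, upper = summarize_coefficient(b=-1.373, se=0.620)
print(round(wald, 3), round(exp_b, 3), round(lower, 3), round(upper, 3))
```

With one degree of freedom, the Wald statistic is compared to a chi-square distribution, which is where the Sig. column comes from.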

Slide 18
Relationship of individual independent
variables and the dependent variable
Likelihood Ratio Tests

Effect      -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept   268.323                              2.350        2    .309
AGE         268.625                              2.652        2    .265
EDUC        270.395                              4.423        2    .110
CONLEGIS    275.194                              9.221        2    .010

The chi-square statistic is the difference in -2 log-likelihoods
between the final model and a reduced model. The reduced model is
formed by omitting an effect from the final model. The null hypothesis
is that all parameters of that effect are 0.

In this example, there is a statistically significant
relationship between the independent variable CONLEGIS and the
dependent variable (0.010 < 0.05).

As well, the independent variable CONLEGIS is significant in
distinguishing category 1 of the dependent variable from
category 3 of the dependent variable (0.027 < 0.05).

And the independent variable CONLEGIS is significant in
distinguishing category 2 of the dependent variable from
category 3 of the dependent variable (0.007 < 0.05).
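
The two-stage reading of these tables (overall likelihood ratio test first, Wald tests for pairs of groups second) can be sketched as a small Python function. The significance values are copied from the Likelihood Ratio Tests and Parameter Estimates tables; the function name `interpretable_comparisons` is an assumption.

```python
ALPHA = 0.05

# Sig. values from the Likelihood Ratio Tests table.
likelihood_ratio_sig = {"AGE": 0.265, "EDUC": 0.110, "CONLEGIS": 0.010}

# Sig. values of the Wald tests from the Parameter Estimates table.
wald_sig = {
    "AGE": {"1 vs 3": 0.341, "2 vs 3": 0.897},
    "EDUC": {"1 vs 3": 0.514, "2 vs 3": 0.117},
    "CONLEGIS": {"1 vs 3": 0.027, "2 vs 3": 0.007},
}


def interpretable_comparisons(variable):
    """Comparisons we may interpret for one independent variable."""
    # Without an overall relationship, no Wald test is interpreted.
    if likelihood_ratio_sig[variable] > ALPHA:
        return []
    return [pair for pair, sig in wald_sig[variable].items() if sig <= ALPHA]


for name in likelihood_ratio_sig:
    print(name, interpretable_comparisons(name))
```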

Slide 19
Interpreting relationship of individual independent
variables to the dependent variable
Survey respondents who had less confidence in Congress (higher
values correspond to lower confidence) were less likely to be in the
group of survey respondents who thought we spend too little money
on highways and bridges (DV category 1), rather than the group of
survey respondents who thought we spend too much money on
highways and bridges (DV category 3).

For each unit increase in the confidence in Congress score (that is,
each step toward lower confidence), the odds of being in the group
of survey respondents who thought we spend too little money on
highways and bridges decreased by 74.7% (0.253 - 1.0 = -0.747).
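
The conversion from Exp(B) to a percentage change in the odds is a one-line calculation, sketched here in Python with the two CONLEGIS values from the Parameter Estimates table. The function name `percent_change_in_odds` is an assumption.

```python
def percent_change_in_odds(exp_b):
    """Percentage change in the odds for a one-unit increase in the predictor."""
    return (exp_b - 1.0) * 100.0


# Exp(B) for CONLEGIS in each comparison with the TOO MUCH group.
too_little_change = percent_change_in_odds(0.253)
about_right_change = percent_change_in_odds(0.191)

print(round(too_little_change, 1), round(about_right_change, 1))
```

A value of Exp(B) below 1.0 gives a negative result, meaning the odds decrease; a value above 1.0 would give an increase.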

Slide 20
Interpreting relationship of individual independent
variables to the dependent variable
Survey respondents who had less confidence in Congress (higher
values correspond to lower confidence) were less likely to be in the
group of survey respondents who thought we spend about the right
amount of money on highways and bridges (DV category 2), rather
than the group of survey respondents who thought we spend too
much money on highways and bridges (DV category 3).

For each unit increase in the confidence in Congress score (that is,
each step toward lower confidence), the odds of being in the group
of survey respondents who thought we spend about the right amount
of money on highways and bridges decreased by 80.9%
(0.191 - 1.0 = -0.809).

Slide 21
Relationship of individual independent
variables and the dependent variable
Likelihood Ratio Tests

Effect      -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept   327.463(a)                           .000         0    .
AGE         333.440                              5.976        2    .050
EDUC        329.606                              2.143        2    .343
POLVIEWS    334.636                              7.173        2    .028
SEX         338.985                              11.521       2    .003

The chi-square statistic is the difference in -2 log-likelihoods
between the final model and a reduced model. The reduced model
is formed by omitting an effect from the final model. The null
hypothesis is that all parameters of that effect are 0.
a. This reduced model is equivalent to the final model because
omitting the effect does not increase the degrees of freedom.

Parameter Estimates

                                                                       95% Confidence Interval for Exp(B)
NATCHLD(a)                  B        Std. Error   Wald     df   Sig.   Exp(B)   Lower Bound   Upper Bound
TOO LITTLE    Intercept     8.434    2.233        14.261   1    .000
              AGE           -.023    .017         1.756    1    .185   .977     .944          1.011
              EDUC          -.066    .102         .414     1    .520   .936     .766          1.144
              POLVIEWS      -.575    .251         5.234    1    .022   .563     .344          .921
              [SEX=1]       -2.167   .805         7.242    1    .007   .115     .024          .555
              [SEX=2]       0(b)     .            .        0    .      .        .             .
ABOUT RIGHT   Intercept     4.485    2.255        3.955    1    .047
              AGE           -.001    .018         .003     1    .955   .999     .965          1.034
              EDUC          .011     .104         .011     1    .916   1.011    .824          1.240
              POLVIEWS      -.397    .257         2.375    1    .123   .673     .406          1.114
              [SEX=1]       -1.606   .824         3.800    1    .051   .201     .040          1.009
              [SEX=2]       0(b)     .            .        0    .      .        .             .

a. The reference category is: TOO MUCH.
b. This parameter is set to zero because it is redundant.

In this example, there is a statistically significant
relationship between SEX and the dependent variable, spending
on childcare assistance (0.003 < 0.05).

As well, SEX plays a statistically significant role in
differentiating the TOO LITTLE group from the TOO MUCH
(reference) group (0.007 < 0.05).

However, SEX does not differentiate the ABOUT RIGHT group
from the TOO MUCH (reference) group (0.051 > 0.05).
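
The same conclusion can be read from the 95% confidence interval for Exp(B): when the interval contains 1.0, the Wald test is not significant at 0.05. The sketch below applies that check to the two [SEX=1] rows of the Parameter Estimates table; the function name `interval_excludes_one` is an assumption.

```python
def interval_excludes_one(lower, upper):
    """True when a confidence interval for Exp(B) does not contain 1.0."""
    return lower > 1.0 or upper < 1.0


# 95% confidence intervals for Exp(B) of [SEX=1] in each comparison.
too_little_interval = (0.024, 0.555)    # Sig. = .007
about_right_interval = (0.040, 1.009)   # Sig. = .051

too_little_significant = interval_excludes_one(*too_little_interval)
about_right_significant = interval_excludes_one(*about_right_interval)

print(too_little_significant, about_right_significant)
```

An odds ratio of 1.0 means no change in the odds, so an interval that spans 1.0 cannot rule out the absence of an effect.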

Slide 22
Interpreting relationship of individual independent
variables and the dependent variable
Survey respondents who were male (code 1 for sex) were less likely
to be in the group of survey respondents who thought we spend too
little money on childcare assistance (DV category 1), rather than the
group of survey respondents who thought we spend too much
money on childcare assistance (DV category 3).

For survey respondents who were male, the odds of being in the
group of survey respondents who thought we spend too little money
on childcare assistance were 88.5% lower (0.115 - 1.0 = -0.885).

Slide 23
Interpreting relationships for independent
variable in problems
In the multinomial logistic regression problems, the problem
statement will ask about only one of the independent variables.
The answer will be true or false based on only the relationship
between the specified independent variable and the dependent
variable. The individual relationships between other
independent variables and the dependent variable are not used
in determining whether or not the answer is true or false.

Slide 24
Problem 1
11. In the dataset GSS2000, is the following statement true, false, or an incorrect application
of a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.
The variables "age" [age], "highest year of school completed" [educ] and "confidence in
Congress" [conlegis] were useful predictors for distinguishing between groups based on
responses to "opinion about spending on highways and bridges" [natroad]. These predictors
differentiate survey respondents who thought we spend too little money on highways and
bridges from survey respondents who thought we spend too much money on highways and
bridges and survey respondents who thought we spend about the right amount of money on
highways and bridges from survey respondents who thought we spend too much money on
highways and bridges.
Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of survey
respondents who thought we spend too little money on highways and bridges, rather than the
group of survey respondents who thought we spend too much money on highways and bridges.
For each unit increase in confidence in Congress, the odds of being in the group of survey
respondents who thought we spend too little money on highways and bridges decreased by
74.7%. Survey respondents who had less confidence in congress were less likely to be in the
group of survey respondents who thought we spend about the right amount of money on
highways and bridges, rather than the group of survey respondents who thought we spend too
much money on highways and bridges. For each unit increase in confidence in Congress, the
odds of being in the group of survey respondents who thought we spend about the right amount
of money on highways and bridges decreased by 80.9%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

Slide 25
Dissecting problem 1 - 1
For these problems, we will assume that there is no problem
with missing data, outliers, or influential cases, and that the
validation analysis will confirm the generalizability of the
results.

In this problem, we are told to use 0.05 as alpha for the
multinomial logistic regression.

Slide 26
Dissecting problem 1 - 2
SPSS only supports direct or
simultaneous entry of independent
variables in multinomial logistic
regression, so we have no choice of
method for entering variables.
The variables listed first in the problem
statement are the independent variables
(IVs): "age" [age], "highest year of school
completed" [educ] and "confidence in
Congress" [conlegis].
The variable used to define
groups is the dependent
variable (DV): "opinion about
spending on highways and
bridges" [natroad].

Slide 27
Dissecting problem 1 - 3
11. In the dataset GSS2000, is the following statement true, false, or an incorrect application
of a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.
The variables "age" [age], "highest year of school completed" [educ] and "confidence in
Congress" [conlegis] were useful predictors for distinguishing between groups based on
responses to "opinion about spending on highways and bridges" [natroad]. These predictors
differentiate survey respondents who thought we spend too little money on highways and
bridges from survey respondents who thought we spend too much money on highways and
bridges and survey respondents who thought we spend about the right amount of money on
highways and bridges from survey respondents who thought we spend too much money on
highways and bridges.
Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of survey
respondents who thought we spend too little money on highways and bridges, rather than the
group of survey respondents who thought we spend too much money on highways and bridges.
For each unit increase in confidence in Congress, the odds of being in the group of survey
respondents who thought we spend too little money on highways and bridges decreased by
74.7%. Survey respondents who had less confidence in Congress were less likely to be in the
group of survey respondents who thought we spend about the right amount of money on
highways and bridges, rather than the group of survey respondents who thought we spend too
much money on highways and bridges. For each unit increase in confidence in Congress, the
odds of being in the group of survey respondents who thought we spend about the right amount
of money on highways and bridges decreased by 80.9%.
SPSS multinomial logistic regression models the relationship by
comparing each of the groups defined by the dependent variable to the
group with the highest code value.

The responses to opinion about spending on highways and bridges were:
1 = Too little, 2 = About right, and 3 = Too much.
The analysis will result in two comparisons:
(1) survey respondents who thought we spend too little money
versus survey respondents who thought we spend too much
money on highways and bridges, and
(2) survey respondents who thought we spend about the right
amount of money versus survey respondents who thought we
spend too much money on highways and bridges.
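
The rule of comparing every group to the group with the highest code value can be sketched in a few lines of Python. This is only an illustration of the rule described on this slide, not SPSS code; the function name reference_comparisons is a hypothetical helper.

```python
# Response codes for natroad, as listed on the slide.
labels = {1: "Too little", 2: "About right", 3: "Too much"}

def reference_comparisons(labels):
    """Pair every group with the reference group (the highest code value)."""
    reference = max(labels)
    return [(labels[code], labels[reference])
            for code in sorted(labels) if code != reference]

comparisons = reference_comparisons(labels)
for group, reference_group in comparisons:
    print(f"{group} versus {reference_group}")
```

With three groups the sketch yields two comparisons, which is why each independent variable has two b coefficients in the output.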

Slide 28
Dissecting problem 1 - 4
Each problem includes a statement about the relationship between
one independent variable and the dependent variable. The answer
to the problem is based on the stated relationship, ignoring the
relationships between the other independent variables and the
dependent variable.

This problem identifies a difference for both of the comparisons
among groups modeled by the multinomial logistic regression.

Slide 29
Dissecting problem 1 - 5

In order for the multinomial logistic regression
question to be true, the overall relationship must
be statistically significant, there must be no
evidence of numerical problems, the classification
accuracy rate must be substantially better than
could be obtained by chance alone, and the
stated individual relationship must be statistically
significant and interpreted correctly.

Slide 30
Request multinomial logistic regression
Select the Regression |
Multinomial Logistic
command from the
Analyze menu.

Slide 31
Selecting the dependent variable
First, highlight the
dependent variable
natroad in the list
of variables.
Second, click on the right
arrow button to move the
dependent variable to the
Dependent text box.

Slide 32
Selecting metric independent variables
Move the metric
independent variables,
age, educ and conlegis to
the Covariate(s) list box.
Metric independent variables are specified as covariates
in multinomial logistic regression. Metric variables can
be either interval or, by convention, ordinal.
In this analysis, there are no non-
metric independent variables. Non-
metric independent variables would be
moved to the Factor(s) list box.

Slide 33
Specifying statistics to include in the output
While we will accept most of
the SPSS defaults for the
analysis, we need to specifically
request the classification table.

Click on the Statistics button
to make a request.

Slide 34
Requesting the classification table
First, keep the SPSS
defaults for Summary
statistics, Likelihood
ratio test, and
Parameter estimates.
Second, mark the
checkbox for the
Classification table.
Third, click
on the
Continue
button to
complete the
request.

Slide 35
Completing the multinomial
logistic regression request
Click on the OK
button to request
the output for the
multinomial logistic
regression.
The multinomial logistic procedure supports
additional commands to specify the model
computed for the relationships (we will use the
default main effects model), additional
specifications for computing the regression,
and saving classification results. We will not
make use of these options.

Slide 36
LEVEL OF MEASUREMENT - 1

Multinomial logistic regression requires that the
dependent variable be non-metric and the
independent variables be metric or dichotomous.

"Opinion about spending on highways and
bridges" [natroad] is ordinal, satisfying the non-
metric level of measurement requirement for the
dependent variable.

It contains three categories: survey respondents
who thought we spend too little money, about
the right amount of money, and too much money
on highways and bridges.

Slide 37
LEVEL OF MEASUREMENT - 2
"Age" [age] and "highest year of
school completed" [educ] are interval,
satisfying the metric or dichotomous
level of measurement requirement for
independent variables.
"Confidence in Congress" [conlegis] is ordinal. If we follow
the convention of treating ordinal level variables as metric
variables, the metric or dichotomous level of measurement
requirement for independent variables is satisfied. Since
some data analysts do not agree with this convention, a note
of caution should be included in our interpretation.

Slide 38
Sample size ratio of cases to variables
Case Processing Summary

                                  N    Marginal Percentage
HIGHWAYS AND BRIDGES    1        62          37.1%
                        2        93          55.7%
                        3        12           7.2%
Valid                           167         100.0%
Missing                         103
Total                           270
Subpopulation                   153 (a)

a. The dependent variable has only one value observed in 146 (95.4%) subpopulations.
Multinomial logistic regression requires that the minimum ratio
of valid cases to independent variables be at least 10 to 1. The
ratio of valid cases (167) to number of independent variables
(3) was 55.7 to 1, which was equal to or greater than the
minimum ratio. The requirement for a minimum ratio of cases
to independent variables was satisfied.

The preferred ratio of valid cases to independent variables is 20
to 1. The ratio of 55.7 to 1 was equal to or greater than the
preferred ratio. The preferred ratio of cases to independent
variables was satisfied.
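
The ratio check above is simple arithmetic and can be reproduced as follows. This is a minimal sketch using the counts from the Case Processing Summary; the function name is a hypothetical helper, and the 10 to 1 and 20 to 1 thresholds are the ones stated on this slide.

```python
def cases_to_variables_ratio(valid_cases, n_independent_variables):
    """Ratio of valid cases to independent variables."""
    return valid_cases / n_independent_variables

ratio = cases_to_variables_ratio(valid_cases=167, n_independent_variables=3)
meets_minimum = ratio >= 10     # minimum ratio of 10 to 1
meets_preferred = ratio >= 20   # preferred ratio of 20 to 1

print(f"{ratio:.1f} to 1, minimum met: {meets_minimum}, preferred met: {meets_preferred}")
```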

Slide 39
OVERALL RELATIONSHIP BETWEEN
INDEPENDENT AND DEPENDENT VARIABLES
Model Fitting Information

Model             -2 Log Likelihood    Chi-Square    df    Sig.
Intercept Only         284.429
Final                  265.972           18.457       6    .005
The presence of a relationship between the dependent
variable and combination of independent variables is
based on the statistical significance of the final model
chi-square in the SPSS table titled "Model Fitting
Information".

In this analysis, the probability of the model chi-square
(18.457) was 0.005, less than or equal to the level of
significance of 0.05. The null hypothesis that there was
no difference between the model without independent
variables and the model with independent variables was
rejected. The existence of a relationship between the
independent variables and the dependent variable was
supported.
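
The model chi-square is the difference between the two -2 log likelihood values in the table, and its probability can be checked by hand. The sketch below uses only the Python standard library and relies on the closed-form upper-tail probability of the chi-square distribution, which holds for even degrees of freedom only; the function name chi2_sf_even_df is a hypothetical helper, not part of SPSS or any library.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square statistic (even df only).

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form is valid only for even, positive df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# -2 log likelihood values from the Model Fitting Information table.
model_chi_square = 284.429 - 265.972
p_value = chi2_sf_even_df(model_chi_square, df=6)
reject_null = p_value <= 0.05

print(f"chi-square = {model_chi_square:.3f}, p = {p_value:.3f}, reject null: {reject_null}")
```

The six degrees of freedom correspond to three independent variables times two group comparisons.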

Slide 40
Parameter Estimates

HIGHWAYS AND                                                      95% Confidence Interval for Exp(B)
BRIDGES (a)              B    Std. Error    Wald    df   Sig.   Exp(B)   Lower Bound   Upper Bound
1    Intercept       3.240      2.478      1.709     1   .191
     AGE              .019       .020       .906     1   .341    1.019       .980         1.061
     EDUC             .071       .108       .427     1   .514    1.073       .868         1.327
     CONLEGIS       -1.373       .620      4.913     1   .027     .253       .075          .853
2    Intercept       3.639      2.456      2.195     1   .138
     AGE              .003       .020       .017     1   .897    1.003       .963         1.043
     EDUC             .172       .110      2.463     1   .117    1.188       .958         1.474
     CONLEGIS       -1.657       .613      7.298     1   .007     .191       .057          .635

a. The reference category is: 3.
NUMERICAL PROBLEMS
Numerical problems in the multinomial
logistic regression solution are
detected by examining the standard
errors for the b coefficients. A
standard error larger than 2.0
indicates numerical problems, such
as multicollinearity among the
independent variables, zero cells for
a dummy-coded independent
variable because all of the subjects
have the same value for the
variable, and 'complete separation',
whereby groups of the
dependent variable can be
perfectly separated by scores on
one of the independent variables.
Analyses that indicate numerical
problems should not be interpreted.

None of the independent variables in
this analysis had a standard error
larger than 2.0. (We are not
interested in the standard errors
associated with the intercept.)
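
The screening rule above can be sketched as a simple filter over the standard errors in the Parameter Estimates table. The dictionary layout and the function name flag_numerical_problems are assumptions made for illustration; the 2.0 cutoff and the exclusion of the intercepts follow the slide.

```python
# Standard errors copied from the Parameter Estimates table.
# Keys are (group comparison, term).
standard_errors = {
    (1, "Intercept"): 2.478,
    (1, "AGE"): 0.020,
    (1, "EDUC"): 0.108,
    (1, "CONLEGIS"): 0.620,
    (2, "Intercept"): 2.456,
    (2, "AGE"): 0.020,
    (2, "EDUC"): 0.110,
    (2, "CONLEGIS"): 0.613,
}

def flag_numerical_problems(standard_errors, limit=2.0):
    """Return the non-intercept terms whose standard error exceeds the limit."""
    return [key for key, se in standard_errors.items()
            if key[1] != "Intercept" and se > limit]

flagged = flag_numerical_problems(standard_errors)
print(flagged if flagged else "no predictor standard error exceeds 2.0")
```

Both intercepts have standard errors above 2.0 in this output, which is why the rule explicitly ignores them.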

Slide 41
Likelihood Ratio Tests

               -2 Log Likelihood
Effect         of Reduced Model     Chi-Square    df    Sig.
Intercept          268.323             2.350       2    .309
AGE                268.625             2.652       2    .265
EDUC               270.395             4.423       2    .110
CONLEGIS           275.194             9.221       2    .010

The chi-square statistic is the difference in -2 log-likelihoods
between the final model and a reduced model. The reduced model is
formed by omitting an effect from the final model. The null hypothesis
is that all parameters of that effect are 0.
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 1
The statistical significance of the relationship between
confidence in Congress and opinion about spending on
highways and bridges is based on the statistical significance of
the chi-square statistic in the SPSS table titled "Likelihood
Ratio Tests".

For this relationship, the probability of the chi-square statistic
(9.221) was 0.010, less than or equal to the level of
significance of 0.05. The null hypothesis that all of the b
coefficients associated with confidence in Congress were equal
to zero was rejected. The existence of a relationship between
confidence in Congress and opinion about spending on
highways and bridges was supported.
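
The likelihood ratio test for each predictor can be recomputed from the table: subtract the final model's -2 log likelihood (265.972, from the Model Fitting Information table) from the reduced model's value. The sketch below is an illustration with hypothetical names; it uses the fact that for 2 degrees of freedom the chi-square upper-tail probability is exp(-x/2). Because the table values are rounded, the recomputed chi-square can differ from the printed one in the third decimal.

```python
import math

final_neg2ll = 265.972  # -2 log likelihood of the final model
reduced_neg2ll = {"AGE": 268.625, "EDUC": 270.395, "CONLEGIS": 275.194}

def likelihood_ratio_test(reduced, final):
    """Chi-square and p-value for dropping one effect (df = 2 in this model)."""
    chi_square = reduced - final
    p_value = math.exp(-chi_square / 2.0)  # exact for df = 2 only
    return chi_square, p_value

results = {name: likelihood_ratio_test(value, final_neg2ll)
           for name, value in reduced_neg2ll.items()}
significant = [name for name, (_, p) in results.items() if p <= 0.05]

for name, (chi_square, p) in results.items():
    print(f"{name}: chi-square = {chi_square:.3f}, p = {p:.3f}")
```

Only confidence in Congress falls at or below the 0.05 level of significance, which matches the conclusion on this slide.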

Slide 42
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 2
In the comparison of survey respondents who thought we spend
too little money on highways and bridges to survey respondents
who thought we spend too much money on highways and
bridges, the probability of the Wald statistic (4.913) for the
variable confidence in Congress [conlegis] was 0.027. Since the
probability was less than or equal to the level of significance of
0.05, the null hypothesis that the b coefficient for confidence in
Congress was equal to zero for this comparison was rejected.
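
The Wald statistic reported by SPSS is the squared ratio of the b coefficient to its standard error, evaluated as a chi-square with 1 degree of freedom. The following sketch recomputes it from the rounded table values, so the result agrees with the printed 4.913 only approximately. The function name wald_test is a hypothetical helper.

```python
import math

def wald_test(b, standard_error):
    """Wald chi-square (df = 1) and its upper-tail probability."""
    wald = (b / standard_error) ** 2
    # For df = 1, P(chi-square > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value

# CONLEGIS, "too little" versus "too much" comparison.
wald, p_value = wald_test(b=-1.373, standard_error=0.620)
reject_null = p_value <= 0.05

print(f"Wald = {wald:.2f}, p = {p_value:.3f}, reject null: {reject_null}")
```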

Slide 43
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 3
The value of Exp(B) was 0.253, which implies that for each unit
increase in confidence in Congress the odds decreased by 74.7%
(0.253 - 1.0 = -0.747).

The relationship stated in the problem is supported. Survey
respondents who had less confidence in Congress were less likely
to be in the group of survey respondents who thought we spend
too little money on highways and bridges, rather than the group of
survey respondents who thought we spend too much money on
highways and bridges. For each unit increase in confidence in
Congress, the odds of being in the group of survey respondents
who thought we spend too little money on highways and bridges
decreased by 74.7%.
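
The conversion from a b coefficient to Exp(B) and then to a percentage change in the odds can be sketched as below. The function name and the returned dictionary are hypothetical; the confidence interval uses the usual normal approximation exp(B ± 1.96 × SE), which is an assumption about how the printed bounds were obtained, and the inputs are the rounded table values, so the bounds match the table only to about two decimals.

```python
import math

def odds_ratio_summary(b, standard_error, z=1.96):
    """Exp(B), percent change in odds per unit increase, and an approximate 95% CI."""
    exp_b = math.exp(b)
    return {
        "exp_b": exp_b,
        "percent_change": (exp_b - 1.0) * 100.0,
        "ci_lower": math.exp(b - z * standard_error),
        "ci_upper": math.exp(b + z * standard_error),
    }

# CONLEGIS for each comparison against the "too much" reference group.
too_little = odds_ratio_summary(b=-1.373, standard_error=0.620)
about_right = odds_ratio_summary(b=-1.657, standard_error=0.613)

for label, summary in (("too little", too_little), ("about right", about_right)):
    print(f"{label}: Exp(B) = {summary['exp_b']:.3f}, "
          f"change in odds = {summary['percent_change']:.1f}%")
```

A negative b coefficient always gives Exp(B) below 1 and therefore a decrease in the odds; a positive coefficient gives an increase.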

Slide 44
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 4
In the comparison of survey respondents who thought we spend
about the right amount of money on highways and bridges to
survey respondents who thought we spend too much money on
highways and bridges, the probability of the Wald statistic
(7.298) for the variable confidence in Congress [conlegis] was
0.007. Since the probability was less than or equal to the level
of significance of 0.05, the null hypothesis that the b coefficient
for confidence in Congress was equal to zero for this comparison
was rejected.

Slide 45
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 5
The value of Exp(B) was 0.191, which implies that for each unit increase in
confidence in Congress the odds decreased by 80.9% (0.191 - 1.0 = -0.809).

The relationship stated in the problem is supported. Survey respondents
who had less confidence in Congress were less likely to be in the group of
survey respondents who thought we spend about the right amount of
money on highways and bridges, rather than the group of survey
respondents who thought we spend too much money on highways and
bridges. For each unit increase in confidence in Congress, the odds of
being in the group of survey respondents who thought we spend about the
right amount of money on highways and bridges decreased by 80.9%.

Slide 46
CLASSIFICATION USING THE MULTINOMIAL LOGISTIC
REGRESSION MODEL: BY CHANCE ACCURACY RATE
The proportional by chance accuracy rate was computed by
calculating the proportion of cases for each group based on
the number of cases in each group in the 'Case Processing
Summary', and then squaring and summing the proportion of
cases in each group (0.371² + 0.557² + 0.072² = 0.453).

The independent variables could be characterized as useful
predictors distinguishing survey respondents who thought we
spend too little money on highways and bridges, survey
respondents who thought we spend about the right amount
of money on highways and bridges and survey respondents
who thought we spend too much money on highways and
bridges if the classification accuracy rate was substantially
higher than the accuracy attainable by chance alone.
Operationally, the classification accuracy rate should be at
least 25% higher than the proportional by chance accuracy
rate.
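
The by chance accuracy rate and the 25% improvement criterion can be recomputed from the group counts in the Case Processing Summary (62, 93, and 12 valid cases). This is a minimal sketch; the variable names are illustrative only.

```python
# Valid cases in each group of the dependent variable.
group_counts = {1: 62, 2: 93, 3: 12}

total = sum(group_counts.values())
proportions = {group: n / total for group, n in group_counts.items()}

# Proportional by chance accuracy: sum of squared group proportions.
chance_accuracy = sum(p ** 2 for p in proportions.values())

# The model should be at least 25% more accurate than chance.
accuracy_criterion = 1.25 * chance_accuracy

print(f"by chance accuracy = {chance_accuracy:.3f}, criterion = {accuracy_criterion:.3f}")
```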

Slide 47
Classification

                             Predicted
Observed                 1         2         3      Percent Correct
1                       15        47         0          24.2%
2                        7        86         0          92.5%
3                        5         7         0            .0%
Overall Percentage    16.2%     83.8%       .0%         60.5%
CLASSIFICATION USING THE MULTINOMIAL LOGISTIC
REGRESSION MODEL: CLASSIFICATION ACCURACY
The classification accuracy rate was 60.5%,
which was greater than or equal to the
proportional by chance accuracy criterion of
56.6% (1.25 x 45.3% = 56.6%).

The criterion for classification accuracy is
satisfied.
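
The overall percentage in the Classification table is the share of cases on the diagonal, that is, cases whose predicted group equals their observed group. The sketch below recomputes it from the table and compares it with the criterion; the list layout (rows observed, columns predicted) and the names are assumptions for illustration.

```python
# Rows are observed groups 1-3, columns are predicted groups 1-3.
classification = [
    [15, 47, 0],
    [7, 86, 0],
    [5, 7, 0],
]

correct = sum(classification[i][i] for i in range(len(classification)))
total = sum(sum(row) for row in classification)
accuracy = correct / total

# Criterion from the slide: 1.25 x 45.3% by chance accuracy.
accuracy_criterion = 1.25 * 0.453
criterion_satisfied = accuracy >= accuracy_criterion

print(f"accuracy = {accuracy:.3f}, criterion = {accuracy_criterion:.3f}, "
      f"satisfied: {criterion_satisfied}")
```

Note that the model never predicts group 3, so the overall accuracy comes entirely from groups 1 and 2.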

Slide 48
Answering the question in problem 1 - 1

We found a statistically significant overall
relationship between the combination of
independent variables and the dependent
variable.

There was no evidence of numerical problems in
the solution.

Moreover, the classification accuracy surpassed
the proportional by chance accuracy criterion,
supporting the utility of the model.

Slide 49
Answering the question in problem 1 - 2
We verified that each statement about the relationship
between an independent variable and the dependent
variable was correct in both the direction of the relationship
and the change in odds associated with a one-unit
change of the independent variable, for both of the
comparisons between groups stated in the problem.
The answer to the question is true
with caution.

A caution is added because of the
inclusion of ordinal level variables.
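
The chain of checks that leads to this answer can be summarized in a short sketch. The function answer_problem and its parameter names are hypothetical, and mapping any failed check to "False" is a simplifying assumption (a level of measurement violation, for example, would instead point to an inappropriate application of the statistic). The input values are the ones reported on the preceding slides.

```python
def answer_problem(overall_p, predictor_standard_errors, accuracy, chance_accuracy,
                   individual_p_values, interpretation_correct, has_ordinal_predictor,
                   alpha=0.05, se_limit=2.0, improvement=1.25):
    """Combine the checks from the worked problem into a single answer."""
    checks = [
        overall_p <= alpha,                                         # overall relationship
        all(se <= se_limit for se in predictor_standard_errors),    # numerical problems
        accuracy >= improvement * chance_accuracy,                  # classification accuracy
        all(p <= alpha for p in individual_p_values),               # individual relationship
        interpretation_correct,                                     # direction and size
    ]
    if not all(checks):
        return "False"
    return "True with caution" if has_ordinal_predictor else "True"

answer = answer_problem(
    overall_p=0.005,
    predictor_standard_errors=[0.020, 0.108, 0.620, 0.020, 0.110, 0.613],
    accuracy=0.605,
    chance_accuracy=0.453,
    individual_p_values=[0.010, 0.027, 0.007],  # likelihood ratio test and both Wald tests
    interpretation_correct=True,
    has_ordinal_predictor=True,
)
print(answer)
```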

Slide 50
Problem 2
1. In the dataset GSS2000, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.

The variables "highest year of school completed" [educ], "sex" [sex] and "total family income"
[income98] were useful predictors for distinguishing between groups based on responses to
"opinion about spending on space exploration" [natspac]. These predictors differentiate survey
respondents who thought we spend too little money on space exploration from survey
respondents who thought we spend too much money on space exploration and survey
respondents who thought we spend about the right amount of money on space exploration from
survey respondents who thought we spend too much money on space exploration.

Among this set of predictors, total family income was helpful in distinguishing among the groups
defined by responses to opinion about spending on space exploration. Survey respondents who
had higher total family incomes were more likely to be in the group of survey respondents who
thought we spend about the right amount of money on space exploration, rather than the group
of survey respondents who thought we spend too much money on space exploration. For each
unit increase in total family income, the odds of being in the group of survey respondents who
thought we spend about the right amount of money on space exploration increased by 6.0%.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic


Slide 51
Dissecting problem 2 - 1

For these problems, we will assume that there is no problem with missing data,
outliers, or influential cases, and that the validation analysis will confirm
the generalizability of the results.

In this problem, we are told to use 0.05 as alpha for the multinomial logistic
regression.

Slide 52
Dissecting problem 2 - 2
SPSS only supports direct or simultaneous entry of independent variables in
multinomial logistic regression, so we have no choice of method for entering
variables.

The variables listed first in the problem statement are the independent
variables (IVs): "highest year of school completed" [educ], "sex" [sex] and
"total family income" [income98].

The variable used to define groups is the dependent variable (DV): "opinion
about spending on space exploration" [natspac].

Slide 53
Dissecting problem 2 - 3
SPSS multinomial logistic regression models the relationship
by comparing each of the groups defined by the dependent
variable to the group with the highest code value.

The responses to opinion about spending on the space
program were coded 1 = Too little, 2 = About right, and 3 = Too much.

The analysis will result in two comparisons:
- survey respondents who thought we spend too little money versus survey
  respondents who thought we spend too much money on space exploration
- survey respondents who thought we spend about the right amount of money
  versus survey respondents who thought we spend too much money on space
  exploration.
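
As a rough illustration of how the reference group determines the comparisons, the following Python sketch pairs every group with the group that has the highest code value. The function name and the dictionary of labels are made up for this example and are not part of SPSS.

```python
def reference_comparisons(labels):
    """Pair every group with the reference group (the highest code value)."""
    reference_code = max(labels)
    return [
        (labels[code], labels[reference_code])
        for code in sorted(labels)
        if code != reference_code
    ]


# Codes and labels for natspac as listed on this slide.
natspac_labels = {1: "Too little", 2: "About right", 3: "Too much"}

comparisons = reference_comparisons(natspac_labels)
for group, reference in comparisons:
    print(f"{group} versus {reference}")
```

With three groups the sketch yields two comparisons, both against "Too much", which matches the two comparisons listed above.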

Slide 54
Dissecting problem 2 - 4
Each problem includes a statement about the relationship between one
independent variable and the dependent variable. The answer to the problem
is based on the stated relationship, ignoring the relationships between the
other independent variables and the dependent variable.

This problem identifies a difference for only one of the two comparisons
based on the three values of the dependent variable.

Other problems will specify both of the possible comparisons.

Slide 55
Dissecting problem 2 - 5
In order for the multinomial logistic regression
question to be true, the overall relationship must
be statistically significant, there must be no
evidence of numerical problems, the classification
accuracy rate must be substantially better than
could be obtained by chance alone, and the
stated individual relationship must be statistically
significant and interpreted correctly.

Slide 56
LEVEL OF MEASUREMENT - 1
Multinomial logistic regression requires that the
dependent variable be non-metric and the
independent variables be metric or dichotomous.

"Opinion about spending on space exploration"
[natspac] is ordinal, satisfying the non-metric
level of measurement requirement for the
dependent variable.

It contains three categories: survey respondents
who thought we spend too little money, about
the right amount of money, and too much money
on space exploration.

Slide 57
LEVEL OF MEASUREMENT - 2
"Sex" [sex] is dichotomous, satisfying the metric or dichotomous level of
measurement requirement for independent variables.

"Highest year of school completed" [educ] is interval, satisfying the metric
or dichotomous level of measurement requirement for independent variables.

"Total family income" [income98] is ordinal. If we follow the convention of
treating ordinal level variables as metric variables, the level of measurement
requirement for the analysis is satisfied. Since some data analysts do not
agree with this convention, a note of caution should be included in our
interpretation.

Slide 58
Request multinomial logistic regression
Select the Regression |
Multinomial Logistic
command from the
Analyze menu.

Slide 59
Selecting the dependent variable
First, highlight the dependent variable natspac in the list of variables.

Second, click on the right arrow button to move the dependent variable to the
Dependent text box.

Slide 60
Selecting non-metric independent variables
Select the dichotomous variable sex.

Move the non-metric independent variables listed in the problem to the
Factor(s) list box.
Non-metric independent variables are specified as
factors in multinomial logistic regression. Non-metric
variables can be either dichotomous, nominal, or ordinal.

These variables will be dummy coded as needed and
each value will be listed separately in the output.

Slide 61
Selecting metric independent variables
Move the metric
independent variables,
educ and income98, to
the Covariate(s) list box.
Metric independent variables are specified as covariates
in multinomial logistic regression. Metric variables can
be either interval or, by convention, ordinal.

Slide 62
Specifying statistics to include in the output
While we will accept most of
the SPSS defaults for the
analysis, we need to specifically
request the classification table.

Click on the Statistics button
to make a request.

Slide 63
Requesting the classification table
First, keep the SPSS
defaults for Summary
statistics, Likelihood
ratio test, and
Parameter estimates.
Second, mark the
checkbox for the
Classification table.
Third, click
on the
Continue
button to
complete the
request.

Slide 64
Completing the multinomial
logistic regression request
Click on the OK
button to request
the output for the
multinomial logistic
regression.
The multinomial logistic procedure supports
additional commands to specify the model
computed for the relationships (we will use the
default main effects model), additional
specifications for computing the regression,
and saving classification results. We will not
make use of these options.

Slide 65
Case Processing Summary
                                     N      Marginal Percentage
SPACE EXPLORATION PROGRAM   1        33     15.9%
                            2        90     43.3%
                            3        85     40.9%
RESPONDENTS SEX             1        94     45.2%
                            2       114     54.8%
Valid                               208     100.0%
Missing                              62
Total                               270
Subpopulation                       138(a)

a. The dependent variable has only one value observed in 112 (81.2%)
   subpopulations.

Sample size ratio of cases to variables
Multinomial logistic regression requires that the minimum ratio
of valid cases to independent variables be at least 10 to 1. The
ratio of valid cases (208) to number of independent variables (3)
was 69.3 to 1, which was greater than the minimum ratio. The
requirement for a minimum ratio of cases to independent variables
was satisfied.

The preferred ratio of valid cases to independent variables is 20
to 1. The ratio of 69.3 to 1 was greater than the preferred ratio.
The preferred ratio of cases to independent variables was satisfied.
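
The ratio arithmetic can be reproduced with a few lines of Python. This is only a sketch of the calculation described above; the function and variable names are invented for the example.

```python
def case_ratio(valid_cases, n_predictors):
    """Ratio of valid cases to independent variables."""
    return valid_cases / n_predictors


# Values from the Case Processing Summary: 208 valid cases, 3 predictors.
ratio = case_ratio(208, 3)
meets_minimum = ratio >= 10    # minimum requirement: 10 to 1
meets_preferred = ratio >= 20  # preferred ratio: 20 to 1

print(f"{ratio:.1f} to 1, minimum met: {meets_minimum}, preferred met: {meets_preferred}")
```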

Slide 66
Model Fitting Information
Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   354.268
Final            334.967             19.301       6    .004

OVERALL RELATIONSHIP BETWEEN
INDEPENDENT AND DEPENDENT VARIABLES
The presence of a relationship between the dependent
variable and combination of independent variables is
based on the statistical significance of the final model
chi-square in the SPSS table titled "Model Fitting
Information".

In this analysis, the probability of the model chi-square
(19.301) was 0.004, less than or equal to the level of
significance of 0.05. The null hypothesis that there was
no difference between the model without independent
variables and the model with independent variables was
rejected. The existence of a relationship between the
independent variables and the dependent variable was
supported.
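
The model chi-square can be checked by hand: it is the difference between the two -2 log likelihood values in the table, and its degrees of freedom are 6 (three predictors times two comparisons). The sketch below uses only the Python standard library and a closed-form upper-tail probability that holds for even degrees of freedom; the helper name is an assumption made for this example, not an SPSS function.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square statistic (even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    # For df = 2k: P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


neg2ll_intercept_only = 354.268
neg2ll_final = 334.967

model_chi_square = neg2ll_intercept_only - neg2ll_final
p_value = chi2_sf_even_df(model_chi_square, df=6)

print(f"chi-square = {model_chi_square:.3f}, p = {p_value:.3f}")
print("overall relationship supported:", p_value <= 0.05)
```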

Slide 67
Parameter Estimates
Group  Term        B        Std. Error  Wald     df  Sig.   Exp(B)  Lower Bound  Upper Bound
1      Intercept   -4.136   1.157       12.779   1   .000
1      EDUC          .101    .089        1.276   1   .259   1.106    .929        1.317
1      INCOME98      .097    .050        3.701   1   .054   1.102    .998        1.216
1      [SEX=1]       .672    .426        2.488   1   .115   1.959    .850        4.515
1      [SEX=2]      0(b)     .            .      0   .       .       .            .
2      Intercept   -2.487    .840        8.774   1   .003
2      EDUC          .108    .068        2.521   1   .112   1.114    .975        1.273
2      INCOME98      .058    .034        2.932   1   .087   1.060    .992        1.133
2      [SEX=1]       .501    .317        2.492   1   .114   1.650    .886        3.072
2      [SEX=2]      0(b)     .            .      0   .       .       .            .

Group is SPACE EXPLORATION PROGRAM(a). Lower Bound and Upper Bound give the
95% Confidence Interval for Exp(B).

a. The reference category is: 3.
b. This parameter is set to zero because it is redundant.

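
The Exp(B) column in the Parameter Estimates output is the exponentiated b coefficient, and the percentage change in the odds quoted in the problem statement follows from it. The following is a minimal sketch of that arithmetic using the INCOME98 coefficients as printed (rounded to three decimals); the function names are chosen for this example only.

```python
import math


def odds_ratio(b):
    """Exp(B): multiplicative change in the odds per unit increase."""
    return math.exp(b)


def percent_change_in_odds(b):
    """Percentage change in the odds per unit increase in the predictor."""
    return (math.exp(b) - 1.0) * 100.0


# INCOME98 coefficients as printed in the output (rounded by SPSS).
b_income_too_little = 0.097   # group 1 versus reference group 3
b_income_about_right = 0.058  # group 2 versus reference group 3

or_about_right = odds_ratio(b_income_about_right)
pct_about_right = percent_change_in_odds(b_income_about_right)

print(f"Exp(B) = {or_about_right:.3f}, change in odds = {pct_about_right:.1f}%")
```

An Exp(B) of 1.060 corresponds to the 6.0% increase in odds stated in the problem. Whether that interpretation may be used still depends on the significance tests for the predictor.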
NUMERICAL PROBLEMS
Numerical problems in the multinomial logistic regression solution are
detected by examining the standard errors for the b coefficients. A
standard error larger than 2.0 indicates numerical problems, such as
multicollinearity among the independent variables, zero cells for a
dummy-coded independent variable because all of the subjects have the
same value for the variable, and 'complete separation', whereby the
groups being compared on the dependent variable can be perfectly
separated by scores on one of the independent variables. Analyses that
indicate numerical problems should not be interpreted.

None of the independent variables in this analysis had a standard
error larger than 2.0.
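
The standard error screen can be sketched as a simple filter over the Std. Error column. The dictionary below copies the standard errors for the independent variables from the Parameter Estimates output; the variable names and the layout of the dictionary are assumptions made for this illustration.

```python
SE_THRESHOLD = 2.0  # rule of thumb used in these slides

# Standard errors for the independent variables, keyed by (group, term).
standard_errors = {
    (1, "EDUC"): 0.089,
    (1, "INCOME98"): 0.050,
    (1, "[SEX=1]"): 0.426,
    (2, "EDUC"): 0.068,
    (2, "INCOME98"): 0.034,
    (2, "[SEX=1]"): 0.317,
}

# Keep only the coefficients whose standard error exceeds the threshold.
flagged = {key: se for key, se in standard_errors.items() if se > SE_THRESHOLD}
largest = max(standard_errors.values())

print("largest standard error:", largest)
print("numerical problems indicated:", bool(flagged))
```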

Slide 68
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 1
Likelihood Ratio Tests
Effect      -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept   334.967(a)                             .000       0    .
EDUC        337.788                               2.821       2    .244
INCOME98    340.154                               5.187       2    .075
SEX         338.511                               3.544       2    .170

The chi-square statistic is the difference in -2 log-likelihoods between the
final model and a reduced model. The reduced model is formed by omitting an
effect from the final model. The null hypothesis is that all parameters of
that effect are 0.

a. This reduced model is equivalent to the final model because omitting the
   effect does not increase the degrees of freedom.

The statistical significance of the relationship between
total family income and opinion about spending on space
exploration is based on the statistical significance of the
chi-square statistic in the SPSS table titled "Likelihood
Ratio Tests".

For this relationship, the probability of the chi-square
statistic (5.187) was 0.075, greater than the level of
significance of 0.05. The null hypothesis that all of the b
coefficients associated with total family income were
equal to zero was not rejected. The existence of a
relationship between total family income and opinion
about spending on space exploration was not supported.
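
The likelihood ratio tests can be reproduced from the -2 log likelihood values in the table. Each effect has 2 degrees of freedom here (one coefficient per comparison), and for 2 degrees of freedom the upper-tail chi-square probability reduces to exp(-x/2). The sketch below relies on that special case; the names are illustrative only.

```python
import math


def chi2_sf_df2(x):
    """Upper-tail chi-square probability for exactly 2 degrees of freedom."""
    return math.exp(-x / 2.0)


ALPHA = 0.05
neg2ll_final = 334.967
neg2ll_reduced = {"EDUC": 337.788, "INCOME98": 340.154, "SEX": 338.511}

results = {}
for effect, neg2ll in neg2ll_reduced.items():
    chi_square = neg2ll - neg2ll_final  # difference in -2 log likelihoods
    p = chi2_sf_df2(chi_square)
    results[effect] = (round(chi_square, 3), round(p, 3), p <= ALPHA)

for effect, (chi_square, p, significant) in results.items():
    print(f"{effect}: chi-square = {chi_square}, p = {p}, significant = {significant}")
```

At alpha = 0.05 none of the three effects is statistically significant on its own, including total family income.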

Slide 69
Answering the question in problem 2

We found a statistically significant overall
relationship between the combination of
independent variables and the dependent
variable.

There was no evidence of numerical problems in
the solution.

However, the individual relationship between
total family income and spending on space was
not statistically significant.

The answer to the question is false.

Slide 70
Steps in multinomial logistic regression:
level of measurement and initial sample size
The following is a guide to the decision process for answering
problems about the basic relationships in multinomial logistic
regression:

1. Dependent non-metric? Independent variables metric or dichotomous?
   No: Inappropriate application of a statistic
   Yes: go to the next question
2. Ratio of cases to independent variables at least 10 to 1?
   No: Inappropriate application of a statistic
   Yes: Run multinomial logistic regression


Slide 71
Steps in multinomial logistic regression:
overall relationship and numerical problems
1. Overall relationship statistically significant? (model chi-square test)
   No: False
   Yes: go to the next question
2. Standard errors of coefficients indicate no numerical problems
   (s.e. <= 2.0)?
   No: False
   Yes: continue

Slide 72
Steps in multinomial logistic regression:
relationships between IV's and DV
1. Overall relationship between specific IV and DV is statistically
   significant? (likelihood ratio test)
   No: False
   Yes: go to the next question
2. Role of specific IV and DV groups statistically significant and
   interpreted correctly? (Wald test and Exp(B))
   No: False
   Yes: continue

Slide 73
Steps in multinomial logistic regression:
classification accuracy and adding cautions
1. Overall accuracy rate is 25% greater than the proportional by chance
   accuracy rate?
   No: False
   Yes: go to the next question
2. One or more IV's are ordinal level treated as metric?
   Yes: True with caution
   No: go to the next question
3. Satisfies preferred ratio of cases to IV's of 20 to 1?
   Yes: True
   No: True with caution
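
The decision process on the last four slides can be summarized as one function. This is a sketch under stated assumptions, not part of the course materials: the function name, parameter names, and the use of None for values that are never reached are all choices made for this example. Inputs for problem 2 are taken from the output shown earlier. The classification accuracy is left out because the likelihood ratio test already settles the answer.

```python
def evaluate_statement(dv_nonmetric, ivs_metric_or_dichotomous, valid_cases,
                       n_predictors, model_p, max_standard_error, effect_p,
                       wald_p=None, interpretation_correct=None,
                       accuracy=None, chance_accuracy=None,
                       ordinal_iv_treated_as_metric=False, alpha=0.05):
    """Walk through the decision steps in order and return the answer."""
    ratio = valid_cases / n_predictors
    # Level of measurement and minimum sample size.
    if not (dv_nonmetric and ivs_metric_or_dichotomous) or ratio < 10:
        return "Inappropriate application of a statistic"
    # Overall relationship and numerical problems.
    if model_p > alpha or max_standard_error > 2.0:
        return "False"
    # Likelihood ratio test for the specific independent variable.
    if effect_p > alpha:
        return "False"
    # Wald test and interpretation of Exp(B) for the stated comparison.
    if wald_p is None or interpretation_correct is None:
        raise ValueError("Wald test inputs are needed at this step")
    if wald_p > alpha or not interpretation_correct:
        return "False"
    # Classification accuracy versus 1.25 times the by-chance rate.
    if accuracy is None or chance_accuracy is None:
        raise ValueError("accuracy inputs are needed at this step")
    if accuracy < 1.25 * chance_accuracy:
        return "False"
    # Cautions.
    if ordinal_iv_treated_as_metric or ratio < 20:
        return "True with caution"
    return "True"


# Problem 2: model p = .004, largest predictor s.e. = .426, INCOME98 p = .075.
verdict = evaluate_statement(
    dv_nonmetric=True, ivs_metric_or_dichotomous=True,
    valid_cases=208, n_predictors=3,
    model_p=0.004, max_standard_error=0.426, effect_p=0.075,
    ordinal_iv_treated_as_metric=True,
)
print(verdict)  # False
```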
