Discriminant Analysis

Discriminant Analysis
Discriminant Analysis With 2-Groups Comparison of 2-Group Discriminant Analysis With Logistical Regression Discriminant Analysis With More Than 2-Groups
Notes on Discriminant Analysis: Charles M. Friel Ph.D., Criminal Justice Center, Sam Houston State University
2 KEY CONCEPTS ***** Discriminant Analysis
Discriminant function A priori categories or groups Homogeneity of variance/covariance matrices Differences between discriminant analysis and logistical regression Partitioning of sums of squares in discriminant analysis TSS = BSS = WSS Discriminant score Discriminant weight or coefficient Discriminant constant Discriminant analysis assumptions Steps in the discriminant analysis process Box's M test and its null hypothesis Wilks' lambda Stepwise method in discriminant analysis Pin and Pout criteria F-test to determine the effect of adding or deleting a variable from the model Unstandardized and standardized discriminant weights Measures of goodness-of-fit Eigenvalue Canonical correlation Model Wilks' lambda Classification table and hit ratio t-test for a hit ratio Maximum chance criteria Proportional chance criteria Press's Q statistic Histogram of discriminant scores Casewise plot of the predictions Calculation of the cutting score: equal and unequal groups Prior probability Conditional probability Bayes' theorem and posterior probability Structure coefficient or discriminant loading Group centroid Testing the collinearity of the predictor variables Assumptions about multiple discriminant functions Number: g-1 or k whichever is less Functions may be collinear Discriminant scores must be independent KEYCONCEPTS(cont.)
Notes on Discriminat Analysis: Charles M. Friel Ph.D., Criminal Justice Center, Sam Houston State University
3 Interpretation of multiple discriminant functions Territorial map Scatterplot of the discriminant scores across the discriminant functions
Lecture Outline
What is discriminant analysis The concept of partitioning sums of squares Discriminant assumptions Stepwise discriminant analysis with Wilks' lambda Testing the goodness-of-fit of the model Determining the significance of the predictor variables A 2-group discriminant problem A multi-group discriminant problem
Discriminant Analysis
Z = a + W1X1 + W2X2 + ... + WkXk Dependency Technique Dependent variable is nonmetric Independent variables can be metric and/or nonmetric Used to predict or explain a nonmetric dependent variable with two or more a priori categories Assumptions Xk are multivariate normally distributed Homogeneity of variance-covariance matrices of Xk across groups Xk are independent, non-collinear The relationship is linear Absence of outliers
Predicting a Nonmetric Variable

Two approaches Logistical Regression with a dummy coded DV Limited to a binary nonmetric dependent variable Makes relatively few restrictive assumptions Discriminant Analysis with a nonmetric dependent variable with 2 or more groups Not limited to a binary nonmetric dependent variable Makes several restrictive assumptions
Partitioning Sums of Squares (SS) in Discriminant Analysis

In Linear Regression The Total SS ( Y- Y) 2 is partitioned into Regression SS ( Y'- Y) 2 Residual SS + ( Y'- Y) 2 ( Y- Y) 2 = ( Y'- Y) 2 + ( Y'- Y) 2 Goal Estimate parameters that minimize the Residual SS
8 PartitioningSumsof Squares(SS) in DiscriminantAnalysis(cont.)
In Discriminant Analysis The Total SS ( Zi- Z) 2 is partitioned into: Between Group SS ( Zj- Z) 2 Within Groups SS ( Zij- Zj) 2 ( Zi- Z) 2 = ( Zj- Z) 2 + ( Zij- Zj) 2 i = an individual case, j = group j Zi = individual discriminant score Z = grand mean of the discriminant scores Zj = mean discriminant score for group j Goal Estimate parameters that minimize the Within Group SS
Elements of the Discriminant Model
Z = a + W1X1 + W2X2 + ... + WkXk
Z = discriminant score, a number used to predict group membership of a case a = discriminant constant Wk = discriminant weight or coefficient, a measure of the extent to which variable Xk discriminates among the groups of the DV Xk = an IV or predictor variable. Can be metric or nonmetric. Discriminant analysis uses OLS to estimate the values of the parameters (a) and Wk that minimize the Within Group SS
10
An Example of Discriminant Analysis with a Binary Dependent Variable
Predicting whether a felony offender will receive a probated or prison sentence as a function of various background factors. Dependent Variable Type of sentence (type_sent) (0 = probation, 1 = prison) Independent Variables Degree of drug dependency (dr_score) Age at first arrest (age_firs) Level of work skill (skl_index) The seriousness of the crime (ser_indx)
11
Discriminant Analysis with Two Groups

Z0
f of Z
Z1
Probation (0)
Prison (1)
Between SS = (Z0 - Z)2 + (Z1 - Z)2 = (Zj - Z)2 = BSS Within SS = (Zi0 - Z0)2 + (Zi1 - Z1)2 = (Zij - Zj)2 =WSS Total SS = (Zi - Z)2 = TSS Z0 and Z1 are called centroids, the mean discriminant score for each group
12
Discriminant Analysis Assumptions
The predictor variables are multivariate normal, ipso facto univariate normal The variance-covariance matrices of the predictor variables across the various groups are the same in the population, i.e. homogeneous The groups defined by the DV exist a priori The predictor variables are noncollinear The relationship is linear in its parameters Absence of leverage point outliers The sample is large enough, say 30 cases for each predictor variable
13
Steps in Discriminant Analysis Process
Specify the dependent & the predictor variables Test the models assumptions a priori Determine the method for selection and criteria for entering the predictor variables into the model Estimate the parameters of the model Determine the goodness-of-fit of the model and examine the residuals Determine the significance of the predictors Test the assumptions ex post facto Validate the results
14
The Sentence-Type Discriminant Model

Specification of the model (N = 70)
Z = a + W1(dr_score) + W2(age_firs) + W3(skl_indx)... + W4(ser_indx)
Are the predictor variables multivariate normally distributed?
12
10
0 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5
DR_SCORE
15 The SentenceTypeDiscriminantModel(cont.)
20
10
0 0.0 2.0 4.0 6.0 8.0 10.0
SKL_INDX
16 14 12 10 8 6 4 2 0 14.0 15.0 16.0 17.0 18.0 19.0 20.0 21.0 22.0
AGE_FIRS
16
The SentenceTypeDiscriminantModel(cont.)
14
12
10
2 0 1.0 2.0 3.0 4.0 5.0 6.0 7.0
SER_INDX
17
Variable Dr-score Age_firs Skl_indx Ser_indx
Skew -0.5049 +0.7728 -0.0266 +0.2197
Kurtosis -0.6946 -0.4250 -1.2321 -1.1727
Are the Variance/Covariance Matrices of the Two Groups Homogeneous?
Covariance Matrices TYPE_SEN .00 DR_SCORE AGE_FIRS SKL_INDX SER_INDX DR_SCORE AGE_FIRS SKL_INDX SER_INDX DR_SCORE 7.590 -1.945 1.932 2.110 6.466 -2.688 -.648 1.474 AGE_FIRS -1.945 5.553 -.329 -1.225 -2.688 4.500 .469 -2.219 SKL_INDX 1.932 -.329 8.632 -.857 -.648 .469 7.922 -.507 SER_INDX 2.110 -1.225 -.857 3.441 1.474 -2.219 -.507 3.405
1.00
The variances are on the diagonals, and the covariances are on the off-diagonals.
18
Q Are the variance/covariance matrices of the two

groups the same homogeneous? in the population, i.e.
19 Are the Variance/CovarianceMatricesof the TwoGroupsHomogeneous?(cont.)
Box's M test H0: the variance/covariance matrices of the two groups are the same in the population.
Log Determinants Log Determinant 3.076 2.988 3.040
TYPE_SEN .00 1.00 Pooled within-groups
Rank
2 2 2
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.
Test Results Box's M F Approx. df1 df2 Sig. .361 .116 3 1476249 .951
Tests null hypothesis of equal population covariance matrices.
Box's M = 0.361, Approximate F = 0.116, p = 0.951 Conclusion: The null hypothesis with respect to the homogeneity of variance/covariance matrices in the population is accepted.
20
How Will the Predictor Variables Be Entered into the Discriminant Model?
SPSS offers two methods for building a discriminant model Entering all the variables simultaneously Stepwise method In this example, the variables will be entered in a stepwise fashion using Wilks' lambda criterion
Q What is Wilks' lambda ()?

For a given predictor variable, is the ratio of the WSS to the TSS ( = WSS / TSS) It is derived from a one-way ANOVA with type_sent as the IV and the predictor variable as the DV
21 HowWill the PredictorVariablesbe Enteredinto the DiscriminantModel?(cont.)
= (WSS / TSS) = (Zij - Zj)2 / (Zi - Z)2

Step 1: Compute four one-way ANOVAs with type_sent as the IV and each of the four predictor variables as the DVs
Variables in the Analysis Sig. of F to Remove .000 .000 .019 Wilks' Lambda .983 .832
Step 1 2
SER_INDX SER_INDX DR_SCORE
Tolerance 1.000 .864 .864
Step 2: Identify the predictor variable that has the lowest significant Wilks' lambda () and enter it into the discriminant model, i.e. ser_indx. (Pin default = 0.05)
Step 3: Estimate the parameters of the resulting discriminant model
22 HowWill the PredictorVariablesbe Enteredinto the DiscriminantModel?(cont.)
Step 4: Of the variables not in the model, select the predictor that has the lowest significant and enter it into the model. Determine if the addition of the variable was significant. Now check if the predictor(s) previously entered are still significant. (Pout default = 0.10) Step 5: Repeat Step 4 until all the predictor variables are entered into the model or until none the variables outside the model have significant 's.
23
How Is the Significance of Change Determined When a Variable is Entered Into the Discriminant Function?
Use an F-ratio comparing the Wilks' lambda of the model with the greater number of predictors (k) with the one with the lesser number of predictors (k-1)
F=
1 - ( k-1) / (k) ( k-1) / (k)
(N - g - 1) (g - 1)
= WSS / TSS of the function N = total sample size g = number of DV groups df = (N - g - 1) and (g - 1)
24
Estimation of the Parameters of the Model

Z = a + W1(dr_score) + W2(age_firs) + W3(skl_indx)... + W4(ser_indx)
What values of the constant (a) and the discriminant coefficients Wk best predict whether a case will receive a probated or a prison sentence? After variable selection by a stepwise process using Wilks' , the best equation was found to be
Canonical Discriminant Function Coefficients Function 1 -.235 .564 -.706
DR_SCORE SER_INDX (Constant)
Unstandardized coefficients
Z = -0.706 - 0.235 (dr_score) + 0.564 (ser_indx)
25 Estimationof the Parametersof the Model(cont.)
Prediction of a case Take a case with dr_score = 9, ser_indx = 1, and an actual sentence = 0, (i.e. a probated case) Z = -0.706 - 0.235 (9) + 0.564 (1) = -2.25 Since -2.25 is closer to the code 0 than the code 1, the case would be predicted a probated case, i.e. code 0.
26
How Can the Goodness-of-Fit of the Model Be Measured?

Eigenvalues () The Canonical Correlation eta () Wilks' Lambda () Classification Table Hit Ratio t-test of the Hit Ratio Maximum Chance Criteria Proportional Chance Criteria Presss Q Statistic Casewise Plot of the Predictions
27
What is an Eigenvalue?
In matrix algebra, an eigenvalue is a constant, which if subtracted from the diagonal elements of a matrix, results in a new matrix whose determinant equals zero. An example
Given the matrix: 4 A= 2 5 1
(4 - x) A = 2
1 = 0.0 (5 - x)
Calculating the determinant of the matrix A: ( 4 - x) (5 - x) - (2) (1) = 0.0 (20 - 4x - 5x + x2 - 2) = 0.0 (18 - 9x + x2) = 0.0 (x2 - 9x + 18) = 0.0 This quadratic equation has two solutions or eigenvalues: + 6 and + 3
28
What Is an Eigenvalue in Discriminant Analysis?

In the present example, the DV is composed of two groups; i.e. cases either sentenced to probation or prison. When there are two groups, one discriminant function can be extracted from the data and its associated eigenvalue is as follows
= BSS / WSS = [ ( Zj- Z)2 / ( Zij- Zj)2 ] Interpretation If = 0.00, the model has no discriminatory power, BSS = 0.0 The larger the value of , the greater the discriminatory power of the model
29
What is the Eigenvalue for the Sentence-Type Example?
Eigenvalues Canonical Correlation .483
Function 1
Eigenvalue % of Variance .305a 100.0
Cumulative % 100.0
a. First 1 canonical discriminant functions were used in the analysis.
The eigenvalue of the discriminant function = 0.305 The % of the variance explained that is explained by this discriminant function = 100%* The cumulative percentage of the variance explained by the 1st discriminant function = 100%*
* With two DV groups, only one discriminant function can be extracted, which will therefore explain all the variance explained by the model. But with three groups, two functions can be extracted, with g groups, (g - 1) functions can be extracted, or k functions if k is less than g. Therefore, a different % of the total variance explained will be explained by each of the successive functions extracted.
30
How Can You Tell if the Eigenvalue Is Significant?

Two useful statistical indicators can be derived from the eigenvalue The canonical correlation eta () Wilks' lambda () for the model The canonical correlation () = / (1 + ) = BSS / TSS
= the correlation of the predictor(s) with the discriminant scores produced by the model 2 = coefficient of determination 1 - 2 = coefficient of non-determination For the sentence-type example = 0.3050 / (1 + 0.3050) = 0.483
31 HowCanYouTell if the Eigenvalueis Significant?(cont.)
The Wilks' for the discriminant model

= (1 - 2) = [ 1 / (1 + ) ] = WSS / TSS is chi-square distributed for df = (k - 1), k equal to the number of parameters estimated *** For the sentence-type example = 0.3050 = [ 1 / (1 + 0.3050) ] = 0.766, which can be converted to a chi-square statistic *** 2 = 17.837, df = 2, p = 0.0001 H0: In the population Z0 = Z1 = Z Since the chi-square results are significant H0 is rejected and it is concluded the differences in the mean discriminant scores of the two groups are greater than could be attributed to sampling error.
*** 2 = - [(n - 1) - 0.5 (m + k + 1)] ln , df= (k - 1), m=number of discriminant function extracted, k=number of predictor variables
32 HowCanYouTell if the Eigenvalueis Significant?(cont.)

Eigenvalues Function 1 Eigenvalue % of Variance .305a 100.0 Cumulative % 100.0 Canonical Correlation .483
eigenvalue canonical correlation

Wilks' Lambda Test of Function(s) 1 Wilks' Lambda .766 Chi-square 17.837 df 2 Sig. .000
Wilks'
Chi-Square
33
How Well Does the Model Predict?

a Classification Results
Original
Count %
TYPE_SEN .00 1.00 .00 1.00
Predicted Group Membership .00 1.00 27 10 14 19 73.0 27.0 42.4 57.6
Total
37 33 100.0 100.0
a. 65.7% of original grouped cases correctly classified.
Overall results Overall hit ratio = 65.7% Correctly classified probationers = 73.0% Correctly classified prisoners = 57.6%
34
Does the Model Predict Any Better Than Chance?

Maximum chance criterion (MCC) Predict that all 70 cases are in the group with the largest number of cases. MCC = (nL / NL) (100) nL = number of subjects in the larger of the two groups NL = total number of subjects in to combined groups For the sentence-type example Probation group, n = 37 Prison group, n = 33 If all the cases were predicted to receive probation MCC = (37 / 70) (100) = 52.86% correct by chance
35 Doesthe ModelPredictAnyBetterThanChance?(cont.)
Proportional chance criterion (Cpro) Randomly classify the cases proportionate to the number of cases in either group.
Cpro =p2 + (1 - p)2 p = proportion of subjects in one group (1 - p) = proportion of cases in the other group Proportion of probationers = (37 / 70) = 0.5286 Proportion of prisoners = (33 / 70) = 0.4714 Cpro = 0.5286 2 + (1 - 0.4714)2 = 0.5588 or a hit ratio of 55.88% Comparison of hit ratios The model MCC Cpro 65.71% 52.86% 55.88%
36
Is the Hit Ratio of the Model Significantly Better Than Chance?

By the maximum chance criterion (MCC), one could guess 52.86% of the cases correctly. The model hit 65.71% correctly. Is this significantly better than chance? Two ways to test whether the model hit ratio is significantly better than chance t-test for groups of equal size Press's Q statistic, groups can be of unequal size t-test for a model with equal size groups
H0: the model hit ratio is no better than chance t = (P - 0.5) / (0.5) (1 - 0.5) / N
P = the proportion the model predicted correctly
df = (N - 2)
37 Is the Hit Ratioof the ModelSignificantlyBetterThanChance?(cont.)
Press's Q statistic Q = [ N - (n) (g) ] 2 / [ N * (g - 1)] N = total number of subjects n = number of cases correctly classified g = number of groups Q is chi-square distributed for df = 1 For the sentence-type example Q = [ 70 - (46) (2) ] 2 / [ 70 - (2 - 1)] = 7.0145 p < 0.01 Decision The null hypothesis that the model hit ratio is no better than chance is rejected
38
How Can a Cutting Score Be Established to Sort the Cases Into Either Group Based on Their Discriminant Scores?
When n0 = n1 Zcutting = (Z0 + Z1) / 2
(Zj = mean discriminant score for group j)
When n0 n1 Zcutting = (n0 Z0 + n1 Z1) / 2
For the sentencing-type study Z0 = -0.5141 and Z1 = +0.5764 Zcutting = [ (37) (-0.5141 )+ (33) (+0.5764) ] / 2 Zcutting = -0.00025, or slightly less than 0.0
39
Canthe Predictionsof the ModelBe Graphed?(cont.)
Box-Whisker plot of the distributions of discriminant scores for probation and prison cases with the cutting score set at -0.00025
-1
-2
-3
N= 37 33
.0
1.0
Type of Sentence (probation = 0, prison = 1)
Cutting score (-0.00025)
40
What is the Best Way to See the Predictions Made on Individual Cases
Casewise Statistics Discrimin ant Scores
Highest Group Squared Mahalanobis Distance to Centroid 3.040 2.000 3.914 1.621 1.390 .144 3.040 .722 .225 1.390
Second Highest Group
Case Number Actual Group Original 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 0 10 1 **. Misclassified case
Predicted Group 0 0 0 0 0 0 0 0 0 0**
P(D>d | G=g) p df .081 .157 .048 .203 .238 .704 .081 .395 .635 .238
1 1 1 1 1 1 1 1 1 1
P(G=g | D=d) .932 .905 .946 .891 .880 .755 .932 .837 .773 .880
Group 1 1 1 1 1 1 1 1 1 1
P(G=g | D=d) .068 .095 .054 .109 .120 .245 .068 .163 .227 .120
Squared Mahalanobis Distance to Centroid Function 1 8.031 -2.258 6.273 -1.928 9.418 -2.493 5.588 -1.787 5.151 -1.693 2.162 -.894 8.031 -2.258 3.765 -1.364 2.448 -.988 5.151 -1.693
Discriminant scores the column on the extreme right hand side of the table For case 1, discriminant score = -2.2575
41
How Does SPSS Classify Cases?

In SPSS, case classification is accomplished by calculating the probability of a case being in one group or the other (i.e. probation or prison), rather than by simply using a cutting score. This is accomplished by calculating the posterior probability of group membership using Bayes' Theorem
P (Gi D) = [ P (D Gi) P (Gi) ] / [ P(D Gi) P (Gi) ]
D = the discriminant score (i.e. Z) P (Gi D) = posterior probability that a case is in group i, given that it has a specific discriminant score D P (D Gi) = conditional probability that a case has a discriminant score of D, given that it is in group i P (Gi) = prior probability that a case is in group i, which would be equal to (ni / N)
42 HowDoesSPSSClassifyCases?(cont.)
The Bayesian probabilities associated with being in either group are calculated, and the greater of the two probabilities is used to classify the case. Example: Case 1 Posterior probability of being in the probation group P (Gprobation D) = 0.932
Posterior probability of being in the prison group P (Gprison D) = 0.068 Since P (Gprobation D) > P (Gprison D), the case is classified as a probation case. (0.932 > 0.068) The column labeled "Actual Group" shows the group the case actually belongs to. If the Bayesian probability misclassifies the case, the case is marked with two asterisks (**). These are the errors produced by the model, which can also be seen in the classification table in a previous exhibit.
43
Is Each Predictor Variable in the Model Significant?

The significance of the individual predictors variables is accomplished by conducting a one-way MANOVA
With the grouping variable as the IV and The discriminant predictors as the DVs. The MANOVA sums of squares are then used to calculate Wilks' lambda () for each predictor
= WSS / TSS
The results of the final stepwise discriminant model
Variables in the Analysis Step 1 2 Tolerance 1.000 .864 .864 Sig. of F to Remove .000 .000 .019 Wilks' Lambda .983 .832
SER_INDX SER_INDX DR_SCORE
44
Is EachPredictorVariablein the ModelSignificant?(cont.)
For dr_score (F to remove: p = 0.0195)

= WSS / TSS = (232.862) / (232.862 + 47.081) = 0.8318
For ser_indx (F to remove: p < 0.0001)

= WSS / TSS = (480.152) / (480.152 + 8.433) = 0.9827
The Null Hypothesis

H0: the discriminant coefficients in the population are equal to zero = WSS / TSS = 1.0, i.e. the WSS = TSS, & BSS = 0 Decision The null hypothesis is rejected since both Wilks' lambdas () associated with the two predictors are significant
This finding should not be surprising since

The stepwise processes guarantees that only significant variables will be entered into the model, and That all variables in the model are checked to assure that they remain significant as new variables are added.
45
How are the Discriminant Coefficients Interpreted

The Final Stepwise Model
For dr-score
As dr_score increases by one unit, the discriminant score Z decreases by 0.235 Holding the seriousness of the offence (ser_indx) constant, the more drug dependent the defendant, the more likely he/she will be granted probation (code = 0)
For ser_indx
As ser_indx increases by one unit, the discriminant score Z increases by 0.564 Holding the drug dependency (dr_score) constant, the more serious the offence, the more likely the defendant will be sent to prison (code = 1)
46
How Can the Relative Impact on the DV of the Different Predictor Variables be Compared?
Two ways Compare the standardized discriminant weights, i.e. coefficients Compare the structure coefficients, also called the discriminant loadings
Standardized discriminant coefficient (Ck) The relative difference among the discriminant coefficients can not be compared If the predictors variables are in different units of measurement. The discriminant coefficients must first be converted to standardized coefficients (Ck)
47 HowCanthe RelativeImpacton the DV of the DifferentPredictorVariablesbe Compared?(cont.)
Zz = C1ZX1 + C2ZX2 + + CkZXk
Ck = Wk
(Xk - Xk)2 / (N - g)
Wk = the unstandardized discriminant coefficient of variable k (Xk - Xk)2 = SS of the predictor variable N = total sample size g = number of DV groups
Examples: (dr_score) and (ser_indx)

Cdr_score = - 0.235 C ser_indx = + 0.5643 495.67/ (70 - 2) = - 0.6345 232.857/ (70 - 2) = +1.044
Since +1.044 is greater in absolute value than -0.6345, ser_indx has greater discriminatory impact than dr_score.
48 HowCanthe RelativeImpacton the DV of the DifferentPredictorVariablesbe Compared?(cont.)
Unstandardaized discriminant coefficients (Wk)

Canonical Discriminant Function Coefficients Function 1 -.235 .564 -.706
DR_SCORE SER_INDX (Constant)
Standardized discriminant coefficients (Ck)
Standardized Canonical Discriminant Function Coefficients Function 1 -.625 1.044
DR_SCORE SER_INDX
Notice that there is no constant (a) in a standardized discriminant function equation since the mean of a standardized variable equals zero.
49
What is a Structure Coefficient?

A structure coefficient, or discriminant loading, is the correlation between a predictor variable and the discriminant scores produced by the discriminant function.
The higher the absolute value of the coefficient, the greater the discriminatory impact of the predictor variable on the DV.
Structure coefficients of the predictors

Structure Matrix Function 1 .814 -.240 -.194 -.185
SER_INDX DR_SCORE a SKL_INDX a AGE_FIRS
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions Variables ordered by absolute size of correlation within function. a. This variable not used in the analysis.
Ser_index has the highest correlation with the discriminant scores, followed by dr_score, skl_indx and age_firs. The algebraic sign () indicates the direction of the relationship.
50
On Average, How Well Did the Discriminant Function Divide the Two Groups?
Group Centroids One way to determine the degree of separation between the two groups is to compute the mean discriminant score for either group. These means are called the group centroids
Probation (0) and prison (1) group centroids
Functions at Group Centroids Function 1 -.514 .576
TYPE_SEN .00 1.00
Unstandardized canonical discriminant functions evaluated at group means
51
Are the Predictor Variables Independent, Noncollinear?

Discriminant analysis assumes that the predictor variables are independent or noncollinear. This problem is partially addressed by using a stepwise procedure to enter the variables into the equation, since
The collinearity among the variables is considered in the process and the resulting discriminant coefficients are partial coefficients.
Q Are
dr_score correlated?
and
ser_indx
significantly
Not withstanding the stepwise process, the final two predictors are significantly correlated. (r = 0.2791, p = 0.019).
52
Would the Same Results be Achieved Using Logistical Regression?

Yes virtually the same results. The functions
Discriminant function
Logistical regression equation

Prob event = 1 / (1 + e - ( -0.8011 - 0.272dr_score + 0.611 ser_indx ) )
Hit ratios
Discriminant analysis Logistical Regression 65.71% 65.71%
Significance of predictor variables

dr_score Discriminant function Logistical regression p = .0001 p < .0001 ser_indx p = .0004 p < .0001
53
Results of the Logistical Analysis of Type of Sentence
---------------------- Variables in the Equation ----------------------Variable DR_SCORE SER_INDX Constant B -.2720 .6111 -.8011 S.E. .1183 .1716 .7311 Wald 5.2810 12.6776 1.2006 df 1 1 1 Sig .0216 .0004 .2732 R -.1841 .3321 Exp(B) .7619 1.8424
--------------- Variables not in the Equation ----------------Residual Chi Square 1.360 with 2 df Sig = .5067 Variable AGE_FIRS SKL_INDX Score 1.0003 .3970 df 1 1 Sig .3172 .5287 R .0000 .0000
-2 Log Likelihood Goodness of Fit
78.474 67.926 Chi-Square df Significance 2 1 .0001 .0149
Model Chi-Square Improvement
18.338 5.931
Classification Table for TYPE_SEN Predicted .0 1.0 Percent Correct 0 | 1 Observed +-------+-------+ .0 0 | 27 | 10 | 72.97% +-------+-------+ 1.0 1 | 14 | 19 | 57.58% +-------+-------+ Overall 65.71%
54
What if the Dependent Variable Has More Than Two Groups?

Example Dependant variable Pre-disposition status (jail, bail, or ROR) Independent variables Age of first arrest (age_firs) Age at time of arrest (age) Degree of drug dependency (dr_score) Number of prior arrests (pr_arrst) Type of counsel (counsel 0 = court appointed, 1 = retained)
55
One Versus Multiple Discriminant Functions

When the dependant variable has two groups One discriminant function can be extracted from the data
When there are three groups Two functions can be extracted from the data
When there are g-number of groups (g - 1) functions can be extracted from the data, Or k-number of functions if the number of predictor variables (k) is less than the number of groups (g)
56
Geometry of Two Discriminant Functions

Imagine a problem with two predictor variables and a DV with three groups. Now draw a scatterplot of the cases in each group across the two predictor variables. X2 x x xx x xx xxx oo o ooo o o oo o o *** ** ** * * ** X1
Group 1 = Group 2 = o Group 3 = x
Q How best to describe these three groups?

57 Geometryof TwoDiscriminantFunctions(cont.)
X2 Z1 x xx x x xxxx x x x x oo o ooo o o oo o o *** ** ** * * ** Z2
X1
Two vectors are fit to the data Z1 Z2 reasonably good fit for groups 1 and 3, but a bad fit to group 2 (1st discriminant function) reasonably good fit for group 2, but a bad fit for groups 1 & 3 (2nd discriminant function)
The two vectors taken together better explain the three groups than either one by itself.
58
Statistics with Multiple Discriminant Functions

The statistical output with multiple discriminant functions is comparable to that with one function Except that multiple sets of statistics are derived for each discriminant function, including: Discriminant coefficients, or weights Standardized coefficients, or weights Centroids Structured coefficients, or loadings Eigenvalues Canonical correlations Wilks' lambdas
59
Assumptions About Multiple Discriminant Functions Q Must the various discriminant functions be
independent of each other, i.e. noncollinear? No, they may be collinear or noncollinear, whatever best fits the data. Geometrically, the functions can be other than 90 apart.
Q Must the discriminant scores (Z) produced by

the various discriminant functions be independent of each other, i.e. noncollinear? Yes, the correlations among the discriminant scores produced by the various functions must all be equal to zero (0.0) r Z1 Z2 = 0.0
60
Discriminant Analysis of Pre-Disposition Status

The analysis was conducted using a stepwise selection process using the Wilks' lambda criterion
Q Are the variance/covariance matrices of the

three groups the same in the population?
Covariance Matrices PRE_STAT 1.00 AGE_FIRS AGE DR_SCORE PR_ARRST COUNSEL AGE_FIRS AGE DR_SCORE PR_ARRST COUNSEL AGE_FIRS AGE DR_SCORE PR_ARRST COUNSEL AGE_FIRS 3.415 -2.579 -.391 -1.639 .232 5.190 1.557 -2.326 -.152 .210 5.882 -1.305 -3.890 -.938 .298 AGE -2.579 19.080 4.389 -1.729 -.229 1.557 5.957 .457 .814 -.257 -1.305 6.382 -.515 2.313 -.327 DR_SCORE -.391 4.389 4.770 -1.510 .103 -2.326 .457 8.490 .631 -7.381E-02 -3.890 -.515 10.154 1.125 -9.559E-02 PR_ARRST -1.639 -1.729 -1.510 6.814 -.138 -.152 .814 .631 .662 -.148 -.938 2.313 1.125 3.000 -.500 COUNSEL .232 -.229 .103 -.138 .136 .210 -.257 -7.381E-02 -.148 .190 .298 -.327 -9.559E-02 -.500 .154
2.00
3.00
61 DiscriminantAnalysisof Per-DispositionStatus(cont.)
Box's M test for the homogeneity of variance/ covariance matrices

Analysis 1 Box's Test of Equality of Covariance Matrices
Log Determinants Log Determinant .934 .066 -.130 .606
PRE_STAT 1.00 2.00 3.00 Pooled within-groups
Rank
2 2 2 2
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.
Test Results Box's M F 12.391 Approx. 1.968 df1 6 df2 38111.579 Sig. .066
Tests null hypothesis of equal population covariance matrices.
Decision
(Box's M = 12.39, p = 0.0663)
The null hypothesis that the variance/covariance matrices are equal in the population is accepted.
62
What Is the Final Model Estimated by the Discriminant Analysis?

Two discriminant functions were extracted
Canonical Discriminant Function Coefficients Function 1 2 -.146 .253 1.946 1.682 2.375 -6.655
AGE COUNSEL (Constant)
1st Function Z1 = 2.375 - 0.146 (age) + 1.946 (counsel) 2nd Function Z2 = -6.655 + 0.253 (age) + 1.682 (counsel) Given a 22-year-old offender with retained counsel Z1 = 2.375 - 0.146 (22) + 1.946 (1) = 1.109 Z2 = -6.655 + 0.253 (22) + 1.682 (1) = 0.593 These two discriminant scores will be used to classify the offender into one of the three pre-disposition groups.
63
Are the Two Discriminant Functions Significant?

Eigenvalues Canonical Correlation .685 .134
Function 1 2
Eigenvalue % of Variance .882a 98.0 .018a 2.0
Cumulative % 98.0 100.0

Wilks' Lambda Wilks' Lambda .522 .982
Test of Function(s) 1 through 2 2
Chi-square 43.254 1.206
df
4 1
Sig. .000 .272
1st Function Eigenvalue = 0.8819 Of the variance explained by the two functions, the 1st explains 97.97% The canonical correlation () between the two predictor variables and the discriminant scores produced by the 1st function = 0.6846 The chi-square test of the Wilks' is significant (2 = 43.254, p < 0.0001). The null hypothesis that in the population the BSS = 0, = 0, is rejected.
64 Are the TwoDiscriminantFunctionsSignificant?(cont.)
2nd Function Eigenvalue = 0.0183 Of the variance explained by the two functions, the 2nd explains 2.03% The canonical correlation () between the two predictor variables and the discriminant scores produced by the 2nd function = 0.1341 The chi-square test of the Wilks' is not significant (2 = 1.206, p = 0.272). The null hypothesis that in the population the BSS = 0, = 0, is accepted. Decision Since the second function is not significant, its associated statistics will not be used in the interpretation of the affect of age and counsel on pre-disposition status.
65
What Are the Standardized Canonical Discriminant Functions?

Standardized Canonical Discriminant Function Coefficients Function AGE COUNSEL 1 -.508 .770 2 .883 .666
Zz = C1ZX1 + C2ZX2 + + CkZXk
1st Function Zz1 = -0.508 (age) + 0.770 (counsel) 2nd Function Zz2 = 0.883 (age) + 0.666 (counsel) Nota Bene Recall that the 2nd function was found not to be significant. Of the two variables in the 1st function, counsel has the greater impact.
66
What is the Correlation Between Each of the Predictor Variables and the Discriminant Scores Produced By the Two Functions?
Structure coefficients, or loadings
Structure Matrix Function COUNSEL a AGE_FIRS PR_ARRSTa AGE DR_SCOREa 1 .867* .291* -.219* -.654 -.109 2 .499 .067 -.190 .757* .195*
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions Variables ordered by absolute size of correlation within function. *. Largest absolute correlation between each variable and any discriminant function a. This variable not used in the analysis.
The predictors counsel, age_firs, and pr_arrst load highest on the 1st function, while age and dr_score load highest on the 2nd function.
67
What is the Mean Discriminant Score for Each Pre-Disposition Group on Each Discriminant Function?
Recall that these mean discriminant scores are called centroids and that the 2nd discriminant function is
not significant.
Functions at Group Centroids Function 1 2 -1.001 -1.64E-03 .856 -.160 .827 .201
PRE_STAT 1.00 2.00 3.00
Unstandardized canonical discriminant functions evaluated at group means
Notice how numerically similar the centroids of the 1st function are for groups 2 and 3, i.e. bail and ROR. This means that the 1st function, while significant, will do a poor job discriminating between the bail and ROR groups, and most of its discriminatory power will be discriminating between the jail group versus the other two groups.
68
What Would a Scatterplot of the Discrimanant Scores of the Three PreDisposition Groups Reveal?
Canonical Discriminant Functions
3 2
1 Group 1 Group 3 Group 2
PRE_STAT
-1 Group Centroids Group 3 -2 -3 -3 -2 -1 0 1 2 Group 2 Group 1
Function 1
Reading across horizontally, notice how the 1st discriminant function separates the centroid-pair of the jail group (1) from that of the bail (2) and ROR (3) groups. Reading vertically, however, notice that the 2nd discriminant function fails to separate the three centroidpairs of the three groups. This is why the 2nd function was not found to be significant.
69
How Were the Individual Cases Classified?

Casewise plot of the cases
Casewise Statistics Highest Group Squared Mahalanobis Distance to Centroid .766 .480 .766 .364 .480 .420 .364 .420 .447 .766 3.745 .766 .825 1.190 1.871 3.745 .179 .681 .343 4.938 Second Highest Group Discriminant Scores
Case Number Actual Group Original 1 2 2 2 3 3 4 2 5 2 6 3 7 2 8 3 9 3 10 3 11 2 12 2 13 1 14 3 15 2 16 1 17 1 18 3 19 1 20 2 **. Misclassified case
Predicted Group 2 2 2** 2 2 2** 2 2** 3 2** 1** 2 3** 1** 1** 1 1 1** 1 1**
P(D>d | G=g) p df .682 .787 .682 .833 .787 .811 .833 .811 .800 .682 .154 .682 .662 .551 .392 .154 .914 .711 .842 .085
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
P(G=g | D=d) .578 .550 .578 .521 .550 .490 .521 .490 .465 .578 .596 .578 .470 .773 .721 .596 .910 .818 .855 .526
Group
3 3 3 3 3 3 3 3 2 3 2 3 2 2 2 2 2 2 2 2
P(G=g | D=d) .391 .409 .391 .426 .409 .442 .426 .442 .426 .391 .283 .391 .392 .144 .184 .283 .049 .112 .086 .341
Squared Mahalanobis Distance to Centroid Function 1 Function 2 1.127 1.694 -.411 .649 1.548 -.158 1.127 1.694 -.411 .342 1.403 .096 .649 1.548 -.158 .206 1.257 .349 .342 1.403 .096 .206 1.257 .349 1.044 .965 .856 1.127 1.694 -.411 4.394 -.398 -1.840 1.127 1.694 -.411 1.613 .819 1.109 3.707 -.836 -1.080 3.765 -.690 -1.333 4.394 -.398 -1.840 5.185 -1.419 -.067 3.820 -.981 -.827 4.104 -1.127 -.573 4.965 -.252 -2.094
70
What Was the Hit Ratio of the Discriminant Model?
a Classification Results
Original
Count
PRE_STAT 1.00 2.00 3.00 1.00 2.00 3.00
Predicted Group Membership 1.00 2.00 3.00 27 1 4 5 14 2 3 11 3 84.4 3.1 12.5 23.8 66.7 9.5 17.6 64.7 17.6
Total 32 21 17 100.0 100.0 100.0
a. 62.9% of original grouped cases correctly classified.
Hit Ratio = (44 / 70) (100) = 62.9% Errors = (26 / 70) (100) = 37.14%

Discriminant Analysis

Uploaded by

Document Informationclick to expand document information

Copyright:

Available Formats

Discriminant Analysis

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Discriminant Analysis

Uploaded by

Copyright:

Available Formats

Discriminant Analysis

2 KEY CONCEPTS ***** Discriminant Analysis

Predicting a Nonmetric Variable

Partitioning Sums of Squares (SS) in Discriminant Analysis

8 PartitioningSumsof Squares(SS) in DiscriminantAnalysis(cont.)

Elements of the Discriminant Model

Z = a + W1X1 + W2X2 + ... + WkXk

An Example of Discriminant Analysis with a Binary Dependent Variable

Discriminant Analysis with Two Groups

Discriminant Analysis Assumptions

Steps in Discriminant Analysis Process

The Sentence-Type Discriminant Model

Z = a + W1(dr_score) + W2(age_firs) + W3(skl_indx)... + W4(ser_indx)

Are the predictor variables multivariate normally distributed?

0 0.0 2.0 4.0 6.0 8.0 10.0

16 14 12 10 8 6 4 2 0 14.0 15.0 16.0 17.0 18.0 19.0 20.0 21.0 22.0

2 0 1.0 2.0 3.0 4.0 5.0 6.0 7.0

Variable Dr-score Age_firs Skl_indx Ser_indx

Skew -0.5049 +0.7728 -0.0266 +0.2197

Kurtosis -0.6946 -0.4250 -1.2321 -1.1727

Are the Variance/Covariance Matrices of the Two Groups Homogeneous?

Q Are the variance/covariance matrices of the two

19 Are the Variance/CovarianceMatricesof the TwoGroupsHomogeneous?(cont.)

TYPE_SEN .00 1.00 Pooled within-groups

Tests null hypothesis of equal population covariance matrices.

Q What is Wilks' lambda ()?

21 HowWill the PredictorVariablesbe Enteredinto the DiscriminantModel?(cont.)

= (WSS / TSS) = (Zij - Zj)2 / (Zi - Z)2

SER_INDX SER_INDX DR_SCORE

Tolerance 1.000 .864 .864

Step 3: Estimate the parameters of the resulting discriminant model

22 HowWill the PredictorVariablesbe Enteredinto the DiscriminantModel?(cont.)

1 - ( k-1) / (k) ( k-1) / (k)

Estimation of the Parameters of the Model

DR_SCORE SER_INDX (Constant)

Z = -0.706 - 0.235 (dr_score) + 0.564 (ser_indx)

25 Estimationof the Parametersof the Model(cont.)

How Can the Goodness-of-Fit of the Model Be Measured?

What Is an Eigenvalue in Discriminant Analysis?

What is the Eigenvalue for the Sentence-Type Example?

Eigenvalues Canonical Correlation .483

Eigenvalue % of Variance .305a 100.0

a. First 1 canonical discriminant functions were used in the analysis.

How Can You Tell if the Eigenvalue Is Significant?

31 HowCanYouTell if the Eigenvalueis Significant?(cont.)

The Wilks' for the discriminant model

32 HowCanYouTell if the Eigenvalueis Significant?(cont.)

a. First 1 canonical discriminant functions were used in the analysis.

eigenvalue canonical correlation

How Well Does the Model Predict?

TYPE_SEN .00 1.00 .00 1.00

Predicted Group Membership .00 1.00 27 10 14 19 73.0 27.0 42.4 57.6

a. 65.7% of original grouped cases correctly classified.

Does the Model Predict Any Better Than Chance?

Is the Hit Ratio of the Model Significantly Better Than Chance?

P = the proportion the model predicted correctly

37 Is the Hit Ratioof the ModelSignificantlyBetterThanChance?(cont.)

X2 Z1 x xx x x xxxx x x x x oo o ooo o o oo o o * ** * * ** Z2