Rm Unit 4 - Overview

UNIT 4

1
UNIT IV
Syllabus
• Summarizing the Data: Mean, Median,
Mode and Standard Deviation
• Data Analysis Techniques: Univariate and
Bivariate Analysis (Chi Square, ANOVA, Sign
test); Multivariate Analysis (Discriminant
Analysis, Cluster Analysis, Factor Analysis,
Multiple Linear Regression).

2
Central Tendency

3
Central Tendency

• Mean
• Median
• Mode

4
Deviation
• Mean Deviation
• Standard Deviation

5
Univariate and Bivariate Analysis

Univariate Analysis
Univariate analysis refers to the analysis of a single variable, such as the average
weight of employees in an organisation. Here, no relationship of this variable with
any other variable is considered.

Bivariate Analysis
Bivariate analysis refers to the analysis of two variables, such as the age and weight
of employees. Here, the correlation between the two variables can be determined.

6
Data Analysis and Data Type

Chi Square Test (Non-Parametric Test): nominal/categorical data
• Goodness of Fit
• Independence of Attributes

ANOVA (Parametric Test): interval/ratio scaled data
• One Way
• Two Way

Sign Test (Non-Parametric Test): ordinal/interval/ratio scaled data
• Small Sample: Single Sample, Paired Samples
• Large Sample: Single Sample, Paired Samples

7
Chi-Square: General Structure
• For analysis of categorical data
– Test for equality of percentages (goodness of fit)
– Test for independence
• The chi-square statistic measures the difference between
the actual counts and the expected counts (assuming the
null hypothesis is valid) as follows:

χ²stat = Σ (Oi − Ei)² / Ei, summed over i = 1, …, n

It is used when we need to find out whether two or more qualitative
attributes are independent.
8
Test for Goodness of Fit

Example Coin
A coin is tossed 50 times and heads appears 30 times. Test at the five
percent level of significance whether the coin is unbiased.

9
Solution
H0: The coin is unbiased.
H1: The coin is biased.

χ²stat = Σ (Oi − Ei)² / Ei;  df = n − 1 = 2 − 1 = 1 (n = number of facets)

Facets   Observed Frequency (O)   Expected Frequency (E)   (O − E)²/E
H        30                       25                       1
T        20                       25                       1
                                                 Total χ² = 2

Vc = 2, Vt = 3.841
Vc < Vt
H0 is accepted.
The coin is unbiased.
10
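A quick software check of this example (a sketch, assuming SciPy is installed; the observed and expected counts are taken from the coin example above):

from scipy.stats import chisquare

observed = [30, 20]   # heads, tails out of 50 tosses
expected = [25, 25]   # fair-coin expectation

stat, p = chisquare(observed, f_exp=expected)
print(stat, p)        # stat = 2.0, p ≈ 0.157 > 0.05, so H0 is retained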
Chi Square Independent Attributes
Synergy Ltd. is organising a training programme for its 500 employees to improve
performance. Some attend the programme while others do not. The observations are as
follows:

View         Improve   Not Improve
Attend       132       91
Not Attend   140       137

Is the training effective? Test at the 5% level of significance.

11
H0 : Training is not effective.
H1 : Training is effective.
View           Improve   Not Improve   Row Total
Attend         132       91            223
Not Attend     140       137           277
Column Total   272       228           Grand Total = 500

Expected Frequency = (Row Total × Column Total) / Grand Total
df = (c − 1)(r − 1), where c = number of columns, r = number of rows

Calculation of expected frequencies:

View         Improve   Not Improve
Attend       a11       a12
Not Attend   a21       a22

a11 = (223 × 272)/500 = 121.31
a12 = (223 × 228)/500 = 101.69
a21 = (277 × 272)/500 = 150.69
a22 = (277 × 228)/500 = 126.31

12
O     E        (O − E)²   (O − E)²/E
132   121.31   114.23     0.94
91    101.69   114.23     1.12
140   150.69   114.23     0.76
137   126.31   114.23     0.90
                Total χ² = 3.73

df = (c − 1)(r − 1) = (2 − 1)(2 − 1) = 1
Table value of χ² at 5% for 1 df = 3.841
Vc = 3.73, Vt = 3.841
Vc < Vt
H0 is accepted.
The training is not effective.

13
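The same independence test can be reproduced with SciPy (a sketch; correction=False switches off the Yates correction so the result matches the hand calculation above):

import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[132, 91],     # Attend:     Improve, Not Improve
                  [140, 137]])   # Not Attend: Improve, Not Improve

stat, p, df, expected = chi2_contingency(table, correction=False)
print(stat, df, p)   # stat ≈ 3.73, df = 1, p ≈ 0.053 > 0.05, so H0 is retained
print(expected)      # matches the expected frequencies a11-a22 above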
ANOVA: General Structure
• One-Way ANOVA
Variance SS DF MS F-ratio F-table
Between Sample SSB k-1 MSB=SSB/(k-1) MSB/MSW
Within Sample SSW n-k MSW=SSW/(n-k)
Total SST n-1

• Two-Way ANOVA
Variance SS DF MS F-ratio F-table
Between Columns SSC c-1 MSC=SSC/(c-1) MSC/MSE
Between Rows SSR r-1 MSR=SSR/(r-1) MSR/MSE
Residual Error SSE (c-1)(r-1) MSE=SSE/(c-1)(r-1)
Total SST n-1

14
One-Way ANOVA
Variance SS DF MS F-ratio F-table
Between Sample SSB V1 = k – 1 MSB=SSB/V1
MSB/MSW
Within Sample SSW V2 = N – k MSW=SSW/V2
Total SST N–1
T = Sum Total

Correction Factor CF = T²/N

SST = SSB + SSW

SSB = (ΣX1)²/n1 + (ΣX2)²/n2 + (ΣX3)²/n3 + … + (ΣXn)²/nn − T²/N

SST = Σ Xi² − T²/N, summed over all N observations

df = (v1, v2), where v1 = k − 1 and v2 = N − k
15
Example
Healthy Agro Ltd. sows three samples of three kinds of seeds in a farm. The
productivity in tons is observed as follows:

Seed1   Seed2   Seed3
5       7       18
5       7       18
7       12      18

Is there any difference in the productivity levels of the seeds? Test at the 5% level
of significance.

16
H0: There is no difference in the seeds.
H1: There is a difference in the seeds.

      Seed1   Seed2   Seed3   S1²   S2²   S3²
      5       7       18      25    49    324
      5       7       18      25    49    324
      7       12      18      49    144   324
Sum   17      26      54      99    242   972

T = 17 + 26 + 54 = 97;  ΣXi² = 99 + 242 + 972 = 1313
CF = T²/N = 97²/9 = 1045.44

SST = ΣXi² − CF = 1313 − 1045.44 = 267.55

SSB = (ΣX1)²/n1 + (ΣX2)²/n2 + (ΣX3)²/n3 − CF
    = 17²/3 + 26²/3 + 54²/3 − 1045.44
    = 248.22
17
SSW = SST − SSB = 267.55 − 248.22 = 19.33

v1 = k − 1 = 3 − 1 = 2;  v2 = N − k = 9 − 3 = 6

ANOVA Table
SV        SS            df      MS            F      Fcrit
Between   SSB = 248.22  v1 = 2  MSB = 124.11  38.5   5.14
Within    SSW = 19.33   v2 = 6  MSW = 3.22
Total     SST = 267.55

Vc = 38.5, Vt = 5.14
Vc > Vt
H0 is rejected.
There is a difference in the seeds.

18
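A minimal check of this one-way ANOVA with SciPy (a sketch, using the seed data above):

from scipy.stats import f_oneway

seed1 = [5, 5, 7]
seed2 = [7, 7, 12]
seed3 = [18, 18, 18]

F, p = f_oneway(seed1, seed2, seed3)
print(F, p)   # F ≈ 38.5, p < 0.05, so H0 is rejected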
Two-Way ANOVA
Variance          SS    DF                    MS                           F-ratio
Between Columns   SSC   V1 = c − 1            MSC = SSC/(c − 1)            MSC/MSE
Between Rows      SSR   V1 = r − 1            MSR = SSR/(r − 1)            MSR/MSE
Residual Error    SSE   V2 = (c − 1)(r − 1)   MSE = SSE/[(c − 1)(r − 1)]
Total             SST   N − 1

T = Sum Total

Correction Factor CF = T²/N

SST = SSC + SSR + SSE

SST = Σ Xi² − CF, summed over all N observations

SSC = (ΣX1)²/n1 + (ΣX2)²/n2 + (ΣX3)²/n3 + … + (ΣXn)²/nn − CF   (column totals X)

SSR = (ΣY1)²/n1 + (ΣY2)²/n2 + (ΣY3)²/n3 + … + (ΣYn)²/nn − CF   (row totals Y)

df = (v1, v2): v1 (for columns) = c − 1; v1 (for rows) = r − 1; v2 = (c − 1)(r − 1)
19
Example
Three brands of detergents have been used in three water
temperatures to wash similar kinds of cloth. The cleanliness is observed
as follows:

         Surf Excel   Tide   Wheel
Cold     5            7      18
Normal   7            12     21
Warm     10           14     25

Test if there is any difference because of:
• Brands
• Water temperature

20
H01: There is no significant difference in cleanliness due to the water temperatures.

H11: There is a significant difference in cleanliness due to the water temperatures.

H02: There is no significant difference in cleanliness due to the different detergent brands.

H12: There is a significant difference in cleanliness due to the detergent brands.

Surf Excel (X1) Tide (X2) Wheel (X3) Total


Cold (Y1) 5 7 18 30
Normal (Y2) 7 12 21 40
Warm (Y3) 10 14 25 49
Total 22 33 64 119

21
              Surf Excel (X1)   Tide (X2)   Wheel (X3)   Total    X1²   X2²   X3²   Sum
Cold (Y1)     5                 7           18           30       25    49    324   398
Normal (Y2)   7                 12          21           40       49    144   441   634
Warm (Y3)     10                14          25           49       100   196   625   921
Total         22                33          64           119                        1953

T  119
119  2 Y  Y  Y 
2 2 2

 X    X    X  SSR    CF
1 2 3

SSC  
CF   1573.44
2 2 2

9
1 2 3
 CF n1 n2 n3
n
n1 n2 n3
SST   ( X i )  CF
2 2 2
2 (30) (40) (49)
SSR    1573.44
2 2 2
(22) (33) (64)
SSC    1573.44 3 3 3
i1 3 3 3
SSC  316.22 SSR  60.22
SST 19531573.44
SST  379.55 SSE  SST  (SSC  SSR)
SSE  3.12

22
Source of Variation   SS             df                        MS                        F                                     F crit
Columns               SSC = 316.22   V1(c) = 2                 MSC = 316.22/2 = 158.11   Fc = MSC/MSE = 158.11/0.78 = 203.29   6.94
Rows                  SSR = 60.22    V1(r) = 2                 MSR = 60.22/2 = 30.11     Fr = MSR/MSE = 30.11/0.78 = 38.71     6.94
Error                 SSE = 3.12     V2 = (c − 1)(r − 1) = 4   MSE = 3.12/4 = 0.78
Total                 SST = 379.55   8

Vc(c) = 203.29 > Vt(c) = 6.94, so H02 is rejected.
Vc(r) = 38.71 > Vt(r) = 6.94, so H01 is rejected.
H01 and H02 are both rejected: there is a significant difference in cleanliness
due to the water temperatures, and a significant difference in cleanliness due
to the detergent brands.

23
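The same two-way ANOVA (one observation per cell, no interaction term) can be sketched with statsmodels; the DataFrame column names are my own choices:

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = pd.DataFrame({
    "clean": [5, 7, 18, 7, 12, 21, 10, 14, 25],
    "temp":  ["Cold"] * 3 + ["Normal"] * 3 + ["Warm"] * 3,
    "brand": ["Surf Excel", "Tide", "Wheel"] * 3,
})

model = ols("clean ~ C(brand) + C(temp)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))
# brand SS ≈ 316.22, temp SS ≈ 60.22, residual SS ≈ 3.11,
# matching the table above up to rounding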
Sign Test
Small Samples
Single Sample

For the following data, test whether the median is 15.


1.00 8.90
2.00 9.00
3.00 9.30
4.00 9.70
5.00 12.00
6.00 12.25
6.70 14.25
7.00 14.45
7.10 18.00
7.25 19.00

24
Solution
H0: Median = 15
H1: Median ≠ 15

P(r) = C(n, r) p^r q^(n−r), with p = q = 0.5

Assign a − sign to each value below 15 and a + sign to each value above 15:

1.00 −   8.90 −
2.00 −   9.00 −
3.00 −   9.30 −
4.00 −   9.70 −
5.00 −   12.00 −
6.00 −   12.25 −
6.70 −   14.25 −
7.00 −   14.45 −
7.10 −   18.00 +
7.25 −   19.00 +

+ = 2, − = 18, n = 20

P = 2[C(20,2) p^2 q^18 + C(20,1) p^1 q^19 + C(20,0) p^0 q^20]

C(n, r) = n! / (r!(n − r)!);  C(20, 2) = 20!/(2! 18!) = (20 × 19)/2 = 190

P = 2[(190 × (0.5)^20) + (20 × (0.5)^20) + (1 × (0.5)^20)] = 0.00040245
P < 0.05
H0 is rejected.
Median ≠ 15

25
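The exact two-sided probability can be obtained directly from the binomial test in SciPy (a sketch; 2 plus signs out of n = 20 non-tied observations):

from scipy.stats import binomtest

result = binomtest(2, n=20, p=0.5, alternative="two-sided")
print(result.pvalue)   # ≈ 0.000402 < 0.05, so H0 is rejected: median ≠ 15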
Sign Test
Small Sample
Paired Sample
In an institute, two scientists, Mr. Goldsworthy and Mr. Sheraton, develop
two methods of giving training to new employees. Since the data are of a
special type and come from a very limited group of employees, they cannot be
assumed to be normally distributed. The data are as follows:

Sr. No. 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Method1 20 24 28 24 20 29 19 27 20 30 18 28 26 24
Method2 16 26 18 17 20 21 23 22 23 20 18 21 17 26

Test whether there is a significant difference between the two methods.

26
Solution
H0: Median 1 = Median 2
H1: Median 1 ≠ Median 2

Sr. No.   Method1   Method2   d (sign of Method2 − Method1)
1         20        16        −
2         24        26        +
3         28        18        −
4         24        17        −
5         20        20        =
6         29        21        −
7         19        23        +
8         27        22        −
9         20        23        +
10        30        20        −
11        18        18        =
12        28        21        −
13        26        17        −
14        24        26        +

+ = 4, − = 8, ties (=) = 2;  n = 14 − 2 = 12

P = 2[C(12,4) p^4 q^8 + C(12,3) p^3 q^9 + C(12,2) p^2 q^10 + C(12,1) p^1 q^11 + C(12,0) p^0 q^12]
  = 0.3876953125
P > 0.05
H0 is accepted.
Median 1 = Median 2: there is no significant difference between the two methods.

27
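The paired version follows the same pattern: drop the ties, count the plus signs and run a binomial test (a sketch using SciPy and the scores above):

from scipy.stats import binomtest

method1 = [20, 24, 28, 24, 20, 29, 19, 27, 20, 30, 18, 28, 26, 24]
method2 = [16, 26, 18, 17, 20, 21, 23, 22, 23, 20, 18, 21, 17, 26]

diffs = [b - a for a, b in zip(method1, method2) if b != a]   # ties dropped
plus = sum(d > 0 for d in diffs)                              # 4 plus signs
result = binomtest(plus, n=len(diffs), p=0.5)                 # n = 12 after ties
print(result.pvalue)   # ≈ 0.3877 > 0.05, so H0 is retained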
Sign Test
For Large Samples

Normal Variate Method

It is used when n > 25.

Z = (X − n p0) / √(n p0 (1 − p0))   or, with p0 = 0.5,   Z = (X − 0.5n) / (0.5√n)

Continuity correction: when X < n p0, replace X with X + 0.5; when X > n p0,
replace X with X − 0.5. Thus

Z = ((X + 0.5) − 0.5n) / (0.5√n)   or   Z = ((X − 0.5) − 0.5n) / (0.5√n)

28
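A small helper implementing this continuity-corrected normal approximation (a sketch; the function name is my own, and it reproduces the two worked examples that follow):

import math

def sign_test_z(x, n):
    """Z for a large-sample sign test: x is one sign count and n the
    number of non-tied observations, with p0 = 0.5."""
    corrected = x + 0.5 if x < 0.5 * n else x - 0.5
    return (corrected - 0.5 * n) / (0.5 * math.sqrt(n))

print(sign_test_z(32, 92))   # ≈ -2.81 (single-sample example below)
print(sign_test_z(25, 95))   # ≈ -4.51 (paired-sample example below)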
Sign Test
For Large Samples Single Sample
Gannet India Ltd. estimates the age of its employees to be 45 years. A sample
of 100 employees is taken, of which 60 are above 45 years, 8 are exactly 45 years
and 32 are below 45 years of age. Test whether the median age of the company's
employees is 45 years.
Solution
H0: Median = 45
H1: Median ≠ 45

Here n = 100 − 8 = 92 (ties are excluded), X = 32, p = 0.5

Z = ((X + 0.5) − 0.5n) / (0.5√n) = ((32 + 0.5) − 0.5 × 92) / (0.5√92) = −13.5 / 4.795,
so |Z| = 2.81

Vc = 2.81, Vt = 1.96
Vc > Vt
H0 is rejected.
The median age is not 45 years.

29
Sign Test
For Large Samples Paired Samples
Gannet India Ltd. provides training to its 100 employees. The findings are as follows:
Improved = 70
Worse than Previous = 25
No Change = 5
Test if the training programme is effective.
Solution
H0: The training programme is not effective.
H1: The training programme is effective.

Here n = 100 − 5 = 95 (ties are excluded), X = 25, p = 0.5

Z = ((X + 0.5) − 0.5n) / (0.5√n) = ((25 + 0.5) − 0.5 × 95) / (0.5√95) = −22 / 4.87,
so |Z| = 4.51

Vc = 4.51, Vt = 1.645
Vc > Vt
H0 is rejected.
The training programme is effective.

30
Factor Analysis
It is used for:
• Minimising the unnecessary bulk of variables and components
• Checking weak loads
• Checking reloads
• Identifying similar variables

31
Factor Analysis
It discovers how the original variables are organised in a particular way
reflecting another ‘latent variable’

Factor Analysis

Principal Component Method Centroid Method Maximum Likelihood Method

32
• It is most often used as a multivariate technique in research
studies, particularly in the social and behavioural sciences.

• It is applicable when there is a systematic interdependence among a set of
observed or manifest variables and the researcher is interested in finding
out something more fundamental or latent behind them.

33
Construct = Latent Variable = Factor

Observed/ Manifest Variable Latent Variable


• Memory Test (X1)
• Verbal Test (X2)
• Written Test (X3)
• Reading Test (X4)
• Comprehension Test (X5)
≅ Intelligence
• Speed Test (X6)
• Decision Making Test (X7)

34
Error

(Path diagram: the latent variable Intelligence loads on the observed variables
X1–X7, with the arrows indicating the degree of correlation; each Xi carries an
error term ei.)

ei (i = 1, 2, 3, …) is the unique contribution of the variable that cannot be
predicted from the remaining variables. It is equal to 1 − R².

35
Factor Analysis
Why do we look at “dimensions”?
• We study phenomena that cannot be directly observed
– (ego, personality, intelligence)
• We have too many observations
– need to “reduce” them to a smaller set of factors
• Items are representations of underlying or latent factors.
– We want to know what these factors are.
– We have an idea of the phenomena that a set of items represent (construct
validity).
• To find underlying latent constructs
– As manifested in individual items
• To assess the association between these factors
• To produce usable scores that reflect critical aspects of any complex
phenomenon
– (e.g. personality, intelligence, values, air)
• An end in itself and a major step toward creating error free measures

36
Factor Analysis
Basic Concept

• If two items are highly correlated


– They must represent the same phenomenon
– If they tell us about the same underlying variance, combining them to form a
single measure is reasonable for two reasons
• Parsimony
• Reduction in Error
– Represented by a regression line

• BUT suppose one is just a little better than the other at representing this
underlying phenomenon?
• FACTOR ANALYSIS looks for the phenomena underlying the observed variance
and covariance in a set of variables.
• These phenomena are called “factors” or “principal components.”

37
Condition
• For interval or ratio scaled data only
• Usually the sample size should be at least five times the total
number of variables.
Adequacy
• Bartlett's chi square p value should be less than 0.05
• KMO statistic should be greater than 0.5
• Determinant of the correlation matrix |R| > 0.00001 (Field 2012, p. 771)

KMO Interpretation
> 0.90 Marvellous
0.80 – 0.90 Meritorious
0.70 – 0.79 Middling
0.60 – 0.69 Mediocre
0.50 – 0.59 Miserable
< 0.50 Unacceptable

38
Statistics Associated with Factor Analysis
• Bartlett's test of sphericity. Bartlett's test of sphericity is a test statistic used
to examine the hypothesis that the variables are uncorrelated in the
population. In other words, the population correlation matrix is an identity
matrix; each variable correlates perfectly with itself (r = 1) but has no
correlation with the other variables (r = 0).

• The Bartlett Test of Sphericity compares the correlation matrix with a matrix of
zero correlations (technically called the identity matrix, which consists of all
zeros except the 1’s along the diagonal).

39

χ² = −(n − 1 − (2p + 5)/6) × ln|R|

df = p(p − 1)/2

where
n = sample size
p = number of variables
ln = natural log (base e ≈ 2.71828)
|R| = determinant of the correlation matrix

40
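The test statistic can be translated directly into NumPy/SciPy (a sketch; the function name is my own):

import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity for a p x p correlation matrix R
    estimated from a sample of size n; returns (chi2, df, p-value)."""
    p = R.shape[0]
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return stat, df, chi2.sf(stat, df)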
Kaiser-Meyer-Olkin Measure of Sampling Adequacy

KMO = Σ r²ij / (Σ r²ij + Σ a²ij), with both sums taken over all pairs i ≠ j

where
a = partial correlation:  aij = −Vij / √(Vii × Vjj)
Vij = element (i, j) of the inverse of the correlation matrix r
Vii, Vjj = the corresponding diagonal elements of the inverse matrix

That is:

KMO = Sum of squares of r (except diagonal) /
      [Sum of squares of r (except diagonal) + Sum of squares of a (except diagonal)]

41
Role of Correlation in Factor Analysis
• Holzinger and Swineford (1939): variable sets must cluster with high correlations.
• Tabachnick and Fidell (2001): the maximum correlation must be greater than 0.3.
• Correlation matrix. A correlation matrix is a lower-triangle matrix showing the simple
correlations, r, between all possible pairs of variables included in the analysis. The
diagonal elements, which are all 1, are usually omitted.

Name of Matrix        Elements                         Good Signs                               Bad Signs
Correlation (R)       Correlations                     Many above 0.3 and possible clustering   Few above 0.3
Partial correlation   Partial correlations             Few above 0.3                            Many above 0.3 and possible clustering
Anti-image            Partial correlations, reversed   Few above 0.3                            Many above 0.3 and possible clustering

42
X1 X2 X3 X4 X5 X6 X7
X1 1.000 0.770 0.810 0.210 0.180 0.190 0.210
X2 0.770 1.000 0.870 0.250 0.170 0.210 0.220
X3 0.810 0.870 1.000 0.180 0.210 0.240 0.410
X4 0.210 0.250 0.180 1.000 0.270 0.240 0.210
X5 0.180 0.170 0.210 0.270 1.000 0.870 0.900
X6 0.190 0.210 0.240 0.240 0.870 1.000 0.870
X7 0.210 0.220 0.410 0.210 0.900 0.870 1.000

43
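The KMO formula from slide 41 can be applied to this correlation matrix with NumPy (a sketch; the function name is my own):

import numpy as np

R = np.array([
    [1.000, 0.770, 0.810, 0.210, 0.180, 0.190, 0.210],
    [0.770, 1.000, 0.870, 0.250, 0.170, 0.210, 0.220],
    [0.810, 0.870, 1.000, 0.180, 0.210, 0.240, 0.410],
    [0.210, 0.250, 0.180, 1.000, 0.270, 0.240, 0.210],
    [0.180, 0.170, 0.210, 0.270, 1.000, 0.870, 0.900],
    [0.190, 0.210, 0.240, 0.240, 0.870, 1.000, 0.870],
    [0.210, 0.220, 0.410, 0.210, 0.900, 0.870, 1.000],
])

def kmo(R):
    V = np.linalg.inv(R)                        # inverse of R
    d = np.sqrt(np.outer(np.diag(V), np.diag(V)))
    A = -V / d                                  # partial correlations a_ij
    off = ~np.eye(R.shape[0], dtype=bool)       # exclude the diagonal
    r2 = (R[off] ** 2).sum()
    a2 = (A[off] ** 2).sum()
    return r2 / (r2 + a2)

print(kmo(R))   # compare the result with the interpretation table on slide 38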
Elements in Principal Component Method

• Eigenvalue. The eigenvalue represents the total variance explained by each
factor. Norman and Streiner (2008) state that the eigenvalue should be greater
than 1 for an estimated latent factor:
Eigenvalue > 1
• Total Variance Explained
It indicates how much of the variability in the data has been modelled by the
extracted factors. It is estimated as the ratio of the sum of the eigenvalues
greater than 1 to the total number of variables:

Total Variance Explained = (Σ Ei) / n, summing the eigenvalues Ei > 1 over the retained factors

44
• Factor loadings. Factor loadings are simple correlations between the variables and the factors:
each loading is the correlation between a specific observed variable and a specific factor. They
are equivalent to standardised regression coefficients (β weights) in multiple regression. The
higher the value, the closer the relationship.
• Communality. Communality is the amount of variance a variable shares with all the other variables
being considered. It is also the proportion of the variable's variance explained by the common
factors, i.e. the total influence on a single observed variable from all the factors associated
with it. It equals the sum of the squared factor loadings of that variable on all factors with
eigenvalue greater than 1, and it plays the same role as R² in multiple regression. The higher the
value, the better. 1 − communality is the part of the variable's variance that is not explained or
predicted by the model.

For a variable Xi with loadings Xi1, …, Xik on factors F1, F2, …, Fk:

h²i = Σ X²ik, summed over the k retained factors

• Factor loading plot. A factor loading plot is a plot of the original variables using the factor
loadings as coordinates.
• Factor matrix. A factor matrix contains the factor loadings of all the variables on all the
factors extracted.

45
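Communalities fall out of the loading matrix in one line (a sketch with a hypothetical loading matrix; the numbers are illustrative only, not from the slides):

import numpy as np

loadings = np.array([
    [0.85, 0.10],   # X1 on factors F1, F2
    [0.80, 0.15],   # X2
    [0.20, 0.88],   # X3
])

communality = (loadings ** 2).sum(axis=1)   # h²_i = sum of squared loadings
print(communality)       # variance of each variable explained by the factors
print(1 - communality)   # unexplained (unique) part of each variable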
• Rotation
Rotation is selected according to the nature of the interrelations of the variables. It is of two types:
• Oblique; and
• Orthogonal
• Oblique
If the variables are assumed to be dependent or related, oblique rotation is
selected. It consists of Direct Oblimin or Promax.

(Diagram: oblique rotation, in which the factor axes X, Y and Z are separated by
an angle θ that is held constant and may be positive or negative.)

46
• Orthogonal
If the variables are assumed to be independent, orthogonal rotation is selected.
It consists of Varimax, Quartimax and Equimax.

(Diagram: orthogonal rotation, in which the variables are uncorrelated and the
factor axes X, Y and Z are kept perpendicular.)

47
Rotated Component Matrix
The Rotated Component Matrix identifies the group of similar variables associated
with each latent factor. In the initial eigenvalues, the difference in eigenvalues
between the factors is larger than in the rotated loadings.

Transformation Matrix
It explains how much a particular factor is rotated. For example, if the
value is 0.707, the rotation is 45° because cos 45° = 0.707.

cos θ = X      (in Excel: =COS(RADIANS(θ)) gives X)
θ = cos⁻¹(X)   (in Excel: =ACOS(X) * 180/PI() gives θ in degrees)

48
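The same angle recovery in Python rather than Excel (a sketch):

import math

value = 0.707                            # transformation-matrix entry
angle = math.degrees(math.acos(value))
print(angle)   # ≈ 45.0, i.e. the factor was rotated by about 45 degrees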
Eigenvalue in Matrix
The eigenvalue λ of a square matrix A with eigenvector V satisfies

(A − λI) V = 0

where
A = square matrix
I = identity matrix
V = eigenvector

The eigenvalue in factor analysis is the sum of the squares of the loadings on a particular factor.

      F1    F2    F3    ---   Fi
X1    X11   X12   X13   ---   X1i
X2    X21   X22   X23   ---   X2i
X3    X31   X32   X33   ---   X3i
X4    X41   X42   X43   ---   X4i
X5    X51   X52   X53   ---   X5i
X6    X61   X62   X63   ---   X6i
X7    X71   X72   X73   ---   X7i

Eigenvalue for Factor 1 = Σ X²i1, summed over i = 1, …, 7
49
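Eigenvalues, the Kaiser criterion and the total variance explained (slide 44) can be computed together with NumPy (a sketch; the small correlation matrix is illustrative only):

import numpy as np

R = np.array([
    [1.00, 0.80, 0.15],
    [0.80, 1.00, 0.10],
    [0.15, 0.10, 1.00],
])

eigenvalues = np.linalg.eigvalsh(R)[::-1]   # sorted, largest first
retained = eigenvalues[eigenvalues > 1]     # Kaiser criterion: keep λ > 1
print(retained)
print(retained.sum() / R.shape[0])          # total variance explained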
Multiple Regression Analysis
• A procedure for analyzing associative relationships between a metric
dependent variable and one or more independent variables:
– Existence of a relationship
– Strength of the relationship
– Predicting the values of the dependent variable
– Controlling for other independent variables when evaluating the contributions
of a specific variable or a set of variables

50
Multiple Regression
Multiple Regression allows us to:
 Use several variables at once to explain the variation in a continuous
dependent variable.
 Isolate the unique effect of one variable on the continuous dependent
variable while taking into consideration that other variables are affecting it
too.
 Write a mathematical equation that tells us the overall effects of several
variables together and the unique effects of each on a continuous
dependent variable.
 Control for other variables to demonstrate whether bivariate relationships
are spurious

51
Regression Analysis

Deterministic Model
Yi = β0 + β1X1 + β2X2 + β3X3 + … + βiXi

Probabilistic Model
Yi = β0 + β1X1 + β2X2 + β3X3 + … + βiXi + μ

52
Multiple Regression Analysis
The general form of the multiple regression model is as follows:

Y = β0 + β1X1 + β2X2 + β3X3 + … + βkXk + μ

which is estimated by the following equation:

Ŷ = a + b1X1 + b2X2 + b3X3 + … + bkXk

The coefficient a represents the intercept, and the b's are the partial regression
coefficients, i.e. the slopes.

53
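The coefficients a and b1 … bk can be estimated by ordinary least squares; a sketch with NumPy on a tiny hypothetical data set (values illustrative only):

import numpy as np

X1 = np.array([1, 2, 3, 4, 5])
X2 = np.array([2, 1, 4, 3, 5])
Y  = np.array([4, 5, 9, 9, 13])

A = np.column_stack([np.ones_like(X1), X1, X2])   # intercept column + predictors
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
a, b1, b2 = coef
print(a, b1, b2)   # intercept and partial regression coefficients
print(A @ coef)    # fitted values Y-hat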
Cluster Analysis

(Diagram: cases plotted on Age, Income and Gender form natural groupings, i.e. clusters.)

54
Cluster Analysis
Introduction
• A technique used to classify objects or cases into relatively homogeneous
groups called clusters

• Objects in a cluster show the same behavioral pattern

• Also called classification analysis, or numerical taxonomy

55
Cluster Analysis
Application
• Market segmentation based on benefits sought by the customers

• Buyer behavior – Identifying homogeneous groups of buyers

56
Statistics Associated with Cluster
Analysis
• Agglomeration schedule. An agglomeration schedule gives information on
the objects or cases being combined at each stage of a hierarchical clustering
process.

• Cluster centroid. The cluster centroid is the mean values of the variables for
all the cases or objects in a particular cluster.

• Cluster centers. The cluster centers are the initial starting points in
nonhierarchical clustering. Clusters are built around these centers, or seeds.

• Cluster membership. Cluster membership indicates the cluster to which each
object or case belongs.

57
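A minimal nonhierarchical (k-means) clustering sketch with scikit-learn; the age/income values are hypothetical:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

data = np.array([
    [25, 30000], [27, 32000], [30, 35000],    # one likely segment
    [50, 90000], [52, 95000], [55, 100000],   # another likely segment
])

scaled = StandardScaler().fit_transform(data)   # put age and income on one scale
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)
print(km.labels_)            # cluster membership of each case
print(km.cluster_centers_)   # cluster centroids (in standardised units)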
Discriminant Analysis
Introduction
• Need to classify people into two or more groups
– Buyers / Non-Buyers
– Good / Bad credit risk
– Superior / Average / Poor Products
• Goal
– To establish a procedure to find the predictors that best classify subjects
• Uses
– Market segmentation research

58
Discriminant Analysis
Introduction
• Dependent variable is categorical (nominal or non metric)
– Nominal: Gender, Religion
– Promotion: Low, Medium, High
• Predictor variable is interval in nature
• Involves deriving a variate, the linear combination of the two (or more)
independent variables that will discriminate best between a priori defined
groups
• Hypothesis: Group means of a set of independent variables for two or more
groups are equal

59
Discriminant Analysis
Objectives
• Development of discriminant functions, or linear combinations of the predictor or
independent variables, which will best discriminate between the categories of the
criterion or dependent variable (groups).

• Examination of whether significant differences exist among the groups, in terms of the
predictor variables.

• Determination of which predictor variables contribute most to the intergroup
differences.

• Classification of cases to one of the groups based on the values of the predictor
variables.

• Evaluation of the accuracy of classification.

60
Discriminant Analysis
Discriminant Function
• Discriminant Analysis is done by calculating a linear function of the form
Di = d0 + d1X1 + d2X2 + d3X3 + . . . + dpXp

Where
– Di is the score on discriminant function i.
– The di's are weighting coefficients; d0 is the constant.
– The X's are the values of the discriminating variables used in the analysis
• No. of discriminant equations required
– Two groups – One; Three groups – Two; N groups – N-1 equations

61
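A two-group discriminant function of this form can be fitted with scikit-learn (a sketch; the buyer/non-buyer data are hypothetical):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.array([[5, 7], [6, 8], [7, 9],    # group 1 (e.g. buyers)
              [2, 3], [3, 2], [1, 4]])   # group 2 (e.g. non-buyers)
y = np.array([1, 1, 1, 0, 0, 0])         # a priori group labels

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.coef_, lda.intercept_)   # weighting coefficients d_i and constant d0
print(lda.predict([[4, 5]]))       # classify a new case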
Discriminant Analysis

It is used when the dependent variable is categorical data.

System   Discipline   Flexibility   Facility
Good     5            5             5
Bad      7            5             7
Normal   3            5             7

62
