MBA - BRM - Unit IV: Dr. N. Chitra Devi
CONTENTS OF UNIT - IV
Data preparation – editing – coding
Multivariate statistical techniques
Factor Analysis
Discriminant analysis
Cluster analysis
Chi-square test
Analysis of variance
Multiple regression and correlation
Application of SPSS package
DATA EDITING AND CODING
DATA EDITING AND CODING IN SPSS
VARIABLES
A variable is any characteristic, number, or quantity that can be measured or counted.
Example
Age (21,33,25,36 …)
Gender (Male, Female)
Income (11000,15000,25000)
Country of birth (China, India, Pakistan)
Types of variables
[Diagram: types of variables: discrete, continuous, nominal, ordinal]
Types of Variables – Discrete Variable
1. Discrete Variable
A variable whose values are whole numbers is called discrete.
For example
Number of items bought by a customer in a supermarket (10, 50, …)
Number of active bank accounts of a borrower (1,4,7)
Number of children in a family (2,3,4)
Types of Variables – Continuous Variable
2. Continuous Variable
A variable that may contain any value within some range is called
continuous. Example:
Categorical Variables
[Diagram: categorical variables are divided into nominal variables and ordinal variables]
Categorical Nominal/Ordinal Variables
Categorical nominal variables cannot be ordered in a meaningful way.
For example
Intended use of loan – Nominal
Home ownership – Nominal
Loan status – Nominal
Categorical ordinal variables can be ordered in a meaningful way.
For example
Liking of ice cream – Strongly Agree, Agree
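The difference between nominal and ordinal coding can be illustrated with a small sketch. The code books below are hypothetical: the category lists and the numeric codes are assumptions chosen for illustration, not values from the slides. For a nominal variable the codes are only labels, while for an ordinal variable the order of the codes carries meaning.

```python
# Hypothetical code books (assumed categories and codes, for illustration only).
HOME_OWNERSHIP_CODES = {"Rent": 1, "Own": 2, "Mortgage": 3}  # nominal: codes are labels
LIKING_CODES = {                                             # ordinal: order matters
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}


def encode(responses, codebook):
    """Replace each text response with its numeric code from the code book."""
    return [codebook[response] for response in responses]


ownership = encode(["Own", "Rent", "Own"], HOME_OWNERSHIP_CODES)
liking = encode(["Agree", "Strongly Agree", "Neutral"], LIKING_CODES)

# Comparing ordinal codes is meaningful; comparing nominal codes is not.
highest_liking = max(liking)

print(ownership)
print(liking, highest_liking)
```

Comparing `liking` codes (for example taking the maximum) makes sense because the scale is ordered; doing the same with `ownership` codes would not be meaningful.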
Classification Analysis based on Variables
Types of Analysis
1. Univariate Analysis
2. Bivariate Analysis
2. The Manhattan distance (also called the taxicab norm or 1-norm) between two points x and y is given by: d(x, y) = \sum_{i=1}^{n} |x_i - y_i|
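As a minimal sketch, the Manhattan distance is the sum of the absolute coordinate differences between two points. The function name `manhattan_distance` and the two sample points are assumptions for illustration.

```python
def manhattan_distance(p, q):
    """Sum of absolute coordinate differences between points p and q."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(a - b) for a, b in zip(p, q))


# Illustrative points (assumed for this sketch).
d = manhattan_distance((1.0, 1.0), (5.0, 7.0))
print(d)  # |1 - 5| + |1 - 7| = 10.0
```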
A simple example showing the implementation of the k-means algorithm
(using K=2)
Step 1:
Initialization: We randomly choose the following two centroids (k=2) for the two
clusters.
In this case the two centroids are m1 = (1.0, 1.0) and m2 = (5.0, 7.0).
Step 2:
Thus, we obtain two clusters containing:
X (Calls Usage)   1   2   2   3   4   5
Y (Data Usage)    1   1   3   2   3   5
NAME   DATA POINT   DISTANCE FROM B (2,1)   DISTANCE FROM C (2,3)   ASSIGNED CENTRE
A      (1,1)        1                       2.24                    V1
B      (2,1)        0                       2                       V1
C      (2,3)        2                       0                       V2
E      (4,3)        2.83                    2                       V2
F      (5,5)        5                       3.61                    V2
Clustering Analysis
• Euclidean distance between (x_1, y_1) and (x_2, y_2): d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
• Randomly chosen centres: B = (2,1) and C = (2,3)
• Each data point, e.g. A = (1,1), is taken as (x_1, y_1)
• Each centre, B = (2,1) or C = (2,3), is taken as (x_2, y_2)
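The assignment step of the clustering example can be sketched as follows: compute the Euclidean distance from each data point to the two centres B = (2,1) and C = (2,3), then assign the point to the nearer centre. Only the five points listed in the distance table are used; the variable and function names are assumptions for illustration.

```python
import math

# Data points listed in the distance table, and the two chosen centres.
points = {"A": (1, 1), "B": (2, 1), "C": (2, 3), "E": (4, 3), "F": (5, 5)}
centres = {"V1": (2, 1), "V2": (2, 3)}


def euclidean(p, q):
    """Straight-line distance between points p and q."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


assignments = {}
for name, point in points.items():
    distances = {label: euclidean(point, centre) for label, centre in centres.items()}
    # Assign the point to the centre with the smallest distance.
    assignments[name] = min(distances, key=distances.get)
    print(name, {k: round(v, 2) for k, v in distances.items()}, assignments[name])
```

In a full k-means run, the centres would then be recomputed as the mean of the points assigned to each cluster, and the assignment step repeated until the clusters stop changing.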
Marital Status   Middle School   High School   Bachelor's   Master's   Ph.D   Total
Never Married    18              36            21           9          6      90
Married          12              36            45           36         21     150
Divorced         6               9             9            3          3      30
Widowed          3               9             9            6          3      30
Total            39              90            84           54         33     300
Expected Values
Qualification / Marital Status   Middle School   High School   Bachelor's   Master's   Ph.D   Total
Total
Chi-Square Test
• Expected value = (Row total × Column total) / Grand total
• Step 3: Find the deviation of the observed (actual) values from the
expected values using the following formula:
• \chi^2 = \sum (Observed value – Expected value)^2 / Expected value
Chi-Square Test
OBSERVED VALUE   EXPECTED VALUE   (O-E)   (O-E)^2   (O-E)^2/E
12               19.5             -7.5    56.25     2.88
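The chi-square steps can be sketched in a few lines: compute each expected value as (row total × column total) / grand total, then sum (O − E)² / E over all cells. The observed counts are taken from the marital status by qualification table; the variable names are assumptions for illustration.

```python
# Observed counts: rows are marital status, columns are
# Middle School, High School, Bachelor's, Master's, Ph.D.
observed = {
    "Never Married": [18, 36, 21, 9, 6],
    "Married": [12, 36, 45, 36, 21],
    "Divorced": [6, 9, 9, 3, 3],
    "Widowed": [3, 9, 9, 6, 3],
}

row_totals = {status: sum(counts) for status, counts in observed.items()}
col_totals = [sum(column) for column in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Expected value = (row total * column total) / grand total
expected = {
    status: [row_totals[status] * col / grand_total for col in col_totals]
    for status in observed
}

# Chi-square statistic = sum over all cells of (O - E)^2 / E
chi_square = sum(
    (o - e) ** 2 / e
    for status in observed
    for o, e in zip(observed[status], expected[status])
)

# Degrees of freedom = (rows - 1) * (columns - 1)
dof = (len(observed) - 1) * (len(col_totals) - 1)

print(expected["Married"][0])  # expected count for Married / Middle School
print(round(chi_square, 2), dof)
```

The resulting statistic is then compared with the critical chi-square value for the computed degrees of freedom at the chosen significance level.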
• Then a second set of weights can be selected, so that the second factor
explains most of the residual variance, subject to being uncorrelated
with the first factor.
• This same principle applies for selecting additional weights for the
additional factors.
Statistics Associated with Factor Analysis
• Communality. Amount of variance a variable shares with all the other
variables. This is the proportion of variance explained by the common
factors.
• Eigenvalue. Represents the total variance explained by each factor.
• Factor loadings. Correlations between the variables and the factors.
• Factor matrix. A factor matrix contains the factor loadings of all the
variables on all the factors
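These statistics are related through the factor matrix. As a rough sketch, assuming an unrotated solution with uncorrelated factors, a variable's communality is the sum of its squared loadings across the factors, and a factor's eigenvalue is the sum of the squared loadings of all variables on that factor. The loadings below are hypothetical values chosen for illustration, not output from any data set.

```python
# Hypothetical factor matrix: loadings of three variables on two factors.
loadings = {
    "V1": [0.8, 0.1],
    "V2": [0.7, 0.3],
    "V3": [0.2, 0.9],
}
n_factors = 2

# Communality: sum of squared loadings across factors, per variable.
communalities = {
    variable: sum(value ** 2 for value in row) for variable, row in loadings.items()
}

# Eigenvalue: sum of squared loadings across variables, per factor.
eigenvalues = [
    sum(row[j] ** 2 for row in loadings.values()) for j in range(n_factors)
]

# Share of total variance explained by each factor (eigenvalue / number of variables).
variance_share = [value / len(loadings) for value in eigenvalues]

print(communalities)
print(eigenvalues)
print(variance_share)
```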
Determine the Method of Factor Analysis
• In Common factor analysis, the factors are estimated based only on the common
variance.
-Communalities are inserted in the diagonal of the correlation matrix.
-Used to identify the underlying dimensions and when the common variance is of
interest.
Determine the Number of Factors
• Determination Based on Scree Plot. A scree plot is a plot of the Eigenvalues against the
number of factors in order of extraction. The point at which the scree begins denotes the
true number of factors.
• Through rotation the factor matrix is transformed into a simpler one that is easier to
interpret.
• After rotation each factor should have nonzero, or significant, loadings for only some of
the variables. Each variable should have nonzero or significant loadings with only a few
factors, if possible with only one.
• The rotation is called orthogonal rotation if the axes are maintained at right angles.
Rotation of Factors
• Varimax procedure. Axes maintained at right angles
[Figure: factor matrix showing high loadings of variables 1 to 6 on factors 1 and 2, (a) before rotation and (b) after rotation. Before rotation, variables 2, 4, and 5 load highly on both factors; after rotation, each variable loads highly on only one factor.]
7. ANOVA – ONE WAY ANOVA
ANOVA can be applied to compare two, three, or more populations. For
example: to assess the possible variation in test performance
between the convent schools A, B, and C of a city.
Performance of Students
A    B    C
9    13   14
11   12   13
13   10   17
9    15   7
8    5    9
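The one-way ANOVA calculation for the school data can be sketched by hand: split the total variation into a between-groups and a within-groups sum of squares, divide each by its degrees of freedom, and take the ratio as the F statistic. The function name `one_way_anova` is an assumption for illustration; SPSS reports the same quantities in its ANOVA table.

```python
# Test performance of students in schools A, B, and C (from the table above).
groups = {
    "A": [9, 11, 13, 9, 8],
    "B": [13, 12, 10, 15, 5],
    "C": [14, 13, 17, 7, 9],
}


def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F) for the groups."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = sum(all_values) / len(all_values)
    means = {name: sum(values) / len(values) for name, values in groups.items()}

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(values) * (means[name] - grand_mean) ** 2
        for name, values in groups.items()
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - means[name]) ** 2 for name, values in groups.items() for x in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


ss_between, ss_within, df_between, df_within, f_stat = one_way_anova(groups)
print(ss_between, ss_within, df_between, df_within, round(f_stat, 4))
```

The F statistic is then compared with the critical F value for (df between, df within) degrees of freedom to decide whether the school means differ significantly.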
APPLICATION OF ONE WAY ANOVA IN SPSS
ANOVA – TWO WAY ANOVA
8. APPLICATION OF TWO WAY ANOVA IN SPSS