
MBA – BRM – Unit IV: Dr. N. Chitra Devi


Dr. N. Chitra Devi
MBA – BRM – UNIT IV
CONTENTS OF UNIT - IV
Data preparation – editing – coding
Multivariate statistical techniques
Factor analysis
Discriminant analysis
Cluster analysis
Chi-square test
Analysis of variance
Multiple regression and correlation
Application of the SPSS package
DATA EDITING AND CODING
DATA EDITING AND CODING IN SPSS
VARIABLES

Variables
A variable is any characteristic, number, or quantity that can be measured or counted.
Examples:
Age (21, 33, 25, 36, …)
Gender (Male, Female)
Income (11000, 15000, 25000)
Country of birth (China, India, Pakistan)
Types of Variables

NUMERIC VARIABLES
- Discrete
- Continuous

CATEGORICAL VARIABLES
- Nominal
- Ordinal
Types of Variables – 1. Discrete Variable
A variable whose values are whole numbers is called discrete.
For example:
Number of items bought by a customer in a supermarket (10, 50, …)
Number of active bank accounts of a borrower (1, 4, 7)
Number of children in a family (2, 3, 4)
Types of Variables – 2. Continuous Variable
A variable that may take any value within some range is called continuous.
For example:
Amount paid by a customer in a supermarket ($32.50)
House price (in principle it can take any value) (INR 3,50,000)
Time spent (3.4 hours)
Illustrations

Interest rate – continuous variable
Number of accounts opened in a bank – discrete variable
Loan defaulted – binary variable
Categorical Variables
- Nominal variables
- Ordinal variables
Categorical Nominal/Ordinal Variables
Categorical nominal variables cannot be ordered in a meaningful way.
For example:
Intended use of loan – nominal
Home ownership – nominal
Loan status – nominal

Categorical ordinal variables can be ordered in a meaningful way.
For example:
Liking of ice cream – Strongly Agree, Agree, …
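In SPSS, data coding assigns numeric codes to such categories before analysis. Below is a minimal pandas sketch of the same idea; the column names and response values are hypothetical:

# A minimal sketch of data coding: assigning numeric codes to
# categorical responses before analysis (column names are hypothetical).
import pandas as pd

responses = pd.DataFrame({
    "home_ownership": ["Own", "Rent", "Mortgage", "Rent"],                # nominal
    "icecream_liking": ["Agree", "Strongly Agree", "Agree", "Disagree"],  # ordinal
})

# Nominal: arbitrary codes, order carries no meaning.
responses["home_ownership_code"] = responses["home_ownership"].astype("category").cat.codes

# Ordinal: codes must respect the meaningful order of the categories.
liking_order = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]
responses["icecream_liking_code"] = pd.Categorical(
    responses["icecream_liking"], categories=liking_order, ordered=True
).codes

print(responses)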
Classification of Analysis Based on Variables

Types of Analysis
1. Univariate Analysis
2. Bivariate Analysis
3. Multivariate Analysis
4. Linear Regression Analysis
5. Chi-Square Test
6. Analysis of Variance

Techniques discussed in the following slides:
1. Linear Regression Analysis
2. Correlation Analysis
3. Factor Analysis
4. Analysis of Variance
5. Cluster Analysis
1. Linear Regression Analysis
Regression Analysis – Manual Calculation
Regression Analysis – Application in SPSS
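The slide's worked figures are graphical, so here is a minimal sketch of the manual least-squares calculation on hypothetical data (the x and y values below are assumptions, not the slide's own):

# Manual least-squares fit of y = a + b*x on hypothetical data.
x = [1, 2, 3, 4, 5]          # independent variable
y = [2, 4, 5, 4, 5]          # dependent variable
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar        # intercept a = y_bar - b * x_bar

print(f"y = {a:.2f} + {b:.2f}x")   # prints: y = 2.20 + 0.60x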
2. Correlation Analysis
CORRELATION ANALYSIS IN SPSS
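The correlation slides are graphical; as a minimal sketch of what SPSS's Analyze > Correlate > Bivariate produces, Pearson's r can be computed as follows (the advertising/sales figures are hypothetical):

# Pearson correlation on hypothetical data.
from scipy.stats import pearsonr

advertising = [10, 12, 15, 18, 20]
sales       = [110, 118, 131, 145, 150]

r, p_value = pearsonr(advertising, sales)
print(f"Pearson r = {r:.3f}, p-value = {p_value:.4f}")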
3. K-MEANS CLUSTERING
INTRODUCTION – What is clustering?

Clustering is the classification of objects into different groups – more precisely, the partitioning of a data set into subsets (clusters) so that the data in each subset (ideally) share some common trait, often according to some defined distance measure.
Common distance measures:

The distance measure determines how the similarity of two elements is calculated, and it influences the shape of the clusters. Common measures include:

1. The Euclidean distance (also called 2-norm distance):
   d(x, y) = SQRT( (x1 − y1)^2 + (x2 − y2)^2 + … + (xn − yn)^2 )

2. The Manhattan distance (also called taxicab norm or 1-norm):
   d(x, y) = |x1 − y1| + |x2 − y2| + … + |xn − yn|
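A small sketch of these two measures as plain functions:

# Euclidean (2-norm) and Manhattan (1-norm) distances between two points.
import math

def euclidean(p, q):
    # square root of the sum of squared coordinate differences
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def manhattan(p, q):
    # sum of absolute coordinate differences
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

print(euclidean((1, 1), (2, 3)))   # 2.236...
print(manhattan((1, 1), (2, 3)))   # 3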
A simple example showing the implementation of the k-means algorithm (using k = 2)

Step 1:
Initialization: we randomly choose the following two centroids (k = 2) for the two clusters. In this case the two centroids are m1 = (1.0, 1.0) and m2 = (5.0, 7.0).

Step 2:
Assigning each object to the nearer centroid, we obtain two clusters containing {1, 2, 3} and {4, 5, 6, 7}. Their new centroids are recomputed as the means of these clusters.
Step 3:
Now, using these centroids, we compute the Euclidean distance of each object, as shown in the table. The new clusters are therefore {1, 2} and {3, 4, 5, 6, 7}, and the next centroids are m1 = (1.25, 1.5) and m2 = (3.9, 5.1). The process repeats until the cluster memberships no longer change.
APPLICATION OF CLUSTERING IN SPSS
K-Means Clustering Analysis – ILLUSTRATION 2
STEP 1: CHOOSE TWO CLUSTER CENTRES
V1 = (2,1), V2 = (2,3)

Sample points (X = call usage, Y = data usage):

Point           A   B   C   D   E   F
X (call usage)  1   2   2   3   4   5
Y (data usage)  1   1   3   2   3   5

STEP 2: FIND THE DISTANCE BETWEEN EACH CLUSTER CENTRE AND THE SAMPLE POINTS
NAME  DATA POINT  DISTANCE FROM V1 = B (2,1)  DISTANCE FROM V2 = C (2,3)  ASSIGNED CENTRE
A     (1,1)       1.00                        2.24                        V1
B     (2,1)       0.00                        2.00                        V1
C     (2,3)       2.00                        0.00                        V2
D     (3,2)       1.41                        1.41                        V1
E     (4,3)       2.83                        2.00                        V2
F     (5,5)       5.00                        3.61                        V2
Clustering Analysis
• Euclidean distance: d = SQRT( (x1 − x2)^2 + (y1 − y2)^2 )
• With the initial centres V1 = B = (2,1) and V2 = C = (2,3), for example:
• d(A, V1) = SQRT( (1−2)^2 + (1−1)^2 ) = SQRT(1 + 0) = 1
• d(B, V1) = SQRT( (2−2)^2 + (1−1)^2 ) = 0
• d(C, V1) = SQRT( (2−2)^2 + (3−1)^2 ) = 2
• FORMING OF CLUSTERS: these distances give the assignments in the table above, cluster 1 = {A, B, D} and cluster 2 = {C, E, F}. The new centres are the means of each cluster: V1 = (2, 1.33) and V2 = (3.67, 3.67).
Clustering Analysis
STEP 3: REPEAT WITH THE NEW CLUSTER CENTRES

NAME  DATA POINT  DISTANCE FROM V1 (2, 1.33)  DISTANCE FROM V2 (3.67, 3.67)  ASSIGNED CENTRE
A     (1,1)       1.05                        3.78                           V1
B     (2,1)       0.33                        3.15                           V1
C     (2,3)       1.67                        1.80                           V1
D     (3,2)       1.20                        1.80                           V1
E     (4,3)       2.61                        0.75                           V2
F     (5,5)       4.74                        1.88                           V2


CLUSTER ANALYSIS
• Cluster 1 = {A, B, C, D}
• Cluster 2 = {E, F}
• New centre of cluster 1 = 1/4 [ (1,1) + (2,1) + (2,3) + (3,2) ] = (2, 1.75)
• New centre of cluster 2 = 1/2 [ (4,3) + (5,5) ] = 1/2 (9,8) = (4.5, 4)
• The cluster memberships are not the same as in the previous iteration, so another iteration is needed with the new cluster centres.
FINAL CLUSTERS
• Cluster 1 = {A, B, C, D}
• Cluster 2 = {E, F}

NAME  DATA POINT  DISTANCE FROM V1 (2, 1.75)  DISTANCE FROM V2 (4.5, 4)  ASSIGNED CENTRE
A     (1,1)       1.25                        4.61                       V1
B     (2,1)       0.75                        3.90                       V1
C     (2,3)       1.25                        2.69                       V1
D     (3,2)       1.03                        2.50                       V1
E     (4,3)       2.36                        1.12                       V2
F     (5,5)       4.42                        1.12                       V2

The memberships are unchanged, so the algorithm has converged.
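As a check on the hand iteration above, here is a minimal sketch of the full k-means loop on the same six points, starting from the same initial centres V1 = (2,1) and V2 = (2,3); it should converge to the clusters {A, B, C, D} and {E, F} found above.

# k-means on the illustration's six points with the slide's initial centres.
import math

points = {"A": (1, 1), "B": (2, 1), "C": (2, 3),
          "D": (3, 2), "E": (4, 3), "F": (5, 5)}
centres = [(2, 1), (2, 3)]

assignment = {}
while True:
    # Assignment step: each point goes to its nearest centre (Euclidean).
    new_assignment = {name: min(range(len(centres)),
                                key=lambda k: math.dist(p, centres[k]))
                      for name, p in points.items()}
    if new_assignment == assignment:   # no change -> converged
        break
    assignment = new_assignment
    # Update step: each centre moves to the mean of its assigned points.
    for k in range(len(centres)):
        members = [points[n] for n, c in assignment.items() if c == k]
        if members:
            centres[k] = tuple(sum(v) / len(members) for v in zip(*members))

print(assignment)   # expected: A, B, C, D -> 0 and E, F -> 1
print(centres)      # expected: (2, 1.75) and (4.5, 4.0)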


4. Chi-Square Test
Null Hypothesis : There is no relationship between Qualification and Marital Status of the Students.
Questionnaire:
1. Marital status of the respondents
A. Never Married
B. Married
C. Divorced
D. Widowed

2. Educational qualification of the respondents
A. Middle School
B. High School
C. Bachelor's
D. Master's
E. PhD
Chi-Square Test
Qualification / Marital Status  Middle School  High School  Bachelor's  Master's  PhD  Total
Never Married                   18             36           21          9         6    90
Married                         12             36           45          36        21   150
Divorced                        6              9            9           3         3    30
Widowed                         3              9            9           6         3    30
Total                           39             90           84          54        33   300
Expected Values
Qualification / Marital Status  Middle School       High School  Bachelor's  Master's  PhD
Never Married                   (90*39)/300 = 11.7  27           25.2        16.2      9.9
Married                         19.5                45           42          27        16.5
Divorced                        3.9                 9            8.4         5.4       3.3
Widowed                         3.9                 9            8.4         5.4       3.3
Chi-Square Test
• Expected value = (Row Total × Column Total) / Grand Total
• Step 3: Find the deviation of each observed value from its expected value using the formula:
• (Observed Value − Expected Value)^2 / Expected Value
Chi-Square Test
OBSERVED VALUE  EXPECTED VALUE  (O−E)  (O−E)^2  (O−E)^2/E
18              11.7            6.3    39.69    3.39
12              19.5            −7.5   56.25    2.88
6               3.9             2.1    4.41     1.13
3               3.9             −0.9   0.81     0.21
36              27              9      81       3.00
36              45              −9     81       1.80
9               9               0      0        0.00
9               9               0      0        0.00
21              25.2            −4.2   17.64    0.70
45              42              3      9        0.21
9               8.4             0.6    0.36     0.04
9               8.4             0.6    0.36     0.04
9               16.2            −7.2   51.84    3.20
36              27              9      81       3.00
3               5.4             −2.4   5.76     1.07
6               5.4             0.6    0.36     0.07
6               9.9             −3.9   15.21    1.54
21              16.5            4.5    20.25    1.23
3               3.3             −0.3   0.09     0.03
3               3.3             −0.3   0.09     0.03
Total                                           23.57
Chi-Square Analysis
• Degrees of freedom = (columns − 1) × (rows − 1) = (5 − 1) × (4 − 1) = 12
• The table value of chi-square at the 5% level for 12 degrees of freedom is 21.03. Since the calculated value (23.57) exceeds it, the null hypothesis is rejected: there is a relationship between qualification and marital status.
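The same test can be reproduced with scipy's chi2_contingency on the observed table; the statistic should match the hand calculation (about 23.57 with df = 12):

# Chi-square test of independence on the observed contingency table.
from scipy.stats import chi2_contingency

observed = [
    # Middle, High, Bachelor's, Master's, PhD
    [18, 36, 21,  9,  6],   # Never Married
    [12, 36, 45, 36, 21],   # Married
    [ 6,  9,  9,  3,  3],   # Divorced
    [ 3,  9,  9,  6,  3],   # Widowed
]

chi2, p, df, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p:.4f}")
# The p-value falls below 0.05, matching the table-value comparison above.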
APPLICATION OF CHI-SQUARE TEST IN SPSS
Factor Analysis



Factor Analysis Model
• The first set of weights (factor score coefficients) is chosen so that the first factor explains the largest portion of the total variance.

• Then a second set of weights can be selected, so that the second factor
explains most of the residual variance, subject to being uncorrelated
with the first factor.

• This same principle applies for selecting additional weights for the
additional factors.
Statistics Associated with Factor Analysis
• Communality. Amount of variance a variable shares with all the other
variables. This is the proportion of variance explained by the common
factors.
• Eigenvalue. Represents the total variance explained by each factor.
• Factor loadings. Correlations between the variables and the factors.
• Factor matrix. A factor matrix contains the factor loadings of all the variables on all the factors.
Determine the Method of Factor Analysis

• In Principal components analysis, the total variance in the data is considered.
  - Used to determine the minimum number of factors that will account for the maximum variance in the data.

• In Common factor analysis, the factors are estimated based only on the common
variance.
-Communalities are inserted in the diagonal of the correlation matrix.
-Used to identify the underlying dimensions and when the common variance is of
interest.
Determine the Number of Factors

• A Priori Determination. Use prior knowledge.


 
• Determination Based on Eigenvalues. Only factors with Eigenvalues greater than 1.0 are
retained.

• Determination Based on Scree Plot. A scree plot is a plot of the Eigenvalues against the
number of factors in order of extraction. The point at which the scree begins denotes the
true number of factors.

• Determination Based on Percentage of Variance. Factors are extracted until the cumulative percentage of variance they explain reaches a satisfactory level (often at least 60%).

Rotation of Factors

• Through rotation the factor matrix is transformed into a simpler one that is easier to
interpret.

• After rotation each factor should have nonzero, or significant, loadings for only some of
the variables. Each variable should have nonzero or significant loadings with only a few
factors, if possible with only one.

• The rotation is called orthogonal rotation if the axes are maintained at right angles.
Rotation of Factors
• Varimax procedure. Axes maintained at right angles

-Most common method for rotation.


-An orthogonal method of rotation that minimizes the number of variables with high
loadings on a factor.
-Orthogonal rotation results in uncorrelated factors.
• Oblique rotation. Axes not maintained at right angles

-Factors are correlated.


-Oblique rotation should be used when factors in the population are likely to be
strongly correlated.
Interpret Factors

• A factor can be interpreted in terms of the variables that load


high on it.
• Another useful aid in interpretation is to plot the variables, using
the factor loadings as coordinates. Variables at the end of an
axis are those that have high loadings on only that factor, and
hence describe the factor.
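As a small sketch of this loading plot, using the rotated loadings reported in Table 19.3 later in this section (matplotlib assumed available):

# Plot the variables using their rotated factor loadings as coordinates.
import matplotlib.pyplot as plt

loadings = {"V1": (0.962, -0.027), "V2": (-0.057, 0.848),
            "V3": (0.934, -0.146), "V4": (-0.098, 0.845),
            "V5": (-0.933, -0.084), "V6": (0.083, 0.885)}

for name, (f1, f2) in loadings.items():
    plt.scatter(f1, f2)
    plt.annotate(name, (f1, f2))
plt.axhline(0); plt.axvline(0)
plt.xlabel("Factor 1"); plt.ylabel("Factor 2")
plt.show()   # V1, V3, V5 line up with factor 1; V2, V4, V6 with factor 2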
Determine the Model Fit

• The correlations between the variables can be deduced from the


estimated correlations between the variables and the factors.

• The differences between the observed correlations (in the input


correlation matrix) and the reproduced correlations (estimated
from the factor matrix) can be examined to determine model fit.
These differences are called residuals.
Another Example of Factor Analysis
• Objective: to determine the benefits consumers seek from toothpaste
• Responses were obtained on 6 variables:
V1: It is important to buy toothpaste that prevents cavities
V2: I like a toothpaste that gives shiny teeth
V3: A toothpaste should strengthen your gums
V4: I prefer a toothpaste that freshens breath
V5: Prevention of tooth decay is not important
V6: The most important consideration is attractive teeth
• Responses on a 7-point scale (1 = strongly disagree; 7 = strongly agree)
Another Example of Factor Analysis
Table 19.1
RESPONDENT NUMBER  V1  V2  V3  V4  V5  V6
1    7  3  6  4  2  4
2    1  3  2  4  5  4
3    6  2  7  4  1  3
4    4  5  4  6  2  5
5    1  2  2  3  6  2
6    6  3  6  4  2  4
7    5  3  6  3  4  3
8    6  4  7  4  1  4
9    3  4  2  3  6  3
10   2  6  2  6  7  6
11   6  4  7  3  2  3
12   2  3  1  4  5  4
13   7  2  6  4  1  3
14   4  6  4  5  3  6
15   1  3  2  2  6  4
16   6  4  6  3  3  4
17   5  3  6  3  3  4
18   7  3  7  4  1  4
19   2  4  3  3  6  3
20   3  5  3  6  4  6
21   1  3  2  3  5  3
22   5  4  5  4  2  4
23   2  2  1  5  4  4
24   4  6  4  6  4  7
25   6  5  4  2  1  4
26   3  5  4  6  4  7
27   4  4  7  2  2  5
28   3  7  2  6  4  3
29   4  6  3  7  2  7
30   2  3  2  4  7  2
Correlation Matrix
Table 19.2
Variables V1 V2 V3 V4 V5 V6
V1 1.000
V2 -0.530 1.000
V3 0.873 -0.155 1.000
V4 -0.086 0.572 -0.248 1.000
V5 -0.858 0.020 -0.778 -0.007 1.000
V6 0.004 0.640 -0.018 0.640 -0.136 1.000
Results of Principal Components Analysis
Table 19.3
Bartlett's Test of Sphericity
Approx. chi-square = 111.3, df = 15, significance = 0.00
Kaiser-Meyer-Olkin measure of sampling adequacy (MSA) = 0.660
Communalities
Variables Initial Extraction
V1 1.000 0.926
V2 1.000 0.723
V3 1.000 0.894
V4 1.000 0.739
V5 1.000 0.878
V6 1.000 0.790

Initial Eigenvalues

Factor  Eigenvalue  % of variance  Cumulative %
1 2.731 45.520 45.520
2 2.218 36.969 82.488
3 0.442 7.360 89.848
4 0.341 5.688 95.536
5 0.183 3.044 98.580
6 0.085 1.420 100.000
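As a cross-check, here is a minimal numpy sketch that extracts these eigenvalues directly from the Table 19.2 correlation matrix; the values should agree with the table above up to rounding.

# Eigenvalues of the correlation matrix (each divided by 6 gives % of variance).
import numpy as np

R = np.array([
    [ 1.000, -0.530,  0.873, -0.086, -0.858,  0.004],
    [-0.530,  1.000, -0.155,  0.572,  0.020,  0.640],
    [ 0.873, -0.155,  1.000, -0.248, -0.778, -0.018],
    [-0.086,  0.572, -0.248,  1.000, -0.007,  0.640],
    [-0.858,  0.020, -0.778, -0.007,  1.000, -0.136],
    [ 0.004,  0.640, -0.018,  0.640, -0.136,  1.000],
])

eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]   # descending order
print(eigenvalues)                   # approx. 2.73, 2.22, 0.44, 0.34, 0.18, 0.09
print(eigenvalues / len(R) * 100)    # % of total variance explained by each factor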
Results of Principal Components Analysis

Table 19.3, cont.


Extraction Sums of Squared Loadings
Factor  Eigenvalue  % of variance  Cumulative %
1 2.731 45.520 45.520
2 2.218 36.969 82.488
Factor Matrix
Variables Factor 1 Factor 2
V1 0.928 0.253
V2 -0.301 0.795
V3 0.936 0.131
V4 -0.342 0.789
V5 -0.869 -0.351
V6 -0.177 0.871

Rotation Sums of Squared Loadings


Factor  Eigenvalue  % of variance  Cumulative %
1 2.688 44.802 44.802
2 2.261 37.687 82.488
Results of Principal Components Analysis

Table 19.3, cont.


Rotated Factor Matrix
Variables Factor 1 Factor 2
V1 0.962 -0.027
V2 -0.057 0.848
V3 0.934 -0.146
V4 -0.098 0.845
V5 -0.933 -0.084
V6 0.083 0.885

Factor Score Coefficient Matrix


Variables Factor 1 Factor 2
V1 0.358 0.011
V2 -0.001 0.375
V3 0.345 -0.043
V4 -0.017 0.377
V5 -0.350 -0.059
V6 0.052 0.395
Results of Principal Components Analysis

Table 19.3, cont.

The table below is the reproduced correlation matrix:
- The lower-left triangle is the observed correlation matrix;
- The diagonal has the communalities;
- The upper-right triangle has the residuals between the observed correlations and the reproduced correlations.

Reproduced Correlation Matrix

Variables  V1      V2      V3      V4      V5      V6
V1         0.926   0.024  -0.029   0.031   0.038  -0.053
V2        -0.078   0.723   0.022  -0.158   0.038  -0.105
V3         0.902  -0.177   0.894  -0.031   0.081   0.033
V4        -0.117   0.730  -0.217   0.739  -0.027  -0.107
V5        -0.895  -0.018  -0.859   0.020   0.878   0.016
V6         0.057   0.746  -0.051   0.748  -0.152   0.790
Factor Matrix Before and After Rotation
Fig. 19.4

(a) High loadings before rotation:
Variable  Factor 1  Factor 2
1         X
2         X         X
3         X
4         X         X
5         X         X
6                   X

(b) High loadings after rotation:
Variable  Factor 1  Factor 2
1         X
2                   X
3         X
4                   X
5         X
6                   X
7. ANOVA – ONE-WAY ANOVA
When we want to compare two or more different populations, we can apply ANOVA. For example: to assess the possible variation in test performance between the convent schools of a city, A, B and C.

Performance of students:

A   B   C
9   13  14
11  12  13
13  10  17
9   15  7
8   5   9
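Before turning to SPSS, here is a minimal sketch of the same one-way ANOVA with scipy on the scores above:

# One-way ANOVA comparing the three schools' test scores.
from scipy.stats import f_oneway

school_A = [9, 11, 13, 9, 8]
school_B = [13, 12, 10, 15, 5]
school_C = [14, 13, 17, 7, 9]

F, p = f_oneway(school_A, school_B, school_C)
print(f"F = {F:.2f}, p-value = {p:.3f}")
# F is about 0.43 here, so the differences between schools are not significant.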
APPLICATION OF ONE WAY ANOVA IN SPSS
ANOVA – TWO-WAY ANOVA
8. APPLICATION OF TWO-WAY ANOVA IN SPSS
