Multivariate Analysis: Are Some of The Variables Dependent On Others?
Are some of the variables dependent on others? If yes, use dependence methods; if no, use interdependence methods.
Analysis of Dependence
If a multivariate technique attempts to explain or predict one or more dependent variables on the basis of two or more independent variables, we are analysing dependence. Multiple regression analysis, multiple discriminant analysis, multivariate analysis of variance (MANOVA) and canonical correlation analysis are all dependence methods.
Analysis of Interdependence
The goal of interdependence methods is to give meaning to a set of variables or to group things together. No one variable or subset of variables is predicted from, or explained by, the others. The most common of these methods are factor analysis, cluster analysis and multidimensional scaling. A manager might use these techniques to identify profitable market segments or clusters, or to classify similar cities on the basis of population size, income distribution and so on.
As in other forms of data analysis, the nature of the measurement scales determines which multivariate technique is appropriate for the data. As the exhibits below show, selecting a multivariate technique requires considering the available methods for dependent and interdependent variables.
Non-metric: nominal and ordinal scales
Metric: interval and ratio scales
Exhibit 1 – Classification of dependence methods (independent variables metric)
How many variables are dependent?
One dependent variable: multiple regression, multiple discriminant analysis
Several dependent variables: MANOVA
Multiple dependent and independent variables: conjoint analysis
ANALYSIS OF DEPENDENCE
Multiple Regression analysis is an extension of bivariate regression analysis, which
allows for the simultaneous investigation of the effect of two or more independent
variables on a single interval-scaled dependent variable. In reality several factors are
likely to affect such a dependent variable.
An example of a multiple regression equation is
Y = a + B1X1 + B2X2 + B3X3 + … + BnXn + e
where
a = a constant, the value of Y when all X values = 0
Bi = slope of the regression surface; B represents the regression coefficient associated with each X
e = an error term, normally distributed about a mean of 0
Let us look at a forecasting example. Suppose a toy manufacturer wishes to forecast sales by sales territory. It is thought that competitors' sales, the presence or absence of a company salesperson in the territory (a binary variable) and grammar school enrollment are the independent variables that might explain the variation in the sales of the toy. The data are fitted, and the mathematical computations give the following results:
Y = 102.18 + 387X1 + 115.2X2 + 6.73X3
R2 = 0.845
F value 14.6
The regression equation indicates that sales are positively related to X1, X2 and X3.
The coefficients B show the effect on the dependent variable of a unit increase in each independent variable. The value B2 = 115.2 indicates that an increase of Rs 115,200 in toy sales is expected with an additional unit of X2; thus it appears that adding a company salesperson has a very positive effect on sales. Grammar school enrollments also help predict sales: an increase of one unit in enrollment (1,000 students) indicates a sales increase of Rs 6,730. A one-unit increase in competitor sales volume (X1) does not add much to the toy manufacturer's sales.
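The fitting step described above can be sketched with ordinary least squares. The original territory data are not reproduced in the text, so this sketch generates hypothetical data from the fitted equation and recovers the coefficients; the variable ranges, sample size and noise level are all assumptions.

```python
import numpy as np

# True coefficients taken from the fitted equation in the text:
# Y = 102.18 + 387*X1 + 115.2*X2 + 6.73*X3
rng = np.random.default_rng(0)
n = 200  # hypothetical number of territory observations

x1 = rng.uniform(0, 5, n)       # competitor sales (hypothetical units)
x2 = rng.integers(0, 2, n)      # salesperson present? (binary dummy, 0/1)
x3 = rng.uniform(0, 10, n)      # grammar school enrollment (thousands)
noise = rng.normal(0, 5, n)     # error term e, mean 0

y = 102.18 + 387 * x1 + 115.2 * x2 + 6.73 * x3 + noise

# Design matrix with a leading column of ones for the constant a
X = np.column_stack([np.ones(n), x1, x2, x3])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # roughly [102.18, 387, 115.2, 6.73]
```

With enough observations and modest noise, least squares recovers the coefficients of the generating equation closely.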
The regression coefficients can be stated either in raw score units (actual X values) or as standardized coefficients (X values in terms of their standard deviations). When regression coefficients are standardized they are called beta weights (B), and their values indicate the relative importance of the associated X values, especially when the predictors are unrelated. If B1 = .60 and B2 = .20, then X1 has three times the influence on Y as X2.
In multiple regression the coefficients B1, B2 and so on are called coefficients of partial regression because the independent variables are correlated with the other independent variables. The correlation between Y and X1, with the correlation that X1 and X2 have in common with Y held constant, is the partial correlation. Because the partial correlation between sales and X1 has been adjusted for the variation produced by X2 (and the other independent variables), the correlation coefficient obtained from the bivariate regression will not be the same as the partial coefficient in the multiple regression. In multiple regression the coefficient B1 is defined as a partial regression coefficient, for which the other independent variables are held constant.
The coefficient of multiple determination indicates the percentage of variation in Y explained by the variation in the independent variables. R2 = .845 tells us that the variation in the independent variables accounted for 84.5% of the variance in the dependent variable. Adding more independent variables to the equation explains more of the variation in Y.
To test for statistical significance, an F-test comparing the different sources of variation is necessary. The F-test compares the relative magnitudes of the sum of squares due to the regression (SSr) and the error sum of squares (SSe), each with its appropriate degrees of freedom:
F = (SSr / k) / (SSe / (n - k - 1))
where
k = number of independent variables
n = number of respondents or observations
Refer to the F tables and test the hypothesis at the .05 level of significance.
In the example, the F ratio = 14.6, with df for the numerator = k = 3 and df for the denominator = n - k - 1 = 8. Accept or reject H0 on the basis of a comparison between the calculated and table values.
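Since SSr/(SSr + SSe) = R2, the F ratio can equivalently be computed from R2 alone. A minimal check, assuming n = 12 observations (inferred from the denominator df of 8):

```python
# Equivalent form of the F test using R-squared directly:
# F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
r_squared = 0.845
k = 3   # number of independent variables
n = 12  # observations, so df for the denominator = n - k - 1 = 8

f_stat = (r_squared / k) / ((1 - r_squared) / (n - k - 1))
print(round(f_stat, 1))  # ~14.5, close to the reported F of 14.6
```

The small gap from the reported 14.6 is consistent with R2 having been rounded to three decimal places.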
As in bivariate regression, a continuous, interval-scaled dependent variable is required in multiple regression; interval scaling is also required for the independent variables. However, dummy variables, such as the binary variable in our example, may be used. A dummy variable is one that takes two (or more) distinct levels, coded 0 and 1.
Multiple regression is used as a descriptive tool in three types of situations:
1. It is often used to develop a self-weighting estimating equation by which to predict values for a criterion variable (DV) from the values of several predictor variables (IVs).
2. A descriptive application of multiple regression calls for controlling for confounding variables to better evaluate the contribution of other variables, for example controlling for brand to study the effect of price alone.
3. It is used to test and explain causal theories. In this approach, referred to as path analysis, regression is used to describe an entire structure of linkages that has been advanced from a causal theory.
In addition, multiple regression is used as an inference tool to test hypotheses and estimate population values.
Let us look at the following example using SPSS. Assume we use multiple regression to arrive at the key drivers of customer usage for hybrid mail. Among the explanatory variables are customer perceptions of (1) cost/speed valuation, (2) security, (3) reliability, (4) receiver technology and (5) impact/emotional value. Let us choose the first three variables, all measured on a 5-point scale:
Y = customer usage
X1 = cost/speed evaluation
X2 = security
X3 = reliability
SPSS computes the model and the regression coefficients. The equation can be built with:
1. specific variables
2. all variables
3. a method that sequentially adds or removes variables. Forward selection starts with the constant and adds the variables that result in the largest R2 increases. Backward elimination begins with a model containing all the independent variables and removes the variable that changes R2 the least. Stepwise selection, the most popular method, combines the two: the independent variable that contributes the most to explaining the dependent variable is added first, and subsequent variables are added based on their incremental contribution over the first variable, whenever they meet the criterion for entering the equation (e.g. a significance level of .01). Variables may be removed at each step if they meet the removal criterion, which is a larger significance level than that for entry.
The standard elements of a stepwise output, an important indicator of the relative importance of the predictor variables, are shown in the exhibit.
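The forward-selection logic above can be sketched as follows. This is a simplified illustration, not SPSS's implementation: it uses a fixed R2 gain threshold in place of an entry significance test, and the data are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (X includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def forward_select(predictors, y, min_gain=0.01):
    """Greedy forward selection: start with the constant, then repeatedly
    add the predictor giving the largest R^2 increase above min_gain."""
    n = len(y)
    chosen, current, best_r2 = [], np.ones((n, 1)), 0.0
    while True:
        gains = {}
        for name, col in predictors.items():
            if name not in chosen:
                gains[name] = r_squared(np.column_stack([current, col]), y) - best_r2
        if not gains:
            break
        name = max(gains, key=gains.get)
        if gains[name] < min_gain:
            break  # no remaining variable meets the entry criterion
        chosen.append(name)
        current = np.column_stack([current, predictors[name]])
        best_r2 += gains[name]
    return chosen

# Hypothetical data: usage driven mainly by cost/speed, then security.
rng = np.random.default_rng(1)
x1, x2, x3 = rng.normal(size=(3, 300))
y = 2.0 * x1 + 0.8 * x2 + rng.normal(scale=0.5, size=300)  # x3 irrelevant

order = forward_select({"cost_speed": x1, "security": x2, "reliability": x3}, y)
print(order)  # cost_speed enters first, then security; reliability is excluded
```

Backward elimination and stepwise selection follow the same pattern, with a removal criterion looser than the entry criterion.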
DISCRIMINANT ANALYSIS
In a myriad of situations the researcher's purpose is to classify objects into two or more mutually exclusive categories using a set of independent variables. A manager might want to classify applicants into those to hire and those not to hire. The challenge is to find the discriminating variables for a predictive equation that will produce a better-than-chance assignment of individuals to groups.
The prediction of a categorical variable (rather than a continuous, interval-scaled variable, as in multiple regression) is the purpose of multiple discriminant analysis. In each of the above problems the researcher must determine which variables are associated with the probability of an object falling into a particular group. In a statistical sense, the problem of studying the direction of group differences is that of finding a linear combination of the independent variables, the discriminant function, that shows large differences in group means. Discriminant analysis is a statistical tool for determining such linear combinations; deriving the coefficients of the linear function is the task of the researcher.
We will consider a two-group discriminant analysis problem in which the dependent variable Y is measured on a nominal scale (n-way discriminant analysis is also possible). Suppose a personnel manager believes it is possible to predict whether an applicant will be successful on the basis of age, sales aptitude test scores and mechanical ability test scores. As stated at the outset, the problem is to find a linear combination of the independent variables that shows large differences in group means. The first task is to estimate the coefficients used to compute each individual's discriminant score. The following linear function is used:
Zi = b1X1i + b2X2i + … + bnXni
where
Xni = the ith applicant's value on the nth independent variable
bn = the discriminant coefficient (weight) for the nth variable
Zi = the ith applicant's discriminant score
Using the scores for all individuals in the sample, a discriminant function is determined based on the criterion that the groups be maximally discriminated on the set of independent variables. Returning to the example with three independent variables, suppose the personnel manager calculates the standardized weights in the equation to be
Z = b1X1 + b2X2 + b3X3
= .069X1 + .013X2 + .0007X3
This means that age (X1) is much more important than the sales aptitude test score (X2), and mechanical ability (X3) has relatively little discriminating power.
In computing the linear discriminant function, weights are assigned to the variables so that the ratio of the difference between the means of the two groups to the standard deviation within groups is maximized. The standardized discriminant coefficients, or weights, provide information about the relative importance of each of these variables in discriminating between the two groups.
An important goal of discriminant analysis is to perform a classification function. The object of classification in our example is to predict which applicants will be successful and which will be unsuccessful, and to group them accordingly. To determine whether the discriminant analysis can be used as a good predictor, the information provided in the 'confusion matrix' is used. Suppose the personnel manager has 40 successful and 45 unsuccessful employees in the sample.
Confusion Matrix
                         Predicted Group
Actual Group       Successful   Unsuccessful   Total
Successful             34             6          40
Unsuccessful            7            38          45
The confusion matrix shows that the proportion of correctly classified employees (72 of 85, about 85%) is much higher than would be expected by chance. Tests can be performed to determine whether the rate of correct classification is statistically significant.
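The hit rate implied by the confusion matrix can be checked directly:

```python
# Cells of the confusion matrix from the text: (actual, predicted) -> count
matrix = {
    ("successful", "successful"): 34,
    ("successful", "unsuccessful"): 6,
    ("unsuccessful", "successful"): 7,
    ("unsuccessful", "unsuccessful"): 38,
}

# Correct classifications lie on the diagonal (actual == predicted).
correct = sum(v for (actual, predicted), v in matrix.items() if actual == predicted)
total = sum(matrix.values())
hit_rate = correct / total
print(correct, total, round(hit_rate, 3))  # 72 85 0.847
```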
A second example allows us to portray discriminant analysis from a graphic perspective. Suppose a bank loan officer wants to segregate corporate loan applicants into those likely to default and those not likely to default. Assume that data are available on a group of firms that went bankrupt and another group that did not. For simplicity, we assume that only the current ratio and the debt/asset ratio are analysed; the ratios for the sample firms are given in the table.
The data in the table have been plotted in the graph, with dots for firms that remained solvent and Xs for firms that went bankrupt. For example, point A in the upper left segment is the point for firm 2, which had a current ratio of 3.0 and a debt/asset ratio of 20%; the dot at point A indicates that the firm did not go bankrupt. From a graphic perspective we construct a boundary line (the discriminant function) through the graph such that if a firm falls to the left of the line it is not likely to become insolvent. In our example the line takes this form:
Z = a + b1(current ratio) + b2(debt/asset ratio)
Here a is a constant term, and b1 and b2 indicate the effect that the current ratio and the debt/asset ratio have on the probability of a firm going bankrupt.
The following discriminant function is obtained
INTERDEPENDENCE METHODS
Are the inputs metric? Metric scales are ratio or interval; non-metric scales are nominal or ordinal. The answer determines which interdependence method is appropriate: cluster analysis, multidimensional scaling or factor analysis.
Factor Analysis
FA is a general term for several specific computational techniques. Its objective is to reduce to a manageable number the many variables that belong together and have many overlapping measurement characteristics. The predictor-criterion relationship found in the dependence situation is replaced by a matrix of intercorrelations among several variables, none of which is viewed as dependent on another. For example, one may have data on 100 employees with scores on six attitude-scale items.
Method
FA begins with the construction of a new set of variables based on the relationships in the correlation matrix. While this can be done in a number of ways, the most frequently used approach is principal component analysis. This method transforms the set of variables into a new set of composite variables, or principal components, that are not correlated with one another. These linear combinations of variables, called factors, account for the variance in the data as a whole. The best combination makes up the first principal component. The second principal component is defined as the best linear combination of variables for the variance not explained by the first factor. In turn there may be a third, fourth and kth component, each being the best linear combination of variables not accounted for by the previous factors. The process continues until all the variance has been accounted for, but it is usually stopped after a few factors have been extracted.
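The extraction procedure above can be sketched as an eigendecomposition of the correlation matrix. The 100-respondent, six-item data here are simulated, since the original scores are not given:

```python
import numpy as np

# Hypothetical data: 100 respondents, 6 attitude items driven by 2 latent traits.
rng = np.random.default_rng(2)
latent = rng.normal(size=(100, 2))
weights = rng.uniform(0.5, 1.0, size=(2, 6))
data = latent @ weights + rng.normal(scale=0.3, size=(100, 6))

# Standardize the items, then eigendecompose their correlation matrix.
z = (data - data.mean(axis=0)) / data.std(axis=0)
corr = np.corrcoef(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
order = np.argsort(eigvals)[::-1]            # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = z @ eigvecs                         # component scores, uncorrelated
# Each eigenvalue divided by the number of items is that component's
# share of the total variance; the shares sum to 1.
print(eigvals / 6)
```

The first component captures the largest share of variance, the second the largest share of what remains, and so on, exactly as described above.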
The values in the table are correlation coefficients between the factors and the variables (e.g. .70 is the correlation between factor 1 and variable A). These coefficients are called loadings. Eigenvalues are the sums of the squared loadings on a factor (.70² + .60² + … + .60²). When divided by the number of variables, an eigenvalue yields an estimate of the amount of total variance explained by that factor; for example, factor 1 accounts for 36% of the total variance. The column h² gives the communalities, estimates of the variance in each variable explained by the two factors. For variable A the communality is .70² + (-.40)² = .65, indicating that 65% of the variance in A is statistically explained in terms of factors 1 and 2.
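The communality arithmetic for variable A can be verified directly:

```python
# Loadings of variable A on the two factors, as given in the text.
loading_f1, loading_f2 = 0.70, -0.40

# Communality: sum of squared loadings across the retained factors.
communality_a = loading_f1**2 + loading_f2**2
print(round(communality_a, 2))  # 0.65 -> 65% of A's variance explained
```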
In this case the unrotated factor loadings are not enlightening. We need to find a pattern in which factor I has high correlations with some variables and factor II with others. We can attempt to secure this less ambiguous relationship between factors and variables by rotation, which can be carried out by either orthogonal or oblique methods.
The interpretation of factor loadings is largely subjective. There is no way to calculate the meaning of the factors; they are what one sees in them. For this reason factor analysis is largely used for exploration: one can detect patterns in latent variables, discover new concepts and reduce data.
To further clarify the factors, a varimax rotation is applied to the matrix. Varimax can clarify relationships, but interpretation remains largely subjective.
CLUSTER ANALYSIS
CA is a technique for grouping similar objects or people. CA shares some similarities with FA, especially when FA is applied to people instead of variables. It differs from discriminant analysis in that DA begins with well-defined groups composed of two or more distinct sets of characteristics and searches for a set of variables to separate them, whereas CA starts with an undifferentiated group of people, events or objects and attempts to reorganize them into homogeneous subgroups.
Method
Five steps are basic to the application of cluster studies:
1. Selection of the sample to be clustered (e.g. buyers, employees)
2. Definition of the variables on which to measure the objects, events or people (financial status, political affiliation, etc.)
3. Computation of similarities among the entities through correlation and other techniques
4. Selection of mutually exclusive clusters (maximization of within-cluster similarity and between-cluster differences) or hierarchically arranged clusters
5. Cluster comparison and validation
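Steps 3 and 4 can be sketched with a minimal k-means procedure, one common clustering method. Measuring similarity by Euclidean distance is an assumption here, and the two groups of "buyers" are simulated:

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Minimal k-means: assign each entity to its nearest center (step 3),
    then re-form mutually exclusive clusters around the means (step 4)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        centers = np.array([points[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers

# Hypothetical data: two well-separated groups of buyers on two variables.
rng = np.random.default_rng(3)
group_a = rng.normal(loc=[0, 0], scale=0.5, size=(20, 2))
group_b = rng.normal(loc=[5, 5], scale=0.5, size=(20, 2))
points = np.vstack([group_a, group_b])

labels, centers = kmeans(points, k=2)
print(labels)  # the first 20 points share one label, the last 20 the other
```

Different methods (hierarchical, correlation-based) can, as the text notes, produce different solutions on the same data.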
Different clustering methods can and do produce different solutions. It is important to
have enough information about the data to know when the derived groups are real and not
merely imposed on the data by the method
CA can be used to plan marketing campaigns and develop strategies.
MULTIDIMENSIONAL SCALING
MDS creates a spatial description of a respondent's perceptions about a product, service or other object of interest. This helps the business researcher understand difficult-to-measure constructs such as product quality or desirability. In contrast to variables that can be measured directly, many constructs are perceived and cognitively mapped in different ways by different individuals. With MDS, items that are perceived to be similar fall close together in multidimensional space, and items that are dissimilar fall farther apart.
METHOD
We may think of three types of attribute space, each representing a multidimensional map:
1. Objective space, in which an object can be positioned in terms of its measurable attributes: flavour, weight, nutritional value.
2. Subjective space, in which perceptions of the object's flavour, weight and nutritional value can be positioned. Objective and subjective attribute assessments may coincide, but often they do not; a comparison of the two allows us to judge how accurately an object is being perceived. Individuals may hold different perceptions of an object simultaneously, and these may be averaged to present a summary measure of perception. A person's perception may also vary over time and in different circumstances. Such measurements are valuable for gauging the impact of various perception-affecting actions, such as advertising programmes.
3. Preference space, which describes respondents' preferences using the object's attributes. This represents their ideal: all objects close to this ideal point are interpreted as preferred by respondents to those that are more distant. Ideal points from many people can be positioned in this preference space to reveal the pattern and size of preference clusters. These can be compared to the subjective space to see how well the preferences correspond to perception clusters. In this way CA and MDS can be combined to map market segments and then design products for those segments.
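The core idea, that a spatial map can be recovered from pairwise distances alone, can be sketched with classical metric MDS. The four objects and their coordinates are hypothetical:

```python
import numpy as np

# Hypothetical "true" positions of four objects in 2-D attribute space.
coords = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0], [4.0, 3.0]])
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)

# Classical MDS: double-center the squared distances, then eigendecompose.
n = len(d)
j = np.eye(n) - np.ones((n, n)) / n      # centering matrix
b = -0.5 * j @ (d ** 2) @ j
eigvals, eigvecs = np.linalg.eigh(b)
order = np.argsort(eigvals)[::-1]
# Keep the two largest components: a 2-D perceptual map.
map2d = eigvecs[:, order[:2]] * np.sqrt(eigvals[order[:2]])

# Distances in the recovered map match the input distances
# (the map itself is unique only up to rotation and reflection).
d2 = np.linalg.norm(map2d[:, None, :] - map2d[None, :, :], axis=2)
print(np.allclose(d, d2))  # True
```

With real similarity judgments the fit is only approximate, and non-metric MDS variants are used when the inputs are ordinal.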