
Multivariate Analysis Techniques
Introduction

All statistical techniques that simultaneously analyse more than two variables on a
sample of observations can be categorized as multivariate techniques.

Multivariate analysis is a collection of methods for analysing data in which a number of
observations are available for each object.

These techniques are being applied in many fields such as economics, sociology,
psychology, agriculture, anthropology, biology and medicine.

These techniques are used when the variables in such research studies are assumed to be
correlated with each other and rigorous probabilistic models cannot be appropriately
applied.

The application of multivariate techniques in practice has accelerated in modern times
because of the advent of high-speed electronic computers.
Characteristics and Applications

• Multivariate techniques are largely empirical and deal with reality.
• They possess the ability to analyse complex data.
• Multivariate techniques represent massive collections of data in a simplified way.
• They also help in various types of decision making.


Types of Multivariate Techniques
• Dependent Methods: They are used when the data contain both
dependent and independent variables. This includes techniques like
multiple regression analysis, multiple discriminant analysis,
multivariate analysis of variance and canonical analysis.
• Interdependent Methods: They are used when the data contain several
variables without a dependency relationship. This includes techniques
like factor analysis, cluster analysis, multidimensional scaling (MDS)
and latent structure analysis.
Classification of Multivariate Techniques
The classification chart can be laid out as follows.

All multivariate methods: are some variables dependent?
• Yes: dependent methods
• No: interdependent methods

Dependent methods: how many dependent variables are there, and are they metric?
• One dependent variable, metric: multiple regression
• One dependent variable, non-metric: multiple discriminant analysis
• Several dependent variables, metric: multivariate analysis of variance
• Several dependent variables, non-metric: canonical analysis

Interdependent methods: are the input variables metric?
• Metric inputs: factor analysis, cluster analysis, metric MDS
• Non-metric inputs: non-metric MDS, latent structure analysis
Multiple Regression
• Multiple regression is an extension of simple linear regression.
• It is used when we want to predict the value of a variable based on
the value of two or more other variables.
• The variable we want to predict is called the dependent variable (or
sometimes, the outcome, target or criterion variable).
• The variables we are using to predict the value of the dependent
variable are called the independent variables (or sometimes, the
predictor, explanatory or regressor variables).
Example
• You could use multiple regression to understand whether exam
performance can be predicted based on revision time, test anxiety,
lecture attendance and gender.
• Alternately, you could use multiple regression to understand whether
daily cigarette consumption can be predicted based on smoking
duration, age when started smoking, smoker type, income and
gender.
Characteristics of Multiple Regression
• Multiple regression also allows you to determine the overall fit
(variance explained) of the model and the relative contribution of
each of the predictors to the total variance explained.
• For example, you might want to know how much of the variation in
exam performance can be explained by revision time, test anxiety,
lecture attendance and gender "as a whole", but also the "relative
contribution" of each independent variable in explaining the variance.
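As an illustration of overall fit and relative contribution, here is a minimal sketch in Python, assuming scikit-learn and entirely synthetic data (the variable names echo the exam example but the numbers are invented):

```python
# Multiple regression on synthetic exam-performance data (illustrative only).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
revision_time = rng.uniform(0, 40, n)       # hours of revision
test_anxiety = rng.uniform(0, 10, n)        # anxiety score
attendance = rng.uniform(0, 1, n)           # proportion of lectures attended

# Simulated outcome: performance rises with revision and attendance,
# falls with anxiety, plus random noise.
exam_score = (30 + 1.2 * revision_time - 2.0 * test_anxiety
              + 15 * attendance + rng.normal(0, 5, n))

X = np.column_stack([revision_time, test_anxiety, attendance])
model = LinearRegression().fit(X, exam_score)

r_squared = model.score(X, exam_score)      # overall fit: variance explained
coefficients = model.coef_                  # contribution of each predictor
```

Here `r_squared` answers the "as a whole" question and the fitted coefficients speak to the relative contribution of each predictor.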
Multiple Discriminant Analysis (MDA)
• Multiple Discriminant Analysis (MDA) is a multivariate dimensionality reduction
technique.
• It has been used to predict signals as diverse as neural memory traces and corporate failure.
• MDA is not directly used to perform classification.
• It merely supports classification by yielding a compressed signal amenable to classification.
• This technique reduces the differences between some variables so that they can be classified
in a set number of broad groups, which can then be compared to another variable.
• Multiple discriminant analysis is related to discriminant analysis, which helps classify a data
set by setting a rule or selecting a value that will provide the most meaningful separation.
• MDA has been used to reveal neural codes.
Example
• An analyst who wants to select securities based on values that
measure volatility and historical consistency might use multiple
discriminant analysis to factor out other variables such as price.
Characteristics Of MDA
• MDA is sensitive to outliers. These should be identified and
treated accordingly.
• MDA is only suitable when evaluating the variables' ability
to linearly discriminate between any grouping.
• Highly correlated variables will contribute very similarly to an MDA solution
and may be redundant. Thus, variables that are uncorrelated are preferable.
• While unequal group sizes can be tolerated, very large differences in group
sizes can distort results, particularly if there are very few (< 20) objects per
group.
• If MANOVA tests on a given set of explanatory variables are insignificant,
MDA is unlikely to be useful.
• When interpreting the coefficients of a discriminant function, carefully
distinguish between standardised and unstandardised coefficients.
• Heteroscedasticity (differing dispersion) is likely to lead to invalid
significance tests.
• Across implementations, the absolute values of discriminant weights may
vary due to different scaling and standardisation approaches, but their
relative proportions should be the same.
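A minimal sketch of the compression step, assuming scikit-learn's `LinearDiscriminantAnalysis` as a stand-in for MDA (it handles the multi-group case) and the bundled iris data purely for illustration:

```python
# Discriminant analysis used as dimensionality reduction, not classification.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)           # 4 variables, 3 groups

# At most (groups - 1) discriminant functions can be extracted; they
# compress the variables into a signal that best separates the groups.
lda = LinearDiscriminantAnalysis(n_components=2)
X_compressed = lda.fit_transform(X, y)      # shape (150, 2)

# Relative importance of each discriminant function:
ratios = lda.explained_variance_ratio_
```

Any classifier could then be applied to `X_compressed`, consistent with MDA merely supporting classification rather than performing it.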
Multivariate analysis of variance
(MANOVA)
• Multivariate analysis of variance (MANOVA) is an extension of the univariate
analysis of variance (ANOVA).
• In an ANOVA, we test for statistical differences on one continuous dependent variable
across the levels of an independent grouping variable.
• The MANOVA extends this analysis by taking into account multiple continuous dependent
variables, and bundles them together into a weighted linear combination or composite variable.
• The MANOVA will compare whether or not the newly created combination differs by the
different groups, or levels, of the independent variable.
• In this way, the MANOVA essentially tests whether or not the independent grouping variable
simultaneously explains a statistically significant amount of variance in the dependent variable.
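A hedged sketch of such a test, assuming the statsmodels library and synthetic data with two continuous dependent variables and a three-level grouping variable (all names and numbers are illustrative):

```python
# MANOVA on synthetic data: do dv1 and dv2 jointly differ by group?
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(1)
n = 60
group = np.repeat(['A', 'B', 'C'], n // 3)
shift = np.array([{'A': 0.0, 'B': 1.0, 'C': 2.0}[g] for g in group])

# Two continuous dependent variables whose means differ by group.
dv1 = shift + rng.normal(0, 1, n)
dv2 = 0.5 * shift + rng.normal(0, 1, n)

df = pd.DataFrame({'group': group, 'dv1': dv1, 'dv2': dv2})

# The formula bundles both dependent variables on the left-hand side.
result = MANOVA.from_formula('dv1 + dv2 ~ group', data=df).mv_test()
print(result)   # reports Wilks' lambda, Pillai's trace, etc. for 'group'
```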
Examples
• Do the various school assessments vary by grade level?
• Do the rates of graduation among certain state universities differ by
degree type?
• Which diseases are better treated, if at all, by either X drug or Y
drug?
Canonical Correlation Analysis
• Canonical correlation analysis is used to identify and measure the
associations among two sets of variables.
• Canonical correlation is appropriate in the same situations where
multiple regression would be, but where there are multiple
intercorrelated outcome variables.
• Canonical correlation analysis determines a set of canonical variates,
orthogonal linear combinations of the variables within each set that
best explain the variability both within and between sets.
Examples
• Example 1. A researcher has collected data on three psychological variables, four
academic variables (standardized test scores) and gender for 600 college freshman. She
is interested in how the set of psychological variables relates to the academic variables
and gender. In particular, the researcher is interested in how many dimensions
(canonical variables) are necessary to understand the association between the two sets
of variables.
• Example 2. A researcher is interested in exploring associations among factors from two
multidimensional personality tests, the MMPI and the NEO. She is interested in what
dimensions are common between the tests and how much variance is shared between
them. She is specifically interested in finding whether the neuroticism dimension from
the NEO can account for a substantial amount of shared variance between the two tests.
Factor Analysis
• Factor analysis is a technique that is used to reduce a large number of variables into
fewer numbers of factors.
• This technique extracts maximum common variance from all variables and puts them into
a common score.
• As an index of all variables, we can use this score for further analysis.
• Factor analysis is part of the general linear model (GLM) family.
• This method also makes several assumptions: there is a linear relationship, there is no
multicollinearity, the relevant variables are included in the analysis, and there is true
correlation between variables and factors.
• Several extraction methods are available, but principal component analysis is used most
commonly.
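A hedged sketch, assuming scikit-learn's `FactorAnalysis` and synthetic scores generated from two known latent factors (the loadings and noise level are invented for illustration):

```python
# Factor analysis recovering two common factors from six observed variables.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
n = 1000
factors = rng.normal(size=(n, 2))           # two unobserved common factors

# Six observed variables = loadings times factors, plus unique noise.
loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2],
                     [0.1, 0.8], [0.0, 0.9], [0.2, 0.7]])
observed = factors @ loadings.T + 0.3 * rng.normal(size=(n, 6))

fa = FactorAnalysis(n_components=2).fit(observed)
factor_scores = fa.transform(observed)      # one score per factor, usable as
                                            # an index in further analysis
```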
Example
• Suppose a psychologist has the hypothesis that there are two kinds of intelligence, "verbal
intelligence" and "mathematical intelligence", neither of which is directly observed.
Evidence for the hypothesis is sought in the examination scores from each of 10 different
academic fields of 1000 students. If each student is chosen randomly from a large
population, then each student's 10 scores are random variables. The psychologist's
hypothesis may say that for each of the 10 academic fields, the score averaged over the
group of all students who share some common pair of values for verbal and mathematical
"intelligences" is some constant times their level of verbal intelligence plus another
constant times their level of mathematical intelligence, i.e., it is a combination of those two
"factors". The numbers for a particular subject, by which the two kinds of intelligence are
multiplied to obtain the expected score, are posited by the hypothesis to be the same for all
intelligence level pairs, and are called "factor loading" for this subject.
Cluster Analysis
• It is a class of techniques that are used to classify objects or cases into
relative groups called clusters.
• Cluster analysis is also called classification analysis or numerical taxonomy.
• In cluster analysis, there is no prior information about the group or cluster
membership for any of the objects.
• Cluster analysis involves formulating the problem, selecting a distance
measure, selecting a clustering procedure, deciding the number of clusters,
interpreting and profiling the clusters and, finally, assessing the validity of
the clustering.
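As one concrete instance of those steps, a minimal k-means sketch on synthetic data (Euclidean distance as the distance measure, three clusters chosen in advance):

```python
# Cluster analysis with k-means: no prior group labels are used.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)

# Three well-separated synthetic groups of objects, 50 each.
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(0, 0.5, size=(50, 2)) for c in centres])

# Clustering procedure: k-means with k = 3.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = km.labels_                         # cluster membership per object
```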
Example
• Cluster analysis has been used in marketing for various purposes.
Consumers can be segmented on the basis of the benefits sought from
the purchase of a product, and the technique can be used to identify
homogeneous groups of buyers.
Multidimensional Scaling (MDS)
• Multidimensional scaling (MDS) can be considered to be an
alternative to factor analysis.
• In general, the goal of the analysis is to detect meaningful underlying
dimensions that allow the researcher to explain observed similarities
or dissimilarities (distances) between the investigated objects.
• In factor analysis, the similarities between objects (e.g., variables)
are expressed in the correlation matrix. With MDS, you can analyze
any kind of similarity or dissimilarity matrix, in addition to correlation
matrices.
Example
• Suppose we take a matrix of distances between major US cities from a map. We
then analyze this matrix, specifying that we want to reproduce the distances based
on two dimensions. As a result of the MDS analysis, we would most likely obtain a
two-dimensional representation of the locations of the cities, that is, we would
basically obtain a two-dimensional map.
• In general then, MDS attempts to arrange "objects" (major cities in this example) in
a space with a particular number of dimensions (two-dimensional in this example)
so as to reproduce the observed distances. As a result, we can "explain" the
distances in terms of underlying dimensions; in our example, we could explain the
distances in terms of the two geographical dimensions: north/south and east/west.
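A hedged sketch of this cities idea with scikit-learn's `MDS`: illustrative planar coordinates stand in for the map (they are not real cities), pairwise distances are computed from them, and metric MDS is asked to recover a two-dimensional configuration from the distance matrix alone:

```python
# Metric MDS: recover a 2-D configuration from a distance matrix.
import numpy as np
from sklearn.manifold import MDS

# Illustrative 2-D "map" positions (arbitrary units).
true_coords = np.array([[0.0, 0.0], [-11.0, 1.0],
                        [2.0, -17.0], [-37.0, -4.0]])
D = np.linalg.norm(true_coords[:, None] - true_coords[None, :], axis=-1)

# Metric MDS on the precomputed distance matrix, two dimensions.
mds = MDS(n_components=2, dissimilarity='precomputed', random_state=0)
coords = mds.fit_transform(D)

# Distances among the recovered points should reproduce the input matrix
# (up to rotation and reflection of the configuration).
D_hat = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
```

The recovered configuration matches the original map only up to rotation and reflection, which is why MDS output is usually interpreted through the relative positions of the objects.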
Latent Structure Analysis
• Latent structure analysis is a general class of methods that involve manifest and latent variables that are
continuous or categorical.
• Manifest variables (directly measured or observed) are observed and are usually used as measures of the latent
variables.
• Latent variables are not observed and are the constructs of interest in a theory.
• When the latent variables are continuous, the models are known as structural equation models, which have been
widely used in a number of disciplines, such as psychology, education, biology, and medicine.
• When the latent variables are discrete, the models are known as latent class models, which have been widely
used in sociology and to a lesser extent in psychology.
• In both cases, the manifest variables can be treated as being either continuous or discrete
(with ordered or unordered categories).
• Recent developments include models that combine aspects of latent class analysis and structural equation
modeling.
