01 - Multivariate - Introduction To Multivariate Analysis

This document provides an introduction to multivariate analysis techniques. It discusses what multivariate analysis is, how emerging factors like big data, algorithmic models, and causal inference impact multivariate analysis. It also covers the nature of measurement scales and their relationship to techniques used, understanding measurement error, and how to determine the appropriate technique based on the research problem and data. Key multivariate techniques discussed include exploratory factor analysis, cluster analysis, and multiple regression.

INTRODUCTION to

MULTIVARIATE ANALYSIS
MULTIVARIATE for BUSINESS
MANAGEMENT
Odd Semester 2022/2023
What is Multivariate Analysis?
• Multivariate data analysis is a powerful tool for researchers.
• Multivariate analysis techniques are popular because they enable
organizations to create knowledge and thereby improve their decision-
making.
• Multivariate analysis refers to all statistical techniques that
simultaneously analyze multiple measurements on individuals or objects
under investigation.
• Thus, any simultaneous analysis of more than two variables can be
considered multivariate analysis.
The Implications of Big Data, The Emergence
of Algorithmic Models and Causal Inference on
Multivariate Analysis
• Three emerging factors in the domain of analytics have created a very different
and constantly changing environment for researchers over the past decade and
will continue in the future.
• The emergence of Big Data has fundamentally changed analytics. The aspects
affected include the abundance of data (both variables and observations),
changes in the fundamental characteristics of the data now available, and the
increased desire for data-driven decisions within all types of organizations.
• Along with the emergence of Big Data there is more widespread use of
algorithmic models where the emphasis is on prediction rather than explanation.
• Finally, the abundance of data and the addition of entirely new sets of
techniques have compelled researchers to strive for more causal inference in their
analyses in order to avoid capturing spurious correlations that can result in
invalid conclusions.
The Nature of Measurement Scales and Their
Relationship to Multivariate Techniques
• Data analysis involves the identification and measurement of
variation in a set of variables, either among themselves or between a
dependent variable and one or more independent variables.
• The key word here is measurement because the researcher cannot
identify variation unless it can be measured.
• Measurement is important in accurately representing the research
concepts being studied and is instrumental in the selection of the
appropriate multivariate method of analysis.
• Data can be classified into one of two categories—nonmetric
(qualitative) and metric (quantitative)—based on the type of
attributes or characteristics they represent. The researcher must
define the measurement type for each variable.
Understand the Nature of Measurement
Error and Its Impact on Multivariate
Analysis
• Use of multiple variables and reliance on their combination (the
variate) in multivariate methods focuses attention on a
complementary issue: measurement error.
• Measurement error is the degree to which the observed values
are not representative of the “true” values.
• Measurement error has many sources, ranging from data entry
errors to the imprecision of the measurement and the inability of
respondents to accurately provide information.
• When variables with measurement error are used to compute
correlations or means, the “true” effect is partially masked by the
measurement error, causing the correlations to weaken and the
means to be less precise.
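The weakening of correlations by measurement error can be illustrated with a small simulation. This is a minimal sketch that is not part of the original slides: the sample size, the true correlation of about 0.8, and the size of the measurement error are all assumed values chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000                # assumed sample size

# "True" scores: y is constructed so that its correlation with x is about 0.8.
true_x = [rng.gauss(0, 1) for _ in range(n)]
true_y = [0.8 * x + 0.6 * rng.gauss(0, 1) for x in true_x]

# Observed scores: true value plus independent measurement error
# (assumed standard deviation of 1, i.e. as large as the true-score spread).
obs_x = [x + rng.gauss(0, 1) for x in true_x]
obs_y = [y + rng.gauss(0, 1) for y in true_y]

r_true = pearson(true_x, true_y)
r_observed = pearson(obs_x, obs_y)
print(f"correlation of true scores:     {r_true:.2f}")
print(f"correlation of observed scores: {r_observed:.2f}")
```

With error variance as large as the true-score variance, the observed correlation should come out at roughly half of the true one, which is the masking effect described above.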
Determine Which Multivariate Technique
is Appropriate for a Specific Research
Problem
The multivariate techniques can be classified based on three judgments
the researcher must make about the research objective and nature of the
data:
1) Can the variables be divided into independent and dependent
classifications based on some theory?
2) If they can, how many variables are treated as dependent in a single
analysis?
3) How are the variables, both dependent and independent, measured?
Selection of the appropriate multivariate technique depends on the
answers to these three questions.
Types of Multivariate Techniques
• Interdependence Techniques
   • Exploratory factor analysis: principal components and common factor analysis
   • Cluster analysis
   • Perceptual mapping, also known as multidimensional scaling
   • Correspondence analysis
• Dependence Techniques
   • Multiple regression and multiple correlation
   • Multivariate analysis of variance and covariance
   • Multiple discriminant analysis
   • Logistic regression
   • Structural equation modeling and confirmatory factor analysis
   • Partial least squares structural equation modeling and confirmatory composite analysis
   • Canonical correlation analysis
   • Conjoint analysis
What type of relationship is being examined?
• Dependence
   • Multiple relationships of dependent and independent variables: SEM
   • Several dependent variables in a single relationship
      • Metric dependent variables: canonical analysis, MANOVA
      • Nonmetric dependent variables: canonical analysis with dummy variables
   • One dependent variable in a single relationship
      • Metric dependent variable: multiple regression, conjoint analysis
      • Nonmetric dependent variable: multiple discriminant analysis, logistic regression
• Interdependence
   • Variables: exploratory factor analysis, confirmatory factor analysis
   • Cases/respondents: cluster analysis
   • Objects: multidimensional scaling, correspondence analysis
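The classification above can be sketched as a simple lookup. This is only an illustration under stated assumptions: the function name `suggest_techniques` and the string labels used as arguments are hypothetical, and the mapping simply restates the classification figure.

```python
def suggest_techniques(relationship, structure, dependent_scale=None):
    """Return candidate techniques for a research problem.

    relationship:    "dependence" or "interdependence"
    structure:       for dependence: "multiple relationships",
                     "several dependent" or "one dependent";
                     for interdependence: "variables", "cases" or "objects"
    dependent_scale: "metric" or "nonmetric" (dependence only)
    All labels are hypothetical names chosen for this sketch.
    """
    if relationship == "interdependence":
        return {
            "variables": ["exploratory factor analysis",
                          "confirmatory factor analysis"],
            "cases": ["cluster analysis"],
            "objects": ["multidimensional scaling",
                        "correspondence analysis"],
        }[structure]

    if relationship == "dependence":
        if structure == "multiple relationships":
            return ["structural equation modeling"]
        return {
            ("several dependent", "metric"):
                ["canonical correlation analysis", "MANOVA"],
            ("several dependent", "nonmetric"):
                ["canonical correlation analysis with dummy variables"],
            ("one dependent", "metric"):
                ["multiple regression", "conjoint analysis"],
            ("one dependent", "nonmetric"):
                ["multiple discriminant analysis", "logistic regression"],
        }[(structure, dependent_scale)]

    raise ValueError("relationship must be 'dependence' or 'interdependence'")


# Example: one nonmetric dependent variable, such as success vs. failure.
print(suggest_techniques("dependence", "one dependent", "nonmetric"))
```

A plain dictionary keeps the three judgments visible as the keys of the lookup, which mirrors how the classification is read from top to bottom.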
EXPLORATORY FACTOR ANALYSIS:
Principal Components And Common Factor
Analysis (1)
• Exploratory factor analysis, including both principal component analysis
and common factor analysis, is a statistical approach that can be used to
analyze interrelationships among a large number of variables and to
explain these variables in terms of their common underlying dimensions
(factors).
• The objective is to find a way of condensing the information contained in
a number of original variables into a smaller set of variates (factors) with
a minimal loss of information.
• By providing an empirical estimate of the structure of the variables
considered, exploratory factor analysis becomes an objective basis for
creating summated scales.
EXPLORATORY FACTOR ANALYSIS:
Principal Components And Common Factor
Analysis (2)
• A researcher can use factor analysis, for example, to better understand
the relationships between customers’ ratings of a fast-food restaurant.
• Assume you ask customers to rate the restaurant on the following six
variables: food taste, food temperature, freshness, waiting time,
cleanliness, and friendliness of employees.
• The analyst would like to combine these six variables into a smaller
number.
• By analyzing the customer responses, the analyst might find that the
variables food taste, temperature, and freshness combine together to
form a single factor of food quality, whereas the variables waiting time,
cleanliness, and friendliness of employees combine to form another
single factor, service quality.
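The restaurant example can be sketched with a principal component extraction on a made-up correlation matrix. Everything numeric here is an assumption for illustration: the within-group correlation of 0.7 and the between-group correlation of 0.2 are invented, and the power-iteration routine is a simple stand-in for the eigenvalue routines of a statistics package. No factor rotation is applied.

```python
import math

variables = ["food taste", "food temperature", "freshness",
             "waiting time", "cleanliness", "friendliness"]


def assumed_correlation(i, j):
    """Hypothetical correlations: 0.7 within a group, 0.2 across groups."""
    if i == j:
        return 1.0
    same_group = (i < 3) == (j < 3)  # first three = food, last three = service
    return 0.7 if same_group else 0.2


R = [[assumed_correlation(i, j) for j in range(6)] for i in range(6)]


def top_eigenpair(matrix, iterations=200):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    n = len(matrix)
    vec = [1.0] + [0.0] * (n - 1)
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    product = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v * p for v, p in zip(vec, product))
    return eigenvalue, vec


def deflate(matrix, eigenvalue, vec):
    """Remove an extracted component so the next one can be found."""
    n = len(matrix)
    return [[matrix[i][j] - eigenvalue * vec[i] * vec[j] for j in range(n)]
            for i in range(n)]


eig1, vec1 = top_eigenpair(R)
eig2, vec2 = top_eigenpair(deflate(R, eig1, vec1))

print(f"eigenvalues: {eig1:.2f}, {eig2:.2f}")
for name, a, b in zip(variables, vec1, vec2):
    # Loadings are eigenvector entries scaled by the square root of the eigenvalue.
    print(f"{name:18s} {a * math.sqrt(eig1):6.2f} {b * math.sqrt(eig2):6.2f}")
```

In this unrotated solution the first component reflects an overall rating, and the second component separates the three food variables from the three service variables by the sign of their loadings. A rotation, which this sketch omits, would normally be used to align the components with food quality and service quality.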
CLUSTER ANALYSIS (1)
• Cluster analysis is an analytical technique for developing meaningful subgroups of
individuals or objects.
• Specifically, the objective is to classify a sample of entities (individuals or objects) into a
small number of mutually exclusive groups based on the similarities among the entities.
• In cluster analysis, unlike discriminant analysis, the groups are not predefined.
• Instead, the technique is used to identify the groups.
• Cluster analysis usually involves at least three steps. The first is the measurement of
some form of similarity or association among the entities to determine how many
groups really exist in the sample.
• The second step is the actual clustering process, whereby entities are partitioned into
groups (clusters).
• The final step is to profile the persons or variables to determine their composition.
• Many times this profiling may be accomplished by applying discriminant analysis to the
groups identified by the cluster technique.
CLUSTER ANALYSIS (2)

• As an example of cluster analysis, let’s assume a restaurant owner
wants to know whether customers are patronizing the restaurant for
different reasons.
• Data could be collected on perceptions of pricing, food quality, and
so forth.
• Cluster analysis could be used to determine whether some subgroups
(clusters) are highly motivated by low prices versus those who are
much less motivated to come to the restaurant based on price
considerations.
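The clustering and profiling steps can be sketched with a very small k-means routine. This is an illustration under stated assumptions: the eight customers and their ratings are invented, k-means is only one of several possible clustering procedures, and the number of clusters (two) is fixed in advance rather than determined from the data.

```python
import math

# Hypothetical ratings on a 1-10 scale:
# (importance of low prices, importance of food quality)
customers = [(9, 4), (8, 5), (9, 3), (10, 4),
             (2, 8), (3, 9), (1, 7), (2, 9)]


def kmeans(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then update the centroids."""
    for _ in range(iterations):
        # Similarity is measured as Euclidean distance.
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        updated = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(c) / len(members)
                                     for c in zip(*members)))
            else:
                updated.append(centroids[k])  # keep an empty cluster in place
        centroids = updated
    return labels, centroids


# Starting centroids are picked by hand (an assumption of this sketch).
labels, centroids = kmeans(customers, [customers[0], customers[4]])

# Profile each cluster by its average ratings.
for k, (price, quality) in enumerate(centroids):
    size = labels.count(k)
    print(f"cluster {k}: {size} customers, "
          f"price importance {price:.2f}, quality importance {quality:.2f}")
```

The profile of each cluster is read from its centroid: one group rates low prices as very important, while the other is much less motivated by price.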
MULTIPLE REGRESSION (1)

• Multiple regression is the appropriate method of analysis when the
research problem involves a single metric dependent variable
presumed to be related to two or more metric independent
variables.
• The objective of multiple regression analysis is to predict the changes
in the dependent variable in response to changes in the independent
variables.
• This objective is most often achieved through the statistical rule of
least squares.
• Whenever the researcher is interested in predicting the amount or
size of the dependent variable, multiple regression is useful.
MULTIPLE REGRESSION (2)

• For example, monthly expenditures on dining out
(dependent variable) might be predicted from information
regarding a family’s income, its size, and the age of the head
of household (independent variables).
• Similarly, the researcher might attempt to predict a
company’s sales from information on its expenditures for
advertising, the number of salespeople, and the number of
stores carrying its products.
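The dining-out example can be sketched with a least squares fit written out by hand. This is a minimal sketch under stated assumptions: the seven families and the coefficients used to generate their expenditures are invented, and the data contain no random error, so the fit recovers the generating coefficients. Real data would not behave this neatly.

```python
def fit_least_squares(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    p = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * t for x, t in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical families: (income in thousands, family size, age of head)
families = [(40, 2, 30), (55, 3, 35), (70, 4, 41), (85, 2, 52),
            (100, 5, 45), (120, 3, 60), (65, 1, 28)]

# Hypothetical monthly dining-out expenditure, generated without noise.
spending = [20 + 2 * inc + 15 * size - 0.5 * age
            for inc, size, age in families]

coefficients = fit_least_squares(families, spending)
for name, b in zip(["intercept", "income", "size", "age"], coefficients):
    print(f"{name:10s} {b:8.3f}")

# Predicted expenditure for a new (hypothetical) family.
new_family = (90, 4, 50)
prediction = coefficients[0] + sum(
    b * v for b, v in zip(coefficients[1:], new_family))
print(f"predicted spending: {prediction:.1f}")
```

Each coefficient is read as the expected change in the dependent variable for a one-unit change in that independent variable, holding the others constant.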
MULTIVARIATE ANALYSIS OF
VARIANCE (1)
• Multivariate analysis of variance (MANOVA) is a statistical technique
that can be used to simultaneously explore the relationship between
several categorical independent variables (usually referred to as
treatments) and two or more metric dependent variables.
• As such, it represents an extension of univariate analysis of variance
(ANOVA).
• MANOVA is useful when the researcher designs an experimental
situation (manipulation of several nonmetric treatment variables) to
test hypotheses concerning the variance in group responses on two
or more metric dependent variables.
MULTIVARIATE ANALYSIS OF
VARIANCE (2)
• Assume a company wants to know if a humorous ad will be more
effective with its customers than a non-humorous ad.
• It could ask its ad agency to develop two ads—one humorous and
one non-humorous—and then show a group of customers the two
ads.
• After seeing the ads, the customers would be asked to rate the
company and its products on several dimensions, such as modern
versus traditional or high quality versus low quality.
• MANOVA would be the technique to use to determine the extent of
any statistical differences between the perceptions of customers who
saw the humorous ad versus those who saw the non-humorous one.
MULTIPLE DISCRIMINANT ANALYSIS
(1)
• Multiple discriminant analysis (MDA) is the appropriate multivariate
technique if the single dependent variable is dichotomous (e.g., male–female)
or multichotomous (e.g., high–medium–low) and therefore nonmetric.
• As with multiple regression, the independent variables are assumed to be
metric.
• Discriminant analysis is applicable in situations in which the total sample can
be divided into groups based on a nonmetric dependent variable
characterizing several known classes.
• The primary objectives of multiple discriminant analysis are to understand
group differences and to predict the likelihood that an entity (individual or
object) will belong to a particular class or group based on several metric
independent variables.
• Discriminant analysis might be used to distinguish innovators from non-
innovators according to their demographic and psychographic profiles.
MULTIPLE DISCRIMINANT ANALYSIS
(2)
• Other applications include distinguishing heavy product users from
light users, males from females, national-brand buyers from private-
label buyers, and good credit risks from poor credit risks.
• Even the Internal Revenue Service uses discriminant analysis to
compare selected federal tax returns with a composite, hypothetical,
normal taxpayer’s return (at different income levels) to identify the
most promising returns and areas for audit.
LOGISTIC REGRESSION (1)
• Logistic regression models, often referred to as logit analysis, are a
combination of multiple regression and multiple discriminant analysis.
• This technique is similar to multiple regression analysis in that one or more
independent variables are used to predict a single dependent variable.
• What distinguishes a logistic regression model from multiple regression is
that the dependent variable is nonmetric, as in discriminant analysis.
• The nonmetric scale of the dependent variable requires differences in the
estimation method and assumptions about the type of underlying
distribution, yet in most other facets it is quite similar to multiple
regression.
• Logistic regression models are distinguished from discriminant analysis
primarily in that, in their basic form, they apply to binary dependent
variables, accommodate all types of independent variables (metric and
nonmetric), and do not require the assumption of multivariate normality.
LOGISTIC REGRESSION (2)

• Assume financial advisors were trying to develop a means of
selecting emerging firms for start-up investment.
• To assist in this task, they reviewed past records and placed firms into
one of two classes: successful over a five-year period, and
unsuccessful after five years.
• For each firm, they also had a wealth of financial and managerial
data.
• They could then use a logistic regression model to identify those
financial and managerial data that best differentiated between the
successful and unsuccessful firms in order to select the best
candidates for investment in the future.
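The start-up example can be sketched with a small logistic regression fitted by gradient ascent. This is an illustration under stated assumptions: the ten firms, their two standardized scores, and the learning rate are all invented, and statistical packages normally use a different estimation routine than the plain gradient ascent shown here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, labels, learning_rate=0.1, iterations=5000):
    """Maximize the log-likelihood by gradient ascent; intercept comes first."""
    weights = [0.0] * (len(features[0]) + 1)
    n = len(features)
    for _ in range(iterations):
        gradient = [0.0] * len(weights)
        for x, y in zip(features, labels):
            row = (1.0,) + tuple(x)
            error = y - sigmoid(sum(w * v for w, v in zip(weights, row)))
            for j, v in enumerate(row):
                gradient[j] += error * v / n
        weights = [w + learning_rate * g for w, g in zip(weights, gradient)]
    return weights


def predict_success(weights, x):
    """Estimated probability that a firm belongs to the successful class."""
    row = (1.0,) + tuple(x)
    return sigmoid(sum(w * v for w, v in zip(weights, row)))


# Hypothetical standardized scores: (financial strength, managerial experience)
firms = [(1.0, 1.0), (1.0, 0.5), (0.5, 1.0), (1.5, 1.5), (-0.5, -0.5),
         (-1.0, -1.0), (-1.0, -0.5), (-0.5, -1.0), (-1.5, -1.5), (0.5, 0.5)]
# 1 = successful over five years, 0 = unsuccessful (hypothetical outcomes)
success = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

weights = fit_logistic(firms, success)
strong_candidate = predict_success(weights, (1.5, 1.5))
weak_candidate = predict_success(weights, (-1.5, -1.5))
print(f"strong candidate: {strong_candidate:.2f}")
print(f"weak candidate:   {weak_candidate:.2f}")
```

The fitted weights indicate which variables differentiate the two classes, and the predicted probability can be used to rank new firms as candidates for investment.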
STRUCTURAL EQUATION MODELING and
CONFIRMATORY FACTOR ANALYSIS (1)

• Structural equation modeling (SEM) is a technique that allows separate
relationships for each of a set of dependent variables.
• This method of SEM is based on an analysis of only common variance
and begins with calculating the covariance matrix, and is often referred
to as covariance-based SEM.
• In its simplest sense, structural equation modeling provides the
appropriate and most efficient estimation technique for a series of
separate multiple regression equations estimated simultaneously. It is
characterized by two basic components:
(1) the structural model and
(2) the measurement model.
STRUCTURAL EQUATION MODELING and
CONFIRMATORY FACTOR ANALYSIS (2)

• The structural model is the path model, which relates independent to
dependent variables.
• In such situations, theory, prior experience, or other guidelines enable
the researcher to distinguish which independent variables predict each
dependent variable.
• Other techniques that accommodate multiple dependent variables
(multivariate analysis of variance and canonical correlation) are not
applicable in this situation because they allow only a single
relationship between dependent and independent variables.
• The measurement model enables the researcher to use several variables
(indicators) for a single independent or dependent variable.
STRUCTURAL EQUATION MODELING and
CONFIRMATORY FACTOR ANALYSIS (3)

• For example, the dependent variable might be a concept represented by
a summated scale, such as self-esteem.
• In a confirmatory factor analysis (CFA) the researcher can assess the
contribution of each scale item as well as incorporate how well the scale
measures the concept (reliability).
• The scales are then integrated into the estimation of the relationships
between dependent and independent variables in the structural model.
• This procedure is similar to performing an exploratory factor analysis
(discussed earlier) of the scale items and using the factor
scores in the regression.
STRUCTURAL EQUATION MODELING and
CONFIRMATORY FACTOR ANALYSIS (4)
• A study by management consultants identified several factors that affect worker
satisfaction: supervisor support, work environment, and job performance.
• In addition to this relationship, they noted a separate relationship wherein
supervisor support and work environment were unique predictors of job
performance.
• Hence, they had two separate, but interrelated relationships.
• Supervisor support and the work environment not only affected worker
satisfaction directly, but had possible indirect effects through the relationship
with job performance, which was also a predictor of worker satisfaction.
• In attempting to assess these relationships, the consultants also developed
multi-item scales for each construct (supervisor support, work environment, job
performance, and worker satisfaction).
• SEM provides a means of not only assessing each of the relationships
simultaneously rather than in separate analyses, but also incorporating the
multi-item scales in the analysis to account for measurement error associated
with each of the scales.
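The direct and indirect effects in this example can be worked through with simple arithmetic. This sketch assumes a linear path model with standardized coefficients, and the three coefficient values are invented for illustration. It does not estimate an SEM; it only shows how an indirect effect is combined with a direct one once the paths have been estimated.

```python
# Hypothetical standardized path coefficients (not estimated from data).
paths = {
    ("supervisor support", "worker satisfaction"): 0.30,  # direct path
    ("supervisor support", "job performance"): 0.40,
    ("job performance", "worker satisfaction"): 0.30,
}

direct = paths[("supervisor support", "worker satisfaction")]

# In a linear path model the indirect effect is the product of the
# coefficients along the path through the mediating variable.
indirect = (paths[("supervisor support", "job performance")]
            * paths[("job performance", "worker satisfaction")])

total = direct + indirect

print(f"direct effect:   {direct:.2f}")
print(f"indirect effect: {indirect:.2f}")
print(f"total effect:    {total:.2f}")
```

Estimating both relationships in separate regressions would give the individual paths, but SEM estimates them simultaneously and can also account for the measurement error in the multi-item scales.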
PARTIAL LEAST SQUARES STRUCTURAL
EQUATION MODELING (1)
• An alternative approach to structural equation modeling is partial least squares
structural equation modeling (PLS-SEM), often referred to as variance-based
SEM.
• This method of SEM is based on an analysis of total variance and also includes
both a measurement model and a structural model.
• The initial step in applying this method examines the measurement model and is
referred to as confirmatory composite analysis.
• As with CFA, in this step the researcher also identifies the contribution of each
measured variable to its construct as well as evaluating the reliability and validity
of the measurement models.
• After the measurement models are determined to be valid and reliable, the
analyst examines the structural model.
• The focus of variance-based SEM is primarily on prediction and explanation of
the relationships, whereas with covariance-based SEM the focus is on
confirmation of well-established theory.
CANONICAL CORRELATION (1)
• Canonical correlation analysis can be viewed as a logical extension of
multiple regression analysis. Recall that multiple regression analysis
involves a single metric dependent variable and several metric
independent variables.
• With canonical analysis the objective is to correlate simultaneously several
metric dependent variables and several metric independent variables.
• Whereas multiple regression involves a single dependent variable,
canonical correlation involves multiple dependent variables.
• The underlying principle is to develop a linear combination of each set of
variables (both independent and dependent) in a manner that maximizes
the correlation between the two sets.
• Stated in a different manner, the procedure involves obtaining a set of
weights for the dependent and independent variables that provides the
maximum simple correlation between the set of dependent variables and
the set of independent variables.
CANONICAL CORRELATION (2)

• Assume a company conducts a study that collects information on its
service quality based on answers to 50 metrically measured questions.
• The study uses questions from published service quality research and
includes benchmarking information on perceptions of the service quality
of “world-class companies” as well as the company for which the
research is being conducted.
• Canonical correlation could be used to compare the perceptions of the
world-class companies on the 50 questions with the perceptions of the
company.
• The research could then conclude whether the perceptions of the
company are correlated with those of world-class companies.
• The technique would provide information on the overall correlation of
perceptions as well as the correlation on each of the 50 questions.
CONJOINT ANALYSIS (1)

• Conjoint analysis is a dependence technique that brings new
sophistication to the evaluation of objects, such as new products,
services, or ideas.
• The most direct application is in new product or service development,
allowing for the evaluation of complex products while maintaining a
realistic decision context for the respondent.
• The market researcher is able to assess the importance of attributes as
well as the levels of each attribute while consumers evaluate only a few
product profiles, which are combinations of attribute levels.
CONJOINT ANALYSIS (2)

• Assume a product concept has three attributes (price, quality, and color),
each at three possible levels (e.g., red, yellow, and blue as the three
levels of color).
• Instead of having to evaluate all 27 (3 x 3 x 3) possible combinations, a
subset (9 or more) can be evaluated for their attractiveness to
consumers, and the researcher learns not only how important each
attribute is but also the importance of each level (e.g., the attractiveness
of red versus yellow versus blue).
• Moreover, when the consumer evaluations are completed, the results of
conjoint analysis can also be used in product design simulators, which
show customer acceptance for any number of product formulations and
aid in the design of the optimal product.
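The reduction from 27 profiles to 9 can be sketched as follows. The level names for price and quality are assumptions (the example above names only the colors), and the Latin-square construction is one common way to pick a balanced subset, not necessarily the one a conjoint package would use.

```python
from itertools import product

attributes = {
    "price": ["low", "medium", "high"],            # hypothetical level names
    "quality": ["basic", "standard", "premium"],   # hypothetical level names
    "color": ["red", "yellow", "blue"],
}

# Full factorial design: every combination of the three attributes.
full_factorial = list(product(*attributes.values()))

# Balanced subset of 9 profiles built from a Latin square:
# the color index is (price index + quality index) mod 3.
fraction = []
for i in range(3):
    for j in range(3):
        k = (i + j) % 3
        fraction.append((attributes["price"][i],
                         attributes["quality"][j],
                         attributes["color"][k]))

print(f"full design: {len(full_factorial)} profiles")
print(f"subset:      {len(fraction)} profiles")
for profile in fraction:
    print(profile)
```

In this subset every pair of levels from any two attributes appears exactly once, which is what allows the importance of each level to be estimated from only nine evaluations.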
CORRESPONDENCE ANALYSIS (1)
• Correspondence analysis is a recently developed interdependence
technique that facilitates the perceptual mapping of objects (e.g., products,
persons) on a set of nonmetric attributes.
• Researchers are constantly faced with the need to “quantify the qualitative
data” found in nominal variables.
• Correspondence analysis differs from the interdependence techniques
discussed earlier in its ability to accommodate both nonmetric data and
nonlinear relationships.
• In its most basic form, correspondence analysis employs a contingency
table, which is the cross-tabulation of two categorical variables.
• It then transforms the nonmetric data to a metric level and performs
dimensional reduction (similar to exploratory factor analysis) and
perceptual mapping.
• Correspondence analysis provides a multivariate representation of
interdependence for nonmetric data that is not possible with other
methods.
CORRESPONDENCE ANALYSIS (2)

• As an example, respondents’ brand preferences can be cross-tabulated
on demographic variables (e.g., gender, income categories, occupation)
by indicating how many people preferring each brand fall into each
category of the demographic variables.
• Through correspondence analysis, the association, or “correspondence,”
of brands and the distinguishing characteristics of those preferring each
brand are then shown in a two- or three-dimensional map of both
brands and respondent characteristics.
• Brands perceived as similar are located close to one another.
• Likewise, the most distinguishing characteristics of respondents
preferring each brand are also determined by the proximity of the
demographic variable categories to the brand’s position.
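The starting point of this example, the contingency table, can be sketched as follows. The counts are invented, and the sketch stops before the dimensional reduction and mapping steps. It computes only the standardized departures from independence, which are the quantities correspondence analysis goes on to decompose.

```python
import math

# Hypothetical cross-tabulation: brand preferred by income category.
table = {
    "Brand A": {"low": 30, "middle": 15, "high": 5},
    "Brand B": {"low": 10, "middle": 20, "high": 20},
    "Brand C": {"low": 5, "middle": 15, "high": 30},
}

categories = ["low", "middle", "high"]
total = sum(sum(row.values()) for row in table.values())
row_totals = {brand: sum(row.values()) for brand, row in table.items()}
column_totals = {c: sum(row[c] for row in table.values()) for c in categories}

# Expected counts if brand preference and income were unrelated.
expected = {brand: {c: row_totals[brand] * column_totals[c] / total
                    for c in categories}
            for brand in table}

# Pearson residuals: positive values mean the brand is preferred more often
# in that income category than independence would predict.
residuals = {brand: {c: (table[brand][c] - expected[brand][c])
                     / math.sqrt(expected[brand][c])
                     for c in categories}
             for brand in table}

# Total inertia: the chi-square statistic divided by the sample size.
inertia = sum(r * r for row in residuals.values() for r in row.values()) / total

for brand in table:
    cells = "  ".join(f"{c}: {residuals[brand][c]:+.2f}" for c in categories)
    print(f"{brand}  {cells}")
print(f"total inertia: {inertia:.3f}")
```

Brands and categories with large positive residuals for each other are the ones that end up close together on the perceptual map.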
Understand The Six-step Approach to
Multivariate Model Building
The six-step model-building process provides a framework for developing,
interpreting, and validating any multivariate analysis.
1) Define the research problem, objectives, and multivariate technique to
be used.
2) Develop the analysis plan.
3) Evaluate the assumptions.
4) Estimate the multivariate model and evaluate fit.
5) Interpret the variates.
6) Validate the multivariate model.
STAGE 1: Define The Research Problem,
Objectives, and Multivariate Technique to be
Used
• The starting point for any multivariate analysis is to define the research problem
and analysis objectives in conceptual terms before specifying any variables or
measures.
• No matter whether in academic or applied research, the researcher must first
view the problem in conceptual terms by defining the concepts and identifying
the fundamental relationships to be investigated.
• A conceptual model need not be complex and detailed; instead, it can be just a
simple representation of the relationships to be studied.
• If a dependence relationship is proposed as the research objective, the
researcher needs to specify the dependent and independent concepts.
• For an application of an interdependence technique, the dimensions of structure
or similarity should be specified.
• Note that a concept (an idea or topic), rather than a specific variable, is defined
in both dependence and interdependence situations.
STAGE 2: Develop the analysis plan

• With the conceptual model established and the multivariate technique
selected, attention turns to the implementation issues.
• The issues include general considerations such as minimum or desired
sample sizes and allowable or required types of variables (metric versus
nonmetric) and estimation methods.
STAGE 3: Evaluate the assumptions

• With data collected, the first task is not to estimate the multivariate
model but to evaluate its underlying assumptions, both statistical and
conceptual, that substantially affect its ability to represent
multivariate relationships.
• For the techniques based on statistical inference, the assumptions of
multivariate normality, linearity, independence of the error terms, and
equality of variances must all be met.
• Each technique also involves a series of conceptual assumptions dealing
with such issues as model formulation and the types of relationships
represented.
• Before any model estimation is attempted, the researcher must ensure
that both statistical and conceptual assumptions are met.
STAGE 4: Estimate the multivariate model
and evaluate fit
• With the assumptions satisfied, the analysis proceeds to the actual estimation of
the multivariate model and an assessment of overall model fit.
• In the estimation process, the researcher may choose among options to meet
specific characteristics of the data (e.g., use of covariates in MANOVA) or to
maximize the fit to the data (e.g., rotation of factors or discriminant functions).
• After the model is estimated, the overall model fit is evaluated to ascertain
whether it achieves acceptable levels on statistical criteria (e.g., level of
significance), identifies the proposed relationships, and achieves practical
significance.
• Many times, the model will be respecified in an attempt to achieve better levels
of overall fit and/or explanation.
• In all cases, however, an acceptable model must be obtained before proceeding.
STAGE 5: Interpret the variates

• With an acceptable level of model fit, interpreting the variate(s) reveals
the nature of the multivariate relationship.
• The interpretation of effects for individual variables is made by examining
the estimated coefficients (weights) for each variable in the variate.
• Moreover, some techniques also estimate multiple variates that represent
underlying dimensions of comparison or association.
• The interpretation may lead to additional respecifications of the variables
and/or model formulation, wherein the model is re-estimated and then
interpreted again.
• The objective is to identify empirical evidence of multivariate relationships
in the sample data that can be generalized to the total population.
STAGE 6: Validate the multivariate model

• Before accepting the results, the researcher must subject them to
one final set of diagnostic analyses that assess the degree of
generalizability of the results by the available validation methods.
• The attempts to validate the model are directed toward
demonstrating the generalizability of the results to the total
population.
• These diagnostic analyses add little to the interpretation of the
results but can be viewed as “insurance” that the results are the
most descriptive of the data, yet generalizable to the population.