Factor Analysis As A Tool For Survey Analysis
*Karan Khulbe, **Pradeep Kumar, #Prof. Yashwant Singh Thakur, #Course Coordinator, Shubham Dadariya,
Abstract Factor analysis is particularly suitable for reducing a large number of related variables to a
few factors, a more manageable number, prior to using them in other analyses such as multiple regression or
multivariate analysis of variance. It can be beneficial in developing a questionnaire. Sometimes adding more
statements to a questionnaire fails to give a clear understanding of the variables. With the help of factor
analysis, irrelevant questions can be removed from the final questionnaire. This study proposes a factor
analysis to identify the factors underlying the variables of a questionnaire that measures tourist satisfaction.
In this study, the Kaiser-Meyer-Olkin measure of sampling adequacy and Bartlett’s test of sphericity are used to
assess the factorability of the data. The determinant of the correlation matrix is calculated to examine
multicollinearity among the variables. To determine the number of factors to be extracted, Kaiser’s criterion
and the scree test are examined. The varimax orthogonal factor rotation method is applied to minimize the number
of variables that have high loadings on each factor. Internal consistency is confirmed by calculating Cronbach’s
alpha and composite reliability to test the instrument's accuracy. Convergent validity is established when the
average variance extracted is greater than or equal to 0.5. The results reveal that factor analysis not only
allows detecting irrelevant items but also allows extracting the valuable factors from the data set of a
questionnaire survey. The application of factor analysis to questionnaire evaluation provides valuable inputs to
decision makers, letting them focus on a few important factors rather than a large number of parameters.
1.Introduction
Factor analysis is a multivariate statistical technique applied to a single set of variables when
the investigator is interested in determining which variables in the set form logical subsets that are
relatively independent of one another[1]. In other words, factor analysis is particularly useful for
identifying the factors underlying the variables by grouping related variables into the same
factor[2]. In this paper, the main focus is on the application of factor analysis to reduce a large
number of inter-correlated measures to a few representative constructs or factors that can be used
for subsequent analysis[4]. The goal of the present work is to examine the application of factor
analysis to the items of a questionnaire that measures tourist satisfaction. Therefore, in order to
identify the factors, it is necessary to understand the concept of factor analysis and the steps for
applying it to a questionnaire survey. Factor analysis is based on the assumption that all variables
correlate to some degree. The variables should be measured at least at the ordinal level. The sample
size for factor analysis should be large; a commonly accepted guideline is a ten-to-one ratio of
cases to variables[3,4]. There are two main approaches to factor analysis: exploratory factor
analysis (EFA) and confirmatory factor analysis (CFA). Exploratory factor analysis is used for
checking dimensionality and is often used in the early stages of research to gather information about
the interrelationships among a set of variables[5]. Confirmatory factor analysis is a more complex
and sophisticated set of techniques used in the research process to test specific hypotheses or
theories concerning the structure underlying a set of variables[6,7].

... clusters of closely related data points. Principal component analysis has applications in many
fields such as population genetics, microbiome studies, and atmospheric science[8].
Applications Of PCA :
1.Intelligence
2.Residential Differentiation
3.Development Indexes
4.Population Genetics
5.Market Research And Index Of Attitudes

What is Factor Analysis?
It refers to a method that reduces a large number of variables to a smaller number of factors. This
technique extracts the maximum common variance from all the variables and puts it into a common score.
It is part of the General Linear Model (GLM) and rests on several assumptions: there is no
multicollinearity, the relationships are linear, there is true correlation between variables and
factors, and the relevant variables are included in the analysis.
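
As a minimal sketch of how the no-multicollinearity assumption can be checked, the snippet below computes the determinant of a correlation matrix and the statistic for Bartlett’s test of sphericity, both of which are named in the abstract. The correlation matrix, the sample size of 100, and the 0.00001 determinant cut-off (a commonly quoted rule of thumb) are illustrative assumptions, not values from this study.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix: perfect multicollinearity
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_cases):
    """Chi-square statistic and degrees of freedom for Bartlett's test."""
    p = len(corr)
    chi_square = -(n_cases - 1 - (2 * p + 5) / 6) * math.log(determinant(corr))
    dof = p * (p - 1) // 2
    return chi_square, dof


# Hypothetical correlation matrix for three questionnaire items.
corr = [
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.4],
    [0.3, 0.4, 1.0],
]
det = determinant(corr)
chi_square, dof = bartlett_sphericity(corr, n_cases=100)
no_severe_multicollinearity = det > 1e-5  # assumed rule-of-thumb cut-off

print(f"determinant = {det:.4f}")
print(f"Bartlett chi-square = {chi_square:.2f} on {dof} degrees of freedom")
```

A determinant close to zero would suggest that some items are nearly redundant, while a large chi-square relative to its degrees of freedom suggests that the correlation matrix is not an identity matrix and that the data may be factorable. A p-value would need a chi-square distribution function, which is left out here to keep the sketch dependency-free.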
The most common approach to deciding the number of factors is to generate a scree plot. The
scree plot is a two-dimensional graph with factors on the x-axis and eigenvalues on the y-axis.
Eigenvalues are produced by a process called principal components analysis (PCA) and represent
the variance accounted for by each underlying factor. They are not expressed as percentages
but as scores that total to the number of items. A 12-item scale will theoretically have 12 possible
underlying factors, and each factor will have an eigenvalue that indicates the amount of variation
in the items accounted for by that factor. If the first factor has an eigenvalue of 3.0, it accounts
for 25% of the variance (3/12 = .25). The total of all the eigenvalues will be 12 if there are 12
items, so some factors will have smaller eigenvalues. They are typically arranged in a scree plot in
descending order like the following:

[Figure: scree plot, with factors on the x-axis and eigenvalues on the y-axis]

From the scree plot you can see that the first couple of factors account for most of the variance,
then the remaining factors all have small eigenvalues. The term “scree” is taken from the
word for the rubble at the bottom of a mountain. A researcher might examine this plot and decide
there are 2 underlying factors and the remainder of factors are just “scree” or error variation. So,
this approach to selecting the number of factors involves a certain amount of subjective judgment.
Another approach is called the Kaiser-Guttman rule and simply states that the number of factors
is equal to the number of eigenvalues greater than 1.0. I tend to recommend the scree plot approach
because the Kaiser-Guttman approach seems to produce too many factors.

Once the number of factors is decided, the researcher runs another factor analysis to get the
loadings for each of the factors. To do this, one has to decide which mathematical solution to use to
find the loadings. There are about five basic extraction methods: (1) PCA, which is the default in
most packages. PCA assumes there is no measurement error and is considered not to be a
true exploratory factor analysis; (2) maximum likelihood (a.k.a. canonical factoring); (3) alpha
factoring; (4) image factoring; (5) principal axis factoring with iterated communalities (a.k.a.
least squares). Without getting into the details of each of these, I think the best evidence supports
the use of principal axis factoring and maximum likelihood approaches. I typically use the former.
Gorsuch (1989) recommends the latter if only a few iterations are performed (not really possible in
most packages). Snook and Gorsuch (1989) show that PCA can give poor estimates of the population
loadings in small samples. With larger samples, most approaches will have similar results. The
extraction method will produce factor loadings for every item on every extracted factor. Researchers
hope their results will show what is called simple structure, with most items having a large loading
on one factor but small loadings on other factors.

EXPLORATORY FACTOR ANALYSIS: ROTATION
Once an initial solution is obtained, the loadings are rotated. Rotation is a way of maximizing high
loadings and minimizing low loadings so that the simplest possible structure is achieved. There are
two basic types of rotation: orthogonal and oblique. Orthogonal means the factors are
assumed to be uncorrelated with one another. This is the default setting in most statistical packages
but is rarely a logical assumption about factors in the social sciences. Not all researchers using EFA
realize that orthogonal rotations imply an assumption that they probably would not
consciously make. Oblique rotation derives factor loadings based on the assumption that the factors
are correlated, and this is probably the case for most measures. So, oblique rotation gives
the correlation between the factors in addition to the loadings. Here are some common algorithms
for orthogonal and oblique rotation:
Orthogonal rotation: varimax, quartimax, equamax.
Oblique rotation: oblimin, promax, direct quartimin.
I am not an expert on the advantages and disadvantages of each of these rotation
algorithms, and they reportedly produce fairly similar results under most circumstances (although
orthogonal and oblique rotations will be rather different). I tend to use promax rotation because it
is known to be relatively efficient at achieving simple oblique structure.

Advantages of EFA :
Exploratory factor analysis (EFA) is generally used to discover the factor structure of a measure
and to examine its internal reliability. EFA is often recommended when researchers have no
hypotheses about the nature of the underlying factor structure of their measure.

Limitation of EFA :
The major limitation of exploratory factor analysis is its simplicity: it imposes no hypothesized
structure on the data. Hence, the researcher may not get a reliable inference from EFA alone, and
its findings are commonly checked with confirmatory factor analysis.

3.Confirmatory Factor Analysis:
Understanding the relationships between different variables is an important part of statistical
analysis. Confirmatory factor analysis is a procedure researchers use to determine if their theories
about data relationships are accurate. If you're interested in social research or statistics,
understanding how to apply this technique can help you gain essential insights about your data.

What is confirmatory factor analysis?
Confirmatory factor analysis (CFA) is a statistical modeling method that assesses how accurately
different systems measure and evaluate a concept. With this method, researchers use their
background knowledge to develop a hypothesis about how to measure it, then apply CFA to test
the accuracy of their ideas. Researchers use structural equation modeling software to conduct
confirmatory factor analysis because it requires processing complex data sets with advanced
mathematical models and equations.

CFA is a popular research and data analysis procedure in the social sciences, particularly
psychology, because it can address theoretical models and concepts that are difficult to measure,
such as emotions and psychological symptoms. In the social sciences, these measurement systems are
usually survey questions, rating scales and other inventories. For example, a researcher may use
CFA to determine how well each mental health survey question reflects anxiety disorder symptoms.

Key terms of CFA :
Here are some of the basic terms to know when conducting a confirmatory factor analysis[9]:

Observed Variable
An observed variable is a quantity that you measure directly and use to assess a concept. Observed
variables include the data you record during your research. Questions on surveys often address
different observed variables.

For example, consider a mental health professional using a survey to assess anxiety
symptoms. One survey question asks the respondent to rate their stress level from one to
five. Because the respondent's stress level may indicate anxiety, and the survey provides a
quantifiable system to measure stress, the stress rating is an observed variable.

Latent Variable
The latent variable, also known as the construct, is the shared concept that different measurement
systems assess. Latent variables are difficult to observe directly but can influence the outcome of
the observed variables in an experiment. For example, the latent variable of anxiety may affect
the outcome of someone's reported stress levels. Someone with anxiety may rate their stress at
level five, while someone without anxiety is likely to choose a lower score. Although asking about
stress doesn't directly measure anxiety, it can still provide researchers with insight into the
relationship between stress and anxiety.

CFA exists to assess the indirect relationship between latent and observed variables.

Factor Loading
Factor loading is a number that describes how closely an observed variable corresponds with a
latent variable. In standardized form its absolute value usually lies between zero and one,
although loadings larger than one can occur, for example when several correlated factors are
estimated. Factor loadings with a larger absolute value indicate a stronger relationship with the
latent variable.

For example, the data analysis on a survey for the latent variable of anxiety produces a factor
loading of 0.85 for question one and 0.33 for question two. Because the factor loading for
question one is higher than for question two, question one is likely better at identifying people
with anxiety than question two.

CFA VS EFA
Confirmatory factor analysis and exploratory factor analysis are two complementary techniques
for reviewing research data. Exploratory factor analysis identifies possible relationships between
variables, while confirmatory factor analysis tests those relationships. Researchers with an
extensive background in a subject area often use confirmatory factor analysis because they can
predict possible relationships in their data. They use exploratory factor analysis to learn about new
patterns and identify innovative trends.

1.Define the latent variable :
Establishing a baseline to describe the latent variable makes it possible for you to evaluate the
accuracy of the observed variables. You can define the latent variable by listing characteristics or
collecting additional data. For example, if you want to use CFA to determine if an intake survey is
a good assessment of self-esteem, start by defining self-esteem. You may use your
professional knowledge to determine that self-esteem involves traits like confidence, sociability,
adaptability and goals.

2.Determine measurement methods :
Next, identify the measurement method you want to test and which observed variables to include.
These variables are usually survey questions. You may include multiple questions from the same
survey or choose questions from different surveys depending on the type of analysis you want to
conduct.

Here are some example observed variables from a survey assessing self-esteem:
Rate your confidence from one to five.
Agree or disagree: I’m uncomfortable accepting compliments from others.
Rate your adaptability from one to five.

3.Collect the data :
Gather the information you want to use in your confirmatory factor analysis. Decide if you'll collect
initial responses from your own research or use outside data from other sources. Try to secure a
large sample to ensure an accurate analysis. Once you have enough quality
information, input it into statistical modeling software.
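
Before collected responses go into modeling software, they are usually arranged as a respondents-by-items table. The following is a minimal sketch of two quick checks on such a table: the ten-to-one sample-size guideline from the introduction and Cronbach’s alpha, the internal consistency measure named in the abstract. The five responses below are made-up illustrative data, and the function names are hypothetical helpers rather than part of any package.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    k = len(rows[0])                          # number of items
    item_columns = list(zip(*rows))           # transpose to one tuple per item
    item_variances = [variance(col) for col in item_columns]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def cases_per_item(rows):
    """Ratio of respondents to questionnaire items."""
    return len(rows) / len(rows[0])


# Hypothetical answers of five respondents to three 1-5 rating items.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]

alpha = cronbach_alpha(responses)
ratio = cases_per_item(responses)
meets_ten_to_one = ratio >= 10  # guideline: ten cases per item

print(f"Cronbach's alpha = {alpha:.3f}")
print(f"cases per item = {ratio:.2f}, meets ten-to-one guideline: {meets_ten_to_one}")
```

With only five respondents for three items the toy data falls far short of the ten-to-one guideline, which is the kind of shortfall this check is meant to flag before any factor model is fitted.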
Use your statistical modeling software to compute the factor loading for your data. Follow the
prompts for your specific software interface to produce your results. Most factor analysis
software puts this information in a table, although some generate both graphs and tables to express
the same information.

6.Interpretation :
Review the factor loading column of the factor analysis table to determine how well each
observed variable relates to the latent variable. Decide what factor loading value shows a
sufficiently strong relationship, and use that to guide your interpretation. For example, you may
decide that any variables with a factor loading of 0.75 or higher are valid for assessing
self-esteem. If all the survey questions have a factor loading over 0.75, you can
conclude that your survey is a good overall measurement.

Advantage of CFA :
The main advantage of CFA lies in its ability to aid researchers in bridging the often-observed gap
between theory and observation. For example, an instrument might be developed by creating
multiple items for each of several specific theoretical constructs.

Assumptions: CFA assumes that the data collected is accurate and that the proposed model is correct.

[Figure 1: One Factor Confirmatory Factor Analysis]
[Figure 2: Two Factor Confirmatory Factor Analysis]
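
The interpretation step can be sketched in a few lines. The example below applies the 0.75 loading cut-off used in the text and then computes the average variance extracted (AVE) and composite reliability mentioned in the abstract, using the commonly used formulas for standardized loadings with uncorrelated errors. The four loadings and the item names are hypothetical; only 0.85 and 0.33 echo the anxiety example above.

```python
def average_variance_extracted(loadings):
    """Mean of the squared standardized loadings."""
    return sum(value ** 2 for value in loadings) / len(loadings)


def composite_reliability(loadings):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - value ** 2 for value in loadings)
    return squared_sum / (squared_sum + error_variance)


# Hypothetical standardized loadings of four survey questions on one construct.
loadings = {"q1": 0.85, "q2": 0.33, "q3": 0.80, "q4": 0.78}
threshold = 0.75  # cut-off chosen by the analyst, as in the text

retained = [item for item, value in loadings.items() if value >= threshold]
ave = average_variance_extracted(list(loadings.values()))
cr = composite_reliability(list(loadings.values()))
convergent_validity = ave >= 0.5  # criterion stated in the abstract

print("items at or above the cut-off:", retained)
print(f"AVE = {ave:.3f}, composite reliability = {cr:.3f}")
print("convergent validity criterion met:", convergent_validity)
```

In this toy case the weak item q2 pulls the AVE down to just above 0.5, which illustrates why low-loading questions are candidates for removal from the final questionnaire.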
Review Matrix
No.  Author(s)                                           Year  Summary
2    Syed Mohammad Ather                                 2009  Factor Analysis (FA) attempts to simplify complex and diverse relationships that exist among a set of observed variables by uncovering common.
5    Johnny R.J. Fontaine                                2005  CFA offers a measurement model based on structural equation modeling.
6    S.E. Richards, E. Holmes                            2015  Reviews different chemometrics methods for the analysis of genomics, transcriptomics, proteomics, metabolomics, and metagenomics datasets. It discusses a range of statistical data integration techniques.
8    Michael C. Ashton                                   2013  Factor analysis allows the researcher to reduce many specific traits into a few more general “factors” or groups of traits, each of which includes several of the specific traits.
9    Nerea Martín-Calvo, Miguel Ángel Martínez-González  2019  Factor analysis is a multivariable method that uses the observed data to define one or several vectors (one or several dietary patterns) grouping the food or food groups according to their degree of correlation.
[2] Verma, J. and Abdel-Salam, A., Testing statistical assumptions in research, John Wiley & Sons
Inc., 2019.
[3] Ho, R., Handbook of univariate and multivariate data analysis and interpretation with SPSS,
Chapman & Hall/CRC, Boca Raton, 2006.
[4] Hair, J.F., Anderson, R.E., Tatham, R.L., and Black, W.C., Multivariate data analysis (5th ed.),
Prentice-Hall, Upper Saddle River, NJ, 1998.
[5] Pituch, K. A. and Stevens, J., Applied multivariate statistics for the social sciences: Analyses
with SAS and IBM’s SPSS (6th ed.), Taylor & Francis, New York, 2016.
[6] Hair, J.F., Black, W.C., Babin, B.J., Anderson, R.E., and Tatham, R.L., Multivariate data analysis,
Upper Saddle River, New Jersey, 2006.
[7] Pallant, J., SPSS survival manual: a step by step guide to data analysis using SPSS, Open
University Press/McGraw-Hill, Maidenhead, 2010.
[10] https://www.du.ac.in/du/uploads/departments/Operational%20Research/24042020_Lect-8%20Factor%20Analysis.pdf