Factor Analysis

Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors.
Statistical model
Definition
Suppose we have a set of $p$ observable random variables, $x_1, \dots, x_p$, with means $\mu_1, \dots, \mu_p$.

Suppose, for some unknown constants $\ell_{ij}$ and $k$ unobserved random variables $F_j$, where $i \in \{1, \dots, p\}$ and $j \in \{1, \dots, k\}$ with $k < p$, we have

$$x_i - \mu_i = \ell_{i1} F_1 + \cdots + \ell_{ik} F_k + \varepsilon_i .$$

Here, the $\varepsilon_i$ are independently distributed error terms with zero mean and finite variance, which may not be the same for all $i$. Let $\operatorname{Var}(\varepsilon_i) = \psi_i$, so that we have

$$\operatorname{Cov}(\varepsilon) = \operatorname{Diag}(\psi_1, \dots, \psi_p) = \Psi \quad\text{and}\quad \operatorname{E}(\varepsilon) = 0 .$$

In matrix terms, we have

$$x - \mu = L F + \varepsilon .$$

We also impose the following assumptions on $F$:

1. $F$ and $\varepsilon$ are independent;
2. $\operatorname{E}(F) = 0$;
3. $\operatorname{Cov}(F) = I$ (the identity matrix), so that the factors are uncorrelated.

Any solution of the above set of equations following the constraints for $F$ is defined as the factors, and $L$ as the loading matrix.

Suppose $\operatorname{Cov}(x - \mu) = \Sigma$. Then, from the conditions just imposed on $F$, we have

$$\operatorname{Cov}(x - \mu) = \operatorname{Cov}(L F + \varepsilon),$$

or

$$\Sigma = L \operatorname{Cov}(F) L^{T} + \operatorname{Cov}(\varepsilon),$$

or

$$\Sigma = L L^{T} + \Psi .$$
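To make the matrix form concrete, the following minimal sketch (not from the article; all dimensions and values are made up for illustration) simulates data from the model $x - \mu = LF + \varepsilon$ and checks that the implied covariance $LL^{T} + \Psi$ matches the sample covariance.

```python
# Illustrative sketch of the factor model x - mu = L F + epsilon.
# All names, dimensions and values are invented for demonstration.
import numpy as np

rng = np.random.default_rng(0)
p, k, n = 5, 2, 100_000              # observed variables, factors, samples

L = rng.normal(size=(p, k))          # loading matrix
mu = rng.normal(size=p)              # means of the observed variables
psi = rng.uniform(0.2, 1.0, size=p)  # error variances (diagonal of Psi)

F = rng.normal(size=(k, n))          # factors: zero mean, identity covariance
eps = rng.normal(size=(p, n)) * np.sqrt(psi)[:, None]  # independent errors

X = mu[:, None] + L @ F + eps        # observed data, one column per observation

# Implied covariance Sigma = L L^T + Psi versus the sample covariance of X.
Sigma_model = L @ L.T + np.diag(psi)
print(np.abs(Sigma_model - np.cov(X)).max())   # small for large n
```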
Example
The following example is for expository purposes, and should not be taken as being realistic.
Suppose a psychologist proposes a theory that there are two kinds of intelligence, "verbal
intelligence" and "mathematical intelligence", neither of which is directly observed. Evidence
for the theory is sought in the examination scores from each of 10 different academic fields of
1000 students. If each student is chosen randomly from a large population, then each student's
10 scores are random variables. The psychologist's theory may say that for each of the 10
academic fields, the score averaged over the group of all students who share some common
pair of values for verbal and mathematical "intelligences" is some constant times their level
of verbal intelligence plus another constant times their level of mathematical intelligence, i.e.,
it is a combination of those two "factors". The numbers for a particular subject, by which the
two kinds of intelligence are multiplied to obtain the expected score, are posited by the theory
to be the same for all intelligence level pairs, and are called "factor loadings" for this
subject. For example, the theory may hold that the average student's aptitude in the field of
taxonomy is

{10 × the student's verbal intelligence} + {6 × the student's mathematical intelligence}.
The numbers 10 and 6 are the factor loadings associated with taxonomy. Other academic
subjects may have different factor loadings.
Two students having identical degrees of verbal intelligence and identical degrees of
mathematical intelligence may have different aptitudes in taxonomy because individual
aptitudes differ from average aptitudes. That difference is called the "error", a statistical
term that means the amount by which an individual differs from what is average for his or her
levels of intelligence (see errors and residuals in statistics).
The observable data that go into factor analysis would be 10 scores of each of the 1000
students, a total of 10,000 numbers. The factor loadings and levels of the two kinds of
intelligence of each student must be inferred from the data.
Mathematical model of the same example

In matrix terms, the 10 scores of the 1000 students can be collected into a $10 \times 1000$ matrix $X$, with one column per student, and the model above becomes

$$X = \mu \mathbf{1}^{T} + L F + \varepsilon$$

where $\mu$ is the vector of the 10 subject averages, $\mathbf{1}$ is a vector of 1000 ones, $L$ is the $10 \times 2$ matrix of factor loadings, $F$ is a $2 \times 1000$ matrix whose columns hold each student's levels of verbal and mathematical intelligence, and $\varepsilon$ is a $10 \times 1000$ matrix of error terms.
Observe that doubling the scale on which "verbal intelligence" (the first component in each column of F) is measured, and simultaneously halving the factor loadings for verbal intelligence, makes no difference to the model. Thus, no generality is lost by assuming that the standard deviation of verbal intelligence is 1. Likewise for mathematical intelligence.
Moreover, for similar reasons, no generality is lost by assuming the two factors are
uncorrelated with each other. In other words:
$$\operatorname{Cov}(F_p, F_q) = \delta_{pq}$$

where $\delta_{pq}$ is the Kronecker delta (0 when $p \neq q$ and 1 when $p = q$). The errors are assumed to be independent of the factors:

$$\operatorname{Cov}(F, \varepsilon) = 0 .$$
Note that, since any rotation of a solution is also a solution, this makes interpreting the factors
difficult. See disadvantages below. In this particular example, if we do not know beforehand
that the two types of intelligence are uncorrelated, then we cannot interpret the two factors as
the two different types of intelligence. Even if they are uncorrelated, we cannot tell which
factor corresponds to verbal intelligence and which corresponds to mathematical intelligence
without an outside argument.
The values of the loadings $L$, the averages $\mu$, and the variances of the "errors" $\varepsilon$ must be estimated given the observed data $X$ and $F$ (the assumption about the levels of the factors is fixed for a given $F$). The "fundamental theorem" may be derived from the above conditions:

$$R = L L^{T} + \Psi .$$
The term on the left is just the correlation matrix of the observed data, and its
diagonal
elements will be 1's. The last term on the right will be a diagonal matrix with terms less than
unity. The first term on the right is the "reduced correlation matrix" and will be equal to the
correlation matrix except for its diagonal values which will be less than unity. These diagonal
elements of the reduced correlation matrix are called "communalities":

$$h_i^2 = \sum_{j} \ell_{ij}^2 = 1 - \psi_i .$$

The sample correlation matrix will not exactly obey this decomposition, so the best fit is defined as the one that minimizes the mean square error in the off-diagonal residuals. This is equivalent to minimizing the off-diagonal components of the error covariance which, in the model equations, have expected values of zero. This is to be contrasted with principal
component analysis which seeks to minimize the mean square error of all residuals.[2] Before
the advent of high speed computers, considerable effort was devoted to finding approximate
solutions to the problem, particularly in estimating the communalities by other means, which
then simplifies the problem considerably by yielding a known reduced correlation matrix.
This was then used to estimate the factors and the loadings. With the advent of high-speed
computers, the minimization problem can be solved quickly and directly, and the
communalities are calculated in the process, rather than being needed beforehand. The
MinRes algorithm is particularly suited to this problem, but is hardly the only means of
finding an exact solution.
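As a rough illustration of such a direct fit, the sketch below uses scikit-learn's maximum-likelihood FactorAnalysis (an assumption of convenience; it is a different estimator than MinRes, but it likewise returns loadings and error variances in one step, from which communalities follow as row sums of squared loadings). The data are simulated and purely illustrative.

```python
# Hedged sketch: maximum-likelihood factor analysis via scikit-learn
# (not MinRes, but it also yields loadings and communalities directly).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
L_true = rng.normal(size=(10, 2))                        # 10 variables, 2 factors
X = rng.normal(size=(1000, 2)) @ L_true.T + rng.normal(size=(1000, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)                 # standardize the scores

fa = FactorAnalysis(n_components=2).fit(X)
loadings = fa.components_.T                              # (variables x factors)
communalities = (loadings ** 2).sum(axis=1)              # diagonal of L L^T
uniquenesses = fa.noise_variance_                        # estimated error variances
print(np.round(communalities, 2), np.round(uniquenesses, 2))
```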
Geometric interpretation
Figure caption: Geometric interpretation of factor analysis parameters for 3 respondents to question "a". The "answer" is represented by the unit vector $\mathbf{z}_a$, which is projected onto a plane defined by two orthonormal factor vectors $\mathbf{F}_1$ and $\mathbf{F}_2$. The error vector $\boldsymbol{\varepsilon}_a$ is perpendicular to the plane, so that $\mathbf{z}_a = \ell_{a,1}\mathbf{F}_1 + \ell_{a,2}\mathbf{F}_2 + \boldsymbol{\varepsilon}_a$, and the loadings $\ell_{a,1}$ and $\ell_{a,2}$ are the projections of the data vector onto the factor vectors.
The parameters and variables of factor analysis can be given a geometrical interpretation. The data ($z_{a,i}$), the factors ($F_{p,i}$) and the errors ($\varepsilon_{a,i}$) can be viewed as vectors in an $N$-dimensional Euclidean space (sample space), represented as $\mathbf{z}_a$, $\mathbf{F}_p$ and $\boldsymbol{\varepsilon}_a$ respectively. Since the data is standardized, the data vectors are of unit length ($|\mathbf{z}_a| = 1$). The factor vectors define a $k$-dimensional linear subspace (i.e. a hyperplane) in this space, upon which the data vectors are projected orthogonally. This follows from the model equation

$$\mathbf{z}_a = \sum_{p} \ell_{a,p} \mathbf{F}_p + \boldsymbol{\varepsilon}_a$$

and the independence of the factors and the errors ($\mathbf{F}_p \cdot \boldsymbol{\varepsilon}_a = 0$): the orthogonal projection of $\mathbf{z}_a$ onto the hyperplane is $\hat{\mathbf{z}}_a = \sum_{p} \ell_{a,p} \mathbf{F}_p$,
and the errors are vectors from that projected point to the data point and are perpendicular to
the hyperplane. The goal of factor analysis is to find a hyperplane which is a "best fit" to the
data in some sense, so it doesn't matter how the factor vectors which define this hyperplane
are chosen, as long as they are independent and lie in the hyperplane. We are free to specify
them as both orthogonal and normal ($\mathbf{F}_p \cdot \mathbf{F}_q = \delta_{pq}$) with no loss of generality. After a
suitable set of factors are found, they may also be arbitrarily rotated within the hyperplane, so
that any rotation of the factor vectors will define the same hyperplane, and also be a solution.
As a result, in the above example, in which the fitting hyperplane is two dimensional, if we
do not know beforehand that the two types of intelligence are uncorrelated, then we cannot
interpret the two factors as the two different types of intelligence. Even if they are
uncorrelated, we cannot tell which factor corresponds to verbal intelligence and which
corresponds to mathematical intelligence, or whether the factors are linear combinations of
both, without an outside argument.
The data vectors $\mathbf{z}_a$ have unit length. The correlation matrix for the data is given by $r_{ab} = \mathbf{z}_a \cdot \mathbf{z}_b$. The correlation matrix can be geometrically interpreted as the cosine of the angle between the two data vectors $\mathbf{z}_a$ and $\mathbf{z}_b$. The diagonal elements will clearly be 1's and the off-diagonal elements will have absolute values less than or equal to unity. The "reduced correlation matrix" is defined as

$$\hat{r}_{ab} = \hat{\mathbf{z}}_a \cdot \hat{\mathbf{z}}_b = \sum_{p} \ell_{a,p} \ell_{b,p} ,$$

where $\hat{\mathbf{z}}_a$ is the orthogonal projection of the data vector $\mathbf{z}_a$ onto the fitting hyperplane.
The goal of factor analysis is to choose the fitting hyperplane such that the reduced
correlation matrix reproduces the correlation matrix as nearly as possible, except for the
diagonal elements of the correlation matrix which are known to have unit value. In other
words, the goal is to reproduce as accurately as possible the cross-correlations in the data.
Specifically, for the fitting hyperplane, the mean square error in the off-diagonal components

$$\varepsilon^2 = \sum_{a \neq b} \left( r_{ab} - \hat{r}_{ab} \right)^2$$

is to be minimized, and this is accomplished by minimizing it with respect to a set of orthonormal factor vectors. It can be seen that

$$r_{ab} - \hat{r}_{ab} = \boldsymbol{\varepsilon}_a \cdot \boldsymbol{\varepsilon}_b .$$
The term on the right is just the covariance of the errors. In the model, the error covariance is
stated to be a diagonal matrix and so the above minimization problem will in fact yield a
"best fit" to the model: It will yield a sample estimate of the error covariance which has its
off-diagonal components minimized in the mean square sense. It can be seen that since the $\hat{\mathbf{z}}_a$ are orthogonal projections of the data vectors, their length will be less than or equal to the length of the projected data vector, which is unity. The squares of these lengths are just the diagonal elements of the reduced correlation matrix. These diagonal elements of the reduced correlation matrix are known as "communalities":

$$h_a^2 = |\hat{\mathbf{z}}_a|^2 = \sum_{p} \ell_{a,p}^2 .$$
Large values of the communalities will indicate that the fitting hyperplane is rather
accurately reproducing the correlation matrix. It should be noted that the mean values of the
factors must also be constrained to be zero, from which it follows that the mean values of the
errors will also be zero.
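The geometric picture can be checked numerically. The sketch below (simulated data; scikit-learn's maximum-likelihood estimator stands in for any fitting method, which is an assumption of convenience) verifies that the reduced correlation matrix $LL^{T}$ reproduces the off-diagonal correlations and that its diagonal elements, the communalities, stay at or below unity.

```python
# Sketch: reduced correlation matrix L L^T versus the full correlation matrix.
# Simulated data; the choice of estimator is illustrative.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
L_true = rng.normal(size=(8, 2))
X = rng.normal(size=(2000, 2)) @ L_true.T + rng.normal(size=(2000, 8))
Z = (X - X.mean(axis=0)) / X.std(axis=0)        # unit-length standardized data vectors

R = np.corrcoef(Z, rowvar=False)                # correlation matrix, 1's on the diagonal
L = FactorAnalysis(n_components=2).fit(Z).components_.T
R_reduced = L @ L.T                             # reduced correlation matrix

off = ~np.eye(8, dtype=bool)
print(np.abs(R[off] - R_reduced[off]).max())    # off-diagonals closely reproduced
print(np.round(np.diag(R_reduced), 2))          # communalities, at most 1 here
```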
Practical implementation
Types of factoring
Principal component analysis (PCA): PCA is a widely used method for factor extraction,
which is the first phase of EFA.[3] Factor weights are computed in order to extract the
maximum possible variance, with successive factoring continuing until there is no further
meaningful variance left.[3] The factor model must then be rotated for analysis.[3]
Canonical factor analysis, also called Rao's canonical factoring, is a different method of
computing the same model as PCA, which uses the principal axis method. Canonical factor
analysis seeks factors which have the highest canonical correlation with the observed
variables. Canonical factor analysis is unaffected by arbitrary rescaling of the data.
Common factor analysis, also called principal factor analysis (PFA) or principal axis
factoring (PAF), seeks the least number of factors which can account for the common
variance (correlation) of a set of variables.
Image factoring: based on the correlation matrix of predicted variables rather than actual
variables, where each variable is predicted from the others using multiple regression.
Alpha factoring: based on maximizing the reliability of factors, assuming variables are
randomly sampled from a universe of variables. All other methods assume cases to be
sampled and variables fixed.
Factor regression model: a combinatorial model of factor model and regression model; or
alternatively, it can be viewed as the hybrid factor model,[4] whose factors are partially
known.
Terminology
Factor loadings: The factor loadings, also called component loadings in PCA, are the correlation coefficients between the variables (rows) and factors (columns). Analogous to
Pearson's r, the squared factor loading is the percent of variance in that indicator variable
explained by the factor. To get the percent of variance in all the variables accounted for by
each factor, add the sum of the squared factor loadings for that factor (column) and divide by
the number of variables. (Note the number of variables equals the sum of their variances as
the variance of a standardized variable is 1.) This is the same as dividing the factor's
eigenvalue by the number of variables.
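As a worked illustration of the computation just described (the loading matrix below is invented), the proportion of total variance accounted for by each factor is the column sum of squared loadings, i.e. the factor's eigenvalue, divided by the number of variables:

```python
# Illustrative only: proportion of total variance explained by each factor,
# computed from a made-up (variables x factors) loading matrix.
import numpy as np

loadings = np.array([            # rows = standardized variables, columns = factors
    [0.8, 0.1],
    [0.7, 0.2],
    [0.6, 0.3],
    [0.1, 0.9],
    [0.2, 0.7],
])
n_vars = loadings.shape[0]       # total variance = number of standardized variables

eigenvalue_per_factor = (loadings ** 2).sum(axis=0)   # column sums of squared loadings
pct_variance = eigenvalue_per_factor / n_vars         # proportion of total variance
print(eigenvalue_per_factor, np.round(pct_variance, 3))
```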
Interpreting factor loadings: By one rule of thumb in confirmatory factor analysis, loadings
should be .7 or higher to confirm that independent variables identified a priori are represented
by a particular factor, on the rationale that the .7 level corresponds to about half of the
variance in the indicator being explained by the factor. However, the .7 standard is a high one
and real-life data may well not meet this criterion, which is why some researchers,
particularly for exploratory purposes, will use a lower level such as .4 for the central factor
and .25 for other factors. In any event, factor loadings must be interpreted in the light of
theory, not by arbitrary cutoff levels.
In oblique rotation, one gets both a pattern matrix and a structure matrix. The structure matrix
is simply the factor loading matrix as in orthogonal rotation, representing the variance in a
measured variable explained by a factor on both a unique and common contributions basis.
The pattern matrix, in contrast, contains coefficients which just represent unique
contributions. The more factors, the lower the pattern coefficients as a rule since there will be
more common contributions to variance explained. For oblique rotation, the researcher looks
at both the structure and pattern coefficients when attributing a label to a factor. Principles of
oblique rotation can be derived from both cross entropy and its dual entropy.[5]
Communality: The sum of the squared factor loadings for all factors for a given variable
(row) is the variance in that variable accounted for by all the factors, and this is called the
communality. The communality measures the percent of variance in a given variable
explained by all the factors jointly and may be interpreted as the reliability of the indicator.
Spurious solutions: If the communality exceeds 1.0, there is a spurious solution, which may reflect too small a sample or the extraction of too many or too few factors.
Uniqueness of a variable: The variability of a variable minus its communality.
Eigenvalues/characteristic roots: The eigenvalue for a given factor measures the variance
in all the variables which is accounted for by that factor. The ratio of eigenvalues is the ratio
of explanatory importance of the factors with respect to the variables. If a factor has a low
eigenvalue, then it is contributing little to the explanation of variances in the variables and
may be ignored as redundant with more important factors. Eigenvalues measure the amount
of variation in the total sample accounted for by each factor.
Extraction sums of squared loadings: Initial eigenvalues and eigenvalues after extraction
(listed by SPSS as "Extraction Sums of Squared Loadings") are the same for PCA extraction,
but for other extraction methods, eigenvalues after extraction will be lower than their initial
counterparts. SPSS also prints "Rotation Sums of Squared Loadings" and even for PCA,
these eigenvalues will differ from initial and extraction eigenvalues, though their total will be
the same.
Factor scores (also called component scores in PCA): are the scores of each case (row) on
each factor (column). To compute the factor score for a given case for a given factor, one
takes the case's standardized score on each variable, multiplies by the corresponding loadings
of the variable for the given factor, and sums these products. Computing factor scores allows
one to look for factor outliers. Also, factor scores may be used as variables in subsequent
modeling.
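A small sketch of the weighted-sum scoring rule just described (loadings and standardized scores are invented for illustration; statistical packages usually offer refined variants such as regression scores):

```python
# Sketch of the weighted-sum factor score: standardized scores times loadings, summed.
import numpy as np

Z = np.array([[ 1.2, -0.3,  0.5],       # standardized scores: rows = cases, columns = variables
              [-0.7,  0.9, -1.1]])
loadings = np.array([[0.8, 0.1],         # rows = variables, columns = factors
                     [0.6, 0.4],
                     [0.2, 0.9]])

factor_scores = Z @ loadings             # each case's score on each factor
print(factor_scores)                     # shape (2 cases, 2 factors)
```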
Criteria for determining the number of factors

Variance explained criteria: Some researchers simply keep enough factors to account for a large share of the variation; where the researcher's goal emphasizes parsimony (explaining variance with as few factors as possible), the criterion could be as low as 50%.
Scree plot: The Cattell scree test plots the components on the X-axis and the corresponding eigenvalues on the Y-axis. As one moves to the right, toward later components, the
eigenvalues drop. When the drop ceases and the curve makes an elbow toward less steep
decline, Cattell's scree test says to drop all further components after the one starting the
elbow. This rule is sometimes criticised for being amenable to researcher-controlled
"fudging". That is, as picking the "elbow" can be subjective because the curve has multiple
elbows or is a smooth curve, the researcher may be tempted to set the cut-off at the number of
factors desired by his or her research agenda.
Horn's Parallel Analysis (PA): A Monte-Carlo based simulation method that compares the
observed eigenvalues with those obtained from uncorrelated normal variables. A factor or
component is retained if the associated eigenvalue is bigger than the 95th percentile of the distribution of eigenvalues derived from the random data. PA is one of the most recommended rules for determining the number of components to retain,[citation needed] but only a few programs include
this option.[9]
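A minimal Monte Carlo sketch of the parallel-analysis rule described above (the 95th-percentile variant); the data set, sample size and number of simulations are illustrative assumptions:

```python
# Sketch of Horn's parallel analysis: retain components whose eigenvalues
# exceed the 95th percentile of eigenvalues from uncorrelated normal data.
import numpy as np

def parallel_analysis(X, n_sims=500, percentile=95, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    observed = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    random_eigs = np.empty((n_sims, p))
    for s in range(n_sims):
        R_rand = np.corrcoef(rng.normal(size=(n, p)), rowvar=False)
        random_eigs[s] = np.linalg.eigvalsh(R_rand)[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)
    k = 0                                    # count leading eigenvalues above threshold
    while k < p and observed[k] > threshold[k]:
        k += 1
    return k

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 10))
X[:, :5] += rng.normal(size=(300, 1))        # inject one common factor into 5 variables
print(parallel_analysis(X))                  # typically 1 for this toy data
```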
However, before dropping a factor below one's cutoff, the analyst(s) should create a data set
based on the factor loadings[clarification needed] and check the scores' correlation with any given
dependent variable(s) of interest. Scores based on a factor with a very small eigenvalue can
correlate strongly with dependent variables, in which case dropping such a factor from a
theoretical model may reduce its predictive validity.
Velicer's (1976) MAP test[10] involves a complete principal components analysis followed
by the examination of a series of matrices of partial correlations (p. 397). The squared
correlation for Step 0 (see Figure 4) is the average squared off-diagonal correlation for the
unpartialed correlation matrix. On Step 1, the first principal component and its associated
items are partialed out. Thereafter, the average squared off-diagonal correlation for the
subsequent correlation matrix is then computed for Step 1. On Step 2, the first two principal
components are partialed out and the resultant average squared off-diagonal correlation is
again computed. The computations are carried out for k minus one steps (k representing the
total number of variables in the matrix). Thereafter, all of the average squared correlations for
each step are lined up and the step number in the analyses that resulted in the lowest average
squared partial correlation determines the number of components or factors to retain (Velicer,
1976). By this method, components are maintained as long as the variance in the correlation
matrix represents systematic variance, as opposed to residual or error variance. Although
methodologically akin to principal components analysis, the MAP technique has been shown
to perform quite well in determining the number of factors to retain in multiple simulation
studies.[11][12][13] This procedure is made available through SPSS's user interface. See Courtney
(2013)[14] for guidance.
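A sketch of Velicer's procedure as described above (the original average squared partial correlation criterion; the correlation matrix is invented for illustration):

```python
# Sketch of Velicer's MAP test: partial out successive principal components and
# track the average squared off-diagonal partial correlation.
import numpy as np

def map_test(R):
    p = R.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0, None))   # PCA loadings

    off = ~np.eye(p, dtype=bool)
    avg_sq = [np.mean(R[off] ** 2)]                 # step 0: unpartialed correlations
    for m in range(1, p):                           # partial out the first m components
        C = R - loadings[:, :m] @ loadings[:, :m].T
        d = np.sqrt(np.clip(np.diag(C), 1e-12, None))
        partial = C / np.outer(d, d)
        avg_sq.append(np.mean(partial[off] ** 2))
    return int(np.argmin(avg_sq))                   # step with the smallest criterion

# Example with a made-up correlation matrix containing one strong common factor
R = np.full((6, 6), 0.5)
np.fill_diagonal(R, 1.0)
print(map_test(R))   # retains 1 component for this toy matrix
```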
Rotation methods
The unrotated output maximises the variance accounted for by the first and subsequent factors, while forcing the factors to be orthogonal. This data-compression comes at the cost of having most items load on the early factors, and usually, of having many items load substantially on more than one factor. Rotation serves to make the output more understandable, by seeking so-called "Simple Structure": a pattern of loadings where items load most strongly on one
factor, and much more weakly on the other factors. Rotations can be orthogonal or oblique
(allowing the factors to correlate).
Varimax rotation is an orthogonal rotation of the factor axes to maximize the variance of the
squared loadings of a factor (column) on all the variables (rows) in a factor matrix, which has
the effect of differentiating the original variables by extracted factor. Each factor will tend to
have either large or small loadings of any particular variable. A varimax solution yields
results which make it as easy as possible to identify each variable with a single factor. This is
the most common rotation option. However, the orthogonality (i.e., independence) of factors
is often an unrealistic assumption. Oblique rotations are inclusive of orthogonal rotation, and
for that reason, oblique rotations are a preferred method.[15]
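For concreteness, here is a sketch of the standard SVD-based iteration for the varimax criterion (the loading matrix is made up; statistical packages implement this rotation directly, so the function below is purely illustrative):

```python
# Sketch of varimax rotation via the usual SVD iteration (Kaiser criterion).
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    p, k = loadings.shape
    R = np.eye(k)                        # accumulated rotation matrix
    var_old = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag((L ** 2).sum(axis=0)))
        )
        R = u @ vt
        var_new = s.sum()
        if var_new < var_old * (1 + tol):   # criterion has stopped improving
            break
        var_old = var_new
    return loadings @ R, R

A = np.array([[0.7, 0.3], [0.8, 0.2], [0.6, 0.4],
              [0.2, 0.8], [0.3, 0.7], [0.1, 0.6]])
rotated, rot = varimax(A)
print(np.round(rotated, 2))              # rotated loadings show a simpler structure
```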
Quartimax rotation is an orthogonal alternative which minimizes the number of factors
needed to explain each variable. This type of rotation often generates a general factor on
which most variables are loaded to a high or medium degree. Such a factor structure is
usually not helpful to the research purpose.
Equimax rotation is a compromise between Varimax and Quartimax criteria.
Direct oblimin rotation is the standard method when one wishes a non-orthogonal (oblique) solution, that is, one in which the factors are allowed to be correlated. This will result in
higher eigenvalues but diminished interpretability of the factors. See below.[clarification needed]
Promax rotation is an alternative non-orthogonal (oblique) rotation method which is
computationally faster than the direct oblimin method and therefore is sometimes used for
very large datasets.
History
Charles Spearman pioneered the use of factor analysis in the field of psychology and is
sometimes credited with the invention of factor analysis. He discovered that school children's
scores on a wide variety of seemingly unrelated subjects were positively correlated, which led
him to postulate that a general mental ability, or g, underlies and shapes human cognitive
performance. His postulate now enjoys broad support in the field of intelligence research,
where it is known as the g theory.
Raymond Cattell expanded on Spearman's idea of a two-factor theory of intelligence after
performing his own tests and factor analysis. He used a multi-factor theory to explain
intelligence. Cattell's theory addressed alternate factors in intellectual development, including
motivation and psychology. Cattell also developed several mathematical methods for
adjusting psychometric graphs, such as his "scree" test and similarity coefficients. His
research led to the development of his theory of fluid and crystallized intelligence, as well as
his 16 Personality Factors theory of personality. Cattell was a strong advocate of factor
analysis and psychometrics. He believed that all theory should be derived from research,
which supports the continued use of empirical observation and objective testing to study
human intelligence.
Applications in psychology
Factor analysis is used to identify "factors" that explain a variety of results on different tests.
For example, intelligence research found that people who get a high score on a test of verbal
ability are also good on other tests that require verbal abilities. Researchers explained this by
using factor analysis to isolate one factor, often called crystallized intelligence or verbal
intelligence, which represents the degree to which someone is able to solve problems
involving verbal skills.
Factor analysis in psychology is most often associated with intelligence research. However, it
also has been used to find factors in a broad range of domains such as personality, attitudes,
beliefs, etc. It is linked to psychometrics, as it can assess the validity of an instrument by
finding if the instrument indeed measures the postulated factors.
Advantages
Identification of groups of inter-related variables, to see how they are related to each
other. For example, Carroll used factor analysis to build his Three Stratum Theory. He
found that a factor called "broad visual perception" relates to how good an individual
is at visual tasks. He also found a "broad auditory perception" factor, relating to
auditory task capability. Furthermore, he found a global factor, called "g" or general
intelligence, that relates to both "broad visual perception" and "broad auditory
perception". This means someone with a high "g" is likely to have both a high "visual
perception" capability and a high "auditory perception" capability, and that "g"
therefore explains a good part of why someone is good or bad in both of those
domains.
Disadvantages
Factor analysis can be only as good as the data allows. In psychology, where
researchers often have to rely on less valid and reliable measures such as self-reports,
this can be problematic.
Exploratory factor analysis versus principal components analysis

"[...] unique to each variable and error variance). That would, therefore, by definition, include only variance that is common among the variables."
(Brown 2009, Principal components analysis and exploratory factor analysis: Definitions, differences and choices)
For this reason, Brown (2009) recommends using factor analysis when theoretical ideas about
relationships between variables exist, whereas PCA should be used if the goal of the
researcher is to explore patterns in their data.
PCA results in principal components that account for a maximal amount of variance for observed variables; FA accounts for common variance in the data.[19]
PCA inserts ones on the diagonals of the correlation matrix; FA adjusts the diagonals
of the correlation matrix with the unique factors.[19]
PCA minimizes the sum of squared perpendicular distance to the component axis; FA
estimates factors which influence responses on observed variables.[19]
In PCA, the components yielded are uninterpretable, i.e. they do not represent
underlying constructs; in FA, the underlying constructs can be labeled and readily
interpreted, given an accurate model specification.[19]
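The contrast in diagonal treatment listed above can be illustrated with a small sketch (the correlation matrix is made up; squared multiple correlations are used as one common choice of initial communality estimate, which is an assumption of this sketch rather than a prescription of the article):

```python
# Sketch: PCA eigendecomposes the correlation matrix with 1's on the diagonal,
# while a principal-axis style FA replaces the diagonal with communality estimates.
import numpy as np

R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])            # made-up correlation matrix

pca_eigvals = np.linalg.eigvalsh(R)[::-1]             # PCA: total variance

R_reduced = R.copy()
communalities = 1 - 1 / np.diag(np.linalg.inv(R))     # squared multiple correlations
np.fill_diagonal(R_reduced, communalities)
fa_eigvals = np.linalg.eigvalsh(R_reduced)[::-1]      # FA: common variance only

print(np.round(pca_eigvals, 3), np.round(fa_eigvals, 3))
```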
Factor analysis in marketing

The basic steps are:

Identify the salient attributes consumers use to evaluate products in this category.
Use quantitative marketing research techniques (such as surveys) to collect data from
a sample of potential customers concerning their ratings of all the product attributes.
Input the data into a statistical program and run the factor analysis procedure. The
computer will yield a set of underlying attributes (or factors).
Use these factors to construct perceptual maps and other product positioning devices.
Information collection
The data collection stage is usually done by marketing research professionals. Survey
questions ask the respondent to rate a product sample or descriptions of product concepts on a
range of attributes. Anywhere from five to twenty attributes are chosen. They could include
things like: ease of use, weight, accuracy, durability, colourfulness, price, or size. The
attributes chosen will vary depending on the product being studied. The same question is
asked about all the products in the study. The data for multiple products is coded and input
into a statistical program such as R, SPSS, SAS, Stata, STATISTICA, JMP, and SYSTAT.
Analysis
The analysis will isolate the underlying factors that explain the data using a matrix of
associations.[22] Factor analysis is an interdependence technique. The complete set of
interdependent relationships is examined. There is no specification of dependent variables,
independent variables, or causality. Factor analysis assumes that all the rating data on
different attributes can be reduced down to a few important dimensions. This reduction is
possible because some attributes may be related to each other. The rating given to any one
attribute is partially the result of the influence of other attributes. The statistical algorithm
deconstructs the rating (called a raw score) into its various components, and reconstructs the
partial scores into underlying factor scores. The degree of correlation between the initial raw
score and the final factor score is called a factor loading.
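The last point can be illustrated with simulated ratings (a sketch with invented loadings): for standardized attribute scores generated from uncorrelated factors, the correlation between each attribute and a factor closely approximates that attribute's loading on the factor.

```python
# Sketch: correlations between attribute ratings and the simulated factors
# approximately recover the loading matrix. All values are invented.
import numpy as np

rng = np.random.default_rng(5)
loadings = np.array([[0.8, 0.1],
                     [0.7, 0.3],
                     [0.2, 0.9],
                     [0.1, 0.8]])
n = 50_000
factors = rng.normal(size=(n, 2))                      # uncorrelated factor scores
noise_sd = np.sqrt(1 - (loadings ** 2).sum(axis=1))    # keep each attribute at unit variance
ratings = factors @ loadings.T + rng.normal(size=(n, 4)) * noise_sd

estimated = np.array([[np.corrcoef(ratings[:, a], factors[:, f])[0, 1]
                       for f in range(2)] for a in range(4)])
print(np.round(estimated, 2))                          # close to the loading matrix above
```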
Advantages
Both objective and subjective attributes can be used provided the subjective attributes
can be converted into scores.
Factor analysis can identify latent dimensions or constructs that direct analysis may
not.
Disadvantages
If sets of observed variables are highly similar to each other and distinct from other
items, factor analysis will assign a single factor to them. This may obscure factors that
represent more interesting relationships.[clarification needed]
Factor analysis in physical sciences

In groundwater quality management, for example, the spatial distribution of different chemical parameters can be related to different possible sources; such data can be analyzed by R-mode factor analysis, and the location of possible sources can be suggested by contouring the factor scores.[24]
In geochemistry, different factors can correspond to different mineral associations, and thus
to mineralisation.[25]
Implementation
Factor analysis has been implemented in several statistical analysis programs since the 1980s:
SAS, BMDP and SPSS.[27] It is also implemented in the R programming language (with the
factanal function), OpenOpt, and the statistical software package Stata. Rotations are
implemented in the GPArotation R package.
See also
Design of experiments
Multilinear PCA
Perceptual mapping
Product management
Q methodology
Recommendation system
Varimax rotation
References
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
Velicer, W.F. (1976). "Determining the number of components from the matrix of partial correlations". Psychometrika 41: 321–327. doi:10.1007/bf02293557.
11.
12.
Garrido, L. E., Abad, F. J., & Ponsoda, V. (2012). "A new look at Horn's parallel analysis with ordinal variables". Psychological Methods. Advance online publication. doi:10.1037/a0030005.
13.
14.
15.
Russell, D.W. (December 2002). "In search of underlying dimensions: The use (and abuse) of factor analysis in Personality and Social Psychology Bulletin". Personality and Social Psychology Bulletin 28 (12): 1629–46. doi:10.1177/014616702237645.
16.
17.
18.
19.
20.
21.
22.
23.
24.
Love, D.; Hallbauer, D.K.; Amos, A.; Hranova, R.K. (2004). "Factor analysis as a tool in groundwater quality management: two southern African case studies". Physics and Chemistry of the Earth 29: 1135–43. doi:10.1016/j.pce.2004.09.027.
25.
26.
27.
Further reading
Child, Dennis (2006). The Essentials of Factor Analysis (3rd ed.). Continuum
International. ISBN 978-0-8264-8000-2.
Fabrigar, L.R.; Wegener, D.T.; MacCallum, R.C.; Strahan, E.J. (September 1999).
"Evaluating the use of exploratory factor analysis in psychological research".
Psychological Methods 4 (3): 272–299. doi:10.1037/1082-989X.4.3.272.
Jennrich, Robert I., "Rotation to Simple Loadings Using Component Loss Function: The
Oblique Case," Psychometrika, Vol. 71, No. 1, pp. 173-191, March 2006.
Katz, Jeffrey Owen, and Rohlf, F. James. Primary product functionplane: An oblique rotation
to simple structure. Multivariate Behavioral Research, April 1975, Vol. 10, pp. 219–232.
Katz, Jeffrey Owen, and Rohlf, F. James. Functionplane: A new approach to simple structure
rotation. Psychometrika, March 1974, Vol. 39, No. 1, pp. 37–51.
Katz, Jeffrey Owen, and Rohlf, F. James. Function-point cluster analysis. Systematic Zoology,
September 1973, Vol. 22, No. 3, pp. 295–301.
External links