Quality Exploratory Factor Analysis
Estimates in Exploratory
Item Factor Analysis
Abstract
This article proposes a comprehensive approach for assessing the quality and appro-
priateness of exploratory factor analysis solutions intended for item calibration and
individual scoring. Three groups of properties are assessed: (a) strength and replic-
ability of the factorial solution, (b) determinacy and accuracy of the individual score
estimates, and (c) closeness to unidimensionality in the case of multidimensional solu-
tions. Within each group, indices are considered for two types of factor-analytic
models: the linear model for continuous responses and the categorical-variable-
methodology model that treats the item scores as ordered-categorical. All the
indices proposed have been implemented in a noncommercial and widely known pro-
gram for exploratory factor analysis. The usefulness of the proposal is illustrated with
a real data example in the personality domain.
Keywords
exploratory item factor analysis, factor determinacy, marginal and conditional reliabil-
ity, EAP-estimation, H index, closeness to unidimensionality
1
Universitat Rovira i Virgili, Tarragona, Spain
Corresponding Author:
Pere J. Ferrando, Research Centre for Behavioral Assessment, Universitat Rovira i Virgili, Facultad de
Psicología, Carretera Valls s/n, 43007 Tarragona, Spain.
Email: perejoan.ferrando@urv.cat
2 Educational and Psychological Measurement
intended for SEMs in general (e.g., Ferrando & Lorenzo-Seva, 2017; Yuan, Chan,
Marcoulides, & Bentler, 2017). In principle, an acceptable fit is a basic requirement
for judging an EFA solution as appropriate. However, the sole reliance on this
requirement does not guarantee that the solution is a good one or is of practical use-
fulness, a point that is particularly relevant when EFA is used as a psychometric tool
for item calibration and individual scoring. Indeed, it is quite possible to obtain an
acceptable fit in a poorly determined solution based on low-quality items, which, in
turn, yields unreliable and indeterminate factor score estimates. Also, an essentially
unidimensional solution might require a multidimensional solution to be specified if
the model–data fit is to be acceptable. However, this solution might well consist of
additional minor and ill-defined factors of no substantive interest (e.g., Reise,
Bonifay, & Haviland, 2013).
Several complementary indices have been proposed for assessing the determinacy,
quality, and usefulness of psychometric FA solutions. Most of them have focused on
the unidimensional case (see, e.g., Hancock & Mueller, 2000), but recently,
Rodriguez, Reise, and Haviland (2016a, 2016b) have put forward a well-organized
proposal in the context of bifactor FA solutions (Reise, 2012). Also, most of the
indices are derived from the standard linear FA model. In this framework, most deri-
vations are quite direct because both the item scores and the factor score estimates
are linearly related to the common factors.
In practice, most item scores are discrete and bounded, so the linear FA model can
only be approximately correct (at best) when it is fitted to them. Our position is that the
linear approximation is reasonable when (a) the items have nonextreme distributions
and moderate discriminating power and (b) the number of categories is relatively
high (see Culpepper, 2013; Ferrando, 2009; Rhemtulla, Brosseau-Liard, & Savalei,
2012). When these conditions are not met, it is generally better to use categorical-
variable-methodology factor analysis (CVM-FA). CVM-FA is briefly summarized
below, but the most relevant point regarding the present developments is that the
relations between the factor(s) and the observed item scores are no longer linear.
The main aim of the present article is to propose a general approach for assessing
the quality, accuracy, and usefulness of a psychometric EFA application. The organi-
zation of our proposal closely follows that by Rodriguez et al. (2016a, 2016b).
However, there are important differences in both scope and content. First, we focus
mainly on multiple oblique solutions. Second, we consider measures based on both
linear FA and CVM-FA. Third, we are not concerned with sum test scores but only
with factor score estimates derived from calibration results. Finally, we propose only
simple indices that will be implemented in a well-known, noncommercial EFA pro-
gram, and which can be routinely used by the practitioner.
We shall now go on to summarize the starting position and scope of our proposal.
We consider full psychometric applications in which FA is used for both item cali-
bration and individual scoring. In this context, we consider that a good FA solution
not only has to reach an acceptable level of goodness of model-data fit but also has
Ferrando and Lorenzo-Seva 3
to provide (a) a clearly interpretable and strong pattern solution expected to be replicable
across samples and studies and (b) factor score estimates that are determinate
and accurate. The need for these strong requirements, however, should be qualified.
If only the assessment of the test structure is of interest, then only requirement (a) is
relevant. Requirement (b) is relevant in validity assessments based on estimated
scores and, above all, in individual assessment.

Table 1. Summary of the present proposal.

Property                                Linear FA                                  CVM-FA
Factor score determinacy and accuracy   FDI (regression based)                     FDI (EAP-based)
                                        Marginal reliability (regression based)    Marginal reliability (EAP-based)
                                                                                   Individual reliabilities
Construct replicability                 G-H                                        G-H latent
                                                                                   G-H observed
Closeness to unidimensionality          ECV-global                                 ECV-global
                                        I-ECV                                      I-ECV
                                        IREAL-global                               IREAL-global
                                        IREAL-item                                 IREAL-item
In our proposal, property (a) above—the strength and replicability of a pattern or
structure solution—is assessed by using extensions of Hancock and Mueller’s (2000)
H index, while property (b)—the determinacy and accuracy of the individual trait
estimates—is assessed by using different determinacy and reliability indices.
Many psychometric measures were initially intended to be essentially unidimen-
sional. However, as mentioned above, the EFAs of these measures in most cases
yield multidimensional solutions in which the factor structures and derived score
estimates do not reach the requirements discussed above. In this case it is quite rele-
vant to assess how close a multidimensional solution is to a unidimensional solution,
and we also propose indices to assess this issue. Overall, a summary of the present
proposal is given in Table 1.
Background
Consider a test, made up of n items, that measures m traits or common factors $\theta_k$. Let
$X_{ij}$ be the observed score of respondent i on item j. In the linear EFA model, $X_{ij}$ is
taken as a continuous-unbounded variable, and its expected score is given by

$$E(X_{ij} \mid \boldsymbol{\theta}_i) = \lambda_{j1}\theta_{i1} + \cdots + \lambda_{jk}\theta_{ik} + \cdots + \lambda_{jm}\theta_{im} \tag{1}$$

where the $\lambda$s are the factor loadings. Both the Xs and the factors, the $\theta$s, are scaled in a
z-score metric (mean 0 and variance 1), so the $\lambda$s are standardized loadings. For fixed
$\boldsymbol{\theta}$, the Xs become linearly independent and their conditional distributions are assumed
to be normal. Furthermore, the marginal distribution of $\boldsymbol{\theta}$ is also assumed to be normal.
The structural correlation matrix implied by Model (1) is

$$\mathbf{R} = \boldsymbol{\Lambda}\boldsymbol{\Phi}\boldsymbol{\Lambda}' + \boldsymbol{\Psi} \tag{2}$$

where $\boldsymbol{\Lambda}$ is the pattern loading matrix, $\boldsymbol{\Phi}$ is the interfactor correlation matrix, and $\boldsymbol{\Psi}$
is the diagonal matrix of the item residual variances.
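As an illustration of the correlation structure in Equation (2), the following minimal sketch builds the implied matrix from a pattern matrix and an interfactor correlation matrix. The function name `implied_correlation` and the numerical values are hypothetical and are not taken from the article; the residual variances are obtained as one minus the communalities, which assumes the standardized metric described above.

```python
import numpy as np

def implied_correlation(loadings, phi):
    """Return (R, Psi) with R = Lambda Phi Lambda' + Psi for standardized items."""
    loadings = np.asarray(loadings, dtype=float)
    phi = np.asarray(phi, dtype=float)
    common = loadings @ phi @ loadings.T       # common-variance part of R
    psi = np.diag(1.0 - np.diag(common))       # residual variances (1 - communality)
    return common + psi, psi

# Hypothetical 4-item, 2-factor oblique pattern (illustrative values only)
Lambda = np.array([[0.7, 0.0],
                   [0.6, 0.1],
                   [0.1, 0.6],
                   [0.0, 0.8]])
Phi = np.array([[1.0, 0.5],
                [0.5, 1.0]])

R, Psi = implied_correlation(Lambda, Phi)
print(np.round(R, 3))
```

By construction the diagonal of R is 1, so the off-diagonal entries can be read directly as model-implied interitem correlations.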
In the CVM-FA case, Model (1) is assumed to hold for latent response variables
$X^{*}$, normally distributed and scaled in a z-score metric, that underlie the observed
item scores

$$E(X^{*}_{ij} \mid \boldsymbol{\theta}_i) = \lambda_{j1}\theta_{i1} + \cdots + \lambda_{jk}\theta_{ik} + \cdots + \lambda_{jm}\theta_{im} \tag{3}$$

Furthermore, the observed scores are assumed to arise as a result of a step function
governed by $c - 1$ thresholds, $\tau_1, \ldots, \tau_{c-1}$, where c is the number of response
categories

$$X = i, \quad \tau_{i-1} < X^{*} < \tau_i; \qquad -\infty = \tau_0 < \tau_1 < \cdots < \tau_{c-1} < \tau_c = +\infty \tag{4}$$
Under the conditions described so far, the CVM-EFA implied correlation structure
is that of Equation (2) in which R is now the interitem polychoric correlation matrix.
With reparameterization, the CVM-EFA model becomes the item response theory
(IRT) multidimensional two-parameter normal-ogive model for the binary case and
the normal-ogive multidimensional graded response model for more than two ordered
categories (see, e.g., Ferrando & Lorenzo-Seva, 2013, or McDonald, 1999). Here we
shall mainly use the FA parameterization. However, some IRT results will also be
used when the CVM-based indices are derived.
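The step function in Equation (4) can be sketched as follows. This is only an illustration under stated assumptions: the helper name `categorize` and the threshold values are hypothetical, and categories are scored 1 to c.

```python
import numpy as np

def categorize(latent, thresholds):
    """Map latent responses X* to categories 1..c using c-1 ordered thresholds.

    A latent value falls in category i when tau_{i-1} < X* < tau_i,
    with tau_0 = -inf and tau_c = +inf.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    # searchsorted counts how many thresholds lie strictly below each latent value
    return np.searchsorted(thresholds, latent, side="left") + 1

# Hypothetical thresholds for a 4-category item
taus = [-1.0, 0.0, 1.0]
latent_values = np.array([-2.0, -0.5, 0.3, 1.7])
observed = categorize(latent_values, taus)
print(observed)
```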
In the conventional EFA scenario considered here, the linear and the CVM models
are fitted by using a random-regressors two-stage estimation approach (McDonald,
1982). In the first stage (calibration), the structural item parameters in (2) and (4) are
estimated. In the second stage (scoring), the item parameter estimates are taken as fixed
and known, and used to estimate the individual trait levels for each respondent. We
shall not consider here specific calibration procedures. However, in the scoring stage
we shall consider only Bayes Expected a Posteriori (EAP) score estimates. The main
reason for this choice is that these scores have the highest correlations with the common
factors they measure (e.g., Mulaik, 2010). This is a basic property in some of the indices
proposed here and considerably simplifies many of the present developments.
In the linear EFA model (1), and under the conditional and prior normality
assumptions discussed above, the EAP point factor score estimates are known in FA
terminology as "regression factor scores" and were originally proposed by
Thurstone (1935). The term "factor scores," however, might lead to confusion
between the latent factor scores (which are unknown) and the score estimates. For
this reason we shall continue using the terminology "factor score estimates."
Strictly speaking, however, it should be noted that the term "estimates" is not correct
in the usual statistical sense because there are no "true" parameter values to be
approximated by these estimates (see Maraun, 1996).
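In the general oblique case, regression factor score estimates can be obtained in
closed form as (Thurstone, 1935)

$$\hat{\boldsymbol{\theta}}_i = \boldsymbol{\Phi}\boldsymbol{\Lambda}'\mathbf{R}^{-1}\mathbf{X}_i = \mathbf{S}'\mathbf{R}^{-1}\mathbf{X}_i \tag{5}$$

where $\mathbf{X}_i$, of dimension $n \times 1$, is the vector containing the standardized item scores of
respondent i, and $\mathbf{S}$, of dimension $n \times m$, is the structure matrix whose elements are
the item–factor correlations.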
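A minimal sketch of the closed-form regression estimates, assuming the usual form in which the structure matrix is premultiplied by the inverse correlation matrix. The function name, the pattern values, and the score vectors are hypothetical; the model-implied R is used here for simplicity, whereas an application would typically use the sample correlation matrix.

```python
import numpy as np

def regression_factor_scores(X, loadings, phi):
    """Return an (N, m) array of regression factor score estimates.

    X is an (N, n) array of standardized item scores. Each row of the result is
    S' R^{-1} X_i, where S = Lambda Phi is the structure matrix.
    """
    X = np.asarray(X, dtype=float)
    loadings = np.asarray(loadings, dtype=float)
    phi = np.asarray(phi, dtype=float)
    S = loadings @ phi                                 # structure matrix (n x m)
    common = loadings @ phi @ loadings.T
    R = common + np.diag(1.0 - np.diag(common))        # model-implied correlations
    weights = np.linalg.solve(R, S)                    # R^{-1} S without explicit inverse
    return X @ weights

# Hypothetical pattern, interfactor correlations, and two respondents
Lambda = np.array([[0.7, 0.0], [0.6, 0.1], [0.1, 0.6], [0.0, 0.8]])
Phi = np.array([[1.0, 0.5], [0.5, 1.0]])
X = np.array([[1.0, 0.5, -0.2, 0.3],
              [0.0, 0.0, 0.0, 0.0]])

scores = regression_factor_scores(X, Lambda, Phi)
print(np.round(scores, 3))
```

Solving the linear system instead of inverting R is a numerical convenience only; the result is the same weight matrix.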
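In the case of CVM-EFA, the EAP point estimate of $\boldsymbol{\theta}_i$ for the kth dimension ($\theta_{ik}$)
cannot be obtained in closed form, and is obtained via the general definition:

$$\mathrm{EAP}(\theta_{ik}) = \frac{\int_{\boldsymbol{\theta}} \theta_k\, L(\mathbf{x}_i \mid \boldsymbol{\theta})\, g(\boldsymbol{\theta})\, d\boldsymbol{\theta}}{\int_{\boldsymbol{\theta}} L(\mathbf{x}_i \mid \boldsymbol{\theta})\, g(\boldsymbol{\theta})\, d\boldsymbol{\theta}} \tag{6}$$

where $g(\boldsymbol{\theta})$ is the joint multivariate prior density of $\boldsymbol{\theta}$ and L is the likelihood of $\mathbf{x}_i$,
which can be written generically as

$$L(\mathbf{x}_i \mid \boldsymbol{\theta}_i) = \prod_{j=1}^{n} P(X_{ij} \mid \boldsymbol{\theta}_i) \tag{7}$$

and the generic expression $P(X_j \mid \boldsymbol{\theta})$ denotes the conditional probability assigned to a
specific item score for fixed $\boldsymbol{\theta}$.
The diagonal elements of the posterior (error) covariance matrix are given by

$$\mathrm{PSD}^2(\theta_{ik}) = \frac{\int_{\boldsymbol{\theta}} (\theta_k - \mathrm{EAP}(\theta_{ik}))^2\, L(\mathbf{x}_i \mid \boldsymbol{\theta})\, g(\boldsymbol{\theta})\, d\boldsymbol{\theta}}{\int_{\boldsymbol{\theta}} L(\mathbf{x}_i \mid \boldsymbol{\theta})\, g(\boldsymbol{\theta})\, d\boldsymbol{\theta}} \tag{8}$$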
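The EAP and PSD definitions in Equations (6) and (8) can be approximated numerically. The sketch below is restricted to the unidimensional case, uses a simple equally spaced grid rather than any particular quadrature rule, and assumes a standard normal prior and a normal-ogive graded response function in the FA metric (residual variance equal to one minus the squared loading). All function names and parameter values are hypothetical and do not describe how any specific program computes these quantities.

```python
import math
import numpy as np

# Standard normal CDF built from math.erf so that only NumPy is required
_norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def category_prob(x, lam, taus, theta):
    """P(X = x | theta) for category x in 1..c under a normal-ogive graded model."""
    cuts = np.concatenate(([-np.inf], np.asarray(taus, dtype=float), [np.inf]))
    sd = math.sqrt(1.0 - lam ** 2)                     # residual SD of X*
    upper = _norm_cdf((cuts[x] - lam * theta) / sd)
    lower = _norm_cdf((cuts[x - 1] - lam * theta) / sd)
    return upper - lower

def eap_and_psd(responses, lams, taus_list, grid=None):
    """Return (EAP, PSD) for one respondent by summing over a grid of theta values."""
    if grid is None:
        grid = np.linspace(-6.0, 6.0, 241)
    prior = np.exp(-0.5 * grid ** 2)                   # unnormalized N(0, 1) prior
    like = np.ones_like(grid)
    for x, lam, taus in zip(responses, lams, taus_list):
        like = like * category_prob(x, lam, taus, grid)
    post = like * prior
    post = post / post.sum()                           # normalized posterior weights
    eap = float(np.sum(grid * post))
    psd = math.sqrt(float(np.sum((grid - eap) ** 2 * post)))
    return eap, psd

# Hypothetical three-item test with four response categories per item
lams = [0.7, 0.6, 0.8]
taus_list = [[-1.0, 0.0, 1.0]] * 3
print(eap_and_psd([4, 4, 3], lams, taus_list))
```

The multidimensional case follows the same logic with a grid over all factors and a multivariate normal prior, at a higher computational cost.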
(e.g., Beauducel, 2011). As mentioned above, the FDIs in (9) are the highest possible
of all the types of factor score estimates.
The unidimensional case is useful for understanding the determinants of the FDI
values. In this case, the FDI is obtained as

$$\rho(\hat{\theta}\theta) = (\boldsymbol{\lambda}'\mathbf{R}^{-1}\boldsymbol{\lambda})^{1/2} = \sqrt{\frac{1}{1 + \dfrac{1}{\sum_{j=1}^{n} \lambda_j^2/\sigma_{\varepsilon j}^2}}} \tag{10}$$

The term $\sum_{j=1}^{n} \lambda_j^2/\sigma_{\varepsilon j}^2$ in (10) is the (constant) amount of information in the linear
FA model (Ferrando, 2009; Mellenbergh, 1996). Clearly, the degree of determinacy
depends on (a) the number of items and (b) the signal-to-noise ratios between the squared
loadings and the residual variances. In the standardized modeling considered here,
the residual variances depend only on the loadings, so test length and the magnitude
of the loadings are the sole determinants of FDI.
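To make the roles of test length and loading size concrete, the following sketch evaluates the unidimensional FDI in two ways: through the information sum and through the matrix form. The function names and the loading values are hypothetical, and the residual variances are taken as one minus the squared loadings, in line with the standardized modeling assumed in the text.

```python
import math
import numpy as np

def fdi_unidimensional(lam):
    """FDI from the information sum: sqrt(1 / (1 + 1 / sum(lam^2 / (1 - lam^2))))."""
    lam = np.asarray(lam, dtype=float)
    info = np.sum(lam ** 2 / (1.0 - lam ** 2))
    return float(math.sqrt(1.0 / (1.0 + 1.0 / info)))

def fdi_matrix_form(lam):
    """FDI from the matrix expression (lambda' R^{-1} lambda)^{1/2}."""
    lam = np.asarray(lam, dtype=float)
    R = np.outer(lam, lam) + np.diag(1.0 - lam ** 2)
    return float(math.sqrt(lam @ np.linalg.solve(R, lam)))

# Hypothetical tests of increasing length with equal loadings of 0.5
for n_items in (6, 12, 24):
    lam = [0.5] * n_items
    print(n_items, round(fdi_unidimensional(lam), 3), round(fdi_matrix_form(lam), 3))
```

With equal loadings, doubling the number of items doubles the information sum, so the FDI rises toward 1 at a decreasing rate.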
The square of $\rho(\hat{\theta}\theta)$ is one of the standard definitions of a reliability coefficient
(Brown & Croudace, 2015; Mellenbergh, 1996). So, by this definition, the squared
values of the FDI estimates obtained in (9) are interpreted as the reliabilities of the
corresponding factor score estimates.
We turn now to CVM-FA. Reliability estimates based on the $\rho^2(\hat{\theta}\theta)$ definition (and,
therefore, on the corresponding FDI estimates) have received some attention in the
IRT literature (Green, Bock, Humphreys, Linn, & Reckase, 1984; Samejima, 1977).
To derive the FDIs in this case, we shall write the EAP estimated score for individual
i in factor k as

$$\hat{\theta}_{ik} = \theta_{ik} + \delta_{ik} \tag{11}$$

(see, e.g., Samejima, 1977). If the estimator in (11) is conditionally unbiased, then by
standard covariance algebra, it follows that the FDI could be obtained as

$$\rho(\hat{\theta}_k\theta_k) = \sqrt{\frac{\mathrm{Var}(\hat{\theta}_k) - \mathrm{Var}(\delta_k)}{\mathrm{Var}(\hat{\theta}_k)}} = \sqrt{\frac{\mathrm{Var}(\hat{\theta}_k) - E(\mathrm{PSD}^2(\theta_{ik}))}{\mathrm{Var}(\hat{\theta}_k)}} \tag{12}$$

and its squared value is the corresponding reliability estimate. This estimate is an
empirical estimate (Brown & Croudace, 2015) which uses (a) the variance of the
EAP scores and (b) the average of the squared PSDs, both obtained in the calibration
sample. However, unlike the linear estimate (9), which is correct for any number of
items, (12) is only asymptotically correct, because, as discussed above, the estimator
(11) is only asymptotically unbiased. As discussed below, in very short tests, we
expect (12) to be somewhat upwardly biased.
A conditional or individual reliability estimate (Green et al., 1984; Raju, Price,
Oshima, & Nering, 2007) can further be obtained as

$$\rho(\hat{\theta}_{ik}) = \frac{\mathrm{Var}(\hat{\theta}_k) - \mathrm{PSD}^2(\theta_{ik})}{\mathrm{Var}(\hat{\theta}_k)} \tag{13}$$

So the marginal reliability estimate (i.e., the squared value in 12) is the average of
the individual estimates in (13). We propose to obtain the distribution of these individual
estimates as auxiliary information that complements the information provided
by the marginal estimate. To see the interest of this additional measure, consider that
an acceptable marginal reliability estimate is still compatible with the presence of a
non-negligible proportion of respondents that cannot be accurately measured.
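The empirical estimates in Equations (12) and (13) only require the EAP scores and their PSDs in the calibration sample. The sketch below is illustrative: the function name and the five score/PSD pairs are hypothetical, and the variance is computed with divisor N, which is an assumption because the text does not specify the divisor.

```python
import numpy as np

def empirical_reliabilities(eap_scores, psds):
    """Return (FDI, marginal reliability, individual reliabilities) for one factor."""
    eap = np.asarray(eap_scores, dtype=float)
    psd = np.asarray(psds, dtype=float)
    var_eap = eap.var()                                   # divisor N (assumption)
    individual = (var_eap - psd ** 2) / var_eap           # one value per respondent
    marginal = (var_eap - np.mean(psd ** 2)) / var_eap    # average of the individual values
    fdi = float(np.sqrt(marginal))
    return fdi, float(marginal), individual

# Hypothetical EAP scores and PSDs for five respondents
eap_scores = [-1.2, -0.4, 0.1, 0.6, 1.3]
psds = [0.45, 0.35, 0.30, 0.35, 0.50]

fdi, marginal, individual = empirical_reliabilities(eap_scores, psds)
print(round(fdi, 3), round(marginal, 3), np.round(individual, 3))
```

Inspecting the spread of the individual values, and not just their mean, is what allows respondents with poor measurement precision to be identified.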
To close this section, we note that Beauducel and Hilger (in press) considered a
scoring schema that is halfway between (5) and (6), and derived unbiased FDIs (and
marginal reliability estimates) based on the resulting score estimates. Specifically,
they considered obtaining linear regression estimates of the form (5) in which S and
R were based on the CVM but Xi contained the observed categorical scores. We shall
not consider this approach in the present proposal, but it would be of interest in the
future to assess how the resulting FDI and reliability estimates behave in comparison
to those proposed here.
Construct Replicability
Hancock and Mueller (2000) and Hancock (2001) proposed an index to assess the
extent to which a factor is well represented by a set of items. This general concept
comprises several properties (mainly, the quality of the items as indicators of the fac-
tor, and the replicability of the factor solution across studies). Hancock and Mueller
(2000) labeled their index H, and used the term ‘‘construct reliability.’’Rodriguez
et al. (2016b) renamed it as ‘‘construct replicability,’’ which is the name we shall
use here. The initial proposal considered only the unidimensional case, and, using
the present notation, can be written as
1
H = (l0 R1 l) = 1
ð14Þ
1+ Pn l2
j
s2
j=1 ej
Essentially (14) measures the maximal proportion of the variance of the factor that
can be accounted for by its indicators. So, H is the squared correlation between the
factor and an optimal composite of its indicator scores, or in other words, the squared
multiple correlation between the factor and its indicators. We note that (14) is the
square of the FDI measure in (10) and, therefore, the reliability estimate we propose
here for the unidimensional linear model. This result is only to be expected given that
the regression factor score estimates are the optimal linear composite that maximizes
the multiple correlation.
In the general oblique case, the multiple correlations between the factors and their
indicators are obtained as the squared diagonal elements of the matrix (9) (e.g.,
Mulaik, 2010, Equation 13.16)
G H = diag FL0 R1 LF) = diag S0 R1 S ð15Þ
reference value because the factor was well represented. Rodriguez et al. (2016b)
raised it to 0.80. For the G-H indices proposed here, the 0.80 cutoff also seems to be
reasonable.
In summary, if the G-H conceptualization is accepted, it follows that, in the linear
model and when regression factor score estimates are considered, the measures of
determinacy, reliability, and construct replicability are all obtained from the same
basic expression. So, the squared FDIs can be interpreted as both the reliabilities of
the regression factor score estimates and the squared multiple correlations between
the item scores and the common factors (i.e., generalized H measures).
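The common expression behind determinacy, reliability, and construct replicability in the linear model can be sketched as follows. The function name and the loading values are hypothetical, and the model-implied correlation matrix is used in place of a sample matrix for simplicity.

```python
import numpy as np

def generalized_h(loadings, phi):
    """Return the G-H values, i.e., the diagonal of S' R^{-1} S with S = Lambda Phi."""
    loadings = np.asarray(loadings, dtype=float)
    phi = np.asarray(phi, dtype=float)
    S = loadings @ phi                                  # structure matrix
    common = loadings @ phi @ loadings.T
    R = common + np.diag(1.0 - np.diag(common))         # model-implied correlations
    return np.diag(S.T @ np.linalg.solve(R, S))

# Hypothetical two-factor oblique pattern
Lambda = np.array([[0.7, 0.0], [0.6, 0.1], [0.7, 0.0],
                   [0.1, 0.6], [0.0, 0.8], [0.0, 0.7]])
Phi = np.array([[1.0, 0.5], [0.5, 1.0]])

gh = generalized_h(Lambda, Phi)
fdi = np.sqrt(gh)          # in the linear model the FDIs are the square roots of G-H
print(np.round(gh, 3), np.round(fdi, 3))
```

In this sketch the G-H values double as the reliabilities of the regression factor score estimates, which mirrors the equivalence described in the paragraph above for the linear case only.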
We turn now to the CVM-FA where the relations are more complex. Consider first
that (15) is computed by using (a) the calibration estimates obtained from fitting a
CVM-FA solution and (b) the interitem polychoric correlation matrix. The diagonal
elements of (15) now become the squared multiple correlations between the factors and the
continuous latent response variables that underlie the observed item scores. We shall
label the index proposed so far as G-H-latent.
The multiple correlations between the factors and the observed item scores are
necessarily lower than the corresponding G-H-latent values due to (a) the nonlinearity
of the item-factor regressions and (b) attenuation for coarse grouping. They can
be predicted from the CVM-FA solution as follows. First, R can be directly estimated
via the product-moment interitem correlation matrix. Second, the elements of
S in G-H-latent are the (polyserial) item–factor correlations. So, the product-moment
item–factor correlations can be predicted from the elements of S by using the relation
between the polyserial and the product-moment correlation (e.g., Olsson, Drasgow,
& Dorans, 1982)

$$\rho(X_j, \theta_k) = \frac{\rho(X_j^{*}, \theta_k)\sum_{u=1}^{c-1}\phi(\tau_u)}{\sqrt{\mathrm{Var}(X_j)}} \tag{16}$$

where $\phi$ is the ordinate of the standard normal distribution. The resulting measure is
denoted by G-H-observed and, when compared to G-H-latent, quantifies the predicted
loss of information and construct replicability that will occur if the item scores
are treated as continuous-unbounded variables and fitted with the linear EFA model.
We believe that this information is relevant to deciding which model is the most reasonable
for a given analysis: if the differences between G-H-latent and G-H-observed
are minor, the simpler linear model could be considered.
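The conversion in Equation (16) can be illustrated with a short sketch. Two assumptions are made here that are not stated in the text: the item is scored with consecutive integers 1 to c, and the item variance is the model-implied variance computed from the thresholds under normality (a sample variance could be used instead). The function names and values are hypothetical.

```python
import math
import numpy as np

def normal_pdf(z):
    """Ordinate of the standard normal distribution."""
    return np.exp(-0.5 * np.asarray(z, dtype=float) ** 2) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal CDF for a scalar argument."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def predicted_product_moment(polyserial, thresholds):
    """Predict the product-moment item-factor correlation from the polyserial one."""
    taus = np.asarray(thresholds, dtype=float)
    cdf = np.array([0.0] + [normal_cdf(t) for t in taus] + [1.0])
    probs = np.diff(cdf)                                # category probabilities
    scores = np.arange(1, len(probs) + 1)               # integer scoring 1..c (assumption)
    mean = np.sum(scores * probs)
    sd = math.sqrt(np.sum(scores ** 2 * probs) - mean ** 2)
    return float(polyserial * np.sum(normal_pdf(taus)) / sd)

# Hypothetical item: polyserial correlation 0.70 with its factor
print(round(predicted_product_moment(0.70, [0.0]), 3))                    # 2 categories
print(round(predicted_product_moment(0.70, [-1.5, -0.5, 0.5, 1.5]), 3))   # 5 categories
```

The attenuation is stronger for the two-category item than for the five-category item, which is the coarse-grouping effect mentioned in the text.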
Finally, we should point out that the equivalence between the reliabilities of the
factor score estimates and the generalized H measures does not hold in the CVM
case. G-H-latent can be viewed as the hypothetical reliability that the regression
scores would have in model (3) if the underlying latent response variables were available.
In practice, however, they are not available, and the EAP estimates in (6) are obtained from the
pattern of observed scores, as specified in Equation (7).
Closeness to Unidimensionality
A review of many reported oblique solutions suggests that they are compatible with
an essentially unidimensional solution (Reise, Cook, & Moore, 2015; Reise et al.,
2013). Furthermore, according to the proposal made here, for an oblique solution to
be justifiable and useful, all the proposed factors have to be well defined and replic-
able (in terms of G-H) and lead to determinate and reliable factor score estimates.
We suspect that this is not the case in most applications. So, given these results, it
seems necessary to assess the extent to which an oblique EFA solution is close to
unidimensionality, and interpretable in these terms. In this assessment, it should also
be considered that forcing a unidimensional solution on data that is clearly multidi-
mensional can lead to biased results in which the single fitted factor does not reflect
a unitary construct but is, essentially, a weighted composite of the different factors.
A simple and informative index that assesses closeness to unidimensionality has
been proposed in slightly different variants for the linear FA model (see, e.g.,
Rodriguez et al., 2016a, 2016b). Here we propose using the version by ten Berge and
Kiers (1991) based on minimum rank factor analysis (MRFA). For a unidimensional
solution, MRFA produces a reduced correlation matrix (with communalities in the
main diagonal) so that the sum of its eigenvalues except the first one is the smallest
possible. Conceptually this is equivalent to obtaining a canonical factor solution
(e.g., Harman, 1962) in $n - 1$ factors in which the sum of the squared loadings on
the first factor is the maximum possible and the sum of the squared loadings on the
remaining $n - 2$ factors is the smallest possible. A natural index in this setting is
the explained common variance (ECV) index, which in terms of factor loadings is
given by

$$\mathrm{ECV} = \frac{\sum_j \lambda_{j1}^2}{\sum_j \lambda_{j1}^2 + \sum_j \lambda_{j2}^2 + \cdots + \sum_j \lambda_{j,n-1}^2} \tag{17}$$

Stucky, Thissen, and Edelen (2013) proposed that ECV should also be computed
at the single item level j, and that the resulting index be labelled I-ECV. Here we
propose that this index (as derived from 17 in our case) also be used as an auxiliary
measure useful for detecting the items that most contribute to the departure from
unidimensionality.
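Given a canonical loading matrix, the ECV and I-ECV computations reduce to ratios of squared loadings. The sketch below does not perform MRFA itself; it assumes that a loading matrix with the dominant factor in the first column is already available. The function names and the loading values are hypothetical.

```python
import numpy as np

def ecv(loadings):
    """Explained common variance: squared loadings on factor 1 over all squared loadings."""
    sq = np.asarray(loadings, dtype=float) ** 2
    return float(sq[:, 0].sum() / sq.sum())

def item_ecv(loadings):
    """I-ECV: the same ratio computed separately for each item (row)."""
    sq = np.asarray(loadings, dtype=float) ** 2
    return sq[:, 0] / sq.sum(axis=1)

# Hypothetical canonical loadings (first column = dominant factor)
A = np.array([[0.7, 0.1],
              [0.6, 0.2],
              [0.5, 0.3]])

print(round(ecv(A), 3))
print(np.round(item_ecv(A), 3))
```

In this example the third item has the lowest I-ECV, so it would be the first candidate to examine as a source of departure from unidimensionality.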
Essentially, (17) measures the relative magnitude of the squared loadings on the
first MRFA factor with respect to the magnitude of the full set of squared loadings
on the complete MRFA solution in $n - 1$ factors. So, in principle, the index can be
directly computed from the linear and CVM solutions (although the interpretation in
terms of explained common variance is different). We also note that the index can be
computed with no need to specify a particular alternative solution in terms of struc-
ture or number of factors. Finally, regarding cutoff values, it has been proposed that
ECV cutoff values should be in the range 0.70 to 0.85 if it is to be concluded that a
Implementation
All the indices proposed in this article have been implemented in version 10.5 of the
program FACTOR (Ferrando & Lorenzo-Seva, 2017), a well-known, free exploratory
factor analysis program that can be downloaded at http://psico.fcep.urv.cat/utilitats/
factor/. Indices of determinacy, reliability, and construct replicability are provided as
default output for both linear and CVM solutions. Indices of closeness to unidimen-
sionality are provided as default when a unidimensional solution is requested, and are
optional otherwise. All the proposed indices are relatively simple to implement.
However, as Grice (2001) noted in the context of factor score assessment, no com-
mercial or widely available programs appear to provide this type of index.
Hancock and Mueller (2000) considered that it was important to report confidence
intervals (CIs) for H, and proposed that they be derived with bootstrap resampling.
We believe that this point is also relevant for all the indices proposed here. In princi-
ple, CIs for some of the indices based on linear FA, mainly FDIs, marginal reliabil-
ities, and G-H indices, could be analytically approximated by using the delta method
(see, e.g., Raykov, 2002). For the remaining indices, however, an analytical treatment
appears to be very complex. For this reason, we decided to implement a unified treat-
ment in FACTOR in which bootstrap-based confidence intervals are available for all
the indices proposed here. The 90%, 95%, and 99% confidence intervals available
are (a) percentile intervals and (b) bias-corrected percentile intervals. The number of
bootstrap samples can be defined by the user in the range [500, 3,000].
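The percentile interval mentioned above can be sketched generically as follows. This is a plain nonparametric bootstrap over respondents and is not a description of how FACTOR implements its intervals; the bias-corrected variant is not shown. The function name, the data, and the statistic are hypothetical, and in an application the statistic would refit the factor model and return one of the proposed indices.

```python
import numpy as np

def bootstrap_percentile_ci(data, statistic, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap CI for statistic(data), resampling rows with replacement."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample respondents with replacement
        estimates[b] = statistic(data[idx])
    alpha = 1.0 - level
    lower, upper = np.quantile(estimates, [alpha / 2.0, 1.0 - alpha / 2.0])
    return float(lower), float(upper)

# Hypothetical data and statistic (the mean stands in for any index)
sample = np.arange(100, dtype=float)
lower, upper = bootstrap_percentile_ci(sample, np.mean, n_boot=500, level=0.95)
print(round(lower, 2), round(upper, 2))
```

Because every replicate requires the full computation of the index, the cost grows linearly with the number of bootstrap samples.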
Illustrative Example
The real-data study in this section is based on a Spanish version of Buss and Perry’s
(1992) aggression questionnaire (AQ; Vigil-Colet, Lorenzo-Seva, & Morales-Vives,
2015). The AQ is a multidimensional questionnaire made up of 5-point Likert-type
items intended to measure different related dimensions of aggression. For the present
illustration, we chose a subset of 20 items that were expected to measure two factors:
physical aggression (PA; 7 items) and nonphysical aggression (NPA; 13 items). The
indicators, however, were not expected to be so factorially pure that an independent-
cluster solution could be specified. So, an unrestricted solution was fitted instead.
The questionnaire was administered to a sample of 538 secondary school students
aged between 12 and 17 years. Data were kindly supplied by Dr. A. Vigil-Colet.
Descriptive analysis of the item scores showed that the distributions were gener-
ally not extreme and that linear EFA could be considered a reasonable approach. To
illustrate all the procedures proposed here, both linear EFA and CVM-EFA solutions
were fitted to the data. In both cases a two-factor solution was fitted by using robust
unweighted least squares estimation as implemented in FACTOR.
Goodness of model–data fit was assessed by using both the conventional approach
and the recent proposal by Yuan et al. (2017) based on equivalence testing. So far,
the latter approach has only been fully developed for the root mean square error of
approximation (RMSEA) and comparative fit index (CFI) measures based on the lin-
ear model, so we have only used it in this case.
For both models, goodness of fit results are in the upper panel of Table 2. The
RMSEA and CFI measures are based on the second-order (mean and variance) cor-
rected chi-square statistic proposed by Asparouhov and Muthen (2010). Overall, the
fit based on the conventional approach can be considered to be acceptable and quite
similar in both solutions. Equivalence-testing results for linear FA also suggest that
the fit of the model is acceptable.
The canonical pattern was then rotated using the Promin criterion (Lorenzo-Seva,
1999), and the solutions are in Table 2 with the dominant loadings boldfaced. The estimated
interfactor correlations were $\phi$ = 0.50 (linear) and $\phi$ = 0.53 (CVM).
As Table 2 shows, none of the solutions have an independent-cluster structure.
However, they are quite clear: Bentler's simplicity indices are 0.997 in both linear
and CVM, and the overall congruence between the linear and the CVM solution is
0.999. Overall, (a) the factors can be well distinguished, (b) the solution agrees with
the a priori hypothesis, and (c) the linear and CVM patterns are very similar.
The calibration estimates were taken as fixed and known, and EAP factor score
estimates, and PSDs, were obtained. In the CVM case, the prior for $\boldsymbol{\theta}$ was specified
as bivariate standard normal with correlations of 0.50 (linear) and 0.53 (CVM) (see
Ferrando & Lorenzo-Seva, 2016).
The results about the determinacy and accuracy of the EAP scores are in the upper
rows of Table 3. For linear FA, the determinacies are acceptable for both factors, suggesting
that the factor score estimates reflect quite univocally the latent levels they
attempt to estimate. And the estimated reliabilities are appropriate for most
applications, although perhaps a little low for accurate individual assessment. As for
the CVM-FA, the determinacy and reliability results are virtually the same as the
ones obtained from the linear model for the second factor, but are clearly higher for
the longer NPA factor.

Table 2 (upper panel). Goodness-of-fit results.

Model    RMSEA (CI)          RMSEA (equivalence testing)   CFI   CFI (equivalence testing)   GFI   RMSR
Linear   .054 (.051; .055)   .062 (fair)                   .97   .953 (close)                .98   .052
CVM      .056 (.051; .057)                                 .97                               .98   .058

Note. RMSEA = root mean square error of approximation; CI = confidence interval; CFI = comparative
fit index; GFI = goodness of fit index; RMSR = root mean square residual; FA = factor analysis; CVM-FA
= categorical-variable-methodology factor analysis. Values in boldface indicate the dominant loading.
Table 4 shows the within-and-between-model (linear vs. CVM) correlations
between the factor score estimates. Furthermore, the correlations corresponding to
the same factor measured with the different models were corrected for unreliability
by using the marginal reliability estimates in Table 3, which are again displayed
in the main diagonal of Table 4. Results can be summarized as follows. First, the
disattenuated correlations are 1 for both factors, which again suggests that the
linear-based and the CVM-based score estimates measure the same factors. Second,
the interfactor correlation estimates, both within and between models, agree quite
well with the structural interfactor correlation estimates reported above. This second
result provides more support for the FDI results above that the factor score estimates
are good proxies for the corresponding latent factor scores.

Table 3. Determinacy, reliability, and construct replicability indices (confidence intervals in parentheses).

                       Linear FA                               CVM-FA
Index                  F1                  F2                  F1                          F2
FDI                    .921 (.908; .935)   .936 (.924; .946)   .954 (.939; .966)           .936 (.897; .965)
Marginal reliability   .849 (.824; .864)   .876 (.853; .894)   .912 (.882; .934)           .876 (.805; .931)
G-H                    .849 (.824; .864)   .876 (.853; .894)   Latent: .876 (.848; .889)   Latent: .919 (.899; .930)
                                                               Observed:                   Observed:

Table 4. Correlations Among the Factor Score Estimates With the Marginal Reliabilities in
the Main Diagonal.

                          $\hat{\theta}_{1L}$   $\hat{\theta}_{2L}$   $\hat{\theta}_{1CVM}$   $\hat{\theta}_{2CVM}$
$\hat{\theta}_{1L}$       .849
$\hat{\theta}_{2L}$       .558                  .876
$\hat{\theta}_{1CVM}$     .951                  .503                  .912
$\hat{\theta}_{2CVM}$     .574                  .945                  .556                    .876
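The correction for unreliability applied to the same-factor correlations can be checked with the values reported in Table 4, assuming that the classical correction for attenuation was used (the text does not give the formula). The function name is hypothetical.

```python
import math

def disattenuate(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Same-factor, between-model correlations and marginal reliabilities from Table 4
factor1 = disattenuate(0.951, 0.849, 0.912)   # linear F1 with CVM F1
factor2 = disattenuate(0.945, 0.876, 0.876)   # linear F2 with CVM F2
print(round(factor1, 2), round(factor2, 2))
```

Under this assumption both corrected values come out slightly above 1, which would be consistent with the reported value of 1 if out-of-range results are truncated at the upper bound.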
Figure 1 shows the distribution of the individual reliabilities for both factors in the
CVM-FA. It seems clear that Factor 1 not only has a higher marginal reliability but is
also able to accurately measure most of the respondents. In contrast, although the esti-
mated marginal reliability of Factor 2 is only a bit lower, it will provide poor mea-
surement precision for many respondents.
Construct replicability indices and the confidence intervals are in the lower rows
of Table 3. In all cases they are acceptable, which suggests that in both linear and
CVM solutions both factors are well defined and so the solution is expected to remain
stable across studies. In the linear case, the G-H values are the same as the reliability
estimates, as discussed above, and they reasonably agree with the G-H-observed values
predicted from the CVM-FA. As expected, the G-H-latent values are the highest
for both factors, reflecting the result that the factors are better defined by the
underlying responses than by the observed item scores. Finally, we note that the
CVM-based marginal reliability for the PA factor is below the corresponding G-H
value, which seems reasonable. However, this is not the case for the NPA factor,
which suggests that the marginal reliability estimate for this factor is possibly a little
too optimistic.

Figure 1. Distribution of the individual reliability estimates of the F1 and F2 scores, CVM-FA.
Discussion
The main purpose of this article was to propose and implement a series of auxiliary
indices designed to judge the quality and usefulness of FA solutions intended for psy-
chometric applications. Our idea was to propose simple indices that could be provided
as the standard output of an FA program requiring minimal specifications by the user.
Overall, we believe that this purpose has been achieved, and that the proposal is poten-
tially useful for practitioners. However, some issues deserve further discussion.
The first of these issues is the relevance and scope of the contribution. For
decades, the dominant view regarding item FA has been that confirmatory FA is the
way to go, while EFA is at best a rough precursor that can be useful only in the pre-
liminary stages of the analysis (see, e.g., Ferrando & Lorenzo-Seva, 2017). In princi-
ple, we do not agree with this view and, like Cattell (1986), believe that most items
are inherently complex and that unrestricted FA is the most natural and flexible
approach for calibrating and scoring them. This is not an isolated opinion. In recent
times there has been growing discontent among practitioners regarding the unnecessarily
strong restrictions of strict confirmatory solutions, and more flexible methods have
been on the rise (e.g., Marsh, Morin, Parker, & Kaur, 2014).
With regard to the scope, we believe it is considerable. In the illustrative example,
we have purposely considered the less restricted form of EFA based on analytical
rotations. However, the procedures proposed here can also be used with more
restricted approaches based on Procrustes transformations against fully specified or
semispecified targets, which are also available in FACTOR (e.g., Ferrando &
Lorenzo-Seva, 2013).
On the practical level, we have implemented CIs based on bootstrap resampling
for all the proposed indices. They seem to work well but are rather time-consuming.
So, perhaps the best option would be to obtain approximate CIs analytically for the
indices for which this is feasible, and to use bootstrap resampling for the remaining
ones. This point is left for future research.
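The resampling idea can be sketched generically as a percentile bootstrap over respondents. This is only a sketch under stated assumptions: the text does not specify which type of bootstrap interval is used in FACTOR, the function `percentile_bootstrap_ci` and the toy data are hypothetical, and the placeholder statistic (a mean) stands in for an index that, in a real application, would be recomputed by refitting the factor model to each resample. That refitting step is what makes the procedure time-consuming.

```python
import random


def percentile_bootstrap_ci(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for `statistic`, resampling cases with replacement.

    Assumption: `data` is a list of cases (e.g., respondents) and
    `statistic` maps a list of cases to a single number.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the estimates.
    lower_idx = int((alpha / 2.0) * n_boot)
    upper_idx = min(n_boot - 1, int((1.0 - alpha / 2.0) * n_boot))
    return estimates[lower_idx], estimates[upper_idx]


def mean(values):
    return sum(values) / len(values)


# Toy data, for illustration only.
toy_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
lower, upper = percentile_bootstrap_ci(toy_scores, mean, n_boot=500, seed=1)
print(lower, upper)
```

An analytic interval, where one is available, replaces the whole loop with a single closed-form computation, which is the trade-off the paragraph above describes.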
On the methodological level, the proposal made here is mostly based on results
that are known in the psychometric or statistical literature. However, the novelty is
that this is the first time so many of these results have been used in the present
context. We are not aware of generalized H indices having been used in oblique solutions,
or of their being interpreted differently in the linear and the CVM models.
To summarize, we acknowledge that the proposal has its share of limitations and
points that deserve further study. While factor indeterminacy and reliability estimates
are correct for any number of items in the linear case, the empirical estimates in the
CVM case are only asymptotically correct, and probably biased in short tests.
Furthermore, this bias might well depend on the estimation method that is chosen for
calibrating the items (see Beauducel & Hilger, 2017). So, the potential improvement
of these estimates is an issue that warrants further research. More generally, if the
procedures proposed here are to be used correctly, sensible and well-established
reference values need to be provided for all the indices. This point is particularly
relevant for the IREAL index, for which only a rough rule of thumb has been tentatively
proposed as a cutoff. Overall, further intensive research based on both simulation
and real data, as well as further statistical developments, is needed if reference
values are to be improved.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship,
and/or publication of this article: This research was supported by a grant from the Catalan
Ministry of Universities, Research and the Information Society (2014 SGR 73) and by a grant
from the Spanish Ministry of Economy and Competitivity (PSI2014-52884-P).
References
Asparouhov, T., & Muthen, B. (2010, May 3). Simple second order chi-square correction
(Unpublished manuscript). Retrieved from https://www.statmodel.com/download/
WLSMV_new_chi21.pdf
Beauducel, A. (2011). Indeterminacy of factor scores in slightly misspecified confirmatory
factor models. Journal of Modern Applied Statistical Methods, 10, 583-598.
Beauducel, A., & Hilger, N. (2017). The determinacy of the regression factor score predictor
based on continuous parameter estimates from categorical variables. Communications in
Statistics—Theory and Methods, 46, 3417-3425.
Beauducel, A., & Hilger, N. (in press). On the bias of factor score determinacy coefficients
based on different estimation methods of the exploratory factor model. Communications in
Statistics—Simulation and Computation.
Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer
environment. Applied Psychological Measurement, 6, 431-444.
Brown, A., & Croudace, T. (2015). Scoring and estimating score precision using
multidimensional IRT. In S. P. Reise & D. A. Revicki (Eds.), Handbook of item response
McDonald, R. P. (1982). Linear versus nonlinear models in item response theory. Applied Psychological
Measurement, 6, 379-396.
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: LEA.
Mellenbergh, G. J. (1996). Measurement precision in test score and item response models.
Psychological Methods, 1, 293-299.
Mulaik, S. A. (2010). Foundations of factor analysis. Boca Raton, FL: CRC Press.
Olsson, U., Drasgow, F., & Dorans, N. J. (1982). The polyserial correlation coefficient.
Psychometrika, 47, 337-347.
Raju, N. S., Price, L. R., Oshima, T. C., & Nering, M. L. (2007). Standardized conditional
SEM: A case for conditional reliability. Applied Psychological Measurement, 31, 169-180.
Raykov, T. (2002). Analytic estimation of standard error and confidence interval for scale
reliability. Multivariate Behavioral Research, 37, 89-103.
Reise, S. P. (2012). The rediscovery of bifactor measurement models. Multivariate Behavioral
Research, 47, 667-696.
Reise, S. P., Bonifay, W. E., & Haviland, M. G. (2013). Scoring and modeling psychological
measures in the presence of multidimensionality. Journal of Personality Assessment, 95,
129-140.
Reise, S. P., Cook, K. F., & Moore, T. M. (2015). Evaluating the impact of
multidimensionality on unidimensional item response theory model parameters. In S. P.
Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling: Applications to
typical performance assessment (pp. 13-40). New York, NY: Routledge.
Rhemtulla, M., Brosseau-Liard, P. E., & Savalei, V. (2012). When can categorical variables be
treated as continuous? A comparison of robust continuous and categorical SEM estimation
methods under suboptimal conditions. Psychological Methods, 17, 354-373.
Rodriguez, A., Reise, S. P., & Haviland, M. G. (2016a). Evaluating bifactor models:
Calculating and interpreting statistical indices. Psychological Methods, 21, 137-150.
Rodriguez, A., Reise, S. P., & Haviland, M. G. (2016b). Applying bifactor statistical indices in
the evaluation of psychological measures. Journal of Personality Assessment, 98, 223-237.
Samejima, F. (1977). Weakly parallel tests in latent trait theory with some criticism of classical
test theory. Psychometrika, 42, 193-198.
Stucky, B. D., Thissen, D., & Orlando Edelen, M. (2013). Using logistic approximations of
marginal trace lines to develop short assessments. Applied Psychological Measurement, 37,
41-57.
ten Berge, J. M., & Kiers, H. A. (1991). A numerical approach to the approximate and the
exact minimum rank of a covariance matrix. Psychometrika, 56, 309-315.
Thurstone, L. L. (1935). The vectors of mind. Chicago, IL: University of Chicago Press.
Vigil-Colet, A., Lorenzo-Seva, U., & Morales-Vives, F. (2015). The effects of ageing on self-
reported aggression measures are partly explained by response bias. Psicothema, 27,
209-215.
Yuan, K. H., Chan, W., Marcoulides, G. A., & Bentler, P. M. (2016). Assessing structural
equation models by equivalence testing with adjusted fit indexes. Structural Equation
Modeling, 23, 319-330.
Yule, G. U. (1907). On the theory of correlation for any number of variables, treated by a new
system of notation. Proceedings of the Royal Society of London. Series A, Containing
Papers of a Mathematical and Physical Character, 79, 182-193.