Quality of Life Research (2005) 14: 1203–1218
DOI 10.1007/s11136-004-5742-3
© Springer 2005
Are factor analytical techniques used appropriately in the validation of health
status questionnaires? A systematic review on the quality of factor analysis of
the SF-36
Henrica C.W. de Vet1, Herman J. Adèr1,2, Caroline B. Terwee1 & François Pouwer1,3
1Institute for Research in Extramural Medicine (E-mail: HCW.deVet@vumc.nl); 2Department of Clinical
Epidemiology and Biostatistics; 3Department of Medical Psychology, VU University Medical Center,
Amsterdam, The Netherlands
Accepted in revised form 1 November 2004
Abstract
Factor analysis is widely used to evaluate whether questionnaire items can be grouped into clusters representing different dimensions of the construct under study. This review focuses on the appropriate use of
factor analysis. The Medical Outcomes Study Short Form-36 (SF-36) is used as an example. Articles were
systematically searched and assessed according to a number of criteria for appropriate use and reporting.
Twenty-eight studies were identified: exploratory factor analysis was performed in 22 studies, confirmatory
factor analysis was performed in five studies and in one study both were performed. Substantial shortcomings were found in the reporting and justification of the methods applied. In 15 of the 23 studies in
which exploratory factor analysis was performed, confirmatory factor analysis would have been more
appropriate. Cross-validation was rarely performed. Presentation of the results and conclusions was often
incomplete. Some of our results are specific for the SF-36, but the finding that both the application and the
reporting of factor analysis leaves much room for improvement probably applies to other health status
questionnaires as well. Optimal reporting and justification of methods is crucial for correct interpretation of
the results and verification of the conclusions. Our list of criteria may be useful for journal editors,
reviewers and researchers who have to assess publications in which factor analysis is applied.
Key words: Confirmatory factor analysis, Exploratory factor analysis, Health status, Methodological
quality, Principal component analysis, SF-36, Systematic review, Validation
Introduction
In psychosocial and medical sciences, constructs
such as ‘anxiety’ or ‘quality of life’ are usually
measured by means of multi-item health status
questionnaires. It is important that the validity of
these instruments is extensively tested before they
are used. An important step in the validation of
multi-item questionnaires is factor analysis. Factor analysis is a statistical technique which is
designed to reveal whether or not the pattern of
responses on a number of items can be explained
by a smaller number of underlying factors [1–3].
The aim may be either pure data reduction,
assessment of the factor structure (dimensions)
being measured by the questionnaire, or investigating whether the questionnaire shows the same
dimensions across different groups (structural
reliability).
When no clear-cut ideas about the factor
structure (number of dimensions and their mutual
associations) exist, the factor structure of an
instrument can best be investigated by means of
exploratory factor analysis. If prior hypotheses
exist, based on theory or previous analyses, confirmatory factor analysis is more appropriate: it
can be used to test whether the data fit a premeditated factor structure [1].
There are two forms of exploratory factor
analysis: principal component analysis and common factor analysis [1, 2]. Principal component
analysis aims to explain all variance in the data set;
common factor analysis only explains the common
variance of all items, and not the unique variance
of the items. Whether this leads to different outcomes is still a subject of discussion [1, 2, 4].
According to Floyd and Widaman [1] principal
component analysis is most appropriate for data
reduction, making it possible to represent the data
in a minimal number of dimensions (factors). This
method is useful when the aim is to group a set of
items in a few meaningful clusters for which
summary indices can be calculated. Common factor analysis provides valuable insights into the
multivariate structure of an instrument, by
explaining the covariance and correlation structure
among the measured items [1, 2]. It aims to identify a set of more general factors (latent variables)
that explain the covariances among the measured
variables. In common factor analysis, the factor
loadings can be interpreted as the regression
weights for predicting the items from the latent
variables, also producing correlations among these
[1]. In practice, both methods are used in similar
situations.
Confirmatory factor analysis makes it possible to
test whether a hypothesized factor structure of a
questionnaire (based either on empirical data or
on theory) is supported by actual data. Structural
equation modeling techniques are used to test
hypotheses about relationships between observed
variables (items) and factors. This requires formal
specification of a model to be estimated. The
goodness-of-fit of the model to the data can be
assessed by means of fit indices, while the estimated coefficients provide information about the
associations between items and factors, and also
between factors [1].
Factor analysis is a complex, but flexible analytical procedure. When reading studies on factor
analysis, we encountered a wide variation in
methods used and in the ways the results were
presented and interpreted. Our own need for guidelines to judge and interpret results of factor analysis made us initiate the present study. The quality
of results of factor analysis greatly depends on
whether researchers comply with the assumptions
underlying the methods and the principles that
have been developed for appropriate application.
Explicitness in rationale and details of the methods
used in every step of the procedure are necessary to
help other researchers to interpret, verify and
replicate the results. To the best of our knowledge,
no critical assessment of the quality of factor
analysis in the validation of health status questionnaires has yet been performed.
Since there is a wealth of multi-item questionnaires that can be used, we had to restrict our
selection in some way. The options were to restrict the review to
a specific journal and evaluate all factor analyses
performed on multi-item health status questionnaires, to examine all factor analyses in relevant journals in a specific year (e.g. 2003), or
to choose one specific health status questionnaire. We
chose a single, well-known health status questionnaire because this has several advantages: only one
questionnaire and its background (available data about the
hypothesized structure) has to be described, the reasoning and discussions of the authors can be
judged better, and it facilitates comparability of
the results [not subject of this manuscript]. We
decided to restrict our analysis to the Medical
Outcomes Study Short Form-36 [5]. The SF-36 is a
generic, 36-item questionnaire that was designed in
the 1980s to measure the concept of health status.
It has been reported on in over 1000 publications
[6]. Exploratory factor analyses have been used to
determine the factor structure of the SF-36 [7–9].
The manual of this instrument [8] describes eight
subscales (dimensions) that can be calculated from
35 of the 36 items (one item about self-reported
health transition is not included in the scores): (1)
physical functioning (10 items); (2) social functioning (2 items); (3) role limitations due to physical problems (4 items); (4) role limitations due to
emotional problems (3 items); (5) mental health (5
items); (6) energy/vitality (4 items); (7) pain (2
items); and (8) general health perception (5 items).
These 8 subscales were subjected to factor analysis
which resulted in the development of a physical
health subscale and a mental health subscale. This
factor analysis based on the 8 subscales and not on
the 35 individual items is called a second-order
factor analysis. Ware and Gandek [10] present an
excellent overview of the development and psychometric evaluation of the reliability and validity
of the SF-36. In many studies the factor structure
of the SF-36 has been assessed when applying the
questionnaire to different populations, with respect to language, culture or health status.
The objective of this systematic review is, primarily, to perform a critical assessment of the use
of factor analytical techniques in the construct
validation of health status questionnaires; the SF-36
is used as an example.
Methods
Searching for relevant studies
The electronic Medline (1966–August 2002) database was used to search for relevant studies. The
following free-text words were used: ‘SF-36’ in
combination with ‘factor analysis’, ‘factor analyses’, ‘factor analytical’, ‘principal component’ and
‘validity’, ‘exploratory’, ‘confirmatory’, ‘factor
structure’ or ‘latent structure’. Additionally, the
reference lists of all relevant studies were scanned.
Study selection
We included studies that evaluated the factor
structure of the SF-36, using exploratory factor
analysis (principal components analysis or common factor analysis) or confirmatory factor analysis, and were published in a peer-reviewed journal
(English language). Studies that evaluated the first-order factor structure (of the 35 individual items)
and studies that evaluated the second-order factor
structure (of the 8 subscales) were selected. Studies
that reported on the factor structure of shorter
versions of the SF-36 (e.g. SF-12 [11]) were not
included. Two researchers, FP and HdV, checked
the titles and abstracts for eligibility. In case of
doubt the full paper was examined.
Checklist to evaluate the methodological quality
of factor analysis
We developed a checklist to score the methodological quality of the factor analysis in the identified studies. This checklist encompasses the choice
and justification of the methods used, items on
data quality, and detailed information on statistical entities. As it is difficult to apply the theoretical
criteria to empirical studies, we scored and
discussed five studies in a pilot run, in order to
compile detailed instructions on how to score the
items. The list of methodological criteria is presented in Table 1.
A. Choice and justification of the method of factor
analysis
1. Basic choice for exploratory and/or confirmatory
factor analysis
As mentioned in the introduction, it is important to
make a distinction between exploratory factor
analysis and confirmatory factor analysis, which
may provide answers to different research questions, and pose different requirements on the data
collected. The two techniques have different
assumptions about the data and how they should
be handled. For an elaborate account of exploratory factor analysis, we refer to Zwick and Velicer
[12]. In short, principal component analysis or
common factor analysis is applied to a set of items,
and this results in an (unrotated) factor structure.
To determine how many factors are meaningful, a
number of methods are available [13]. The factors
are rotated to facilitate interpretation. Orthogonal
(e.g. Varimax) rotation, which is often used, assumes the factors to be uncorrelated. Oblique
rotation allows for correlations between the factors
[1]. The requirements for confirmatory factor
analysis have been extensively described by Bollen
[14]. The information to be considered includes the
estimates of the parameters and their standard errors, and the appropriate goodness-of-fit measures.
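As an illustration of one way to decide how many factors are meaningful, the sketch below applies the eigenvalue-greater-than-one rule to a correlation matrix and reports the percentage of variance accounted for by each component. This rule is only one of the available criteria and is chosen here as an assumption; the function names (`symmetric_eigenvalues`, `retained_factors`) and the example correlation matrix are hypothetical, and the small pure-Python Jacobi routine merely stands in for a statistical package.

```python
import math


def symmetric_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [list(row) for row in matrix]  # work on a copy
    for _ in range(max_sweeps):
        off_diagonal = sum(a[i][j] ** 2
                           for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Angle that zeroes the (p, q) element.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def retained_factors(correlations, threshold=1.0):
    """Apply the eigenvalue > threshold rule (an assumed retention criterion)."""
    eigenvalues = symmetric_eigenvalues(correlations)
    total = sum(eigenvalues)
    explained = [100.0 * ev / total for ev in eigenvalues]
    n_retained = sum(ev > threshold for ev in eigenvalues)
    return n_retained, eigenvalues, explained


# Hypothetical correlation matrix of three items (illustration only).
example = [[1.0, 0.6, 0.6],
           [0.6, 1.0, 0.6],
           [0.6, 0.6, 1.0]]
n_retained, eigenvalues, explained = retained_factors(example)
print(n_retained, [round(ev, 3) for ev in eigenvalues])
```

Whatever criterion is used, reporting the eigenvalues and the percentage of variance alongside it allows readers to verify the number of factors that were rotated.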
The choice of the method of factor analysis
(exploratory or confirmatory) is crucial. We assessed whether the choice of the specific method
was appropriate for the research question that was
addressed. We considered exploratory factor
analyses to be appropriate if the aim of the study
(according to the authors) was to examine the
factor structure of the SF-36 in a patient population or language in which the SF-36 had not yet
been used, without a prior hypothesis. If the aim of
the study (according to the authors) was to confirm
the existing first-order 8-factor structure or the
second-order 2-factor structure, we considered
confirmatory factor analysis to be more appropriate. If exploratory factor analysis was used, it
should be stated whether principal component
analysis or common factor analysis was performed.
Table 1. Checklist and results
Item | Description | Scoring*: + | − | ? | 0 | N.A.
A. Choice and justification of methods
1. Exploratory vs. confirmatory factor analysis
1.1 | Is the type of factor analysis appropriate in view of the research question? | 13 | 15 | 0 | 0 | 0
1.2 | When both types of factor analysis were used, has this analysis strategy been convincingly justified? | 1 | 0 | 0 | 0 | 27
2. Exploratory factor analysis
2.1 | Has the number of factors to be rotated been justified? | 20 | 3 | 0 | 0 | 5
2.2 | Has the choice of the rotation method been justified? | 2 | 1 | 0 | 0 | 25
2.3 | Is the interpretation of the final factor solution properly justified? | 12 | 8 | 0 | 0 | 8
2.4 | In the case of a non-orthogonal factor structure, has the association between factors been discussed? | 2 | 1 | 0 | 0 | 25
3. Confirmatory factor analysis
3.1 | Has the model to be confirmed been well described? | 6 | 0 | 0 | 0 | 22
3.2 | Has the strategy to arrive at the ‘best’ model been well described? | 6 | 0 | 0 | 0 | 22
3.3 | Were the analysis results properly interpreted? | 5 | 1 | 0 | 0 | 22
3.4 | Has the association between factors been discussed? | 3 | 3 | 0 | 0 | 22
4. Cross-validation
4.1 | Has cross-validation been applied in case this was possible? | 4 | 21 | 0 | 0 | 3
4.2 | Has cross-validation been performed with different randomly drawn samples? | 0 | 4 | 0 | 0 | 24
4.3 | If applied, did the number of observations justify this procedure? | 4 | 0 | 0 | 0 | 24
4.4 | If applied, was the interpretation of the results convincing? | 3 | 1 | 0 | 0 | 24
B. Sample size and data quality
1. Sample size
1.1 | Has the number of observations been sufficient to justify the use of factor analysis?¹ | 25 | 3 | 0 | 0 | 0
1.2 | Has the number of observations been sufficient to perform cross-validation?² | 25 | 3 | 0 | 0 | 0
2. Data quality: missing data procedures
2.1 | Does the study report on the percentage of missings? | 15 | 13 | 0 | 0 | 0
2.2 | If this percentage is alarming (>25%), is there information about whether the missing were considered random? | 0 | 1 | 0 | 0 | 27
2.3 | If missing data have been imputed, was the imputation method appropriate? | 8 | 0 | 0 | 0 | 20
3. Data quality: distributional properties
3.1 | Have the distributional properties (at least standard deviations in exploratory factor analysis and kurtosis in confirmatory factor analysis) of the variables been reported? | 21 | 7 | 0 | 0 | 0
3.2 | In case of undesirable distributional properties (lack of variance in exploratory factor analysis and excessive kurtosis in confirmatory factor analysis), have they been properly handled? | 1 | 3 | 0 | 0 | 24
C. Full report of statistical entities
Item | Description | Yes | No | N.A.
1. Exploratory factor analysis (n = 23)
1.1 | Principal component analyses or common factor analyses | 23 | 0 | 5
1.2 | Criteria for retaining factors | 16 | 7 | 5
1.3 | Eigenvalues, percentage of variance accounted for by the (un)rotated factors | 14 | 9 | 5
1.4 | Rotation method | 21 | 2 | 5
1.5 | Rationale for rotation in case of oblique solutions | 2 | 1 | 25
1.6 | All rotated factor loadings | 20 | 3 | 5
1.7 | Factor inter-correlation in oblique solutions | 2 | 1 | 25
2. Confirmatory factor analysis (n = 6)
2.1 | Number of factors | 6 | 0 | 22
2.2 | Composition of factors | 6 | 0 | 22
2.3 | Orthogonal vs. correlated factors | 4 | 2 | 22
2.4 | Other model constraints (fixed and free parameters) | 3 | 3 | 22
2.5 | Methods of estimation | 5 | 1 | 22
2.6 | Overall fit | 6 | 0 | 22
2.7 | Relative fit | 4 | 2 | 22
2.8 | Parsimony | 2 | 0 | 26
2.9 | Any model modification to improve model fit to data | 5 | 0 | 23
2.10 | Factor loadings | 4 | 2 | 22
2.11 | Communality (or squared correlations of observed variables with the factors) | 1 | 5 | 22
2.12 | Factor correlations | 3 | 3 | 22
* Number of scores, for papers for which the question was relevant.
+ The description is both informative and methodologically sound.
− The description is informative, but methodologically doubtful.
? The description is too unclear or too incomplete to answer the question.
0 No relevant information about this item is given in the paper.
¹ Smallest subsample was scored.
² Largest subsample was scored.
2. Performance of exploratory factor analysis
Justification of the choice of methods within
exploratory analysis is required. This includes (1) a
justification of the number of factors to be rotated:
authors should have used and described at least
one of the criteria for retaining factors before
rotation; (2) a justification of the choice of the
rotation method (if orthogonal (e.g. Varimax)
rotation was used, no justification was required);
(3) an elucidation of the interpretation of the factor solution and a discussion of the consequences:
these should include a final choice for a specific
factor structure and a justification of this choice;
and (4) if factor analysis with non-orthogonal
rotation was used, the associations between factors
should be discussed.
3. Performance of confirmatory factor analysis
This is based on a hypothesized factor structure
(model). The following quality criteria are important: (1) the factor structure to be confirmed
should be clearly described; (2) the strategy that
was applied to arrive at the best model should be
clearly described; (3) the results of the analyses
should be correctly interpreted; and (4) the associations between factors should be discussed.
4. Performance of cross-validation
Cross-validation means that the results of factor
analysis of one part of the data set are tested on
another part of the same data set. To ascertain
that these two parts are as comparable as possible,
it is essential that the two subsamples are selected
at random. Cross-validation gives an indication of
the stability of the factor structure, and it is a
requirement for both exploratory and confirmatory factor analysis. Obviously, splitting the data
randomly into two subsamples requires twice the
usual sample size. A potentially useful practice for
cross-validation is to perform exploratory factor
analysis on one half of the data set and then perform confirmatory factor analysis on the other half
to confirm the factor structure [15].
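The random split described above can be sketched as follows. This is a minimal illustration, not the procedure of any reviewed study: the function name `split_half`, the fixed seed, and the example of 490 subject identifiers are assumptions made for the example.

```python
import random


def split_half(subject_ids, seed=2005):
    """Randomly divide subjects into two subsamples of (nearly) equal size."""
    rng = random.Random(seed)  # fixed seed so that the split can be reproduced
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


# Hypothetical data set of 490 subjects: one half for the exploratory
# analysis, the other half for the confirmatory analysis.
exploratory_half, confirmatory_half = split_half(range(1, 491))
print(len(exploratory_half), len(confirmatory_half))
```

Because each half must still satisfy the sample size requirement on its own, the split is only worthwhile when the full data set is at least twice the minimum size.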
B. Sample size and data quality
1. Sample size
The number of subjects included in the analysis is
another point that must be considered. Both
exploratory factor analysis and confirmatory factor analysis require a reasonable amount of data to
produce reliable results. The debate on the minimal
requirement is still ongoing [1]. Rules-of-thumb
vary from 4 to 10 subjects per variable, with a
minimum number of 100 subjects to ensure stability of the variance–covariance matrix [16]. The required number of subjects per variable depends,
among other things, on the factor loadings, on the
number of variables per factor [17], and on the total
sample size, with a smaller sample size requiring
more subjects per variable [3]. We decided to set the
requirement to 7 subjects per item. For the SF-36
this means that in a first-order factor analysis
(focusing on the 35 individual items), at least 245
subjects should be available. For a second-order
analysis on the 8 subscale scores, 56 subjects would
be sufficient, but in that case we applied the rule-of-thumb of a minimum of 100 subjects.
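The sample size requirement used in this review (7 subjects per variable, with a minimum of 100 subjects) amounts to a one-line calculation. The function name `minimum_subjects` and its parameter names are hypothetical; the numbers are those stated in the text.

```python
def minimum_subjects(n_variables, subjects_per_variable=7, floor=100):
    """Required sample size: subjects-per-variable rule with an absolute minimum."""
    return max(n_variables * subjects_per_variable, floor)


print(minimum_subjects(35))  # first-order analysis on 35 items: 245
print(minimum_subjects(8))   # second-order analysis on 8 subscales: 100
```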
2. Missing data
It is important to provide information about the
number of missing values on each item, and how
the missing values were dealt with, because an accumulation of missing values due to pairwise missing data
may seriously curtail the number of subjects on
which the variance–covariance matrix is based.
We considered 25% missing values on any item
to be the maximum; above this percentage,
information should be provided as to whether
missing values were considered to be random. If
the percentage of missing values was acceptable
(<25%) or if these were considered to be missing
at random, imputation of mean values was scored
as an acceptable method, but complete case analysis was considered to be inadequate.
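The missing-data checks can be illustrated with a short sketch. It assumes that missing responses are coded as `None` and that "imputation of mean values" means replacing a missing response by the mean of the observed responses on the same item; both are assumptions for the example, as are the function names and the item scores.

```python
def percent_missing(values):
    """Percentage of missing (None) responses on one item."""
    return 100.0 * sum(v is None for v in values) / len(values)


def needs_randomness_check(values, limit=25.0):
    """True when missingness exceeds the 25% limit used in this review."""
    return percent_missing(values) > limit


def impute_item_mean(values):
    """Replace missing responses by the mean of the observed responses."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Hypothetical responses of eight subjects on one item.
item = [1, 2, None, 3, 2, None, 1, 3]
print(percent_missing(item), needs_randomness_check(item))
print(impute_item_mean(item))
```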
3. Distributional concerns
Exploratory factor analysis requires sufficient
variation in the data. Confirmatory factor
analysis requires a normal distribution of the data.
However, only substantial lack of variance
or violations of normality, respectively, affect the results.
Nevertheless, we assessed whether the authors
described the distribution (means and standard
deviations) of the data, and how they dealt with
any violations of normality or lack of variance.
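A minimal sketch of the descriptive statistics involved is given below: the standard deviation as an indicator of spread and the excess kurtosis as one indicator of departure from normality. The function names and example scores are hypothetical, and the kurtosis is computed from population moments (fourth moment divided by the squared variance, minus 3), which is one of several common definitions.

```python
import statistics


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    mean = sum(values) / len(values)
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    m4 = sum((v - mean) ** 4 for v in values) / len(values)
    if m2 == 0:
        return None  # no variance at all: kurtosis is undefined
    return m4 / m2 ** 2 - 3.0


def describe_item(values):
    """Mean, sample standard deviation and excess kurtosis of one item."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "kurtosis": excess_kurtosis(values),
    }


# Hypothetical item scores.
print(describe_item([1, 2, 3, 4, 5]))
print(describe_item([3, 3, 3, 3]))  # an item without any variance
```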
C. Full report of statistical entities
Floyd and Widaman [1] have developed a checklist
of statistical entities that should be reported in
order to enable a correct interpretation of the results. The Floyd and Widaman checklist focuses
mainly on the technical aspects of the procedures.
They explained the importance and interpretation
of all items in detail [1].
Methodological assessment of the papers
Two members of our review group (HA and CT)
independently evaluated the quality of the studies
according to the afore-mentioned checklist. Disagreements were resolved by consensus, with or
without a third reviewer (HdV).
Results
The PubMed search yielded 83 publications. Based
on the abstracts, 34 studies were likely to meet the
inclusion criteria. Full reports were then obtained
from the library. Of the 34 studies 8 did not meet
our eligibility criteria for the following reasons: six
studies performed no factor analysis [10, 18–22],
and in two studies the factor analysis was not
performed on the SF-36 [23, 24]. A new search by
an independent reviewer yielded one additional
study [25] and another study [26] was identified by
correspondence with authors (neither study was
available in PubMed). Thus, two eligible studies
were added during the reviewing process [25,
26].
Table 2 presents an overview of the characteristics of the 28 included studies. Only six studies
performed a first-order factor analysis on the 35
items of the SF-36 [27–32]. Twenty-five studies
performed a second-order factor analysis on the
eight sub-scales of which three performed both
first-order and second-order factor analyses [25,
30, 31].
Table 1 shows how many papers satisfied the
various methodological criteria.
A. Choice and justification of methods
1. Exploratory vs. confirmatory analysis
Perhaps the most striking result of this review is
that in about half of the studies (15/28) the chosen
method was inappropriate. In all these cases
exploratory factor analysis was performed,
whereas confirmatory analysis would have been
more appropriate to answer the research question
at issue. Both exploratory and confirmatory analyses were performed in only one study [33].
2. Exploratory analysis
Twenty-three studies performed exploratory analysis. Twenty studies used orthogonal rotation (15
used Varimax), for which no justification of the
method was considered to be necessary. Three
studies [25, 33, 34] used oblique rotation; one of these gave
no justification for this choice [34]. The interpretation of the factor solution and its consequences
was properly justified in 12/20 studies. In three
studies [33, 35, 36], factor analysis was used to
determine country-specific factor loadings to be
used as weighting factors for the calculation of the
physical and mental health subscale scores.
Therefore, interpretation of the factor structure
was considered to be irrelevant in these studies.
Other goals of the factor analysis were confirmation of a previously found factor structure in five
studies [7, 28, 30, 34, 37], psychometric testing of a
new version [38] or in a new population or language
in 15 studies [25–27, 29, 31, 32, 38–46], comparing
the properties of different questionnaires [47, 48],
and testing the stability of the factor structure over
time [49]. Two studies [9, 50] performed equivalence testing of multiple language versions. Only
three studies used a non-orthogonal factor structure [25, 33, 34], and two of those [25, 34] discussed
the associations between factors.
Table 2. General characteristics of the studies
Reference | SF-36 version | Population | Country | Number of patients included in FA | Aim of FA | FA on items or subscales
McHorney and Ware [7] | US English | Adults, visiting clinician | USA | 3445 | Confirmation of previously found factor structure | Subscales
Perneger et al. [43] | French | Young healthy adults (18–44 yr) | Switzerland | 1007 | Psychometric testing in new (healthy) population and language | Subscales
Wolinsky [32] | US English | Disadvantaged, older adults (>50 yr) in clinic | USA | 1051 | Psychometric testing in new patient population | Items
Dexter et al. [25] | US English | Disadvantaged, older adults (>50 yr) in clinic | USA | 1053 | Psychometric testing in new patient population | Items + Subscales
Essink-Bot et al. [47] | Dutch | Migraine patients (1); Matched control group (2) | Netherlands | 496 (1), 515 (2) | Comparing the psychometric properties of four generic questionnaires | Subscales
Jenkinson et al. [37] | UK English | General population (18–64 yr) | UK | 9204 | Confirmation of previously found factor structure | Subscales
Lewin-Epstein et al. [29] | Hebrew | Jewish Israelis (45–75 yr) | Israel | 2030 | Psychometric testing in new language | Items
Mishra and Schofield [35] | Aus English | Representative women | Australia | 41,546 | Deriving Australian normative weights for calculating summary scores | Subscales
Reed [30] | US English | Adult health care workers | USA | 394 + 461 (cross-validation) | Confirmation of previously found factor structure in new population | Items + Subscales
Stadnyk et al. [46] | UK English (acute version) | Frail elderly patients | Canada | 146 | Psychometric testing in new (elderly) population | Subscales
Sanson-Fisher and Perkins [44] | Aus English | General population | Australia | 855 | Psychometric testing in new language | Subscales
Apolone and Mosconi [39] | Italian | General population | Italy | 2031 | Psychometric testing in new language | Subscales
Fukuhara et al. [41] | Japanese | Industrial workers and their family | Japan | 588 | Psychometric testing in new language | Subscales
Fukuhara et al. [40] | Japanese | General population | Japan | 3395 | Psychometric testing in new language | Subscales
Ware et al. [9] | Danish, French, German, Italian, Dutch, Norwegian, Swedish, Spanish, UK English, US English | General population | Denmark, France, Germany, Italy, Netherlands, Norway, Sweden, Spain, UK, USA | 4084, 3656, 2914, 1483, 1771, 2323, 8930, 9151, 2056, 2474 | Equivalence testing of ten language versions | Subscales
Ware et al. [36] | Danish, French, German, Italian, Dutch, Norwegian, Swedish, Spanish, UK English, US English | General population | Denmark, France, Germany, Italy, Netherlands, Norway, Sweden, Spain, UK, USA | 4084, 3656, 2914, 1483, 1771, 2323, 8930, 9151, 2056, 2474 | Comparing country-specific versus standard scoring algorithms for calculating summary scores | Subscales
Keller et al. [28] | Danish, French, German, Italian, Dutch, Norwegian, Swedish, Spanish, UK English, US English | General population | Denmark, France, Germany, Italy, Netherlands, Norway, Sweden, Spain, UK, USA | 4084, 3656, 2914, 1483, 1771, 2323, 8930, 9151, 2056, 2474 | Confirmation of factor structure from one country in nine other countries | Subscales
Jenkinson et al. [38] | UK English (version II) | General population (18–64 yr) | UK | 8889 | Psychometric testing of new version (II) | Subscales
Chern et al. [49] | US English | Working population | USA | 4255 (2 measurements) | Testing stability of factor structure over time | Subscales
Failde and Ramos [27] | Spanish | Patients with suspected ischemic cardiopathy (IC) | Spain | 185 | Psychometric testing in new patient population | Items
Scott and Haslett 1999 [26] | Australia/New Zealand adaptation | General population | New Zealand | 7862 | Psychometric testing in new general population | Subscales
Scott et al. [45] | ? | General population of Maori (1), Pacific (2), and New Zealand Europeans (3) | New Zealand | 1296 (1), 168 (2), 5467 (3) | Psychometric testing in new (ethnic) populations | Subscales
Wilson et al. [33] | ? | General population | Australia | 18,492 | Comparing scoring algorithms for calculating summary scores | Subscales
Goldbeck and Schmitz [48] | German | Patients with cystic fibrosis | Germany | 70 | Comparing the psychometric properties of three generic questionnaires | Subscales
Hardt et al. [34] | US English, UK English, German | Patients with chronic fatigue syndrome | US, UK, Germany | 740, 82, 65 | Confirmation of previously found factor structure in new patient population | Subscales
Hobart et al. [42] | ? | Adults with multiple sclerosis | UK | 149 (1), 288 (2) | Psychometric testing in new patient population | Subscales
Thumboo et al. [31] | UK English (1), Chinese (Hong Kong) (2) | General population (21–65 yr) | Singapore | 4122 (1), 1381 (2) | Psychometric testing in new (ethnic) populations | Items + Subscales
Thumboo et al. [50] | UK English and Chinese (Hong Kong)a | Bilingual volunteers | Singapore | 157 (2 FAs) | Equivalence testing of two language versions | Subscales
a All subjects completed both versions.
3. Confirmatory factor analysis
Only six studies performed confirmatory factor
analysis [28–30, 32, 33, 49]. They all adequately
described the model to be confirmed and the
strategy used to arrive at the best model. Five of
these six studies interpreted the results correctly.
However, only half of the studies (3/6) described
the associations between the factors [28, 30, 49].
4. Cross-validation
Only four studies used a cross-validation design
[28, 30, 41, 49], while in all 28 studies there were
sufficient subjects to make cross-validation
possible. None of these four studies randomly
sub-divided their study population to perform
cross-validation on the same data set. In three
studies [28, 30, 41] the interpretation of the cross-validation results was convincing.
B. Sample size and data quality
1. Sample size
In most studies (25/28) the number of subjects was
sufficient to perform factor analysis.
2. Data quality: missing data
Just over half of the studies (15/28) reported on
missing data on the individual items. In one of these
studies [49] the percentage of missing values raised
concerns (25%) and no information was presented
on whether these values were missing at random. In eight
studies [25, 29, 30, 33, 43, 46, 48, 50] missing data
had been imputed by an acceptable method.
3. Distributional properties
The majority of studies reported the distribution
of the items (21/28). Four studies reported distributional problems [26, 29, 30, 46]; one study
handled these adequately [30].
C. Full report of statistical entities [1]
With respect to exploratory factor analysis, all
studies stated whether they had performed principal component analysis or common factor
analysis; criteria for retaining factors were presented in 16 of the 23 studies; eigenvalues and
percentage of variance accounted for by the factors were presented in 14 of the 23 studies; only
one study [26] failed to mention the rotation
method. Three studies used a non-orthogonal
method; two of these studies justified this method
[25, 33] and two studies [25, 34] presented the
factor inter-correlations; almost all studies (20/23)
presented the rotated factor loadings.
With respect to confirmatory factor analysis, all
six studies mentioned the number of factors and
the composition of the factors. Four studies stated
whether they used orthogonal or correlated factors
[28–30, 49] and three studies [28, 30, 32] described
other model constraints. With respect to the fit,
only one of the six studies [49] failed to mention
the method of estimation. The overall fit was
presented by all studies. Four studies also mentioned the relative fit [28, 30, 32, 49]; parsimony
was discussed in only two studies [30, 49].
Five studies applied model modifications to improve the fit [28, 30, 32, 49]. Factor loadings were
reported in four of these studies [28–30, 32, 49], and
the communalities of the items or the squared
correlations of observed variables with the factors
in only one study [49]. Three studies reported correlations between the factors [28, 30, 49].
Discussion
As far as we know, this is the first critical assessment of factor analysis that has been performed in
the field of health sciences. For this purpose, we
developed a checklist, based on landmark papers
on the use of factor analysis [1, 3, 4, 12, 14, 51].
Despite a pilot run in which the criteria list was tested on five
studies, further adaptations to the rules had to be made
during the assessment of the studies before we
were able to compile detailed instructions on how
to decide which issues were correctly reported
and/or justified, and which methods were considered to be appropriate.
Floyd and Widaman [1] remarked that ‘the
analytic technique used in a factor analysis appears to be the aspects of a factor analysis that
are least understood by practicing scientists, and
thus are often reported in insufficient detail in
methods sections of journal articles’. Therefore
their checklist focused on these technical aspects.
In our opinion, the choices between the various
possible methods of factor analyses are crucial,
and therefore we added the section on choice and
justification and the section on sample size and
data quality. We felt the need to make a distinction between whether an issue was correctly
reported and/or justified and whether an analysis
was performed appropriately. Specific results
needed to be reported. The reporting of methods
is a prerequisite for assessing whether the right
methods have been applied and to make replication of the study possible. In some cases we required a justification, especially when a number
of arbitrary choices can be made, e.g. with regard
to the number of factors to be rotated. If the
authors described the methods they used, and
justified their reasons for the choice of these
methods, we were satisfied. In other situations we
wanted to assess whether the method was
appropriate, i.e. the method of factor analysis
used. If the authors justified an exploratory
analysis when a confirmatory method would have
been appropriate for that specific research question, we assessed the adequacy of the method
and not whether the authors had given a justification.
We acknowledge that the checklist we used included a number of criteria which required rather
subjective judgement. Therefore, we have described the
interpretation of the criteria in detail. Furthermore, in Appendix A, we have presented the scores
of all the papers that were included, to make these
scores available and verifiable for readers.
With respect to the results, we were surprised by
the fact that most authors performed exploratory
factor analysis in situations in which a confirmatory factor analysis would have been more
appropriate. We were rather lenient
in scoring this item. One can argue that in the case of
the SF-36 there is always a hypothesized factor
structure. Nevertheless, if the authors used an open formulation
about exploring the factor structure in another
situation, we accepted exploratory factor analysis.
The conclusion of an exploratory factor analysis
carries far less weight than that of a confirmatory factor analysis in supporting a hypothesized factor structure [1].
This is nicely illustrated by the papers of Dexter
et al. [25] and Wolinsky and Stump [32] who performed exploratory factor analysis and confirmatory factor analysis, respectively, on the same
data-set. Dexter et al. [25] conclude that ‘In general, the principal component analysis confirmed
the hypothesized underlying factor structure for
each subscale and the overall SF-36’, while the
factor loadings on the subscales General Health
and Mental Health were, in our opinion, far from
convincing. Wolinsky and Stump [32] conclude
that ‘There is a problem, associated with the ‘getting sick’ and ‘getting worse’ items on the General
Health subscale’ and ‘An alternative is to specify a
nine-factor model that assigns ‘getting sick’ and
‘getting worse’ items to form their own construct,
which is labeled ‘health optimism’’. They conclude
that ‘A nine factor model including the separate
construct of health optimism provides the best fit
to the data’. It would be advisable to re-analyse
the data of these studies by confirmatory factor analysis to
determine whether the conclusions on the factor
structure of the SF-36 would remain the same.
Other remarkable findings were that the interpretation of the final factor solution left much to
be desired, and that cross-validation was seldom applied. With respect to the more technical aspects,
the justification of which factors should be retained was often poor and the eigenvalues or
percentages of variance accounted for by the
factors were often not reported. Half of the studies
in which confirmatory factor analysis was applied
failed to report model constraints and factor
loadings. Communalities of items were rarely
mentioned.
As there are thousands of publications on factor
analysis of instruments to measure health status,
we had to restrict our selection in some way, either
on the basis of specific journals, years of publication or specific instruments. This study focused on
the SF-36. By choosing the SF-36, we covered
13 different journals, and the year of publication
varied from 1996 to 2002. The disadvantage of
choosing one instrument is that researchers may
tend to imitate each other with regard to the
methods used. We included five studies that performed factor analysis on two shared data sets ([9, 28,
36] and [25, 32], respectively). As each research
group can use a different strategy for its factor
analyses (for both data sets, exploratory and confirmatory factor analyses were used), we saw no
reason to exclude these studies. We also included
studies in which the same research groups analyzed different data sets: Jenkinson et al. [37, 38],
Thumboo et al. [31, 50] and Ware and co-workers
[7, 9, 28, 36, 39, 40]. In theory they may use different methods, depending on the specific purpose
of the factor analysis. However, 9 of the 28 studies
(32%) in this review were performed by members
of the IQOLA group [5, 7, 9, 10, 28, 36, 39, 40, 43],
who followed the methods as described and justified in the User’s manual [8].
Having chosen the SF-36 as the subject of our
methodological appraisal of factor analysis, we
cannot ignore the validation work that has been
done on the SF-36 and the methods that have been
used by the developers. In the User’s manual [8]
they justify their choice of principal component
analysis instead of common factor analysis and of
orthogonal instead of oblique rotation. Advantages of principal component analysis over common
factor analysis include (1) a simple additive model
of factor content facilitating the interpretation of
each scale; (2) summary measures that explain as
much of the variance in the eight subscales as
possible; (3) summary scales that are easy to estimate statistically; and (4) summary scores that are
interpretable as physical and mental dimensions of
health.
Advantages of orthogonal factor rotation over
oblique factor rotation are that ‘factor loadings’,
which are product moment correlations between
scales and factors, can be squared to estimate the
amount of variance in each scale accounted for by
each factor, and summed across factors to estimate the
amount of variance in each scale explained by all
factors. As a result, factor content and implications for the interpretation of each scale are more
straightforward.
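The additive property described above can be checked numerically. The following sketch uses hypothetical loadings of four scales on two factors; the values, the function names and the factor correlation of 0.4 are assumptions chosen for illustration, not SF-36 estimates. The squared loadings give the variance in each scale accounted for by each factor, and their row sums give the variance explained by all factors. The sketch also shows that this total is unchanged by an orthogonal rotation, and that the simple sum of squares no longer equals the explained variance once the factors are allowed to correlate.

```python
import numpy as np

# Hypothetical loadings: rows = scales, columns = two factors (assumption).
loadings = np.array([
    [0.85, 0.15],
    [0.80, 0.25],
    [0.20, 0.85],
    [0.30, 0.75],
])

def communality_orthogonal(loadings):
    """Variance per scale explained by all factors when factors are uncorrelated."""
    return (loadings ** 2).sum(axis=1)

def communality_oblique(pattern, factor_corr):
    """Variance per scale explained by correlated factors: diag(P Phi P')."""
    return np.einsum("ij,jk,ik->i", pattern, factor_corr, pattern)

def rotate_orthogonal(loadings, angle):
    """Apply a rigid (orthogonal) rotation to a two-factor loading matrix."""
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    return loadings @ rotation

variance_per_factor = loadings ** 2                 # one column per factor
h2 = communality_orthogonal(loadings)               # row sums of squared loadings
h2_rotated = communality_orthogonal(rotate_orthogonal(loadings, np.pi / 6))

# Same pattern coefficients, but with an assumed factor correlation of 0.4.
factor_corr = np.array([[1.0, 0.4], [0.4, 1.0]])
h2_oblique = communality_oblique(loadings, factor_corr)

print("explained variance (orthogonal):", np.round(h2, 4))
print("unchanged by orthogonal rotation:", np.allclose(h2, h2_rotated))
print("explained variance (correlated factors):", np.round(h2_oblique, 4))
```

With correlated factors the cross-product term has to be taken into account, which is why the factor inter-correlations need to be reported whenever an oblique rotation is used.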
We agree with the developers of the SF-36 and
members of the IQOLA Project on these preferences for principal component analysis over common
factor analysis and for orthogonal rotation
instead of oblique rotation. However, we differ in
opinion with regard to their choice of exploratory
factor analysis instead of confirmatory factor
analysis in some situations.
The choice of method of factor analysis depends
very much on the purpose of the factor
analysis. The aim of factor analysis may be (1)
pure data reduction; (2) assessment of the
factor structure (dimensions) being measured by
the questionnaire; (3) investigation of whether the
questionnaire shows the same dimensions across
different groups (structural reliability); or (4)
determination of country-specific factor loadings to be
used as weighting factors for the calculation of the
physical and mental health subscale scores.
For example, principal component analysis is
adequate in the situation of development of an
instrument (second aim), and also to determine
factor loadings for the instrument to be used in
another situation (fourth aim). However, in order
to investigate whether the questionnaire shows the
same (known) dimensions across different cultural
populations or disease groups (third aim) confirmatory factor analysis is more appropriate than
principal component analysis. Therefore we do not
agree with the frequently cited sentence of Gandek
and Ware [18] ‘Principal component analysis, a
type of factor analysis, was used in the IQOLA
Project to estimate the congruence between
hypothesized physical and mental health constructs and the SF-36 scales used to measure these
constructs’. It is precisely this situation that, in our
opinion, requires confirmatory factor analysis.
Nowhere do they justify why they prefer
exploratory factor analysis to confirmatory
factor analysis for this purpose.
As many studies were performed by members of the
IQOLA project [10], the lack of independence between the included studies
is a limitation of our study. It has consequences for the generalisation of the results of our
methodological assessment to other instruments:
the types of shortcomings may be similar for other
instruments, but their frequency of occurrence
probably differs. Moreover, the generalisability of
our findings probably varies per methodological
criterion. Failure to use confirmatory factor analysis may be typical for the SF-36, because exploratory analysis is used by the IQOLA group. If we
had extended our methodological assessment to all
health status questionnaires published in ‘Quality
of Life Research’ in the past 5 years, we would
have included more new measurement instruments,
which require exploratory factor analysis because
their factor structure is as yet unknown.
The sub-optimal description of the method of
factor analysis will probably also occur in the
evaluation of other instruments.
This manuscript focussed on the quality of factor analysis performed on the SF-36. For that
reason we ignored alternative methods that are
available to examine the construct validity of
health status questionnaires. Many validation
studies on the SF-36 have been performed using
other methods, such as the multitrait-multimethod
approach [52, 53], and Rasch analysis [54]. Methods based on Item Response Theory in particular,
including estimation of differential item functioning, are powerful tools for checking the cross-cultural
equivalence of SF-36 translations [55] and the
dimensionality of scales.
By identifying shortcomings in factor analysis
and describing the specific points on which the
studies fail to meet the criteria, we aim to improve
the quality and reporting of future studies which
perform factor analysis on the SF-36 or other
health status questionnaires. To enable interested
readers to make a full interpretation and replication of the results, the specific choice of analytic
techniques requires justification, and the techniques used in any study should be described in
sufficient detail. Moreover, as the interpretation of
the results of factor analysis is rather subjective, it
is important that a publication provides sufficient
information to support the conclusions.
By analogy with the CONSORT statement [56] and
the STARD statement [57, 58] to improve the
quality of reporting of randomized clinical trials
and diagnostic studies, respectively, journal editors, reviewers and authors could make use of our
list of criteria to check on the completeness of
reporting and the quality of performance of factor
analysis, especially in studies in which factor
analysis is the key element.
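As an illustration of how such a list of criteria could be applied in practice, the sketch below tallies scores per criterion across studies, which is how proportions of the kind reported in this review (the number of studies that presented a given entity out of those to which the criterion applied) can be derived. The study names, the criterion labels and all scores are hypothetical placeholders, and the helper names `tally` and `summarize` are assumptions; only the score categories follow the legend of Appendix A, with `None` standing for a blank (not applicable) cell.

```python
from collections import Counter

# Score categories following the Appendix A legend.
# "-" is used here for 'informative, but methodologically doubtful'.
LEGEND = {
    "+": "informative and methodologically sound",
    "-": "informative, but methodologically doubtful",
    "?": "too unclear or incomplete to answer",
    "0": "no relevant information given",
}

# Hypothetical scores (assumption): None marks a criterion that does not apply,
# e.g. an exploratory criterion in a study that only did confirmatory analysis.
scores = {
    "Study A": {"C1.1": "+", "C1.2": "+", "C1.3": "0", "C2.1": None},
    "Study B": {"C1.1": "+", "C1.2": "-", "C1.3": "+", "C2.1": None},
    "Study C": {"C1.1": None, "C1.2": None, "C1.3": None, "C2.1": "+"},
}

def tally(scores, criterion):
    """Count score categories for one criterion, skipping non-applicable studies."""
    applicable = [study[criterion] for study in scores.values()
                  if study.get(criterion) is not None]
    return Counter(applicable), len(applicable)

def summarize(scores, criterion):
    """Return a short 'k/n' summary of studies scored '+' on a criterion."""
    counts, n_applicable = tally(scores, criterion)
    return f"{criterion}: {counts['+']}/{n_applicable} scored +"

for criterion in ["C1.1", "C1.2", "C1.3", "C2.1"]:
    print(summarize(scores, criterion))
```

Keeping non-applicable cells separate from '0' scores matters here, because the denominator of each proportion should include only the studies to which the criterion applies.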
Conclusion
Ware [6] states that ‘A major objective in constructing the SF-36 was achievement of high
psychometric standards. Guidelines for testing
were derived from those recommended for use in
validating psychological and educational measures by the American Psychological Association, the American Education Research
Association and the National Council on Measurement in Education’. Despite the high standards used in the development of the SF-36, the
quality of factor analysis in exploring or confirming the factor structure of the SF-36 leaves
much to be desired. This critical assessment of
methods of factor analysis applied in relation to
the SF-36 showed that exploratory factor analysis is often used when confirmatory analysis
would have been more appropriate and might
have led to different conclusions. Furthermore,
crucial information on the methods and the results of factor analysis is lacking in many publications. With this assessment we hope to
contribute to the improvement of the application
and reporting of factor analysis in health sciences research.
References
1. Floyd FJ, Widaman KF. Factor analysis in the development and refinement of clinical assessment instruments.
Psychol Assess 1995; 7: 286–299.
2. Johnson DE. Applied Multivariate Methods for Data
Analysts. 1998.
3. Streiner DL. Figuring out factors: The use and misuse of
factor analysis. Can J Psychiatry 1994; 39: 135–140.
4. Gorsuch RL. Factor Analysis. Hillsdale NJ: Lawrence
Erlbaum Associates, Inc, 1983.
5. Ware JE Jr, Sherbourne CD. The MOS 36-item short-form
health survey (SF-36). I. Conceptual framework and item
selection. Med Care 1992; 30: 473–483.
6. Ware JE Jr. SF-36 health survey update. Spine 2000; 25:
3130–3139.
7. McHorney CA, Ware JE Jr, Raczek AE. The MOS 36-Item
Short-Form Health Survey (SF-36): II. Psychometric and
clinical tests of validity in measuring physical and mental
health constructs. Med Care 1993; 31: 247–263.
8. Ware JE, Kosinski M, Keller SD. Psychological and mental
health summary scales: A user’s manual. 1994. Boston MA,
The Health Institute (report).
9. Ware JE Jr, Kosinski M, Gandek B, et al. The factor
structure of the SF-36 Health Survey in 10 countries: Results from the IQOLA Project. International Quality of Life
Assessment. J Clin Epidemiol 1998; 51: 1159–1165.
10. Ware JE Jr, Gandek B. Overview of the SF-36 Health
Survey and the International Quality of Life Assessment
(IQOLA) Project. J Clin Epidemiol 1998; 51: 903–912.
11. Ware J Jr, Kosinski M, Keller SD. A 12-Item Short-Form
Health Survey: Construction of scales and preliminary tests
of reliability and validity. Med Care 1996; 34: 220–233.
12. Zwick WR, Velicer WF. A comparison of five rules for
determining the number of components to retain. Psychol
Bull 1986; 99: 432–442.
13. Allison DB, Gorman BS, Primavera LH. Some of the most
common questions asked of statistical consultants: Our
favorite responses and recommended readings. Genet Soc
Gen Psychol Monogr 1993; 119: 153–185.
14. Bollen KA. Structural Equations with Latent Variables.
New York: Wiley, 1989.
15. Pouwer F, Snoek FJ, Van der Ploeg HM, Ader HJ, Heine
RJ. The well-being questionnaire: Evidence for a three-factor structure with 12 items (W-BQ12). Psychol Med
2000; 30: 455–462.
16. Kline P. The Handbook of Psychological Testing. London:
Routledge, 1993.
17. Guadagnoli E, Velicer WF. Relation of sample size to the
stability of component patterns. Psychol Bull 1988; 103:
265–275.
18. Gandek B, Ware JE Jr. Methods for validating and norming translations of health status questionnaires: The IQOLA Project approach. International Quality of Life
Assessment. J Clin Epidemiol 1998; 51: 953–959.
19. Hays RD, Sherbourne CD, Mazel RM. The RAND 36-Item Health Survey 1.0. Health Econ 1993; 2: 217–227.
20. Kosinski M, Keller SD, Hatoum HT, Kong SX, Ware JE
Jr. The SF-36 Health Survey as a generic outcome measure
in clinical trials of patients with osteoarthritis and rheumatoid arthritis: Tests of data quality, scaling assumptions
and score reliability. Med Care 1999; 37: MS10–MS22.
21. Reed PJ. Methods of evaluation in outcomes research. Am
J Manag Care 1998; 4: 1616–1625.
22. Ware JE, Kosinski M. Interpreting SF-36 summary health
measures: A response. Qual Life Res 2001; 10: 405–413.
23. Pfennings LE, Van der Ploeg HM, Cohen L, et al.
A health-related quality of life questionnaire for multiple
sclerosis patients. Acta Neurol Scand 1999; 100: 148–
155.
24. Thumboo J, Feng PH, Soh CH, Boey ML, Thio S, Fong
KY. Validation of a Chinese version of the Medical Outcomes Study Family and Marital Functioning Measures in
patients with SLE. Lupus 2000; 9: 702–707.
25. Dexter PR, Stump TE, Tierney WM, Wolinsky FD. The
psychometric properties of the SF-36 Health Survey among
older adults in a clinical setting. J Clin Geropsychol 1996; 2:
225–237.
26. Scott KM, Haslett SJ. SF-36 health survey reliability,
validity and norms for New Zealand. Aust New Zeal J
Public Health 1999; 23: 401–406.
27. Failde I, Ramos I. Validity and reliability of the SF-36
Health Survey Questionnaire in patients with coronary artery disease. J Clin Epidemiol 2000; 53: 359–365.
28. Keller SD, Ware JE Jr, Bentler PM, et al. Use of structural
equation modeling to test the construct validity of the SF-36 Health Survey in 10 countries: Results from the IQOLA
Project. International Quality of Life Assessment. J Clin
Epidemiol 1998; 51: 1179–1188.
29. Lewin-Epstein N, Sagiv-Schifter T, Shabtai EL, Shmueli A.
Validation of the 36-item short-form Health Survey (Hebrew version) in the adult population of Israel. Med Care
1998; 36: 1361–1370.
30. Reed PJ. Medical outcomes study short form 36: Testing
and cross-validating a second-order factorial structure for
health system employees. Health Serv Res 1998; 33: 1361–
1380.
31. Thumboo J, Fong KY, Machin D, et al. A community-based study of scaling assumptions and construct validity
of the English (UK) and Chinese (HK) SF-36 in Singapore.
Qual Life Res 2001; 10: 175–188.
32. Wolinsky FD, Stump TE. A measurement model of the
Medical Outcomes Study 36-Item Short-Form Health
Survey in a clinical sample of disadvantaged, older, black,
and white men and women. Med Care 1996; 34: 537–548.
33. Wilson D, Parsons J, Tucker G. The SF-36 summary scales:
Problems and solutions. Soz Praventiv Med 2000; 45: 239–
246.
34. Hardt J, Buchwald D, Wilks D, Sharpe M, Nix WA, Egle
UT. Health-related quality of life in patients with chronic
fatigue syndrome: An international study. J Psychosom Res
2001; 51: 431–434.
35. Mishra G, Schofield MJ. Norms for the physical and
mental health component summary scores of the SF-36 for
young, middle-aged and older Australian women. Qual Life
Res 1998; 7: 215–220.
36. Ware JE Jr, Gandek B, Kosinski M, et al. The equivalence
of SF-36 summary health scores estimated using standard
and country-specific algorithms in 10 countries: Results
from the IQOLA Project. International Quality of Life
Assessment. J Clin Epidemiol 1998; 51: 1167–1170.
37. Jenkinson C, Layte R. Development and testing of the UK
SF-12 (short form health survey). J Health Serv Res Policy
1997; 2: 14–18.
38. Jenkinson C, Stewart-Brown S, Petersen S, Paice C.
Assessment of the SF-36 version 2 in the United Kingdom.
J Epidemiol Commun Health 1999; 53: 46–50.
39. Apolone G, Mosconi P. The Italian SF-36 Health Survey:
Translation, validation and norming. J Clin Epidemiol
1998; 51: 1025–1036.
40. Fukuhara S, Ware JE Jr, Kosinski M, Wada S, Gandek B.
Psychometric and clinical tests of validity of the Japanese SF-36 Health Survey. J Clin Epidemiol 1998b; 51: 1045–1053.
41. Fukuhara S, Bito S, Green J, Hsiao A, Kurokawa K.
Translation, adaptation, and validation of the SF-36
Health Survey for use in Japan. J Clin Epidemiol 1998a; 51:
1037–1044.
42. Hobart J, Freeman J, Lamping D, Fitzpatrick R, Thompson A. The SF-36 in multiple sclerosis: Why basic
assumptions must be tested. J Neurol Neurosurg Psychiatry
2001; 71: 363–370.
43. Perneger TV, Leplege A, Etter JF, Rougemont A. Validation of a French-language version of the MOS 36-Item
Short Form Health Survey (SF-36) in young healthy adults.
J Clin Epidemiol 1995; 48: 1051–1060.
44. Sanson-Fisher RW, Perkins JJ. Adaptation and validation
of the SF-36 Health Survey for use in Australia. J Clin
Epidemiol 1998; 51: 961–967.
45. Scott KM, Sarfati D, Tobias MI, Haslett SJ. A challenge to
the cross-cultural validity of the SF-36 health survey: Factor structure in Maori, Pacific and New Zealand European
ethnic groups. Soc Sci Med 2000; 51: 1655–1664.
46. Stadnyk K, Calder J, Rockwood K. Testing the measurement properties of the Short Form-36 Health Survey in a
frail elderly population. J Clin Epidemiol 1998; 51: 827–
835.
47. Essink-Bot ML, Krabbe PF, Bonsel GJ, Aaronson NK. An
empirical comparison of four generic health status measures. The Nottingham Health Profile, the Medical Outcomes Study 36-item Short-Form Health Survey, the
COOP/WONCA charts, and the EuroQol instrument. Med
Care 1997; 35: 522–537.
48. Goldbeck L, Schmitz TG. Comparison of three generic
questionnaires measuring quality of life in adolescents and
adults with cystic fibrosis: The 36-item short form health
survey, the quality of life profile for chronic diseases, and
the questions on life satisfaction. Qual Life Res 2001; 10:
23–36.
49. Chern JY, Wan TT, Pyles M. The stability of health status
measurement (SF-36) in a working population. J Outcome
Meas 2000; 4: 461–481.
50. Thumboo J, Fong KY, Chan SP, et al. The equivalence of
English and Chinese SF-36 versions in bilingual Singapore
Chinese. Qual Life Res 2002; 11: 495–503.
51. Gorsuch RL. Exploratory factor analysis: Its role in item
analysis. J Pers Assess 1995; 68: 532–560.
52. Aaronson NK, Muller M, Cohen PD, et al. Translation,
validation, and norming of the Dutch language version of
the SF-36 Health Survey in community and chronic disease
populations. J Clin Epidemiol 1998; 51: 1202.
53. Bjorner JB, Damsgaard MT, Watt T, Groenvold M. Tests
of data quality, scaling assumptions, and reliability of the
Danish SF-36. J Clin Epidemiol 1998; 51: 1001–1011.
54. Raczek AE, Ware JE, Bjorner JB, et al. Comparison of
Rasch and summated rating scales constructed from SF-36
physical functioning items in seven countries: Results from
the IQOLA Project. International Quality of Life Assessment. J Clin Epidemiol 1998; 51: 1203–1214.
55. Bjorner JB, Kreiner S, Ware JE, Damsgaard MT, Bech P.
Differential item functioning in the Danish translation of
the SF-36. J Clin Epidemiol 1998; 51: 1189–1202.
56. Moher D, Schulz KF, Altman DG. The CONSORT
statement: Revised recommendations for improving the
quality of reports of parallel-group randomised trials.
Lancet 2001; 357: 1191–1194.
57. Bossuyt PM, Reitsma JB, Bruns DE, et al. The STARD
statement for reporting studies of diagnostic accuracy:
Explanation and elaboration. Ann Intern Med 2003; 138:
W1–12.
58. Bossuyt PM, Reitsma JB, Bruns DE, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: The STARD Initiative. Ann Intern Med 2003; 138:
40–44.
Address for correspondence: Henrica C. W. de Vet, Institute for
Research in Extramural Medicine, VU University Medical
Center, Van der Boechorststraat 7, 1081 BT Amsterdam, The
Netherlands
Phone: +31-20-4448176; Fax: +31-20-4446775
E-mail: HCW.deVet@vumc.nl
Appendix A. Individual scores of each study.
Studies (columns), with reference numbers: McH [7], Pern [43], Wol [32], Dex [25], Ess [47], Jen97 [37], Lew [29], Mis [35], Reed [30], Stad [46], San [44], Apo [39], Fuka [41], Fukb [40], Wa/Ko [9], Wa/Ga [36], Kel [28], Je99 [38], Che [49], Fai [27], Sco [26], Sco [45], Wil [33], Gold [48], Har [34], Hob [42], Th01 [31], Th02 [50].
Criteria (rows):
A. Choice and justification of method of factor analysis: A1.1, A1.2, A2.1, A2.2, A2.3, A2.4, A3.1, A3.2, A3.3, A3.4, A4.1, A4.2, A4.3, A4.4
B. Sample size and data quality: B1.1¹, B1.2², B2.1, B2.2, B2.3, B3.1, B3.2
C. Full report of statistical entities: C1.1, C1.2, C1.3, C1.4, C1.5, C1.6, C1.7, C2.1, C2.2, C2.3, C2.4, C2.5, C2.6, C2.7, C2.8, C2.9, C2.10, C2.11, C2.12
[Table body: the individual scores per study and criterion could not be recovered from the flattened layout.]
+ – The description is both informative and methodologically sound.
− – The description is informative, but methodologically doubtful.
? – The description is too unclear or too incomplete to answer the question.
0 – No relevant information about this item is given in the paper.
blank – Not applicable.
¹ Smallest subsample was scored.
² Largest subsample was scored.