J Clin Epidemiol Vol. 51, No. 11, pp. 1025–1036, 1998
Copyright © 1998 Elsevier Science Inc. All rights reserved.
0895-4356/98 $–see front matter
PII S0895-4356(98)00094-8
The Italian SF-36 Health Survey:
Translation, Validation and Norming
Giovanni Apolone* and Paola Mosconi
Dipartimento di Oncologia, Istituto di Ricerche Farmacologiche Mario Negri, Milan, Italy
ABSTRACT. This article reports on the development and validation of the Italian SF-36 Health Survey using
data from seven studies in which an Italian version of the SF-36 was administered to more than 7000 subjects
between 1991 and 1995. Empirical findings from a wide array of studies and diseases indicate that the
performance of the questionnaire improved as the Italian translation was revised and that it met the standards
suggested by the literature in terms of feasibility, psychometric tests, and interpretability. This generally
satisfactory picture strengthens the idea that the Italian SF-36 is as valid and reliable as the original instrument
and applicable and valid across age, gender, and disease. Empirical evidence from a cross-sectional survey carried
out to norm the final version in a representative sample of 2031 individuals confirms the questionnaire’s
characteristics in terms of hypothesized constructs and psychometric behavior and gives a better picture of its
external validity (i.e., robustness and generalizability) when administered in settings that are very close to the real
world. J CLIN EPIDEMIOL 51;11:1025–1036, 1998. © 1998 Elsevier Science Inc.
KEY WORDS. Health-related quality of life, SF-36 Health Survey, reliability, validity, cross-cultural validation
INTRODUCTION
Interest in measuring the aspects of health most closely related to quality of life, usually referred to as health-related
quality of life (HR-QOL), has increased in recent years in
Italy as in other countries. Advances have been made in
methods for describing patients’ subjective health status using standardized measures, and several valid and reliable patient-based measures are available either as generic or as disease- and treatment-targeted questionnaires. However, most
of these are in English and are intended for use in English-speaking settings [1–3].
Very few brand new country-specific instruments have
been developed in Italy to measure HR-QOL, and few English instruments that met the investigators’ need have
been translated [4,5]. With few but relevant exceptions
that were actually translated or developed as part of multilanguage and multinational projects [6–8], most of such efforts were produced by isolated groups of researchers who
seldom published their findings in peer-reviewed journals.
Among the so-called generic measures, the MOS 36-Item Short Form Health Survey (SF-36) is known for its
comprehensiveness, brevity, and high standards of reliability and validity [9–12]. It was first translated by independent Italian teams in 1990. With the launch of the International Quality of Life Assessment (IQOLA) project [13–16]
in 1991, the description of the use of the SF-36 in Italy can
be split into three parts. The first part regards the pre-IQOLA phase (in which independent translations were
used in selected series of studies), the second an intermediate-IQOLA phase (in which an effort was made to trace all
potential users and to centralize and harmonize the use of
the SF-36 in Italy), and the third the final phase (in which
a standardized and accredited Italian SF-36 version was
tested in a large representative sample of the Italian general
population) (see Figure 1 for a synopsis).
*Address for correspondence: Giovanni Apolone, MD, Istituto di Ricerche Farmacologiche Mario Negri, Via Eritrea 62, 20157 Milan, Italy.
Accepted for publication on 7 July 1998.
This is the first comprehensive English language report
on the development of the Italian SF-36, and it traces the
improvement in the psychometric properties of the form as
the translation evolved. It also reports on findings from seven
studies in which an Italian version of the SF-36 was administered to more than 7000 subjects during 1991 to 1995.
FIGURE 1. Synopsis of the International Quality of Life Assessment (IQOLA) project in Italy.
MATERIAL AND METHODS
Pre-IQOLA Studies
As summarized in Figure 1 and Table 1, before the launch
of the IQOLA project, at least two independent translations of the SF-36 were being used in Italy, one produced by a
team of researchers at the Mario Negri Institute (IRFMN),
the other by a pharmaceutical company, Glaxo Italy in Verona. The two translations were quite similar not only because both were produced using a similar forward-backward
methodology but also because both were eventually revised
by one of the authors of this paper (GA). Two studies in
which these two pre-IQOLA translations were administered are outlined here:
STUDY NO. 1. In 1991, 28 general practitioners participated on a voluntary basis in a study aimed at exploring the
feasibility of HR-QOL assessment in general practice by assessing the applicability and performance of the SF-36 in a sample of 243 patients with chronic obstructive pulmonary disease (COPD) [17]. Besides formal evaluation of the
psychometric and clinical validity characteristics of the
SF-36 when self-administered to patients, a formal comparison of physician and patient agreement about patients’
HR-QOL was also carried out by administering a slightly
modified version of the questionnaire to physicians (i.e., a
rewording of the questions pertaining to some physical and
mental scales was introduced in order to develop a proxy
version of the questionnaire). Patients and physicians were
instructed to complete the questionnaires simultaneously
but independently during a scheduled office visit.
TABLE 1. Selected characteristics of seven studies adopting the SF-36

                        Pre-IQOLA              Intermediate IQOLA                              Final IQOLA
Characteristic          COPD       Breast      Healthy     GISSI-3     Migraine    Dialysis    Normative
                        study      cancer      people      AMI trial   study       study       survey
                                   study       study
Year                    1991       1991        1993        1992–93     1993–94     1994–95     1995
No. cases               243        178         50          3278        1524        246         2031
Age (mean), years       70.6       52.8        45.5        61.0        38.8        59.7        47.7
Age (% > 65 years)      75.7       14.2        12.0        37.1        1.9         40.3        18.7
Sex (% male)            61.7       0.0         42.0        81.0        26.6        48.6        49.2
School (mean), years    6.5        10.5        6.1         7.4         No          6.5         9.1
School (% > 8 years)    16.0       67.8        68.0        26.0        No          19.3        57.7

Abbreviations: IQOLA = International Quality of Life Assessment; COPD = chronic obstructive pulmonary
disease; AMI = acute myocardial infarction.
STUDY NO. 2. In the framework of the activity of GIVIO
(Interdisciplinary Group for Cancer Care Evaluation), a cooperative group involved in a research program monitoring
quality of care since 1986 [18], a randomized multicenter
trial was launched to test the impact on overall survival and
HR-QOL of two different follow-up protocols. In order to
assemble and validate the questionnaire to be adopted in
the trial, a multistep approach was used [5,19]. In
1991, a developmental version of the GIVIO questionnaire
was tested, administering it together with the Functional
Living Index-Cancer Scale [20] and the SF-36 to a sample
of patients with breast cancer. The three questionnaires
were mailed with a cover letter explaining the aim of the
study to 273 members of an Italian association of patients
with breast cancer (Attive Come Prima). One hundred seventy-eight women (65%) mailed back a complete set of questionnaires, enabling a formal concurrent validation [21].
IQOLA in Italy
As described in detail in other articles in this issue [22–25],
the IQOLA project was launched in 1991 with the goals of
translating, adapting, and testing the cross-cultural applicability of the SF-36 with the assistance of sponsored investigators
from 14 different countries. In Italy, the method presented in
the general articles in this issue has been applied with a few
operative modifications that are described briefly here.
In the first stage of the IQOLA project (i.e., translation
and qualitative and quantitative evaluation to ensure semantic equivalence and acceptance in each country), two
professional translators with experience of “health and
quality of life terminology” but not with the SF-36, independently produced two forward-translations and, after a
first meeting, agreed on a common version with a list of alternatives for the controversial item stems and response
choices. Next, after a meeting with the leader of the Italian
team, some revisions were made and a second common version was produced. During this step, each forward-translator rated the difficulty in translation, and two external experts (bilingual and with specific background on the issue
of health outcome assessment) and the leader of the Italian
team rated the quality of translation in terms of common
language, clarity, and conceptual equivalence. Problematic
items and response choices were identified and reentered
into the forward-translation process.
The third forward-translation entered the backward-translation process: two professional American-Italian
translators, not familiar with the SF-36, worked independently to produce two English versions that were sent to
the IQOLA project headquarters in Boston (NEMC) for review. In the meantime, two other American-Italian translators rated the forward versions in terms of difficulty and
quality. Finally, a meeting was held with the leader of the
Italian team, all translators, a general practitioner, and a
nurse (both with experience in health outcome assessment). Using all the documentation available, a new version was produced to be field tested.
Empirical testing was done first using a Thurstone exercise [26] to test the equivalence, ordinality, and interval
properties for translations of the original response choices.
The test was carried out in two independent convenience
samples of the lay public (in the first sample, 30 subjects positioned their values on an anchored 10-cm line, and in the
second sample, 60 subjects positioned their values on a
floating 10-cm line). A further version of the questionnaire
was developed using the empirical findings from this Thurstone exercise to select the response choices closest to those
in the source instrument. This version was self-administered to a convenience sample of healthy individuals to assess the acceptance, clarity, and the use of common language.
STUDY NO. 3. Fifty people, identified on a convenience basis in two different geographic settings (North and South
Italy), first completed the questionnaire, then answered a
standardized debriefing questionnaire, and, finally, were
interviewed by one of the authors or by a trained psychologist.
In the second stage (i.e., formal psychometric tests of assumptions underlying item and scale scoring), an effort was
made to centralize and harmonize the use of the SF-36 in
Italy by recommending the use of the most suitable IQOLA
version in as many studies as possible. Overall, the SF-36
IQOLA version has been adopted in several studies, three
of which have been completed, allowing a preliminary
analysis of the questionnaire’s performance.
STUDY NO. 4. In 1992, in the framework of the GISSI-3
trial (Gruppo Italiano per lo Studio della Sopravvivenza
dell’Infarto Miocardico)—a randomized controlled trial on
the effect of very early administration of ACE inhibitors
and nitrates, which included 20,000 patients with acute
myocardial infarction—a sample of patients from 89 coronary
care units was included in an ad hoc protocol aimed at evaluating the patient’s perception of health-related quality of
life issues. Details about the GISSI trial design and 6-week
follow-up results regarding clinical end points and questionnaire psychometric properties have been presented elsewhere [27–30]. Briefly, an ad hoc multidimensional instrument was assembled starting from well-known previously
validated questionnaires. The GISSI-Nursing questionnaire
includes 38 items that generate a symptom checklist and 9
scales. Eighteen items and 5 scales were excerpted from the
SF-36. The validation process included a formal test–retest
reliability assessment in an independent sample of 100 patients using an interval period of 3 days. Overall, more than
4400 patients were enrolled into the study and followed up over time, yielding samples ranging from 3278
baseline cross-sectional to 1919 longitudinal cases.
STUDY NO. 5. In 1993, a study aimed at evaluating the
characteristics of two different questionnaires, the SF-36
and the Migraine Specific Questionnaire (MSQ), was carried out in a large cohort of people with migraine, one of
the most common complaints of patients presenting to general practitioners in Italy [31]. The study, a pilot phase of a
more extensive project directly conducted by a pharmaceutical company, Glaxo Italy in Verona, with the external assistance of the Italian IQOLA team, was actually carried
out with the collaboration of 150 neurological centers in a
12-month recruitment period. Cross-sectional data on 1524
patients are presently available for the psychometric analyses of both questionnaires.
STUDY NO. 6. In 1992, in the framework of a project sponsored by the CNR (Italian National Research Council)
[21,32,33], a study aimed at developing and validating an
HR-QOL questionnaire for patients undergoing dialysis was
launched. After the item selection and reduction steps and
testing in small focus groups, a prospective multicenter study was planned to evaluate longitudinally the
characteristics of the new questionnaire by comparing it
with the most suitable SF-36 IQOLA version. During a
1-year period (1994–1995), more than 500 cases were recruited, and, according to the study design, the SF-36 was
delivered to half of them, yielding a validation sample of
246 patients.
According to the IQOLA project protocol, at the end of
the translation and validation phases, a cross-sectional survey was carried out in 1995 to norm the final Italian SF-36
in a representative sample of noninstitutionalized Italians
aged 18 years and over. The survey was conducted by DOXA
(Istituto di Ricerche Statistiche e Analisi dell’Opinione Pubblica, Milano, Italy) with the supervision of the IQOLA
Italian team. Following the guidelines recommended by the
project protocol, a multistep random sampling strategy was
adopted to draw a large, representative sample. The method
applied is briefly described. Details are given elsewhere [34].
STUDY NO. 7. The universe to which the survey refers is
all Italians aged 18 years and older, or about 45.2 million
persons, in all regions of Italy. This universe was split into
sections (strata) according to regions and size of the commune of residence. In each stratum (e.g., communities of
Lombardia Region with fewer than 5000 inhabitants), the
sampling units were chosen in the following way: in the
first stage, the choice regarded the communes where the interviews were to be conducted (178 different municipalities
selected within the DOXA sampling points); in the second
stage, in each commune an adequate number of electoral
wards were extracted at random so that various types of inhabited areas were represented (e.g., central, suburban, outskirts, and isolated houses); finally, in the third stage,
names and addresses of the persons to be contacted were
extracted at random from the electoral lists of the wards selected in the second stage. Each subject sampled had a comparable alternative (i.e., for each subject there was another
one randomly sampled) who would be interviewed in case
the first one was not reached after several tries.
The Italian version of the SF-36 was self-administered in
the context of a person-to-person interview. The individuals, once sampled, were personally contacted at home by
195 trained interviewers who first introduced the aim of the
survey, then delivered the questionnaire and collected some
basic sociodemographic characteristics, and, finally, took it
back. Interviewers asked individuals to answer questions.
With the approach adopted, a national representative
sample of 2031 adult Italians was obtained. Using as reference the official statistics from the 1991 Italian National
Census published by the ISTAT (Istituto Nazionale di Statistica), it is possible to describe and support the representativeness of this normative sample. In addition, it is also possible to compute a weighting variable in order to reproduce
a distribution that is completely consistent with that of the
Italian population to which it refers.
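One common way to build such a weighting variable is post-stratification, sketched below in Python. This is an assumption about the general technique, not a description of the procedure actually used for the survey. The function name and the population shares are hypothetical; only the sample counts by gender (999 men, 1032 women) come from the normative sample.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight per cell = population share / sample share of that cell."""
    n = sum(sample_counts.values())
    weights = {}
    for cell, count in sample_counts.items():
        sample_share = count / n
        weights[cell] = population_shares[cell] / sample_share
    return weights

# Sample counts by gender from the normative sample.
sample_counts = {"male": 999, "female": 1032}
# Hypothetical census shares, for illustration only.
population_shares = {"male": 0.48, "female": 0.52}

weights = poststratification_weights(sample_counts, population_shares)

# The weighted cell counts reproduce the population distribution
# while leaving the total sample size unchanged.
weighted_total = sum(weights[c] * sample_counts[c] for c in sample_counts)
```

In practice the cells would be defined by several census variables at once (for example region, age class, and gender), but the computation per cell is the same.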
Methods of Analysis
To evaluate the SF-36 in Italy, we used simple descriptive
statistics, multitrait scaling techniques, and factor analysis.
These are described in other articles in this issue [24,25].
We hypothesized that Italian data would replicate the findings from previous research for completeness, scaling assumptions, reliability, and construct and known groups validity [9–12,35].
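For readers less familiar with these indicators, the following Python sketch shows the standard computation of Cronbach's alpha and of an item-scale correlation corrected for overlap (the item is correlated with the sum of the other items in its scale). The function names and the item scores are hypothetical; the 0.40 and 0.70 thresholds are those adopted in this article.

```python
from statistics import mean, variance

def cronbach_alpha(items):
    """items: one list of scores per item, all for the same respondents."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def corrected_item_scale_r(items, index):
    """Correlate one item with the sum of the remaining items of its scale."""
    rest = [sum(v for j, v in enumerate(row) if j != index)
            for row in zip(*items)]
    return pearson(items[index], rest)

# Hypothetical scores of five respondents on a three-item scale.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(items)
r_item1 = corrected_item_scale_r(items, 0)
meets_standards = alpha > 0.70 and r_item1 > 0.40
```

Item discriminant validity follows the same pattern: each item is also correlated with the scores of the other scales, and a scaling success is counted when the correlation with its own scale is significantly higher.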
This article reports on the findings within and across
studies, SF-36 scales, and patient groups using only the
main indicators of data quality, scaling assumptions, and internal reliability, summarized in Table 2. In addition to the
analyses planned a priori, for data from the general population survey, further post hoc statistical tests and sensitivity
subgroup analyses were done using methods that are described in the text and in the tables reporting the results.
For example, because the amount of help required by subjects to complete the questionnaire appeared to be associated either with certain sociodemographic variables or with
indicators of the scales’ reliability and validity, the whole
sample was split into two mutually exclusive subgroups
(subjects receiving and not receiving help) and the analyses
were redone, after excluding the group who had help from
interviewers, to test the stability of findings across relevant
subgroups.
RESULTS
Pre-IQOLA and IQOLA Validation Studies
Significant differences were observed across studies, not
only in terms of target disease, design, and objectives, but
also in terms of availability of complementary data (Table 1). Although data on age and gender were collected in all
studies, and educational status in all but one (in the migraine sample), self-reported disease status was available
only from the two most recent studies in 1994 to 1995 (in
the dialysis and general survey samples). The mean age
ranged from 38.8 (migraine) to 70.6 years (COPD). The
gender distribution also differed significantly, ranging
from 0% male for breast cancer to 81% for myocardial infarction. In all seven studies included in this review, the
questionnaire was self-administered.

TABLE 2. List of operative definitions and indicators of performance adopted in the article

Data quality
  Completeness
    Conceptual definition: Count of completed items
    Operative definition: % of items answered
    Expected values: 90–95%
  Completeness
    Conceptual definition: Count of computable scales
    Operative definition: % of cases with all scales computable
    Expected values: 90–98%
  Response Consistency Index (RCI)
    Conceptual definition: Check of the logical inconsistencies in response based on a preidentified set of 15 pairs of answers(a)
    Operative definition: % of cases with no logical inconsistency
    Expected values: 90–95%
Summated rating scaling assumptions
  Item internal consistency
    Conceptual definition: Assessment of the strength of correlation of the items with their hypothetical scale
    Operative definition: Number of items with Pearson item-hypothesized scale correlation coefficients lower than 0.40
    Expected values: 0
  Discriminant validity
    Conceptual definition: Assessment of the strength of correlation of the items with the other scales
    Operative definition: Number of items with higher Pearson correlation coefficients with other scales than with hypothesized scale
    Expected values: 0
Internal consistency reliability
    Conceptual definition: Internal consistency and reliability based on the average correlation between items
    Operative definition: Cronbach’s alpha coefficient
    Expected values: >0.70
Test–retest reliability
    Conceptual definition: Stability of scores from one administration time to another
    Operative definition: Pearson correlation coefficients after a 3-day interval
    Expected values: >0.70
Construct validity
    Conceptual definition: Scale-components correlation (principal component analysis) to test the dimensionality of the scales
    Operative definition: Pattern of the correlations between each scale and two rotated principal components identified in the U.S. data
    Expected values: See Table 8

(a) For example: a report of being able “to walk one kilometer” but not “one hundred meters” is considered an inconsistency in scoring the RCI.
When the performance of the SF-36 was analyzed according to the kind of study, standards of data quality and of
scaling assumptions were generally satisfied (Table 3), with
the worst findings coming from studies that used a pre-IQOLA or an earlier IQOLA developmental version of the
questionnaire. Exceptions include the proportion of missing
items and scale completeness (11.9% and 86.6%, respectively) in the dialysis study and the significant scaling failures in item discriminant validity assessment (4 failures in
280 tests) in the migraine study. Cronbach’s alpha coefficient was generally greater than 0.70, the minimum recommended for group level comparisons, with lower estimates
in the pre-IQOLA studies.

TABLE 3. Synthesis of data quality and psychometric findings in six studies(a)

                                        Pre-IQOLA                    Intermediate IQOLA                          Final IQOLA
                                        COPD          Breast cancer  Healthy       Migraine      Dialysis       Normative survey
                                        (N = 243)     (N = 178)      (N = 50)      (N = 1524)    (N = 246)      (N = 2031)
Missing item, %(b)                      3.3           3.9            2.7           1.1           11.9           0.1
Scale completeness, %(c)                96.4          95.3           96.0          99.3          86.6           99.8
Response consistency index 0, %(d)      78.8          85.0           86.0          84.8          83.3           92.6
Item-scale r > 0.40, number/total(e)    31/35         31/35          32/35         35/35         34/35          34/35
  (percentage)                          88.6          88.6           91.4          100.0         97.1           97.1
Significant success, number/total(f)    277/280       280/280        279/280       276/280       280/280        280/280
  (percentage)                          98.9          100.0          99.6          98.6          100.0          100.0
Scale reliability(g)
  α < 0.70                              1/8           1/8            2/8           0/8           0/8            0/8
  α = 0.70–0.90                         7/8           7/8            6/8           7/8           7/8            7/8
  α > 0.90                              0/8           0/8            0/8           1/8           1/8            1/8

(a) The GISSI-3 trial is not included in this table because only part of the SF-36 was used in the study.
(b) Average missing item proportion across all items.
(c) Average scale completeness (percent of scales with 50% or more items completed).
(d) Proportion of questionnaires with no logical inconsistency.
(e) Proportion of items having item-hypothesized scale correlation coefficient higher than 0.40.
(f) Number of correlations of items with own scales significantly higher than correlations with other scales/total number of correlations.
(g) Internal-consistency reliability (Cronbach’s alpha).
Abbreviations: COPD = chronic obstructive pulmonary disease; IQOLA = International Quality of Life Assessment.
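The two data-quality indicators cited here (proportion of missing items and scale completeness) can be sketched in Python as follows. The function names and the response data are hypothetical; the completeness rule is the one given in the footnote to Table 3, by which a scale counts as complete when 50% or more of its items are answered. Missing answers are represented as None.

```python
def missing_item_pct(responses):
    """Percent of item responses that are missing, across all respondents."""
    total = sum(len(row) for row in responses)
    missing = sum(value is None for row in responses for value in row)
    return 100.0 * missing / total

def scale_computable(item_values):
    """A scale is computable when at least half of its items are answered."""
    answered = sum(value is not None for value in item_values)
    return answered * 2 >= len(item_values)

# Hypothetical answers of two respondents to a four-item scale.
responses = [
    [1, None, 3, 4],
    [2, 2, None, None],
]
pct_missing = missing_item_pct(responses)
computable = [scale_computable(row) for row in responses]
```

Averaging the second indicator over all scales and respondents gives the scale completeness percentage reported in Table 3.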
When data were analyzed to evaluate the performance of
each of the eight SF-36 scales (data not fully shown here), a
few problems were noted in the overall satisfactory picture,
in which most of the scales showed good psychometric
characteristics. These regarded the PF scale, in which 7% of
the item-hypothesized scale correlation coefficients were
lower than 0.40; the GH and SF scales, which had the
poorest performance in all the tests; and an occasional failure in the VT scale. The relatively poor performance of the
PF scale was due to the behavior of two specific items (item
number 1 pertaining to vigorous activities and item number
10 pertaining to bathing and dressing) that was noted in
only two samples in which these findings were expected
(healthy people with a low prevalence of physical limitations and COPD patients with severe disease). The SF and
VT scale problems were due to difficulties arising in early
translations that were promptly identified and corrected.
This finding, together with those pertaining to the GH
scale, is discussed fully in the Discussion section.
Test–retest reliability was estimated using data from the
GISSI study for five SF-36 scales included in the study
questionnaire. High correlations were observed between
pairs of scores at time 1 and 2 (RP = 0.88, BP = 0.72, VT =
0.88, RE = 0.74, and MH = 0.89).
Results from the IQOLA Normative Survey
CHARACTERISTICS OF THE SAMPLE. Overall, our general population sample reproduces a distribution consistent with
the universe to which the survey refers. Women were
slightly more represented than men (50.8%), and the mean
age was 47.7 years, with 18.7% of cases aged more than 65
years. Most cases were married or cohabitating (67.6%).
Forty-seven percent of respondents lived in Northern Italy
(47.3%), 17% in the Center, and 37% in the South.
PARTICIPATION. The strategy adopted to assemble this national sample was random sampling from electoral lists with
the possibility of identifying in the surroundings a comparable alternative for each random subject who was not reachable by the interviewer, and the administration strategy involved self-administration in the context of person-to-person
contact, so the response rate in this survey is not an appropriate indicator of participation.
To give information about the acceptance and applicability of the SF-36 in Italy, Table 4 shows the time required
to complete the entire questionnaire, the proportion of individuals who received help, the distribution of failures
from the questionnaire response consistency index, and the
proportion of cases identified by interviewers on a convenience basis rather than using the probabilistic list. These
four indicators are associated with sociodemographic variables such as age, gender, and education that are well-known predictors of participation and compliance.
On average, subjects aged 65 years and older, female, or
with lower levels of education required significantly more
help and more time to complete the questionnaire and also
had a higher level of inconsistent responses (Table 5). The
proportion of individuals who were identified not on a random basis varied according to age and educational level;
elderly and less educated people were significantly less
likely to be sampled on a convenience basis than through
random sampling.

TABLE 4. Details about the normative sample

                                                        No.       %
Time required for completion of the questionnaire(a)
  Mean minutes: 15.4
  Less than 10 minutes                                  652       32.1
  11–20 minutes                                         1191      58.6
  More than 20 minutes                                  188       9.3
Level of help for completion of the questionnaire
  None                                                  1000      49.3
  Questions read by interviewer                         483       23.8
  Questions explained by interviewer                    548       26.9
Response Consistency Index (RCI)
  No failures                                           1881      92.6
  Only one failure                                      82        4.0
  More than one failure                                 68        3.4
Kind of respondent(b)
  Random sampled                                        1390      68.4
  Convenience sampled                                   641       31.6

(a) SF-36 items and 16 additional questions pertaining to self-reported medical information.
(b) Case contacted when the first and second choices (random sampled) were not reached. Interviewers were
then allowed to identify in the surroundings alternative respondents with matching age and gender characteristics.

TABLE 5. A selected list of indicators of data quality (normative sample)

                                                            Age (y)                       Gender              Education (y)
                                                            <45      45–65    >65         Male     Female     ≤8       >8         Total
Cases assisted by interviewers (%)                          11.9     30.4     57.6(a)     23.8     30.0(b)    40.7     8.3(a)     26.9
Mean time (in minutes) required for completion(c)           14.5     15.6     17.6(d)     15.3     16.6       16.6     13.9(d)    15.4
Cases requiring more than 20 minutes for completion (%)     6.2      9.8      15.8(a)     8.2      10.3       13.3     3.7        9.3
Cases not randomly sampled (%)                              34.1     34.4     20.0(b)     30.6     32.4       27.1     37.6(b)    31.6
Response Consistency Index (RCI) ≥ 1                        4.1      5.3      19.5(a)     6.6      8.2        10.1     3.7(a)     7.4

(a) Chi-square test, P < 0.001.
(b) Chi-square test, P < 0.005.
(c) SF-36 items and 16 additional questions pertaining to self-reported medical information.
(d) Analysis of variance (ANOVA), P < 0.005.
PSYCHOMETRIC RESULTS. Overall, the pattern of results from
the multitrait analysis confirmed the multidimensional
conceptualization of the SF-36 scales (Table 6) and compared well with that of the United States. In all cases the
within-scale item correlations were homogeneous and for
all but one item (item GH3: I am as healthy as anybody I
know) the 0.40 standard for item-internal consistency was
exceeded by far. Tests of scaling success were also satisfactory, with no failures in 280 tests. Cronbach’s alpha coefficient always exceeded the recommended level of 0.70
(range from 0.77 to 0.93) [36,37], with the lowest values in
the GH and SF scales. Table 7 shows internal consistency
reliability coefficients according to a selected list of relevant characteristics, such as gender, age, and education. Although there is more variation across groups (range, 0.55–
0.94), in general, internal consistency reliability is high,
with most of the lowest values in the GH scale, the young
age groups and the more educated samples. These findings
still held after exclusion of the cases who received help in
completing the questionnaire (“Receiving help” Group,
N = 548) or who were sampled on a convenience basis
(“Non random sample” Group, N = 641).
Principal components analysis confirmed the hypothesized physical and mental dimensions of health seen in U.S.
data [16,35]. The proportion of total variance in each scale
explained by the two extracted components in the Italian
data was 60 to 78% across scales, indicating that the two
factors explained the majority of the variance in each scale.
The ordering of correlations between the eight scales and
both factors was generally equivalent in the United States
and Italy. The PF scale correlated highest and the MH lowest with the physical component, whereas the MH scale
was associated more with the mental (3rd rank) than with
the physical (6th rank) factor. After the exclusion of respondents who received help in completing the survey, results came closer to the U.S. data (Table 8).
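Two of the quantities used in this paragraph can be computed directly from a table of loadings, as the Python sketch below illustrates. For orthogonal (uncorrelated) rotated components, the proportion of a scale's variance explained by the components is the sum of its squared correlations with them, and the rank ordering is obtained by sorting the scales on each component. The function names are hypothetical; the values used are the U.S. correlations as they appear to read in Table 8.

```python
def variance_explained(loadings):
    """Sum of squared loadings on orthogonal components (the communality)."""
    return sum(value ** 2 for value in loadings)

def rank_scales(loadings_by_scale):
    """Scale names ordered from highest to lowest loading on one component."""
    return sorted(loadings_by_scale, key=loadings_by_scale.get, reverse=True)

# U.S. correlations with the physical component, as read from Table 8.
us_physical = {
    "PF": 0.85, "RP": 0.81, "BP": 0.76, "GH": 0.69,
    "VT": 0.47, "SF": 0.42, "RE": 0.17, "MH": 0.17,
}
# PF correlations with the physical and mental components (U.S. data).
pf_us = (0.85, 0.12)

pf_share = variance_explained(pf_us)   # 0.7225 + 0.0144
physical_order = rank_scales(us_physical)
```

Comparing the resulting orderings for the two countries, component by component, is one way to express the equivalence described above.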
TABLE 6. Item scaling test and reliability estimates in the normative sample

                                   Range of item-scale correlations       Item scaling tests (%)
Scale                       K(a)   Item internal     Item discriminant    Item internal          Item discriminant    Reliability(f)
                                   consistency(b)    validity(c)          consistency test(d)    validity test(e)
PF, Physical functioning    10     0.63–0.81         0.24–0.60            100                    100                  0.93
RP, Role-physical           4      0.73–0.80         0.31–0.58            100                    100                  0.89
BP, Bodily pain             2      0.77              0.37–0.63            100                    100                  0.85
GH, General health          5      0.29–0.72         0.14–0.66            80                     100                  0.77
VT, Vitality                4      0.51–0.63         0.28–0.58            100                    100                  0.78
SF, Social functioning      2      0.63              0.28–0.60            100                    100                  0.77
RE, Role-emotional          3      0.70–0.76         0.22–0.58            100                    100                  0.85
MH, Mental health           5      0.57–0.73         0.20–0.64            100                    100                  0.85

(a) Number of items and number of item-internal consistency tests per scale.
(b) Correlations between items and hypothesized scale corrected for overlap.
(c) Correlations between items and other scales.
(d) Item internal consistency scaling success, i.e., number of item-scale correlations greater than 0.40/total number of correlations (corrected for overlap).
(e) Item discriminant validity scaling success, i.e., number of correlations of items with own scales significantly higher than correlations with other scales/total number of correlations.
(f) Internal-consistency reliability (Cronbach’s alpha).
TABLE 7. Reliability (Cronbach’s α) estimates for SF-36 scales in different subgroups (normative
sample)

                          No. cases   PF      RP      BP      GH      VT      SF      RE      MH
Total                     2031        0.93    0.89    0.85    0.77    0.78    0.77    0.85    0.85
Gender
  Male                    999         0.93    0.90    0.84    0.76    0.76    0.77    0.84    0.81
  Female                  1032        0.93    0.89    0.85    0.77    0.77    0.77    0.86    0.86
Age (y)
  18–24                   193         0.74    0.84    0.66    0.55    0.71    0.75    0.80    0.75
  25–34                   367         0.89    0.83    0.80    0.62    0.63    0.65    0.75    0.77
  35–44                   373         0.83    0.87    0.78    0.68    0.71    0.73    0.85    0.84
  45–54                   351         0.86    0.85    0.83    0.65    0.75    0.76    0.84    0.82
  55–64                   367         0.93    0.89    0.84    0.77    0.81    0.81    0.88    0.88
  65–74                   258         0.92    0.91    0.90    0.78    0.83    0.85    0.85    0.88
  ≥75                     122         0.93    0.93    0.90    0.82    0.82    0.73    0.90    0.90
Education (y)
  ≤5                      673         0.94    0.92    0.88    0.81    0.81    0.83    0.89    0.87
  6–8                     499         0.91    0.86    0.83    0.66    0.73    0.73    0.85    0.84
  >8                      859         0.87    0.85    0.77    0.65    0.71    0.71    0.80    0.80
Level of help for completion of the questionnaire
  Nonhelp                 1483        0.90    0.87    0.83    0.71    0.72    0.74    0.83    0.82
  Receiving help          548         0.94    0.92    0.87    0.82    0.84    0.82    0.89    0.89

Abbreviations: PF = physical functioning; RP = role physical; BP = bodily pain; GH = general health; VT =
vitality; SF = social functioning; RE = role-emotional; MH = mental health.
Table 9 presents the multivariate relationship between the GH scale, which measures the overall general perception of health, and the other SF-36 scales. We conducted this analysis, both in the whole sample and after excluding the group that received assistance, to see how well the SF-36 represents the universe of concepts people consider in evaluating their health and to assess whether the conceptualization of health is invariant to the amount of help received. Both analyses gave results comparable with the U.S. reference data. Each scale was significantly related to the overall perception of health, with product-moment correlations always higher than 0.40 and more than 60% of the reliable variance explained by the model. The PF, VT, and MH scales always showed the highest correlations and SF and RE the lowest.

TABLE 8. Relationship between the 8 SF-36 scales and physical and mental components in the U.S. and Italian normative samplesᵃ

                             U.S. data                Italian data             Italian data
                             U.S. population          Whole sample             Without help group
                             (2474)                   (2031)                   (1483)
                             Rotated principal        Rotated principal        Rotated principal
                             componentsᵇ              componentsᶜ              componentsᵈ
                             Physical    Mental       Physical    Mental       Physical    Mental
PF, Physical functioning       0.85       0.12          0.84       0.22          0.86       0.13
RP, Role-physical              0.81       0.27          0.53       0.55          0.64       0.42
BP, Bodily pain                0.76       0.28          0.75       0.34          0.73       0.37
GH, General health             0.69       0.37          0.83       0.29          0.75       0.34
VT, Vitality                   0.47       0.64          0.62       0.56          0.44       0.70
SF, Social functioning         0.42       0.67          0.34       0.79          0.27       0.81
RE, Role-emotional             0.17       0.78          0.17       0.86          0.23       0.76
MH, Mental health              0.17       0.87          0.44       0.66          0.25       0.81

ᵃGeneral population data [35].
ᵇHypothesized associations from U.S. empirical data.
ᶜActual correlation coefficients between each SF-36 scale and the rotated orthogonal principal component in the whole Italian sample.
ᵈActual correlation coefficients between each SF-36 scale and the rotated orthogonal principal component in the group who did not need help to complete the questionnaire.

TABLE 9. Relationships between responses to SF-36 scales and the general health perception (GH) scaleᵃ in the Italian normative sample and U.S. dataᵇ

                                   U.S. data            Italian data             Italian data
                                   (2149)               Whole sample             Without help group
                                                        (2031)                   (1483)
                                   bᶜ        rᵈ         bᶜ        rᵈ             bᶜ        rᵈ
PF, Physical functioning           0.22*     0.56*      0.32*     0.68*          0.30*     0.65*
RP, Role-physical                  0.09*     0.55*      0.05*     0.55*          0.05*     0.50*
BP, Bodily pain                    0.20*     0.56*      0.15*     0.63*          0.17*     0.59*
VT, Vitality                       0.23*     0.58*      0.21*     0.66*          0.20*     0.58*
SF, Social functioning             0.03      0.47*      0.02      0.52*          0.02      0.47*
RE, Role-emotional                −0.00      0.36*     −0.03**    0.43          −0.02      0.39*
MH, Mental health                  0.24*     0.46*      0.17*     0.58*          0.15*     0.52*
Reliable variance explained, %     62                   62                       52
Adjusted r²                        0.48                 0.48                     0.52

ᵃResults from an ordinary least squares multivariable regression model having the GH scale as dependent variable and the other seven scales as independent variables.
ᵇSee references [16,38,39].
ᶜUnstandardized regression coefficients.
ᵈProduct-moment correlations.
*P < 0.001.
**P < 0.05.
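The "reliable variance explained" row of Table 9 is not defined in the text. One common reading, which is an assumption here rather than something the article states, is that the adjusted r² of the regression is divided by the reliability (Cronbach's alpha) of the dependent scale, so that measurement error in GH is not counted against the model. The minimal Python sketch below applies that reading to the whole-sample figures (adjusted r² of 0.48 from Table 9 and a GH alpha of 0.77 from Table 7); the function name is hypothetical.

```python
def reliable_variance_explained(adjusted_r2, reliability):
    """Percentage of the reliable (non-error) variance accounted for by the model.

    Assumed definition: 100 * adjusted r-squared / reliability of the
    dependent scale. This is an illustrative reading, not a formula
    quoted from the article.
    """
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    return 100 * adjusted_r2 / reliability

# Whole Italian sample: adjusted r-squared (Table 9) and GH alpha (Table 7).
whole_sample = reliable_variance_explained(adjusted_r2=0.48, reliability=0.77)
print(round(whole_sample))
```

Under this assumption the whole-sample value rounds to the 62% reported in the table.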
DESCRIPTIVE STATISTICS. Score distributions for all scales in the whole sample are reported in Table 10. As expected in relatively healthy samples, the full range of values was present and medians were always higher than means, with the negative skewness indicating a tendency toward the positive end of the scales. The bipolar scales that measure well-being as well as limitations (VT, GH, MH) showed lower average values and, in general, wider score distributions. The proportion of the population for whom an improvement (ceiling) or decline (floor) in health could not be measured is larger with the unipolar limitation scales (PF, RP, BP, and RE). Estimates according to age and gender groups are reported elsewhere [34] or can be obtained on request from the authors.
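The summary measures discussed above (floor and ceiling percentages, coefficient of variation, skewness, and kurtosis) can be sketched in a few lines of Python. This is an illustration only: the function name and the toy scores are hypothetical, population moments are used for simplicity (the article does not say which estimators were applied), and the coefficient of variation is taken as 100 × SD / mean, which is consistent with the values published for the normative sample.

```python
from statistics import mean, median, pstdev

def describe(scores, lo=0, hi=100):
    """Summary statistics for a 0-100 scale score."""
    n = len(scores)
    m = mean(scores)
    sd = pstdev(scores)                      # population SD (a simplification)
    z = [(s - m) / sd for s in scores]       # standardized scores
    return {
        "mean": m,
        "median": median(scores),
        "sd": sd,
        "floor_pct": 100 * sum(s == lo for s in scores) / n,    # % at lowest score
        "ceiling_pct": 100 * sum(s == hi for s in scores) / n,  # % at highest score
        "cv": 100 * sd / m,                                     # coefficient of variation, %
        "skewness": sum(v ** 3 for v in z) / n,
        "kurtosis": sum(v ** 4 for v in z) / n - 3,             # excess kurtosis
    }

# Hypothetical scores for a unipolar limitation scale (not study data).
toy_scores = [0, 50, 75, 100, 100, 100, 100, 100]
stats = describe(toy_scores)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

With most respondents at the maximum score, the toy data show the pattern described in the text: the median exceeds the mean and the skewness is negative.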
TABLE 10. Descriptive statistics and score distribution in the Italian normative sample (2031)

                      PF      RP      BP      GH      VT      SF      RE      MH
No. items             10      4       2       5       4       2       3       5
Mean                  84.46   78.21   73.67   65.22   61.89   77.43   76.16   66.59
Median                95      100     84      70      65      87.5    100     68
Range                 0–100   0–100   0–100   0–100   0–100   0–100   0–100   0–100
Standard deviation    23.18   35.93   27.65   22.18   20.69   23.34   37.25   20.89
% Floor               1.0     12.6    1.1     0.6     0.5     0.8     14.3    0.3
% Ceiling             40.1    67.4    41.1    1.5     1.8     32.3    66.4    2.9
CV                    27.44   45.94   37.53   34.01   33.44   30.14   48.91   31.37
Skewness             −1.87   −1.35   −0.68   −0.85   −0.55   −1.00   −1.19   −0.76
Kurtosis              2.85    0.20   −0.68    0.16   −0.03    0.34   −0.21    0.2

Abbreviations: PF = physical functioning; RP = role physical; BP = bodily pain; GH = general health; VT = vitality; SF = social functioning; RE = role-emotional; MH = mental health.

DISCUSSION AND FUTURE PLANS
Despite demand for questionnaires for assessing population health and the effectiveness of health technologies across countries and languages, systematic efforts to evaluate empirically the equivalence and validity of translations of original English questionnaires have been rare.
The IQOLA project tested the assumption that the SF-36, a short questionnaire originally developed in the United States and probably the most widely used instrument in English-speaking countries, can be translated, validated, and normed in other languages while maintaining its excellent content and psychometric and clinical validity. Other articles in this issue describe the IQOLA organization and project objectives, the methods, and the empirical findings from cross-cultural comparisons, and confirm the validity and comparability of the national versions produced.
This article summarizes the work done in Italy and discusses the empirical evidence by comparing the Italian results with the standards suggested in the literature and with
the results from the original evaluations. Although the
overall picture is satisfactory in terms of feasibility, psychometric tests, and interpretability of the Italian SF-36 questionnaire, a few findings are worth commenting on, in order to clarify the validity and generalizability of the methods
and results.
First, satisfactory results were obtained in a wide array of
studies, differing in terms of year of implementation, design,
size, and sociodemographic characteristics. This strengthens the idea that the SF-36 is a multidimensional questionnaire applicable and valid across age, gender, and kind and
severity of disease.
The second point concerns the association between better psychometric indicators and the year of implementation, which suggests that the use of psychometric methods
to evaluate the translations led to a better questionnaire.
This was the case for the SF and VT scales. For instance, in the first version of the IQOLA translation that was applied in the GISSI-3 study, the original VT3 item “Did you feel worn out” was translated as “Si è sentito esaurito”, a question that actually suggests the psychological state of a person suffering from a “nervous breakdown” more than the perception of tiredness or vitality. This item acted as an outlier: it had a high mean compared with the other vitality items (4.75 versus 3.75, 3.83, and 3.84), its correlation with the sum of the other items hypothesized to belong to the vitality scale was the lowest among the four vitality items, and it correlated more with the MH scale than with the VT scale.
Thus, based on empirical data from the multitrait analysis,
problems were identified, translators and experts discussed
the findings at formal meetings, and the problems were
solved by changing the wording. Such findings support the
validity of the methods adopted in the IQOLA project, as
the translations yielded more reliable and valid scales over
time, with the last versions giving the best results.
Third, despite the improvement during the translation
process, the GH scale, which measures how people evaluate
their own overall health status, stood out for its relatively
poor performance in terms of item-internal consistency,
item-discriminant validity, and reliability, both in the pre-IQOLA studies and in some age and gender subgroups of
the IQOLA normative sample. This relatively poor performance seems to be associated with age, with lower reliability
estimates for younger individuals. Although these findings
do not compromise the overall quality of this scale because
it met all the recommended standards in terms of scaling assumptions (it passed 99% of the item-discriminant tests,
and its reliability estimates were above 0.70 in five out of
the six studies examined), our results indicate that not all
the original SF-36 scales have the same degree of validity in
Italy. The behavior of the GH scale, more evident in young
and more educated people, calls for further analyses.
On the other hand, analysis of the relationship between
responses to the SF-36 scales and the general health perception scale showed a substantial amount of variance that
is not accounted for by the other seven scales, suggesting
that it is important to include the GH scale in the SF-36 to
capture the impact of health constructs not measured by
other scales and to provide a direct measure of each respondent’s evaluation of his or her health state. Comparable
findings indicating that a substantial amount of variance
remains to be explained were also seen in the U.S. databases [38] and in other European studies [16,39].
Finally, normative data, essential to enable within and
between country comparisons and score interpretation, also
allowed a complete assessment of the questionnaire in
terms of participation, data quality, hypothesized constructs, and psychometric behavior of the scales. Against a very satisfactory overall picture, in which multitrait and factor analyses confirm that the Italian translation met the recommended psychometric standards to a degree comparable with the source questionnaire, one particular finding needs
deeper discussion and more detailed analyses in the future.
This regards the amount of help required for questionnaire
completion and the sampling strategy adopted to identify
individuals to be contacted. In this survey, 2031 persons
were contacted and actually reached with the methods described. In most cases, individuals were identified on a random basis (68%) and needed no help for questionnaire
completion (73%). Overall, 969 (47.7%) of 2031 cases were randomly sampled and did not receive any help to complete the questionnaire. The need for help was associated with sociodemographic indicators such as age, gender, and education. If we had used alternative approaches (self-administration with a mail survey or phone survey using a random or convenience sample), we would probably have had a higher rate of nonresponse and much more missing data at the item level, with much less information about the representativeness of the sample and the presence, direction, and amount of potential bias in the completion of the questionnaire.
Preliminary stratified and sensitivity analyses have shown that most of the results are consistent across the main strata and that reliability estimates did not change after excluding specific subgroups. However, because we cannot exclude that systematic distortion may have been introduced by the request for help (and the type of help given by the interviewer), we will check in future whether it is worth estimating norms for different modes of administration, as others have done in different settings [40].
In conclusion, we have provided empirical data from a prospective, multinational project to describe and illustrate the feasibility of translating, validating, and norming the SF-36 in a non-English-speaking country. The Italian version of the SF-36 appears to be a valid and reliable multidimensional questionnaire, whether the Italian data are compared with the original U.S. data or the different IQOLA translations are evaluated and compared using the whole IQOLA database.
Further studies supporting the clinical validity of the
SF-36 are under way in Italy, using data from multiple
sources. Additional information on the application of the
SF-36 in Italy is provided elsewhere [41–45], or is forthcoming (Bamfi F, Fasolo A, De Carli GF, Recchia G, Cifani S,
Mosconi P, et al. Submitted for publication; Mingardi G,
Apolone G, Ruggiata R, Mosconi P on behalf of the DIA-QOL
Group. Submitted for publication; Mosconi P, Cifani S,
Crispino S, Fossati R, Apolone G and the Head and Neck
Cancer Italian Working Group. Submitted for publication).
In addition to the sponsors of the International Quality of Life Assessment (IQOLA) project, this work was partially supported by grants
from the CNR-National Research Council (ACRO grant no.
96.00763.PF39), AIRC (Associazione Italiana per la Ricerca sul
Cancro), and Glaxo Wellcome Italia. We are particularly indebted to
Barbara Gandek and John Ware, whose suggestions were of great
value. We also thank Gianfranco Decarli and Giuseppe Recchia
(Glaxo Wellcome Italia) for the constructive comments on a previous
version of the manuscript.
References
1. McDowell I, Newell C. Measuring Health: A Guide to Rating Scales and Questionnaires. New York: Oxford University Press; 1987.
2. Walker S, Rosser R. Quality of Life: Key Issues in the 1990s. Dordrecht: Kluwer Academic Press; 1992.
3. Anderson RT, Aaronson NK, Wilkin D. Critical review of the international assessments of health related quality of life. Qual Life Res 1993; 2: 369–395.
4. Tamburini M, Rosso S, Gamba A, Mencaglia E, De Conno F, Ventafridda V. A therapy questionnaire for quality of life assessment in advanced cancer research. Ann Oncol 1992; 3: 565–570.
5. GIVIO Investigators. Impact of follow-up testing on survival and health-related quality of life in breast cancer patients. A multicenter randomized controlled trial. JAMA 1994; 271: 1587–1592.
6. The European Group for Quality of Life and Health Measurement. European Guide to the Nottingham Health Profile. Bucquet D, Ed. France: Escubase 8 Press; 1992.
7. Aaronson NK, Ahmedzai S, Bergman B, Bullinger M, Cull A, Duez NJ, et al. The European Organization for Research and Treatment of Cancer QLQ-C30: A quality of life instrument for use in international clinical trials in oncology. J Natl Cancer Inst 1993; 85: 365–376.
8. WHOQOL Group, Division of Mental Health, World Health Organization. Study protocol for the World Health Organization project to develop a Quality of Life assessment instrument (WHOQOL). Qual Life Res 1993; 2: 153–159.
9. Ware JE, Sherbourne CD. The MOS 36-Item Short-Form Health Survey (SF-36): I. Conceptual framework and item selection. Med Care 1992; 30: 473–483.
10. McHorney CA, Ware JE, Raczek AE. The MOS 36-Item Short-Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care 1993; 31: 247–263.
11. McHorney CA, Ware JE, Lu JFR, Sherbourne CD. The MOS 36-Item Short-Form Health Survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care 1994; 32: 40–66.
12. Ware JJ. SF-36 Health Survey. Manual and Interpretation Guide. Boston, MA: The Health Institute, New England Medical Center; 1993.
13. Aaronson NK, Acquadro C, Alonso J, Apolone G, Bucquet D, Bullinger M, et al. International quality of life assessment (IQOLA) project. Qual Life Res 1992; 1: 349–351.
14. Ware JE, Gandek B, and the IQOLA Project Group. The SF-36 Health Survey: Development and use in mental health research and the IQOLA Project. Int J Ment Health 1994; 23: 49–73.
15. Ware JE, Gandek B, Keller SD, and the IQOLA Group. Evaluating instruments used cross-nationally: Methods from the IQOLA Project. In: Spilker B, Ed. Quality of Life and Pharmacoeconomics in Clinical Trials, Second Edition. New York: Raven Press; 1995.
16. Ware JE, Keller SD, Gandek B, Brazier JE, Sullivan M, and the IQOLA Project Group. Evaluating translations of Health Status questionnaires: Methods from the IQOLA Project. Int J Tech Assess Health Care 1995; 11: 525–551.
17. Bosisio M, Parma E, Apolone G, Bertolini G. La valutazione dello stato di salute in pazienti ambulatoriali con bronchite cronica. Uno studio di esito nella medicina generale. Ric Pratica 1996; 12: 55–62.
18. GIVIO. Survey of treatment of primary breast cancer in Italy. Br J Cancer 1988; 57: 630–634.
19. Mosconi P, Meyerowitz BE, Liberati MC, Liberati A, on behalf of GIVIO. Disclosure of breast cancer diagnosis. Ann Oncol 1991; 2: 273–280.
20. Schipper H, Clinch J, McMurray A, Levitt M. Measuring the quality of life of cancer patients: the Functional Living Index-Cancer. J Clin Oncol 1984; 2: 472–483.
21. Apolone G, Mosconi P, Liberati A on behalf of GIVIO. Validation process of health-related quality of life questionnaire: the GIVIO approach. Qual Life Res 1994; 3: 69.
22. Ware J, Gandek B. Overview of the SF-36 Health Survey and the International Quality of Life Assessment (IQOLA) Project. J Clin Epidemiol 1998; 51(11): 903–912.
23. Bullinger M, Alonso J, Apolone G, Leplège A, Sullivan M, Wood-Dauphinee S, et al. Special issue: Translating health status questionnaires and evaluating their quality: the IQOLA Project approach. J Clin Epidemiol 1998; 51(11): 913–923.
24. Ware J, Gandek B. Methods for testing data quality, scaling assumptions, and reliability: The IQOLA Project approach. J Clin Epidemiol 1998; 51(11): 945–952.
25. Ware J, Gandek B. Methods for validating and norming translations of health status questionnaires: The IQOLA Project approach. J Clin Epidemiol 1998; 51(11): 953–959.
26. Thurstone LL, Chave EJ. The measurement of attitude. Chicago: University of Chicago Press; 1929.
27. GISSI-Gruppo Italiano per lo Studio della Sopravvivenza nell’Infarto Miocardico. GISSI Study protocol on the effect of lisinopril, of nitrates, and of their association in patients with acute myocardial infarction. Am J Cardiol 1992; 70: 62C–69C.
28. GISSI-3-Gruppo Italiano per lo Studio della Sopravvivenza nell’Infarto Miocardico. GISSI-3: Effect of lisinopril and transdermal glyceryl trinitrate singly and together on 6-week mortality and ventricular function after acute myocardial infarction. Lancet 1994; 343: 1115–1122.
29. GISSI-Nursing. Valutazione della percezione della qualità della salute da parte del paziente con infarto miocardico. Rapporto finale dello studio. Riv Infermiere 1995; 16–29.
30. GISSI-NURSING: Valutazione della percezione della qualità della vita e salute da parte del paziente con infarto del miocardio. G Ital Cardiol 1997; 27: 865–876.
31. Apolone G, Mosconi P. Validazione psicometrica e clinica del questionario per la valutazione della qualità della vita nell’emicranico. Internal Report. Milano: Istituto Mario Negri; January 1995.
32. Mosconi P, Apolone G, Liberati A. Psychological impact of neoplastic disease: an epidemiological perspective. Validation process of a health-related quality of life questionnaire. Roma: CNR-Italian National Council of Research, Giugno; 1994.
33. Mingardi G, on behalf of DIA-QOL Group. From the development to the clinical application of a questionnaire on the Quality of Life in dialysis. Nephrol Dial Transplant 1998; 13 (Suppl. 1): 70–75.
34. Apolone G, Mosconi P, Ware J. Il questionario sullo stato di salute SF-36. Manuale d’uso e Guida all’Interpretazione dei Risultati. Milan, Italy: Guerini Ed Associati; 1997.
35. Ware JE, Kosinski M, Keller SD. SF-36 Physical and Mental Summary Scales: A User’s Manual. Boston, MA: The Health Institute; 1994.
36. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika 1951; 16: 297–334.
37. Nunnally JC. Psychometric Theory. New York: McGraw-Hill; 1987.
38. Davies AR, Ware JE. Measuring Health Perceptions in the Health Insurance Experiment. Santa Monica, CA: Rand Corporation; 1981. (Health Insurance Experiment Series, RAND #R-2711-HHS).
39. Jenkinson C, Wright L, Coulter A. Criterion validity and reliability of the SF-36 in a population sample. Qual Life Res 1994; 3: 7–12.
40. McHorney CA, Kosinski M, Ware JE. Comparisons of the costs and quality of norms for the SF-36 Health Survey collected by mail versus telephone interview: Results from a national survey. Med Care 1994; 32: 551–567.
41. Crosignani PG, Vercellini P, Apolone G, De Giorgi D, Cortesi I, Mestia M, et al. Endometrial resection versus vaginal hysterectomy for menorrhagia: long term clinical and quality of life outcomes. Am J Obstet Gynecol 1997; 177: 95–101.
42. Crosignani PG, Vercellini P, Mosconi P, Oldani S, Cortesi I, De Giorgi O. A levonorgestrel-releasing intrauterine device versus hysteroscopic endometrial resection in the treatment of dysfunctional uterine bleeding. Obstet Gynec 1997; 90: 257–263.
43. Apolone G, Cifani S, Mosconi P. Questionario sullo stato di salute SF-36. Traduzione e validazione della versione italiana: Risultati del progetto IQOLA. Medic 1997; 2: 86–94.
44. Mosconi P, Apolone G. SF-36: La qualità della vita. Esperienze italiane. SIMG. J Ital Coll Gen Practitioner 1998; 3: 4–8.
45. Apolone G, Filiberti A, Cifani S, Ruggiata R, Mosconi P. The evaluation of the EORTC QLQ-C30 questionnaire: A comparison with SF-36 health survey in a cohort of Italian long survival cancer patients. Ann Oncol 1998; 9: 549–557.