
The Italian SF-36 Health Survey

1998, Journal of Clinical Epidemiology

J Clin Epidemiol Vol. 51, No. 11, pp. 1025–1036, 1998
Copyright © 1998 Elsevier Science Inc. All rights reserved.
0895-4356/98 $–see front matter
PII S0895-4356(98)00094-8

The Italian SF-36 Health Survey: Translation, Validation and Norming

Giovanni Apolone* and Paola Mosconi
Dipartimento di Oncologia, Istituto di Ricerche Farmacologiche Mario Negri, Milan, Italy

ABSTRACT. This article reports on the development and validation of the Italian SF-36 Health Survey using data from seven studies in which an Italian version of the SF-36 was administered to more than 7000 subjects between 1991 and 1995. Empirical findings from a wide array of studies and diseases indicate that the performance of the questionnaire improved as the Italian translation was revised and that it met the standards suggested by the literature in terms of feasibility, psychometric tests, and interpretability. This generally satisfactory picture strengthens the idea that the Italian SF-36 is as valid and reliable as the original instrument and is applicable and valid across age, gender, and disease. Empirical evidence from a cross-sectional survey carried out to norm the final version in a representative sample of 2031 individuals confirms the questionnaire’s characteristics in terms of hypothesized constructs and psychometric behavior and gives a better picture of its external validity (i.e., robustness and generalizability) when administered in settings that are very close to the real world. J CLIN EPIDEMIOL 51;11:1025–1036, 1998. © 1998 Elsevier Science Inc.

KEY WORDS. Health-related quality of life, SF-36 Health Survey, reliability, validity, cross-cultural validation

INTRODUCTION

Interest in measuring the aspects of health most closely related to quality of life, usually referred to as health-related quality of life (HR-QOL), has increased in recent years in Italy as in other countries.
Advances have been made in methods for describing patients’ subjective health status using standardized measures, and several valid and reliable patient-based measures are available either as generic or as disease- and treatment-targeted questionnaires. However, most of these are in English and are intended for use in English-speaking settings [1–3]. Very few brand new country-specific instruments have been developed in Italy to measure HR-QOL, and few English instruments that met the investigators’ needs have been translated [4,5]. With few but relevant exceptions that were actually translated or developed as part of multilanguage and multinational projects [6–8], most such efforts were produced by isolated groups of researchers who seldom published their findings in peer-reviewed journals.

Among the so-called generic measures, the MOS 36-Item Short Form Health Survey (SF-36) is known for its comprehensiveness, brevity, and high standards of reliability and validity [9–12]. It was first translated by independent Italian teams in 1990. With the launch of the International Quality of Life Assessment (IQOLA) project [13–16] in 1991, the description of the use of the SF-36 in Italy can be split into three parts. The first part regards the pre-IQOLA phase (in which independent translations were used in selected series of studies), the second an intermediate IQOLA phase (in which an effort was made to trace all potential users and to centralize and harmonize the use of the SF-36 in Italy), and the third a final phase (in which a standardized and accredited Italian SF-36 version was tested in a large representative sample of the Italian general population) (see Figure 1 for a synopsis).

*Address for correspondence: Giovanni Apolone, MD, Istituto di Ricerche Farmacologiche Mario Negri, Via Eritrea 62, 20157 Milan, Italy. Accepted for publication on 7 July 1998.
This is the first comprehensive English-language report on the development of the Italian SF-36, and it traces the improvement in the psychometric properties of the form as the translation evolved. It also reports on findings from seven studies in which an Italian version of the SF-36 was administered to more than 7000 subjects during 1991 to 1995.

MATERIAL AND METHODS

Pre-IQOLA Studies

As summarized in Figure 1 and Table 1, before the launch of the IQOLA project, at least two independent translations of the SF-36 were being used in Italy, one produced by a team of researchers at the Mario Negri Institute (IRFMN), the other by a pharmaceutical company, Glaxo Italy in Verona. The two translations were quite similar not only because both were produced using a similar forward-backward methodology but also because both were eventually revised by one of the authors of this paper (GA). Two studies in which these two pre-IQOLA translations were administered are outlined here.

FIGURE 1. Synopsis of the International Quality of Life Assessment (IQOLA) project in Italy.

STUDY NO. 1. In 1991, 28 general practitioners participated on a voluntary basis in a study aimed at exploring the feasibility of HR-QOL assessment in general practice by assessing the applicability and performance of the SF-36 in a sample of 243 patients with chronic obstructive pulmonary disease (COPD) [17]. Besides a formal evaluation of the psychometric and clinical validity characteristics of the SF-36 when self-administered to patients, a formal comparison of physician and patient agreement about patients’ HR-QOL was also carried out by administering a slightly modified version of the questionnaire to physicians (i.e., the questions pertaining to some physical and mental scales were reworded in order to develop a proxy version of the questionnaire).
Patients and physicians were instructed to complete the questionnaires simultaneously but independently during a scheduled office visit.

TABLE 1. Selected characteristics of seven studies adopting the SF-36

                        Pre-IQOLA              Intermediate IQOLA                  Final IQOLA
                        COPD      Breast       Healthy    GISSI-3     Migraine    Dialysis   Normative
Characteristic          study     cancer study people     AMI trial   study       study      survey
Year                    1991      1991         1993       1992–93     1993–94     1994–95    1995
No. cases               243       178          50         3278        1524        246        2031
Age (mean), years       70.6      52.8         45.5       61.0        38.8        59.7       47.7
Age (% > 65 years)      75.7      14.2         12.0       37.1        1.9         40.3       18.7
Sex (% male)            61.7      0.0          42.0       81.0        26.6        48.6       49.2
School (mean), years    6.5       10.5         6.1        7.4         No          6.5        9.1
School (% > 8 years)    16.0      67.8         68.0       26.0        No          19.3       57.7

Abbreviations: IQOLA = International Quality of Life Assessment; COPD = chronic obstructive pulmonary disease; AMI = acute myocardial infarction.

STUDY NO. 2. In the framework of the activity of GIVIO (Interdisciplinary Group for Cancer Care Evaluation), a cooperative group involved in a research program monitoring quality of care since 1986 [18], a randomized multicenter trial was launched to test the impact on overall survival and HR-QOL of two different follow-up protocols. A multistep approach was used to assemble and validate the questionnaire to be adopted in the trial [5,19]. In 1991, a developmental version of the GIVIO questionnaire was tested by administering it together with the Functional Living Index-Cancer Scale [20] and the SF-36 to a sample of patients with breast cancer. The three questionnaires were mailed with a cover letter explaining the aim of the study to 273 members of an Italian association of patients with breast cancer (Attive Come Prima). One hundred seventy-eight women (65%) mailed back a complete set of questionnaires, enabling a formal concurrent validation [21].
IQOLA in Italy

As described in detail in other articles in this issue [22–25], the IQOLA project was launched in 1991 with the goals of translating, adapting, and testing the cross-cultural applicability of the SF-36 with the assistance of sponsored investigators from 14 different countries. In Italy, the method presented in the general articles in this issue was applied with a few operative modifications that are described briefly here.

In the first stage of the IQOLA project (i.e., translation and qualitative and quantitative evaluation to ensure semantic equivalence and acceptance in each country), two professional translators with experience of “health and quality of life terminology” but not of the SF-36 independently produced two forward-translations and, after a first meeting, agreed on a common version with a list of alternatives for the controversial item stems and response choices. Next, after a meeting with the leader of the Italian team, some revisions were made and a second common version was produced. During this step, each forward-translator rated the difficulty of translation, and two external experts (bilingual and with a specific background in health outcome assessment) and the leader of the Italian team rated the quality of translation in terms of common language, clarity, and conceptual equivalence. Problematic items and response choices were identified and reentered into the forward-translation process.

The third forward-translation entered the backward-translation process: two professional American-Italian translators, not familiar with the SF-36, worked independently to produce two English versions that were sent to the IQOLA project headquarters in Boston (NEMC) for review. In the meantime, two other American-Italian translators rated the forward versions in terms of difficulty and quality.
Finally, a meeting was held with the leader of the Italian team, all translators, a general practitioner, and a nurse (both with experience in health outcome assessment). Using all the documentation available, a new version was produced to be field tested.

Empirical testing was done first using a Thurstone exercise [26] to test the equivalence, ordinality, and interval properties for translations of the original response choices. The test was carried out in two independent convenience samples of the lay public (in the first sample, 30 subjects positioned their values on an anchored 10-cm line, and in the second sample, 60 subjects positioned their values on a floating 10-cm line). A further version of the questionnaire was developed using the empirical findings from this Thurstone exercise to select the response choices closest to those in the source instrument. This version was self-administered to a convenience sample of healthy individuals to assess acceptance, clarity, and the use of common language.

STUDY NO. 3. Fifty people, identified on a convenience basis in two different geographic settings (North and South Italy), first completed the questionnaire, then answered a standardized debriefing questionnaire, and, finally, were interviewed by one of the authors or by a trained psychologist.

In the second stage (i.e., formal psychometric tests of assumptions underlying item and scale scoring), an effort was made to centralize and harmonize the use of the SF-36 in Italy by recommending the use of the most suitable IQOLA version in as many studies as possible. Overall, the SF-36 IQOLA version has been adopted in several studies, three of which have been completed, allowing a preliminary analysis of the questionnaire’s performance.

STUDY NO. 4.
In 1992, in the framework of the GISSI-3 trial (Gruppo Italiano per lo Studio della Sopravvivenza dell’Infarto Miocardico), a randomized controlled trial on the effect of very early administration of ACE inhibitors and nitrates that included 20,000 patients with acute myocardial infarction, a sample of patients from 89 coronary care units was included in an ad hoc protocol aimed at evaluating the patient’s perception of health-related quality of life issues. Details about the GISSI trial design and 6-week follow-up results regarding clinical end points and questionnaire psychometric properties have been presented elsewhere [27–30]. Briefly, an ad hoc multidimensional instrument was assembled starting from well-known, previously validated questionnaires. The GISSI-Nursing questionnaire includes 38 items that generate a symptom checklist and 9 scales. Eighteen items and 5 scales were excerpted from the SF-36. The validation process included a formal test–retest reliability assessment in an independent sample of 100 patients using an interval of 3 days. Overall, more than 4400 patients were enrolled in the study and followed up over time, yielding samples ranging from 3278 baseline cross-sectional cases to 1919 longitudinal cases.

STUDY NO. 5. In 1993, a study aimed at evaluating the characteristics of two different questionnaires, the SF-36 and the Migraine Specific Questionnaire (MSQ), was carried out in a large cohort of people with migraine, one of the most common complaints of patients presenting to general practitioners in Italy [31]. The study, a pilot phase of a more extensive project directly conducted by a pharmaceutical company, Glaxo Italy in Verona, with the external assistance of the Italian IQOLA team, was carried out with the collaboration of 150 neurological centers over a 12-month recruitment period.
Cross-sectional data on 1524 patients are presently available for the psychometric analyses of both questionnaires.

STUDY NO. 6. In 1992, in the framework of a project sponsored by the CNR (Italian National Council of Research) [21,32,33], a study aimed at developing and validating an HR-QOL questionnaire for patients undergoing dialysis was launched. After the item selection and reduction steps and following tests in small focus groups, a prospective multicenter study was planned to evaluate longitudinally the characteristics of the new questionnaire by comparing it with the most suitable SF-36 IQOLA version. During a 1-year period (1994–1995), more than 500 cases were recruited, and, according to the study design, the SF-36 was delivered to half of them, yielding a validation sample of 246 patients.

According to the IQOLA project protocol, at the end of the translation and validation phases, a cross-sectional survey was carried out in 1995 to norm the final Italian SF-36 in a representative sample of noninstitutionalized Italians aged 18 years and over. The survey was conducted by DOXA (Istituto di Ricerche Statistiche e Analisi dell’Opinione Pubblica, Milano, Italy) under the supervision of the IQOLA Italian team. Following the guidelines recommended by the project protocol, a multistep random sampling strategy was adopted to draw a large, representative sample. The method applied is briefly described here; details are given elsewhere [34].

STUDY NO. 7. The universe to which the survey refers is all Italians aged 18 years and older, or about 45.2 million persons, in all regions of Italy. This universe was split into sections (strata) according to region and size of the commune of residence.
In each stratum (e.g., communities of the Lombardia Region with fewer than 5000 inhabitants), the sampling units were chosen in the following way: in the first stage, the choice regarded the communes where the interviews were to be conducted (178 different municipalities selected within the DOXA sampling points); in the second stage, in each commune an adequate number of electoral wards were extracted at random so that various types of inhabited areas were represented (e.g., central, suburban, outskirts, and isolated houses); finally, in the third stage, names and addresses of the persons to be contacted were extracted at random from the electoral lists of the wards selected in the second stage. Each subject sampled had a comparable alternative (i.e., for each subject there was another one randomly sampled) who would be interviewed in case the first one was not reached after several tries.

The Italian version of the SF-36 was self-administered in the context of a person-to-person interview. The individuals, once sampled, were personally contacted at home by 195 trained interviewers who first introduced the aim of the survey, then delivered the questionnaire and collected some basic sociodemographic characteristics, and, finally, took the questionnaire back. Interviewers asked individuals to answer questions. With the approach adopted, a nationally representative sample of 2031 adult Italians was obtained. Using as reference the official statistics from the 1991 Italian National Census published by ISTAT (Istituto Nazionale di Statistica), it is possible to describe and support the representativeness of this normative sample. In addition, it is also possible to compute a weighting variable in order to reproduce a distribution that is completely consistent with that of the Italian population to which it refers.

Methods of Analysis

To evaluate the SF-36 in Italy, we used simple descriptive statistics, multitrait scaling techniques, and factor analysis.
These are described in other articles in this issue [24,25]. We hypothesized that Italian data would replicate the findings from previous research for completeness, scaling assumptions, reliability, and construct and known-groups validity [9–12,35]. This article reports on the findings within and across studies, SF-36 scales, and patient groups using only the main indicators of data quality, scaling assumptions, and internal reliability, summarized in Table 2. In addition to the analyses planned a priori, for data from the general population survey, further post hoc statistical tests and sensitivity subgroup analyses were done using methods that are described in the text and in the tables reporting the results. For example, because the amount of help required by subjects to complete the questionnaire appeared to be associated both with certain sociodemographic variables and with indicators of the scales’ reliability and validity, the whole sample was split into two mutually exclusive subgroups (subjects receiving and not receiving help) and the analyses were redone, after excluding the group who had help from interviewers, to test the stability of findings across relevant subgroups.

RESULTS

Pre-IQOLA and IQOLA Validation Studies

Significant differences were observed across studies, not only in terms of target disease, design, and objectives, but also in terms of availability of complementary data (Table 1). Although data on age and gender were collected in all studies, and educational status in all but one (the migraine sample), self-reported disease status was available only from the two most recent studies in 1994 to 1995 (the dialysis and general survey samples). The mean age ranged from 38.8 (migraine) to 70.6 years (COPD).

TABLE 2. List of operative definitions and indicators of performance adopted in the article

Data quality
  Completeness (items)
    Conceptual definition: Count of completed items
    Operative definition:  % of items answered
    Expected value:        90–95%
  Completeness (scales)
    Conceptual definition: Count of computable scales
    Operative definition:  % of cases with all scales computable
    Expected value:        90–98%
  Response Consistency Index (RCI)
    Conceptual definition: Check of the logical inconsistencies in response based on a preidentified set of 15 pairs of answers(a)
    Operative definition:  % of cases with no logical inconsistency
    Expected value:        90–95%

Summated rating scaling assumptions
  Item internal consistency
    Conceptual definition: Assessment of the strength of correlation of the items with their hypothesized scale
    Operative definition:  Number of items with Pearson item-hypothesized scale correlation coefficients lower than 0.40
    Expected value:        0
  Discriminant validity
    Conceptual definition: Assessment of the strength of correlation of the items with the other scales
    Operative definition:  Number of items with higher Pearson correlation coefficients with other scales than with the hypothesized scale
    Expected value:        0

Internal consistency reliability
    Conceptual definition: Internal consistency and reliability based on the average correlation between items
    Operative definition:  Cronbach’s alpha coefficient
    Expected value:        > 0.70

Test–retest reliability
    Conceptual definition: Stability of scores from one administration time to another
    Operative definition:  Pearson correlation coefficients after a 3-day interval
    Expected value:        > 0.70

Construct validity
    Conceptual definition: Scale-components correlation (principal component analysis) to test the dimensionality of the scales
    Operative definition:  Pattern of the correlations between each scale and two rotated principal components identified in the U.S. data
    Expected value:        See Table 8

(a) For example, a report of being able “to walk one kilometer” but not “one hundred meters” is considered an inconsistency in scoring the RCI.
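The three data-quality indicators defined in Table 2 lend themselves to a short illustration. The following Python sketch is not the authors' scoring program; it assumes a hypothetical data layout (one dictionary per respondent, with `None` for a skipped item), hypothetical item names such as `walk_1km` and `walk_100m`, a 1 = able / 0 = not able coding for the consistency check, and the "50% or more items completed" rule given in the Table 3 footnote. The actual set of 15 answer pairs used for the RCI is not listed in this article, so a single pair modeled on the Table 2 footnote is used.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# One respondent: item name -> response code, or None when the item was skipped.
# Item names and codings are hypothetical, chosen only for illustration.
Case = Dict[str, Optional[int]]


def pct_items_answered(cases: Sequence[Case], items: Sequence[str]) -> float:
    """Item completeness: percentage of all item responses that are present."""
    answered = sum(1 for c in cases for i in items if c.get(i) is not None)
    return 100.0 * answered / (len(cases) * len(items))


def scale_is_computable(case: Case, scale_items: Sequence[str],
                        min_fraction: float = 0.5) -> bool:
    """Assumed rule: a scale is computable when at least half its items are answered."""
    answered = sum(1 for i in scale_items if case.get(i) is not None)
    return answered >= min_fraction * len(scale_items)


def pct_cases_all_scales_computable(cases: Sequence[Case],
                                    scales: Dict[str, List[str]]) -> float:
    """Scale completeness: percentage of cases in which every scale is computable."""
    ok = sum(1 for c in cases
             if all(scale_is_computable(c, items) for items in scales.values()))
    return 100.0 * ok / len(cases)


def rci_failures(case: Case, pairs: Sequence[Tuple[str, str]]) -> int:
    """Count logical inconsistencies for one case.

    Each pair is (harder_activity, easier_activity), coded 1 = able, 0 = not able.
    Reporting ability on the harder but not the easier activity is a failure.
    Pairs with a missing answer are not counted.
    """
    return sum(1 for harder, easier in pairs
               if case.get(harder) == 1 and case.get(easier) == 0)


def pct_cases_no_inconsistency(cases: Sequence[Case],
                               pairs: Sequence[Tuple[str, str]]) -> float:
    """RCI indicator: percentage of cases with no logical inconsistency."""
    ok = sum(1 for c in cases if rci_failures(c, pairs) == 0)
    return 100.0 * ok / len(cases)
```

With the expected values of Table 2 as a reference, a study would be judged satisfactory when the first and third percentages fall at or above roughly 90% and nearly all cases have every scale computable.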
The gender distribution also differed significantly, ranging from 0% male for breast cancer to 81% for myocardial infarction. In all seven studies included in this review, the questionnaire was self-administered.

When the performance of the SF-36 was analyzed according to the kind of study, standards of data quality and of scaling assumptions were generally satisfied (Table 3), with the worst findings coming from studies that used a pre-IQOLA or an earlier IQOLA developmental version of the questionnaire. Exceptions include the proportion of missing items and scale completeness (11.9% and 86.6%, respectively) in the dialysis study and the significant scaling failures in item discriminant validity assessment (4 failures in 280 tests) in the migraine study. Cronbach’s alpha coefficient was generally greater than 0.70, the minimum recommended for group-level comparisons, with lower estimates in the pre-IQOLA studies.

TABLE 3. Synthesis of data quality and psychometric findings in six studies(a)

                                          Pre-IQOLA                  Intermediate IQOLA         Final IQOLA
                                          COPD        Breast cancer  Healthy     Migraine       Dialysis    Normative survey
                                          (N = 243)   (N = 178)      (N = 50)    (N = 1524)     (N = 246)   (N = 2031)
Missing item, %(b)                        3.3         3.9            2.7         1.1            11.9        0.1
Scale completeness, %(c)                  96.4        95.3           96.0        99.3           86.6        99.8
Response consistency index 0, %(d)        78.8        85.0           86.0        84.8           83.3        92.6
Item-scale r > 0.40(e)
  Number/total                            31/35       31/35          32/35       35/35          34/35       34/35
  Percentage                              88.6        88.6           91.4        100.0          97.1        97.1
Significant success(f)
  Number/total                            277/280     280/280        279/280     276/280        280/280     280/280
  Percentage                              98.9        100.0          99.6        98.6           100.0       100.0
Scale reliability(g)
  alpha < 0.70                            1/8         1/8            2/8         0/8            0/8         0/8
  alpha = 0.70–0.90                       7/8         7/8            6/8         7/8            7/8         7/8
  alpha > 0.90                            0/8         0/8            0/8         1/8            1/8         1/8

(a) The GISSI-3 trial is not included in this table because only part of the SF-36 was used in the study.
(b) Average missing item proportion across all items.
(c) Average scale completeness (percent of scales with 50% or more items completed).
(d) Proportion of questionnaires with no logical inconsistency.
(e) Proportion of items having item-hypothesized scale correlation coefficient higher than 0.40.
(f) Number of correlations of items with own scales significantly higher than correlations with other scales/total number of correlations.
(g) Internal-consistency reliability (Cronbach’s alpha).
Abbreviations: COPD = chronic obstructive pulmonary disease; IQOLA = International Quality of Life Assessment.

When data were analyzed to evaluate the performance of each of the eight SF-36 scales (data not fully shown here), a few problems were noted in the overall satisfactory picture, in which most of the scales showed good psychometric characteristics. These regarded the PF scale, in which 7% of the item-hypothesized scale correlation coefficients were lower than 0.40; the GH and SF scales, which had the poorest performance in all the tests; and an occasional failure in the VT scale. The relatively poor performance of the PF scale was due to the behavior of two specific items (item number 1 pertaining to vigorous activities and item number 10 pertaining to bathing and dressing) that was noted in only two samples in which these findings were expected (healthy people with a low prevalence of physical limitations and COPD patients with severe disease). The SF and VT scale problems were due to difficulties arising in early translations that were promptly identified and corrected. This finding, together with those pertaining to the GH scale, is discussed fully in the Discussion section.

Test–retest reliability was estimated using data from the GISSI study for the five SF-36 scales included in the study questionnaire. High correlations were observed between pairs of scores at time 1 and time 2 (RP = 0.88, BP = 0.72, VT = 0.88, RE = 0.74, and MH = 0.89).

Results from the IQOLA Normative Survey

CHARACTERISTICS OF THE SAMPLE.
Overall, our general population sample reproduces a distribution consistent with the universe to which the survey refers. Women were slightly more represented than men (50.8%), and the mean age was 47.7 years, with 18.7% of cases aged more than 65 years. Most cases were married or cohabitating (67.6%). Forty-seven percent of respondents lived in Northern Italy (47.3%), 17% in the Center, and 37% in the South.

PARTICIPATION. The strategy adopted to assemble this national sample was random sampling from electoral lists with the possibility of identifying in the surroundings a comparable alternative for each random subject who was not reachable by the interviewer, and the administration strategy involved self-administration in the context of person-to-person contact, so the response rate in this survey is not an appropriate indicator of participation. To give information about the acceptance and applicability of the SF-36 in Italy, Table 4 shows the time required to complete the entire questionnaire, the proportion of individuals who received help, the distribution of failures from the questionnaire response consistency index, and the proportion of cases identified by interviewers on a convenience basis rather than using the probabilistic list. These four indicators are associated with sociodemographic variables such as age, gender, and education that are well-known predictors of participation and compliance. On average, subjects aged 65 years and older, female, or with lower levels of education required significantly more help and more time to complete the questionnaire and also had a higher level of inconsistent responses (Table 5). The proportion of individuals who were identified not on a random basis varied according to age and educational level; elderly and less educated people were significantly less likely to be sampled on a convenience basis than through random sampling.

TABLE 4. Details about the normative sample

                                                          No.      %
Time required for completion of the questionnaire(a)
  Mean minutes                                            15.4
  Less than 10 minutes                                    652      32.1
  11–20 minutes                                           1191     58.6
  More than 20 minutes                                    188      9.3
Level of help for completion of the questionnaire
  None                                                    1000     49.3
  Questions read by interviewer                           483      23.8
  Questions explained by interviewer                      548      26.9
Response Consistency Index (RCI)
  No failures                                             1881     92.6
  Only one failure                                        82       4.0
  More than one failure                                   68       3.4
Kind of respondent(b)
  Random sampled                                          1390     68.4
  Convenience sampled                                     641      31.6

(a) SF-36 items and 16 additional questions pertaining to self-reported medical information.
(b) Case contacted when the first and second choices (random sampled) were not reached. Interviewers were then allowed to identify in the surroundings alternative respondents with matching age and gender characteristics.

TABLE 5. A selected list of indicators of data quality (normative sample)

                                                    Age (y)                   Gender            Education (y)
                                                    < 45    45–65   > 65      Male    Female    ≤ 8     > 8       Total
Cases assisted by interviewers (%)                  11.9    30.4    57.6(a)   23.8    30.0(b)   40.7    8.3(a)    26.9
Mean time (in minutes) required for completion(c)   14.5    15.6    17.6(d)   15.3    16.6      16.6    13.9(d)   15.4
Cases requiring more than 20 minutes (%)            6.2     9.8     15.8(a)   8.2     10.3      13.3    3.7       9.3
Cases not randomly sampled (%)                      34.1    34.4    20.0(b)   30.6    32.4      27.1    37.6(b)   31.6
Response Consistency Index (RCI) ≥ 1 (%)            4.1     5.3     19.5(a)   6.6     8.2       10.1    3.7(a)    7.4

(a) Chi-square test, P < 0.001.
(b) Chi-square test, P < 0.005.
(c) SF-36 items and 16 additional questions pertaining to self-reported medical information.
(d) Analysis of variance (ANOVA), P < 0.005.

PSYCHOMETRIC RESULTS.
Overall, the pattern of results from the multitrait analysis confirmed the multidimensional conceptualization of the SF-36 scales (Table 6) and compared well with that of the United States. In all cases the within-scale item correlations were homogeneous and for all but one item (item GH3: I am as healthy as anybody I know) the 0.40 standard for item-internal consistency was exceeded by far. Tests of scaling success were also satisfactory, with no failures in 280 tests. Cronbach’s alpha coefficient always exceeded the recommended level of 0.70 (range from 0.77 to 0.93) [36,37], with the lowest values in the GH and SF scales. Table 7 shows internal consistency reliability coefficients according to a selected list of relevant characteristics, such as gender, age, and education. Although there is more variation across groups (range, 0.55–0.94), in general, internal consistency reliability is high, with most of the lowest values in the GH scale, the young age groups, and the more educated samples. These findings still held after exclusion of the cases who received help in completing the questionnaire (“Receiving help” group, N = 548) or who were sampled on a convenience basis (“Non random sample” group, N = 641).

Principal components analysis confirmed the hypothesized physical and mental dimensions of health seen in U.S. data [16,35]. The proportion of total variance in each scale explained by the two extracted components in the Italian data was 60 to 78% across scales, indicating that the two factors explained the majority of the variance in each scale. The ordering of correlations between the eight scales and both factors was generally equivalent in the United States and Italy. The PF scale correlated highest and the MH lowest with the physical component, whereas the MH scale was associated more with the mental (3rd rank) than with the physical (6th rank) factor.
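The two item-level statistics used throughout these results, Cronbach's alpha and the item-scale correlation corrected for overlap, can be illustrated with a brief sketch. The code below is an illustration under stated assumptions, not the analysis software used for the survey: it assumes complete data for one scale arranged as a list of respondents, each a list of item scores, and the function names are hypothetical. Alpha is computed with the standard formula, k/(k − 1) × (1 − sum of item variances / variance of the summed score), and the corrected correlation relates each item to the sum of the remaining items in its scale.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for one scale; rows are respondents, columns are items."""
    k = len(rows[0])
    item_variances = [variance([r[j] for r in rows]) for j in range(k)]
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_scale_r(rows: Sequence[Sequence[float]], j: int) -> float:
    """Correlation of item j with the sum of the other items (corrected for overlap)."""
    item = [r[j] for r in rows]
    rest = [sum(r) - r[j] for r in rows]
    return pearson(item, rest)


def items_below_standard(rows: Sequence[Sequence[float]],
                         threshold: float = 0.40) -> List[int]:
    """Indices of items whose corrected item-scale correlation is under the 0.40 standard."""
    k = len(rows[0])
    return [j for j in range(k) if corrected_item_scale_r(rows, j) < threshold]
```

Under the standards adopted in this article, a scale would pass when `cronbach_alpha` exceeds 0.70 and `items_below_standard` returns an empty list.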
After the exclusion of respondents who received help in completing the survey, the principal components results came closer to the U.S. data (Table 8).

TABLE 6. Item scaling test and reliability estimates in the normative sample

                                 Range of item-scale correlations       Item scaling tests
                                 Item internal    Item discriminant     Item internal          Item discriminant
Scale                     K(a)   consistency(b)   validity(c)           consistency test(d)    validity test(e)     Reliability(f)
PF, Physical functioning  10     0.63–0.81        0.24–0.60             100                    100                  0.93
RP, Role-physical         4      0.73–0.80        0.31–0.58             100                    100                  0.89
BP, Bodily pain           2      0.77             0.37–0.63             100                    100                  0.85
GH, General health        5      0.29–0.72        0.14–0.66             80                     100                  0.77
VT, Vitality              4      0.51–0.63        0.28–0.58             100                    100                  0.78
SF, Social functioning    2      0.63             0.28–0.60             100                    100                  0.77
RE, Role-emotional        3      0.70–0.76        0.22–0.58             100                    100                  0.85
MH, Mental health         5      0.57–0.73        0.20–0.64             100                    100                  0.85

(a) Number of items and number of item-internal consistency tests per scale.
(b) Correlations between items and hypothesized scale corrected for overlap.
(c) Correlations between items and other scales.
(d) Item internal consistency scaling success, i.e., number of item-scale correlations greater than 0.40/total number of correlations (corrected for overlap).
(e) Item discriminant validity scaling success, i.e., number of correlations of items with own scales significantly higher than correlations with other scales/total number of correlations.
(f) Internal-consistency reliability (Cronbach’s alpha).

TABLE 7. Reliability (Cronbach’s alpha) estimates for SF-36 scales in different subgroups (normative sample)

                    No. cases  PF    RP    BP    GH    VT    SF    RE    MH
Total               2031       0.93  0.89  0.85  0.77  0.78  0.77  0.85  0.85
Gender
  Male              999        0.93  0.90  0.84  0.76  0.76  0.77  0.84  0.81
  Female            1032       0.93  0.89  0.85  0.77  0.77  0.77  0.86  0.86
Age (y)
  18–24             193        0.74  0.84  0.66  0.55  0.71  0.75  0.80  0.75
  25–34             367        0.89  0.83  0.80  0.62  0.63  0.65  0.75  0.77
  35–44             373        0.83  0.87  0.78  0.68  0.71  0.73  0.85  0.84
  45–54             351        0.86  0.85  0.83  0.65  0.75  0.76  0.84  0.82
  55–64             367        0.93  0.89  0.84  0.77  0.81  0.81  0.88  0.88
  65–74             258        0.92  0.91  0.90  0.78  0.83  0.85  0.85  0.88
  ≥ 75              122        0.93  0.93  0.90  0.82  0.82  0.73  0.90  0.90
Education (y)
  ≤ 5               673        0.94  0.92  0.88  0.81  0.81  0.83  0.89  0.87
  6–8               499        0.91  0.86  0.83  0.66  0.73  0.73  0.85  0.84
  > 8               859        0.87  0.85  0.77  0.65  0.71  0.71  0.80  0.80
Level of help for completion of the questionnaire
  Nonhelp           1483       0.90  0.87  0.83  0.71  0.72  0.74  0.83  0.82
  Receiving help    548        0.94  0.92  0.87  0.82  0.84  0.82  0.89  0.89

Abbreviations: PF = physical functioning; RP = role-physical; BP = bodily pain; GH = general health; VT = vitality; SF = social functioning; RE = role-emotional; MH = mental health.

Table 9 presents the multivariate relationship between the GH scale, which measures the overall general perception of health, and the other SF-36 scales. We conducted this analysis to see how well the SF-36 represents the universe of concepts people consider in evaluating their health, in the whole sample and after excluding the group that received assistance, in order to assess whether the conceptualization of health is invariant to the amount of help received. Both analyses gave results comparable with the U.S. reference data. Each scale was significantly related to the overall perception of health, with product-moment correlations always higher than 0.40 and more than 60% of the reliable variance explained by the model. The PF, VT, and MH scales always showed the highest correlations and SF and RE the lowest.

TABLE 8. Relationship between the 8 SF-36 scales and physical and mental components in the U.S. and Italy normative samples(a)

                           U.S. data                    Italian data
                           U.S. population (2474)       Whole sample (2031)          Without help group (1483)
                           Rotated principal            Rotated principal            Rotated principal
                           components(b)                components(c)                components(d)
Scale                      Physical    Mental           Physical    Mental           Physical    Mental
PF, Physical functioning   0.85        0.12             0.84        0.22             0.86        0.13
RP, Role-physical          0.81        0.27             0.53        0.55             0.64        0.42
BP, Bodily pain            0.76        0.28             0.75        0.34             0.73        0.37
GH, General health         0.69        0.37             0.83        0.29             0.75        0.34
VT, Vitality               0.47        0.64             0.62        0.56             0.44        0.70
SF, Social functioning     0.42        0.67             0.34        0.79             0.27        0.81
RE, Role-emotional         0.17        0.78             0.17        0.86             0.23        0.76
MH, Mental health          0.17        0.87             0.44        0.66             0.25        0.81

(a) General population data [35].
(b) Hypothesized associations from U.S. empirical data.
(c) Actual correlation coefficients between each SF-36 scale and rotated orthogonal principal component in the whole Italian sample.
(d) Actual correlation coefficients between each SF-36 scale and rotated orthogonal principal component in the group who did not need help to complete the questionnaire.

TABLE 9. Relationships between responses to SF-36 scales and the general health perception (GH) scale(a) in the Italian normative sample and U.S. data(b)

                                                           Italian data
                                  U.S. data (2149)         Whole sample (2031)      Without help group (1483)
Scale                             b(c)       r(d)          b(c)       r(d)          b(c)       r(d)
PF, Physical functioning          0.22*      0.56*         0.32*      0.68*         0.30*      0.65*
RP, Role-physical                 0.09*      0.55*         0.05*      0.55*         0.05*      0.50*
BP, Bodily pain                   0.20*      0.56*         0.15*      0.63*         0.17*      0.59*
VT, Vitality                      0.23*      0.58*         0.21*      0.66*         0.20*      0.58*
SF, Social functioning            0.03       0.47*         0.02       0.52*         0.02       0.47*
RE, Role-emotional                −0.00      0.36*         −0.03**    0.43          −0.02      0.39*
MH, Mental health                 0.24*      0.46*         0.17*      0.58*         0.15*      0.52*
Reliable variance explained, %    62                       62                       52
Adjusted rb                       0.48                     0.48                     0.52

(a) Results from an ordinary multivariable least squares regression model having the GH scale as dependent variable and the other seven scales as independent variables.
(b) See references [16,38,39].
(c) Unstandardized regression coefficients.
(d) Product-moment correlations.
*P < 0.001. **P < 0.05.

DESCRIPTIVE STATISTICS. Score distributions for all scales in the whole sample are reported in Table 10. As expected in relatively healthy samples, the full range of values was present and medians were always higher than means, with the negative skewness pattern indicating a tendency toward the positive end of the scales. The bipolar scales that measure well-being as well as limitations (VT, GH, MH) showed lower average values and, in general, wider score distributions. The proportion of the population for whom an improvement (ceiling) or decline (floor) in health could not be measured is larger with the unipolar limitation scales (PF, RP, BP, and RE). Estimates according to age and gender groups are reported elsewhere [34] or can be obtained on request from the authors.

DISCUSSION AND FUTURE PLANS

Despite demand for questionnaires for assessing population health and the effectiveness of health technologies across countries and languages, systematic efforts to evaluate empirically the equivalence and validity of translations of original English questionnaires have been rare. The IQOLA project tested the assumption that the SF-36, a short questionnaire originally developed in the United States and probably the most widely used instrument in English-speaking countries, can be translated, validated, and normed in other languages while maintaining its excellent content and psychometric and clinical validity. Other arti- TABLE 10. Descriptive statistics and score distribution in the Italian normative sample (2031) No. 
items Mean Median Range Standard deviation % Floor % Ceiling CV Skewness Kurtosis PF RP BP GH VT SF RE MH 10 84.46 95 0–100 23.18 1.0 40.1 27.44 21.87 2.85 4 78.21 100 0–100 35.93 12.6 67.4 45.94 21.35 0.20 2 73.67 84 0–100 27.65 1.1 41.1 37.53 20.68 20.68 5 65.22 70 0–100 22.18 0.6 1.5 34.01 20.85 0.16 4 61.89 65 0–100 20.69 0.5 1.8 33.44 20.55 20.03 2 77.43 87.5 0–100 23.34 0.8 32.3 30.14 21.00 0.34 3 76.16 100 0–100 37.25 14.3 66.4 48.91 21.19 5 66.59 68 0–100 20.89 0.3 2.9 31.37 20.76 0.2 20.21 Abbreviations: PF 5 physical functioning; RP 5 role physical; BP 5 bodily pain; GH 5 general health; VT 5 vitality; SF 5 social functioning; RE 5 role-emotional; MH 5 mental health. 1034 cles in this issue, reporting on details about the IQOLA organization and project objectives, the methods, and the empirical findings from cross-cultural comparisons, document the methods and confirm the validity and comparability of the national versions produced. This article summarizes the work done in Italy and discusses the empirical evidence by comparing the Italian results with the standards suggested in the literature and with the results from the original evaluations. Although the overall picture is satisfactory in terms of feasibility, psychometric tests, and interpretability of the Italian SF-36 questionnaire, a few findings are worth commenting on, in order to clarify the validity and generalizability of the methods and results. First, satisfactory results were obtained in a wide array of studies, differing in terms of year of implementation, design, size, and sociodemographic characteristics. This strengthens the idea that the SF-36 is a multidimensional questionnaires applicable and valid across age, gender, and kind and severity of disease. The second point concerns the association between better psychometric indicators and the year of implementation, which suggests that the use of psychometric methods to evaluate the translations led to a better questionnaire. 
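The item-level checks behind these psychometric indicators can be illustrated with a small sketch. It assumes responses are stored as one list of scores per item; the data are invented for illustration and are not taken from any of the studies, and the function names are hypothetical. The code computes the item-scale correlation corrected for overlap, the share of items exceeding the 0.40 internal-consistency standard, and Cronbach's alpha.

```python
from statistics import mean, variance


def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_scale_correlations(items):
    """Correlate each item with the sum of the OTHER items in its scale.

    `items` holds one list of respondent scores per item. Leaving the item
    out of the total is the "correction for overlap".
    """
    n = len(items[0])
    result = []
    for i, item in enumerate(items):
        rest = [sum(col[r] for j, col in enumerate(items) if j != i)
                for r in range(n)]
        result.append(pearson(item, rest))
    return result


def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Made-up answers from six respondents to a hypothetical four-item scale.
scale = [
    [1, 2, 3, 4, 5, 5],
    [2, 2, 3, 4, 4, 5],
    [1, 3, 3, 5, 5, 5],
    [4, 1, 5, 2, 3, 2],  # constructed to behave like an outlier item
]

corrected = corrected_item_scale_correlations(scale)
alpha = cronbach_alpha(scale)
# Scaling success: percentage of items whose corrected correlation exceeds 0.40.
success_rate = 100 * sum(r > 0.40 for r in corrected) / len(corrected)

print([round(r, 2) for r in corrected], round(alpha, 2), success_rate)
```

In this toy data set the fourth item correlates negatively with the sum of the other three, so it would be flagged for review in the same way as a poorly translated item.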
This was the case for the SF and VT scales. For instance, in the first version of the IQOLA translation that was applied in the GISSI-3 study, the original VT3 item "Did you feel worn out" was translated as "Si è sentito esaurito," a question that suggests the psychological state of a person suffering from a "nervous breakdown" more than the perception of tiredness or vitality. This item acted as an outlier: it had a high mean compared with the other vitality items (4.75 versus 3.75, 3.83, and 3.84), its correlation with the sum of the other items hypothesized to belong to the vitality scale was the lowest among the four vitality items, and it correlated more with the MH scale than with the VT scale. Thus, based on empirical data from the multitrait analysis, problems were identified, translators and experts discussed the findings at formal meetings, and the problems were solved by changing the wording. Such findings support the validity of the methods adopted in the IQOLA project, as the translations yielded more reliable and valid scales over time, with the last versions giving the best results.

Third, despite the improvement during the translation process, the GH scale, which measures how people evaluate their own overall health status, stood out for its relatively poor performance in terms of item-internal consistency, item-discriminant validity, and reliability, both in the pre-IQOLA studies and in some age and gender subgroups of the IQOLA normative sample. This relatively poor performance seems to be associated with age, with lower reliability estimates for younger individuals. These findings do not compromise the overall quality of this scale, because it met all the recommended standards in terms of scaling assumptions (it passed 99% of the item-discriminant tests, and its reliability estimates were above 0.70 in five out of the six studies examined). Nevertheless, our results indicate that not all the original SF-36 scales have the same degree of validity in Italy. The behavior of the GH scale, more evident in young and more educated people, calls for further analyses. On the other hand, analysis of the relationship between responses to the SF-36 scales and the general health perception scale showed a substantial amount of variance that is not accounted for by the other seven scales. This suggests that it is important to include the GH scale in the SF-36 to capture the impact of health constructs not measured by other scales and to provide a direct measure of each respondent's evaluation of his or her health state. Comparable findings indicating that a substantial amount of variance remains to be explained were also seen in the U.S. databases [38] and in other European studies [16,39].

Finally, normative data, essential to enable within- and between-country comparisons and score interpretation, also allowed a complete assessment of the questionnaire in terms of participation, data quality, hypothesized constructs, and psychometric behavior of the scales. Against a very satisfactory overall picture, in which multitrait and factor analyses confirm that the Italian translation met the recommended psychometric standards to a degree comparable with the source questionnaire, one particular finding needs deeper discussion and more detailed analyses in the future. This regards the amount of help required for questionnaire completion and the sampling strategy adopted to identify individuals to be contacted. In this survey, 2031 persons were contacted and actually reached with the methods described. In most cases, individuals were identified on a random basis (68%) and needed no help for questionnaire completion (73%). Overall, 969 (47.7%) of 2031 cases were randomly sampled and did not receive any help to complete the questionnaire.
The need for help was associated with sociodemographic indicators such as age, gender, and education. If we had used alternative approaches (self-administration with a mail survey or phone survey using a random or convenience sample), we would probably have had a higher rate of nonresponse and much more missing data at the item level, with much less information about the representativeness of the sample and the presence, direction, and amount of potential bias in the completion of the questionnaire. Preliminary stratified and sensitivity analyses have shown that most of the results are consistent across the main strata, and reliability estimates did not change after excluding specific subgroups. Nevertheless, because we cannot exclude that systematic distortion may have been introduced by the request for help (and the type of help given by the interviewer), in future we will check whether it is worth estimating norms for different modes of administration, as others have done in different settings [40].

In conclusion, we have provided empirical data to describe and illustrate the results of a prospective, multinational project to determine the feasibility of translating, validating, and norming the SF-36 in a non-English-speaking country. The Italian version of the SF-36 appears to be a valid and reliable multidimensional questionnaire, both when the Italian data are compared with the original U.S. data and when the different IQOLA translations are evaluated and compared using the whole IQOLA database. Further studies supporting the clinical validity of the SF-36 are under way in Italy, using data from multiple sources. Additional information on the application of the SF-36 in Italy is provided elsewhere [41–45], or is forthcoming (Bamfi F, Fasolo A, De Carli GF, Recchia G, Cifani S, Mosconi P, et al. Submitted for publication; Mingardi G, Apolone G, Ruggiata R, Mosconi P on behalf of DIA-QOL Group.
Submitted for publication; Mosconi P, Cifani S, Crispino S, Fossati R, Apolone G and the Head and Neck Cancer Italian Working Group. Submitted for publication).

In addition to the sponsors of the International Quality of Life Assessment (IQOLA) project, this work was partially supported by grants from the CNR-National Research Council (ACRO grant no. 96.00763.PF39), AIRC (Associazione Italiana per la Ricerca sul Cancro), and Glaxo Wellcome Italia. We are particularly indebted to Barbara Gandek and John Ware, whose suggestions were of great value. We also thank Gianfranco Decarli and Giuseppe Recchia (Glaxo Wellcome Italia) for their constructive comments on a previous version of the manuscript.

References

1. McDowell I, Newell C. Measuring health: A Guide to Rating Scales and Questionnaires. New York: Oxford University Press; 1987.
2. Walker S, Rosser R. Quality of Life: Key Issue in the 1990. Dordrecht: Kluwer Academic Press; 1992.
3. Anderson RT, Aaronson NK, Wilkin D. Critical review of the international assessments of health related quality of life. Qual Life Res 1993; 2: 369–395.
4. Tamburini M, Rosso S, Gamba A, Mencaglia E, De Conno F, Ventafridda V. A therapy questionnaire for quality of life assessment in advanced cancer research. Ann Oncol 1992; 3: 565–570.
5. GIVIO Investigators. Impact of follow-up testing on survival and health-related quality of life in breast cancer patients. A multicenter randomized controlled trial. JAMA 1994; 271: 1587–1592.
6. The European Group for Quality of Life and Health Measurement. European Guide to the Nottingham Health Profile. Bucquet D, Ed. France: Escubase 8 Press; 1992.
7. Aaronson NK, Ahmedzai S, Bergman B, Bullinger M, Cull A, Duez NJ, et al. The European Organization for Research and treatment of cancer QLQ-C30: A quality of life instrument for use in international clinical trials in oncology. J Nat Cancer Inst 1993; 85: 365–376.
8. WHOQOL Group, Division of Mental Health, World Health Organization. Study protocol for the World Health Organization: organization to develop a Quality of Life assessment instrument (WHOQOL). Qual Life Res 1993; 2: 153–159.
9. Ware JE, Sherbourne CD. The MOS 36-Item Short-Form Health Survey (SF-36) I. Conceptual framework and item selection. Med Care 1992; 30: 473–483.
10. McHorney CA, Ware JE, Raczek AE. The MOS 36-Item Short-Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care 1993; 31: 247–263.
11. McHorney CA, Ware JE, Lu JFR, Sherbourne CD. The MOS 36-item short-form Health Survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse patients groups. Med Care 1994; 32: 40–66.
12. Ware JJ. SF-36 Health Survey. Manual and Interpretation Guide. Boston, MA: The Health Institute, New England Medical Center; 1993.
13. Aaronson NK, Acquadro C, Alonso J, Apolone G, Bucquet D, Bullinger M, et al. International quality of life assessment (IQOLA) project. Qual Life Res 1992; 1: 349–351.
14. Ware JE, Gandek B, and the IQOLA Project Group. The SF-36 Health Survey: Development and use in mental health research and the IQOLA Project. Int J Ment Health 1994; 23: 49–73.
15. Ware JE, Gandek B, Keller SD, and the IQOLA Group. Evaluating instruments used cross-nationally: Methods from the IQOLA Project. In: Spilker B, Ed. Quality of Life and Pharmacoeconomics in Clinical Trials, Second Edition. New York: Raven Press; 1995.
16. Ware JE, Keller SD, Gandek B, Brazier JE, Sullivan M, and the IQOLA Project Group. Evaluating translations of Health Status questionnaires: Methods from the IQOLA Project. Int J Tech Assess Health Care 1995; 11: 525–551.
17. Bosisio M, Parma E, Apolone G, Bertolini G. La valutazione dello stato di salute in pazienti ambulatoriali con bronchite cronica. Uno studio di esito nella medicina generale. Ric Practica 1996; 12: 55–62.
18. GIVIO. Survey of treatment of primary breast cancer in Italy. Br J Cancer 1988; 57: 630–634.
19. Mosconi P, Meyerowitz BE, Liberati MC, Liberati A, on behalf of GIVIO. Disclosure of breast cancer diagnosis. Ann Oncol 1991; 2: 273–280.
20. Schipper H, Clinch J, McMurray A, Levitt M. Measuring the quality of life of cancer patients: the Functional Living Index-Cancer. J Clin Oncol 1984; 2: 472–483.
21. Apolone G, Mosconi P, Liberati A on behalf of GIVIO. Validation process of health-related quality of life questionnaire: the GIVIO approach. Qual Life Res 1994; 3: 69.
22. Ware J, Gandek B. Overview of the SF-36 Health Survey and the International Quality of Life Assessment (IQOLA) Project. J Clin Epidemiol 1998; 51(11): 903–912.
23. Bullinger M, Alonso J, Apolone G, Leplège A, Sullivan M, Wood-Dauphinee S, et al. Special issue: Translating health status questionnaires and evaluating their quality: the IQOLA Project approach. J Clin Epidemiol 1998; 51(11): 913–923.
24. Ware J, Gandek B. Methods for testing data quality, scaling assumptions, and reliability: The IQOLA Project approach. J Clin Epidemiol 1998; 51(11): 945–952.
25. Ware J, Gandek B. Methods for validating and norming translations of health status questionnaires: The IQOLA Project approach. J Clin Epidemiol 1998; 51(11): 953–959.
26. Thurstone LL, Chave EJ. The measurement of attitude. Chicago: University of Chicago Press; 1929.
27. GISSI-Gruppo Italiano per lo Studio della Sopravvivenza nell'Infarto Miocardico. GISSI Study protocol on the effect of lisinopril, of nitrates, and of their association in patients with acute myocardial infarction. Am J Cardiol 1992; 70: 62C–69C.
28. GISSI-3-Gruppo Italiano per lo Studio della Sopravvivenza nell'Infarto Miocardico. GISSI-3: Effect of lisinopril and transdermal glyceryl trinitrate singly and together on 6-week mortality and ventricular function after acute myocardial infarction. Lancet 1994; 343: 1115–1122.
29. GISSI-Nursing. Valutazione della percezione della qualità della salute da parte del paziente con infarto miocardico. Rapporto finale dello studio. Riv Infermiere 1995; 16–29.
30. GISSI-NURSING: Valutazione della percezione della qualità della vita e salute da parte del paziente con infarto del miocardio. G Ital Cardiol 1997; 27: 865–876.
31. Apolone G, Mosconi P. Validazione psicometrica e clinica del questionario per la valutazione della qualità della vita nell'emicranico. Internal Report. Milano: Istituto Mario Negri; January 1995.
32. Mosconi P, Apolone G, Liberati A. Psychological impact of neoplastic disease: an epidemiological perspective. Validation process of a health-related quality of life questionnaire. Roma: CNR-Italian National Council of Research, Giugno; 1994.
33. Mingardi G, on behalf of DIA-QOL Group. From the development to the clinical application of a questionnaire on the Quality of Life in dialysis. Nephrol Dial Transplant 1998; 13 (Suppl. 1): 70–75.
34. Apolone G, Mosconi P, Ware J. Il questionario sullo stato di salute SF-36. Manuale d'uso e Guida all'Interpretazione dei Risultati. Milan, Italy: Guerini Ed Associati; 1997.
35. Ware JE, Kosinski M, Keller SD. SF-36 Physical and Mental Summary Scales: A User's Manual. Boston, MA: The Health Institute; 1994.
36. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika 1951; 16: 297–334.
37. Nunnally JC. Psychometric Theory. New York: McGraw-Hill; 1987.
38. Davies AR, Ware JE. Measuring Health Perceptions in the Health Insurance Experiment. Santa Monica, CA: Rand Corporation, 1981. (Health Insurance Experiment Serie, RAND #R-2711-HHS).
39. Jenkinson C, Wright L, Coulter A. Criterion validity and reliability of the SF-36 in a population sample. Qual Life Res 1994; 3: 7–12.
40. McHorney CA, Kosinski M, Ware JE. Comparisons of the costs and quality of norms for the SF-36 Health Survey collected by mail versus telephone interview; Results from a national survey. Med Care 1994; 32: 551–567.
41. Crosignani PG, Vercellini P, Apolone G, De Giorgi D, Cortesi I, Mestia M, et al. Endometrial resection versus vaginal hysterectomy for menorrhagia: long term clinical and quality of life outcomes. Am J Obstet Gynecol 1997; 177: 95–101.
42. Crosignani PG, Vercellini P, Mosconi P, Oldani S, Cortesi I, De Giorgi O. A levonorgestrel-releasing intrauterine device versus hysteroscopic endometrial resection in the treatment of dysfunctional uterine bleeding. Obstet Gynec 1997; 90: 257–263.
43. Apolone G, Cifani S, Mosconi P. Questionario sullo stato di salute SF-36. Traduzione e validazione della versione italiana: Risultati del progetto IQOLA. Medic 1997; 2: 86–94.
44. Mosconi P, Apolone G. SF-36: La qualità della vita. Esperienze italiane. SIMG. J Ital Coll Gen Practitioner 1998; 3: 4–8.
45. Apolone G, Filiberti A, Cifani S, Ruggiata R, Mosconi P. The evaluation of the EORTC QLQ-C30 questionnaire: A comparison with SF-36 health survey in a cohort of Italian long survival cancer patients. Ann Oncol 1998; 9: 549–557.