Osteoarthritis and Cartilage 20 (2012) 1548–1562

Measurement properties of performance-based measures to assess physical function in hip and knee osteoarthritis: a systematic review
F. Dobson †*, R.S. Hinman †, M. Hall †, C.B. Terwee ‡, E.M. Roos §, K.L. Bennell †
† Centre for Health, Exercise and Sports Medicine, Department of Physiotherapy, School of Health Sciences, The University of Melbourne, Australia
‡ VU University Medical Center, Department of Epidemiology and Biostatistics, EMGO Institute for Health and Care Research, The Netherlands
§ Institute of Sports Science and Clinical Biomechanics, University of Southern Denmark, Denmark

Article info

Article history:
Received 19 April 2012
Accepted 22 August 2012

Keywords:
Performance-based measures
Physical function
Measurement properties
Clinimetrics
Systematic review
Osteoarthritis

Summary

Objectives: To systematically review the measurement properties of performance-based measures to assess physical function in people with hip and/or knee osteoarthritis (OA).
Methods: Electronic searches were performed in MEDLINE, CINAHL, Embase, and PsycINFO up to the end of June 2012. Two reviewers independently rated measurement properties using the consensus-based standards for the selection of health status measurement instruments (COSMIN). “Best evidence synthesis” was made using COSMIN outcomes and the quality of findings.
Results: Twenty-four out of 1792 publications were eligible for inclusion. Twenty-one performance-based measures were evaluated including 15 single-activity measures and six multi-activity measures. Measurement properties evaluated included internal consistency (three measures), reliability (16 measures), measurement error (14 measures), validity (nine measures), responsiveness (12 measures) and interpretability (three measures). A positive rating was given to only 16% of possible measurement ratings. Evidence for the majority of measurement properties of tests reported in the review has yet to be determined. On balance of the limited evidence, the 40 m self-paced test was the best rated walk test, the 30 s-chair stand test and timed up and go test were the best rated sit to stand tests, and the Stratford battery, Physical Activity Restrictions and Functional Assessment System were the best rated multi-activity measures.
Conclusion: Further good quality research investigating measurement properties of performance measures, including responsiveness and interpretability in people with hip and/or knee OA, is needed. Consensus on which combination of measures will best assess physical function in people with hip and/or knee OA is urgently required.
Crown Copyright © 2012 Published by Elsevier Ltd on behalf of Osteoarthritis Research Society International. All rights reserved.

* Address correspondence and reprint requests to: F. Dobson, Centre for Health, Exercise and Sports Medicine, Department of Physiotherapy, School of Health Sciences, The University of Melbourne, 200 Berkeley Street, Victoria 3010, Australia. Tel: 61-3-8344-3642; Fax: 61-3-8344-3771.
E-mail addresses: fdobson@unimelb.edu.au (F. Dobson), ranash@unimelb.edu.au (R.S. Hinman), halm@unimelb.edu.au (M. Hall), cb.terwee@vumc.nl (C.B. Terwee), eroos@health.sdu.dk (E.M. Roos), k.bennell@unimelb.edu.au (K.L. Bennell).

1063-4584/$ – see front matter. http://dx.doi.org/10.1016/j.joca.2012.08.015

Introduction

Measurement of treatment outcomes and change in health status over time is a critical component of research and clinical practice for people with osteoarthritis (OA). The Osteoarthritis Research Society International (OARSI) and Outcome Measures in Rheumatology and Clinical Trials (OMERACT) jointly advocate the use of core outcome measures for clinical trials of OA that address the domains of pain and function1. Currently there is no singular gold standard for the assessment of physical function. Physical function is related to “the ability to move around”2 and “the ability to perform daily activities”3 and can be classified as Activities using the World Health Organization International Classification of Functioning, Disability and Health (ICF) model4.

Measurement of physical function is complex as it contains multi-dimensional constructs3,5. A range of both self-report and performance-based measures have been used to assess physical function. Performance-based measures are defined as assessor-observed measures of tasks classified as “activities” using the ICF model4 and are usually assessed by timing, counting or distance methods. They are not specific to body structure, body function or impairments such as measures of muscle strength or range of motion. Performance-based measures assess what an individual can do rather than what the individual perceives they can do, which is determined by self-report measures3. Increasing evidence

suggests that performance-based measures capture a different construct of function and are more likely to fully characterize a change in body function than self-reported measures alone6–8. Both types of measures are now seen as complementary rather than competing when evaluating functional outcomes in people with OA5,9,10.

A previous systematic review of performance-based measures in OA concluded that better designed studies assessing the measurement properties of these measures in OA populations were required3. Also, only a small percentage (7%) of measurement properties were rated as ‘positive’ for the quality of the findings and the levels of evidence were generally unknown or very limited. This previous review evaluated studies published up until early 2004 and since then further studies have been published. In addition, a new quality evaluation tool, the consensus-based standards for the selection of health status measurement instruments (COSMIN)11,12 and scoring system13, has been developed to standardize the assessment of methodological quality of measurement studies.

The aim of this study was to systematically review the measurement properties of performance-based tests to measure physical function in people with hip and/or knee OA using a robust quality evaluation tool and scoring system (COSMIN). Such a review would be a useful and timely update for researchers and clinicians to assist them in selecting appropriate clinical performance-based measures for people with hip and knee OA.

Methodology

Literature search

The search strategy was developed, reviewed and refined by multiple authors, in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines14. Electronic searches of entire databases up until June 2012 were performed using MEDLINE via PubMed, CINAHL via EBSCO, Embase via Elsevier, and PsycINFO via CSA. Key search terms and synonyms were searched separately in four main filters which were then combined. These filters are summarized as:

1. Construct: physical function OR physical performance OR physical activity
2. Target population: Hip OR knee OR lower-limb AND osteoarthritis OR arthritis OR OA OR replacement OR arthroplasty
3. Measurement instrument: performance test/measure/instrument/assessment/index OR objective test/measure/assessment/OR observational test/measure/assessment/index OR task performance and analysis
4. Measurement properties: instrument development OR psychometrics OR clinimetrics OR validity OR reliability OR responsiveness OR interpretability OR meaningful change.

The search strategy was based on recommendations for performing systematic reviews of measurement properties15 and is more fully described in Appendix 1. For MEDLINE (PubMed), we adopted a measurement properties search filter shown to retrieve more than 97% of publications related to measurement properties16. Targeted hand-searching of reference lists was also performed.

Eligibility criteria

Studies were screened by two independent reviewers (FD and MH). This included independent screening of the titles and abstracts from all retrieved studies followed by independent full-text review of potentially eligible studies. Any disagreements were discussed and resolved with a third reviewer (CT). Studies were included if they met the following criteria:

1. Construct: The test was a measure of physical function, defined according to the ICF model as Activities, which relate to the ability to move around and perform daily activities4. If the test was a battery of multi-task items, then at least 80% of the items were required to assess activities.
2. Target population: The study population comprised at least 80% of people diagnosed with symptomatic hip or knee OA using clinical or radiographic criteria. This could include all stages of disease as well as individuals who had recently undergone a specific intervention such as joint arthroplasty or an exercise program, where measures pre-intervention were provided.
3. Measurement instrument: The measure under study should be a performance-based measure which is evaluated by an observer as the activity is being performed by the individual, usually by timing, counting or distance methods.
4. Setting: The measure was conducted within the clinic or field and required non-technical, readily available, inexpensive and portable equipment.
5. Measurement properties: The study aim was to evaluate one or more measurement properties (e.g., internal consistency, reliability, validity, responsiveness and/or interpretability).
6. Full-text studies published as original articles.

Studies were excluded if: (1) the focus was on validating self-reported measures of function; (2) the measure predominately targeted the ICF level of impairment or health related quality of life; (3) treatment effectiveness was evaluated without a specific aim to study the measurement properties of performance measures; (4) the measure required expensive sophisticated equipment such as three-dimensional gait analysis or accelerometers; (5) they were published only as ‘grey literature’ such as scientific meeting abstracts, dissertations or unpublished literature; and (6) they were published in languages other than English due to limited language translational ability.

Methodological quality evaluation of the studies

The COSMIN tool was used to evaluate the methodological quality of included studies11,17. Two raters (FD and MH) with prior COSMIN tool experience assessed the quality of all included studies independently using the four-point scored COSMIN checklist13. This standardized and validated tool consists of 10 sections, each assessing a different measurement property: internal consistency, reliability, measurement error, content validity, construct validity (structural validity and hypothesis testing), cross-cultural validity, criterion validity, responsiveness and interpretability. Each section contains between 5 and 18 items.

Each item within a section is scored using a four-point scoring system with defined response options representing excellent, good, fair or poor quality13. An overall quality score for each measurement property reported in a study is defined as the lowest rating of any item within that section, i.e., “worst score counts” method. Depending on the number of measurement properties assessed in a study, some studies receive one quality evaluation whereas other studies receive several.

Evaluation of the measurement property result

In addition to a methodological quality evaluation with COSMIN, an overall rating of the study findings for each measurement property was assessed using a commonly used checklist of criteria

for good measurement properties18. These criteria consist of positive, indeterminate and negative ratings for the study findings and are defined in Table I.

Best evidence synthesis: levels of evidence

To synthesize the results from multiple studies on the same performance test, “a best evidence synthesis”15 was performed by the first author using the criteria outlined in Appendix 2. This best synthesis of evidence is similar to that used for synthesizing evidence from clinical trials19. The possible levels of evidence for a measurement property are “strong”, “moderate”, “limited”, “conflicting” or “unknown” (Appendix 2). Best evidence synthesis was derived using the methodological quality of the studies (COSMIN score), the rating and consistency of the measurement property result (positive, indeterminate, negative – Table I), as well as the number of related studies evaluating each measurement property. For this review, studies could only be considered related when the same variation of the performance-based measure was evaluated, that is they were comparable in regards to activity and procedure. Measurement properties from studies that were rated as “poor” on the COSMIN were not eligible to contribute to best evidence synthesis15.

The COSMIN scoring system used in this review was initially developed for assessing psychometric properties in self-reported questionnaires and defines a minimum adequate sample size as 30 (fair), and adequate sample size as 100 (excellent). It was anticipated that many studies, particularly those evaluating reliability and measurement error, were likely to contain smaller sample sizes than those recommended for self-reported questionnaires. Based on discussions with the developers of the COSMIN, it was decided that to avoid the exclusion of many small samples (which might otherwise be of excellent/good quality) from best evidence synthesis, the sample size item was removed from the COSMIN quality assessment and the “second worst score counts” method was used. Sample size was then accounted for at the evidence synthesis stage. Evidence was assigned as: “strong” when the total sample size of eligible combined studies was ≥100; “moderate” with total samples between 50 and 99; “limited” with total samples between 25 and 49, and “unknown” with samples less than 25.

Results

Description of included studies and performance-based measures

Selection procedures are summarized in Fig. 1. Twenty-four eligible studies were identified and are described in Table II. Measurement properties from 15 single-activity measures were investigated in 12 studies6,20–30 and from six multi-activity measures investigated in 12 studies7,8,10,31–39. Single-activity measures could be grouped into three main activity domains: (1) walking tests, (2) sit to stand tests, and (3) stair negotiation tests.

There were two main types of walk tests, those over short distances (<100 m) and those over long distances (>100 m). There were nine different short-distance walk tests with variations in (1) set pace (self-paced, fast-paced); (2) distance walked (range 2.4–80 m); (3) functional measure (time, speed, distance, quality grading); and (4) incorporated turns (range 0–7). Short-distance

Table I
Quality criteria for rating the results of measurement properties

Property    Rating    Quality criteria

Reliability
Internal consistency    +    Cronbach’s alpha(s) ≥0.70
    ?    Cronbach’s alpha not determined
    −    Cronbach’s alpha(s) <0.70
Reliability    +    ICC/weighted kappa ≥0.70 OR Pearson’s r ≥ 0.80
    ?    Neither ICC/weighted kappa, nor Pearson’s r determined
    −    ICC/weighted kappa <0.70 OR Pearson’s r < 0.80
Measurement error    +    MIC > SDC OR MIC outside the LoA
    ?    MIC not defined
    −    MIC ≤ SDC OR MIC equals or inside LoA

Validity
Content validity    +    The target population considers all items in the questionnaire to be relevant AND considers the questionnaire to be complete
    ?    No target population involvement
    −    The target population considers items in the questionnaire to be irrelevant OR considers the questionnaire to be incomplete
Structural validity    +    Factors should explain at least 50% of the variance
    ?    Explained variance not mentioned
    −    Factors explain <50% of the variance
Construct validity (hypothesis testing)    +    Correlation with an instrument measuring the same construct ≥0.50 OR at least 75% of the results are in accordance with the hypotheses AND correlation with related constructs is higher than with unrelated constructs
    ?    Solely correlations determined with unrelated constructs
    −    Correlation with an instrument measuring the same construct <0.50 OR <75% of the results are in accordance with the hypotheses OR correlation with related constructs is lower than with unrelated constructs
Cross-cultural validity    +    Original factor structure confirmed OR no important DIF between language versions
    ?    Confirmatory factor analysis not applied and DIF not assessed
    −    Original factor structure not confirmed OR important DIF found between language versions
Criterion validity    +    Convincing arguments that gold standard is “gold” AND correlation with gold standard ≥0.70
    ?    No convincing arguments that gold standard is “gold” OR doubtful design or method
    −    Correlation with gold standard <0.70, despite adequate design and method

Responsiveness
Responsiveness    +    Correlation with an instrument measuring the same construct ≥0.50 OR at least 75% of the results are in accordance with the hypotheses OR AUC ≥0.70 AND correlation with related constructs is higher than with unrelated constructs
    ?    Solely correlations determined with unrelated constructs
    −    Correlation with an instrument measuring the same construct <0.50 OR <75% of the results are in accordance with the hypotheses OR AUC <0.70 OR correlation with related constructs is lower than with unrelated constructs

SDC, smallest detectable change; LoA, limits of agreement; DIF, differential item functioning; +, positive rating; ?, indeterminate rating; −, negative rating.
Adapted from Terwee et al. J Clin Epidemiol 2007;60(1):34–42.
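
The reliability rows of Table I amount to simple threshold rules. The sketch below is illustrative only and is not taken from the review: the function names are hypothetical, the negative rating is written as an ASCII "-", and the measurement error rule covers only the MIC versus SDC comparison (the limits of agreement branch is left out).

```python
from typing import Optional


def rate_internal_consistency(alpha: Optional[float]) -> str:
    """Table I: '+' if Cronbach's alpha >= 0.70, '?' if not determined, '-' otherwise."""
    if alpha is None:
        return "?"
    return "+" if alpha >= 0.70 else "-"


def rate_reliability(icc_or_kappa: Optional[float] = None,
                     pearson_r: Optional[float] = None) -> str:
    """Table I: '+' if ICC/weighted kappa >= 0.70 OR Pearson's r >= 0.80."""
    if icc_or_kappa is None and pearson_r is None:
        return "?"  # neither statistic was determined
    if icc_or_kappa is not None and icc_or_kappa >= 0.70:
        return "+"
    if pearson_r is not None and pearson_r >= 0.80:
        return "+"
    return "-"


def rate_measurement_error(mic: Optional[float], sdc: Optional[float]) -> str:
    """Table I (SDC branch only): '+' if MIC > SDC, '?' if MIC is not defined.

    Assumption: a missing SDC is also treated as indeterminate.
    """
    if mic is None or sdc is None:
        return "?"
    return "+" if mic > sdc else "-"


# Example calls; the numeric inputs are illustrative, not study results.
print(rate_reliability(icc_or_kappa=0.95))   # +
print(rate_reliability(pearson_r=0.75))      # -
print(rate_measurement_error(None, 3.08))    # ?
```

Because a missing MIC always yields "?", a study that reports only SEM or SDC cannot receive a positive or negative measurement error rating under these rules.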

Fig. 1. Flowchart of the selection and inclusion of studies.

walk tests were included in five/six multi-activity measures7,8,10,31–34,36–39. The 6-min walk test was the only long-distance walk test and was investigated in four studies6,22,26,28 and included in two multi-activity measures8,10,35.

There were six different sit to stand tests with variations in (1) method of measurement (count over 30 s, time for five repetitions, total time and quality grading); (2) height of chair (standard and high); and (3) incorporated walking and/or turning components (timed up and go test, which incorporates walking 3 m, turning and returning to sit down, and the get up and go test, which incorporates walking 20 m with no return). Sit to stand tests were included in three multi-activity measures7,8,10,31–34.

There were seven different stair negotiation tests with variations in (1) number of stairs (range 4–12); (2) ascend only, descend only or both; (3) hand-rail support and (4) leading limb step pattern. Stair negotiation tests were included in five/six multi-activity measures7,8,10,31–36.

Three studies included participants with hip OA24,30,32, five with knee OA6,20,22,26,27 and 16 with both hip and knee OA7,8,10,21,23,25,28,29,31,33–39. The majority of studies included participants in the end stage of OA or the stage of disease was not specified.

Measurement properties

The inter-rater agreement of the independent methodological quality ratings of included studies was good [absolute agreement = 90%, kappa = 0.85, 95% confidence interval (CI) 0.72, 0.98]. Disagreement was mainly due to reading errors and was easily resolved using a consensus method between the two raters.

Internal consistency

Internal consistency was only applicable to multi-activity measures and was assessed in three measures31,35,37 (Table III). Two studies were rated as “excellent” quality35,37. A positive internal consistency rating (α = 0.82 and 0.84) was found in both studies.

Reliability and measurement error

Reliability was assessed in 16/21 of the performance measures. Measurement error was assessed in 14/21 of the performance measures (Table III).
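
When reading the sample sizes in the tables that follow, it may help to recall the sample-size rule stated under “Best evidence synthesis: levels of evidence”. The sketch below is a simplified illustration, not the authors' procedure: the function name and input format are hypothetical, and it ignores the consistency of findings across studies, so it never returns the “conflicting” level.

```python
from typing import List, Tuple


def level_of_evidence(studies: List[Tuple[str, int]]) -> str:
    """Assign a level of evidence from pooled sample size.

    `studies` holds (COSMIN score, sample size) pairs for one measurement
    property of one performance-based measure. Studies rated "poor" are
    not eligible to contribute, as stated in the Methodology.
    """
    pooled_n = sum(n for score, n in studies if score.lower() != "poor")
    if pooled_n >= 100:
        return "strong"
    if pooled_n >= 50:
        return "moderate"
    if pooled_n >= 25:
        return "limited"
    return "unknown"


# Illustrative inputs only.
print(level_of_evidence([("fair", 21), ("good", 29)]))  # moderate
print(level_of_evidence([("poor", 106)]))               # unknown
```

Under this rule a single large study rated “poor” contributes nothing, while two small eligible studies of the same test variation can together reach a higher level than either would alone.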
Table II
Characteristics of included studies

Author (Year) Mean age years ± SD (range) OA site OA stage Performance measure Activity No. of PPMs No. of scores Equipment required Measurement property assessed

Single-activity measures
French (2011)22 65.3 ± 6.9 Knee NS TUG Stand, 3 m walk, turn, return, sit 3 3 Chair, stopwatch Responsiveness
    CST Chair-rise × five reps walking space
    6MWT 6 min walking
Gill (2008)23 70.3 ± 9.8 Hip/knee ES/PA WT Walk 50-feet (15.2 m) fast-paced 2 5 20 m walkway Test–retest reliability
    CST Chair-rise over 30 s Chair, stopwatch Inter-reliability
    Measurement error
Mizner (2011)6 65.0 ± 9.0 Knee ES/PA TUG Stand, 3 m walk, turn, return, sit 3 3 Chair, stopwatch Responsiveness
    SCT Up and down 12 stairs Stairs, Construct validity
    6MWT 6 min walking Walking space
Wright (2011)30 66.5 ± 9.4 Hip NS TUG Stand, 3 m walk, turn, return, sit 4 4 Chair, stopwatch Interpretability
    WT Walk 4 × 10 m self-paced 20 cm step Inter-reliability
    CST Chair-rise over 30 s 10 m walkway Measurement error
Hoeksma (2003)24 72.0 ± 6.0 Hip Early-late WT Walk 80 m fast-paced 15 m walkway Responsiveness
    K&L 0-IV Stopwatch
Borjesson (2007)20 63.0 ± 5.0 Knee ES/PA WT Walk 5 m slow-paced 3 3 <10 m walkway Responsiveness
    Walk 5 m medium-paced Stopwatch
    Walk 5 m fast-paced
Kennedy (2005)28 63.7 ± 10.7 Hip/knee ES/PA WT Walk 2 × 20 m fast-paced 4 4 Chair, stopwatch Test–retest reliability
    SCT Up and down nine stairs >20 m walkway Measurement error
    TUG Stand, 3 m walk, turn, return, sit Nine-step stairs Responsiveness
    6MWT 6 min walking Walking space
Parent (2002)26 68.6 ± 8.7 Knee ES/PA 6MWT 6 min walking 1 1 Walking space Responsiveness
    Stopwatch
Davey (2003)21 69.5 ± 7.2 Hip/knee NS WT Walk eight feet self-paced 2 2 <5 m walkway Test–retest reliability
    SCT Up and down four stairs Four-step stairs Measurement error
Piva (2004)27 62.0 ± 9.0 Knee Mid-late GUG Stand, walk 20 m, no return 1 1 Chair with arms Intra-/inter-reliability
    K&L > 2 20 m walkway Measurement error
    15.2 mark Construct validity
    Stopwatch
Marks (1994a)25 65.9 ± 8.3 Knee NS WT Walk 13 m self-paced 1 1 13 m walkway Test–retest reliability
    Stopwatch Measurement error
Marks (1994b)29 59.2 ± 11.1 Knee NS WT Walk 13 m self-paced 1 1 13 m walkway Test–retest reliability
    Stopwatch Measurement error
    Responsiveness

Multi-activity measures
Oberg (1994)33 69.0 ± 9.0 Hip/knee Early-Mid FAS Rise from half stand max no. 7 1 Adj height chair Inter-reliability
    Sit to stand lowest height Adj height step Structural validity
    Step (max height) Stopwatch
    Stand one leg 65 m walkway
    Stair climbing (NS) Stairs
    Gait speed over 65 m
    Walking aid
Oberg (1997)34 68.9 ± 9.7 Hip/knee Early-Mid FAS Rise from half stand max no. 7 1 Adj height chair Criterion validity
    Sit to stand lowest height Adj height step
    Step (max height) Stopwatch
    Stand one leg 65 m walkway
    Stair climbing (NS) Stairs
    Gait speed over 65 m
    Walking aid
Nilsdotter (2001)32 72.6 (52–86) Hip ES/PA FAS Rise from half stand max no. 7 1 Adj height chair Responsiveness
    K&L > 2 Sit to stand lowest height Adj height step
    Step (max height) Stopwatch
    Stand one leg 65 m walkway
    Stair climbing (NS) Stairs
    Gait speed over 65 m
    Walking aid
McCarthy (2004)36 64.7 ± 9.8 Knee NS ALF 8 m walk test 3 1 10 m space Test–retest reliability
    Seven step SCT up and down Seven-step stair Measurement error
    Sit transfer test Chair (no arms) Construct validity
    Stopwatch Responsiveness
Rejeski (1995)35 68.8 ± 5.6 Knee NS PAR 6MWT 4 1 Walking space Internal consistency
    Five or nine-step SCT up and down Five or nine-step stair Test–retest reliability
    Lift + carry timed Movable shelves Convergent validity
    In/out car timed 2.2 kg weight Concurrent validity
    Mock up car
Lin (2001)31 69.4 ± 5.9 Hip/knee NS Lin Battery Eight feet walk test 4 1 3 m space Test–retest reliability
    Four-step SCT ascend Four-step stair Measurement error
    Four-step SCT descend Chair Floor/ceiling
    CST ×5 Stopwatch Internal consistency
    Construct validity
Steultjens (1999)37 68.0 ± 8.9 Hip/knee NS Steultjens Walk 1 min self-paced 4 1 8 m space Internal consistency
    Sitting down timed Chair Construct validity
    Lying down timed Bench
    Bend + lift timed 2 kg weight
    Stopwatch video
    Trained observer
Steultjens (2000)38 68.0 ± 8.9 Hip/knee NS Steultjens Walk 1 min self-paced 4 1 8 m space Construct validity
    Sitting down timed Chair
    Lying down timed Bench
    Bend + lift timed 2 kg weight
    Stopwatch video
    Trained observer
Steultjens (2001)39 67.9 ± 8.7 Hip/knee NS Steultjens Walk 1 min self-paced 4 1 8 m space Responsiveness
    Sitting down timed Chair
    Lying down timed Bench
    Bend + lift timed 2 kg weight
    Stopwatch video
    Trained observer
Stratford (2006a)8 65 (58–72) Hip/knee ES/PA WT Walk 2 × 20 m fast-paced 4 1 >20 m space Construct validity
    (1–3 QR) TUG Stand, 3 m walk, turn, return, sit Chair
    SCT Up and down nine stairs Nine-step stair
    6MWT 6 min walking Walkway
Stratford (2006b)10 65.0 (55–77) Hip/knee ES/PA WT Walk 2 × 20 m fast-paced 4 1 >20 m space Construct validity
    TUG Stand, 3 m walk, turn, return, sit Chair
    SCT Up and down nine stairs Nine-step stairs
    6MWT 6 min walking Stopwatch
Stratford (2009)7 61.7 ± 10.7 Hip/knee K&L > 2 WT Walk 2 × 20 m fast-paced 3 1 >20 m space, Construct validity
    ES/PA SCT Up and down nine stairs Nine-step stair
    TUG Stand, 3 m walk, turn, return, sit Chair
    Stopwatch

6MWT, 6-min walk test; CST, chair stand test; ES/PA, end stage/post arthroplasty; FAS, functional assessment system; GUG, get up & go test; K&L, Kellgren and Lawrence classification; SCT, stair-climb test; TUG, timed up & go test; WT, walk test.
Table III
Measurement properties of performance-based measures (reliability and measurement error)

Performance-based measure Internal consistency Reliability Measurement error
Result Study n COSMIN score Result Design Time interval Study n COSMIN score Result Study n COSMIN score

Walk tests
50ft fast-paced23 N/A ICC1,1 0.91–0.97 (0.86–0.98) Intra-rater Intra-session 35–47 Fair SEM 1.32 s 81 Fair
    ICC1,1 0.94–0.97 (0.90, 0.98) Inter-rater Intra-session 28–31 Fair* MDC90 3.08 s
40 m self-paced30 N/A ICC2,1 0.95 (0.90, 0.98) Inter-rater <1 week 29 Good* SEM 1.0 m/s 29 Good*
80 m fast-paced24 N/A – –
40 m fast-paced28 N/A ICC2,1 0.91 (0.81, 0.97) Test–retest Mean 25.4 weeks 21 Fair* SEM 1.73 s 17 Fair*
    (CI 1.39, 2.29) MDC90 4.04 s
8 ft self-paced21 N/A Pearson r 0.92 Test–retest <1 week 21 Fair* SEM 0.12 s 21 Fair*
13 m self-paced25,29 N/A ICC1,1 0.83 Test–retest 6 weeks 10 Good* SEM 1.5 s 10 Poor
5 m multi-paced20 N/A – –
6MWT22 N/A – –
6MWT28 N/A ICC2,1 0.94 (0.88, 0.98) Test–retest Mean 25.4 weeks 21 Fair* SEM: 26.29 m 17 Fair*
    (CI 21.14, 34.77)
6MWT6 N/A – –
6MWT26 N/A – –

CST
x5 chair stand22 N/A – –
30 s-chair stand23 N/A ICC1,1 0.97–0.98 (0.94, 0.99) Intra-rater Intra-session 37–47 Fair SEM 0.7 stands 40 Fair
    ICC1,1 0.93–0.98 (0.87, 0.99) Inter-rater Intra-session 28–42 Fair* MDC90 1.64 stands
30 s-chair stand30 N/A ICC2,1 0.81 (0.63, 0.91) Inter-rater <1 week 29 Good* SEM 1.27 stands 29 Good*
TUG22 N/A – –
TUG6 N/A – –
TUG30 N/A ICC2,1 0.87 (0.74, 0.94) Inter-rater <1 week 29 Good* SEM 0.84 s 29 Good*
TUG28 N/A ICC2,1 0.75 (0.51, 0.89) Test–retest Mean 25.4 weeks 21 Fair* SEM 1.07 s (0.86, 1.41) 17 Fair*
GUG27 N/A ICC 0.95 (0.72–0.98) Intra-rater 2 min 25 Poor SEM 0.55 s, MDC 1.5 s 25 Poor
    ICC 0.98 (0.94–0.99) Inter-rater 2 min 25 Good* SEM 0.42 s, MDC 1.2 s 25 Good*

SCTs
12-stair up/down6 N/A – –
Nine-stair up/down28 N/A ICC2,1 0.90 (0.79, 0.96) Test–retest Mean 25.4 weeks 21 Fair* SEM 2.35 s (1.89, 3.10) 17 Fair*
Four-stair up/down21 N/A Pearson r 0.92 Test–retest <1 week 21 Fair* SEM 0.23 s

Multi-activity tests
Lin battery31 α = 0.84 106 Poor ICC 0.94–0.96 (0.75–0.99) Test–retest N/S 10 Fair* SEM 0.10–1.44 s 10 Good*
PAR35 α = 0.82 203 Excellent r = 0.88–0.93 (range of all tests) Test–retest 2 weeks 25 Fair* –
    r = 0.72–0.86 (range of all tests) Test–retest 3 months 148 Fair*
ALF36 – ICC 0.99 (0.98–0.99) total ALF Test–retest 1 week 15 Good* SEM 0.86 s 15 Good*
Steultjens battery37–39 α = 0.84 198 Excellent – –
Stratford battery7,8,10 N/A – –
FAS33 – G = 0.99–1.0 (range of all tests) Inter-tester ? 42 Fair –

N/A, not applicable for single-activity tests or multi-activity tests using reflective models; FAS, functional assessment system; G, Goodman–Kruskal gamma; MDC, minimal detectable change.
* Denotes a change of COSMIN score after removal of sample size item from the rating.
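
Table III reports SEM and MDC90 side by side for some tests. The review does not state how MDC90 was derived; the sketch below assumes the conventional relation MDC90 = 1.645 × √2 × SEM (90% confidence, two measurement occasions), which reproduces the tabulated values to within rounding. The function name is hypothetical.

```python
import math

Z_90 = 1.645  # z-value for a two-sided 90% confidence level


def mdc90(sem: float) -> float:
    """Minimal detectable change at 90% confidence from the SEM.

    Assumes the conventional formula MDC90 = z * sqrt(2) * SEM; the
    sqrt(2) term reflects error in both test and retest scores.
    """
    return Z_90 * math.sqrt(2) * sem


# SEM values taken from Table III; the reported MDC90 is shown alongside.
for label, sem, reported in [
    ("50ft fast-paced walk (s)", 1.32, 3.08),
    ("40 m fast-paced walk (s)", 1.73, 4.04),
    ("30 s-chair stand (stands)", 0.70, 1.64),
]:
    print(f"{label}: computed {mdc90(sem):.2f}, reported {reported}")
```

The small differences between computed and reported values are consistent with the SEM having been rounded before tabulation, though that explanation is also an assumption.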

Single-activity measures
For walking tests, a positive rating [i.e., intraclass correlation coefficient (ICC) > 0.70] for intra-rater reliability [ICC 0.91–0.97 (CI: 0.86–0.98)] and inter-rater reliability [ICC 0.94–0.97 (CI: 0.90, 0.98)] was reported for the 50ft (15.2 m)-walk test in one “fair” quality study of hip and knee OA23. A positive rating for inter-rater reliability [ICC 0.95 (CI: 0.90, 0.98)] was also reported for the 40 m-walk test in one “good” quality study of hip OA30. For sit to stand tests, a positive rating for inter-tester reliability [ICC 0.87 (CI: 0.74, 0.94)] was reported for the timed up and go test in one “good” study of hip OA30. The 30 s-chair stand test was also found to have a positive rating for intra-tester [ICC 0.97–0.98 (CI: 0.94, 0.99)] and inter-tester [ICC 0.93–0.98 (CI: 0.87, 0.99)] reliability in a “fair” study of hip and knee OA23 and inter-tester [ICC 0.81 (CI: 0.63, 0.91)] reliability in a “good” study of hip OA30. Evidence for stair negotiation tests and other single-activity measures was limited by small total sample sizes or inappropriate time intervals between repeat testing.

The standard error of measurement (SEM), along with minimum important change (MIC), was reported in only three of the 12 single-activity measures (40 m-walk test, timed up and go test and 30 s-chair stand test)30. Measurement error and MIC were defined in one “good” quality study for the 40 m-walk test (SEM 1.0 m/s; MIC 2.0 m/s), timed up and go test (SEM 0.84 s; MIC 0.8–1.4 s) and the 30 s-chair stand test (SEM 1.27 stands; MIC 2.0–2.6 stands)30. As MIC was not calculated for the remaining single-activity measures, quality ratings were indeterminate for these measures.

Multi-activity measures
Reliability of multi-activity measures was reported in three “fair” quality studies31,33,35 and one “good” quality study36. A positive rating for test–retest reliability was reported for the Physical Activity Restrictions (PAR) (ICC 0.72–0.86)35. A positive rating for inter-tester reliability (Goodman–Kruskal gamma 0.99–1.0) was found for the Functional Assessment System (FAS)33. Evidence of reliability for other test batteries was limited due to inadequate total sample size. Measurement error was reported in two test batteries31,36; however, as MIC has not been calculated for either battery, quality ratings were indeterminate.

Validity studies

Validity was assessed in 9/21 (43%) of performance tests (Table IV).

Single-activity measures
Construct validity was investigated for three single-activity performance measures6,27. In one “good” quality study, a positive rating of construct validity was found for the timed up and go test

one “fair” study7,8,10. The FAS demonstrated positive structural validity in one “fair” quality study33 and positive criterion validity with good sensitivity (0.70–0.89) and specificity (0.57–1.0)34.

Responsiveness

Single-activity measures
Responsiveness was reported in 12/15 single-activity measures (Table IV). Responsiveness of walking tests was reported in four “fair” quality studies following either physiotherapy/exercise24,30 or joint arthroplasty20,28. A positive rating [i.e., area under the curve (AUC) > 0.70] was reported for the 40 m-walk test (AUC = 0.89)30 and the 80 m-walk test (AUC = 0.71)24. Responsiveness of other walk tests was reported using standardized response means (SRM) or effect sizes (ES) (see Table IV) and results were therefore indeterminate. Responsiveness of sit to stand tests was reported in three “fair” quality studies following either physiotherapy30 or joint arthroplasty6,28. A positive rating was reported for the 30 s-chair stand test (AUC = 0.73) and a negative rating (AUC < 0.70) was reported for the timed up and go test (AUC = 0.69) following physiotherapy/exercise30. Responsiveness of other sit to stand tests following joint arthroplasty6,28 and all stair negotiation tests6,28 was reported using ES and/or SRM and therefore results were indeterminate.

Multi-activity measures
Responsiveness was reported in three/six multi-activity measures following either exercise36,39 or hip arthroplasty32. One study was “good” quality39 and the others were “fair”32,36. A negative rating of responsiveness of the Steultjens battery39 was found as <75% of the results were in accordance with the hypotheses. Other batteries provided SRM and results were indeterminate.

Interpretability

Evidence of interpretability was reported in one “good” quality study that evaluated three single-activity measures30. Major clinically important improvement (MCII) of the 40 m self-paced walk test (0.2–0.3 m/s), 30 s-chair stand test (2.0–2.6 stands) and the timed up and go test (0.8–1.4 s) was reported30.

Best evidence synthesis: levels of evidence

A summary of best evidence synthesis for each of the 21 performance tests is provided in Table V. This synthesis was derived from information found in Tables III and IV including (1) the methodological quality (COSMIN), (2) the findings (result), and (3) the sample size. Given the large variety of performance-based measures, results were rarely combined. The exceptions were for
and the 12-step stair-climb test as more than 75% of the results the Steultjens battery and the Stratford battery. A positive rating
were in accordance with the hypotheses6. In another “good” quality (limited, moderate or strong evidence) was given to only 25/153
study a negative rating of construct validity was found for the get (16%) of all possible ratings.
up and go test as less than 75% of the results were in accordance
with the hypotheses27. Discussion

Multi-activity measures In this systematic review we identified 24 eligible studies that


Validity was investigated in all six multi-activity batteries and reported the measurement properties of 21 different performance-
four were rated as “good” quality for construct validity7,8,10,35,37,38 based measures of physical function in individuals with hip and/or
and one was rated as “fair” quality for criterion and structural val- knee OA. The majority of studies were rated as “fair” quality using
idity34. The PAR35 demonstrated mostly positive convergent validity the modified COSMIN tool. Evidence for most measurement prop-
with treadmill time, VO2 peak and strength and divergent validity erties is yet to be determined either because there was no infor-
with self-reported dysfunction as predicted. The Steultjens battery38 mation available, information was indeterminate or because
demonstrated a negative convergent validity with self-reported evidence was only available from poor quality studies. Studies were
mobility and joint range of motion. The Stratford battery demon- mostly rated as poor quality due to unclear hypotheses and/or non-
strated positive construct validity in two “good” quality studies and optimal analyses. Although none of the measures included in the
Table IV
Measurement properties of performance-based measures (validity, responsiveness and interpretability)

Fields reported per measure. Validity (hypothesis testing): design, result, study n, COSMIN score. Responsiveness: treatment, result, COSMIN score. Interpretability: result, COSMIN score.

Walk tests
50ft fast-paced [23]
  Validity: –
  Responsiveness: –
40 m self-paced [30]
  Validity: –
  Responsiveness: PT x9 sessions; AUC 0.89 (0.76, 1.00); COSMIN score Fair
  Interpretability: MCII 0.2–0.3 m/s; COSMIN score Good
80 m fast-paced [24]
  Validity: –
  Responsiveness: PT x9 sessions; AUC 0.71 (0.58, 0.83); GRI 0.45; COSMIN score Fair
40 m fast-paced [28]
  Validity: –
  Responsiveness: Hip/knee arthroplasty; SRM 0.89 (1.42, 0.68) pre-first post; SRM 0.79 (0.66, 1.45) first-second post; COSMIN score Fair
8ft self-paced [21]
  Validity: –
  Responsiveness: –
13 m self-paced [29]
  Validity: –
  Responsiveness: Quads exercise (6 weeks); r = 0.9 with quads strength; COSMIN score Poor
5 m multi-paced [20]
  Validity: –
  Responsiveness: Knee arthroplasty; ES/SRM/RE at slow speed: 0.58/0.71/1.62; COSMIN score Fair
6MWT [22]
  Validity: –
  Responsiveness: PT mean 5.8 sessions; ES/ES med/SRM 0.39/0.43/0.54; COSMIN score Poor
6MWT [28]
  Validity: –
  Responsiveness: Hip/knee arthroplasty; SRM pre-post1: 1.74 (1.60, 1.97); SRM post1-post2: 1.90 (1.46, 2.39); COSMIN score Fair
6MWT [6]
  Validity: –
6MWT [26]
  Validity: –
  Responsiveness: Knee arthroplasty ± PT; SRM/ES: pre-2 mth post 0.63/0.41; 2–4 mth post 1.51/0.82; pre-4 mth post 0.58/0.35; COSMIN score Fair

CST
x5 chair stand [22]
  Validity: –
  Responsiveness: PT mean 5.8 sessions; ES/ES med/SRM 0.36, 0.33, 0.39; COSMIN score Poor
30 s-chair stand [23]
  Validity: –
  Responsiveness: –
30 s-chair stand [30]
  Validity: –
  Responsiveness: PT x9 sessions; AUC 0.73 (0.55, 0.91); COSMIN score Fair
  Interpretability: MCII 2.0–2.6 stands; COSMIN score Good
TUG [22]
  Validity: –
  Responsiveness: PT mean 5.8 sessions; ES/ES med/SRM 0.33/0.17/0.35; COSMIN score Poor
TUG [6]
  Validity (Construct): Low correlations with PROs as predicted; r = 0.40 to 0.48 with quads strength as predicted; n = 100; COSMIN score Good
  Responsiveness: Knee arthroplasty; ES pre-1 mth/pre-12 mth/1-12 mth: 0.43, 0.79, 1.17; COSMIN score Fair
TUG [30]
  Validity: –
  Responsiveness: PT x9 sessions; AUC 0.69 (0.48, 0.90); COSMIN score Fair
  Interpretability: MCII 0.8–1.4 s; COSMIN score Good
TUG [28]
  Validity: –
  Responsiveness: Hip/knee arthroplasty; SRM pre-post1: 1.08 (1.38, 0.92); SRM post1–post2: 1.04 (0.84, 1.61); COSMIN score Fair
GUG [27]
  Validity (Construct): Sig diff b/w patients and controls, P < 0.001; n = 50; COSMIN score Fair
  Validity (Divergent/Convergent): r = 0.39; 0.44; 0.34 with WOMAC/SF-36 PF/ADLS; correlation with related constructs higher than unrelated; <75% of results in accordance with hypothesis; n = 105; COSMIN score Good
  Responsiveness: –

SCTs
12-stair up/down [6]
  Validity (Construct): Poor correlation with PROs as predicted; r = 0.36 to 0.46 with quads strength as predicted; n = 100; COSMIN score Good
  Responsiveness: Knee arthroplasty; ES pre-1 mth/pre-12 mth/1-12 mth: 0.71, 0.84, 1.26; COSMIN score Fair
Nine-stair up/down [28]
  Validity: –
  Responsiveness: Hip/knee arthroplasty; SRM pre-post1: 1.74 (2.13, 1.45); SRM post1–post2: 1.98 (1.68, 2.42); COSMIN score Fair
Four-stair up/down [21]
  Validity: –
  Responsiveness: –

Multi-activity tests
Lin battery [31]
  Validity (Construct): r = 0.48–0.54 with WOMAC-PF; n = 106; COSMIN score Poor
  Responsiveness: –
PAR [35]
  Validity (Construct, convergent): 0.30–0.60 with treadmill time, VO2 peak, quads strength; n = 104–437; COSMIN score Good
  Validity (Divergent): 0.03–0.93 with self-reported dysfunction; n = 104–437
  Responsiveness: –
ALF [36]
  Validity (Construct): r = 0.59/0.53 with WOMAC/SF-36PF; n = 214; COSMIN score Poor
  Responsiveness: Exercise program; SRM 0.49 at 12 months f/u
Steultjens battery [37–39]
  Validity (Construct): r = 0.29–0.55 with self-rated mobility; n = 198; COSMIN score Fair
  Validity (Construct): r = 0.25–0.35 with ROM; n = 198; COSMIN score Good; Different factor structure than expected
  Responsiveness: Exercise program; No differential responsiveness of observed vs self-report
Stratford battery [7,8,10]
  Validity (Construct): SPWT, TUG, 6MWT best combination to evaluate pain and performance; n = 177; COSMIN score Fair
  Validity (Construct): Change in pain rather than performance (time/distance) is principal determinant of change in self-reported function; n = 85; COSMIN score Good
  Validity (Construct): ANOVA P < 0.001: PB was more sensitive to change than SR measures; n = 73; COSMIN score Good
  Responsiveness: –
FAS [32–34]
  Validity (Structural): PCA-5 factors loading with physical disability primarily 1 factor explaining 51–82% of variance; n = 105; COSMIN score Fair
  Validity (Construct): PPMs were better able to discriminate btw healthy and OA and btw hip and knee OA, P < 0.001, delta 0.67–0.93
  Validity (Criterion): Sensitivity 0.70–0.89; Specificity 0.57–1.0 (SPWT and SCT had best sensitivity and specificity); n: Controls 42, Hip OA 302, Knee OA 258; COSMIN score Fair
  Responsiveness: Hip arthroplasty; SRM of mean score = 0.4 at 3 months post-op; SRM of mean score = 0.7 at 6 months post-op

ADLS, activities of daily living; ANOVA, analysis of variance; ES, effect size index; ES med, effect size median; FAS, functional assessment system; GRI, Guyatt's responsiveness index; PCA, principal component analysis; PB, performance battery; PPM, physical performance measure; PRO, patient-reported outcome; PT, physiotherapy; ROM, range of movement; SF-36 PF, short-form health survey physical function; SPWT, self-paced walk test; WOMAC, Western Ontario and McMaster Universities Arthritis Index.
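The responsiveness results in Table IV mix three statistics: AUC, ES and SRM. The following is a minimal sketch, not the authors' code, of how such values are conventionally computed and how the rating rule described in the text (positive if AUC > 0.70, negative if below, indeterminate when only ES or SRM is given) could be applied. The function names are hypothetical, the ES and SRM formulas are the conventional definitions and are assumed here rather than taken from the included studies, and the handling of an AUC of exactly 0.70 is an assumption.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """ES (conventional definition): mean change / SD of baseline scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardised_response_mean(baseline, follow_up):
    """SRM (conventional definition): mean change / SD of change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def rate_responsiveness(auc=None):
    """Rate a responsiveness result as '+', '-' or '?'.

    '?' is returned when no AUC is available (only ES/SRM reported).
    Treating an AUC of exactly 0.70 as negative is an assumption; the
    review only states '> 0.70' and '< 0.70'.
    """
    if auc is None:
        return "?"
    return "+" if auc > 0.70 else "-"


# AUC values quoted in Table IV for the study of hip OA [30]
print(rate_responsiveness(auc=0.89))  # 40 m self-paced walk test
print(rate_responsiveness(auc=0.69))  # timed up and go test
print(rate_responsiveness())          # ES/SRM only
```

Under this rule the 40 m self-paced walk test rates positive and the timed up and go test negative, which matches the ratings described in the Results.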

Table V
Levels of evidence of performance-based measures

Performance-based measure       Internal     Reliability                Measurement  Validity    Responsiveness  Interpretability
                                consistency  Intra    Inter    Retest   error

Single-activity measures
Walk tests
50ft fast-paced [23]            N/A          +(HK)    +(HK)    0        ?            0           0               0
40 m self-paced [30]            N/A          0        +(H)     0        +(H)         0           +(H)*           ++(H)
80 m fast-paced [24]            N/A          0        0        0        0            0           +(H)*           0
13 m self-paced [25,29]         N/A          0        0        ?        ?            0           0               0
8ft self-paced [21]             N/A          0        0        ?        ?            0           0               0
40 m fast-paced [28]            N/A          0        0        ?        ?            0           ?               0
5 m-slow/medium/fast [20]       N/A          0        0        0        0            0           ?               0
6-min [6,22,26,28]              N/A          0        0        ?        ?            0           ?               0

Sit to stand tests
30 s-chair stand [23,30]        N/A          +(HK)    +(HK)    0        +(H)         0           +(H)*           ++(H)
X5 chair stand [22]             N/A          0        0        ?        ?            0           ?               0
Timed up and go [6,22,30]       N/A          0        +(H)     ?        +(H)         ++(K)       −(H)*           ++(H)
Get up and go [27]              N/A          ?        0        ?        ?            (K)         0               0

Stair negotiation tests
12-stair up and down [6]        N/A          0        0        0        0            ++(K)       ?               0
Nine-stair up and down [28]     N/A          0        0        ?        ?            0           ?               0
Four-stair up and down [21]     N/A          0        0        ?        ?            0           0               0

Multi-activity measures
Lin [31]                        ?            0        0        ?        ?            ?           0               0
PAR [35]                        +++(K)       0        0        +(K)     0            ++(K)       0               0
ALF [36]                        0            0        0        ?        ?            ?           ?               0
Steultjens [37–39]              +++(HK)      0        0        0        0            (HK)        (HK)            0
Stratford [7,8,10]              0            0        0        0        0            +++(HK)     0               0
FAS [32–34]                     0            0        +(HK)    0        0            +(HK)†      ?               0
                                                                                     +(HK)‡

+++ or −−− strong evidence, ++ or −− moderate evidence, + or − limited evidence, ± conflicting evidence, ? unknown, 0 no information [+ = positive, − = negative rating (results)], (H) = hip, (K) = Knee, (HK) = Hip and Knee.
* Physiotherapy/exercise.
† Structural validity.
‡ Criterion validity.
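The symbols in Table V follow the levels of evidence defined in Appendix 2. As an illustration only, the sketch below encodes those criteria as a small function. The function name, the input format and the tie-breaking choices (poor quality studies and indeterminate results are ignored, and the best available study quality decides the level) are assumptions made for this example. It does not reproduce the sample-size modification of the COSMIN scoring used in the review, and it writes the minus sign as an ASCII hyphen.

```python
def level_of_evidence(studies):
    """Return a Table V style symbol for one measurement property.

    studies: list of (quality, rating) tuples, where quality is one of
    'excellent', 'good', 'fair', 'poor' and rating is '+', '-' or '?'.
    """
    if not studies:
        return "0"  # no information
    # Assumption: poor quality studies and indeterminate results do not count.
    usable = [(q, r) for q, r in studies if q != "poor" and r in ("+", "-")]
    if not usable:
        return "?"  # unknown
    signs = {r for _, r in usable}
    if len(signs) > 1:
        return "±"  # conflicting findings
    sign = signs.pop()
    qualities = [q for q, _ in usable]
    n_good = qualities.count("good")
    n_fair = qualities.count("fair")
    if "excellent" in qualities or n_good >= 2:
        strength = 3  # strong
    elif n_good == 1 or n_fair >= 2:
        strength = 2  # moderate
    else:
        strength = 1  # limited: one study of fair quality
    return sign * strength


print(level_of_evidence([("fair", "+")]))
print(level_of_evidence([("good", "+"), ("good", "+"), ("fair", "+")]))
print(level_of_evidence([("poor", "+")]))
```

A single “fair” study with a positive result yields limited evidence (+), while consistent positive findings across several “good” studies yield strong evidence (+++).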

review reported evidence for all measurement properties, positive evidence for a selected few measures was established across multiple measurement properties. This provides useful information for clinicians and researchers about which performance-based measures are currently the most suitable for assessing people with hip and/or knee OA.
Similar to a previous review [3], the current review identified a variety of performance-based measures that represented several different activity domains. For example, in this review, 10 different variations of the walking test were identified. As such, we found it useful to group the measures under three main activity themes: (1) walking tests; (2) sit to stand tests; and (3) stair negotiation tests. An additional group, multi-activity measures, contains different variations and combinations of the three activity domains as well as some additional domains such as getting in/out of a car [35] and lift and carrying tasks [35,37–39].

Walking tests

Walking tests with the best measurement evidence included the 40 m self-paced walk test for hip OA [30] and the 50ft (15.2 m) fast-paced walk test for hip/knee OA [23]. Evidence for other walk tests such as the 6-min walk test has yet to be determined in people with hip and/or knee OA.

Sit to stand tests

Sit to stand tests with the best measurement evidence included the 30 s-chair stand test and the timed up and go test for hip/knee OA [6,23,30]. Evidence for the five-repetition chair stand test has yet to be determined. Based on current levels of evidence, the get up and go test [27] is not recommended for use in people with either hip or knee OA.

Stair negotiation tests

Evidence for most variations of stair tests has yet to be determined. Only evidence of construct validity was reported for the 12-step stair test for knee OA [6]. Given the current limited evidence of stair negotiation tests, recommendations about which tests might be more useful cannot be made.

Multi-activity measures

Multi-activity measures with the best measurement evidence were the PAR [35], the Stratford battery [7,8,10] and the FAS [32–34]. In addition, the PAR provided a good justification for the choice of included activities which consisted of a walking test (6-min walk test), a stair negotiation test (five or nine-stair ascent/descent), a lift and carry test and a car test. Based on current levels of evidence, the Steultjens battery is not recommended for hip and knee OA [38,39]. Evidence for the aggregated locomotor function (ALF) and Lin test is yet to be determined.
A number of factors influenced the evidence found in the review. The COSMIN quality scoring system developed for self-reported questionnaires was modified to enable smaller studies that were otherwise of acceptable quality to be included in best evidence synthesis. This change influenced the findings of the majority of the reliability studies. Without this change, there would have been no evidence for reliability for any of the measures

included in the review. Best evidence synthesis was mostly obtained from a single study as the majority of results could not be combined because of the large variations in the testing procedures. Further, for most multi-activity tests included in this review, there was no information about the measurement model (reflective or formative) in the development of the tests, nor in the validation studies. Therefore it is difficult to tell how important internal consistency is for these tests. For some of the included tests that were based on a formative model, where the activities define the construct (causal indicators), internal consistency may not be relevant [15].
There were some limitations to this review. Publication bias from unpublished studies may threaten the internal validity as unpublished studies are more likely to report negative or unfavourable results. The decision to exclude measures that used sophisticated equipment or measured constructs other than those defined as ‘Activities’ according to the ICF [4] (e.g., balance measures) meant that evidence for these types of measures was not included in the review. In addition, further evidence may have been found from some potentially good studies that fell short of the 80% OA sample criterion [40–46]. We found considerable variations in the performance-based measures which meant most evidence from multiple studies of a measure could not be combined. Stronger evidence may have been found if a larger number of more similar studies were available.
This review highlights a number of areas worthy of future research. More studies of the responsiveness and MIC of performance-based measures for people with hip and knee OA are required. Although there is growing evidence for some of the performance measures included in this review, no test has been evaluated with respect to all measurement properties. On balance of the evidence, the 40 m self-paced test [30] was the best rated walk test, the 30 s-chair stand test [30] and timed up and go test [30] were the best rated sit to stand tests, and the PAR [35], Stratford battery [7,8,10], and FAS [32–34] were the best rated multi-activity measures. Additionally, before strong recommendations can be made, consensus is still required on which variation of an activity theme is best and what combination of tests would best assess physical function in people with hip and/or knee OA. Extensive variation in types of outcome measures has been found across trials [5,47], making comparisons across studies and synthesis of results difficult [9]. We agree with recommendations that future work should be directed at whether consensus can be achieved towards a standardised set of performance-based outcome measures [3,5,9].

Conclusion

This systematic review highlighted current gaps in our knowledge of evidence about the measurement properties of performance-based measures of physical function in people with hip and/or knee OA. Further good quality research investigating the measurement properties, and in particular the responsiveness and interpretability of performance-based measures, in people with hip and/or knee OA is needed. Consensus on which combination of measures will best assess physical function in hip and/or knee OA is urgently required.

Author contributions

FD contributed to the conception and design of the study including obtaining of funding, collection and assembly of data, analysis and interpretation of data, writing of the manuscript and final approval of the article. MH contributed to collection and assembly of data, drafting and final approval of the article. RSH, KLB and EMR contributed to conception and design of the study including obtaining of funding, analysis and interpretation of the data, critical revision of the article for important intellectual content and final approval of the article. CBT contributed to the conception and design, analysis and interpretation of the data, critical revision of the article for important intellectual content and final approval of the article. First and last authors take responsibility for the integrity of the work as a whole, from inception to finished article.

Role of the funding source
This project was partly funded by the OARSI, NHMRC Program Grant #631717 and the Arthritis Australia and States & Territory Affiliates Grant and forms part of an OARSI initiative to develop a recommended set of physical performance measures for hip and knee OA. Kim Bennell is partly funded by an Australian Research Council Future Fellowship. The study sponsor did not play any role in the study design, collection, analysis or interpretation of data; nor in the writing of the manuscript or decision to submit the manuscript for publication.

Conflict of interest
There are no other financial interests that any of the authors may have, which could create a potential conflict of interest or the appearance of a conflict of interest with regard to the work.

Appendix 1. Search strategy

Filter 1: Construct terms

(“physical function*”[tw] OR “motor activity”[MH] OR “physical activity”[tw] OR “physical activities”[tw] OR “physical performance*”[tw] OR “functional activity”[tw] OR “functional activities”[tw] OR “functional performance*”[tw] OR “activity limitation*”[tw] OR “functional limitation*”[tw] OR disability[Title/Abstract] OR disabilities[Title/Abstract] OR “Activities of daily living”[MH]).

Filter 2: Target population

(“osteoarthritis”[MH]) OR osteoarthritis[Title/Abstract] OR “arthritis”[MH]) OR arthritis[Title/Abstract]) OR (replacement[Title/Abstract] OR arthroplasty[Title/Abstract]) AND (hip[Title/Abstract] OR knee[Title/Abstract] OR “lower limb”[Title/Abstract]).

Filter 3: Instrument terms

(“physical performance measure*”[tw] OR “performance test*”[tw] OR “performance-based test”[tw] OR “performance-based tests”[tw] OR “performance based test*”[tw] OR “performance measure*”[tw] OR “performance-based measure”[tw] OR “performance-based measures”[tw] OR “performance instrument*”[Title/Abstract] OR “performance-based instrument”[Title/Abstract] OR “performance-based instruments”[Title/Abstract] OR “performance-based method”[Title/Abstract] OR “performance-based methods”[Title/Abstract] OR “performance based method*”[Title/Abstract] OR “performance index”[Title/Abstract] OR “performance indices”[Title/Abstract] OR “performance-based index”[Title/Abstract] OR “performance-based indices”[Title/Abstract] OR “performance-based assessment”[Title/Abstract] OR “performance-based assessments”[Title/Abstract] OR “objective test*”[Title/Abstract] OR “objective instrument*”[Title/Abstract] OR “objective method*”[Title/Abstract] OR “objective measure*”[Title/Abstract] OR “objective evaluation*”[Title/Abstract] OR “objective function*”[Title/Abstract] OR “objective disability”[Title/Abstract] OR “objective assessment*”[Title/Abstract] OR “observational

test*”[Title/Abstract] OR “observational-based test”[Title/Abstract] OR “observational-based tests”[Title/Abstract] OR “observational testing”[Title/Abstract] OR “observational instrument*”[Title/Abstract] OR “observational-based instrument”[Title/Abstract] OR “observational-based instruments”[Title/Abstract] OR “observational method*”[Title/Abstract] OR “observational-based method”[Title/Abstract] OR “observational-based methods”[Title/Abstract] OR “observational measure*”[Title/Abstract] OR “observational-based measure”[Title/Abstract] OR “observational-based measures”[Title/Abstract] OR “observational index”[Title/Abstract] OR “observational indices”[Title/Abstract] OR “observation-based index”[Title/Abstract] OR “observation-based indices”[Title/Abstract] OR “observed disability”[Title/Abstract] OR “observed function”[Title/Abstract] OR “gait analysis”[Title/Abstract] OR “gait evaluation”[Title/Abstract] OR “walk* test”[Title/Abstract] OR “task performance and analysis”[MH] OR Outcome Assessment[MH]).

Filter 4: Sensitive search filter for measurement properties

(instrumentation[sh] OR methods[sh] OR validation studies[pt] OR Comparative Study[pt] OR psychometrics[MH] OR psychometr*[tiab] OR clinimetr*[tw] OR clinometr*[tw] OR “outcome assessment (health care)”[MH] OR “outcome assessment”[tiab] OR “outcome measure*”[tw] OR “observer variation”[MH] OR “observer variation”[tiab] OR “Health Status Indicators”[MH] OR “reproducibility of results”[MH] OR reproducib*[tiab] OR “discriminant analysis”[MH] OR reliab*[tiab] OR unreliab*[tiab] OR valid*[tiab] OR coefficient[tiab] OR homogeneity[tiab] OR homogeneous[tiab] OR “internal consistency”[tiab] OR (cronbach*[tiab] AND (alpha[tiab] OR alphas[tiab])) OR (item[tiab] AND (correlation*[tiab] OR selection*[tiab] OR reduction*[tiab])) OR agreement[tiab] OR precision[tiab] OR imprecision[tiab] OR “precise values”[tiab] OR test-retest[tiab] OR (test[tiab] AND retest[tiab]) OR (reliab*[tiab] AND (test[tiab] OR retest[tiab])) OR stability[tiab] OR interrater[tiab] OR inter-rater[tiab] OR intrarater[tiab] OR intra-rater[tiab] OR intertester[tiab] OR inter-tester[tiab] OR intratester[tiab] OR intra-tester[tiab] OR interobserver[tiab] OR inter-observer[tiab] OR intraobserver[tiab] OR intraobserver[tiab] OR intertechnician[tiab] OR inter-technician[tiab] OR intratechnician[tiab] OR intra-technician[tiab] OR interexaminer[tiab] OR inter-examiner[tiab] OR intraexaminer[tiab] OR intra-examiner[tiab] OR interassay[tiab] OR inter-assay[tiab] OR intraassay[tiab] OR intra-assay[tiab] OR interindividual[tiab] OR inter-individual[tiab] OR intraindividual[tiab] OR intra-individual[tiab] OR interparticipant[tiab] OR inter-participant[tiab] OR intraparticipant[tiab] OR intra-participant[tiab] OR kappa[tiab] OR kappa’s[tiab] OR kappas[tiab] OR repeatab*[tiab] OR ((replicab*[tiab] OR repeated[tiab]) AND (measure[tiab] OR measures[tiab] OR findings[tiab] OR result[tiab] OR results[tiab] OR test[tiab] OR tests[tiab])) OR generaliza*[tiab] OR generalisa*[tiab] OR concordance[tiab] OR (intraclass[tiab] AND correlation*[tiab]) OR discriminative[tiab] OR “known group”[tiab] OR factor analysis[tiab] OR factor analyses[tiab] OR dimension*[tiab] OR subscale*[tiab] OR (multitrait[tiab] AND scaling[tiab] AND (analysis[tiab] OR analyses[tiab])) OR item discriminant[tiab] OR interscale correlation*[tiab] OR error[tiab] OR errors[tiab] OR “individual variability”[tiab] OR (variability[tiab] AND (analysis[tiab] OR values[tiab])) OR (uncertainty[tiab] AND (measurement[tiab] OR measuring[tiab])) OR “standard error of measurement”[tiab] OR sensitiv*[tiab] OR responsive*[tiab] OR ((minimal[tiab] OR minimally[tiab] OR clinical[tiab] OR clinically[tiab]) AND (important[tiab] OR significant[tiab] OR detectable[tiab]) AND (change[tiab] OR difference[tiab])) OR (small*[tiab] AND (real[tiab] OR detectable[tiab]) AND (change[tiab] OR difference[tiab])) OR meaningful change[tiab] OR “ceiling effect”[tiab] OR “floor effect”[tiab] OR “Item response model”[tiab] OR IRT[tiab] OR Rasch[tiab] OR “Differential item functioning”[tiab] OR DIF[tiab] OR “computer adaptive testing”[tiab] OR “item bank”[tiab] OR “cross-cultural equivalence”[tiab]).

Filter 5: Exclusion filter

(“addresses”[PT] OR “biography”[PT] OR “case reports”[PT] OR “comment”[PT] OR “directory”[PT] OR “editorial”[PT] OR “festschrift”[PT] OR “interview”[PT] OR “lectures”[PT] OR “legal cases”[PT] OR “legislation”[PT] OR “letter”[PT] OR “news”[PT] OR “newspaper article”[PT] OR “patient education handout”[PT] OR “popular works”[PT] OR “congresses”[PT] OR “consensus development conference”[PT] OR “consensus development conference, nih”[PT] OR “practice guideline”[Publication Type]) NOT (“animals”[MeSH Terms] NOT “humans”[MeSH Terms]).

Appendix 2. Levels of evidence for the quality of the measurement property

Level        Rating*      Criteria
Strong       +++ or −−−   Consistent findings in multiple studies of good methodological quality OR in one study of excellent methodological quality
Moderate     ++ or −−     Consistent findings in multiple studies of fair methodological quality OR in one study of good methodological quality
Limited      + or −       One study of fair methodological quality
Conflicting  ±            Conflicting findings
Unknown      ?            Only studies of poor methodological quality

Adapted from Terwee et al. J Clin Epidemiol 2007;60(1):34–42.
* + = positive rating, ? = indeterminate rating, − = negative rating.

References

1. Pham T, van der Heijde D, Altman RD, Anderson JJ, Bellamy N, Hochberg M, et al. OMERACT-OARSI initiative: Osteoarthritis Research Society International set of responder criteria for osteoarthritis clinical trials revisited. Osteoarthritis Cartilage 2004;12:389–99.
2. Bellamy N, Kirwan J, Boers M, Brooks P, Strand V, Tugwell P, et al. Recommendations for a core set of outcome measures for future phase III clinical trials in knee, hip, and hand osteoarthritis. Consensus development at OMERACT III. J Rheumatol 1997;24:799–802.
3. Terwee CB, Mokkink LB, Steultjens MP, Dekker J. Performance-based methods for measuring the physical function of patients with osteoarthritis of the hip or knee: a systematic review of measurement properties. Rheumatology (Oxford) 2006;45:890–902.
4. World Health Organization. International Classification of Functioning, Disability, and Health. Geneva, Switzerland: ICF; 2001.
5. Wright AA, Hegedus EJ, David Baxter G, Abbott JH. Measurement of function in hip osteoarthritis: developing a standardized approach for physical performance measures. Physiother Theor Pract 2011;27:253–62.
6. Mizner RL, Petterson SC, Clements KE, Zeni Jr JA, Irrgang JJ, Snyder-Mackler L. Measuring functional improvement after total knee arthroplasty requires both performance-based and patient-report assessments. A longitudinal analysis of outcomes. J Arthroplasty 2011;26:728–37.

7. Stratford PW, Kennedy DM, Riddle DL. New study design evaluated the validity of measures to assess change after hip or knee arthroplasty. J Clin Epidemiol 2009;62:347–52.
8. Stratford PW, Kennedy DM, Woodhouse LJ. Performance measures provide assessments of pain and function in people with advanced osteoarthritis of the hip or knee. Phys Ther 2006;86:1489–96.
9. Jordan KP, Wilkie R, Muller S, Myers H, Nicholls E. Measurement of change in function and disability in osteoarthritis: current approaches and future challenges. Curr Opin Rheumatol 2009;21:525–30.
10. Stratford PW, Kennedy DM. Performance measures were necessary to obtain a complete picture of osteoarthritic patients. J Clin Epidemiol 2006;59:160–7.
11. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res 2010;19:539–49.
12. Mokkink LB, Terwee CB, Stratford PW, Alonso J, Patrick DL, Riphagen I, et al. Evaluation of the methodological quality of systematic reviews of health status measurement instruments. Qual Life Res 2009;18:313–33.
13. Terwee C, Mokkink L, Knol D, Ostelo R, Bouter L, de Vet H. Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist. Qual Life Res 2012;21:651–7.
14. Moher D, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med 2009;151:264–9.
15. de Vet HCW, Terwee CB, Mokkink LB, Knol DL. Measurement in Medicine: A Practical Guide to Biostatistics and Epidemiology. London: Cambridge University Press; 2011.
16. Terwee CB, Jansma EP, Riphagen II, de Vet HC. Development of a methodological PubMed search filter for finding studies on measurement properties of measurement instruments. Qual Life Res 2009;18:1115–23.
17. Mokkink LB, Terwee CB, Knol DL, Stratford PW, Alonso J, Patrick DL, et al. The COSMIN checklist for evaluating the methodological quality of studies on measurement properties: a clarification of its content. BMC Med Res Methodol 2010;10:22.
18. Terwee CB, Bot SDM, de Boer MR, van der Windt DAWM, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 2007;60:34–42.
19. Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008;336:924–6.
20. Borjesson M, Weidenhielm L, Elfving B, Olsson E. Tests of walking ability at different speeds in patients with knee osteoarthritis. Physiother Res Int 2007;12:115–21.
21. Davey RC, Edwards SM, Cochrane T. Test–retest reliability of lower extremity functional and self-reported measures in elderly with osteoarthritis. Adv Physiother 2003;5:155–60.
22. French HP, Fitzpatrick M, FitzGerald O. Responsiveness of physical function outcomes following physiotherapy intervention for osteoarthritis of the knee: an outcome comparison study. Physiotherapy 2011;97:302–8.
23. Gill S, McBurney H. Reliability of performance-based measures in people awaiting joint replacement surgery of the hip or knee. Physiother Res Int 2008;13:141–52.
24. Hoeksma HL, Van Den Ende CHM, Ronday HK, Heering A, Breedveld FC, Dekker J. Comparison of the responsiveness of the Harris Hip Score with generic measures for hip function in osteoarthritis of the hip. Ann Rheum Dis 2003;62:935–8.
25. Marks R. Walking time measures for evaluating OA of the knee. S Afr J Physiother 1994;50:5+7–8.
26. Parent E, Moffet H. Comparative responsiveness of locomotor tests and questionnaires used to follow early recovery after total knee arthroplasty. Arch Phys Med Rehabil 2002;83:70–80.
27. Piva SR, Fitzgerald GK, Irrgang JJ, Bouzubar F, Starz TW. Get up and go test in patients with knee osteoarthritis. Arch Phys Med Rehabil 2004;85:284–9.
28. Kennedy DM, Stratford PW, Wessel J, Gollish JD, Penney D. Assessing stability and change of four performance measures: a longitudinal study evaluating outcome following total hip and knee arthroplasty. BMC Musculoskelet Disord 2005;6:3.
29. Marks R. Reliability and validity of self-paced walking time measures for knee osteoarthritis. Arthritis Care Res 1994;7:50–3.
30. Wright AA, Cook CE, Baxter GD, Dockerty JD, Abbott JH. A comparison of 3 methodological approaches to defining major clinically important improvement of 4 performance measures in patients with hip osteoarthritis. J Orthop Sports Phys Ther 2011;41:319–27.
31. Lin YC, Davey RC, Cochrane T. Tests for physical function of the elderly with knee and hip osteoarthritis. Scand J Med Sci Sports 2001;11:280–6.
32. Nilsdotter A, Roos EM, Westerlund JP, Roos HP, Lohmander LS. Comparative responsiveness of measures of pain and function after total hip replacement. Arthritis Care Res 2001;45:258–62.
33. Oberg U, Oberg B, Oberg T. Validity and reliability of a new assessment of lower-extremity dysfunction. Phys Ther 1994;74:861–71.
34. Oberg U, Oberg T. Discriminatory power, sensitivity and specificity of a new assessment system (FAS). Physiother Can 1997;49:40–7.
35. Rejeski WJ, Ettinger Jr WH, Schumaker S, James P, Burns R, Elam JT. Assessing performance-related disability in patients with knee osteoarthritis. Osteoarthritis Cartilage 1995;3:157–67.
36. McCarthy CJ, Oldham JA. The reliability, validity and responsiveness of an aggregated locomotor function (ALF) score in patients with osteoarthritis of the knee. Rheumatology (Oxford) 2004;43:514–7.
37. Steultjens MP, Dekker J, van Baar ME, Oostendorp RA, Bijlsma JW. Internal consistency and validity of an observational method for assessing disability in mobility in patients with osteoarthritis. Arthritis Care Res 1999;12:19–25.
38. Steultjens MP, Dekker J, van Baar ME, Oostendorp RA, Bijlsma JW. Range of joint motion and disability in patients with osteoarthritis of the knee or hip. Rheumatology (Oxford) 2000;39:955–61.
39. Steultjens MP, Roorda LD, Dekker J, Bijlsma JW. Responsiveness of observational and self-report methods for assessing disability in mobility in patients with osteoarthritis. Arthritis Rheum 2001;45:56–61.
40. Almeida GJ, Schroeder CA, Gil AB, Fitzgerald GK, Piva SR. Interrater reliability and validity of the stair ascend/descend test in subjects with total knee arthroplasty. Arch Phys Med Rehabil 2010;91:932–8.
41. Bremander AB, Dahl LL, Roos EM. Validity and reliability of functional performance tests in meniscectomized patients with or without knee osteoarthritis. Scand J Med Sci Sports 2007;17:120–7.

42. Cecchi F, Molino-Lova R, Di Iorio A, Conti AA, Mannoni A, Lauretani F, et al. Measures of physical performance capture the excess disability associated with hip pain or knee pain in older persons. J Gerontol A Biol Sci Med Sci 2009;64:1316–24.
43. Crosbie J, Naylor JM, Harmer AR. Six minute walk distance or stair negotiation? Choice of activity assessment following total knee replacement. Physiother Res Int 2010;15:35–41.
44. Kwoh CK, Petrick MA, Munin MC. Inter-rater reliability for function and strength measurements in the acute care hospital after elective hip and knee arthroplasty. Arthritis Care Res 1997;10:128–34.
45. Jakobsen TL, Kehlet H, Bandholm T. Reliability of the 6-min walk test after total knee arthroplasty. Knee Surg Sports Traumatol Arthrosc, in press.
46. Stevens-Lapsley JE, Schenkman ML, Dayton MR. Comparison of self-reported knee injury and osteoarthritis outcome score to performance measures in patients after total knee arthroplasty. PM R 2011;3:541–9.
47. Riddle DL, Stratford PW, Bowman DH. Findings of extensive variation in the types of outcome measures used in hip and knee replacement clinical trials: a systematic review. Arthritis Rheum 2008;59:876–83.
