GRADE Handbook
1. Overview of the GRADE Approach
1.1 Purpose and advantages of the GRADE approach
1.2 Separation of confidence in effect estimates from strength of recommendations
1.3 Special challenges in applying the GRADE approach
1.4 Modifications to the GRADE approach
2. Framing the health care question
2.1 Defining the patient population and intervention
2.2 Dealing with multiple comparators
2.3 Other considerations
2.4 Format of health care questions using the GRADE approach
3. Selecting and rating the importance of outcomes
3.1 Steps for considering the relative importance of outcomes
3.2 Influence of perspective
3.3 Using evidence in rating the importance of outcomes
3.4 Surrogate (substitute) outcomes
4. Summarizing the evidence
4.1 Evidence Tables
4.2 GRADE Evidence Profile
4.3 Summary of Findings table
5. Quality of evidence
5.1 Factors determining the quality of evidence
5.1.1 Study design
5.2 Factors that can reduce the quality of the evidence
5.2.1 Study limitations (Risk of Bias)
5.2.2 Inconsistency of results
5.2.2.1 Deciding whether to use estimates from a subgroup analysis
5.2.3 Indirectness of evidence
5.2.4 Imprecision
5.2.4.1 Imprecision in guidelines
5.2.4.2 Imprecision in systematic reviews
5.2.4.3 Rating down two levels for imprecision
5.2.5 Publication bias
5.3. Factors that can increase the quality of the evidence
5.3.1 Large magnitude of an effect
5.3.2. Dose-response gradient
5.3.3. Effect of plausible residual confounding
5.4 Overall quality of evidence
6. Going from evidence to recommendations
6.1 Recommendations and their strength
6.1.1 Strong recommendation
6.1.2 Weak recommendation
6.1.3 Recommendations to use interventions only in research
6.1.4 No recommendation
6.2 Factors determining direction and strength of recommendations
6.2.1 Balance of desirable and undesirable consequences
6.2.1.1 Estimates of the magnitude of the desirable and undesirable effects
6.2.1.2 Best estimates of values and preferences
6.3.2 Confidence in best estimates of magnitude of effects (quality of evidence)
6.3.3 Confidence in values and preferences
6.3.4 Resource use (cost)
6.3.4.1 Differences between costs and other outcomes
6.3.4.2 Perspective
6.3.4.3 Resource implications considered
6.3.4.4 Confidence in the estimates of resource use (quality of the evidence about cost)
6.3.4.5 Presentation of resource use
6.3.4.6 Economic model
6.3.4.7 Consideration of resource use in recommendations
6.4 Presentation of recommendations
6.4.1 Wording of recommendations
6.3.2 Symbolic representation
6.4.3 Providing transparent statements about assumed values and preferences
6.5 The Evidence-to-Decision framework
7. The GRADE approach for diagnostic tests and strategies
7.1. Questions about diagnostic tests
7.1.1. Establishing the purpose of a test
7.1.2. Establishing the role of a test
7.1.3. Clear clinical questions
7.2. Gold standard and reference test

Example 1: Trials with positive findings (i.e. statistically significant differences) are more likely to be published than trials with negative or null findings
A systematic review assessed the extent to which publication of a cohort of clinical trials is influenced by the statistical significance, perceived importance, or direction of their results. It found five studies that investigated these associations in a cohort of registered clinical trials. Trials with positive findings were more likely to be published than trials with negative or null findings (odds ratio: 3.9; 95% CI: 2.7 to 5.7). This corresponds to a risk ratio of 1.8 (95% CI: 1.6 to 2.0), assuming that 41% of negative trials are published (the median among the included studies, range = 11% to 85%). In absolute terms, this means that if 41% of negative trials are published, we would expect that 73% of positive trials would be published. Two studies assessed time to publication and showed that trials with positive findings tended to be published after 4 to 5 years compared with those with negative findings, which were published after 6 to 8 years. Three studies found no statistically significant association between sample size and publication. One study found no statistically significant association between either funding mechanism, investigator rank, or sex and publication.

Systematic reviews performed early in the development of a body of research may be biased due to the tendency for positive results to be published sooner and for negative results to be published later or withheld. This is referred to as “lag bias” and is especially true of industry-funded studies.

Example 3: Reduced effect estimate in a systematic review as a result of negative studies not being published
An investigation examined 74 antidepressant trials, with a mean sample size of fewer than 200 patients, that had been submitted to the FDA. Of the 38 studies viewed as positive by the FDA, 37 were published. Of the 36 studies viewed as negative by the FDA, only 14 were published. Publication bias of this magnitude can seriously bias effect estimates.

Example 5: Funnel plots to detect publication bias
In A, the circles represent the point estimates of the trials. The pattern of distribution resembles an inverted funnel. Larger studies tend to be closer to the pooled estimate (the dashed line). In this case, the effect sizes of the smaller studies are more or less symmetrically distributed around the pooled estimate.
In B, publication bias is detected. This funnel plot shows that the smaller studies are not symmetrically distributed around either the point estimate (dominated by the larger trials) or the results of the larger trials themselves. The trials expected in the bottom right quadrant are missing. One possible explanation for this set of results is publication bias, that is, an overestimate of the treatment effect relative to the underlying truth.

Example 6: Publication bias detected
A number of small trials from a systematic review of oxygen therapy in patients with chronic obstructive pulmonary disease showed that the intervention improved exercise capacity, but evaluation of the data suggested publication bias.

The funnel plot of exercise distance shows distance on the x-axis and variance on the y-axis. The red dots represent the mean differences of individual trial estimates and the dotted line the point estimate of the mean effect, indicating benefit from oxygen treatment. The distribution of these dots to the right of the dotted line suggests that there may be an equivalent number of 'negative' trials that have not been included in this analysis. Thus, one may downgrade the quality of evidence in this case due to uncertainty resulting from asymmetry in the pattern of results.
Example 8: Publication bias undetected
A systematic review of parenteral anticoagulation for prolonged survival in patients with cancer who had no other indication for anticoagulation shows five RCTs that are symmetrically distributed around the best estimate of effect. Publication bias is undetected in this scenario and thus the evidence should not be downgraded.
When to downgrade the quality of evidence because of suspicion of publication bias
Guideline panels and authors of systematic reviews should consider the extent to which they are uncertain about the magnitude of the effect due to selective publication of studies, and they may downgrade the quality of evidence by one level. Consider:
● study design (experimental vs. observational)
● study size (small studies vs. large studies)
● lag bias (early publication of positive results)
● search strategy (was it comprehensive?)
● asymmetry in funnel plot.

5.3. Factors that can increase the quality of the evidence
Note: Consideration of factors reducing quality of evidence must precede consideration of reasons for rating it up. Thus, the 5 factors for rating down quality of evidence (risk of bias, imprecision, inconsistency, indirectness, and publication bias) must be rated prior to the 3 factors for rating it up (large effect, dose-response, and effects of residual confounding). The decision to rate up quality of evidence should only be made when serious limitations in any of the 5 areas reducing the quality of evidence are absent.

The following sections discuss in detail the 3 factors that permit rating up the quality of evidence, i.e. increase confidence in an estimate of an effect. Using the GRADE framework, a body of evidence from observational studies is initially classified as low quality evidence (i.e. permitting low confidence in the estimated effect). There are times, however, when we have high confidence in the estimate of effect from observational studies (including cohort, case-control, before-after, time series studies, etc.) and from non-randomized experimental studies (e.g. quasi-randomized and non-randomized controlled trials). The circumstances under which the body of evidence from observational studies may provide higher than low confidence in the estimated effects will likely occur infrequently.

Note: Although it is theoretically possible to rate up results from randomized controlled trials, we have yet to find a compelling example of such an instance.
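
The ordering rule in the notes above (rate down first, rate up only when no serious limitation is present) can be illustrated with a minimal sketch. The function name, the dictionary inputs, and the four-level scale encoded as a list are illustrative assumptions, not part of the handbook. The starting level of "high" for randomized trials is assumed from the general GRADE approach rather than stated in this section.

```python
# Hypothetical helper: names and data shapes are assumptions for illustration.
LEVELS = ["very low", "low", "moderate", "high"]


def rate_quality(design, downgrades, upgrades):
    """Return a quality level for one outcome.

    design     -- "randomized" or "observational"
    downgrades -- dict mapping each rating-down factor to levels removed (0, 1 or 2)
    upgrades   -- dict mapping each rating-up factor to levels added (0, 1 or 2)
    """
    # Observational evidence starts at "low"; the "high" start for
    # randomized trials is an assumption taken from the wider GRADE approach.
    start = LEVELS.index("high") if design == "randomized" else LEVELS.index("low")

    total_down = sum(downgrades.values())

    # Rating up is considered only when none of the five rating-down
    # factors shows a serious limitation.
    total_up = sum(upgrades.values()) if total_down == 0 else 0

    index = max(0, min(len(LEVELS) - 1, start - total_down + total_up))
    return LEVELS[index]


if __name__ == "__main__":
    print(rate_quality("observational", {}, {"large effect": 1}))
    print(rate_quality("observational", {"risk of bias": 1}, {"large effect": 1}))
```

The sketch treats the rating as simple arithmetic only to make the ordering visible; the handbook cautions against applying the criteria mechanistically.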
5.3.1 Large magnitude of an effect

When a body of evidence from observational studies not downgraded for any of the 5 factors yields large or very large estimates of the magnitude of an intervention effect, then we may be more confident about the results. In those situations, even though observational studies are likely to provide an overestimate of the true effect, the study design that is more prone to bias is unlikely to explain all of the apparent benefit (or harm). Decisions to rate up quality of evidence because of large or very large effects (Table 5.9) should consider not only the point estimate but also the precision (width of the CI) around that effect: one should rarely and very cautiously rate up quality of evidence because of apparent large effects if the CI overlaps substantially with effects smaller than the chosen threshold of clinical importance.
Table 5.9. Definitions of large and very large effect

Magnitude of Effect   Definition                                            Quality of Evidence
Large                 RR* >2 or <0.5 (based on direct evidence, with no     may increase 1 level
                      plausible confounders)
Very large            RR* >5 or <0.2 (based on direct evidence with no      may increase 2 levels
                      serious problems with risk of bias or precision,
                      i.e. with sufficiently narrow confidence intervals)
* Note: these rules apply when the effect measure is expressed as relative risk (RR) or hazard ratio (HR). They cannot always be applied when the effect measure is expressed as odds ratio (OR). We suggest converting OR to RR and only then assessing the magnitude of an effect.
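
The conversion suggested in the note can be sketched as follows. It uses the standard relationship RR = OR / (1 − p0 + p0 × OR), where p0 is the risk of the outcome in the comparison group. The function names and the "not large" label are illustrative assumptions. The thresholds are those of Table 5.9, and the sketch checks only the point estimate, not the additional conditions (direct evidence, no plausible confounders, sufficiently narrow confidence intervals).

```python
# Hypothetical helpers: function names are assumptions for illustration.

def odds_ratio_to_risk_ratio(odds_ratio, baseline_risk):
    """Convert an odds ratio to a risk ratio.

    baseline_risk is the risk of the outcome in the comparison group (p0),
    expressed as a proportion between 0 and 1.
    """
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)


def magnitude_of_effect(risk_ratio):
    """Classify a risk ratio against the thresholds in Table 5.9."""
    if risk_ratio > 5 or risk_ratio < 0.2:
        return "very large"
    if risk_ratio > 2 or risk_ratio < 0.5:
        return "large"
    return "not large"  # label chosen for this sketch only


if __name__ == "__main__":
    # Figures from the publication-bias example: OR 3.9 with 41% of
    # negative trials published gives a risk ratio of about 1.8.
    rr = odds_ratio_to_risk_ratio(3.9, 0.41)
    print(round(rr, 2), magnitude_of_effect(rr))
```

With a common outcome, as in this example, an odds ratio of 3.9 corresponds to a risk ratio below the threshold of 2, which is why the conversion should come before the assessment. When the outcome is rare, the odds ratio and risk ratio are nearly equal.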
One may be more likely to rate up the quality of evidence because of large or very large magnitude of an effect, when:
● effect is rapid
● effect is consistent across subjects
● previous trajectory of disease is reversed
● large magnitude of an effect is supported by indirect evidence

Note: When outcomes are subjective it is important to be cautious when considering upgrading because of observed large effects. This is especially true when outcome assessors were aware which group study subjects belonged to (i.e. were not blinded).
Examples
A systematic review of observational studies examining the relationship between infant sleeping position and sudden infant death syndrome (SIDS) found an odds ratio of 4.1 (95% CI: 3.1 to 5.5) of SIDS occurring with front vs. back sleeping positions. Furthermore, “back to sleep” campaigns that were started in the 1980s to encourage back sleeping position were associated with a relative decline in the incidence of SIDS of 50 to 70% in numerous countries.
5.3.2. Dose-response gradient

The presence of a dose-response gradient has long been recognized as an important criterion for believing a putative cause-effect relationship. The presence of a dose-response gradient may increase our confidence in the findings of observational studies and thereby increase the quality of evidence.
Example 1: Dose-response gradient (Upgraded by One Level)

In patients receiving anticoagulation with warfarin, there is a dose-response gradient between higher levels of the international normalized ratio (INR), an indicator of the degree of anticoagulation, and an increased risk of bleeding. This observation increases our confidence that supratherapeutic anticoagulation levels increase bleeding risk.
Example 2: Dose-response gradient (Upgraded by One Level)

The dose-response gradient associated with the rapidity of antibiotic administration in patients presenting with sepsis and hypotension may also be a reason to upgrade the quality of evidence for such a study. There is a large absolute increase in mortality with each hour’s delay of antibiotic administration. This dose-response relationship increases our confidence that the effect on mortality is real and substantial, leading to upgrading the quality of the evidence.
5.3.3. Effect of plausible residual confounding

On occasion, all plausible residual confounding from observational studies may be working to reduce the demonstrated effect or increase the effect, if no effect was observed.

Rigorous observational studies will accurately measure prognostic factors associated with the outcome of interest and will conduct an adjusted analysis that accounts for differences in the distribution of these factors between intervention and control groups. The reason that in most instances we consider observational studies as providing only low-quality evidence is that unmeasured or unknown determinants of outcome unaccounted for in the adjusted analysis are likely to be distributed unequally between intervention and control groups, referred to as “residual confounding” or “residual biases.”

On occasion, all plausible confounders (biases) from observational studies unaccounted for in the adjusted analysis (i.e. residual confounders) of a rigorous observational study would result in an underestimate of an apparent treatment effect. If, for instance, only sicker patients receive an experimental intervention or exposure, yet they still fare better, it is likely that the actual intervention or exposure effect is even larger than the data suggest. A parallel situation exists when observational studies have failed to demonstrate an association.
Example 1: When confounding is expected to reduce a demonstrated effect (Upgraded by One Level)

A rigorous systematic review of observational studies including a total of 38 million patients demonstrated higher death rates in private for-profit versus private not-for-profit hospitals. It is likely, however, that patients in the not-for-profit hospitals were sicker than those in the for-profit hospitals. This would bias results against the not-for-profit hospitals. The second likely bias was the possibility that higher numbers of patients with excellent private insurance coverage could lead to a hospital having more resources and a spillover effect that would benefit those without such coverage. Since for-profit hospitals are likely to admit a larger proportion of such well-insured patients than not-for-profit hospitals, the bias is once again against the not-for-profit hospitals. Because the plausible biases would all diminish the demonstrated intervention effect, one might consider the evidence from these observational studies as moderate rather than low quality.
Example 2: When confounding is expected to reduce a demonstrated effect (Upgraded by One Level)

In a systematic review investigating the use of condoms in homosexual male relationships as a way of preventing the spread of HIV, five observational studies were identified. The pooled estimate was a relative risk of 0.34 (95% CI: 0.21 to 0.54) in favour of condom use. The authors failed to adjust in the analysis for the fact that condom users are more likely to have more partners than non-condom users. One would expect that more partners would have increased the risk of acquiring HIV among condom users and therefore weakened the apparent protective effect. Because the observed effect is still large despite this bias working against it, confidence in the effect increases, which would lead to upgrading by one level.
Example 3: When confounding is expected to increase the effect but no effect was observed (Upgraded by One Level)

The hypoglycaemic drug phenformin causes lactic acidosis, and the related agent metformin is under suspicion for the same toxicity. Very large observational studies have failed to demonstrate an association between metformin and lactic acidosis. Given the likelihood that clinicians would have been more alert to lactic acidosis with metformin and would have therefore overreported its occurrence, and that no association was found, one could upgrade this evidence.
Example 4: When confounding is expected to increase the effect but no effect was observed (Upgraded by One Level)

Consider the early reports associating MMR vaccination with autism. One would expect overreporting of autism in children given MMR vaccines. However, systematic reviews failed to demonstrate any association between the two. Despite the potential presence of confounders that would increase the likelihood of reporting autism, no association was found. Therefore, we may upgrade the quality of evidence by one level.
5.4 Overall quality of evidence

The overall quality of evidence is a combined rating of the quality of evidence across all outcomes considered critical for answering a health care question (i.e. making a decision or a recommendation). We caution against a mechanistic approach toward the application of the criteria for rating the quality of the evidence up or down. Although GRADE suggests the initial separate consideration of five categories of reasons for rating down the quality of evidence, and three categories for rating it up, with a yes/no decision regarding rating up or down in each case, the final rating of overall evidence quality occurs in a continuum of confidence in the estimates of effects.

For authors of systematic reviews:
Authors of systematic reviews do not grade the overall quality of evidence across outcomes. Because systematic reviews do not, or at least should not, make recommendations, authors of systematic reviews rate the quality of evidence only for each outcome separately.

For guideline panels and others making recommendations:
Guideline panels have to determine the overall quality of evidence across all the critical outcomes essential to a recommendation they make. Guideline panels provide a single grade of quality of evidence for every recommendation, but the strength of a recommendation usually depends on evidence regarding not just one, but a number of patient-important outcomes and on the quality of evidence for each of these outcomes.
Because the GRADE approach rates quality of evidence separately for each outcome, it is frequently the case that quality differs across outcomes. When determining the overall quality of evidence across outcomes:

1. Consider only those outcomes that have been deemed critical.
2. If the quality of evidence is the same for all critical outcomes, then this becomes the overall quality of the evidence supporting the answer to the question.
3. If the quality of evidence differs across critical outcomes, it is logical that the overall confidence in effect estimates cannot be higher than the lowest confidence in effect estimates for any outcome that is critical for a decision. Therefore, the lowest quality of evidence for any of the critical outcomes determines the overall quality of evidence.
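
The three steps above amount to taking the lowest rating among the critical outcomes. A minimal sketch follows; the function name and the input shape (a dictionary of outcome name to a quality level and a critical flag) are assumptions made for illustration, and the ratings in the usage example are taken from the selective digestive decontamination example in this section.

```python
# Hypothetical helper: names and data shapes are assumptions for illustration.
LEVELS = ["very low", "low", "moderate", "high"]


def overall_quality(outcomes):
    """Return the overall quality of evidence across outcomes.

    outcomes maps an outcome name to a (quality, is_critical) pair,
    where quality is one of LEVELS.
    """
    # Step 1: consider only outcomes deemed critical.
    critical = [quality for quality, is_critical in outcomes.values() if is_critical]
    if not critical:
        raise ValueError("at least one critical outcome is required")

    # Steps 2 and 3: the lowest rating among critical outcomes decides.
    # If all ratings are equal, that shared rating is returned.
    return min(critical, key=LEVELS.index)


if __name__ == "__main__":
    sdd = {
        "infections": ("high", True),
        "mortality": ("high", True),
        "antibiotic resistance": ("low", True),
    }
    print(overall_quality(sdd))

    # The same evidence when resistance is judged important but not critical.
    sdd["antibiotic resistance"] = ("low", False)
    print(overall_quality(sdd))
```

The result depends on the judgment about which outcomes are critical, and that judgment may itself change once the evidence is summarized.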
Example 1: Rating overall quality of evidence based on the importance of outcomes

Several systematic reviews of high-quality randomised trials suggest a decrease in the incidence of infections and, likely, the mortality of ventilated patients in intensive care units receiving selective digestive decontamination (SDD). The quality of evidence on the effect of SDD on the emergence of bacterial antibiotic resistance and its clinical relevance is much less clear. One might reasonably grade the evidence about this feared potential adverse effect as low quality. If those making a recommendation felt that these downsides of therapy were critical, the overall grade of the quality of evidence for SDD would be low. If the guideline panel felt that the emergence of bacterial antibiotic resistance was important but not critical, the grade for the overall quality of evidence would be high.

However, which outcomes are critical may depend on the evidence. On occasion, the overall confidence in effect estimates may not come from the outcomes judged critical at the beginning of the guideline development process; judgments about which outcomes are critical to the decision (recommendation) may change when considering the results. Note that such judgments require careful consideration and are probably rare.

There are 2 prototypical situations in which an outcome initially considered critical may cease to be critical once the evidence is summarized:

1. An outcome turns out to be not relevant (e.g. a particular adverse event may be considered critical at the outset of the guideline process but, if it turns out that the event occurs very infrequently, the final decision may be that this adverse effect is important but not critical to the recommendation).
2. An outcome turns out to be not necessary if, across the range of possible effects of the intervention on that outcome, the recommendation and its strength would remain unchanged. If there is higher quality of evidence for some critical outcomes to support a decision, then one need not rate down quality of evidence because of lower confidence in estimates of effects on other critical outcomes that support the same recommendation.

For instance, consider the following question: should statins vs. no statins be used in individuals without documented coronary heart disease but at high risk of cardiovascular events? Guideline developers are likely to start the process by considering four outcomes (death from cardiovascular causes, myocardial infarction, stroke, and adverse effects) as critical to the decision.

A systematic review of randomized trials demonstrated consistent reductions in myocardial infarctions and stroke but nonsignificant reductions in coronary deaths. Serious adverse effects were unusual and readily reversible with drug discontinuation. The guideline authors found that for three of the four outcomes (myocardial infarction, stroke, and adverse effects) there was high quality evidence. For coronary deaths, evidence was of moderate quality because of imprecision.

Should the overall quality of evidence across outcomes be high or moderate? The judgments made at the beginning of the process suggest that the answer is "moderate". However, once it is established that the risk of myocardial infarction and stroke decreases with statins, most people would find a compelling reason to use statins. Knowing whether coronary mortality also decreases is no longer necessary for the decision (as long as it is very unlikely that it increases). Considering this, the overall rating of quality of evidence is most appropriately designated as "high".

6. Going from evidence to recommendations

6.1 Recommendations and their strength

The strength of a recommendation reflects the extent to which a guideline panel is confident that desirable effects of an intervention outweigh undesirable effects, or vice versa, across the range of patients for whom the recommendation is intended.
GRADE specifies two categories of the strength of a recommendation. While GRADE suggests using the terms strong and weak recommendations, those making recommendations may choose different wording to characterize the two categories of strength.

In special cases, guideline panels may recommend that an intervention be used only in research until more data are generated, which would allow for a more comprehensive recommendation, or may not make a recommendation at all.

There are limitations to formal grading of recommendations. Like the quality of evidence, the balance between desirable and undesirable effects reflects a continuum. Some arbitrariness will therefore be associated with placing particular recommendations in categories such as “strong” and “weak.” Most organisations producing guidelines have decided that the merits of an explicit grade of recommendation outweigh the disadvantages.

For a guideline panel or others making recommendations to offer a strong recommendation, they have to be certain about the various factors that influence the strength of a recommendation. The panel also should have the relevant information at hand that supports a clear balance towards either the desirable effects of an intervention (to recommend an action) or undesirable effects (to recommend against an action).
When a guideline panel is uncertain whether the balance is clear, or when the relevant information about the various factors that influence the strength of a recommendation is not available, the panel should be more cautious and in most instances would opt to make a weak recommendation.

Figure 3: Balance scales to depict strong vs. weak recommendations.

To aid interpretation, GRADE suggests implications that follow from strong and weak recommendations (Table 6.1). The advantage of two categories of strength of recommendations is that they provide clear direction to patients, clinicians, and policymakers.

Table 6.1. Implications of strong and weak recommendations for different users of guidelines

For patients
  Strong recommendation: Most individuals in this situation would want the recommended course of action and only a small proportion would not.
  Weak recommendation: The majority of individuals in this situation would want the suggested course of action, but many would not.

For clinicians
  Strong recommendation: Most individuals should receive the recommended course of action. Adherence to this recommendation according to the guideline could be used as a quality criterion or performance indicator. Formal decision aids are not likely to be needed to help individuals make decisions consistent with their values and preferences.
  Weak recommendation: Recognize that different choices will be appropriate for different patients, and that you must help each patient arrive at a management decision consistent with her or his values and preferences. Decision aids may well be useful in helping individuals make decisions consistent with their values and preferences. Clinicians should expect to spend more time with patients when working towards a decision.

For policy makers
  Strong recommendation: The recommendation can be adapted as policy in most situations, including for use as performance indicators.
  Weak recommendation: Policy making will require substantial debates and involvement of many stakeholders. Policies are also more likely to vary between regions. Performance indicators would have to focus on the fact that adequate deliberation about the management options has taken place.

Individualization of clinical decision-making in weak recommendations remains a challenge. Although clinicians always should consider patients’ preferences and values, when they face weak recommendations they may have more detailed conversations with patients than for strong recommendations to ensure that the ultimate decision is consistent with the patient’s preferences and values.

Important Note:
Clinicians, patients, third-party payers, institutional review committees, other stakeholders, or the courts should never view recommendations as dictates. Even strong recommendations based on high-quality evidence will not apply to all circumstances and all patients.
Users of guidelines may reasonably conclude that following some strong recommendations based on high-quality evidence will be a mistake for some patients. No clinical practice guideline or recommendation can take into account all of the often compelling unique features of individual patients and clinical circumstances. Thus, nobody charged with evaluating clinicians’ actions should attempt to apply recommendations by rote or in a blanket fashion.

6.1.1 Strong recommendation

A strong recommendation is one for which the guideline panel is confident that the desirable effects of an intervention outweigh its undesirable effects (strong recommendation for an intervention) or that the undesirable effects of an intervention outweigh its desirable effects (strong recommendation against an intervention).

Note: Strong recommendations are not necessarily high priority recommendations.
A strong recommendation implies that most or all individuals will be best served by the recommended course of action.

Example 1: Sample strong recommendations
● Early anticoagulation in patients with deep venous thrombosis for the prevention of pulmonary embolism;
● Antibiotics for the treatment of community acquired pneumonia;
● Quitting smoking to prevent adverse consequences of tobacco smoke exposure;
● Use of bronchodilators in patients with known COPD.

6.1.2 Weak recommendation

A weak recommendation is one for which the desirable effects probably outweigh the undesirable effects (weak recommendation for an intervention) or the undesirable effects probably outweigh the desirable effects (weak recommendation against an intervention), but appreciable uncertainty exists.
A weak recommendation implies that not all individuals will be best served by the recommended course of action. There is a need to consider more carefully than usual the individual patient’s circumstances, preferences, and values. When there are weak recommendations, caregivers need to allocate more time to shared decision making, making sure that they clearly and comprehensively explain the potential benefits and harms to a patient.

Alternative names for weak recommendations
Some have been concerned that the term “weak recommendation” carries an unintended negative connotation, and that “weak” is often confused with “weak” evidence. To avoid confusion, weak recommendations can instead be described using the terms:
● conditional (depending on patient values, resources available or setting)
● discretionary (based on opinion of patient or practitioner)
● qualified (by an explanation regarding the issues which would lead to different decisions).

If any variations are used, it is essential that authors exercise consistency across all recommendations in a guideline and across all guidelines they produce.

6.1.3 Recommendations to use interventions only in research

Promising interventions (usually new ones) with thus far insufficient evidence of benefit to support their use may be associated with appreciable harms or costs. Decision makers may worry about providing premature favorable recommendations for their use, encouraging the rapid diffusion of potentially ineffective or harmful interventions, and preventing recruitment to research already under way. They may be equally reluctant to recommend against such interventions out of fear that they will inhibit further investigation. By making recommendations for use of an intervention only in the context of research, they may provide an important stimulus to efforts to answer important research questions, thus resolving uncertainty about optimal management.
Recommendations for using interventions only in research are appropriate when three conditions are met:

1. There is thus far insufficient evidence to support a decision for or against an intervention.
2. Further research has large potential for reducing uncertainty about the effects of the intervention.
3. Further research is thought to be of good value for the anticipated costs.

Recommendations for using interventions only in research should be accompanied by detailed suggestions about the specific research questions that should be addressed, particularly which patient-important outcomes they should measure. The recommendation for research may be accompanied by an explicit strong recommendation not to use the experimental intervention outside of the research context.

6.1.4 No recommendation

There are three reasons for which those making recommendations may be reluctant to make a recommendation for or against a particular management strategy, and also conclude that a recommendation to use the intervention only in research is not appropriate.

1. The confidence in effect estimates is so low that the panel feels a recommendation is too speculative (see the US Preventive Services Task Force discussion on the topic [Petitti 2009; PMID: 19189910]).
2. Irrespective of the confidence in effect estimates, the tradeoffs are so closely balanced, and the values and preferences and resource implications not known or too variable, that the panel has great difficulty deciding on the direction of a recommendation.
3. Two management options have very different undesirable consequences, and individual patients’ reactions to these consequences are likely to be so different that it makes little sense to think about typical values and preferences.

The third reason requires an explanation. Consider adult patients with thalassemia major considering hematopoietic cell transplantation (possibility of cure but an early mortality risk of 33%) vs. continued medical treatment with transfusion and iron chelation (continued morbidity and an uncertain prognosis). A guideline panel may consider that in such situations the only sensible recommendation is a discussion between patient and physician to ascertain the patient’s preferences.
Users of guidelines, however, may be frustrated with the lack of guidance when the guideline panel fails to make a recommendation. The USPSTF states: "Decision makers do not have the luxury of waiting for certain evidence. Even though evidence is insufficient, the clinician must still provide advice, patients must make choices, and policy makers must establish policies" [Petitti 2009; PMID: 19189910].
Clinicians themselves will rarely explore the evidence as thoroughly as a guideline panel, nor will they devote as much thought to the tradeoffs, or the possible underlying values and preferences in the population. GRADE encourages panels to deal with their discomfort and to make recommendations even when confidence in effect estimates is low and/or desirable and undesirable consequences are closely balanced. Such recommendations will inevitably be weak, and may be accompanied by qualifications.
In the unusual circumstances in which panels may choose not to make a recommendation, they should specify the reason for this decision (see above).

6.2 Factors determining direction and strength of recommendations

Four key factors influence the direction and the strength of a recommendation (Table 6.2).

Table 6.2. Domains that contribute to the strength of a recommendation

Domain: Balance between desirable and undesirable outcomes (tradeoffs), taking into account:
  ● best estimates of the magnitude of effects on desirable and undesirable outcomes
  ● importance of outcomes (estimated typical values and preferences)
Comment: The larger the differences between the desirable and undesirable consequences, the more likely a strong recommendation is warranted. The smaller the net benefit and the lower certainty for that benefit, the more likely a weak recommendation is warranted.

Domain: Confidence in the magnitude of estimates of effect of the interventions on important outcomes (overall quality of evidence for outcomes)
Comment: The higher the quality of evidence, the more likely a strong recommendation is warranted.

Domain: Confidence in values and preferences and their variability
Comment: The greater the variability in values and preferences, or uncertainty about typical values and preferences, the more likely a weak recommendation is warranted.

Domain: Resource use
Comment: The higher the costs of an intervention (the more resources consumed), the less likely a strong recommendation is warranted.

6.2.1 Balance of desirable and undesirable consequences

In deciding about the balance between desirable and undesirable outcomes ("tradeoffs"), one considers two domains:

1. best estimates of the magnitude of the desirable and the undesirable effects (summarized in evidence profiles)
2. importance of outcomes: typical values that patients or a population apply to those outcomes (“weight” of outcomes).

6.2.1.1 Estimates of the magnitude of the desirable and undesirable effects

Large relative effects of an intervention consistently pointing in the same direction, towards desirable or towards undesirable effects, are more likely to warrant a strong recommendation. Conversely, large relative effects of an intervention pointing in opposite directions (large desirable effects accompanied by large undesirable ones) will lead to weak recommendations.
Large absolute effects are also more likely to lead to a strong recommendation than small absolute effects. Baseline risk (control event rate) can influence the balance of desirable and undesirable outcomes. Large differences in baseline risk will result in large differences in the absolute effects of interventions. The strength of a recommendation and its direction, therefore, will likely differ between high-risk and low-risk groups.
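
A short worked calculation may help show why baseline risk matters. The sketch below assumes, purely for illustration, that the relative risk is the same in every risk group; the function names and the numbers (a relative risk of 0.75, baseline risks of 20% and 2%) are hypothetical and are not taken from any study cited here.

```python
def absolute_risk_reduction(baseline_risk, relative_risk):
    """Absolute risk reduction for a given baseline (control) event rate.

    Assumes the relative risk applies equally at every baseline risk,
    which is an illustrative simplification.
    """
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be a proportion between 0 and 1")
    if relative_risk < 0.0:
        raise ValueError("relative_risk must be non-negative")
    return baseline_risk * (1.0 - relative_risk)


def events_prevented_per_1000(baseline_risk, relative_risk):
    """Express the absolute effect as events prevented per 1000 patients."""
    return 1000.0 * absolute_risk_reduction(baseline_risk, relative_risk)


# Hypothetical numbers: the same relative effect in two risk groups.
high_risk = events_prevented_per_1000(baseline_risk=0.20, relative_risk=0.75)
low_risk = events_prevented_per_1000(baseline_risk=0.02, relative_risk=0.75)
print(f"high-risk group: {high_risk:.0f} events prevented per 1000")
print(f"low-risk group:  {low_risk:.0f} events prevented per 1000")
```

With these assumed inputs the same relative effect yields a tenfold difference in absolute benefit, so harms of a fixed size weigh far more heavily in the low-risk group.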

Examples
Large gradient between the desirable and undesirable effects (higher likelihood of a strong recommendation)

1. The very large gradient between the benefits of low dose aspirin on reductions in death and recurrent myocardial infarction and the undesirable consequences of minimal side effects and costs makes a strong recommendation very likely.

Small gradient between the desirable and undesirable effects (higher likelihood of a weak recommendation)

1. Consider the choice of immunomodulating agents, namely cyclosporine or tacrolimus, in kidney transplant recipients. Tacrolimus results in better graft survival (a highly valued outcome), but at the important cost of a higher incidence of diabetes (the long-term complications of which can be devastating).
2. Patients with atrial fibrillation typically are more stroke averse than bleeding averse. If, however, the risk of stroke is sufficiently low, the tradeoff between stroke reduction and increase in bleeding risk with anticoagulants is closely balanced.

6.2.1.2 Best estimates of values and preferences

Without considering the associated values and preferences, assessing large vs. small magnitude of effects may be misleading. Balancing the magnitude of desirable and undesirable outcomes requires considering the weight (importance) of those outcomes, which is determined by values and preferences.
Ideally, to inform estimates of typical patient values and preferences, guideline panels will conduct or identify systematic reviews of relevant studies of patient values and preferences. There is, however, a paucity of empirical examinations of patients’ values and preferences.

Well-resourced guideline panels will usually complement such studies with consultation with individual patients and patients’ groups. The panel should discuss whose values these people represent, namely representative patients, a defined subset of patients, or representatives of the general population.

Less well-resourced panels, without systematic reviews of values and preferences or consultation with patients and patient groups, must rely on unsystematic reviews of the available literature and their experience of interactions with patients. How well such estimates correspond to true typical values and preferences is likely to be uncertain.
Whatever the source of estimates of typical values and preferences, explicit, transparent statements of the panel’s choices are imperative (see 6.4.3 Providing transparent statements about assumed values and preferences).

6.3.2 Confidence in best estimates of magnitude of effects (quality of evidence)

For all outcomes considered, the GRADE process requires a rating describing the quality of evidence. Ultimately, guideline authors will form their recommendations based on their confidence in all effect estimates for each outcome considered critical to their recommendation and the quality of evidence.
Quality of evidence ratings are determined by the eight criteria already discussed. Five criteria (study limitations, inconsistency, indirectness, imprecision, and publication bias) result in rating down the quality of evidence, whereas the remaining three (large magnitude of effect, dose-response gradient, and situations in which all plausible biases or confounders increase our confidence in the estimated effect) lead to an increase in evidence quality.

Typically, a strong recommendation is associated with high, or at least moderate, confidence in the effect estimates for critical outcomes. If one has high confidence in effects on some critical outcomes (typically benefits), but low confidence in effects on other outcomes considered critical (often long-term harms), then a weak recommendation is likely warranted. Even when an apparently large gradient exists in the balance of desirable vs. undesirable outcomes, panels will be appropriately reluctant to offer a strong recommendation if their confidence in effect estimates for some critical outcomes is low.

For some questions, direct evidence about the effects on some critical outcomes may be lacking (e.g. quality of life has not been measured in any study). In such instances, even if well-measured surrogates are available, confidence in estimates of effects on patient-important outcomes is very likely to be low.

Low confidence in effect estimates may, rarely, be tied to strong recommendations. In general, GRADE discourages guideline panels from making strong recommendations when their confidence in estimates of effect for critical outcomes is low or very low. GRADE has identified five paradigmatic situations in which strong recommendations may be warranted despite low or very low quality of evidence (Table 6.3). These situations can be conceptualized as ones in which a panel would have a low level of regret if subsequent evidence showed that their recommendation was misguided.

Table 6.3. Paradigmatic situations in which a strong recommendation may be warranted despite low or very low confidence in effect estimates

Condition 1: When low quality evidence suggests benefit in a life threatening situation (evidence regarding harms can be low or high)
Examples:
  1. Fresh frozen plasma or vitamin K in a patient receiving warfarin with elevated INR and an intracranial bleed. Only low quality evidence supports the benefits of limiting the extent of the bleeding.
  2. Amphotericin B vs. itraconazole in life threatening disseminated blastomycosis. High quality evidence suggests that amphotericin B is more toxic than itraconazole, and low quality evidence suggests that it reduces mortality in this context.

Condition 2: When low quality evidence suggests benefit and high quality evidence suggests harm or a very high cost
Example: Head-to-toe CT/MRI screening for cancer. Low quality evidence of benefit of early detection but high quality evidence of possible harm and/or high cost (strong recommendation against this strategy).

Condition 3: When low quality evidence suggests equivalence of two alternatives, but high quality evidence of less harm for one of the competing alternatives
Example: Helicobacter pylori eradication in patients with early stage gastric MALT lymphoma who are H. pylori positive. Low quality evidence suggests that initial H. pylori eradication results in similar rates of complete response in comparison with the alternatives of radiation therapy or gastrectomy; high quality evidence suggests less harm/morbidity.

Condition 4: When high quality evidence suggests equivalence of two alternatives and low quality evidence suggests harm in one alternative
Example: Hypertension in women planning conception and in pregnancy. Strong recommendations for labetalol and nifedipine and strong recommendations against angiotensin converting enzyme (ACE) inhibitors and angiotensin receptor blockers (ARBs): all agents have high quality evidence of equivalent beneficial outcomes, with low quality evidence for greater adverse effects with ACE inhibitors and ARBs.

5.2.2.1 Deciding whether to use estimates 5 When high quality evidence Testosterone in males with or at
from a subgroup analysis suggests modest benefits and risk of prostate cancer. High
low/very low quality evidence quality evidence for moderate
5.2.3 Indirectness of evidence suggests possibility of benefits of testosterone treatment
5.2.4 Imprecision catastrophic harm in men with symptomatic
5.2.4.1 Imprecision in guidelines androgen deficiency to improve
5.2.4.2 Imprecision in in systematic bone mineral density and muscle
reviews strength. Low quality evidence
for harm in patients with or at
5.2.4.3 Rating down two levels for risk of prostate cancer
imprecision INR – international normalized ratio; CT – computed tomography; MRI – magnetic resonance imaging;
5.2.5 Publication bias MALT – mucosaassociated lymphoid tissue.
6.3.3 Confidence in values and preferences

Uncertainty concerning values and preferences or their variability among patients may lower the strength of a recommendation.

As noted above, systematic study of patients’ values and preferences is very limited. Thus, panels will often be uncertain about typical values and preferences. The greater the uncertainty, the more likely they will make a weak recommendation. Given the sparse systematic study of patients’ values and preferences, one could argue that large uncertainty always exists about the patients’ perspective. On the other hand, clinicians’ experience with patients may provide considerable additional insight. Indeed, on occasion, panels will, on the basis of clinical experience, be confident regarding typical patients’ values and preferences. Pregnant women’s strong aversion to even a small risk of important fetal abnormalities may be one such situation.

Large variability in values and preferences may also make a weak recommendation more likely. In such situations, it is less likely that a single recommendation would apply uniformly across all patients, and the right course of action is likely to differ between patients. Again, systematic research about variability in values and preferences is sparse. On the other hand, clinical experience may leave a panel confident that values and preferences differ widely among patients.

Example

1. A hopeful patient may place more emphasis on a small chance of benefit, whereas a pessimistic, risk-averse patient may place more emphasis on avoiding the risks associated with a potentially beneficial therapy. Some patients may believe that even if the risk of an adverse event is low, they will be the person who suffers it. For instance, in patients with idiopathic pulmonary fibrosis, evidence for the benefit of steroids warrants only low confidence, whereas we can be very confident of a wide range of adverse effects associated with steroids. The hopeful patient with pulmonary fibrosis may be enthusiastic about use of steroids, whereas the risk-averse patient is likely to decline.

2. Thromboprophylaxis reduces the incidence of venous thromboembolism in immobile, hospitalized, severely ill medical patients. Careful thromboprophylaxis has minimal side effects and relatively low cost while being very effective at preventing deep venous thrombosis (DVT) and its sequelae. People’s values and preferences are such that virtually all patients admitted to a hospital would, if they understood the choice they were making, opt to receive some form of thromboprophylaxis. Those making recommendations can thus offer a strong recommendation for thromboprophylaxis for patients in this setting.

3. A systematic review and meta-analysis describes a relative risk reduction (RRR) of approximately 80% in recurrent DVT for prophylaxis beyond 3 months up to one year. This large effect supports a strong recommendation for warfarin. Furthermore, the relatively narrow 95% confidence interval (approximately 74 to 88%) suggests that warfarin provides an RRR of at least 74%, and further supports a strong recommendation. At the same time, warfarin is associated with an inevitable burden of keeping dietary intake of vitamin K relatively constant, monitoring the intensity of anticoagulation with blood tests, and living with the increased risk of both minor and major bleeding. It is likely, however, that most patients would prefer to avoid another DVT and accept the risk of a bleeding episode. As a result, almost all patients with a high risk of recurrent DVT would choose to take warfarin for 3 to 12 months, suggesting the appropriateness of a strong recommendation. Thereafter, there may be an appreciable number of patients who would reject lifelong anticoagulation.

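
The relative figures in the third example can be translated into absolute terms once a baseline risk is assumed. The sketch below applies the reported RRR of about 80% (95% confidence interval roughly 74 to 88%) to a baseline risk of recurrent DVT. The baseline risk of 10% is a hypothetical value chosen only for illustration (the text above does not report one), and the function names are our own.

```python
def absolute_risk_reduction(baseline_risk: float, rrr: float) -> float:
    """Absolute risk reduction = baseline risk x relative risk reduction."""
    return baseline_risk * rrr


def number_needed_to_treat(arr: float) -> float:
    """Number of patients to treat to prevent one event (1 / ARR)."""
    if arr <= 0:
        raise ValueError("ARR must be positive to compute an NNT")
    return 1.0 / arr


# Hypothetical baseline risk of recurrent DVT without prophylaxis.
BASELINE_RISK = 0.10

# RRR point estimate and approximate 95% CI bounds quoted in the example.
for label, rrr in [("point estimate", 0.80),
                   ("lower CI bound", 0.74),
                   ("upper CI bound", 0.88)]:
    arr = absolute_risk_reduction(BASELINE_RISK, rrr)
    nnt = number_needed_to_treat(arr)
    print(f"{label}: ARR = {arr:.3f}, NNT = {nnt:.1f}")
```

With a different baseline risk the absolute benefit changes proportionally, which is one reason the same relative effect can justify recommendations of different strength in different populations.
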
6.3.4 Resource use (cost)

Panels may or may not consider resource use in their judgments about the direction and strength of recommendations. Reasons for not considering resource use include: reliable data are lacking; the intervention is not useful, so the effort of calculating resource use can be spared; the desirable effects so greatly outweigh any undesirable effects that resource considerations would not alter the final judgment; or the panel has elected (or been instructed) to leave resource considerations to other decision makers. Panels should be explicit about the decision they made not to consider resource utilization and the reason for their decision.

If they elect to include resource utilization when making a recommendation, but have not included resource use as a consequence when preparing an evidence profile, they should be explicit about what types of resource use they considered when making the recommendation and whatever logic or evidence was used in their judgments.

Cost may be considered just another potentially important outcome (like mortality, morbidity, and quality of life) associated with alternative ways of managing patient problems. In addition to these clinical outcomes, however, an intervention may increase costs or decrease costs. The GRADE approach recommends that important or critical resource use be considered alongside other relevant outcomes in evidence profiles and summary of findings tables. It is important to use natural units when presenting resource use data as these can be applied in any setting.

Special considerations when incorporating resource use (cost) in recommendations:
● What are the differences between costs and other outcomes?
● Which perspective to take?
● Which resource implications to include?
● How to make judgments about the quality of the evidence?
● How to present these implications?
● What is the potential usefulness of a formal economic model?
● How to consider resource use in formulating recommendations?

6.3.4.1 Differences between costs and other outcomes

There are several differences between costs and other outcomes:

1. With costs the issue of who pays and who gains is most prominent.
2. Attitudes about the extent to which costs should influence the decision differ depending on who bears the cost.
3. Costs tend to vary widely across jurisdictions and over time.
4. People have different perspectives on the envelope in which they are considering opportunity costs.
5. Resource allocation is a far more political issue than consideration of other outcomes.

1. With costs the issue of who pays and who gains is most prominent.
For most outcomes other than costs, it is clear that the patient and, secondarily, the patient’s family gain the advantages and have to live with the disadvantages. This is not true of all outcomes: with vaccinations the entire community benefits from the herd effect, and widespread use of antibiotics may have downstream adverse consequences of drug resistance. Health care costs are often borne by the society as a whole. Even within a society, who bears the cost may differ depending on the patient’s age or situation.

2. Attitudes about the extent to which costs should influence the decision differ depending on who bears the cost.
If costs are borne by the government, or a third party payer, some would argue that the physician’s responsibility to the patient means that costs should not influence the decision. On the other hand, a clinician’s responsibility when caring for a patient is discharged in a broader context: resources that are used for an intervention cannot be used for something else and can affect the ability of the health system to best meet the needs of those it serves.

3. Costs tend to vary widely across jurisdictions or even within jurisdictions, and over time.
Costs of drugs are largely unrelated to the costs of production of those drugs, and are related more to marketing decisions and national policies. Hospitals or health maintenance organizations may, for instance, negotiate special arrangements with pharmaceutical companies for prices substantially lower than are available to patients or other providers. Costs can also vary widely over time (e.g. when a drug comes off patent or a new, cheaper technology becomes available). The large variability in costs over time and jurisdictions requires that guideline panels formulate health care questions as specifically as possible when bringing cost into the equation. The choice of comparator can be a particular problem in economic analyses. If the choice of the comparator is inappropriate (for instance, no treatment rather than an alternative though less effective intervention), conclusions may be misleading. Even when resource use remains the same, the resource implications may vary widely across jurisdictions. A year’s supply of a very expensive drug may pay a nurse’s salary in the United States, six nurses’ salaries in Poland, and 30 nurses’ salaries in China. Thus, what one can buy with the resources saved if one forgoes purchase of the drug (the “opportunity cost”), and the health benefits achieved with those expenditures, will differ to a large extent.

4. People have different perspectives on the envelope in which they are considering opportunity costs.
A hospital pharmacy with a fixed budget considering purchase of an expensive new drug will have a clear idea of what that purchase will mean in terms of other medications the pharmacy cannot afford. People often assume the envelope is public health spending: funding a new drug or program will constrain resources for other public health expenditures. However, one may not be sure that refraining from that purchase really means that equivalent resources will be available for the health care system. Further, one may ask whether public health care spending is the correct envelope.

5. Resource allocation is a far more political issue than consideration of other outcomes.
Whether the guideline panel does or does not explicitly consider resource allocation issues, those politics may bear on a guideline panel’s function through conflict of interest.

Despite these differences, approaches to cost (resource use) are similar to other outcomes:
● guideline panels need to consider only important resource implications
● decision makers require an estimate of the difference between treatment and control
● guideline panels must make explicit judgments about the quality of evidence regarding incremental resource use.

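
The opportunity cost illustration in point 3 (one drug supply equating to one, six, or 30 nurses’ salaries) can be expressed as a simple ratio. The sketch below is only illustrative: the drug cost and salary figures are hypothetical values chosen to reproduce those ratios, not real salary data, and the function name is our own.

```python
def salaries_forgone(annual_drug_cost: float, annual_nurse_salary: float) -> float:
    """Express the cost of a year's drug supply in nurse-salary equivalents."""
    if annual_nurse_salary <= 0:
        raise ValueError("salary must be positive")
    return annual_drug_cost / annual_nurse_salary


# Hypothetical figures in a common currency, chosen to match the 1 : 6 : 30
# ratios given in the text; they are not actual salaries.
ANNUAL_DRUG_COST = 60_000
HYPOTHETICAL_NURSE_SALARY = {
    "United States": 60_000,
    "Poland": 10_000,
    "China": 2_000,
}

for country, salary in HYPOTHETICAL_NURSE_SALARY.items():
    equivalents = salaries_forgone(ANNUAL_DRUG_COST, salary)
    print(f"{country}: {equivalents:.0f} nurse salary equivalent(s)")
```

The same drug price therefore represents a very different amount of forgone care depending on the jurisdiction, even though the resource itself (one year’s supply) is identical.
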
6.3.4.2 Perspective

GRADE suggests that a broad perspective is desirable.

A recommendation could be intended for a very narrow audience, such as a single hospital pharmacy, an individual hospital or a health maintenance organization. Alternatively, it could be intended for a health region, a country or an international audience. Regardless of how narrow or broad the intended audience, guideline groups that choose to incorporate resource implications must be explicit about the perspective they are taking.

Alternatively, a guideline may choose to take a societal perspective, and include all important resource implications, regardless of who bears the costs.

In a publicly funded health system the patient perspective would consider only resource implications that directly affect individual patients (e.g. out of pocket costs) and would ignore most of the costs generated (e.g. costs borne by the government). In European health care systems in which, for the most part, governments bear the cost of health care, expenses borne directly by patients will be minimal. A pharmacy perspective would ignore downstream cost savings resulting from adverse events (e.g. stroke or myocardial infarction) prevented by a drug. A hospital perspective would ignore outpatient costs either incurred or prevented. In the private sector, where disenrollment and loss of insurance can shift the burden of costs from one system to another, estimates of resource use should include the downstream costs of all treated patients, not just those who remain in a particular health plan.

An even broader perspective, that of society, would include indirect costs or savings (e.g. lost wages). These are difficult to estimate and controversial because they assume that lost productivity will not be replaced by an individual who otherwise would be unemployed or underemployed, and implicitly place lower value on individuals not working (e.g. the retired). Taking a health systems perspective has another advantage. A comprehensive display of the resource use associated with alternative management strategies allows an individual or group (a patient, a pharmacy, or a hospital) to examine the relative merits of the alternatives from their particular perspective.

Clinicians seeing patients who are not covered by either public or private insurance may need to help these individuals to make decisions taking into account their out of pocket costs. This is particularly true when clinical advantages and disadvantages are closely balanced, and there are substantial out of pocket costs. In these circumstances, if a guideline panel has used the GRADE approach and made evidence profiles available to the guideline users, clinicians can review evidence summaries and ensure that the patient’s decision to accept the recommended management strategy is consistent with their values and preferences, either through communicating the information directly to the patient, or by finding out what the patient’s situation and values and preferences are.

6.3.4.3 Resource implications considered

Evidence profiles and summary of findings tables should always present resource use, not just monetary values, as monetary values for the same resource will vary depending on setting.

We suggest that guideline developers document best estimates of resource use, not best estimates of costs. Costs are a function of resources expended and the cost per unit of resource. Given the wide variability in costs per unit, reporting only total costs across broad categories of resource expenditure leaves users without the information required to judge whether estimates of unit costs apply to their setting. It is therefore recommended that natural units be used to estimate resource use. For example, report the number of days spent in hospital rather than the total cost of the stay, because the cost per night will vary depending on the setting.

Users of guidelines will be best informed if the guideline developers specify resources consumed by alternate management strategies, because they can:
● judge whether the resource use reflects practice patterns in their setting
● focus on the items of most relevance to them
● ascertain whether the unit costs apply in their setting.

Unless resource use is specified, users in settings other than that on which the analysts focus cannot estimate the associated incremental costs of the intervention.

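
Because costs are a function of resources expended and the cost per unit of resource, reporting resource use in natural units lets each user apply local unit costs. The sketch below illustrates this under stated assumptions: the resource categories, quantities, and unit costs are all hypothetical, and the function names are our own rather than part of any GRADE tool.

```python
from typing import Dict

ResourceUse = Dict[str, float]  # natural units, e.g. hospital days per patient
UnitCosts = Dict[str, float]    # local cost per natural unit


def total_cost(resource_use: ResourceUse, unit_costs: UnitCosts) -> float:
    """Cost = sum over resources of (quantity used x local unit cost)."""
    return sum(qty * unit_costs[name] for name, qty in resource_use.items())


def incremental_cost(intervention: ResourceUse,
                     comparator: ResourceUse,
                     unit_costs: UnitCosts) -> float:
    """Difference in cost between intervention and comparator in one setting."""
    return total_cost(intervention, unit_costs) - total_cost(comparator, unit_costs)


# Hypothetical resource use per patient, reported in natural units.
intervention = {"hospital_days": 3, "drug_doses": 20}
comparator = {"hospital_days": 5, "drug_doses": 0}

# Hypothetical unit costs in two settings.
setting_a = {"hospital_days": 1000.0, "drug_doses": 50.0}
setting_b = {"hospital_days": 200.0, "drug_doses": 40.0}

print(incremental_cost(intervention, comparator, setting_a))  # -1000.0 (saving)
print(incremental_cost(intervention, comparator, setting_b))  # 400.0 (extra cost)
```

In this made-up case the same resource use saves money where hospital days are expensive and costs more where they are cheap, which a single total cost figure from one setting would hide.
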
6.3.4.4 Confidence in the estimates of resource use (quality of the evidence about cost)

Evidence of resource use may come from different sources than evidence of health benefits. This may be the case because trials of interventions do not fully report resource use, because the trial situation may not fully reflect the circumstances (and thus the resource use) that we would expect in clinical practice, because the relevant resource use may extend beyond the duration of the trial, and because resource use may vary substantially across settings.

For resource use that is reported in the context of trials, criteria for quality assessment are identical to those for other outcomes. Just as for other outcomes of a trial, the quality of evidence may differ across different resources. For example, drug use may be relatively easy to estimate, whereas use of health professionals’ time may be more difficult, and the estimate of drug use may therefore be of higher quality.

6.3.4.5 Presentation of resource use

A balance sheet (e.g. evidence profile) should inform judgments about whether the net benefits are worth the incremental costs. Balance sheets efficiently present the raw information required to make informed explicit judgments concerning resource use in guideline recommendations. However, when complex tradeoff decisions involving several outcomes need to be made, judgments may remain implicit or qualitatively described.

Pooling resource estimates from different studies is seldom done, as it can be quite controversial and should be carefully considered. However, authors can consider presenting pooled estimates of resource use when they are confident that the outcome in question has a common meaning (e.g. number of nights spent in hospital) across the studies involved in the analysis. Even in this case, it is recommended that authors adjust for geographical and temporal differences in cost.

6.3.4.6 Economic model

Formal economic modeling may or may not be helpful.

Formal economic modeling results in cost per unit benefit achieved: cost per natural unit, such as cost per stroke prevented (cost-effectiveness analysis); cost per quality-adjusted life year gained (cost-utility analysis); or costs and benefits both valued in monetary terms (cost-benefit analysis). These summaries can be helpful for informing judgments. Unfortunately, many published cost-effectiveness analyses have a high probability of being flawed or biased, and are setting-specific. When estimates of harms, benefits and resources used are based on low quality evidence, transparency of the economic model will be reduced and the model may be misleading.

Should guideline panels consider developing their own formal economic model?

Creating an economic model may be advisable if:
● guideline groups have the necessary expertise and resources
● difference in resources consumed by the alternative management strategies is large and therefore there is substantial uncertainty about whether the net benefits of an intervention are worth the incremental costs
● quality of available evidence regarding resource consumption is high and it is likely that a full economic model would help inform a decision
● implementing an intervention requires large capital investments, such as building new facilities or purchasing new, expensive equipment.

Modeling, while necessary for taking into account complexities and uncertainties in calculating cost per unit benefit, reduces transparency. Any model is only as good as the data on which it is based. When estimates of benefits, harms, or resources used come from low quality evidence, results of any economic modeling will be highly speculative.

Although criteria to assess the credence to give to results from statistical models of cost-effectiveness or cost-utility are available, these models generally include a large number of assumptions and varying quality evidence for the estimates that are included in the model. For these reasons, the GRADE working group recommends not including cost-effectiveness or cost-utility models in evidence profiles. These models may, however, inform judgments of a guideline panel, or those of governments or third party payers considering whether to include an intervention among their programs’ benefits.

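
The “cost per unit benefit” that a formal model reports is, at its simplest, an incremental ratio: the difference in cost divided by the difference in effect between two strategies. The sketch below shows that arithmetic for cost per additional stroke prevented. All figures are hypothetical, the function name is our own, and a real economic model would also handle discounting, uncertainty, and time horizon, none of which is attempted here.

```python
def incremental_cost_effectiveness_ratio(cost_new: float, cost_old: float,
                                         effect_new: float, effect_old: float) -> float:
    """Incremental cost per additional unit of benefit (e.g. per stroke prevented)."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("no difference in effect; the ratio is undefined")
    return (cost_new - cost_old) / delta_effect


# Hypothetical totals per 1000 patients treated.
cost_new, cost_old = 500_000.0, 300_000.0   # total cost of each strategy
strokes_prevented_new, strokes_prevented_old = 30.0, 10.0

icer = incremental_cost_effectiveness_ratio(
    cost_new, cost_old, strokes_prevented_new, strokes_prevented_old
)
print(f"Cost per additional stroke prevented: {icer:.0f}")
```

Replacing strokes prevented with quality-adjusted life years gained gives the cost-utility form of the same ratio. As the text notes, the result is only as trustworthy as the cost and effect estimates fed into it.
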
6.3.4.7 Consideration of resource use in recommendations

A guideline panel may choose to explicitly consider or not to consider resource use in recommendations.

A guideline panel may legitimately choose to leave considerations of resource use aside, and offer a recommendation solely on the basis of other advantages and disadvantages of the alternatives being considered. Resource allocation must then be considered at the level of the ultimate decision-maker, be it the patient and healthcare professional, an organization (e.g. hospital pharmacy or a health maintenance organization), a third party payer, or a government. Guideline panels should be explicit about the decision to consider or not to consider resource utilization.

If a guideline panel considers resource use it should, prior to bringing cost into the equation, first decide on the quality of evidence regarding other outcomes, and weigh up the advantages and disadvantages. Decisions regarding the importance of resource use issues will flow from this first step. For example, resource implications may be irrelevant if evidence of net health benefits is lacking. If advantages of an intervention far outweigh disadvantages, resource use is less likely to be important. Resource use usually becomes important when advantages and disadvantages are closely balanced.

The GRADE approach suggests that panels considering resource use should offer only a single recommendation taking resource use into account. Panels should refrain from issuing two recommendations, one not taking resource use into account and a second doing so. Although this would have the advantage of explicitness, on which GRADE places a very high value, the GRADE working group is concerned that those with interests in dissemination of an intervention would effectively use only the recommendation ignoring resource implications as a weapon in their battle for funds (public funds, in particular).

6.4 Presentation of recommendations

6.4.1 Wording of recommendations

The wording of a recommendation should offer clinicians as many indicators as possible for understanding and interpretation.

Recommendations should always answer the initial clinical question. Therefore, they should specify the patients or population (characterized by the disease and other identifying factors) for whom the recommendation is intended and a recommended intervention, described as specifically and in as much detail as needed. Unless it is obvious, they should also specify the comparator. Sometimes, the recommendation may include a reference to the setting (e.g. primary or tertiary care, high- or low-income countries, etc.).

In general, it seems preferable to present recommendations in favor of a particular management approach rather than against an alternative. For instance, in considering the addition of aspirin to clopidogrel in patients who have had a stroke, it would be preferable to state: "In patients who have had a stroke, we suggest clopidogrel alone vs. adding aspirin to clopidogrel" rather than: "In patients who have had a stroke and are using clopidogrel, we suggest not adding aspirin". However, when a useless or harmful therapy is in wide use, recommendations against a management approach are appropriate. For instance, "In patients undergoing cardiac surgery who were not previously receiving beta blockers, we suggest not initiating perioperative beta blocker therapy".

Recommendations in the passive voice may lack clarity; therefore, GRADE suggests that guideline developers present recommendations in the active voice.

For strong recommendations, the GRADE working group has suggested adopting terminology such as "we recommend..." or "clinicians should...", "clinicians should not..." or "Do...", "Don't...".

For weak recommendations, the GRADE working group has suggested less definitive wording, such as "we suggest..." or "clinicians might..." or "We conditionally recommend..." or "We make a qualified recommendation that...".

Wording strong and weak recommendations is particularly important when guidelines are developed by international organizations and/or are intended for patients and clinicians in different regions, cultures, traditions, and usage of language. It is also crucial to explicitly and precisely consider wording when translating recommendations into different languages. Whatever terminology guideline panels choose to use to communicate the dichotomous nature of a recommendation, it is essential that they inform their users what the terms imply by providing the explanations as in Table 5.9.

Misinterpretation is possible however strength of recommendations is expressed. We suggest guideline developers consider using both words and symbols (which may be less confusing than numbers or letters) to express strength of recommendations.

6.4.2 Symbolic representation

A variety of presentations of quality of evidence and strength of recommendations may be appropriate. Most guideline panels have used letters and numbers to summarize their recommendations. Because different organizations use numbers and letters in highly variable ways, this presentation may be confusing.

Symbolic representations of the quality of evidence and strength of recommendations are appealing in that they are not burdened with this historical confusion. On the other hand, clinicians seem to be very comfortable with numbers and letters, which are particularly suitable for verbal communication, so there may be good reasons why organizations have chosen to use them.

The GRADE working group has decided to offer preferred symbolic representations, but users of guidelines based on the GRADE approach will often see numbers and letters being used to express the quality of evidence and strength of a recommendation.

Table 6.4. Suggested representations of quality of evidence and strength of recommendations

Quality of Evidence              Symbol    Letter (varies)
High                             ⨁⨁⨁⨁      A
Moderate                         ⨁⨁⨁◯      B
Low                              ⨁⨁◯◯      C
Very low                         ⨁◯◯◯      D

Strength of Recommendation       Symbol    Number
Strong for an intervention       ↑↑        1
Weak for an intervention         ↑?        2
Weak against an intervention     ↓?        2
Strong against an intervention   ↓↓        1

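For guideline developers who generate recommendation summaries programmatically, the representations in Table 6.4 can be held in a simple lookup. The following is a minimal sketch only: the names QUALITY_SYMBOLS, STRENGTH_SYMBOLS and format_rating are hypothetical, and the combined "number plus letter" label (e.g. 2B) is an assumed convention for illustration, since Table 6.4 notes that letter usage varies between organizations.

```python
# Illustrative lookup of the Table 6.4 representations.
# All names here are hypothetical and not part of any GRADE tool.

QUALITY_SYMBOLS = {
    "high": ("⨁⨁⨁⨁", "A"),
    "moderate": ("⨁⨁⨁◯", "B"),
    "low": ("⨁⨁◯◯", "C"),
    "very low": ("⨁◯◯◯", "D"),
}

STRENGTH_SYMBOLS = {
    ("strong", "for"): ("↑↑", "1"),
    ("weak", "for"): ("↑?", "2"),
    ("weak", "against"): ("↓?", "2"),
    ("strong", "against"): ("↓↓", "1"),
}


def format_rating(strength: str, direction: str, quality: str) -> str:
    """Return the symbols plus an assumed number-letter label such as '2B'."""
    arrow, number = STRENGTH_SYMBOLS[(strength, direction)]
    circles, letter = QUALITY_SYMBOLS[quality]
    return f"{arrow} {circles} ({number}{letter})"


print(format_rating("weak", "for", "moderate"))
```

Showing the symbols and the alphanumeric label together follows the suggestion above to combine words and symbols, so that readers used to either convention can interpret the rating.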
6.4.3 Providing transparent statements about assumed values and preferences

Ideally, recommendations should be accompanied by a statement presenting assumptions about the values and preferences that underlie recommendations. For instance, a guideline addressing issues of thrombosis prevention and treatment in pregnancy noted: "Our recommendations reflect a belief that most women will place a low value on avoiding the pain, cost, and inconvenience of heparin therapy to avoid the small risk of even a minor abnormality in their child associated with warfarin prophylaxis".

In addition to, or in place of, making such general statements, guideline panels may provide statements associated with individual recommendations, especially those that are particularly sensitive to values and preferences. In such cases authors should place statements about underlying values and preferences with the recommendation statement rather than in the accompanying text. This prominent positioning of the statements will make it less likely that users of guidelines miss the importance of the values and preference judgments.

Consider, for instance, two groups that were part of a broader guideline effort and made apparently contradictory recommendations regarding aspirin vs. clopidogrel in patients with atherosclerotic vascular disease, despite using the same underlying evidence from a trial that enrolled both patients with threatened stroke and those with peripheral vascular disease. One group focusing on stroke prevention recommended clopidogrel over aspirin, stating: "This recommendation places a relatively high value on a small absolute risk reduction in stroke rates, and a relatively low value on minimizing drug expenditures". The other group focusing on peripheral vascular disease recommended aspirin over clopidogrel, stating: "This recommendation places a relatively high value on avoiding large resource expenditures to achieve small reductions in vascular events". These recommendations suggest opposite courses of action. Both are appropriate given the stated values and preferences, which were made explicit in qualifying statements accompanying each recommendation.

Another way to frame values and preferences statements that panels may want to consider is in terms of patients who do not share the values and preferences underlying the recommendation. For instance, one may say: "For most healthy patients with achalasia undergoing an invasive procedure, we suggest minimally invasive surgical myotomy rather than pneumatic dilatation. Patients who prefer to avoid surgery and the high rates of gastroesophageal reflux disease seen after surgery, and who are willing to accept a higher initial failure rate and long-term recurrence rate, can reasonably choose pneumatic dilatation".

6.5 The Evidence-to-Decision framework

Ultimately, guideline panels must integrate these determinants of direction and strength to make a strong or weak recommendation for or against an intervention. Table 6.2 presents the generic Evidence-to-Decision (EtD) table that groups making recommendations may use to facilitate decision making, record judgements, and document the process of going from evidence to the decision. Table 6.3 presents an example of the EtD framework used in development of recommendations about the use of ASA in patients with atrial fibrillation (PDF version).

Table 6.5. The Evidence-to-Decision framework

Columns: Criteria | Judgements | Research evidence | Additional considerations

Criteria: Problem. Is there a problem priority?
Judgements: ○ No  ○ Probably no  ○ Uncertain  ○ Probably yes  ○ Yes  ○ Varies
Research evidence: The relative importance or values of the main outcomes of interest:

  Outcome      Relative importance    Certainty of the evidence (GRADE)
  Outcome 1    CRITICAL               ⨁⨁⨁⨁ HIGH
  Outcome 2    CRITICAL               ⨁⨁⨁◯ MODERATE

For each criterion there are four or five response options, from those that favour a recommendation against the option on the left to ones that favour a recommendation for the option on the right. In addition, most criteria include "varies" as a response option for situations when there is important variation across the different settings for which the guidelines are intended, and those differences are substantial enough that they might lead to different recommendations for different settings.

Questions to consider for each criterion and their relationship to a recommendation

For each criterion we suggest one or more detailed questions to consider when making a judgement and explain the relationship between the criterion and the recommendation.

Columns: Criteria | Questions | Explanations

Criterion: Is the problem a priority?
Questions: Are the consequences of the problem serious (i.e. severe or important in terms of the potential benefits or savings)? Is the problem urgent? Is it a recognised priority (e.g. based on a national health plan)? Are a large number of people affected by the problem?
Explanation: The more serious a problem is, the more likely it is that an option that addresses the problem should be a priority (e.g., diseases that are fatal or disabling are likely to be a higher priority than diseases that only cause minor distress). The more people who are affected, the more likely it is that an option that addresses the problem should be a priority.

Criterion: Is there important uncertainty about how much people value the main outcomes?
Questions: How much do those affected by the option value each of the outcomes in relation to the other outcomes (i.e. what is the relative importance of the outcomes)? Is there evidence to support those value judgements, or is there evidence of variability in those values that is large enough to lead to different decisions?
Explanation: The more likely it is that differences in values would lead to different decisions, the less likely it is that there will be a consensus that an option is a priority (or the more important it is likely to be to obtain evidence of the values of those affected by the option). Values in this context refer to the relative importance of the outcomes of interest (how much people value each of those outcomes). These values are sometimes called 'utility values'.

Criterion: What is the overall certainty¹ of the evidence of effectiveness?
Questions: What is the overall certainty of this evidence of effects, across all of the outcomes that are critical to making a decision?
Explanation: The less certain the evidence is for critical outcomes (those that are driving a recommendation), the less likely that an option should be recommended (or the more important it is likely to be to conduct a pilot study or impact evaluation, if it is recommended).

Criterion: How substantial are the desirable anticipated effects?
Questions: How substantial (large) are the desirable anticipated effects (including health and other benefits) of the option (taking into account the severity or importance of the desirable consequences and the number of people affected)?
Explanation: The larger the benefit, the more likely it is that an option should be recommended.

Criterion: How substantial are the undesirable anticipated effects?
Questions: How substantial (large) are the undesirable anticipated effects (including harms to health and other harms) of the option (taking into account the severity or importance of the adverse effects and the number of people affected)?
Explanation: The greater the harm, the less likely it is that an option should be recommended.

Criterion: Do the desirable effects outweigh the undesirable effects?
Questions: Are the desirable effects large relative to the undesirable effects?
Explanation: The larger the desirable effects in relation to the undesirable effects, taking into account the values of those affected (i.e. the relative value they attach to the desirable and undesirable outcomes), the more likely it is that an option should be recommended.

Criterion: How large are the resource requirements?
Questions: How large an investment of resources would the option require or save?
Explanation: The greater the cost, the less likely it is that an option should be a priority. Conversely, the greater the savings, the more likely it is that an option should be a priority.

Criterion: How large is the incremental cost relative to the net benefit?
Questions: Is the cost small relative to the net benefits (benefits minus harms)?
Explanation: The greater the cost per unit of benefit, the less likely it is that an option should be a priority.

Criterion: What would be the impact on health inequities?
Questions: Would the option reduce or increase health inequities?
Explanation: Policies or programmes that reduce inequities are more likely to be a priority than ones that do not (or ones that increase inequities).

Criterion: Is the option acceptable to key stakeholders?
Questions: Are key stakeholders likely to find the option acceptable (given the relative importance they attach to the desirable and undesirable consequences of the option; the timing of the benefits, harms and costs; and their moral values)?
Explanation: The less acceptable an option is to key stakeholders, the less likely it is that it should be recommended, or if it is recommended, the more likely it is that the recommendation should include an implementation strategy to address concerns about acceptability. Acceptability might reflect who benefits (or is harmed) and who pays (or saves); and when the benefits, adverse effects, and costs occur (and the discount rates of key stakeholders; e.g. politicians may have a high discount rate for anything that occurs beyond the next election). Unacceptability may be due to some stakeholders:
● Not accepting the distribution of the benefits, harms and costs
● Not accepting costs or undesirable effects in the short term for desirable effects (benefits) in the future
● Attaching more value (relative importance) to the undesirable consequences than to the desirable consequences or costs of an option (because of how they might be affected personally or because of their perceptions of the relative importance of consequences for others)
● Morally disapproving (i.e. in relationship to ethical principles such as autonomy, nonmaleficence, beneficence or justice)

Criterion: Is the option feasible to implement?
Questions: Can the option be accomplished or brought about?
Explanation: The less feasible (capable of being accomplished or brought about) an option is, the less likely it is that it should be recommended (i.e. the more barriers there are that would be difficult to overcome).

¹ The "certainty of the evidence" is an assessment of the likelihood that the effect will be substantially different from what the research found.

Explanations of the conclusions in the framework

Suggestions for how to make judgements in relation to each conclusion are provided in: Framework for going from evidence to a recommendation – Guidance for health system and public health recommendations. For each conclusion, we suggest one or more questions to consider when making a judgement and explain what is needed.

Columns: Term | Question | Explanation

Term: Overall judgement across all criteria
Question: What is the overall balance between all the desirable and undesirable consequences?
Explanation: An overall judgement whether the desirable consequences outweigh the undesirable consequences, or vice versa (based on all the research evidence and additional information considered in relation to all the criteria). Consequences include health and other benefits, adverse effects and other harms, resource use, and impacts on equity.

Term: Type of recommendation
Question: Based on the balance of the consequences in relation to all of the criteria in the framework, what is your recommendation?
Explanation: A recommendation based on the balance of consequences and your judgements in relation to all of the criteria, for example:
● Not to implement the option
● To consider the option only in the context of rigorous research
● To consider the option only with specified monitoring and evaluation
● To consider the option only in specified contexts
● To implement the option

Term: Recommendation (text)
Question: What is your recommendation in plain language?
Explanation: A concise, clear and actionable recommendation

Term: Justification
Question: What is the justification for the recommendation, based on the criteria in the framework that drove the recommendation?
Explanation: A concise summary of the reasoning underlying the recommendation

Term: Subgroup considerations
Question: What, if any, subgroups were considered and what, if any, specific factors (based on the criteria in the framework) should be considered in relation to those subgroups when implementing the option?
Explanation: A concise summary of the subgroups that were considered and any modifications of the recommendation in relation to any of those subgroups

Term: Implementation considerations
Question: What should be considered when implementing the option, including strategies to address concerns about acceptability and feasibility?
Explanation: Key considerations, including strategies to address concerns about acceptability and feasibility, when implementing the option

Term: Monitoring and evaluation considerations
Question: What indicators should be monitored? Is there a need to evaluate the impacts of the option, either in a pilot study or an impact evaluation carried out alongside or before full implementation of the option?
Explanation: Any important indicators that should be monitored if the option is implemented

Term: Research priorities
Question: Are there any important uncertainties in relation to any of the criteria that are a priority for further research?
Explanation: Any research priorities

Explanations of terms used in summaries of findings

Columns: Term | Explanation

Term: Outcomes
Explanation: These are all the outcomes (potential benefits or harms) that are considered to be important to those affected by the intervention, and which are important to making a recommendation or decision. Consultation with those affected by an intervention (such as patients and their carers) or other members of the public may be used to select the important outcomes. A review of the literature may also be carried out to inform the selection of the important outcomes. The importance (or value) of each outcome in relation to the other outcomes should also be considered. This is the relative importance of the outcome.

Term: 95% Confidence Interval (CI)
Explanation: A confidence interval is a range around an estimate that conveys how precise the estimate is. The confidence interval is a guide to how sure we can be about the quantity we are interested in. The narrower the range between the two numbers, the more confident we can be about what the true value is; the wider the range, the less sure we can be. The width of the confidence interval reflects the extent to which chance may be responsible for the observed estimate (with a wider interval reflecting more chance). 95% Confidence Interval (CI) means that we can be 95 percent confident that the true size of effect is between the lower and upper confidence limit. Conversely, there is a 5 percent chance that the true effect is outside of this range.

Term: Relative Effect or RR (Risk Ratio)
Explanation: Here the relative effect is expressed as a risk ratio (RR). Risk is the probability of an outcome occurring. A risk ratio is the ratio between the risk in the intervention group and the risk in the control group. For example, if the risk in the intervention group is 1% (10 per 1000) and the risk in the control group is 10% (100 per 1000), the relative effect is 10/100 or 0.10. If the RR is exactly 1.0, this means that there is no difference between the occurrence of the outcome in the intervention and the control group. If the RR is greater than 1.0, the intervention increases the risk of the outcome. If it is a good outcome (for example, the birth of a healthy baby), a RR greater than 1.0 indicates a desirable effect for the intervention. Whereas, if the outcome is bad (for example, death) a RR greater than 1.0 would indicate an undesirable effect. If the RR is less than 1.0, the intervention decreases the risk of the outcome. This indicates a desirable effect, if it is a bad outcome (for example, death) and an undesirable effect if it is a good outcome (for example, birth of a healthy baby).

Term: Certainty of the evidence (GRADE)²
Explanation: The certainty of the evidence is an assessment of how good an indication the research provides of the likely effect; i.e. the likelihood that the effect will be substantially different from what the research found. By substantially different we mean a large enough difference that it might affect a decision. This assessment is based on an overall assessment of reasons for there being more or less certainty using the GRADE approach. In the context of decisions, these considerations include the applicability of the evidence in a specific context. Other terms may be used synonymously with certainty of the evidence, including quality of the evidence, confidence in the estimate, and strength of the evidence. Definitions of the categories used to rate the certainty of the evidence (high, moderate, low, and very low) are provided in the table below.

Definitions for ratings of the certainty of the evidence

Columns: Ratings | Definitions

High: This research provides a very good indication of the likely effect. The likelihood that the effect will be substantially different is low.

Moderate: This research provides a good indication of the likely effect. The likelihood that the effect will be substantially different is moderate.

Low: This research provides some indication of the likely effect. However, the likelihood that it will be substantially different (a large enough difference that it might have an effect on a decision) is high.

Very Low: This research does not provide a reliable indication of the likely effect. The likelihood that the effect will be substantially different (a large enough difference that it might have an effect on a decision) is very high.


7. The GRADE approach for diagnostic tests and strategies

Recommendations concerning diagnostic testing share the fundamental logic of recommendations for therapeutic and other interventions, such as screening. However, diagnostic questions also present unique challenges.

While some tests naturally report positive and negative results (e.g., pregnancy, HIV infection), other tests report their results as ordinal (e.g., Glasgow Coma Scale or Mini-Mental State Examination) or continuous variables (e.g., metabolic measures), usually with increasing likelihood of disease or adverse events as the test results become more extreme. For simplicity, in this discussion we generally assume a diagnostic approach that ultimately categorizes test results as positive or negative. This also recognizes that many tests ultimately lead to dichotomized decisions to treat or not to treat.

Clinicians and researchers often administer diagnostic tests as a package or strategy composed of several tests. Thus, one can often think of evaluating or recommending a diagnostic strategy rather than a single test.

Examples
1. In managing patients with a diagnosis of cervical intraepithelial neoplasia, a precursor of cervical cancer, based on visual inspection with acetic acid (VIA), clinicians may proceed to treatment directly or apply a strategy of testing for human papillomavirus and VIA.

2. A testing strategy may use an initial sensitive but nonspecific test which, if positive, is followed by a more specific test (e.g., testing for HIV includes the use of an ELISA test followed by quantitative HIV RNA determination for those with positive results of the ELISA test; but one could ask why quantitative HIV RNA determination alone would not be appropriate).
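
The sequential strategy in example 2 can be illustrated with a small calculation. The sketch below is not part of the GRADE approach itself: it assumes that the strategy is called positive only when both tests are positive, and that the two tests err independently of each other given the true disease status (a strong assumption that often does not hold in practice). The function name and the accuracy values are hypothetical and chosen only for illustration.

```python
def serial_strategy_accuracy(sens_first, spec_first, sens_second, spec_second):
    """Accuracy of a 'second test only if the first is positive' strategy.

    The strategy is positive only if both tests are positive.
    Assumption: the tests are conditionally independent given disease status.
    """
    # A diseased patient is detected only if both tests detect the disease.
    combined_sens = sens_first * sens_second
    # A healthy patient is a false positive only if both tests are falsely positive.
    combined_spec = 1 - (1 - spec_first) * (1 - spec_second)
    return combined_sens, combined_spec


# Hypothetical values: a sensitive but nonspecific first test,
# followed by a more specific second test.
strategy_sens, strategy_spec = serial_strategy_accuracy(0.99, 0.90, 0.95, 0.99)
print(f"strategy sensitivity = {strategy_sens:.4f}")
print(f"strategy specificity = {strategy_spec:.4f}")
```

Under these assumptions the strategy gains specificity at the cost of some sensitivity, which is one reason a panel may prefer to evaluate the strategy as a whole rather than each test in isolation.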

7.1. Questions about diagnostic tests

The format of the question asked by authors of systematic reviews or guideline developers follows the same principles as the format for management questions:
Should TEST A vs. TEST B be used in SOME PATIENTS/POPULATION?
Should TEST A vs. TEST B be used for SOME PURPOSE?

7.1.1. Establishing the purpose of a test

Guideline panels should be explicit about the purpose of the test in question. Researchers and clinicians apply medical tests that are usually referred to as “diagnostic” (including signs and symptoms, imaging, biochemistry, pathology, and psychological testing) for a number of purposes. These applications include identifying physiological derangements, establishing prognosis, monitoring illness and treatment response, screening and diagnosis.

7.1.2. Establishing the role of a test

Guideline panels and authors of systematic reviews should also clearly establish the role of a diagnostic test or strategy. This process should begin with determining the standard diagnostic pathway (or pathways) for the target patient presentation and identifying the associated limitations. Knowing those limitations, one can identify particular shortcomings for which the alternative diagnostic test or strategy offers a putative remedy. The role of a test under consideration may be (i) replacement (e.g., of tests with greater burden, invasiveness, cost, or inferior accuracy), (ii) triage (e.g., to minimize use of an invasive or expensive test) or (iii) add-on (e.g., to further enhance diagnostic accuracy beyond the existing diagnostic pathway) (Table 7.1) [Bossuyt 2006; PMID: 16675820].

Table 7.1. Possible roles of new diagnostic tests
Replacement   A new test might substitute an old one, because it is more accurate, less invasive, less risky or uncomfortable for patients, organizationally or technically less challenging, quicker to yield results or more easily interpreted, or less costly.
Triage        A new test is added before the existing diagnostic pathway and only patients with a particular result on the triage test continue the testing pathway; triage tests are not necessarily more accurate but usually simpler and less costly.
Add-on        A new test is added after the existing diagnostic pathway and may be used to limit the number of either false positive or false negative results after the existing diagnostic pathway; add-on tests are usually more accurate but otherwise less attractive than existing tests.

7.1.3. Clear clinical questions

Clearly establishing the role or purpose of a test or test strategy will lead to the identification of sensible clinical questions that, similar to other management problems, have four components: patients, diagnostic intervention (strategy), comparison diagnostic intervention (strategy), and the outcomes of interest.

Examples
1: In patients suspected of coronary artery disease (patients), should multislice spiral computed tomography (CT) of coronary arteries (intervention) be used as replacement for conventional invasive coronary angiography (comparison) to lower complications with acceptable rates of false negatives associated with coronary events and false positives leading to unnecessary treatment and complications (outcomes)?
This example illustrates one common rationale for a new test: test replacement (coronary CT instead of conventional angiography) to avoid complications associated with a more invasive and expensive alternative for a condition that can effectively be treated. In this situation, the new test would only need to replicate the results of the existing test to demonstrate greater patient net benefit. This assumes that the new test similarly categorizes patients at the same stage of the disease and that the consequences of the test result, i.e. management decisions and outcomes, are similar.
2: In patients suspected of cow’s milk allergy (CMA), should skin prick tests rather than an oral food challenge with cow’s milk be used for the diagnosis and management of IgE-mediated CMA?
3: In adults cared for in a non-specialized clinical setting, should serum or plasma cystatin C rather than serum creatinine concentration be used for the diagnosis and management of renal impairment?
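
The four components of a diagnostic question (patients, diagnostic intervention, comparison, outcomes) can be recorded in a simple structure so that each question is stated completely. The sketch below is only an illustration; the class name, field names and wording template are hypothetical and are not prescribed by GRADE.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticQuestion:
    """Hypothetical container for the four components of a diagnostic question."""
    patients: str
    index_test: str        # diagnostic intervention (strategy)
    comparison_test: str   # comparison diagnostic intervention (strategy)
    outcomes: str

    def render(self) -> str:
        # Assemble the components into a single question sentence.
        return (
            f"In {self.patients}, should {self.index_test} rather than "
            f"{self.comparison_test} be used (outcomes: {self.outcomes})?"
        )


question = DiagnosticQuestion(
    patients="patients suspected of coronary artery disease",
    index_test="multislice spiral CT of coronary arteries",
    comparison_test="conventional invasive coronary angiography",
    outcomes="complications, false negatives, false positives",
)
question_text = question.render()
print(question_text)
```

Keeping the components in separate fields makes it easy to spot a question that lacks a comparison or stated outcomes before the evidence review begins.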

7.2. Gold standard and reference test

The concept of diagnostic accuracy relies on the presence of a so-called “gold standard”, i.e. a clearly stated definition of the target disease (i.e. construct of a disease). However, the term “gold standard” is ambiguous and not consistently defined. Moreover, constructs of diseases are constantly changing with progress in understanding biology (e.g. in oncology, with a more molecular understanding of the underlying pathologies, or in Alzheimer’s dementia). We will use the term “gold standard” here as representing the “perfect” approach to defining or diagnosing the disease or condition of interest, even if the approach is theoretical and based on convention. Following from this definition, diagnostic test accuracy (e.g. sensitivity and specificity) as a measurement property is not associated with a “gold standard”. We will use the term “reference standard” or reference test for the test or test strategy that is the current best and accepted approach to making a diagnosis against which a comparison (with an index test) may be made.
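
As a reminder of how sensitivity and specificity are obtained when an index test is compared with a reference standard, the following minimal sketch computes both from a two-by-two table. The function name and the counts are hypothetical; the calculation treats the reference standard result as the truth, which is itself an assumption, as the paragraph above points out.

```python
def accuracy_from_counts(tp, fp, fn, tn):
    """Sensitivity and specificity of an index test against a reference standard.

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives and true negatives, classified by the reference standard.
    """
    sensitivity = tp / (tp + fn)  # proportion of reference-positive patients detected
    specificity = tn / (tn + fp)  # proportion of reference-negative patients cleared
    return sensitivity, specificity


# Hypothetical counts for illustration only.
index_sens, index_spec = accuracy_from_counts(tp=90, fp=45, fn=10, tn=855)
print(f"sensitivity = {index_sens:.2f}, specificity = {index_spec:.2f}")
```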

7.3. Estimating impact on patients

It follows that recommendations regarding the use of medical tests require inferences about the consequences of falsely identifying patients as having or not having the disease. If a test fails to improve patient-important outcomes there is no reason to use it, whatever its accuracy. Given the uncertainties about both reference and gold standards and the relation between diagnosis and patient or population consequences, the best way to assess a diagnostic test or strategy would be a test-treat randomized controlled trial in which investigators allocate patients to experimental or control diagnostic approaches and measure patient-important outcomes (mortality, morbidity, symptoms, quality of life and resource use).

Figure 1. Generic study designs that guideline developers can use to evaluate the impact of testing.
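
When only accuracy data are available, the inference about consequences of correct and false classifications can be made concrete by translating sensitivity and specificity into expected numbers of patients in each category. The sketch below does this for a hypothetical group of 1000 tested patients; the function name, prevalence and accuracy values are assumptions chosen for illustration, not data from any study.

```python
def classifications_per_n(prevalence, sensitivity, specificity, n=1000):
    """Expected true/false positives and negatives among n tested patients."""
    diseased = prevalence * n
    healthy = n - diseased
    true_pos = sensitivity * diseased
    false_neg = diseased - true_pos   # diseased patients missed by the test
    true_neg = specificity * healthy
    false_pos = healthy - true_neg    # healthy patients wrongly labelled
    return {"TP": true_pos, "FN": false_neg, "TN": true_neg, "FP": false_pos}


# Hypothetical inputs: 10% prevalence, 90% sensitivity, 95% specificity.
expected = classifications_per_n(prevalence=0.10, sensitivity=0.90, specificity=0.95)
for label, count in expected.items():
    print(f"{label}: {count:.0f} per 1000 tested")
```

A panel would then judge the patient-important consequences attached to each category, for example the harm of missed disease among false negatives and of unnecessary treatment among false positives.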