An Introduction to Critical
Appraisal
Isla Kuhn
Medical Librarian
Last updated: September
2014
Learning Outcomes
By the end of this session you will:
• Understand what Critical Appraisal is
• Be aware of some of the different types
of research
• Be able to interpret basic statistics within
a research paper
• Gain experience in critically appraising a
research paper
How do I Appraise?
• You don’t need to be a statistics expert
• Ready-made checklists help you focus on the
most important aspects of the article
• Different checklists are available for different types of
research (RCTs, systematic reviews, case-control
studies, etc.)
• Checklist for Qualitative research
• Available free from CASP
http://www.casp-uk.net
Critical Appraisal
Critical appraisal of any study design must assess:
Validity
Were sound scientific methods used?
Chance / Bias / Confounding Factors
Results
What are the results and how are they expressed?
Relevance
Are the findings generalisable – can they be
applied to settings / situations outside the
research study? Do these results apply to my local
context?
Event Rates
Number of people experiencing an event as a proportion of the
number of people in the group
• Form the basis of other calculations
• Control Event Rate (CER)
• Experimental Event Rate (EER)
Emerg Med J 2008 25: 26-29:
Proportion with recurrent headache (whole sample)
• CER = 12/31 = 39%
• EER = 8/30 = 27%
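The two event rates can be reproduced with a few lines of Python. This is a minimal sketch using the counts quoted on this slide; the function name event_rate is our own illustrative choice, not from any library.

```python
# Counts from the dexamethasone trial (Emerg Med J 2008 25: 26-29):
# patients with recurrent headache / patients in each group
control_events, control_total = 12, 31
treatment_events, treatment_total = 8, 30


def event_rate(events: int, total: int) -> float:
    """Proportion of people in a group who experienced the event."""
    return events / total


cer = event_rate(control_events, control_total)      # Control Event Rate
eer = event_rate(treatment_events, treatment_total)  # Experimental Event Rate

print(f"CER = {cer:.0%}")  # CER = 39%
print(f"EER = {eer:.0%}")  # EER = 27%
```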
Risk of benefit and harm
Relative Risk (RR) compares the risk of an event in 2 different groups of
people: it tells us how many times more (or less) likely it is that an event
will occur in the treatment group relative to the control group
• RR = EER / CER
• A Relative Risk of 1 means the risk is the same in each group
• <1 = treatment reduces risk of event
• >1 = treatment increases risk of event
27/39 = 0.69 = treatment reduces risk of event
The risk of recurrent headache in the treatment group is 0.69 times the
risk in the control group.
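As a sketch, the RR calculation in Python, assuming the trial counts quoted on the Event Rates slide. The function name relative_risk is illustrative. Working from the unrounded rates gives 0.689, which rounds to the same 0.69 as the slide's 27/39.

```python
cer = 12 / 31  # control event rate
eer = 8 / 30   # experimental event rate


def relative_risk(eer: float, cer: float) -> float:
    """RR = EER / CER: 1 = no difference, <1 = lower risk with treatment."""
    return eer / cer


rr = relative_risk(eer, cer)
print(f"RR = {rr:.2f}")  # RR = 0.69
```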
Risk continued
Absolute risk reduction (ARR)
Difference in risk between experimental and
control groups
Risk of Event in Control Group – Risk of Event
in intervention group
ARR=0 Treatment has no effect
ARR positive – Treatment is beneficial
ARR negative – Treatment is harmful
39% - 27% = 12%
Dexamethasone reduces the absolute risk of
recurrent headache by 12 percentage points (from 39% to 27%)
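A minimal sketch of the ARR calculation, again assuming the trial counts from the Event Rates slide; absolute_risk_reduction is an illustrative name. The sign convention follows the slide: positive means the treatment is beneficial.

```python
cer = 12 / 31  # control event rate
eer = 8 / 30   # experimental event rate


def absolute_risk_reduction(cer: float, eer: float) -> float:
    """ARR = CER - EER: positive = beneficial, 0 = no effect, negative = harmful."""
    return cer - eer


arr = absolute_risk_reduction(cer, eer)
print(f"ARR = {arr:.1%}")  # ARR = 12.0%
```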
Relative Risk Reduction (RRR)
tells us the reduction in the rate of the outcome in
the treatment group relative to that in the control
group
ARR / CER Or 1 – RR
0.12 / 0.39 = 0.31 = 31%
1-0.69 = 0.31 = 31%
Dexamethasone reduces the risk of recurrent
headache by 31% relative to that occurring in the
control group.
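The two RRR formulas on this slide always agree, because (CER - EER) / CER = 1 - EER / CER. The sketch below, assuming the trial counts from the Event Rates slide, checks this numerically.

```python
cer = 12 / 31       # control event rate
eer = 8 / 30        # experimental event rate
arr = cer - eer     # absolute risk reduction
rr = eer / cer      # relative risk

rrr_from_arr = arr / cer  # RRR = ARR / CER
rrr_from_rr = 1 - rr      # RRR = 1 - RR

print(f"{rrr_from_arr:.0%}, {rrr_from_rr:.0%}")  # 31%, 31%
```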
Absolute Risk Reduction & Relative Risk Reduction
Results of hypothetical trial of a new drug for
myocardial infarction
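The point of this slide can be illustrated with two invented scenarios (these are not the figures from the slide's table). Both halve the risk, so the RRR is identical, but the ARR differs twenty-fold depending on how common the event was to begin with.

```python
# Two INVENTED scenarios for illustration only: (CER, EER)
scenarios = {
    "low baseline risk": (0.02, 0.01),
    "high baseline risk": (0.40, 0.20),
}

results = {}
for name, (cer, eer) in scenarios.items():
    arr = cer - eer   # absolute risk reduction
    rrr = arr / cer   # relative risk reduction
    results[name] = (arr, rrr)
    print(f"{name}: ARR = {arr:.0%}, RRR = {rrr:.0%}")
# low baseline risk: ARR = 1%, RRR = 50%
# high baseline risk: ARR = 20%, RRR = 50%
```

This is why a large RRR can sound impressive even when the ARR, and so the benefit to an individual patient, is small.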
Numbers Needed to Treat
Measures the impact of a treatment or intervention
States how many patients need to be treated in order to
prevent one additional adverse event that would otherwise occur.
NNT = 10 means that 10 patients need to be treated to prevent
one adverse outcome
The closer to 1 the better
Calculation:
• 1 / ARR (if ARR expressed as a proportion)
• 100 / ARR (if ARR expressed as a %)
• 100/12 = 8.3, rounded up to 9 (NNT is conventionally rounded up to the next whole patient)
P Values
• P = Probability
• A p-value is a measure of statistical significance: it tells us the
probability of getting a result at least as extreme as the one observed
if chance alone were at work (i.e. if there were really no difference between the groups)
In simple terms, probability (p-value) can only take values between 0 and 1:
0 = impossible ... 1 = absolutely certain
If p=0.001, a result this extreme would arise by chance alone only about 1 time in 1000: very unlikely
If p=0.05, it would arise by chance alone about 1 time in 20: fairly unlikely
If p=0.5, it would arise by chance alone about 1 time in 2: fairly likely
If p=0.75, it would arise by chance alone about 3 times in 4: very likely
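A permutation test is one way to see what a p-value measures. The sketch below is a stand-in, not the test the paper's authors used: it pools the 61 patients' outcomes from the dexamethasone trial, repeatedly shuffles them between a "control" group of 31 and a "treatment" group of 30 (a world where the drug makes no difference), and counts how often the shuffled difference in event rates is at least as large as the real one. The number of shuffles and the seed are arbitrary choices.

```python
import random


def permutation_p_value(events_a, n_a, events_b, n_b, n_shuffles=10_000, seed=1):
    """Two-sided permutation p-value for a difference in event rates."""
    rng = random.Random(seed)
    total_events = events_a + events_b
    outcomes = [1] * total_events + [0] * (n_a + n_b - total_events)
    observed = abs(events_a / n_a - events_b / n_b)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(outcomes)  # random relabelling: chance alone at work
        diff = abs(sum(outcomes[:n_a]) / n_a - sum(outcomes[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    return extreme / n_shuffles


p = permutation_p_value(12, 31, 8, 30)
print(f"p is approximately {p:.2f}")
```

With these data the result should come out at roughly 0.4: in the same range as the published p = 0.47, though not identical, since a different test was used.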
Confidence intervals:
“The recurrent headache rate in the control group was 39% (12/31, 95% CI
22% to 57%) compared with 27% (8/30, 95% CI 13% to 46%) in the
dexamethasone group (relative risk (RR) 0.69, 95% CI 0.33 to 1.45;
p=0.47)”.
Why 95%? The confidence level describes the method rather than any one result:
if you repeated the same study many times and calculated a 95% CI each time,
about 95% of those intervals would contain the true value. Here the plausible
range for the RR runs from 0.33 to 1.45. CIs are typically reported at 95%, but
when presented graphically they are sometimes shown as 50%, 95% and 99% intervals
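The repeated-study meaning of "95%" can be checked by simulation. This sketch assumes normally distributed data with a known standard deviation (so a simple z-based interval for a mean applies); the true mean, SD, sample size and number of simulated studies are arbitrary illustrative values. The true value stays fixed while the intervals vary from study to study.

```python
import random


def ci_coverage(true_mean=5.0, sd=2.0, n=30, n_studies=2000, seed=42):
    """Fraction of simulated studies whose 95% CI contains the true mean."""
    rng = random.Random(seed)
    half_width = 1.96 * sd / n ** 0.5  # z-based 95% interval, SD assumed known
    hits = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        sample_mean = sum(sample) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / n_studies


coverage = ci_coverage()
print(f"{coverage:.1%} of the simulated 95% intervals contain the true mean")
```

The printed figure should land close to 95%, not at exactly 95%, because each run is a finite random sample of studies.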
Confidence Intervals
An alternative way of assessing the effects of chance
The result of the trial is a “point estimate”: if you ran
the trial again you would probably get a slightly different result
The Confidence Interval gives the range in which the
true answer plausibly lies
Loosely, the 95% CI is the range in which we are 95% confident
that the true population value lies
Look at how wide the interval is, and the values at
each end
E.g. RR = 0.69, 95% CI 0.33 to 1.45
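One common way to compute a CI for a relative risk is the log (Katz) method, sketched below with the dexamethasone counts. We do not know which method the paper used; this sketch gives roughly 0.33 to 1.44, very close to the published 0.33 to 1.45. The small difference presumably comes from rounding or a different method.

```python
import math


def rr_confidence_interval(a, n1, c, n2, z=1.96):
    """RR with a 95% CI by the log method.

    a/n1: events/total in the treatment group; c/n2: events/total in control.
    """
    rr = (a / n1) / (c / n2)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)  # SE of ln(RR)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


rr, lower, upper = rr_confidence_interval(8, 30, 12, 31)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
# RR = 1 is "no effect": an interval spanning 1 is not statistically significant
print("crosses 1" if lower < 1 < upper else "excludes 1")
```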
Forest Plot – Simple Example
[Figure: forest plot labelled with individual sample size, confidence interval, best estimate, combined results and line of no effect; axis labelled "Favours Treatment" and "Favours Control"]
The shorter the Confidence Interval (CI), the more precise
the estimate and the more confident we can be in the result
If the CI crosses the line of no effect, then the
results of that study are not statistically significant
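The numbers behind a forest plot can be sketched as follows. The three trials are entirely invented, and the combined result (the diamond) uses fixed-effect inverse-variance pooling of log relative risks, which is one common approach; a real meta-analysis may use a different model, such as random effects.

```python
import math

# Invented counts for three HYPOTHETICAL trials (not real data):
# (treatment events, treatment total, control events, control total)
studies = [(10, 50, 15, 50), (20, 120, 30, 118), (6, 40, 9, 42)]
Z = 1.96  # multiplier for a 95% interval


def log_rr_and_variance(a, n1, c, n2):
    """Log relative risk and its approximate variance."""
    return math.log((a / n1) / (c / n2)), 1 / a - 1 / n1 + 1 / c - 1 / n2


def interval(log_estimate, variance):
    """95% CI, converted from the log scale back to the RR scale."""
    half = Z * math.sqrt(variance)
    return math.exp(log_estimate - half), math.exp(log_estimate + half)


rows = []  # per study: (log RR, variance, lower, upper)
for a, n1, c, n2 in studies:
    log_rr, var = log_rr_and_variance(a, n1, c, n2)
    lower, upper = interval(log_rr, var)
    rows.append((log_rr, var, lower, upper))
    verdict = "crosses" if lower < 1 < upper else "is clear of"  # no effect: RR = 1
    print(f"RR {math.exp(log_rr):.2f} ({lower:.2f} to {upper:.2f}) {verdict} the line of no effect")

# Weight each study by 1/variance: larger, more precise studies count for more
weights = [1 / var for _, var, _, _ in rows]
pooled_log = sum(w * row[0] for w, row in zip(weights, rows)) / sum(weights)
pooled_var = 1 / sum(weights)
pooled_lower, pooled_upper = interval(pooled_log, pooled_var)
print(f"Combined RR {math.exp(pooled_log):.2f} ({pooled_lower:.2f} to {pooled_upper:.2f})")
```

With these invented numbers every individual CI crosses 1, but the combined CI does not: pooling several small studies can give a more precise answer than any one of them.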
Quantitative critical appraisal october 2015
Heterogeneity – what is it?
• Heterogeneity is relevant to statistical meta-analysis, so you are most likely to
come across it in a systematic review. It occurs when multiple studies of an
effect are actually measuring somewhat different effects because of differences
in subject population, intervention, choice of analysis, experimental design,
etc. This can cause problems when trying to summarise what the studies
mean taken together.
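Reviews often quantify heterogeneity with Cochran's Q (a chi-square statistic on k - 1 degrees of freedom, where k is the number of studies) and I², the share of the variation between studies that is attributed to real differences rather than chance. This is a minimal sketch of one common approach; the study estimates below are invented log relative risks.

```python
def heterogeneity(estimates, variances):
    """Cochran's Q, its degrees of freedom and I² (fixed-effect weights).

    estimates: per-study effect estimates (e.g. log relative risks)
    variances: their variances
    """
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Q: weighted sum of squared distances of each study from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I²: truncated at 0 when Q is no larger than its degrees of freedom
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2


# Invented log relative risks and variances for three HYPOTHETICAL studies
q, df, i2 = heterogeneity([-0.5, -0.1, 0.3], [0.04, 0.05, 0.06])
print(f"Q = {q:.2f} on {df} df, I² = {i2:.0%}")
```

Studies that all report the same effect give Q close to 0 and I² = 0%; the more the estimates disagree relative to their precision, the larger both become.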
What is df?
• Degrees of freedom – frequently reported alongside the Chi² test.
• The number of independent values that are free to vary when a statistic
is calculated
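For a chi-square test on a table of counts, df = (rows - 1) × (columns - 1). The sketch below shows why a 2x2 table has 1 degree of freedom: once the row and column totals are fixed, choosing one cell forces the other three. It uses the dexamethasone trial's table (rows: control, dexamethasone; columns: recurrent headache yes / no); the function names are our own.

```python
def contingency_df(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for a chi-square test on an r x c table."""
    return (n_rows - 1) * (n_cols - 1)


def fill_2x2(a: int, row_totals, col_totals):
    """Given the margins and the top-left cell, the other three cells are forced."""
    b = row_totals[0] - a
    c = col_totals[0] - a
    d = row_totals[1] - c
    return [[a, b], [c, d]]


print(contingency_df(2, 2))                # 1
print(fill_2x2(12, (31, 30), (20, 41)))    # [[12, 19], [8, 22]]
```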
What is Chi²?
• The chi-square test is used to determine whether there is a significant
difference between the expected frequencies and the observed
frequencies in one or more categories. Does the number of individuals or
objects that fall in each category differ significantly from the number you
would expect? Is this difference between the expected and observed
frequencies due to sampling error, or is it a real difference?
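A sketch of the chi-square test on the dexamethasone trial's 2x2 table, using only Python's standard library. We cannot tell from the slides which test the authors used for p = 0.47. The plain chi-square gives p of about 0.32. With Yates' continuity correction (an option often applied to 2x2 tables), it gives about 0.47, consistent with the quoted value.

```python
import math


def chi_square_2x2(table, yates=False):
    """Chi-square test for a 2x2 table; returns (statistic, p-value)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and outcome were unrelated
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # Yates' continuity correction
            stat += diff ** 2 / expected
    # 1 degree of freedom: the statistic is the square of a standard normal,
    # so P(chi-square >= stat) = erfc(sqrt(stat / 2))
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Rows: control, dexamethasone; columns: recurrent headache yes / no
table = [[12, 19], [8, 22]]
stat, p = chi_square_2x2(table)
stat_y, p_y = chi_square_2x2(table, yates=True)
print(f"Uncorrected: chi-square = {stat:.2f}, p = {p:.2f}")
print(f"Yates-corrected: chi-square = {stat_y:.2f}, p = {p_y:.2f}")
```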
How do I understand and interpret
different statistical information?
• The short answer is, you don’t have to understand it, you only need to look at
the p value ☺
• As a general rule, remember the following:
• Statistics that describe data – percentages, mean, median, mode, standard
deviation
• Statistics that test confidence – confidence intervals, p values
• Statistics that test difference – t-tests and other parametric tests, Mann-Whitney
and other non-parametric tests, Chi² test
• Statistics that compare risk – risk and odds ratios, risk reduction and numbers
needed to treat
Source: Medical and Health Science Statistics Made Easy by Michael Harris and
Gordon Taylor
Conclusion
Critical Appraisal is part of Evidence Based
Healthcare
It takes practice
Use CASP checklists
Depth of Appraisal is your choice
Only you can assess usefulness
Useful websites
www.healthknowledge.org.uk/interactive-learning/finding-and-appraising-the
www.thennt.com/
www.casp-uk.net/
www.wikipedia.org
http://www.nhs.uk/news/Pages/NewsIndex.aspx NHS Choices Behind the
Headlines
Help!
Isla Kuhn
Medical Librarian
Medical Library
Box 111
Addenbrooke’s Hospital
email: ilk21@cam.ac.uk
twitter: @ilk21
phone: (01223 3) 36750
web: library.medschl.cam.ac.uk
Thank you.

Editor's Notes

  1. Look at different types of study and introduce the Weight of Evidence Game next.
  2. The figures are taken from Table 3 on page 3, under the column entitled ‘Proportion with recurrent headache’. The figures given in that row will form the basis of the statistical analysis we will carry out today.
  3. The likelihood of an event occurring. < = less than, > = greater than
  4. The difference in risk between the groups.
  5. The reduction in the rate of the outcome (result) in the Dexamethasone group relative to that in the placebo group. What do you immediately notice about these findings?
  6. This slide may help you to understand the difference between ARR and RRR better. ARR is a straightforward subtraction of the event rate in one group from the event rate in the other. RRR expresses that difference as a proportion of the control event rate, either as a fraction or as a percentage. So RRR will often appear more impressive than ARR. See also Behind the Headlines on NHS Choices.
  7. The number needed to treat (NNT) is an epidemiological measure (epidemiology is the science that studies the patterns, causes and effects of health and disease conditions in defined populations) used in assessing the effectiveness of a health-care intervention, typically a treatment with medication. The NNT is the average number of patients who need to be treated to prevent one additional bad outcome (i.e. the number of patients that need to be treated for one to benefit compared with a control in a clinical trial). The ideal NNT is 1, where everyone improves with treatment and no one improves with control. The higher the NNT, the less effective the treatment. But what counts as an acceptable NNT varies with the circumstances, e.g. vaccinations given to large populations.
  8. So, with a p value of 0.47, a difference as large as ours would turn up nearly half the time by chance alone: the result is not statistically significant.
  9. Generally, the wider the CI, the less precise the result, but bear in mind sample and study size: small studies tend to give wide intervals. The CI is easier to interpret when it is shown graphically (see next slide). For a ratio such as the RR, the line of no effect is drawn at 1. If the CI crosses this line, which it does in our study (0.33 to 1.45), the results are not statistically significant, because the range of plausible values includes both a reduction and an increase in risk.
  10. This is just an example to help you see how confidence intervals work. Each study is represented by a square, with the horizontal lines showing the Confidence Intervals. The size of each square is proportional to the study's sample size. The shorter the confidence interval, the more confident we are of the results. If the Confidence Interval crosses the line of no effect, then the results of the study are not statistically significant. For the combined result, significance is achieved at the set level if the diamond is clear of the ‘line of no effect’. In this made-up example, the position of the CI in most cases would indicate that the treatment has not been shown to be effective. Bear in mind: if the results are not statistically significant, it does not automatically follow that they are not clinically significant.
  11. Here is another example showing the CI more clearly. We will look at some of the other statistical terms mentioned here: heterogeneity, degrees of freedom and the chi-squared test.
  12. This is relevant to this type of study because it is a meta-analysis. It would not have been appropriate to use it in our dexamethasone study, as that was a stand-alone study.