This document provides an introduction to critical appraisal of research articles. It explains that critical appraisal assesses the validity, results, and relevance of studies. Key aspects include evaluating study design, interpreting basic statistics and event rates, and determining applicability of results. Ready-made checklists can help appraise different types of research studies. Understanding concepts like confidence intervals, p-values, and risk ratios is important for interpretation. Practice is needed to develop critical appraisal skills.
Quantitative critical appraisal october 2015
1. An Introduction to Critical
Appraisal
Isla Kuhn
Medical Librarian
Last updated: September
2014
2. Learning Outcomes
By the end of this session you will:
• Understand what Critical Appraisal is
• Be aware of some of the different types
of research
• Be able to interpret basic statistics within
a research paper
• Gain experience in critically appraising a
research paper
3. How do I Appraise?
• You don’t need to be a statistics expert
• Ready-made checklists help you focus on the
most important aspects of the article
• Different checklists available for different types of
research (RCTs, systematic reviews, case-control
studies, etc).
• Checklist for Qualitative research
• Available free from CASP
http://www.casp-uk.net
4. Critical Appraisal
Critical appraisal of any study design must assess:
Validity
Were sound scientific methods used?
Chance / Bias / Confounding Factors
Results
What are the results and how are they expressed?
Relevance
Are the findings generalisable – can they be
applied to settings / situations outside the
research study? Do these results apply to my local
context?
5. Event Rates
Number of people experiencing an event as a proportion of the
number of people in the population
• Form the basis of other calculations
Control Event Rate (CER)
Experimental Event Rate (EER)
Emerg Med J 2008 25: 26-29:
Proportion with recurrent headache (whole sample)
CER = 12/31 = 39%
EER = 8/30 = 27%
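The two event rates above can be reproduced with a few lines of Python. This is a minimal sketch using the counts quoted on this slide; the function name `event_rate` is just an illustrative choice.

```python
def event_rate(events: int, total: int) -> float:
    """Proportion of people in a group who experienced the event."""
    if total <= 0:
        raise ValueError("total must be positive")
    return events / total

# Counts quoted on the slide: recurrent headache, whole sample
cer = event_rate(12, 31)  # control group
eer = event_rate(8, 30)   # experimental (treatment) group

print(f"CER = {cer:.0%}")  # CER = 39%
print(f"EER = {eer:.0%}")  # EER = 27%
```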
6. Risk of benefit and harm
Relative Risk (RR) = compares the risk in 2 different groups of
people
tells us how many times more likely it is that an event will occur
in the treatment group relative to the control group
EER / CER
Relative Risk of 1 means the risk is the same in each group
<1 = treatment reduces risk of event
>1 = treatment increases risk of event
27/39 = 0.69 = treatment reduces risk of event
Risk of headache in the treatment group is 0.69 times the risk in
the control group (i.e. 31% lower).
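As a quick check of the arithmetic, the relative risk can be computed straight from the raw counts. This is a sketch only; `relative_risk` is an illustrative helper name, and the counts are the ones quoted on the Event Rates slide.

```python
def relative_risk(exp_events: int, exp_total: int,
                  ctrl_events: int, ctrl_total: int) -> float:
    """RR = EER / CER, computed from the raw counts in each group."""
    eer = exp_events / exp_total
    cer = ctrl_events / ctrl_total
    return eer / cer

rr = relative_risk(8, 30, 12, 31)
print(f"RR = {rr:.2f}")  # RR = 0.69

if rr < 1:
    print("Treatment reduces risk of event")
elif rr > 1:
    print("Treatment increases risk of event")
else:
    print("Risk is the same in each group")
```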
7. Risk continued
Absolute risk reduction (ARR)
Difference in risk between experimental and
control groups
Risk of Event in Control Group – Risk of Event
in intervention group
ARR=0 Treatment has no effect
ARR positive – Treatment is beneficial
ARR negative – Treatment is harmful
39% - 27% = 12%
Dexamethasone reduces the absolute risk of
recurrent headache by 12%
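The same subtraction, sketched in Python from the raw counts (the helper name `absolute_risk_reduction` is illustrative):

```python
def absolute_risk_reduction(ctrl_events: int, ctrl_total: int,
                            exp_events: int, exp_total: int) -> float:
    """ARR = CER - EER (positive when the treatment group does better)."""
    cer = ctrl_events / ctrl_total
    eer = exp_events / exp_total
    return cer - eer

arr = absolute_risk_reduction(12, 31, 8, 30)
print(f"ARR = {arr:.0%}")  # ARR = 12%
```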
8. Relative Risk Reduction (RRR)
tells us the reduction in the rate of the outcome in
the treatment group relative to that in the control
group
ARR / CER or 1 – RR
0.12 / 0.39 = 0.31 = 31%
1-0.69 = 0.31 = 31%
Dexamethasone reduces the risk of recurrent
headache by 31% relative to that occurring in the
control group.
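Both routes to the RRR can be checked against each other. A small sketch, again using the counts from the Event Rates slide:

```python
# Event rates from the raw counts
cer = 12 / 31  # control event rate
eer = 8 / 30   # experimental event rate

arr = cer - eer  # absolute risk reduction
rr = eer / cer   # relative risk

rrr_from_arr = arr / cer  # RRR = ARR / CER
rrr_from_rr = 1 - rr      # RRR = 1 - RR

print(f"RRR (ARR / CER) = {rrr_from_arr:.0%}")  # 31%
print(f"RRR (1 - RR)    = {rrr_from_rr:.0%}")   # 31%
```

The two formulas are algebraically identical, so they agree apart from floating-point rounding.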
9. Absolute Risk Reduction & Relative Risk
Reduction
Results of hypothetical trial of a new drug for
myocardial infarction
10. Numbers Needed to Treat
Measures the impact of a treatment or intervention
States how many patients need to be treated in order to
prevent an event which would otherwise occur.
NNT = 10 means that 10 patients need to be treated to prevent
one adverse outcome
The closer to 1 the better
Calculation:
1 / ARR (if ARR expressed as a proportion)
100/ARR (if ARR expressed as a %)
100/12 = 8.3 (conventionally rounded up to 9)
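A sketch of the NNT calculation from the raw counts. The exact value is about 8.3; rounding up to the next whole patient is a common convention and is adopted here as an assumption. The helper name `number_needed_to_treat` is illustrative.

```python
import math

def number_needed_to_treat(arr: float) -> float:
    """NNT = 1 / ARR, with ARR expressed as a proportion (not a percentage)."""
    if arr <= 0:
        raise ValueError("NNT is only defined here for a positive ARR")
    return 1 / arr

arr = 12 / 31 - 8 / 30  # CER - EER from the Event Rates slide
nnt = number_needed_to_treat(arr)

print(f"NNT (exact)      = {nnt:.1f}")
# Assumption: round up, since a fraction of a patient cannot be treated
print(f"NNT (rounded up) = {math.ceil(nnt)}")
```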
11. P=Probability
A p-value is a measure of statistical significance which tells us the
probability of seeing a result at least as extreme as the one observed if chance alone were at work (i.e. if there were no real effect)
In simple terms, probability (p-value) can only take values between 0 and 1:
0|-----------------------|--------------------|1
Impossible…....... Absolutely certain…
If p=0.001, chance alone would produce a result at least this extreme only 1 time in
1000 (extremely unlikely)
If p=0.05, chance alone would produce a result at least this extreme 1 time in 20 (fairly unlikely)
If p=0.5, chance alone would produce a result at least this extreme 1 time in 2 (fairly likely)
If p=0.75, chance alone would produce a result at least this extreme 3 times in 4 (very likely)
12. Confidence intervals:
“The recurrent headache rate in the control group was 39% (12/31, 95% CI
22% to 57%) compared with 27% (8/30, 95% CI 13% to 46%) in the
dexamethasone group (relative risk (RR) 0.69, 95% CI 0.33 to 1.45;
p=0.47)”.
Why 95%? It describes how reliable the estimate is: if the same study were
repeated many times and a 95% CI calculated each time, about 95% of those
intervals would contain the true value. Here the interval for the relative
risk is 0.33 to 1.45. CIs are typically reported at 95%, but when presented
in graphical terms they are sometimes shown at 50%, 95% and 99%
13. Confidence Intervals
An alternative way of assessing the effects of chance
The result of the trial is a “point estimate” – if you ran
the trial again you would probably get a different result
The Confidence Interval gives the range in which you
think the real answer lies
The 95% CI is the range in which we are 95% certain
that the true population value lies
Look at how wide the interval is, and the values at
each end
E.g. RR = 0.69 95% CI 0.33 to 1.45
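To see where an interval like this can come from, here is a sketch of one common way to calculate a 95% CI for a relative risk, working on the log scale. The source does not state which method the authors used, so the method and the helper name `rr_confidence_interval` are assumptions; the result comes out close to the published 0.33 to 1.45 but need not match it to the last digit.

```python
import math

def rr_confidence_interval(exp_events, exp_total, ctrl_events, ctrl_total, z=1.96):
    """Approximate CI for a relative risk using the log method.

    z = 1.96 corresponds to a 95% interval.
    """
    rr = (exp_events / exp_total) / (ctrl_events / ctrl_total)
    # Standard error of ln(RR)
    se_log_rr = math.sqrt(
        1 / exp_events - 1 / exp_total + 1 / ctrl_events - 1 / ctrl_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

rr, lower, upper = rr_confidence_interval(8, 30, 12, 31)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")

# An RR of 1 means no difference between the groups
if lower < 1 < upper:
    print("CI includes 1: not statistically significant")
```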
14. Forest Plot – Simple Example
[Figure: forest plot. Labels: Individual sample size, Combined Results, Confidence Interval, Line of No Effect, Best Estimate. Axis: Favours Treatment / Favours Control]
The shorter the Confidence Interval (CI) the more confident we can be that the results are true
If the CI crosses the line of no effect, then the results of that study are not statistically significant
16. Heterogeneity – what is it?
• Relevant to statistical meta-analysis, so you are more likely to come
across this in a study review or systematic review – it is when multiple
studies on an effect are actually measuring somewhat different effects
due to differences in subject population, intervention, choice of analysis,
experimental design, etc; this can cause problems in attempts to
summarize the meaning of the studies.
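One common way to put a number on heterogeneity is Cochran's Q together with the I² statistic. Neither is named on this slide, so treat the sketch below as an illustration only; the study effects and standard errors are made-up values, not data from any real review.

```python
def cochran_q(effects, std_errors):
    """Cochran's Q for a set of study effects (e.g. log relative risks).

    Returns (Q, degrees of freedom), using inverse-variance weights.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, len(effects) - 1

def i_squared(q, df):
    """I² as a percentage: share of variation beyond what chance would explain."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0

# Hypothetical log relative risks and standard errors for four studies
log_rrs = [-0.40, -0.10, 0.20, -0.60]
std_errs = [0.20, 0.25, 0.30, 0.35]

q, df = cochran_q(log_rrs, std_errs)
print(f"Q = {q:.2f} on {df} df, I² = {i_squared(q, df):.0f}%")
```

A higher I² suggests the studies differ by more than chance alone would explain.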
17. What is df?
• Degrees of freedom – frequently expressed with the Chi² test.
• The number of independent pieces of information available for the
statistician to make the calculations
18. What is Chi²?
• The chi-square test is used to determine whether there is a significant
difference between the expected frequencies and the observed
frequencies in one or more categories. Do the number of individuals or
objects that fall in each category differ significantly from the number you
would expect? Is this difference between the expected and observed
due to sampling error, or is it a real difference?
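The observed versus expected idea can be sketched on the 2×2 table from the headache example (8 of 30 and 12 of 31 with recurrent headache). The source does not say which test produced the published p-value, so this is an illustration rather than a reproduction of the authors' analysis; the continuity (Yates) correction is an optional assumption. A 2×2 table has (2 − 1) × (2 − 1) = 1 degree of freedom.

```python
import math

# Rows: treatment, control. Columns: headache, no headache.
observed = [[8, 22], [12, 19]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count in each cell if there were no difference between groups
expected = [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(observed, expected, yates=False):
    """Chi-square statistic; yates=True applies the continuity correction."""
    stat = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            diff = abs(o - e)
            if yates:
                diff = max(0.0, diff - 0.5)
            stat += diff ** 2 / e
    return stat

def p_value_1df(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

stat_plain = chi_square(observed, expected)
stat_yates = chi_square(observed, expected, yates=True)
p_plain = p_value_1df(stat_plain)
p_yates = p_value_1df(stat_yates)

print(f"Uncorrected: chi2 = {stat_plain:.2f}, p = {p_plain:.2f}")
print(f"With Yates:  chi2 = {stat_yates:.2f}, p = {p_yates:.2f}")
```

Either way the p-value is well above 0.05, so the difference between observed and expected counts could plausibly be due to sampling error.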
19. How do I understand and interpret
different statistical information?
• The short answer is, you don’t have to understand it, you only need to look at
the p value
• As a general rule, remember the following:
• Statistics that describe data – percentages, mean, median, mode, standard
deviation
• Statistics that test confidence – confidence intervals, p values
• Statistics that test difference – t tests and other parametric tests, Mann-Whitney
and other non parametric tests, Chi² test
• Statistics that compare risk – risk and odds ratio, risk reduction and numbers
needed to treat
Source: Medical and Health Science Statistics Made Easy by Michael Harris and
Gordon Taylor
20. Conclusion
Critical Appraisal is part of Evidence Based
Healthcare
It takes practice
Use CASP checklists
Depth of Appraisal is your choice
Only you can assess usefulness
Look at different types of study and introduce the Weight of Evidence Game next.
The figures are taken from Table 3 on page 3:
Under the column entitled ‘Proportion with recurrent headache’
The figures given in that row will form the basis of the statistical analysis we will carry out today.
The likelihood of an event occurring.
< = less than
> = greater than
The difference in risk between the groups.
The reduction in the rate of the outcome (result) in the Dexamethasone group relative to that in the placebo group.
What do you immediately notice about these findings?
This slide may help you to understand the difference between ARR and RRR better.
ARR is a straightforward comparison between the results in each group.
RRR is the difference in the event rates between the groups, expressed relative to the event rate in the control group. The result is a proportion, given either as a fraction or as a percentage.
So, RRR will often look more impressive than ARR.
Behind the Headlines on NHS Choices
The number needed to treat (NNT) is an epidemiological measure used in assessing the effectiveness of a health-care intervention, typically a treatment with medication. (Epidemiology is the science that studies the patterns, causes, and effects of health and disease conditions in defined populations.)
The NNT is the average number of patients who need to be treated to prevent one additional bad outcome (i.e. the number of patients that need to be treated for one to benefit compared with a control in a clinical trial). The ideal NNT is 1, where everyone improves with treatment and no one improves with control. The higher the NNT, the less effective is the treatment. But variations will occur depending on the circumstances, e.g. vaccinations for large populations.
So, with a p value of 0.47, a difference of this size could quite easily have arisen by chance.
Generally, the wider the CI, the less reliable the result, but bear in mind sample and study size.
It is easier to interpret the effects of the CI if they are expressed in graphical terms – see next slide.
For a relative risk, the line of no effect is at 1, where the risk is the same in both groups. If the CI crosses this line, which it does in our study (0.33 to 1.45), the results are not statistically significant, because the range of plausible values includes no difference between the groups.
This is just an example to help you see how confidence intervals work.
Each study is represented by a square, with the horizontal lines showing the Confidence Intervals.
The size of each square is proportional to the study's sample size.
The shorter the confidence interval the more confident we are of the results.
If the Confidence Interval crosses the line of no effect, then the results of the study are not statistically significant.
Significance is achieved at the set level if the diamond is clear of the ‘line of no effect’.
On this made up example, the position of the CI in most cases would indicate that the treatment is not effective.
Bear in mind – if the results are not statistically significant, it does not automatically follow that they are not clinically significant.
Here is another example showing the CI more clearly.
We will look at some of the other statistical tests mentioned here:
Heterogeneity
Degrees of freedom
Chi squared test
This is relevant to this type of study as it is a meta-analysis.
It would not have been appropriate to use it in our Dexamethasone study as that was a stand-alone study.