The document provides an introduction to common measures and statistics used in epidemiological literature. It defines measures of frequency such as risk, rate, and prevalence which characterize the occurrence of health outcomes in a population. Risk ratios and odds ratios are presented as measures of the strength of association between an exposure and outcome. The null value in epidemiological studies is 1.0 for risk ratios, rate ratios, and odds ratios, representing no association. Confidence intervals are discussed as a measure of precision of an estimate. Crude and adjusted values are also explained as ways to account for confounding factors.
1. Common Measures and Statistics in Epidemiological Literature
Dr. Mohammed Jawad
2. Introduction
• For the non-epidemiologist or non-
statistician, understanding the statistical
nomenclature presented in journal articles
can sometimes be challenging, particularly
since multiple terms are often used
interchangeably, and still others are
presented without definition.
• This lecture will provide a basic introduction
to the terminology commonly found in
epidemiological literature.
4. Measures of frequency
• Measures of frequency characterize the
occurrence of health outcomes, disease, or
death in a population.
• These measures are descriptive in nature
and indicate how likely one is to develop a
health outcome in a specified population.
• The three most common measures of health
outcome or frequency are risk, rate, and
prevalence.
5. Risk
• Risk, also known as incidence, cumulative
incidence, incidence proportion, or attack
rate (although not really a rate at all) is a
measure of the probability of an unaffected
individual developing a specified health
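outcome over a given period of time. For a
given period of time (e.g., 1 month, 5 years,
lifetime), risk is the number of new cases
divided by the number of individuals at risk
at the start of the period.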
• A 5-year risk of 0.10 indicates that an
individual at risk has a 10% chance of
developing the given health outcome over a
5-year period of time.
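The 5-year risk above can be reproduced with a small calculation. This is only a sketch: the function name `risk` and the cohort numbers are hypothetical, chosen so that the result matches the 0.10 in the example.

```python
def risk(new_cases: int, at_risk_at_start: int) -> float:
    """Risk (cumulative incidence): new cases during the period
    divided by the number of unaffected people at its start."""
    if at_risk_at_start <= 0:
        raise ValueError("at_risk_at_start must be positive")
    if not 0 <= new_cases <= at_risk_at_start:
        raise ValueError("new_cases must lie between 0 and at_risk_at_start")
    return new_cases / at_risk_at_start


# Hypothetical cohort: 1,000 unaffected people followed for 5 years,
# of whom 100 develop the health outcome.
five_year_risk = risk(new_cases=100, at_risk_at_start=1000)
print(f"5-year risk: {five_year_risk:.2f} ({five_year_risk:.0%})")
```

The result is a proportion between 0 and 1, which is why "attack rate" is not really a rate: there is no time unit in the denominator.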
6. Risk ratio and rate ratio
Risk ratios or rate ratios are commonly
found in cohort studies and are
defined as: the ratio of the risk in the
exposed group to the risk in the
unexposed group or the ratio of the
rate in the exposed group to the rate
in the unexposed group
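Written as formulas, the two definitions above look like this. The symbols R (risk) and IR (incidence rate) are notation assumed here for illustration; the slides do not name them.

```latex
\[
\text{Risk ratio} = \frac{R_{\text{exposed}}}{R_{\text{unexposed}}},
\qquad
\text{Rate ratio} = \frac{IR_{\text{exposed}}}{IR_{\text{unexposed}}}
\]
```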
7. Risk ratio and rate ratio
• Risk ratios and rate ratios are measures of the strength of the association
between the exposure and the outcome.
• How is a risk ratio or rate ratio interpreted?
• A risk ratio of 1.0 indicates there is no difference in risk between the
exposed and unexposed group.
• A risk ratio greater than 1.0 indicates a positive association, or
increased risk for developing the health outcome in the exposed
group.
• A risk ratio of 1.5 indicates that the exposed group has 1.5 times the
risk of having the outcome as compared to the unexposed group.
• Rate ratios can be interpreted the same way but apply to rates
rather than risks.
8. Risk ratio and rate ratio
• A risk ratio or rate ratio of less than 1.0
indicates a negative association between the
exposure and outcome in the exposed group
compared to the unexposed group.
• In this case, the exposure provides a
protective effect.
• For example, a rate ratio of 0.80 where the
exposed group received a vaccination for
Human Papillomavirus (HPV) indicates that
the exposed group (those who received the
vaccine) had 0.80 times the rate of HPV
compared to those who were unexposed
(did not receive the vaccine).
9. Risk ratio and rate ratio
• One benefit of the risk difference (the risk in
the exposed group minus the risk in the
unexposed group) over the risk ratio is that it
provides the absolute difference in risk,
information that the ratio alone does not give.
• A risk ratio of 2.0 can reflect a doubling of
either a very small or a very large risk, and one
cannot determine which is the case unless the
individual risks are presented.
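A short sketch makes the point concrete. The two scenarios below are hypothetical numbers, not data from any study: both give a risk ratio of 2.0, but the absolute difference in risk is very different.

```python
def risk_ratio(risk_exposed: float, risk_unexposed: float) -> float:
    """Ratio of the risk in the exposed group to the risk in the unexposed group."""
    return risk_exposed / risk_unexposed


def risk_difference(risk_exposed: float, risk_unexposed: float) -> float:
    """Absolute difference: risk in the exposed minus risk in the unexposed."""
    return risk_exposed - risk_unexposed


# Hypothetical (risk in exposed, risk in unexposed) pairs.
scenarios = {
    "rare outcome": (0.002, 0.001),
    "common outcome": (0.40, 0.20),
}

for label, (r_exposed, r_unexposed) in scenarios.items():
    rr = risk_ratio(r_exposed, r_unexposed)
    rd = risk_difference(r_exposed, r_unexposed)
    print(f"{label}: RR = {rr:.1f}, RD = {rd:.3f} "
          f"({rd * 1000:.0f} extra cases per 1,000 exposed)")
```

In the first scenario a doubling of risk means 1 extra case per 1,000 exposed people; in the second it means 200.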
11. Odds ratio
• The odds ratio is used in place of the risk ratio
or rate ratio in case-control studies.
• In this type of study, the underlying population
at risk for developing the health outcome or
disease cannot be determined because
individuals are selected as either diseased or
non-diseased or as having the health outcome
or not having the health outcome.
• An odds ratio may approximate the risk ratio or
rate ratio in instances where the health
outcome prevalence is low (less than 10%) and
specific sampling techniques are utilized,
otherwise there is a tendency for the OR to
overestimate the risk ratio or rate ratio.
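The effect of outcome frequency on the odds ratio can be sketched with two hypothetical 2x2 tables. For the comparison, the counts are laid out as in a cohort study so that both measures can be computed; in a real case-control study the risk ratio is not directly available. The function names and all counts are illustrative assumptions.

```python
def risk_ratio_2x2(a: int, b: int, c: int, d: int) -> float:
    """a: exposed with outcome, b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio_2x2(a: int, b: int, c: int, d: int) -> float:
    """Cross-product odds ratio for the same 2x2 layout."""
    return (a * d) / (b * c)


# Hypothetical tables (a, b, c, d); both have a risk ratio of 2.0.
rare = (20, 980, 10, 990)       # risks of 2% and 1%
common = (400, 600, 200, 800)   # risks of 40% and 20%

for label, table in (("rare outcome", rare), ("common outcome", common)):
    print(f"{label}: RR = {risk_ratio_2x2(*table):.2f}, "
          f"OR = {odds_ratio_2x2(*table):.2f}")
```

With the rare outcome the odds ratio is about 2.02, very close to the risk ratio of 2.0. With the common outcome it is about 2.67, noticeably further from 1.0.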
12. Odds ratio
• The odds ratio is interpreted in the same
manner as the risk ratio or rate ratio with an
OR of 1.0 indicating no association, an OR
greater than 1.0 indicating a positive
association, and an OR less than 1.0
indicating a negative, or protective
association.
13. The null value
• The null value is a number corresponding to
no effect, that is, no association between
exposure and the health outcome. In
epidemiology, the null value for a risk ratio
or rate ratio is 1.0, and it is also 1.0 for odds
ratios and prevalence ratios (terms you will
come across). A risk ratio, rate ratio, odds
ratio or prevalence ratio of 1.0 is obtained
when, for a risk ratio for example, the risk of
disease among the exposed is equal to the
risk of disease among the unexposed.
14. The null value
• Statistical testing focuses on the null
hypothesis, which is a statement predicting
that there will be no association between
exposure and the health outcome (or
between the assumed cause and its effect),
i.e. that the risk ratio, rate ratio or odds ratio
will equal 1.0.
• If the data obtained from a study provide
evidence against the null hypothesis, then
this hypothesis can be rejected, and an
alternative hypothesis becomes more
probable.
15. The null value
• For example, a null hypothesis would say that there
is no association between children having mothers
who smoke cigarettes and the incidence of asthma in
those children.
• If a study showed that there was a greater incidence
of asthma among such children (compared with
children of nonsmoking mothers), and that the risk
ratio of asthma among children of smoking mothers
was 2.5 with a 95% confidence interval of 1.7 to 4.0,
we would reject the null hypothesis, because the
interval excludes the null value of 1.0.
• The alternative hypothesis could be expressed in
two ways: 1) children of smoking mothers will have
either a higher or lower incidence of asthma than
other children, or 2) children of smoking mothers
will only have a higher incidence of asthma.
16. The null value
• The first alternative hypothesis involves
what is called a "two-sided test" and is used
when we simply have no basis for predicting
in which direction from the null value
exposure is likely to be associated with the
health outcome, or, in other words, whether
exposure is likely to be beneficial or harmful.
• The second alternative hypothesis involves a
"one-sided test" and is used when we have a
reasonable basis to assume that exposure
will only be harmful (or if we were studying
a therapeutic agent, that it would only be
beneficial).
18. The p-value
• The "p" value is the probability of observing a
difference between the observed value and the
null value at least as large as the one found,
if the null hypothesis were true, that is, if
the difference arose by "chance", or more
precisely, simply because of sampling
variability.
• The smaller the "p" value, the less plausible it
is that sampling variability alone accounts
for the difference.
19. The p-value
• Typically, a "p" value less than 0.05 is used
as the decision point, meaning that, if there
were truly no association, a difference
between the observed risk ratio, rate ratio,
or odds ratio and 1.0 at least this large would
arise from sampling variability less than 5%
of the time.
• If the "p" value is less than 0.05, the
observed risk ratio, rate ratio, or odds ratio is
often said to be "statistically significant."
However, the use of 0.05 as a cut-point is
arbitrary.
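As a rough illustration, an approximate "p" value can be recovered from a reported ratio and its 95% confidence interval, such as the asthma example (risk ratio 2.5, 95% CI 1.7 to 4.0). The sketch below assumes the interval was built with a normal approximation on the log scale, which the slides do not state. The function name `p_value_from_ratio_ci` is hypothetical.

```python
import math


def p_value_from_ratio_ci(estimate: float, lower: float, upper: float,
                          z_crit: float = 1.96):
    """Approximate z statistic and p-values for a ratio measure
    (null value 1.0), assuming a normal approximation on the log scale."""
    # Standard error of log(ratio), recovered from the width of the CI.
    se_log = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se_log
    # Two-sided test: the ratio differs from 1.0 in either direction.
    two_sided = math.erfc(abs(z) / math.sqrt(2))
    # One-sided test: the ratio is greater than 1.0 (exposure only harmful).
    one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return z, two_sided, one_sided


z, p_two, p_one = p_value_from_ratio_ci(2.5, 1.7, 4.0)
print(f"z = {z:.2f}, two-sided p = {p_two:.2g}, one-sided p = {p_one:.2g}")
```

For this example both "p" values fall well below 0.05, in line with a confidence interval that excludes 1.0. The one-sided value is half the two-sided value because the estimate lies in the predicted direction.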
20. The p-value
• The exclusive use of "p" values for
interpreting results of epidemiologic studies
has been strongly discouraged in the more
recent texts and literature because research
on human health is not conducted to reach a
decision point (a "go" or "no go" decision),
but rather to obtain evidence that there is
reason for concern about certain exposures
or lifestyle practices or other factors that
may adversely influence the health of the
public.
21. The p-value
• Statistical tests of significance (such as p-
values) were developed for industrial quality-
control purposes, in order to make a decision
whether the manufacture of some item is
achieving acceptable quality. We are not making
such decisions when we interpret the results of
research on human health.
• The lower bound of the 95% confidence interval
is also often used to decide whether a point
estimate is statistically significant, i.e. whether
the measure of effect (e.g. the ratio 2.5 with a
lower bound of 1.8) is statistically different from
the null value of 1.0.
23. Confidence interval
• A confidence interval expresses the extent of
potential variation in a point estimate (the
mean value or risk ratio, rate ratio, or odds
ratio).
• This variation is attributable to the fact that
our point estimate of the mean or risk ratio,
rate ratio, or odds ratio is based on some
sample of the population rather than on the
entire population.
24. Confidence interval
• For example, from a clinical trial, we might
conclude that a new treatment for high
blood pressure is 2.5 times as effective as
the standard treatment, with a 95%
confidence interval of 1.8 to 3.5.
• 2.5 is the point estimate we obtain from this
clinical trial. But not all subjects with high
blood pressure can be included in any study,
thus the estimate of effectiveness, 2.5, is
based on a particular sample of people with
high blood pressure.
25. Confidence interval
• If we assume that we could draw other samples of
persons from the same underlying population as the
one from which subjects were obtained for this
study, we will obtain a set of point estimates, not all
of which would be exactly 2.5. Some samples would
be likely to show an effectiveness less than 2.5, and
some greater than 2.5.
• The 95% CI is an interval that would contain the
true (population) parameter value 95% of the
time if you repeated the experiment/study.
• So if we were to repeat the experiment/study 100
times and calculate a 95% CI each time, about 95 of
those intervals would contain the true risk ratio,
rate ratio or odds ratio value.
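The repeated-sampling idea can be checked with a small simulation. For simplicity this sketch uses a mean with a known standard deviation rather than a ratio measure. The population values (a true mean of 120 and an SD of 15) and the function name `ci_coverage` are made up for illustration.

```python
import random
import statistics


def ci_coverage(true_mean: float = 120.0, sd: float = 15.0, n: int = 50,
                n_studies: int = 2000, seed: int = 1) -> float:
    """Share of simulated studies whose 95% CI contains the true mean."""
    rng = random.Random(seed)
    # Known-SD normal interval: sample mean +/- 1.96 * SD / sqrt(n).
    half_width = 1.96 * sd / n ** 0.5
    hits = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        sample_mean = statistics.fmean(sample)
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / n_studies


print(f"Share of intervals containing the true mean: {ci_coverage():.3f}")
```

Each simulated study produces a different interval, and the printed share should be close to 0.95. This is what the "95%" describes: a property of the procedure across repeated studies.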
26. Confidence interval
• Remember that the CI can only be interpreted
in terms of repeated sampling.
• Thus we can also say that the new treatment
for high blood pressure is estimated to be 2.5
times as effective as the standard treatment,
and that values from a low of 1.8 to a high of
3.5 are compatible with the data.
27. Confidence interval
• The confidence interval also provides
information about how precise an estimate
is.
• The tighter, or narrower, the confidence
interval, the more precise the estimate.
• Typically, larger sample sizes will provide a
more precise estimate.
• Estimates with wide confidence intervals
should be interpreted with caution.
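The link between sample size and precision can be sketched with a risk ratio and its 95% confidence interval. The interval below uses the common large-sample (Wald) approximation on the log scale, which is an assumption here since the slides do not name a method. The function name `risk_ratio_ci` is hypothetical, and the risks of 20% and 10% are made-up numbers.

```python
import math


def risk_ratio_ci(a: int, n1: int, c: int, n0: int, z: float = 1.96):
    """Risk ratio with an approximate 95% CI.
    a: events among n1 exposed, c: events among n0 unexposed."""
    rr = (a / n1) / (c / n0)
    # Large-sample standard error of log(RR).
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Same risks (20% exposed, 10% unexposed) at three group sizes.
for n in (100, 1000, 10000):
    rr, lower, upper = risk_ratio_ci(a=n // 5, n1=n, c=n // 10, n0=n)
    print(f"n per group = {n:>5}: RR = {rr:.2f}, "
          f"95% CI {lower:.2f} to {upper:.2f}")
```

The point estimate stays at 2.0 in every row, while the interval narrows as the groups get larger.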
29. Crude and adjusted values
• There are often two types of estimates
presented in research articles, crude and
adjusted values.
• Crude estimates refer to simple measures that
do not account for other factors that may be
driving the estimate.
• For instance, a crude death rate would simply
be the number of deaths in a calendar year
divided by the average population for that year.
• This may be an appropriate measure in certain
circumstances but could become problematic if
you want to compare two or more populations
that vary on specific factors known to contribute
to the death rate.
30. Crude and adjusted values
• For example, you may want to compare the
death rate for two populations, one of which is
located in a high air pollution area, to determine
if air pollution levels affect the death rate.
• The high air pollution population may have a
higher death rate, but you also determine that it
is a much older population. As older individuals
are more likely to die, age may be driving the
death rate rather than the pollution level.
31. Crude and adjusted values
• To account for the difference in age distribution
of the populations, one would want to calculate
an adjusted death rate that adjusts for the age
structure of the two groups.
• This would remove the effect of age from the
effect of air pollution on mortality.
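One common way to make this adjustment is direct age standardization. The slides do not name a method, so treat this as one possible approach. In the hypothetical data below, both areas have identical age-specific death rates, but the high pollution area has an older population. All counts and function names are illustrative.

```python
def crude_rate(deaths, population):
    """Total deaths divided by total population."""
    return sum(deaths) / sum(population)


def directly_standardized_rate(deaths, population, standard_population):
    """Apply each age-specific rate to a shared standard population."""
    expected = sum((d / p) * s
                   for d, p, s in zip(deaths, population, standard_population))
    return expected / sum(standard_population)


# Hypothetical data by age group: [under 65, 65 and over].
low_pollution = {"deaths": [160, 800], "population": [80_000, 20_000]}
high_pollution = {"deaths": [80, 2_400], "population": [40_000, 60_000]}

# Standard population: the two areas combined, by age group.
standard = [a + b for a, b in zip(low_pollution["population"],
                                  high_pollution["population"])]

for label, area in (("low pollution", low_pollution),
                    ("high pollution", high_pollution)):
    crude = crude_rate(area["deaths"], area["population"])
    adjusted = directly_standardized_rate(area["deaths"], area["population"],
                                          standard)
    print(f"{label}: crude = {crude * 1000:.1f}, "
          f"age-adjusted = {adjusted * 1000:.1f} deaths per 1,000")
```

The crude rates differ (9.6 versus 24.8 per 1,000), but the age-adjusted rates are the same (17.2 per 1,000). In this constructed example the crude difference is due entirely to age structure.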
32. Crude and adjusted values
• Adjusted estimates are a means of
controlling for confounders or accounting for
effect modifiers in analyses. Some factors
that are commonly adjusted for include
gender, race, socioeconomic status, smoking
status, and family history.
33. Practice Questions
1. Based on the following table, calculate the
requested measures. Also provide the
definition for each measure in one
sentence.
a. The risk ratio comparing the exposed and
the unexposed study participants
b. The risk difference between the exposed
and the unexposed study participants
c. The prevalence of the disease among the
entire study sample, assuming the disease
is a long-term, chronic disease with no cure
and assuming no study participants have
died.
35. Practice Questions
2. Interpret the following risk ratios in words.
a. A risk ratio = 1.0 in a study where researchers
examined the association between consuming
a certain herbal supplement (the exposure) and
developing arthritis.
b. A risk ratio = 2.6 in a study where researchers
examined the association between ever having
texted while driving (the exposure) and being
in a car accident.
c. A risk ratio = 0.75 in a study where
researchers examined the association between
≥ 30 minutes of daily exercise (the exposure)
and heart disease.
36. References
Alexander LK, Lopes B, Ricchetti-Masterson
K, Yeatts KB. Common Measures and
Statistics in Epidemiological Literature.
Epidemiologic Research and Information
Center (ERIC) Notebook. Second Edition.
2015.