FCM 3.4 Biostatistics
BIOSTATISTICS
Medicine II
CONSTANT
− A phenomenon whose values remain the same
− More common in the physical sciences than in the life sciences
− Examples:
• Number of minutes in one hour
• Speed of light

VARIABLES
− A phenomenon whose values or categories cannot be predicted with certainty
− Examples:
• Anthropometric measures
• Educational attainment
• Annual income

INTERVAL
− The exact distance between two values can be determined, but the zero point is arbitrary
− Example:
• Temperature (0 °C does not mean absence of temperature; it does not follow that 80 °C is twice as hot as 40 °C)

RATIO
− Similar to the interval scale, but the zero point is fixed: the ratio of two values can be meaningfully computed or interpreted
− Example:
• Weight (0 kg means absence of weight: 100 kg is twice as heavy as 50 kg)
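The difference between the interval and ratio scales can be checked with a little arithmetic. The sketch below is only an illustration: the helper name `celsius_to_kelvin` is an assumption made for this example, and the Kelvin scale is brought in here (it is not part of the notes) because its zero point is fixed.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius reading (arbitrary zero) to Kelvin (fixed zero)."""
    return celsius + 273.15

# Ratio taken directly on the interval scale: arithmetically 2.0, but not meaningful.
naive_ratio = 80 / 40

# The same two temperatures on a scale with a true zero.
kelvin_ratio = celsius_to_kelvin(80) / celsius_to_kelvin(40)

# Weight is already on a ratio scale, so the plain ratio is meaningful.
weight_ratio = 100 / 50

print(f"80 C / 40 C taken at face value: {naive_ratio:.2f}")
print(f"Same temperatures in Kelvin:     {kelvin_ratio:.2f}")
print(f"100 kg / 50 kg:                  {weight_ratio:.2f}")
```

On the Kelvin scale the ratio is only about 1.13, which is why "twice as hot" does not follow from the Celsius readings.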
Two Types of Variables
(1) Qualitative, (2) Quantitative

DESCRIPTIVE STATISTICS
Family and Community Medicine II III-04 Biostatistics Page 2 of 9
Mean
− The layperson's concept of average
− Can be computed by dividing the total or sum by the number of observations

Median
− The middlemost observation in a set of observations put in a numerical array or order

Mode
− The most frequently occurring value in a set of observations
− A given set of observations can be bimodal, multimodal, or have no mode

Characteristics of Measures of Central Tendency
If normally distributed or symmetrical    Mean = Median = Mode
If skewed to the left                     Mean < Median < Mode
If skewed to the right                    Mean > Median > Mode

Guidelines for Measures of Central Tendency
− The mean is used for numerical data and for symmetric (not skewed) distributions
− The median is used for ordinal data or for numerical data whose distribution is skewed
− The mode is primarily used for bimodal distributions
− The geometric mean is used primarily for observations measured on a logarithmic scale

MEASURES OF VARIABILITY/SPREAD/DISPERSION
(1) Range, (2) Variance, (3) Standard Deviation, (4) Coefficient of Variation

Range
− The highest value minus the lowest value
− Example:
Highest value is 80 and lowest value is 20
Range = 80 − 20
Range = 60

Variance (σ²)
− The average of squared deviations from the mean
− Determines how far the numbers lie from the mean

Guidelines for Measures of Variability/Spread/Dispersion
− The standard deviation is used when the mean is used (i.e. with symmetric, not skewed, distributions of data)
− The range is used with numerical data when the purpose is to emphasize extreme values
− The coefficient of variation is used when the intent is to compare two numerical distributions measured on different scales (e.g. comparing the variability of total cholesterol with the variability of vessel diameter)

MEASURES OF LOCATION

Percentile (Pi)
− 1 of 99 values of a variable which divide the distribution into 100 equal parts
− Example: an NMAT score at the 99th percentile means that 99% of the examinees scored lower than you

Decile (Di)
− 1 of 9 values of a variable which divide the distribution into 10 equal parts

Quartile (Qi)
− 1 of 3 values of a variable which divide the distribution into 4 equal parts

INFERENTIAL STATISTICS
CONCEPTS
1. Hypothesis
− A statement or belief used in the evaluation of population values
2. Null Hypothesis (Ho)
− A claim that there is no difference between the population mean µ and the hypothesized value µ0
− No relationship, no association, Range = 0

ONE-TAILED TEST
− Used when determining the deviation of the sample mean from µ in one direction
− Examples of research questions using a one-tailed test:
• Is the new drug superior to the standard drug for hypertension?
• Has the mortality rate been reduced among those who quit smoking?

TWO-TAILED TEST
− Used when determining the deviation of the sample mean from µ in either direction
− Examples of research questions using a two-tailed test:
• Is there a difference between the cholesterol levels of men and women?
• Does the mean age of a group of volunteers differ from that of the general population?
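As a numerical check on the descriptive measures defined above (mean, median, mode, range, variance, standard deviation, coefficient of variation, quartiles), here is a minimal sketch using Python's standard `statistics` module. The data are hypothetical, chosen only so that the range matches the worked example (80 − 20 = 60). The variance is computed as the population variance, following the "average of squared deviations" definition.

```python
import statistics

# Hypothetical observations (not from the notes); highest = 80, lowest = 20.
values = [20, 40, 40, 50, 70, 80]

mean = statistics.mean(values)            # total divided by number of observations
median = statistics.median(values)        # middle of the ordered observations
modes = statistics.multimode(values)      # list, since data may be bimodal or multimodal
value_range = max(values) - min(values)   # highest value minus lowest value
variance = statistics.pvariance(values)   # average squared deviation from the mean
std_dev = statistics.pstdev(values)       # square root of the variance
cv_percent = std_dev / mean * 100         # coefficient of variation, in percent

# Quartiles: 3 cut points dividing the ordered data into 4 parts.
# Several conventions exist; this uses the module's default method.
q1, q2, q3 = statistics.quantiles(values, n=4)

print(f"mean={mean}, median={median}, mode(s)={modes}")
print(f"range={value_range}, variance={variance}, SD={std_dev:.1f}, CV={cv_percent:.0f}%")
print(f"quartiles: Q1={q1}, Q2={q2}, Q3={q3}")
```

Because the coefficient of variation is unit-free, it is the measure that allows spread to be compared across variables measured on different scales.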
POWER (1-β)
− The ability of the study (or the statistical test) to detect an actual effect or an actual difference
− (β = 0.20, 0.10)
• Commonly used: β = 0.20, i.e. 80% power
• To increase power: increase the sample size
• Practical example: a missile alarm. If the alarm is too sensitive, that is not good, because even the slight movement of a fly will be detected; if the power is too weak, even an airplane that is already nearby will not be detected. So it does not follow that more power is always better.

P-VALUE
− The probability of obtaining a difference (or effect) at least as large as the one observed if chance alone were at work (i.e. if Ho is true):
• P > 0.05 (result is not significant)
• P < 0.05 (result is significant)
• P < 0.01 (result is highly significant)
• P < 0.001 (result is very highly significant)
*significant = statistically significant

CAUTION IN INTERPRETING “STATISTICAL SIGNIFICANCE”
− “NOT STATISTICALLY SIGNIFICANT” does NOT prove that Ho is true
− “STATISTICALLY SIGNIFICANT” does not imply that the results are “CLINICALLY SIGNIFICANT”

CONFIDENCE INTERVAL
− The interval computed from the sample data that has a given probability that the unknown parameter (example: µ) is contained in the interval.
− Common confidence intervals: 90%, 95%, and 99%.
− Most commonly used: 95%

Application of Statistical Inference
− A group of Barangay health workers in a family planning clinic conducted a study to determine the proportion of women in the community taking OCP having gained weight.
− Sample calculations:
Compute for the parameters µ and P, where P is the population proportion of students with cholesterol higher than or equal to 6.3, using the following cholesterol values of 15 college students in PLM.

Student   Cholesterol value     Student   Cholesterol value
1.        7.8                   8.        8.0
2.        6.2                   9.        8.7
3.        5.2                   10.       9.0
4.        4.1                   11.       8.1
5.        6.3                   12.       5.3
6.        5.6                   13.       4.2
7.        7.8                   14.       4.7
                                15.       9.0

µ = 100.0/15 = 6.67
P = 8/15 = 0.53

Two Types of Statistical Inference
(1) Estimation, (2) Hypothesis Testing
1. Estimation
− The process by which a statistic computed from a random sample is used to approximate (“estimate”) the corresponding parameter.
− Ex. Ave. length of stay in the hospital
2. Hypothesis Testing
− Set of procedures that culminate in either the rejection or the non-rejection of the null hypothesis.
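The sample calculation for the 15 cholesterol values can be reproduced with a few lines of Python. This is a minimal sketch: the variable names are chosen for illustration, and the 6.3 cut-off is the one stated in the exercise.

```python
# Cholesterol values of students 1 to 15, in the order listed in the table.
cholesterol = [7.8, 6.2, 5.2, 4.1, 6.3, 5.6, 7.8,
               8.0, 8.7, 9.0, 8.1, 5.3, 4.2, 4.7, 9.0]
threshold = 6.3

n = len(cholesterol)
mu = sum(cholesterol) / n                               # mean of all 15 values
count_high = sum(1 for v in cholesterol if v >= threshold)
p = count_high / n                                      # proportion with value >= 6.3

print(f"n = {n}, mu = {mu:.2f}, P = {count_high}/{n} = {p:.2f}")
# prints: n = 15, mu = 6.67, P = 8/15 = 0.53
```

The eight values counted toward P are 7.8, 6.3, 7.8, 8.0, 8.7, 9.0, 8.1 and 9.0.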
[Figure: Schema for the Process of Hypothesis Testing]

                                 True State of Nature
DECISION                         Ho is true                Ho is false (H1 is true)
Accept Ho                        Correct Decision (1-α)    Type II Error (β)
Reject Ho (assume H1 is true)    Type I Error (α)          Correct Decision (1-β)
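The link between α, β, power (1-β) and sample size can be made concrete with a short calculation. The sketch below is an illustration under stated assumptions, not a method taken from the notes: it uses a two-sided one-sample z-test with a known standard deviation (a normal approximation standing in for the t-test), and the effect size of 5 and standard deviation of 12 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power (1 - beta) of a two-sided one-sample z-test with known sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # rejection cut-off for the chosen alpha
    shift = effect * sqrt(n) / sigma             # true difference in standard-error units
    # Probability that the statistic falls in either rejection region when H1 is true.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

# Hypothetical true difference of 5 units with sigma = 12.
for n in (10, 20, 40, 80):
    power = z_test_power(effect=5.0, sigma=12.0, n=n)
    print(f"n = {n:3d}  power = {power:.2f}  beta = {1 - power:.2f}")
```

Power rises as n grows, and when the true effect is zero the rejection probability equals α, which is the Type I error cell of the table.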
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

Step 3: Select the level of significance for the statistical test
− α = 0.05
Step 4: Determine the value the test statistic must attain to be declared significant.
− The H0 is rejected if: t < -1.99 or t > +1.99
Step 5: Perform the calculation
− Result: t = -1.67
Step 6: Draw and state the conclusion.
− Since t = -1.67 > -1.99, Ho is not rejected.
− Therefore the MEI in the study is not significantly different from those in the NHANES III study.

Example:
A Philippine study aims to determine whether the mean systolic blood pressure (SBP) of the hypertensive patients taking their usual diet (Mean = 138, SD = 11.9) is different from the mean SBP of the same hypertensive patients after 1 month of taking DASH Diet (Mean = 120, SD = 12.2)

Step 1. State the research question in terms of statistical hypotheses.
− Null hypothesis (Ho): the mean SBP of hypertensive patients (µ1) is not different from the mean SBP after 1 month of taking DASH diet (µ0).
− µ1 = µ0
− Alternative Hypothesis (H1): the mean SBP of hypertensive patients (µ1) is different from the mean SBP after 1 month of taking DASH diet (µ0).
− µ1 ≠ µ0
Step 3: Select the level of significance for the statistical test
− α = 0.05
Step 4. Determine the value the test statistic must attain to be declared significant.
− The H0 is rejected if: t < -2.021 or t > +2.021
Step 5: Perform the calculation
− Result: t = +4.72
Step 6. Draw and state the conclusion.
− Since t = +4.72 > +2.021, H0 is rejected.
− Therefore, the mean SBP of hypertensive patients taking usual diet is significantly different from the mean SBP after 1 month of taking DASH diet.

ANOVA (ANALYSIS OF VARIANCE)
− A logical statistical extension of the t-test wherein the research data involve 3 or more independent (or dependent) groups
− Protects from “α inflation” when the researcher compares groups in different combinations one pair at a time
− α inflation/ Familywise error
• The more tests you conduct at α = 0.05, the more likely you are to claim you have a significant result when you shouldn't have (i.e., a Type I error)
• The more comparisons you make, the higher the α becomes
$$\alpha^{*} = 1 - (1 - \alpha)^{\text{number of comparisons}}$$

“TYPES” OF ANOVA

One-Way ANOVA (F-Test)
Example:
A researcher wants to know whether there will be significant differences in weight gain (in oz.) of rats when fed with 4 different brands of cereals.

Weight gain (in oz.) in rats fed with 4 different brands of cereals
Cereal A   Cereal B   Cereal C   Cereal D
1          7          9          8
1          7          6          6
1          7          5          4
1          7          3          1
1          7          2          1

Step-by-step procedure:
(1) State the hypothesis
H0 : µA = µB = µC = µD
H1 : one or more means are different from the others
(2) Find the critical value
Critical value (when α = 0.05): 3.24
(3) Calculate the F statistic
F = 7.35
F-test statistic
− Allows the investigator to compare more than two means simultaneously.
(4) Make the decision
Since 7.35 > 3.24, H0 is rejected.
(5) Summarize the result
At least one of the means is significantly different from the others (i.e. there is a treatment effect)

Two-way ANOVA

STATISTICAL TESTS BEYOND ANOVA: MULTIPLE COMPARISON PROCEDURES

Posteriori (Post Hoc) Comparisons:
1. Tukey’s HSD (Honestly Significant Difference)
− Allows the researcher to compute a single value that determines the minimum difference between treatment means that is necessary for significance.
2. Scheffe
− A method for adjusting significance levels in a linear regression analysis to account for multiple comparisons
3. Student-Newman-Keuls
− Modified Tukey
4. Dunnett’s Procedure
− Used if one of the groups is a control

PARAMETRIC VS NON-PARAMETRIC

PARAMETRIC TEST
− A statistical test of hypothesis based on assumptions made concerning the parameters of the population from which the sample was drawn.
− Validity depends on whether these assumptions are satisfied or not.
− Assumptions that must be satisfied by parametric tests:
1. Random selection of the sample
2. Normal distribution of the populations from which the samples were drawn
3. Equality of variances (homoscedasticity) when more than one population is sampled.

NON-PARAMETRIC TEST
− Statistical test of hypothesis in which no attempt is made to specify and identify the form of the population from which the sample was drawn.
− Generally applied to those sets of data which are not truly numerical (i.e. nominal or ordinal)

CHI-SQUARE TEST
− Most commonly used method for comparing proportions
− Very useful in tests of significance of frequency data, categorical data and qualitative data
− Non-parametric test

“TYPES” OF CHI-SQUARE TESTS
Tests include:
1. Test of independence between 2 variables
2. Test of homogeneity
3. Test of significance of the difference between 2 proportions

Test of Independence Between 2 Variables
Example:
A researcher wants to determine whether there is a relationship between the gender of an individual and the amount of alcohol consumed.

          Alcohol consumption
Gender    Low    Moderate    High
Male      10     9           8
H1 : The amount of alcohol that a person consumes is dependent on the individual’s gender.
2) Find the critical value
Critical value (df = 2, α = 0.10): 4.605
3) Calculate the X² statistic
X² = 0.283
4) Make the decision
Since 0.283 < 4.605, H0 is not rejected.
5) Summarize the result
There is not enough evidence to support the claim that the amount of alcohol a person consumes is dependent on the individual’s gender.

Test of Homogeneity
− Used when we wish to find out whether two or more populations have the same proportions for the different categories of a particular variable.
− Expected frequencies are computed under the assumption that populations represented in the contingency table are homogenous with respect to the variable of interest
− Assumes that categories are collectively exhaustive (at least one should happen, e.g. rolling a die) and mutually exclusive (cannot occur at the same time), and that the samples are independent of one another.

Example:
On average, 79% of 1st time fathers are in the DR when their children are born. An OB/Gyn resident surveyed 300 1st time fathers to determine if they had been in the DR when their children were born. Is there enough evidence to reject the claim that the proportions of those who were in the DR at the time of birth were the same?

               Hospital A   Hospital B   Hospital C   Hospital D
Present        66           60           57           56
Not present    9            15           18           18

Test of Significance of the Difference Between 2 Proportions
Example:
For several years, there has been a controversy over the efficacy of Vitamin C in preventing the common cold. The research data shows that 63% of the children treated with Vitamin C and 76% of the placebo group caught colds. Does the proportion of children developing colds differ between the 2 groups?

Status                       Vitamin C group   Placebo group
Children free of colds       21                11
Children developing colds    36                35

2 x 2 Contingency Tables
− The most common chi-square analysis used in health research involves data presented in a 2 x 2 (four-fold) table in which there are 2 groups and 2 possible responses
− Used to determine whether the distribution of one variable is conditionally dependent (contingent) on the other

$$X^2 = \frac{n(ad - bc)^2}{(a + c)(b + d)(a + b)(c + d)}$$

Tests for 2 X 2 Contingency Tables
1. McNemar’s Test for Correlated Proportions
2. Measures of Strength of Associations (RR, OR)
3. Fisher’s Exact Test

McNemar’s Test for Correlated Proportions
− This is a chi-square test application wherein the samples are matched (and not independent)
− Researchers usually use a “before-and-after” study design to test whether there has been a significant change between the before-and-after situations.

$$X^2 = \frac{(b - c)^2}{b + c}$$

Example:
                                      Wore seatbelt regularly after the accident
                                      YES         NO
Wore seatbelt regularly      YES      a = 60      b = 6
before the accident          NO       c = 19      d = 15

$$X^2 = \frac{(6 - 19)^2}{6 + 19} = 6.76$$

Measures of Strength of Associations (RR, OR)
1. Relative risk (RR) / risk ratio
− Widely used in research
− Ratio of the incidence of the outcome among exposed to the incidence among unexposed
− Can be directly calculated only in a cohort or experimental study design

$$RR = \frac{\text{risk of disease if exposed}}{\text{risk of disease if not exposed}} = \frac{a / (a + b)}{c / (c + d)}$$

Risk factor   Disease present   Disease absent   Total
Present       a                 b                a + b
Absent        c                 d                c + d

Interpretation
− The disease (or other health-related outcome) is [RR] times more likely to occur among those exposed to the suspected risk factor than among those with no exposure.
− The larger the value of RR, the stronger the association between the disease in question and the exposure to the risk factor. This statistical measure is called the effect size.

Example:
In a group of retirees, a community health survey revealed the relationship between smoking and the presence of heart disease.
$$RR = \frac{25/35}{45/65} = 3.3$$

Interpretation
− Based on the results of the survey, smokers have approximately 3.3 times the risk of developing heart disease as non-smokers

2. Odds ratio (OR) / relative odds
− OR compares the odds that a disease (or other health-related outcome) will occur among individuals who have a particular characteristic (or who have been exposed to a risk factor) to the odds that the disease will occur in individuals who lack the characteristic (or who have not been exposed)
− OR can be directly calculated from the data of a cohort or an experimental study
− OR can be indirectly calculated from the data of a case-control study or cross-sectional study (Prevalence odds ratio/POR. For example, a POR of 4.75 means that the odds of having disease are 4.75 times higher for the exposed than the unexposed group.)

*ODDS RATIO (OR) FOR COHORT/EXPERIMENTAL STUDY

$$OR = \frac{\text{odds that exposed individuals have the disease}}{\text{odds that unexposed individuals have the disease}} = \frac{ad}{bc}$$

Interpretation
− “The odds of having the disease in question are [OR] times greater among those exposed to the suspected risk factor than among those with no such exposure.”
− For rare diseases, OR approximates RR.
− The larger the value of OR, the stronger the association between the disease in question and exposure to the risk factor.

*ODDS RATIO (OR) FOR CASE-CONTROL STUDY

$$OR = \frac{\text{odds that the diseased individual has been exposed}}{\text{odds that a healthy individual has been exposed}} = \frac{ad}{bc}$$

Interpretation
− “The odds of having the disease are [OR] times greater among individuals exposed to the risk factor than among those not exposed.”
− “It is [OR] times more likely that a diseased individual has been exposed to the risk factor than a healthy individual.”
− For rare diseases, OR may be interpreted as an estimate of RR.

Example:
In a controversial study of the relationship between coffee consumption and pancreatic cancer, MacMahon et al. (1981) interviewed 369 cancer patients and 644 controls.

Coffee drinking (cups per day)   Male pancreatic cancer patients   Male controls
≥ 5                              a = 60                            b = 82
0                                c = 9                             d = 32

OR = (60 x 32) / (82 x 9)
= 2.6
− Their findings, in part, showed that the patients were much more likely than the controls to have been heavy coffee drinkers.
− We would estimate from these results that habitual heavy coffee use increased the risk of pancreatic cancer in men by a factor of 2.6 relative to men who did not drink coffee.

Fisher’s Exact Test
− Used when the expected frequencies in a contingency table are “too small”:
• There is an expected frequency that is <1
• More than 20% of the cells have an expected frequency of <5

$$p\text{-value} = \frac{(a + b)!\,(c + d)!\,(a + c)!\,(b + d)!}{N!\,a!\,b!\,c!\,d!}$$

Example:
An infant heart transplant surgeon had 9 infant patients who needed a heart transplant. Only 5 suitable donors were identified. A follow-up of the 9 patients was done a year later to see if there was a difference in the survival rates of those with and those without heart transplants.

                      Alive 12 months later
Surgery performed?    Yes      No       Total
Yes                   A = 4    B = 1    5
No                    C = 1    D = 3    4
Total                 5        4        9
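The factorial formula for Fisher's exact test can be evaluated directly for the heart transplant table (A = 4, B = 1, C = 1, D = 3). The following is a minimal sketch; the function name `table_probability` is an assumption for illustration. The one-sided sum at the end follows the usual convention of adding the probabilities of tables at least as extreme with the same margins, a step the notes do not spell out.

```python
from math import factorial

def table_probability(a: int, b: int, c: int, d: int) -> float:
    """Probability of one specific 2 x 2 table given its fixed row and column totals."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(n) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    return numerator / denominator

# Observed table from the example.
p_observed = table_probability(4, 1, 1, 3)

# The only more extreme table with the same margins (all 5 transplanted infants alive).
p_more_extreme = table_probability(5, 0, 0, 4)

p_one_sided = p_observed + p_more_extreme

print(f"P(observed table) = {p_observed:.4f}")   # 20/126
print(f"one-sided p-value = {p_one_sided:.4f}")  # 21/126
```

With only 9 patients, even this fairly lopsided table does not reach the 0.05 level.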
TEST                                          FORMULA
2 x 2 Contingency table                       $X^2 = \frac{n(ad - bc)^2}{(a + c)(b + d)(a + b)(c + d)}$
McNemar’s Test for Correlated Proportions     $X^2 = \frac{(b - c)^2}{b + c}$
Relative Risk / Risk Ratio                    $RR = \frac{\text{risk of disease if exposed}}{\text{risk of disease if not exposed}} = \frac{a / (a + b)}{c / (c + d)}$
Odds Ratio for Cohort / Experimental study    $OR = \frac{\text{odds that exposed individuals have the disease}}{\text{odds that unexposed individuals have the disease}} = \frac{ad}{bc}$
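The formulas in this summary translate directly into small functions. The sketch below uses function names chosen for illustration and takes the cells a, b, c, d in the same layout as the 2 x 2 tables above. It reproduces two worked results from the notes: the seatbelt example (McNemar) and the coffee example (odds ratio).

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2 x 2 contingency table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

def mcnemar(b: int, c: int) -> float:
    """McNemar statistic for matched pairs; uses only the discordant cells b and c."""
    return (b - c) ** 2 / (b + c)

def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Risk among the exposed divided by risk among the unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio ad / bc."""
    return (a * d) / (b * c)

seatbelt_x2 = mcnemar(b=6, c=19)                  # seatbelt before/after example
coffee_or = odds_ratio(a=60, b=82, c=9, d=32)     # coffee and pancreatic cancer example

print(f"McNemar X2 = {seatbelt_x2:.2f}")   # 6.76
print(f"Odds ratio = {coffee_or:.2f}")     # 2.60
```

McNemar's statistic ignores the concordant cells a and d because pairs that did not change carry no information about the before-and-after shift.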
NON-PARAMETRIC TESTS

WILCOXON RANK-SUM TEST (MANN-WHITNEY U TEST)
− A non-parametric counterpart of the independent samples t-test
− This test is used instead of the independent samples t-test if the data are significantly skewed
− Assumption: observations are independent
Step-by-step procedure:
(1) State the hypothesis.
(2) Find the critical value.
(3) Compute the test value U:
a) Combine data from the 2 groups and rank each value
b) If the sample sizes differ, get the sum of the ranks of the group with the smaller sample size
(4) Make the decision.
(5) Summarize the results.

Example:

− There is a significant reduction in the smoking habit subsequent to pregnancy.

KRUSKAL-WALLIS ONE-WAY ANOVA (KRUSKAL-WALLIS H TEST)
− Non-parametric equivalent of one-way ANOVA (≥3 groups, independent samples)
− Useful if the data are ordinal (ranked)

SIGN TEST (ONE-SIDED T-TEST)
− Tests the hypothesis that there is "no difference in medians" between the continuous distributions of two random variables X and Y, in the situation when we can draw paired samples from X and Y

SPEARMAN RANK-ORDER CORRELATION COEFFICIENT (SPEARMAN 𝝆)
− Non-parametric equivalent of Pearson r
− Tests the association between 2 ranked variables
− Converts the observations to so-called ranks to limit the influence of an outlier.
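The way ranking limits the influence of an outlier can be seen by computing Pearson r on the raw values and then on their ranks, which is one standard way to obtain Spearman ρ. The sketch below is an illustration only: the helper names and the five paired observations are hypothetical, and tied values are given the average of their ranks.

```python
def average_ranks(values):
    """Rank from smallest (1) to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(x, y):
    """Spearman rho computed as Pearson r on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical paired observations; the last y value is an extreme outlier.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 7, 100]

print(f"Pearson r    = {pearson_r(x, y):.2f}")
print(f"Spearman rho = {spearman_rho(x, y):.2f}")
```

The relationship is perfectly monotonic, so ρ equals 1, while the single outlier pulls Pearson r well below 1.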