Multiple Testing in QTL Mapping: Lucia Gutierrez Lecture Notes Tucson Winter Institute
Multiple Testing
WHAT IS WRONG WITH THE PREVIOUS CARTOON?
A common practice is to use P < 0.05 to decide whether a test is
significant. But with a large number of tests (as is common in QTL mapping,
where we perform one test at each marker), the chance of having at least one
false positive approaches 1 very quickly.
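To make "pretty quickly" concrete, here is a minimal sketch. It assumes the m tests are independent and all use the same threshold α, which is an idealization (linked markers are correlated); under that assumption the chance of at least one false positive is 1 − (1 − α)^m. The function name is illustrative only.

```python
def prob_at_least_one_false_positive(alpha, m):
    """Chance of one or more false positives among m independent tests,
    each carried out at significance level alpha (assumes independence)."""
    return 1.0 - (1.0 - alpha) ** m


# How the chance grows with the number of tests at alpha = 0.05.
for m in (1, 10, 50, 100, 500):
    print(m, round(prob_at_least_one_false_positive(0.05, m), 3))
```

Under these assumptions the chance is already about 0.40 with 10 tests and above 0.99 with 100 tests.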
Hypothesis Testing (in QTL context)
OUTCOMES OF A STATISTICAL TEST
                                        “THE TRUTH”
RESULT FROM TEST      H0 is true (i.e. no QTL)    H0 is false (i.e. a QTL)
Reject H0             False positive              True positive
Do not reject H0      True negative               False negative
Type II Error
Type II error: the failure to reject a false null hypothesis (there is
a QTL and we fail to declare it) in a given test.
Probability of Type II error (β): the probability of a false negative
(failing to declare a true QTL) in a given test.
Type I and II Error
NOTE that for a given test, decreasing the false positive rate (α) also
decreases the power (the probability of detecting a true QTL, 1 − β).
The only way to decrease false positives and increase power at the same
time is to change the design (e.g. increase the population size, reduce
experimental error, etc.).
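The trade-off can be illustrated with a small sketch. It is not taken from the notes: it assumes a one-sided z-test for a mean shift with known standard deviation, and the function name and parameters (effect, sd, n) are hypothetical. It only shows the direction of the effects described above.

```python
from statistics import NormalDist


def power_one_sided_z(alpha, effect, sd, n):
    """Power (1 - beta) of a one-sided z-test with known sd.

    Assumed setting: H0 says the mean shift is 0, the true shift is
    `effect`, and n individuals are measured.
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1.0 - alpha)  # rejection cutoff under H0
    shift = effect * n ** 0.5 / sd              # mean of the statistic under H1
    return 1.0 - std_normal.cdf(critical - shift)


# A stricter alpha lowers power; a larger population recovers it.
for alpha, n in ((0.05, 50), (0.001, 50), (0.001, 200)):
    print(alpha, n, round(power_one_sided_z(alpha, effect=0.5, sd=1.0, n=n), 3))
```

With the same design, moving from α = 0.05 to α = 0.001 lowers the power; increasing n at the stricter α raises it again.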
Multiple Testing
Let’s now assume that we are not interested in a single hypothesis test,
but are conducting multiple hypothesis tests. For example, we perform
one hypothesis test at each marker (at each marker we ask whether the
marker is associated with a QTL or not). We can summarize the number of
hypotheses that fall into each category in the following chart:
                        Decision
“Truth”          Do not reject H0    Reject H0    Total
H0 is true       U                   V            m0
H0 is false      Z                   S            m1 = m − m0
Total            m − R               R            m
Benjamini and Hochberg, 1995
Bonferroni:
Since the increase in the error is related to the number of independent tests,
Bonferroni proposed to use a new alpha value as threshold:
$$ \alpha^* = 1 - (1 - \alpha)^{1/m} \approx \frac{\alpha}{m} $$
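As a quick numerical check of the threshold above, here is a minimal sketch. The exact expression 1 − (1 − α)^(1/m) is often called the Šidák form, and α/m is the usual Bonferroni approximation; the function names are illustrative.

```python
def exact_threshold(alpha, m):
    """Per-test threshold 1 - (1 - alpha)^(1/m) for m independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def approx_threshold(alpha, m):
    """Approximation alpha / m to the per-test threshold."""
    return alpha / m


# The two thresholds are close, and alpha / m is slightly more stringent.
for m in (10, 100, 1000):
    print(m, exact_threshold(0.05, m), approx_threshold(0.05, m))
```

For α = 0.05 and m = 100 both values are close to 0.0005, so the approximation is adequate in practice.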
However, many studies have shown that a Bonferroni correction is overly
conservative for QTL studies, mainly because the tests are not independent
(markers are linked and therefore correlated). An unnecessarily stringent
threshold reduces the power to detect true QTL.
Benjamini and Hochberg, 1995; Li and Ji, 2005
Multiple Testing – Family control
Li and Ji (2005):
An alternative is to use a Bonferroni correction with the effective number
of independent tests instead of the total number of tests (which we know
are not independent because markers are correlated). This idea was first
proposed by Cheverud (2001) and then modified by Li and Ji (2005).
The steps are:
1. Calculate the correlation matrix of the markers.
2. Use the eigenvalues (λi) of the correlation matrix to determine the
effective number of independent tests (Meff) as:

$$ M_{eff} = \sum_{i=1}^{M} f(\lambda_i), \qquad f(x) = I(x \ge 1) + (x - \lfloor x \rfloor), \quad x \ge 0 $$

where the sum runs over the M eigenvalues. The adjusted threshold is then:

$$ \alpha^* \approx \frac{\alpha}{M_{eff}} $$
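The Meff calculation can be sketched as follows. The sketch takes the eigenvalues as input (they could be obtained from the marker correlation matrix with, for example, numpy.linalg.eigvalsh). The function names are illustrative, and the absolute value is an added guard against tiny negative eigenvalues from rounding.

```python
import math


def li_ji_weight(x):
    """f(x) = I(x >= 1) + (x - floor(x)) for x >= 0."""
    return (1.0 if x >= 1.0 else 0.0) + (x - math.floor(x))


def effective_number_of_tests(eigenvalues):
    """Meff: sum of f over the eigenvalues of the marker correlation matrix."""
    return sum(li_ji_weight(abs(ev)) for ev in eigenvalues)


def adjusted_threshold(alpha, m_eff):
    """Bonferroni-type threshold with Meff in place of the number of tests."""
    return alpha / m_eff


# Four markers with pairwise correlation 0.5 have eigenvalues 2.5, 0.5, 0.5, 0.5.
m_eff = effective_number_of_tests([2.5, 0.5, 0.5, 0.5])
print(m_eff, adjusted_threshold(0.05, m_eff))
```

Two limiting cases show the behaviour: uncorrelated markers (all eigenvalues equal to 1) give Meff equal to the number of markers, and perfectly correlated markers (one eigenvalue equal to the number of markers, the rest 0) give Meff = 1.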
Li and Ji, 2005
This method has been shown to perform better than the Bonferroni correction
and the Cheverud (2001) method.
It performs as well as permutation but is fast and simple to carry out.
Li and Ji, 2005
Multiple Testing – False Discovery
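False Discovery Rate (FDR): The steps to control the False Discovery
Rate are as follows:
1. Order the observed p-values: $p_{(1)} \le \dots \le p_{(m)}$
2. Calculate the arithmetic sequence $\frac{i}{m}\alpha$ for $i = 1, \dots, m$
3. Find $k = \max\{\, i : p_{(i)} \le \frac{i}{m}\alpha \,\}$, $1 \le i \le m$, and reject
the k hypotheses with the smallest p-values, $p_{(1)}, \dots, p_{(k)}$.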
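The three steps can be sketched in a few lines. This is a minimal illustration of the Benjamini and Hochberg (1995) step-up procedure, with a hypothetical function name and made-up p-values.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans in the original order: True where H0 is rejected."""
    m = len(p_values)
    # Step 1: indices that sort the p-values in ascending order.
    order = sorted(range(m), key=lambda idx: p_values[idx])
    # Steps 2-3: largest rank i with p(i) <= (i / m) * alpha.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Made-up p-values for five markers.
print(benjamini_hochberg([0.045, 0.001, 0.30, 0.012, 0.02], alpha=0.05))
```

Because k is the largest rank that satisfies the inequality, a p-value can be rejected even if it is above its own cutoff, as long as a larger p-value meets its cutoff.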
Benjamini and Hochberg, 1995
Multiple Testing - Permutation
Permutations (Broman and Sen, 2009).
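The notes only name the method here. A common form of the permutation approach is sketched below under loud assumptions: phenotypes are shuffled relative to genotypes, the largest statistic across all markers is recorded for each shuffle, and the (1 − α) quantile of those maxima is used as the threshold. The test statistic (absolute correlation), the function names, and the toy data are all illustrative choices, not taken from the notes.

```python
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_threshold(markers, phenotype, n_perm=1000, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the maximum statistic over markers
    when phenotypes are shuffled relative to genotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    null_maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any marker-phenotype association
        null_maxima.append(
            max(abs_correlation(marker, shuffled) for marker in markers)
        )
    null_maxima.sort()
    index = min(n_perm - 1, int((1.0 - alpha) * n_perm))
    return null_maxima[index]


# Toy data (simulated, not from the notes): 10 markers coded 0/1, 40 individuals.
data_rng = random.Random(1)
markers = [[data_rng.randint(0, 1) for _ in range(40)] for _ in range(10)]
phenotype = [data_rng.gauss(0.0, 1.0) for _ in range(40)]
threshold = permutation_threshold(markers, phenotype, n_perm=200)
print("permutation threshold for max |r|:", round(threshold, 3))
```

Because the genotypes are left intact, the correlation among linked markers is preserved in every shuffle, which is why this kind of threshold accounts for non-independent tests.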
Multiple Testing
SO WHAT WAS WRONG WITH THE PREVIOUS CARTOON?
A common practice is to use P < 0.05 to decide whether a test is
significant. But with a large number of tests, the chance of having at
least one false positive is close to 1.