STAT2120: Categorical Data Analysis
Chapter 1: Introduction
Spring 2012
Department of Mathematics
Hong Kong Baptist University
1 / 51
1.1 Categorical Response Data
The subject of this course is how to analyze a data set in which we have one response variable (dependent variable, or Y variable) and one or more explanatory variables (independent variables, or X variables), where the response variable is categorical and the explanatory variables may be categorical or continuous.
What is a categorical variable?
A categorical variable has a measurement scale consisting of a set of categories. Categorical variables are common in many areas, e.g., the social sciences, health sciences, behavioral sciences, zoology, education, marketing, engineering sciences, and industrial quality control.
2 / 51
There are two kinds of categorical variables: ordinal and nominal.
Categorical variables having ordered scales are called ordinal
variables. Many categorical scales have a natural ordering.
Examples include:
(a) patient condition (excellent, good, fair, poor),
(b) government spending (too high, about right, too low),
(c) frequency of feeling anxiety (never, occasionally, often).
Categorical variables having unordered scales are called
nominal variables. Examples include:
(a) transportation to work (car, bicycle, bus, subway, walk),
(b) favorite type of music (classical, country, folk, jazz, rock),
(c) religious affiliation (Catholic, Jewish, Muslim, Buddhist).
3 / 51
For nominal variables, the order of listing the categories is
irrelevant. The statistical analysis should not depend on that
ordering. That is, methods designed for nominal variables
should give the same results no matter how the categories are
listed.
Methods designed for ordinal variables utilize the category
ordering. Methods designed for ordinal variables cannot be
used with nominal variables, since nominal variables do not
have ordered categories.
Methods designed for nominal variables can be used with nominal or ordinal variables. However, when used with ordinal variables, they do not use the ordering information, so we may suffer a serious loss of power.
4 / 51
1.2 Probability Distributions for Categorical Data
Recall that in STAT2110, the response variable is assumed to
follow a normal distribution.
We introduce four important discrete distributions used for
modeling categorical data:
(1) Binomial distribution
(2) Multinomial distribution
(3) Poisson distribution
(4) Negative Binomial distribution
5 / 51
(1) Binomial distribution
The binomial distribution is based on the idea of a Bernoulli trial (two possible outcomes: success and failure). Let the probability of a success in a Bernoulli trial be π. Then, for a series of independent Bernoulli trials with the same success probability π, the total number of successes out of n trials defines a binomial distribution.
Let X ~ Binomial(n, π). The probability mass function (pmf) of X is
P(X = x) = C(n, x) π^x (1 - π)^(n-x), for x = 0, 1, ..., n,
where C(n, x) = n!/[x!(n - x)!] is the binomial coefficient.
The mean and variance of X are
E(X) = nπ, Var(X) = nπ(1 - π).
6 / 51
Example 1
(a) Let X ~ Binomial(3, 0.3). We have
P(X ≤ 1) = P(X = 0) + P(X = 1)
= C(3, 0) (0.7)^3 + C(3, 1) (0.3)(0.7)^2
= 0.784.
(b) Let X ~ Binomial(10, 0.3). We have
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)
= C(10, 0) (0.7)^10 + C(10, 1) (0.3)(0.7)^9 + C(10, 2) (0.3)^2 (0.7)^8
= 0.3828.
(c) Let X ~ Binomial(100, 0.3). We have
P(X ≤ 25) = Σ_{i=0}^{25} C(100, i) (0.3)^i (1 - 0.3)^(100-i) = ?
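These probabilities are easy to verify numerically. A minimal sketch in Python using scipy.stats.binom (not part of the original slides):

from scipy.stats import binom

# (a) P(X <= 1) for X ~ Binomial(3, 0.3)
print(binom.cdf(1, 3, 0.3))      # 0.784

# (b) P(X <= 2) for X ~ Binomial(10, 0.3)
print(binom.cdf(2, 10, 0.3))     # about 0.3828

# (c) P(X <= 25) for X ~ Binomial(100, 0.3)
print(binom.cdf(25, 100, 0.3))   # about 0.1631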
7 / 51
Probability Mass Function of Binomial(n, 0.3)
[Figure: probability mass functions of Binomial(3, 0.3), Binomial(10, 0.3), Binomial(30, 0.3), and Binomial(100, 0.3).]
8 / 51
The binomial distribution is symmetric when π = 0.5;
The binomial distribution is right-skewed when π < 0.5;
The binomial distribution is left-skewed when π > 0.5.
Asymptotic result for Binomial(n, π): when n is large, the binomial distribution can be approximated by a normal distribution with μ = nπ and σ² = nπ(1 - π). That is,
Binomial(n, π) ≈ N(nπ, nπ(1 - π)) when n is large.
Rule of Thumb: We apply the normal approximation when both nπ ≥ 5 and n(1 - π) ≥ 5. Examples include:
(a) When π = 0.5, we require only n ≥ 10.
(b) When π = 0.1 or π = 0.9, we require n ≥ 50.
9 / 51
Example 1 - ctd
When n = 100 and π = 0.3, we have nπ = 30 ≥ 5 and n(1 - π) = 70 ≥ 5. The normal approximation suggests that X approximately follows
N(nπ, nπ(1 - π)) = N(30, 21).
Using the continuity correction, this leads to
P(X ≤ 25) = P(X ≤ 25.5)
≈ Φ((25.5 - 30)/√21)
= Φ(-0.9819805)
= 0.1630547.
Remark: The true probability P(X ≤ 25) is 0.1631301.
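The same comparison can be made in Python (a sketch; the values match the slide up to rounding):

from scipy.stats import binom, norm

# Normal approximation with continuity correction vs. the exact binomial value
approx = norm.cdf((25.5 - 30) / 21 ** 0.5)
exact = binom.cdf(25, 100, 0.3)
print(approx)   # about 0.16305
print(exact)    # about 0.16313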
10 / 51
(2) Multinomial distribution
Some trials may have more than two possible outcomes.
Examples are:
(a) The possible outcome of Dr Tong's teaching evaluation can be Excellent, Very Good, Good, Fair, or Poor.
(b) The possible genotype of an individual can be AA, Aa, or aa, where A represents the dominant allele and a represents the recessive allele.
When the trials are independent with the same category probabilities for each trial, the counts in the various categories define a multinomial distribution.
11 / 51
Specifically, let c denote the number of all possible categories, and denote their probabilities by {π_1, π_2, ..., π_c} with Σ_j π_j = 1. For n independent observations, assume that X_1 observations fall in category 1, X_2 fall in category 2, ..., and X_c fall in category c. We have n = Σ_j X_j.
For any non-negative counts n_1, ..., n_c with Σ_j n_j = n, the pmf of (X_1, ..., X_c) is
P(X_1 = n_1, ..., X_c = n_c) = [n! / (n_1! n_2! ··· n_c!)] π_1^(n_1) π_2^(n_2) ··· π_c^(n_c).
The binomial distribution is a special case of the multinomial distribution with c = 2. The marginal distribution of the count in any particular category is also binomial. Thus, for any category j, we have E(X_j) = nπ_j and Var(X_j) = nπ_j(1 - π_j).
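As a quick numerical illustration, here is a sketch in Python using scipy.stats.multinomial; the genotype probabilities and counts below are invented for illustration, not taken from the slides:

from scipy.stats import binom, multinomial

# Hypothetical genotype probabilities for the categories (AA, Aa, aa)
probs = [0.25, 0.50, 0.25]
n = 10

# Joint probability P(X_AA = 3, X_Aa = 5, X_aa = 2) under Multinomial(10, probs)
print(multinomial.pmf([3, 5, 2], n=n, p=probs))

# The marginal count in one category is Binomial(n, pi_j)
print(binom.pmf(3, n, probs[0]))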
12 / 51
(3) Poisson distribution
Many discrete response variables have counts as possible
outcomes. Examples include:
(a) the number of parties attended in the past month,
(b) the number of car accidents in a certain city per week,
(c) the number of phone calls arriving at a center per minute.
Note that the above counts usually have no maximum value.
Therefore, it is not appropriate to use the binomial or
multinomial distribution to model these data.
The simplest distribution to model these data is the Poisson distribution. Let X ~ Poisson(λ), where λ > 0 is the parameter. Then X can take any non-negative integer value, with pmf
P(X = x) = e^(-λ) λ^x / x!, x = 0, 1, 2, ....
13 / 51
The Poisson distribution is unimodal and right-skewed for any parameter λ. The mean and variance of X are
E(X) = Var(X) = λ.
When n is large and π is small, the binomial distribution Binomial(n, π) can be approximated by a Poisson distribution with parameter λ = nπ.
(Revisit Example 1) If we apply the Poisson approximation to Binomial(100, 0.3), we have λ = 100 × 0.3 = 30 and
P(X ≤ 25) ≈ Σ_{x=0}^{25} e^(-30) 30^x / x! = 0.2083574.
Given that the true value is 0.1631301, this does not provide a good approximation. [Reason: π = 0.3 is not small enough.]
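A quick numerical check of this comparison in Python (a sketch; values match the slide up to rounding):

from scipy.stats import binom, poisson

# Poisson approximation with lambda = n * pi = 30 vs. the exact binomial value
print(poisson.cdf(25, 30))       # about 0.2084
print(binom.cdf(25, 100, 0.3))   # about 0.1631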
14 / 51
(4) Negative Binomial distribution
The Poisson distribution is the simplest distribution for modeling count data. However, the Poisson has only one parameter λ and does not allow the variance to be adjusted independently of the mean.
For count data, the variance is not guaranteed to equal the mean.
Example: Consider the following sample data:
5, 16, 50, 71, 123.
The sample variance is 2226.5, which is much larger than the sample mean of 53. This suggests that the Poisson distribution may not be appropriate for these data.
15 / 51
Definition
(1) When the observed variance is greater than the expected
variance, we say overdispersion has occurred; (2) When the
observed variance is less than the expected variance, we say
underdispersion has occurred.
Overdispersion or underdispersion is not an issue for the normal distribution, because the normal has a separate parameter σ², distinct from the mean, to measure variability.
In practice, overdispersion is more commonly observed than
underdispersion.
16 / 51
When overdispersion is observed, an alternative to Poisson is
the Negative Binomial (NB) distribution.
NB is defined as the distribution of the total number of failures in a sequence of independent Bernoulli trials before the kth success. Let X ~ NB(k, π). The pmf of X is
P(X = x) = C(x + k - 1, k - 1) π^k (1 - π)^x, x = 0, 1, 2, ....
For NB, we have
E(X) = k(1 - π)/π, Var(X) = k(1 - π)/π².
17 / 51
For convenience, we reparameterize the NB distribution as follows. Let μ = k(1 - π)/π. Then
π = 1/(1 + μ/k) = 1/(1 + Dμ),
where D = 1/k is referred to as the dispersion parameter.
With μ and D, we denote the parameterized NB by X ~ NB(μ, D), with pmf
P(X = x) = C(x + 1/D - 1, 1/D - 1) [1/(1 + Dμ)]^(1/D) [Dμ/(1 + Dμ)]^x.
18 / 51
For the parameterized NB, we have
E(X) = μ, Var(X) = μ + Dμ².
This mean-variance relationship provides enough flexibility for modeling real data.
When D → 0, the above NB distribution converges to the Poisson distribution with parameter μ. In this sense, NB can be treated as a generalized Poisson distribution.
In fact, NB can be derived as a Poisson-Gamma mixture, i.e., a Poisson distribution in which the mean itself is random and follows a Gamma distribution.
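A small sketch of this parameterization in Python: scipy.stats.nbinom is parameterized by the number of successes k and the success probability π, so we convert from (μ, D); the values μ = 4 and D = 0.5 are invented for illustration.

from scipy.stats import nbinom

mu, D = 4.0, 0.5            # illustrative mean and dispersion parameter
k = 1.0 / D                 # scipy's first shape parameter (number of successes)
pi = 1.0 / (1.0 + D * mu)   # scipy's second shape parameter (success probability)

mean, var = nbinom.stats(k, pi, moments="mv")
print(mean)   # equals mu = 4.0
print(var)    # equals mu + D * mu^2 = 12.0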
19 / 51
1.3 Statistical Inference for Population Parameters
1.3.1 Maximum Likelihood Estimation
In practice, the parameter values for the binomial, multinomial, Poisson, and NB distributions are unknown; they are often estimated from the sample data.
In this section, we consider estimating the parameters using maximum likelihood.
The likelihood function is the probability (or density) of the
observed data, expressed as a function of the parameter value.
The maximum likelihood estimate (MLE) is the parameter
value at which the likelihood function is maximized.
20 / 51
Let X_1, ..., X_n be i.i.d. random variables from a population with pmf or pdf f(x | θ), where θ = (θ_1, ..., θ_k) ∈ R^k. The likelihood function is given by
ℓ(θ | x) = ℓ(θ | x_1, ..., x_n) = Π_{i=1}^n f(x_i | θ).
The log-likelihood function is L(θ | x) = ln ℓ(θ | x).
The MLE of θ, denoted by θ̂_MLE, is the value of θ that maximizes L(θ | x) or ℓ(θ | x). If the likelihood function is differentiable in all θ_i, the MLE can be obtained by solving the following likelihood equations,
∂L(θ | x)/∂θ_i = 0, i = 1, ..., k.
21 / 51
Example 2
(1) Let X_1, ..., X_n ~ iid Bernoulli(π). The MLE of π is
π̂_MLE = (1/n) Σ_{i=1}^n X_i = X/n,
where X is the total number of successes in the n trials.
(2) Let X_1, ..., X_n ~ iid Binomial(m, π), with m known. The MLE of π is
π̂_MLE = (1/(mn)) Σ_{i=1}^n X_i.
(3) Let X_1, ..., X_n ~ iid Poisson(λ). The MLE of λ is
λ̂_MLE = (1/n) Σ_{i=1}^n X_i = X̄.
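As a sanity check on case (3), one can verify numerically that the Poisson log-likelihood is maximized at the sample mean. A sketch in Python with invented data:

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

x = np.array([2, 0, 3, 1, 4, 2, 1])   # invented Poisson counts

def neg_loglik(lam):
    # Negative log-likelihood of an i.i.d. Poisson(lam) sample
    return -np.sum(poisson.logpmf(x, lam))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 20), method="bounded")
print(res.x)       # numerical MLE
print(x.mean())    # closed-form MLE: the sample mean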
22 / 51
Invariance Property of MLEs
Theorem (Invariance Property of MLEs)
If θ̂_MLE is the MLE of θ, then for any function τ(θ), the MLE of τ(θ) is τ(θ̂_MLE).
Example: Let X_1, ..., X_n ~ iid Poisson(λ). Note that the MLE of λ is λ̂_MLE = X̄. By the invariance property of MLEs, we have
(a) The MLE of λ² is X̄²;
(b) The MLE of √λ is √X̄;
(c) The MLE of ln(λ) is ln(X̄).
Example: Let X_1, ..., X_n ~ iid Bernoulli(π). The MLE of √(π(1 - π)) is √(X̄(1 - X̄)).
23 / 51
Asymptotic Normality of MLEs
Theorem (Asymptotic Normality of MLEs)
Let X_1, ..., X_n be i.i.d. random variables from f(x | θ), and let θ̂_n be the MLE of θ. Then, under some regularity conditions, as n → ∞,
√n (θ̂_n - θ) → N(0, v(θ)) in distribution,
where v(θ) is the Cramér-Rao lower bound on the variance of any unbiased estimator of θ.
Remark: The property that the distribution converges to the normal distribution as the sample size goes to infinity is called asymptotic normality. The asymptotic normality of MLEs is an important property and will be used repeatedly in this course.
24 / 51
1.3.2 Significance Tests about Population Parameters
A hypothesis is a statement about a population. The two complementary hypotheses in a hypothesis testing problem are called the null hypothesis and the alternative hypothesis. We denote them by H_0 and H_1, respectively.
Let θ denote a population parameter of interest. The general format of the null and alternative hypotheses about θ is
H_0: θ ∈ Θ_0 versus H_1: θ ∈ Θ_0^c,
where Θ = Θ_0 ∪ Θ_0^c is the entire parameter space.
25 / 51
A hypothesis test is a rule that decides, based on a sample from the population, which of the two complementary hypotheses is true. Specifically, we are interested in:
(a) for which sample values the decision is made to accept H_0;
(b) for which sample values H_0 is rejected and H_1 is accepted.

              | Accept H_0        | Reject H_0
H_0 is true   | correct decision  | type I error
H_1 is true   | type II error     | correct decision

Typically, a hypothesis test consists of the following three components:
(a) a test statistic T(X) = T(X_1, ..., X_n),
(b) a significance level α,
(c) a critical region that suggests rejection of H_0 based on the observed test statistic value.
26 / 51
Statistical Test about a Binomial Proportion
Recall that the MLE of the proportion π is the sample proportion π̂_MLE = X/n, where X is the total number of successes in n trials.
For ease of notation, we use p to represent the sample proportion. That is,
p = π̂_MLE = X/n.
Remark: the Greek letter π corresponds to the Roman letter p.
The sample proportion p has mean and variance
E(p) = π, Var(p) = π(1 - π)/n.
27 / 51
The sample proportion p is an unbiased estimator of π. In addition, the variance of p decreases toward zero as n increases. This shows that p is a consistent estimator of π.
By the asymptotic normality of MLEs (or by the Central Limit Theorem), the sampling distribution of p is approximately normal for large n. This suggests applying large-sample inferential methods for π.
Consider testing H_0: π = π_0 versus H_1: π ≠ π_0. Let the test statistic be
z = (p - π_0) / √(π_0(1 - π_0)/n).
Under H_0, the large-sample distribution of the z test statistic is the standard normal distribution.
28 / 51
Large-Sample Decision: For a given sample, we reject H_0 if
|z_obs| > z_(α/2),
where α is the significance level and z_α is the upper α point of the standard normal distribution.
For the two-sided test, let P-value = P(|Z| > |z_obs|), where Z ~ N(0, 1). The decision rule based on the P-value is:
Reject H_0 if P-value < α.
Remark: The smaller the P-value, the stronger the evidence against H_0.
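For concreteness, a minimal sketch of this large-sample test in Python; the numbers x = 60, n = 100, π_0 = 0.5, α = 0.05 are invented for illustration:

import numpy as np
from scipy.stats import norm

x, n, pi0, alpha = 60, 100, 0.5, 0.05   # invented example data
p = x / n

# z statistic with the null standard error sqrt(pi0 * (1 - pi0) / n)
z_obs = (p - pi0) / np.sqrt(pi0 * (1 - pi0) / n)

p_value = 2 * norm.sf(abs(z_obs))       # two-sided P-value
print(z_obs, p_value, p_value < alpha)  # reject H0 when P-value < alpha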
29 / 51
1.3.3 Confidence Intervals for Population Parameters
Point Estimation uses a single value to estimate the parameter θ. It represents our best guess for the true value of the parameter, but it provides little confidence that the guess is correct.
Interval Estimation is an alternative approach to estimating the parameter θ. It provides confidence in the following way: find a random interval that contains the true parameter with a pre-specified probability.
30 / 51
For a random sample X = (X_1, ..., X_n), an interval estimator of θ with coverage probability 1 - α is a random interval
[L(X), U(X)],
where P(θ ∈ [L(X), U(X)]) ≥ 1 - α for all θ.
[L(X), U(X)] is called a (1 - α) confidence interval (CI) of θ. The quantity 1 - α is referred to as the confidence level or confidence coefficient.
Note that L(X) and U(X) are random variables but θ is fixed (though unknown). If both L(X) and U(X) are finite, then the interval is called a two-sided CI; otherwise it is a one-sided CI.
31 / 51
Example 3
Let X_1, ..., X_n be i.i.d. from N(μ, σ²) with σ² known. Let
L(X) = X̄ - c and U(X) = X̄ + c.
Find the value of c so that the CI has confidence level 1 - α.
Solution: Note that X̄ ~ N(μ, σ²/n). We have
P(μ ∈ [L(X), U(X)]) = P(X̄ - c ≤ μ ≤ X̄ + c)
= P(-√n c/σ ≤ (X̄ - μ)/√(σ²/n) ≤ √n c/σ)
= 1 - 2Φ(-√n c/σ),
where Φ(·) is the cdf of N(0, 1). To make the CI have level 1 - α, we set 1 - 2Φ(-√n c/σ) = 1 - α, which gives c = z_(α/2) σ/√n. The CI is therefore
[X̄ - z_(α/2) σ/√n, X̄ + z_(α/2) σ/√n].
32 / 51
Confidence Interval for a Binomial Proportion
Let SE = √(p(1 - p)/n) denote the estimated standard error of p. It can be shown that (p - π)/SE is asymptotically standard normal.
A large-sample (1 - α) probability statement for (p - π)/SE is
-z_(α/2) ≤ (p - π)/SE ≤ z_(α/2).
This leads, equivalently, to the (1 - α) CI for π:
p ± z_(α/2) √(p(1 - p)/n).
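A sketch of the Wald interval computation in Python (x = 60 successes out of n = 100 is an invented example):

import numpy as np
from scipy.stats import norm

x, n, alpha = 60, 100, 0.05     # invented example data
p = x / n
se = np.sqrt(p * (1 - p) / n)   # estimated standard error of p
z = norm.ppf(1 - alpha / 2)     # z_{alpha/2}, about 1.96 for alpha = 0.05

print(p - z * se, p + z * se)   # 95% Wald CI for pi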
33 / 51
1.4 More on Statistical Inference for Discrete Data
In this section, we introduce three frequently used methods for conducting inference (significance tests and confidence intervals):
(1) Wald Test
(2) Likelihood Ratio Test
(3) Rao's Score Test
The above methods are general and apply to any parameter in a statistical model. Consider testing
H_0: θ = θ_0 versus H_1: θ ≠ θ_0.
We will apply the methods to test the binomial proportion π for illustration.
34 / 51
(1) Wald Test
Let θ̂ be the MLE of θ. Let SE be the standard error of θ̂, evaluated by substituting θ̂ for the unknown θ in the expression of sd(θ̂).
The Wald test statistic is defined as
z = (θ̂ - θ_0)/SE.
Under H_0, the test statistic z has approximately a standard normal distribution. Equivalently, z² has approximately a χ²_1 distribution.
Example: For testing H_0: π = π_0 in the binomial distribution, the Wald test statistic is
z = (p - π_0) / √(p(1 - p)/n).
35 / 51
(2) Likelihood Ratio Test
The likelihood ratio test statistic is defined as
-2 ln(Λ) = -2(L_0 - L_1),
where Λ = ℓ_0/ℓ_1 is the ratio of ℓ_0 (the maximized likelihood under the null space) to ℓ_1 (the maximized likelihood under the entire space), and L_0 = ln ℓ_0 and L_1 = ln ℓ_1 are the corresponding log-likelihood values.
Note that L_1 ≥ L_0, because L_1 is the maximum over the entire parameter space while L_0 is the maximum over only the null space. Therefore, the test statistic -2 ln(Λ) is always non-negative.
When H_0 is not true, the discrepancy between L_1 and L_0 can be large. This suggests rejecting H_0 for large values of -2 ln(Λ).
36 / 51
The reason for taking the natural log transformation and multiplying by -2 is that it yields an approximate chi-square distribution.
Specifically, under H_0, the test statistic -2 ln(Λ) follows approximately a chi-square distribution with ν degrees of freedom, where ν is equal to the difference between the numbers of free parameters in the entire and null spaces.
Based on the asymptotic distribution, the likelihood ratio test rejects H_0 if
-2 ln(Λ) > χ²_ν(α),
where χ²_ν(α) is the upper α point of the χ²_ν distribution.
37 / 51
Example 4
For testing H_0: π = π_0 in the binomial distribution, the likelihood ratio test statistic is
-2 ln(Λ) = -2 ln [ π_0^x (1 - π_0)^(n-x) / ( p^x (1 - p)^(n-x) ) ].
In addition, noting that there is 1 free parameter in the entire space and 0 free parameters in the null space, we have ν = 1. This implies that -2 ln(Λ) follows approximately χ²_1 under H_0.
Finally, we reject H_0 if
-2 ln(Λ) > χ²_1(α).
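A sketch of this likelihood ratio test in Python (the binomial coefficient cancels in the ratio, so logpmf can be used directly; x = 60, n = 100, π_0 = 0.5 are invented values):

from scipy.stats import binom, chi2

x, n, pi0, alpha = 60, 100, 0.5, 0.05   # invented example data
p = x / n

# -2 ln(Lambda) = 2 * [log-likelihood at the MLE p minus log-likelihood at pi0]
lr_stat = 2 * (binom.logpmf(x, n, p) - binom.logpmf(x, n, pi0))

p_value = chi2.sf(lr_stat, df=1)
print(lr_stat, p_value, lr_stat > chi2.ppf(1 - alpha, df=1))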
38 / 51
(3) Rao's Score Test
Rao's score test is motivated by the score function, which is defined as
s(θ) = ∂L(θ | x)/∂θ.
For any θ value, it can be shown that E(s(θ)) = 0.
Under H_0, we expect s(θ_0) to be near zero. When H_0 is not true, s(θ_0) tends to be away from zero. This suggests rejecting H_0 for large values of |s(θ_0)|, up to a certain scaling.
The formal definition of Rao's score test statistic is more complicated, so we skip the details. However, it is worth mentioning that Rao's score test statistic is similar to the Wald test statistic, except that it evaluates the standard error under the assumption that H_0 is true.
39 / 51
Example 5
For testing H_0: π = π_0 in the binomial distribution, recall that the Wald test statistic is
z = (p - π_0)/SE = (p - π_0) / √(p(1 - p)/n),
where the standard error SE is evaluated at the ML estimate.
For Rao's score test, we evaluate SE under the assumption that H_0 is true. This leads to SE = √(π_0(1 - π_0)/n), so that the score test statistic is
z = (p - π_0) / √(π_0(1 - π_0)/n).
Under H_0, the score test statistic z has approximately a standard normal distribution.
40 / 51
Comparison
The Wald, likelihood ratio, and Rao's score tests are the three major ways to construct significance tests for parameters in statistical models.
For normal data, the three tests provide identical results. For other data, the three tests are asymptotically equivalent when n is large.
Some relationships among the three tests are:
(a) The Wald test requires calculation of the MLE under Θ;
(b) The likelihood ratio test requires calculation of the MLEs under both Θ_0 and Θ;
(c) Rao's score test requires calculation of the MLE under Θ_0.
41 / 51
Test-based Confidence Intervals
For each test, we have a corresponding confidence interval. This is based on inverting the results of the significance test:
A (1 - α) CI contains all the values that would not be rejected by the test at the α significance level.
Consider estimating π in the binomial distribution. For a given sample proportion p and sample size n, the upper and lower bounds of a (1 - α) CI for π, based on the Wald test, are the values π_0 that satisfy
|p - π_0| / √(p(1 - p)/n) = z_(α/2).
This is equivalent to saying that the CI of π is
p ± z_(α/2) √(p(1 - p)/n).
42 / 51
If we use the score test, then the upper and lower bounds of a (1 - α) CI for π are the values π_0 that satisfy
|p - π_0| / √(π_0(1 - π_0)/n) = z_(α/2).
To obtain the CI, we can square both sides to give a quadratic equation in π_0 and then solve it for π_0.
Similarly, we may use the likelihood ratio test to construct confidence intervals. This confirms, again, that we have a corresponding confidence interval for each test.
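A sketch of solving that score-test quadratic numerically in Python; this yields what is commonly called the Wilson (score) interval, with x = 60 and n = 100 as invented values:

import numpy as np
from scipy.stats import norm

x, n, alpha = 60, 100, 0.05
p = x / n
z = norm.ppf(1 - alpha / 2)

# Squaring |p - pi0| / sqrt(pi0 * (1 - pi0) / n) = z_{alpha/2} gives
# (1 + z^2/n) * pi0^2 - (2p + z^2/n) * pi0 + p^2 = 0
a = 1 + z**2 / n
b = -(2 * p + z**2 / n)
c = p**2
lower, upper = sorted(np.roots([a, b, c]))
print(lower, upper)   # (1 - alpha) score CI for pi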
43 / 51
1.5 Small-Sample Inference for Discrete Data
Recall that the above three tests are all asymptotic tests that require a large sample size n. In addition, the three tests are asymptotically equivalent as n → ∞.
For small sample sizes, however, the three tests can be very different, and their normal or χ² approximations may have large errors.
In view of this, for small sample sizes it is safer to use the discrete distribution directly (rather than a normal or χ² approximation) to calculate the exact P-value.
44 / 51
P-value
Definition
The P-value is the probability of obtaining a test statistic value that is at least as extreme as the one that was actually observed, under the assumption that H_0 is true.
For discrete distributions, a value that is at least as extreme as the observed value x is defined to be a value that has a probability less than or equal to P(X = x), in the direction of H_1.
For continuous distributions, a value that is at least as extreme as the observed value x is defined to be a value that has a density less than or equal to f(x), in the direction of H_1.
45 / 51
Example 6
Let X = x be the number of successes out of 7 trials. Given π = 0.4, the pmf of X is

x    P(X = x) = C(7, x) (0.4)^x (0.6)^(7-x)
0    0.028
1    0.131
2    0.261
3    0.290
4    0.194
5    0.077
6    0.017
7    0.002
46 / 51
Example 6 - ctd
Consider the one-sided test H_0: π = 0.4 versus H_1: π < 0.4. If the observed x value is 1, the exact P-value is
P-value = P(X = 1) + P(X = 0) = 0.131 + 0.028 = 0.159.
Consider the one-sided test H_0: π = 0.4 versus H_1: π > 0.4. If the observed x value is 5, the exact P-value is
P-value = P(X = 5) + P(X = 6) + P(X = 7) = 0.077 + 0.017 + 0.002 = 0.096.
47 / 51
Example 6 - ctd
Now consider the two-sided test H_0: π = 0.4 versus H_1: π ≠ 0.4. If the observed x value is 6, what is the exact P-value?
P-value = P(X = 0) + P(X = 1) + P(X = 6) + P(X = 7) = 0.178.
The above answer is NOT correct. Why? Only values whose probability is less than or equal to P(X = 6) = 0.017 are at least as extreme as the observed value; P(X = 0) = 0.028 and P(X = 1) = 0.131 are larger and so are excluded. By definition, the exact P-value should be
P-value = P(X = 6) + P(X = 7) = 0.019.
What if the observed x value is 0? The exact P-value is
P-value = P(X = 0) + P(X = 6) + P(X = 7) = 0.047.
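These exact P-values can be reproduced with a short Python sketch that implements the "probability at least as extreme" definition for the two-sided test:

from scipy.stats import binom

n, pi0 = 7, 0.4
pmf = {x: binom.pmf(x, n, pi0) for x in range(n + 1)}

def exact_two_sided_pvalue(x_obs):
    # Sum the probabilities of all outcomes no more likely than the observed one
    return sum(p for p in pmf.values() if p <= pmf[x_obs])

print(exact_two_sided_pvalue(6))   # about 0.019
print(exact_two_sided_pvalue(0))   # about 0.047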
48 / 51
Exact Test
Such a test, which uses the exact distribution rather than an asymptotic distribution to calculate the P-value, is called an exact test.
There are two problems with using exact tests:
(a) It is often very difficult to obtain the exact distribution in many complicated situations.
(b) If the distribution is discrete, the exact test is always conservative. As a consequence, the power is also lower than it could be.
49 / 51
Conservative Test
In Example 6, we reject H_0 at the 0.05 significance level if and only if the observed x value is 0, 6, or 7, because these values give P-values less than 0.05, whereas any other value gives a P-value higher than 0.05.
With the above decision rule, we commit a type I error if the true π is really 0.4 but the observed x value is 0, 6, or 7. This leads to the actual probability of committing a type I error being
P(X = 0, 6, or 7) = 0.047.
If the actual probability of committing a type I error at the α significance level is strictly less than α, then the test is called a conservative test.
50 / 51
Mid P-value
The mid P-value is defined as
mid P-value = 0.5 × P(the observed value) + P(more extreme values).
To diminish the conservativeness of exact tests, we may use the mid P-value instead of the exact P-value to make the decision. Using mid P-values leads to less conservative tests.
However, the mid P-value has a major disadvantage: the probability of committing a type I error may exceed the significance level α, leading to a liberal test.
Therefore, using mid P-values for conservative tests is only another option; it is commonly used but not mandatory.
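Continuing the Example 6 sketch, a two-sided mid P-value can be computed as follows (a hypothetical helper using the "less likely than observed" convention, not taken from the slides):

from scipy.stats import binom

n, pi0 = 7, 0.4
pmf = {x: binom.pmf(x, n, pi0) for x in range(n + 1)}

def mid_pvalue_two_sided(x_obs):
    # Half the probability of the observed value plus the total probability
    # of strictly less likely (more extreme) values
    more_extreme = sum(p for p in pmf.values() if p < pmf[x_obs])
    return 0.5 * pmf[x_obs] + more_extreme

print(mid_pvalue_two_sided(6))   # smaller than the exact P-value of 0.019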
51 / 51