ctree: Conditional Inference Trees

Torsten Hothorn, Kurt Hornik, Achim Zeileis
Abstract
This vignette describes the new reimplementation of conditional inference trees (CTree)
in the R package partykit. CTree is a non-parametric class of regression trees embedding
tree-structured regression models into a well defined theory of conditional inference pro-
cedures. It is applicable to all kinds of regression problems, including nominal, ordinal,
numeric, censored as well as multivariate response variables and arbitrary measurement
scales of the covariates. The vignette comprises a practical guide to exploiting the flexi-
ble and extensible computational tools in partykit for fitting and visualizing conditional
inference trees.
1. Overview
This vignette describes conditional inference trees (Hothorn, Hornik, and Zeileis 2006) along
with its new and improved reimplementation in package partykit. Originally, the method was
implemented in the package party almost entirely in C while the new implementation is now
almost entirely in R. In particular, this has the advantage that all the generic infrastructure
from partykit can be reused, making many computations more modular and easily extensible.
Hence, partykit::ctree is the new reference implementation that will be improved and
developed further in the future.
In almost all cases, the two implementations will produce identical trees. In exceptional cases,
additional parameters have to be specified in order to ensure backward compatibility. These
and novel features in partykit::ctree are introduced in Section 7.
2. Introduction
The majority of recursive partitioning algorithms are special cases of a simple two-stage
algorithm: First partition the observations by univariate splits in a recursive way and second
fit a constant model in each cell of the resulting partition. The most popular implementations
of such algorithms are ‘CART’ (Breiman, Friedman, Olshen, and Stone 1984) and ‘C4.5’
(Quinlan 1993). Not unlike AID, both perform an exhaustive search over all possible splits
maximizing an information measure of node impurity selecting the covariate showing the best
split. This approach has two fundamental problems: overfitting and a selection bias towards
covariates with many possible splits. With respect to the overfitting problem, Mingers (1987)
notes that the algorithm has no concept of statistical significance and so cannot distinguish
between a significant and an insignificant improvement in the information measure.
With conditional inference trees (see Hothorn et al. 2006, for a full description of its method-
ological foundations) we enter at the point where White and Liu (1994) demand for
[. . . ] a statistical approach [to recursive partitioning] which takes into account the
distributional properties of the measures.
We present a unified framework embedding recursive binary partitioning into the well defined
theory of permutation tests developed by Strasser and Weber (1999). The conditional distri-
bution of statistics measuring the association between responses and covariates is the basis
for an unbiased selection among covariates measured at different scales. Moreover, multiple
test procedures are applied to determine whether no significant association between any of
the covariates and the response can be stated and the recursion needs to stop.
In the following we restrict ourselves to partition-based regression relationships, i.e., $r$
disjoint cells $B_1, \dots, B_r$ partitioning the covariate space $\mathcal{X} = \bigcup_{k=1}^{r} B_k$.
A constant model is then fitted in each cell of this partition.
A generic algorithm for recursive binary partitioning for a given learning sample Ln can be
formulated using non-negative integer valued case weights w = (w1 , . . . , wn ). Each node of a
tree is represented by a vector of case weights having non-zero elements when the correspond-
ing observations are elements of the node and are zero otherwise. The following algorithm
implements recursive binary partitioning:
1. For case weights w test the global null hypothesis of independence between any of the
m covariates and the response. Stop if this hypothesis cannot be rejected. Otherwise
select the covariate Xj ∗ with strongest association to Y.
2. Choose a set $A^* \subseteq \mathcal{X}_{j^*}$ in order to split $\mathcal{X}_{j^*}$ into two disjoint sets $A^*$ and $\mathcal{X}_{j^*} \setminus A^*$. The
case weights $w_{\text{left}}$ and $w_{\text{right}}$ determine the two subgroups with $w_{\text{left},i} = w_i I(X_{j^*i} \in A^*)$
and $w_{\text{right},i} = w_i I(X_{j^*i} \notin A^*)$ for all $i = 1, \dots, n$ ($I(\cdot)$ denotes the indicator function).
3. Recursively repeat steps 1 and 2 with modified case weights wleft and wright , respectively.
The separation of variable selection and splitting procedure into steps 1 and 2 of the algorithm
is the key for the construction of interpretable tree structures not suffering a systematic
tendency towards covariates with many possible splits or many missing values. In addition,
a statistically motivated and intuitive stopping criterion can be implemented: We stop when
the global null hypothesis of independence between the response and any of the m covariates
cannot be rejected at a pre-specified nominal level α. The algorithm induces a partition
{B1 , . . . , Br } of the covariate space X , where each cell B ∈ {B1 , . . . , Br } is associated with a
vector of case weights.
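The following toy sketch illustrates this weight-based recursion in base R. It is not
partykit's implementation: simple correlation tests and a median cutpoint stand in for the
conditional inference machinery developed below, and a numeric response with complete
numeric covariates is assumed.

> grow <- function(y, x, w, alpha = 0.05, minsplit = 20) {
+     idx <- w > 0
+     leaf <- list(type = "leaf", pred = weighted.mean(y[idx], w[idx]))
+     if (sum(w) < minsplit) return(leaf)
+     ## step 1: independence tests for all covariates (stand-in: correlation
+     ## tests), Bonferroni-adjusted; stop if the global H0 is not rejected
+     p <- sapply(x, function(xj) cor.test(xj[idx], y[idx])$p.value)
+     p <- pmin(p * length(p), 1)
+     if (min(p) > alpha) return(leaf)
+     j <- which.min(p)                # covariate with strongest association
+     ## step 2: binary split of the selected covariate (stand-in: the median)
+     cut <- median(x[idx, j])
+     go_left <- x[[j]] <= cut
+     if (all(go_left[idx]) || !any(go_left[idx])) return(leaf)
+     ## step 3: recurse with modified case weights
+     list(type = "node", var = names(x)[j], cut = cut,
+          left = grow(y, x, w * go_left, alpha, minsplit),
+          right = grow(y, x, w * !go_left, alpha, minsplit))
+ }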
For variable selection in step 1, the association between $\mathbf{Y}$ and each covariate $X_j$ is
measured by a linear statistic of the form

$$\mathbf{T}_j(\mathcal{L}_n, w) = \operatorname{vec}\left(\sum_{i=1}^{n} w_i\, g_j(X_{ji})\, h(Y_i, (Y_1, \dots, Y_n))^\top\right) \in \mathbb{R}^{p_j q} \qquad (1)$$

where $g_j$ is a transformation of covariate $X_j$ and $h$ is the influence function of the
response. The distribution of $\mathbf{T}_j$ under the partial null hypothesis $H_0^j$ of independence
between $\mathbf{Y}$ and $X_j$ depends on their joint distribution, which is unknown under almost all
practical circumstances. However, at least under the null hypothesis, one can dispose of this
dependency by fixing the covariates and conditioning on all possible permutations of the
responses. This principle leads to test procedures known as permutation tests. The conditional
expectation $\mu_j \in \mathbb{R}^{p_j q}$ and covariance $\Sigma_j \in \mathbb{R}^{p_j q \times p_j q}$ of $\mathbf{T}_j(\mathcal{L}_n, w)$
under $H_0$ given all permutations $\sigma \in S(\mathcal{L}_n, w)$ of the responses were derived by Strasser and
Weber (1999):
$$\mu_j = \mathbb{E}(\mathbf{T}_j(\mathcal{L}_n, w) \mid S(\mathcal{L}_n, w)) = \operatorname{vec}\left(\left(\sum_{i=1}^{n} w_i g_j(X_{ji})\right) \mathbb{E}(h \mid S(\mathcal{L}_n, w))^\top\right),$$

$$\begin{aligned}
\Sigma_j &= \mathbb{V}(\mathbf{T}_j(\mathcal{L}_n, w) \mid S(\mathcal{L}_n, w)) \\
&= \frac{w_\cdot}{w_\cdot - 1} \mathbb{V}(h \mid S(\mathcal{L}_n, w)) \otimes \left(\sum_i w_i g_j(X_{ji}) \otimes w_i g_j(X_{ji})^\top\right) \\
&\quad - \frac{1}{w_\cdot - 1} \mathbb{V}(h \mid S(\mathcal{L}_n, w)) \otimes \left(\sum_i w_i g_j(X_{ji})\right) \otimes \left(\sum_i w_i g_j(X_{ji})\right)^\top \qquad (2)
\end{aligned}$$

where $w_\cdot = \sum_{i=1}^{n} w_i$ denotes the sum of the case weights and $\otimes$ is the Kronecker product.
Having the conditional expectation and covariance at hand we are able to standardize a
linear statistic T ∈ Rpq of the form (1) for some p ∈ {p1 , . . . , pm }. Univariate test statistics c
mapping an observed multivariate linear statistic t ∈ Rpq into the real line can be of arbitrary
form. An obvious choice is the maximum of the absolute values of the standardized linear
statistic
$$c_{\max}(t, \mu, \Sigma) = \max_{k = 1, \dots, pq} \left| \frac{(t - \mu)_k}{\sqrt{(\Sigma)_{kk}}} \right|$$
utilizing the conditional expectation $\mu$ and covariance matrix $\Sigma$. The application of a quadratic
form $c_{\text{quad}}(t, \mu, \Sigma) = (t - \mu) \Sigma^{+} (t - \mu)^\top$ is one alternative, although computationally
more expensive because the Moore-Penrose inverse $\Sigma^{+}$ of $\Sigma$ is involved.
The type of test statistic to be used can be specified by means of the ctree_control function,
for example
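> ctree_control(teststat = "maximum")

requests a maximum-type statistic ("quadratic" is the default in partykit).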
In what follows, $P_j$ denotes the $P$-value of the conditional test for $H_0^j$. So far, we have only
addressed testing each partial hypothesis $H_0^j$, which is sufficient for an unbiased variable
selection. A global test for $H_0$ required in step 1 can be constructed via an aggregation of the
transformations $g_j$, $j = 1, \dots, m$, i.e., using a linear statistic of the form
$$\mathbf{T}(\mathcal{L}_n, w) = \operatorname{vec}\left(\sum_{i=1}^{n} w_i \left(g_1(X_{1i})^\top, \dots, g_m(X_{mi})^\top\right)^\top h(Y_i, (Y_1, \dots, Y_n))^\top\right).$$
However, this approach is less attractive for learning samples with missing values. Universally
applicable approaches are multiple test procedures based on $P_1, \dots, P_m$. Simple Bonferroni-
adjusted $P$-values (the adjustment $1 - (1 - P_j)^m$ is used) are available via
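> ctree_control(testtype = "Bonferroni")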
Each candidate subset $A \subseteq \mathcal{X}_{j^*}$ induces a two-sample statistic measuring the discrepancy
between the samples $\{Y_i \mid w_i > 0 \text{ and } X_{j^*i} \in A;\ i = 1, \dots, n\}$ and
$\{Y_i \mid w_i > 0 \text{ and } X_{j^*i} \notin A;\ i = 1, \dots, n\}$. The conditional
expectation $\mu_{j^*}^A$ and covariance $\Sigma_{j^*}^A$ can be computed by (2). The split $A^*$ with a test statistic
maximized over all possible subsets $A$ is established:

$$A^* = \operatorname*{argmax}_A\, c(t_{j^*}^A, \mu_{j^*}^A, \Sigma_{j^*}^A). \qquad (3)$$
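Sample size constraints are also specified via ctree_control; for example (a sketch using
the minsplit argument), the statement

> ctree_control(minsplit = 20)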
requires the sum of the weights in both the left and right daughter node to exceed the value
of 20.
> ctree_control(maxsurrogate = 3)
asks for up to three surrogate splits for handling missing values in the covariates. We use a
small simulated learning sample ls, with a nominal response y (levels A, B, C) and two
covariates x1 and x2, in the following illustrations. In partykit::ctree, the dependency
structure and the variables may be specified in a traditional formula-based way
> library("partykit")
> ctree(y ~ x1 + x2, data = ls)
Case counts w may be specified using the weights argument. Once we have fitted a condi-
tional tree via
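> ct <- ctree(y ~ x1 + x2, data = ls)

we can inspect the results, for example via the graphical representation in Figure 1: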
> plot(ct)
Figure 1: Graphical representation of the tree fitted to the learning sample ls: the root is
split by x1 (p < 0.001), and bar plots show the distribution of y (levels A, B, C) in the two
terminal nodes.
> ct
Model formula:
y ~ x1 + x2
Fitted party:
[1] root
| [2] x1 <= 0.82552: C (n = 96, err = 57.3%)
| [3] x1 > 0.82552: A (n = 54, err = 42.6%)
> ct[1]
Model formula:
y ~ x1 + x2
Fitted party:
[1] root
| [2] x1 <= 0.82552: C (n = 96, err = 57.3%)
| [3] x1 > 0.82552: A (n = 54, err = 42.6%)
> class(ct[1])
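[1] "constparty" "party"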
The object returned by ctree is of class constparty (inheriting from party), and we refer
to the manual pages for a description of its elements. The predict function computes
predictions in the space of the response variable, in our case a factor
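> predict(ct)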
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
A A A A C A C A C C A A C A A A A
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
C A C A A A C A A A C C A A C A A
35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
C A A C C C A A C C C C A A A A A
52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
A C C C C A C C A C C C C C C A A
69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
A A A C C A C A C C C C C C C C C
86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102
C C C A C A C A C C C C C C C C A
103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
C C C A C C A C C C C C C C A C C
120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136
C C C C C C C C C C C C C C C C C
137 138 139 140 141 142 143 144 145 146 147 148 149 150
C A C C C C A C C A C A C A
Levels: A B C
When we are interested in properties of the conditional distribution of the response given the
covariates, we use
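> predict(ct, newdata = ls[c(1, 51, 101),], type = "prob")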
A B C
1 0.5740741 0.2592593 0.1666667
51 0.5740741 0.2592593 0.1666667
101 0.1979167 0.3750000 0.4270833
which, in our case, is a data frame with conditional class probabilities. We can determine the
node numbers of the nodes that new observations fall into by
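> predict(ct, newdata = ls[c(1, 51, 101),], type = "node")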
1 51 101
3 3 2
Finally, the sctest method can be used to extract the test statistics and p-values computed
in each node. The function sctest is used because for the mob algorithm such a method
(for structural change tests) is also provided. To make the generic available, the strucchange
package needs to be loaded (otherwise sctest.constparty would have to be called directly).
> library("strucchange")
> sctest(ct)
$`1`
x1 x2
statistic 2.299131e+01 4.0971294
p.value 2.034833e-05 0.2412193
$`2`
x1 x2
statistic 2.6647107 4.3628130
p.value 0.4580906 0.2130228
$`3`
x1 x2
statistic 2.1170497 2.8275567
p.value 0.5735483 0.4272879
Here, we see that x1 leads to a significant test result in the root node and is hence used for
splitting. In the kid nodes, no more significant results are found and hence splitting stops. For
other data sets, other stopping criteria might also be relevant (e.g., the sample size restrictions
minsplit, minbucket, etc.). If splitting stops due to these restrictions, the test results may
also be NULL.
5. Examples
Nominal covariates measured at levels $1, \dots, K$ are represented by $g_j(k) = e_K(k)$, the unit
vector of length $K$ whose $k$th element is equal to one. Due to this flexibility, special
test procedures like the Spearman test, the
Wilcoxon-Mann-Whitney test or the Kruskal-Wallis test and permutation tests based on
ANOVA statistics or correlation coefficients are covered by this framework. Splits obtained
from (3) maximize the absolute value of the standardized difference between two means of
the values of the influence functions. For prediction, one is usually interested in an estimate
of the expectation of the response E(Y|X = x) in each cell, an estimate can be obtained by
$$\hat{\mathbb{E}}(\mathbf{Y} \mid X = x) = \left(\sum_{i=1}^{n} w_i(x)\right)^{-1} \sum_{i=1}^{n} w_i(x)\, Y_i.$$
For classification problems with a nominal response measured at $J$ classes, one uses
$g_j(x) = e_K(x)$ and $h(Y_i, (Y_1, \dots, Y_n)) = e_J(Y_i)$. If both response and covariate are
ordinal, the matrix of coefficients is given by the Kronecker product of both score vectors,
$M = \xi \otimes \gamma \in \mathbb{R}^{1 \times KJ}$. In case the response is ordinal only, the matrix of
coefficients $M$ is a block matrix of the form

$$M = \begin{pmatrix} \xi_1 & 0 & \cdots & \xi_q & 0 & \cdots \\ 0 & \xi_1 & \cdots & 0 & \xi_q & \cdots \end{pmatrix},$$

i.e., $M = (\xi_1 I \mid \cdots \mid \xi_q I)$; an analogous construction applies
when one covariate is ordered but the response is not. For both $Y$ and $X_j$ being ordinal, the
corresponding test is known as linear-by-linear association test (Agresti 2002). Scores can be
supplied to ctree using the scores argument, see Section 6 for an example.
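For instance, for a hypothetical data frame d with an ordered response resp measured at three
levels, asymmetric scores could be supplied as (a sketch; d and resp are placeholder names)

> ctree(resp ~ ., data = d, scores = list(resp = c(1, 4, 5)))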
The impact of certain environmental factors on the population density of the tree pipit Anthus
trivialis is investigated by Müller and Hothorn (2004). The occurrence of tree pipits was
recorded several times at n = 86 stands which were established on a long environmental
gradient. Among nine environmental factors, the covariate showing the largest association
to the number of tree pipits is the canopy overstorey (P = 0.002). Two groups of stands
can be distinguished: Sunny stands with less than 40% canopy overstorey (n = 24) show
a significantly higher density of tree pipits compared to darker stands with more than 40%
canopy overstorey (n = 62). This result is important for management decisions in forestry
enterprises: Cutting the overstorey with release of old oaks creates a perfect habitat for this
indicator species of near natural forest environments.
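The corresponding fit can be sketched as follows (assuming the treepipit data from package
coin, with the number of tree pipits in variable counts):

> data("treepipit", package = "coin")
> tptree <- ctree(counts ~ ., data = treepipit)
> plot(tptree)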
Laser scanning images taken from the eye background are expected to serve as the basis
of an automated system for glaucoma diagnosis. Although prediction is more important in
this application (Mardin, Hothorn, Peters, Jünemann, Nguyen, and Lausen 2003), a simple
Figure 2: Conditional inference tree for the tree pipit data: the single split in coverstorey
(p = 0.002, cutpoint at 40%) separates the stands, with bar plots of the number of tree pipits
(0 to 5) in the two terminal nodes.
visualization of the regression relationship is useful for comparing the structures inherent in
the learning sample with subject matter knowledge. For 98 patients and 98 controls, matched
by age and gender, 62 covariates describing the eye morphology are available. The data are part
of the TH.data package (available from http://CRAN.R-project.org). The first split in Figure 3
separates eyes with a small volume above reference in the inferior part of the optic nerve head
(vari). Observations with larger volume are mostly controls, a finding which corresponds to
subject matter knowledge: The volume above reference measures the thickness of the nerve
layer, expected to decrease with a glaucomatous damage of the optic nerve. Further separation
is achieved by the volume above surface global (vasg) and the volume above reference in the
temporal part of the optic nerve head (vart).
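The tree in Figure 3 can be fitted via (a sketch, assuming the GlaucomaM data from TH.data
with binary response Class):

> data("GlaucomaM", package = "TH.data")
> gtree <- ctree(Class ~ ., data = GlaucomaM)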
The plot in Figure 3 is generated by
> plot(gtree)
and shows the distribution of the classes in the terminal nodes. This distribution can be
shown for the inner nodes as well, namely by specifying the appropriate panel generating
function (node_barplot in our case), see Figure 4.
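A sketch of such a call, using the inner_panel argument of the plot method together with the
node_barplot panel-generating function:

> plot(gtree, inner_panel = node_barplot)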
The class predictions of the tree for the learning sample (and for new observations as well)
[Figure 3 here: root split in vari (p < 0.001); daughter splits in vasg (p < 0.001) and tms
(p = 0.049); bar plots of the glaucoma/normal fractions in the terminal nodes.]
Figure 3: Conditional inference tree for the glaucoma data. For each inner node, the
Bonferroni-adjusted P -values are given, the fraction of glaucomatous eyes is displayed for
each terminal node.
Figure 4: Conditional inference tree for the glaucoma data with the fraction of glaucomatous
eyes displayed for both inner and terminal nodes.
Figure 5: Estimated conditional class probabilities (slightly jittered) for the Glaucoma data
depending on the first split variable. The vertical line denotes the first split point.
can be computed using the predict function. A comparison with the true class memberships
is done by
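> table(predict(gtree), GlaucomaM$Class)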
glaucoma normal
glaucoma 74 5
normal 24 93
When we are interested in conditional class probabilities, the predict(..., type = "prob")
method must be used. A graphical representation is shown in Figure 5.
As an example for survival modeling, we use the German Breast Cancer Study Group 2 (GBSG2)
data; the dataset is available within the TH.data package. The number of positive lymph
nodes (pnodes) and the progesterone receptor (progrec) have been identified as prognostic
factors in the survival tree analysis by Schumacher et al. (2001). Here, the binary variable
coding whether a hormonal therapy was applied or not (horTh) is additionally part of the
model depicted in Figure 6, which was fitted using the following code:
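A sketch of this code (assuming GBSG2 from TH.data and Surv from the survival package):

> data("GBSG2", package = "TH.data")
> library("survival")
> stree <- ctree(Surv(time, cens) ~ ., data = GBSG2)
> stree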
Model formula:
Surv(time, cens) ~ horTh + age + menostat + tsize + tgrade +
pnodes + progrec + estrec
Fitted party:
[1] root
| [2] pnodes <= 3
| | [3] horTh in no: 2093.000 (n = 248)
| | [4] horTh in yes: Inf (n = 128)
| [5] pnodes > 3
| | [6] progrec <= 20: 624.000 (n = 144)
| | [7] progrec > 20: 1701.000 (n = 166)
The estimated median survival time for new patients is less informative compared to the whole
Kaplan-Meier curve estimated from the patients in the learning sample for each terminal node.
We can compute those ‘predictions’ by means of the treeresponse method
> plot(stree)
[Figure 6 here: root split in pnodes (p < 0.001); daughter splits in horTh (p = 0.035) and
progrec (p < 0.001); Kaplan-Meier curves of the survival times in the terminal nodes.]
Figure 6: Tree-structured survival model for the GBSG2 data and the distribution of survival
times in the terminal nodes. The median survival time is displayed in each terminal node of
the tree.
Ordinal response variables are common in investigations where the response is a subjective
human interpretation. We use an example given by Hosmer and Lemeshow (2000), p. 264,
studying the relationship between the mammography experience (never, within a year, over
one year) and opinions about mammography expressed in questionnaires answered by n = 412
women. The resulting partition based on scores ξ = (1, 2, 3) is given in Figure 7. Women
who (strongly) agree with the question ‘You do not need a mammogram unless you develop
symptoms’ have seldom experienced a mammography. The variable benefit is a score with
low values indicating a strong agreement with the benefits of the examination. For those
women in (strong) disagreement with the first question above, low values of benefit identify
persons being more likely to have experienced such an examination at all.
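The corresponding fit can be sketched as (assuming the mammoexp data from TH.data, whose
ordered factors carry the default scores 1:K):

> data("mammoexp", package = "TH.data")
> mtree <- ctree(ME ~ ., data = mammoexp)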
> plot(mtree)
[Figure 7 here: root split in SYMPT (p < 0.001); further split in PB (p = 0.012, cutpoint 8);
bar plots of the three response categories (never, within a year, over a year) in the terminal
nodes.]
Figure 7: Ordinal regression for the mammography experience data with the fractions of
(never, within a year, over one year) given in the nodes. No admissible split was found for
node 5 because only 5 of 91 women reported a family history of breast cancer and the sample
size restrictions would require more than 5 observations in each daughter node.
Finally, we illustrate trees dealing with multivariate responses. The abundance of 12 hunting
spider species is regressed on six environmental variables (water, sand, moss, reft, twigs
and herbs) for n = 28 observations. Because of the small sample size we allow for a split if
at least 5 observations are part of a node. The prognostic factor water found by De’ath (2002) is
confirmed by the model shown in Figures 8 and 9 which additionally identifies reft. The
data are available in package mvpart (De’ath 2014).
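A sketch of the fit (assumptions: the HuntingSpiders data shipped with partykit, whose first
twelve columns are the species abundances; minsplit = 5 implements the relaxed sample size
restriction):

> data("HuntingSpiders", package = "partykit")
> species <- names(HuntingSpiders)[1:12]
> f <- as.formula(paste(paste(species, collapse = " + "),
+   "~ water + sand + moss + reft + twigs + herbs"))
> sptree <- ctree(f, data = HuntingSpiders,
+   control = ctree_control(minsplit = 5))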
[Figure 8 here: root split in water (p < 0.001); further split in reft (p = 0.013).]
Figure 8: Regression tree for hunting spider abundance with bars for the mean of each
response.
7. Backward compatibility and novel features

In this section we show how to obtain the same trees in partykit and party. In addition,
some novel features introduced in partykit 1.2-0 are described.
7.1. Regression
We use the airquality data (shipped with base R) and fit a regression tree after removal of
missing response values. There are missing values in one of the explanatory variables, so we
ask for three surrogate splits to be set up:
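A sketch of these calls (assumption: party passes control parameters via controls):

> data("airquality", package = "datasets")
> airq <- subset(airquality, !is.na(Ozone))
> airct_party <- party::ctree(Ozone ~ ., data = airq,
+   controls = party::ctree_control(maxsurrogate = 3))
> airct_party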
Response: Ozone
Inputs: Solar.R, Wind, Temp, Month, Day
Number of observations: 116
> plot(sptree)
[Figure 9 here: root split in water (p < 0.001); further split in reft (p = 0.013); boxplots
of each species abundance in the terminal nodes.]
Figure 9: Regression tree for hunting spider abundance with boxplots for each response.
4) Temp > 77
6)* weights = 21
1) Temp > 82
7) Wind <= 10.3; criterion = 0.997, statistic = 11.712
8)* weights = 30
7) Wind > 10.3
9)* weights = 7
[1] 403.6668
For this specific example, the same call produces the same tree under both party and partykit.
To ensure this also for other patterns of missingness, the numsurrogate flag needs to be set
in order to restrict the evaluation of surrogate splits to numeric variables only (this is a
restriction hard-coded in party):
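A sketch of the partykit call with this flag:

> airct_partykit <- partykit::ctree(Ozone ~ ., data = airq,
+   control = partykit::ctree_control(maxsurrogate = 3,
+     numsurrogate = TRUE))
> airct_partykit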
Model formula:
Ozone ~ Solar.R + Wind + Temp + Month + Day
Fitted party:
[1] root
| [2] Temp <= 82
| | [3] Wind <= 6.9: 55.600 (n = 10, err = 21946.4)
| | [4] Wind > 6.9
| | | [5] Temp <= 77: 18.479 (n = 48, err = 3956.0)
| | | [6] Temp > 77: 31.143 (n = 21, err = 4620.6)
| [7] Temp > 82
| | [8] Wind <= 10.3: 81.633 (n = 30, err = 15119.0)
| | [9] Wind > 10.3: 48.714 (n = 7, err = 1183.4)
[1] 403.6668
3 5 6 8 9
3 10 0 0 0 0
5 0 48 0 0 0
6 0 0 21 0 0
8 0 0 0 30 0
9 0 0 0 0 7
[1] 0
> airct_party@tree$criterion
$statistic
Solar.R Wind Temp Month Day
13.34761286 41.61369618 56.08632426 3.11265955 0.02011554
$criterion
Solar.R Wind Temp Month Day
9.987069e-01 1.000000e+00 1.000000e+00 6.674119e-01 1.824984e-05
$maxcriterion
[1] 1
> info_node(node_party(airct_partykit))
$criterion
Solar.R Wind Temp Month
statistic 13.347612859 4.161370e+01 5.608632e+01 3.1126596
p.value 0.001293090 5.560572e-10 3.467894e-13 0.3325881
criterion -0.001293926 -5.560572e-10 -3.467894e-13 -0.4043478
Day
statistic 0.02011554
p.value 0.99998175
criterion -10.91135399
$p.value
Temp
3.467894e-13
$unweighted
[1] TRUE
$nobs
[1] 116
22 ctree: Conditional Inference Trees
partykit has a nicer way of presenting the variable selection test statistics, on the scale of
both the statistics and the p-values. In addition, the criterion to be maximized (here:
log(1 − p-value)) is given.
7.2. Classification
For classification tasks with more than two classes, the default in party is a maximum-type
test statistic on the multidimensional test statistic when computing splits. partykit employs
a quadratic test statistic by default, because it was found to produce better splits empirically.
One can switch back to the old behaviour using the splitstat argument:
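A sketch of the two fits, first party with its defaults:

> irisct_party <- party::ctree(Species ~ ., data = iris)
> irisct_party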
Response: Species
Inputs: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
Number of observations: 150
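and then partykit with the maximum-type split statistic:

> irisct_partykit <- partykit::ctree(Species ~ ., data = iris,
+   control = partykit::ctree_control(splitstat = "maximum"))
> irisct_partykit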
Model formula:
Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
Fitted party:
[1] root
| [2] Petal.Length <= 1.9: setosa (n = 50, err = 0.0%)
| [3] Petal.Length > 1.9
| | [4] Petal.Width <= 1.7
| | | [5] Petal.Length <= 4.8: versicolor (n = 46, err = 2.2%)
| | | [6] Petal.Length > 4.8: versicolor (n = 8, err = 50.0%)
| | [7] Petal.Width > 1.7: virginica (n = 46, err = 2.2%)
2 5 6 7
2 50 0 0 0
5 0 46 0 0
6 0 0 8 0
7 0 0 0 46
The maximal absolute difference of the predicted class probabilities amounts to

[1] 0

leading to identical results. For ordinal regression, the conditional class probabilities can be
computed in the very same way:
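A sketch of the two fits (mammoexp from TH.data):

> mammoct_party <- party::ctree(ME ~ ., data = mammoexp)
> mammoct_party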
Response: ME
Inputs: SYMPT, PB, HIST, BSE, DECT
Number of observations: 412
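> mammoct_partykit <- partykit::ctree(ME ~ ., data = mammoexp)
> mammoct_partykit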
Model formula:
ME ~ SYMPT + PB + HIST + BSE + DECT
Fitted party:
[1] root
| [2] SYMPT <= Agree: Never (n = 113, err = 15.9%)
| [3] SYMPT > Agree
| | [4] PB <= 8: Never (n = 208, err = 60.1%)
| | [5] PB > 8: Never (n = 91, err = 38.5%)
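The maximal absolute difference between the class probabilities of the two implementations
(a sketch; party provides them via treeresponse, partykit via predict):

> p_party <- do.call("rbind", party::treeresponse(mammoct_party))
> p_partykit <- predict(mammoct_partykit, type = "prob")
> max(abs(p_party - p_partykit))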
[1] 0
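For survival trees, the same comparison can be carried out (a sketch; GBSG2 and Surv as in
Section 5):

> GBSG2ct_party <- party::ctree(Surv(time, cens) ~ ., data = GBSG2)
> GBSG2ct_partykit <- partykit::ctree(Surv(time, cens) ~ ., data = GBSG2)
> GBSG2ct_partykit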
Model formula:
Surv(time, cens) ~ horTh + age + menostat + tsize + tgrade +
pnodes + progrec + estrec
Fitted party:
[1] root
| [2] pnodes <= 3
| | [3] horTh in no: 2093.000 (n = 248)
| | [4] horTh in yes: Inf (n = 128)
| [5] pnodes > 3
| | [6] progrec <= 20: 624.000 (n = 144)
| | [7] progrec > 20: 1701.000 (n = 166)
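Agreement of the predicted median survival times can be checked via (a sketch)

> all.equal(as.numeric(predict(GBSG2ct_party)),
+   as.numeric(predict(GBSG2ct_partykit)))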
[1] TRUE
alpha : The user can optionally change the default nominal level of α = 0.05; mincriterion
is updated to 1 − α and logmincriterion is then log(1 − α). The latter allows variable
selection on the scale of log(1 − p-value):
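A sketch of such a call (alpha = 0.001 is a hypothetical, much stricter level):

> airct_partykit_1 <- partykit::ctree(Ozone ~ ., data = airq,
+   control = partykit::ctree_control(maxsurrogate = 3, alpha = 0.001))
> airct_partykit_1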
Model formula:
Ozone ~ Solar.R + Wind + Temp + Month + Day
Fitted party:
[1] root
| [2] Temp <= 82: 26.544 (n = 79, err = 42531.6)
| [3] Temp > 82: 75.405 (n = 37, err = 22452.9)
> depth(airct_partykit_1)
[1] 1
[1] 560.2113
splittest : This enables the computation of p-values for maximally selected statistics for
variable selection. The default test statistic is not particularly powerful against cutpoint-
alternatives but much faster to compute. Currently, p-value approximations are not
available, so one has to rely on resampling for p-value estimation
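A sketch of such a call, combining splittest with Monte Carlo resampling:

> airct_partykit_2 <- partykit::ctree(Ozone ~ ., data = airq,
+   control = partykit::ctree_control(maxsurrogate = 3,
+     splittest = TRUE, testtype = "MonteCarlo"))
> airct_partykit_2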
Model formula:
Ozone ~ Solar.R + Wind + Temp + Month + Day
Fitted party:
[1] root
| [2] Temp <= 82
| | [3] Wind <= 6.9: 55.600 (n = 10, err = 21946.4)
| | [4] Wind > 6.9
| | | [5] Temp <= 77
| | | | [6] Solar.R <= 78: 12.533 (n = 15, err = 723.7)
| | | | [7] Solar.R > 78: 21.182 (n = 33, err = 2460.9)
| | | [8] Temp > 77
| | | | [9] Solar.R <= 148: 20.000 (n = 7, err = 652.0)
| | | | [10] Solar.R > 148: 36.714 (n = 14, err = 2664.9)
| [11] Temp > 82
| | [12] Temp <= 87
| | | [13] Wind <= 8.6: 72.308 (n = 13, err = 8176.8)
| | | [14] Wind > 8.6: 45.571 (n = 7, err = 617.7)
| | [15] Temp > 87: 90.059 (n = 17, err = 3652.9)
saveinfo : Reduces the memory footprint by not storing test results as part of the tree. The
core information about trees is then roughly half the size needed by party.
Torsten Hothorn, Kurt Hornik, Achim Zeileis 27
nmax : Restricts the number of possible cutpoints to nmax, basically by treating all explana-
tory variables as ordered factors defined at quantiles of underlying numeric variables.
This is mainly implemented in package libcoin. For the standard ctree, it is only appropriate
to use in classification problems, where it can lead to substantial speed-ups:
> (irisct_partykit_1 <- partykit::ctree(Species ~ .,data = iris,
+ control = partykit::ctree_control(splitstat = "maximum", nmax = 25)))
Model formula:
Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
Fitted party:
[1] root
| [2] Petal.Width <= 0.6: setosa (n = 50, err = 0.0%)
| [3] Petal.Width > 0.6
| | [4] Petal.Width <= 1.7
| | | [5] Petal.Length <= 4.8: versicolor (n = 46, err = 2.2%)
| | | [6] Petal.Length > 4.8: versicolor (n = 8, err = 50.0%)
| | [7] Petal.Width > 1.7: virginica (n = 46, err = 2.2%)
multiway : Implements multiway splits in unordered factors, each level defines a correspond-
ing daughter node:
> GBSG2$tgrade <- factor(GBSG2$tgrade, ordered = FALSE)
> (GBSG2ct_partykit <- partykit::ctree(Surv(time, cens) ~ tgrade,
+ data = GBSG2, control = partykit::ctree_control(multiway = TRUE,
+ alpha = .5)))
Model formula:
Surv(time, cens) ~ tgrade
Fitted party:
[1] root
| [2] tgrade in I: Inf (n = 81)
| [3] tgrade in II: 1730.000 (n = 444)
| [4] tgrade in III: 1337.000 (n = 161)
Two arguments of ctree are also interesting. The novel cluster argument allows conditional
inference trees to be fitted to (simple forms of) correlated observations. For each cluster,
the variance of the test statistics used for variable selection and also splitting is computed
separately, leading to stratified permutation tests (in the sense that only observations within
clusters are permuted). For example, we can use the month of the observations in the
airquality dataset as a cluster variable:
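A sketch of this (deriving month from the Month variable of the reduced data):

> airq$month <- factor(airq$Month)
> airct_partykit_3 <- partykit::ctree(Ozone ~ Solar.R + Wind + Temp,
+   data = airq, cluster = month,
+   control = partykit::ctree_control(maxsurrogate = 3))
> airct_partykit_3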
Model formula:
Ozone ~ Solar.R + Wind + Temp
Fitted party:
[1] root
| [2] Temp <= 82
| | [3] Temp <= 76: 18.250 (n = 48, err = 4199.0)
| | [4] Temp > 76
| | | [5] Wind <= 6.9: 71.857 (n = 7, err = 15510.9)
| | | [6] Wind > 6.9
| | | | [7] Temp <= 81: 32.412 (n = 17, err = 4204.1)
| | | | [8] Temp > 81: 23.857 (n = 7, err = 306.9)
| [9] Temp > 82
| | [10] Wind <= 10.3: 81.633 (n = 30, err = 15119.0)
| | [11] Wind > 10.3: 48.714 (n = 7, err = 1183.4)
> info_node(node_party(airct_partykit_3))
$criterion
Solar.R Wind Temp
statistic 14.4805065501 3.299881e+01 4.783766e+01
p.value 0.0004247923 2.766464e-08 1.389038e-11
criterion -0.0004248826 -2.766464e-08 -1.389038e-11
$p.value
Temp
1.389038e-11
$unweighted
[1] TRUE
$nobs
[1] 116
[1] 349.3382
This reduces the number of partitioning variables and makes multiplicity adjustment less
costly.
The ytrafo argument has been made more general. party is not able to update influence
functions $h$ within nodes. With the novel formula-based interface, users can create influence
functions which are newly evaluated in each node. The following example illustrates how one
can compute a survival tree with updated logrank scores:
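A sketch of such an influence function (assuming coin::logrank_trafo for computing the
logrank scores; the function returns the estfun matrix expected by ctree):

> h <- function(y, x, start = NULL, weights = NULL, offset = NULL,
+               cluster = NULL, estfun = TRUE, object = FALSE, ...) {
+     if (is.null(weights)) weights <- rep(1, NROW(y))
+     s <- rep(0, NROW(y))
+     ## recompute the logrank scores from the observations in the current node
+     s[weights > 0] <- coin::logrank_trafo(y[weights > 0, ])
+     list(estfun = matrix(as.double(s), ncol = 1), converged = TRUE)
+ }
> partykit::ctree(Surv(time, cens) ~ ., data = GBSG2, ytrafo = h)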
Model formula:
Surv(time, cens) ~ horTh + age + menostat + tsize + tgrade +
pnodes + progrec + estrec
Fitted party:
[1] root
| [2] pnodes <= 3
| | [3] horTh in no: 2093.000 (n = 248)
| | [4] horTh in yes: Inf (n = 128)
| [5] pnodes > 3
| | [6] progrec <= 20: 624.000 (n = 144)
| | [7] progrec > 20: 1701.000 (n = 166)
The results are usually not very sensitive to (simple) updated influence functions. However,
when one uses score functions of more complex models as influence functions (similar to
the mob family of trees), it is necessary to refit models in each node. For example, we are
30 ctree: Conditional Inference Trees
interested in a normal linear model for ozone concentration given temperature; both the
intercept and the regression coefficient for temperature shall vary across nodes of a tree.
Such a “permutation-based” MOB, here taking clusters into account, can be set up using
> ### normal varying intercept / varying coefficient model (aka "mob")
> h <- function(y, x, start = NULL, weights = NULL, offset = NULL, cluster = NULL, ...)
+ glm(y ~ 0 + x, family = gaussian(), start = start, weights = weights, ...)
> (airct_partykit_4 <- partykit::ctree(Ozone ~ Temp | Solar.R + Wind,
+ data = airq, cluster = month, ytrafo = h,
+ control = partykit::ctree_control(maxsurrogate = 3)))
Model formula:
Ozone ~ Temp + (Solar.R + Wind)
Fitted party:
[1] root
| [2] Wind <= 5.7: 98.692 (n = 13, err = 11584.8)
| [3] Wind > 5.7
| | [4] Wind <= 8
| | | [5] Wind <= 6.9: 55.286 (n = 14, err = 11330.9)
| | | [6] Wind > 6.9: 50.824 (n = 17, err = 15400.5)
| | [7] Wind > 8: 27.306 (n = 72, err = 25705.3)
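The node-wise coefficients can be inspected by refitting the model with the terminal nodes as
an interacting factor (a sketch; the node variable is attached to airq first):

> airq$node <- factor(predict(airct_partykit_4, type = "node"))
> summary(glm(Ozone ~ node + node:Temp - 1, data = airq))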
Call:
glm(formula = Ozone ~ node + node:Temp - 1, data = airq)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
node2 300.0527 93.4828 3.210 0.001750 **
node5 -217.3416 51.3970 -4.229 4.94e-05 ***
node6 -178.9333 58.1093 -3.079 0.002632 **
node7 -82.2722 17.9951 -4.572 1.29e-05 ***
node2:Temp -2.2922 1.0626 -2.157 0.033214 *
node5:Temp 3.2989 0.6191 5.328 5.47e-07 ***
node6:Temp 2.8059 0.7076 3.965 0.000132 ***
node7:Temp 1.4769 0.2408 6.133 1.45e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] 306.6534
Both intercept and effect of temperature change considerably between nodes. The corre-
sponding MOB can be fitted using
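A sketch (the object name airq_lmtree is arbitrary):

> airq_lmtree <- partykit::lmtree(Ozone ~ Temp | Solar.R + Wind,
+   data = airq, cluster = month)
> info_node(node_party(airq_lmtree))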
$criterion
Solar.R Wind
statistic 8.5987001 19.559486324
p.value 0.2818551 0.002658029
criterion -0.3310839 -0.002661567
$p.value
Wind
0.002658029
$coefficients
(Intercept) Temp
-146.995491 2.428703
$objfun
[1] 64109.89
$object
Call:
lm(formula = Ozone ~ Temp)
Coefficients:
(Intercept) Temp
-146.995 2.429
$converged
[1] TRUE
$nobs
[1] 116
[1] 443.9422
The p-values in the root node are similar but the two procedures find different splits. mob
(and therefore lmtree) searches for splits directly by optimizing the objective function over
all possible splits, whereas ctree only works with the score functions.
The argument xtrafo, which allowed the user to change the transformations $g_j$ of the
covariates, was removed from the user interface.
References
Agresti A (2002). Categorical Data Analysis. 2nd edition. John Wiley & Sons, Hoboken.
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984). Classification and Regression Trees.
Wadsworth, California.
De’ath G (2002). “Multivariate Regression Trees: A New Technique for Modeling Species-
Environment Relationships.” Ecology, 83(4), 1105–1117.
De’ath G (2014). mvpart: Multivariate Partitioning. R package version 1.6-2, URL
http://CRAN.R-project.org/package=mvpart.
Hosmer DW, Lemeshow S (2000). Applied Logistic Regression. 2nd edition. John Wiley &
Sons, New York.
Hothorn T, Hornik K, Zeileis A (2006). “Unbiased Recursive Partitioning: A Conditional
Inference Framework.” Journal of Computational and Graphical Statistics, 15(3), 651–674.
doi:10.1198/106186006X133933.
LeBlanc M, Crowley J (1992). “Relative Risk Trees for Censored Survival Data.” Biometrics,
48, 411–425.
Mardin CY, Hothorn T, Peters A, Jünemann AG, Nguyen NX, Lausen B (2003). “New Glau-
coma Classification Method Based on Standard HRT Parameters by Bagging Classification
Trees.” Journal of Glaucoma, 12(4), 340–346.
Mingers J (1987). “Expert Systems – Rule Induction with Statistical Data.” Journal of the
Operational Research Society, 38(1), 39–47.
Molinaro AM, Dudoit S, van der Laan MJ (2004). “Tree-Based Multivariate Regression and
Density Estimation with Right-Censored Data.” Journal of Multivariate Analysis, 90(1),
154–177.
Müller J, Hothorn T (2004). “Maximally Selected Two-Sample Statistics as a new Tool for
the Identification and Assessment of Habitat Factors with an Application to Breeding Bird
Communities in Oak Forests.” European Journal of Forest Research, 123, 218–228.
Noh HG, Song MS, Park SH (2004). “An Unbiased Method for Constructing Multilabel
Classification Trees.” Computational Statistics & Data Analysis, 47(1), 149–164.
Quinlan JR (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers,
San Mateo.
Segal MR (1988). “Regression Trees for Censored Data.” Biometrics, 44, 35–47.
Shih Y (1999). “Families of Splitting Criteria for Classification Trees.” Statistics and Com-
puting, 9, 309–315.
Strasser H, Weber C (1999). “On the Asymptotic Theory of Permutation Statistics.” Mathe-
matical Methods of Statistics, 8, 220–250.
Van der Aart PJ, Smeenk-Enserink N (1975). “Correlations between Distributions of Hunting
Spiders (Lycosidae, Ctenidae) and Environment Characteristics in a Dune Area.” Nether-
lands Journal of Zoology, 25, 1–45.
Westfall PH, Young SS (1993). Resampling-Based Multiple Testing. John Wiley & Sons, New
York.
White AP, Liu WZ (1994). “Bias in Information-Based Measures in Decision Tree Induction.”
Machine Learning, 15, 321–329.
Zhang H (1998). “Classification Trees for Multiple Binary Responses.” Journal of the Amer-
ican Statistical Association, 93, 180–193.
Affiliation:
Torsten Hothorn
Institut für Epidemiologie, Biostatistik und Prävention
Universität Zürich
Hirschengraben 84
CH-8001 Zürich, Switzerland
E-mail: Torsten.Hothorn@R-project.org
URL: http://user.math.uzh.ch/hothorn/
Kurt Hornik
Institute for Statistics and Mathematics
WU Wirtschaftsuniversität Wien
Welthandelsplatz 1
1020 Wien, Austria
E-mail: Kurt.Hornik@R-project.org
URL: http://statmath.wu.ac.at/~hornik/
Achim Zeileis
Department of Statistics
Faculty of Economics and Statistics
Universität Innsbruck
Universitätsstr. 15
6020 Innsbruck, Austria
E-mail: Achim.Zeileis@R-project.org
URL: http://eeecon.uibk.ac.at/~zeileis/