0% found this document useful (0 votes)

59 views

tutorial-MART and R Interface

This document provides a tutorial introduction to using Multiple Additive Regression Trees (MART) in R for predictive data mining tasks like regression and classification. It discusses how to read data into R, specify the predictor variable types, handle missing values, and use the mart command to build a MART model for regression. The mart command runs independently and displays training and test errors at each iteration until completion.

Uploaded by

aarivalagan

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

59 views

tutorial-MART and R Interface

Uploaded by

aarivalagan

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 24

Tutorial: Getting Started with MART in R

Jerome H. Friedman
Stanford University
May 13, 2002

Abstract
Multiple additive regression trees (MART) is a methodology for predictive data mining
(regression and classification). This note illustrates the use of the R/MART interface. It is
intended to be a tutorial introduction. Minimal knowledge concerning the technical details
of the MART methodology or the use of the R statistical package is presumed.

1 Introduction
Predictive data mining is concerned with constructing statistical models from historical data.
These models are used to predict future unknown data values, and/or to help gain an under-
standing of the predictive relationships represented in the data. The data consists of sets of
measurements made on objects (observations) collected in a data base. One of the measured
variables (denoted by y) is designated as the one to be predicted, given future values of the other
variables denoted by x = {x1 , x2 , · · ·, xn }. Depending on the field of study, y is referred to as the
response variable (Statistics), output variable (neural nets), or concept (Machine Learning). The
x–variables are referred to as predictor variables, input variables, or attributes in these respective
fields.
The data base consists of a collection of N previously solved cases
N
{yi , xi1 , · · ·, xin }i=1 . (1)

The predictive model takes the abstract form

ŷ = fˆ(x1 , · · ·, xn ), (2)

where fˆ is a prediction rule that maps a set of predictor variable values to a response value.
The goal is to use the data to produce an accurate mapping. The notion of accuracy depends
on the type of the response variable y in terms of the set of values it can assume. If y assumes
numeric values the problem is known as regression and lack of accuracy is defined in terms of a
distance measure d(y, ŷ) between the predicted value ŷ and the unknown true value y. Common
measures of inaccuracy are average absolute error

AAE = ave |y − ŷ| (3)

or root mean squared error

p
RSM E = ave (y − ŷ)2 .

Here ave represents the average over future values to be predicted. If y assumes unorderable
categorical values (class labels), the problem is called classification. In this case inaccuracy is
generally defined to be the fraction of incorrect future predictions (error rate).
Multiple additive regression trees (MART) is a particular methodology (among many) for
trying to solve prediction problems. It is described in detail in Friedman (1999a, 1999b). Besides

1
accuracy, its primary goal is robustness. It tends to be resistant against moderate to heavy
contamination by bad measurements (outliers) of the predictors and/or the responses, missing
values, and to the inclusion of potentially large numbers of irrelevant predictor variables that
have little or no effect on the response .
This note describes the use of MART with the R statistical package. It is presumed that both
R and the R/MART interface have been installed. The R statistical package can be obtained
from

http://www.r-project.org/.

The R/MART interface can be obtained from

http://www-stat.stanford.edu/˜jhf/R-MART.html. (4)

This note is intended for beginning users. Only the most basic uses of the MART package
are described. Most options and “tuning” parameters are set to defaults. As users become
familiar with the basic procedures, they can experiment with more advanced options. All of the
options associated with each MART procedure in R are described in detail in their associated
documentation (help files) included with the installation.
The R/MART interface consists of commands entered in the R command window. These
commands return either numeric results or plots. It is assumed that the user is familiar with
how to invoke R and enter commands in its command window.

2 Reading in data
The primary input to MART is the historical data (1) used to construct the predictive model (2).
This data must be read into R. R is able to input data from a wide variety of data producing
applications. (See R documentation.) The most straightforward way is to store the data in
some directory as ASCII (text) file(s). They can then be read into R using the scan command.
Suppose the predictor variable values are stored in the file datadir /xdata.txt where datadir is
the full path name of a selected directory. Entering

> x scan(’datadir /xdata.txt’)

at the R command prompt “>” will store this data under the name “x” in R. Similarly, if the
response values were stored in the file ydata.txt in the same directory, the command

> y scan(’datadir /ydata.txt’)

stores them under the name “y” in R. The actual data files (xdata and ydata) used for the
illustrations below can be downloaded from (4).
The R mart procedure requires that the predictor variable values be in the form of a matrix
whose rows represent observations and columns represent predictor variables. The values in “x”
can be converted to such a matrix with the R matrix command

> x matrix(x, ncol = nvar, byrow=T)

where nvar is the number of predictor variables. The argument “byrow=T” indicates that the
predictor variable values were stored in the file xdata.txt in observation sequence. That is,
the predictor variable values for each observation follow those of the previous observation. If
the data were stored in variable sequence, where all of the observation values for a predictor
variable follow those of the previous variable, then the last argument would be omitted. For the
illustrative data used below, there are ten predictor variables stored in observation sequence, so
the command would be

> x matrix(x, ncol = 10, byrow=T)

2
The observations now correspond to the rows of the matrix “x”. For the illustrative data set the
command

> nrow(x)
[1] 10000

shows that this data set consist of 10000 observations.

The mart procedure must be told the type of variable (numeric or categorical) of each pre-
dictor variable. This information can be stored in a vector of flags “lx” of length nvar. Each
element of “lx” corresponds to a predictor variable

 0 ignore xj
lx [ j ] = 1 xj is numeric .
2 xj is categorical


Binary variables can be considered either numeric or categorical. Numeric specification provides
a speed advantage. For the illustrative data set there are two categorical variables (x 3 and x4 )
and the rest are numeric. This would be in indicated with the R command

> lx c(1,1,2,2,rep(1,6))

Here “c(·)” is the R “concatenate” operator, and the procedure rep(i,j) simply repeats the value
i, j times. An alternative would be

> lx c(1,1,2,2,1,1,1,1,1,1)

The mart default is that all predictor variables are numeric. So if this is the case, the “lx”
parameter need not be specified in the mart command.
Finally, the mart procedure must be told which entries in the predictor data matrix have
missing values. Each missing value must be represented by a numeric missing value flag missval.
The value of this flag must be larger than any nonmissing data value. The mart default is
missval = 9.0e30. If this default value is used, or if there are no missing values and no data
value is greater than 9.0e30, then the parameter “xmiss” need not be specified as an argument
to the mart command. Otherwise, the actual value of missval that was used must be specified
by setting xmiss = missval in the argument list of the mart command. For the illustrative data
set there are no missing values.

3 Regression
For regression the MART model is constructed using the mart command

> mart(x,y,lx)

This command starts a task independent of R running in a separate window. The real time
progress of model construction can be viewed in this window. MART is an iterative procedure.
Each iteration increases the complexity (flexibility) of the model. Shown in the MART window
are the current number of iterations, and the corresponding “training” and “test” set average
absolute error (3) of the MART model at that iteration. Internally, mart randomly partitions
the input data into a “training” data set and a complement “test” set. Only the training set
is used to construct the model; the test set is used to provide an unbiased error estimate. The
default size of the test set is (1/5)th that of the total input data set.
The default number of iterations is 200. When completed the MART window automatically
closes. A numerical summary of the MART run is then displayed in the command window:

3
Training and test absolute error

0.6
absolute error
0.4
0.2
0.0

0 50 100 150 200

Iterations

Figure 1: Output from the command progress() after 200 iterations.

MART execution finished.

iters best test abs(error)
200 197 0.2607

This result indicates that there have been 200 total iterations so far, the smallest test error was
obtained at iteration 197 with a corresponding average absolute error of 0.2607.
A graphical summary can be obtained by issuing the command

> progress()

This produces the plot shown in Fig. 1. The lower (black) line is the training error as a function
of number of iterations. The upper (red) line is the corresponding test error.
From Fig. 1 it appears that there is still a general downward trend in test error when the
iterations finished. This suggests that more iterations may be beneficial. Iterations can be
continued from where the last run ended by the command

> moremart()

This again creates a separate MART task that performs 200 (default) more iterations. After
this task completes, a summary of the current model is presented:

MART execution finished.

iters best test abs(error)
400 400 0.2534

This shows that the additional iterations did improve test absolute error, if only by about 3%.
Entering

4
Training and test absolute error

0.6
absolute error
0.4
0.2
0.0

0 100 200 300 400

Iterations

Figure 2: Output from the command progress()after 400 iterations.

> progress()

now produces the plot shown in Fig. 2.

Prediction error is still getting smaller at the 400th iteration. This can be more clearly seen
with the command

> progress(F)

which shows a magnified version (Fig. 3) of just the iterations corresponding to the last moremart
command. The noisy appearance of the training error is due to the stochastic sampling described
in Friedman 1999b. One could continue iterating in this manner by issuing further moremart
commands. However, it might be useful to pause and examine the predictive model at this point.
More iterations can be applied at any time in the future using additional moremart commands.

3.1 Interpretation
The command

> varimp()

produces the plot shown in Fig. 4. This shows the estimated relative importance (relevance
or influence) of each predictor variable in predicting the response based on the current model.
The variables are ordered in decreasing importance. Here one sees that variables x 1 – x4 are all
highly and nearly equally important. Variable x5 is somewhat less important. Variables x6 –
x10 are seen to have little relevance for prediction.
After identifying the most relevant predictor variables, the next step is to get an idea of the
dependence of the MART model on each of them. This is done through “partial dependence”
plots. Entering the command

5
Training and test absolute error

0.25
0.20
absolute error
0.15
0.10
0.05
0.00

200 250 300 350 400

Iterations

Figure 3: Output from the command progress(F) after 400 iterations.

Relative Variable Importance

100

80
Relative importance

0
3

Input variable

Figure 4: Output from the command varimp().

6
0.6
0.4
partial dependence
0.2
0.0
-0.2

| | | | | | | | |

0.0 0.2 0.4 0.6 0.8 1.0

var 1

Figure 5: Output from the command singleplot(1).

> singleplot(1)

produces the plot shown in Fig. 5. This is the partial dependence of the model (2) on predictor
variable x1 , after accounting for the (average) joint effect of the other predictors (x2 , · · ·, x10 ).
The hash marks at the base of the plot represent the deciles of the x1 distribution. In general,
the nature of the dependence of the model on this variable will depend on the values of the other
predictor variables, and cannot be represented on a single plot. The partial dependence plot
shows the dependence on x1 as averaged over the distribution of values of the other variables
{x2 , x3 , · · ·, x10 }. While this may not provide a comprehensive description, it can show general
trends.
Partial dependence plots for the other predictor variables are obtained using the singleplot
command with the corresponding variable number in the argument. Figure 6 collects together
the results for the first six predictor variables. All plots are centered to have zero mean. Partial
dependence plots for categorical variables (here x3 and x4 ) are displayed as horizontal barplots,
each bar representing one of the categorical values. The values are (bottom to top) ordered on
increasing value when the category label is interpreted as a real number. (For this example the
labels for x3 were encoded as 0,1,2,3, and those for x4 are 0,1,2.)
Note that none of the functions of the numeric predictors are strictly smooth. Unlike most
other modeling procedures, MART imposes no explicit smoothness constraint on the solution. It
is able to model arbitrarily sharp changes in the response. The extent to which the plots exhibit
smooth general trends is dictated by the data. Here a smoother model tends to fit better. Also
note the much smaller vertical scale on the plot for x6 (lower right). This dependence looks like
pure noise and seems to have no general trend. The plots for predictor variables x 7 - x10 look
similar. This is consistent with Fig. 4 that indicated that x6 – x10 have little to no predictive
power.
The MART procedure singleplot plots partial dependence on individual predictor variables.
It is also possible to plot partial dependences on pairs of predictor variables. The command

> pairplot(1,2)

7
0.6

0.4
3
0.4

0.2
partial dependence

partial dependence

2
0.2

var 3
0.0
0.0

1
−0.2
−0.2

−0.4

| | | | | | | | | | | | | | | | | |
0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 −0.2 0.0 0.2 0.4 0.6
var 1 var 2 Partial dependence

0.04
0.4

2
0.02
partial dependence

partial dependence
0.2

0.00
var 4

1
0.0

−0.02
−0.2

−0.04

0
−0.4

−0.06

| | | | | | | | | | | | | | | | | |
−0.4 −0.2 0.0 0.2 0.4 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
Partial dependence var 5 var 6

Figure 6: Output of the commands singleplot(1); · · ·; singleplot(6).

8
1.0

partial dep
0.5
endence

0.0

−0.5

0.8
0.6

r2
0.2 0.4

va
0.4
var 0.6 0.2
1
0.8

Figure 7: Output of the command pairplot(1,2).

produces the picture shown in Fig. 7. This is a perspective mesh plot showing the dependence
of the predictive model (2) on joint values of (x1 , x2 ) after accounting for the average joint effect
of the other eight predictor variables. This joint dependence is seen to be fairly additive. The
shape of the dependence on either variable is unaffected by the value of the other variable. This
suggests that there is no interaction between these two variables.
Entering the command
> pairplot(1,3)
produces a series of plots as shown in Fig. 8. This is because x3 is a categorical variable. Each
successive plot shows the partial dependence of the predictive model on the numeric variable x 1 ,
for each of the respective categorical values of x3 . Here one sees a very strong interaction effect
between these two variables. For x3 = 0, the model has little or no dependence on x1 . For other
x3 –values, the shape of the dependence changes considerably. Figure 8 can be contrasted with
Fig. 5. Figure 8 is a more detailed description showing the partial dependence on x 1 , for each
value of x3 . Figure 5 shows the partial dependence on x1 as averaged over the values of x3 in
the data base.
Proceeding further, the command
> pairplot(1,4)
shows (Fig. 9) the partial dependence of the model on x1 for the three categorical values of x4 .
All of the plots reveal the same general dependence indicating little or no interaction between
x1 and x4 .

9
var 3 = 0 var 3 = 1
1.0

1.0
0.5

0.5
partial dependence

partial dependence
0.0

0.0
-0.5

-0.5
| | | | | | | | | | | | | | | | | |

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
var 1 var 1

var 3 = 2 var 3 = 3
1.0

1.0
0.5

0.5
partial dependence

partial dependence
0.0

0.0
-0.5

-0.5

| | | | | | | | | | | | | | | | | |

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
var 1 var 1

Figure 8: Output of the command pairplot(1,3).

10
var 4 = 0 var 4 = 1
0.6

0.6
0.4

0.4
partial dependence

partial dependence
0.2

0.2
0.0

0.0
-0.2

-0.2
-0.4

-0.4

| | | | | | | | | | | | | | | | | |

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
var 1 var 1

var 4 = 2
0.6
0.4
partial dependence
0.2
0.0
-0.2
-0.4

| | | | | | | | |

0.0 0.2 0.4 0.6 0.8 1.0

var 1

Figure 9: Output of the command pairplot(1,4).

11
var 4 = 0 var 4 = 1

0.5

0.5
partial dependence

partial dependence
0.0

0.0
-0.5

-0.5
-1.0

-1.0
| | | | | | | | | | | | | | | | | |

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
var 2 var 2

var 4 = 2
0.5
partial dependence
0.0
-0.5
-1.0

| | | | | | | | |

0.0 0.2 0.4 0.6 0.8 1.0

var 2

Figure 10: Output of the command pairplot(2,4).

The command

> pairplot(2,4)

produces the plots shown in Fig. 10. The dependence of the model on x2 is seen to be generally
linear for all values of x4 , but the slope is seen to change for different x4 –values. This suggests
a product dependence f (x4 ) · x2 on these two predictor variables.
The command

> pairplot(3,4)

produces a series of barplots (Fig. 11) because both x3 and x4 are categorical. Each barplot
shows the partial dependence on x3 , conditioned on each of the x4 values. By default, pairplot
chooses the variable with the smaller number of values as the one to be conditioned on. One can
override this default by specifying the conditioning variable. The command

> pairplot(3,4,cvar=3)

produces the corresponding plot shown in Fig. 12. Here neither set of plots show much evidence
for interactions between these two variables.
By repeatedly applying the procedures singleplot and pairplot, one can graphically examine
the predictive relationship between the predictor variables and the response, as reflected by the

12
var 4 = 0 var 4 = 1

3 3

2 2
var 3

var 3
1 1

0 0

−0.2 0.0 0.2 0.4 0.6 −0.2 0.0 0.2 0.4 0.6

Partial dependence Partial dependence

var 4 = 2

2
var 3

−0.2 0.0 0.2 0.4 0.6

Partial dependence

Figure 11: Output of the command pairplot(3,4).

var 3 = 0 var 3 = 1

2 2
var 4

var 4

1 1

0 0

−0.4 −0.2 0.0 0.2 0.4 −0.4 −0.2 0.0 0.2 0.4

Partial dependence Partial dependence

var 3 = 2 var 3 = 3

2 2
var 4

var 4

1 1

0 0

−0.4 −0.2 0.0 0.2 0.4 −0.4 −0.2 0.0 0.2 0.4

Partial dependence Partial dependence

Figure 12: Output of the command pairplot(3,4,cvar=3).

13
predictive model. Depending on the complexity of this relationship, considerable understanding
can often be obtained concerning the system from which the data were collected.
The data used here for illustration were artificially generated according to the prescription
y = f (x) + σ · ε.
The eight numeric predictor variables were randomly sampled from a joint uniform distribution
U 8 [0, 1]. The two categorical variables were randomly sampled with equal probability assigned
to each of their values. The.errors were standard normal ε v N [0, 1]. The scale of the noise was
p
taken to be σ = varx [f ] 3, providing a 3/1 signal-to-noise ratio. The “target” function was

f (x) = [sin(2πx1 )]x3 + x2 · x4 + x5 . (5)

3.2 Prediction
The MART model can be used to predict (estimate) response values ŷ given sets of predictor
variable values {x1 , x2, · ··, xn } (2). This is done with the martpred command
> yh martpred(xp)
Here “xp” is either a vector of length nvar representing a single observation, or an nobs by nvar
matrix representing nobs observations to be predicted. Here nvar is the number of predictor
variables (columns of “x”). In the former case a single number is returned to “yh”. In the latter
case a vector of predictions of length nobs is returned. As an example,
> xp c(0.5,0.5,1,2,rep(0.5,6))
> martpred(xp)
[1] 1.474161

returns the prediction 1.474161 for the indicated vector “xp”.

A matrix of observations to be predicted “xp” can be input to R from an ASCII file in the
same manner as was the original data matrix “x”
> xp matrix(scan(’datadir /xpred’), ncol = nvar, byrow=T)
The resulting vector “yh” can be written to an ASCII file with the R write command
> write(yh,file=’datadir /yhat’)

4 Classification
For classification problems the response variable y is a label identifying class membership of each
observation. The goal of the model is to produce predictions ŷ that minimize the misclassification
risk R (average loss) of future predictions
R = ave L(ŷ, y). (6)
Here L(ŷ, y) is the loss incurred in predicting ŷ when the unknown true label is y. The loss
matrix is specified by the user. The default loss matrix in MART is
L(ŷ, y) = 1(ŷ 6= y) (7)
so that the risk (6) is the fraction of incorrect predictions ŷ 6= y. General loss structures can be
specified: see mart documentation (help file).
For MART in R the class labels can be of any type (numeric or character) and can assume
any set of values, provided that observations corresponding to the same class are represented
by the same label value. The observation labels can be read into R from an ASCII file by the
command

14
> y scan(’datadir /classlab.txt’)

where datadir is the directory where the file is stored. For the illustrations below, the predictor
variable data matrix “x” is the same as that used above to illustrate regression. The corre-
sponding class labels (classlab.txt) can be downloaded from (4). The R scan command assumes
by default that the quantities to be read from the file are numeric values. This is the case for
the classlab.txt file (4) used here. If the class labels stored in the file happened to be character
values, then the corresponding command

> y scan(’datadir /classlab.txt’,what=’ ’)

should be used.
The classification MART model is constructed using the mart command

> mart(x,y,lx,martmode=’class’)

This launches an independent task to build the model. One can monitor construction in real
time in the corresponding window. Shown in the window are the current number of iterations,
the corresponding test set misclassification risk (6) (7), and the fraction of training observations
currently being used to improve the model. MART uses only those observations that are close
to the current estimated decision boundary. When 200 iterations (default) are completed, the
window automatically closes and the numerical summary is presented:

MART execution finished.

iters best test misclass risk
200 195 0.1835

A graphical summary is obtained by

> progress()

which produces the two plots shown in Fig. 13. The left plot shows test misclassification risk as
a function of number of iterations and the right one shows the fraction of training observations
being used. Note that the largest fraction is 0.5. This is due to the stochastic sampling being
employed (Friedman 1999b).
As seen in the left plot, misclassification risk (here error rate) is still generally decreasing at
the end of the previous run. Entering the command

> moremart()

performs 200 more iterations (default) starting from the previous solution. When completed,
the summary

MART execution finished.

iters best test misclass risk
400 370 0.1555

shows that the additional iterations have improved the error rate. Again entering

> progress()

shows (Fig. 14) a graphical summary of the 400 iterations so far. The command

> progress(F)

15
Test misclassification risk Fraction of training observations used

0.5
0.4

0.4
0.3

0.3
Fraction
Risk
0.2

0.2
0.1

0.1
0.0

0.0
0 50 100 150 200 0 50 100 150 200

Iterations Iterations

Figure 13: Output of the command progress() for classification after 200 iterations.

Test misclassification risk Fraction of training observations used

0.5
0.4

0.4
0.3

0.3
Fraction
Risk
0.2

0.2
0.1

0.1
0.0

0.0

0 100 200 300 400 0 100 200 300 400

Iterations Iterations

Figure 14: Output of the command progress() for classification after 400 iterations.

16
produces a magnified version for just the iterations corresponding to the last moremart command
(not shown).
It appears from Fig. 14 that the error rate is on an increasing trend at the 400th iteration.
However, it sometimes happens that a local increasing trend can be followed by a later decreasing
one. This can be verified by again entering

> moremart()

to produce 200 more iterations. The corresponding summary

MART execution finished.

iters best test misclass risk
600 556 0.1540

shows that a later downward trend did indeed occur, but the final result is estimated to be only
slightly better.

4.1 Interpretation
A first step in interpreting classification models is to understand the nature of the misclassifica-
tions. Entering the command

> classerrors()

produces the horizontal barplot shown in Fig. 15. Each bar corresponds to one of the class
labels. The labels are shown on the left. For this example we see that there are five classes, each
labeled by an integer k ∈ {1, · · ·, 5}. This information can also be obtained from the command

> classlabels(y)

[1] 1 2 3 4 5
attr(,’’type’’)
[1] ’’num’’

The attribute type “num” indicates that here the class labels realize numeric rather than char-
acter (“chr”) values.
The length of each bar in Fig. 15 represents the fraction of test set observations in the
corresponding class that were misclassified. We see that class 3 was the most difficult to separate,
whereas classes 1 and 5 had relatively low error rates.
The next step is to examine the specific nature of the misclassifications. The command

> classerrors(class=3)

produces the barplot shown in Fig. 16. Each bar represents one of the classes into which class
3 observations were misclassified. The length of the bar is the fraction of class 3 (test set)
observations assigned to that (incorrect) class. Here we see that most class 3 observations were
misclassified as class 2 and a slightly smaller number as class 4.
One can examine the misclassification structure for each of the other classes in an analogous
manner. Figure 17 collects the results for all the classes of this example.
As with regression one can plot the estimated importance of the respective predictor variables
with the command

> varimp()

17
Total error rate = 0.154

1
Class label

0.00 0.05 0.10 0.15 0.20 0.25

Error rate

Figure 15: Output of the command classerrors().

Error rate for class 3 = 0.286

4
Class label

0.00 0.05 0.10 0.15

Error rate

Figure 16: Output of the command classerrors(class=3).

18
Error rate for class 1 = 0.094 Error rate for class 2 = 0.155 Error rate for class 3 = 0.286

4
4
Class label

Class label

Class label
2 1

2
3

0.00 0.02 0.04 0.06 0.08 0.00 0.02 0.04 0.06 0.08 0.10 0.00 0.05 0.10 0.15
Error rate Error rate Error rate

Error rate for class 4 = 0.145 Error rate for class 5 = 0.08

5 3
Class label

Class label

3 4

0.00 0.02 0.04 0.06 0.08 0.00 0.02 0.04 0.06

Error rate Error rate

Figure 17: Output of the commands classerrors(class=1); · · ·; classerrors(class=5).

Input variable importances for all classes

100

80
Relative importance

0
1

Input variable

Figure 18: Output of the command varimp() for classification.

19
Input variable importances for class 2

100

80
Relative importance

0
1

8
Input variable

Figure 19: Output of the command varimp(class=2).

producing the plot shown in Fig. 18. This shows the average importance of each of the variables
in predicting all of the classes.
One can also see the relative importances of the predictor variables in separating each of the
individual classes. The command

> varimp(class=2)

produces the plot shown in Fig. 19. Comparing this plot with Fig. 18, one sees that x 1 is
relatively more important in separating class 2 than it is averaged over all of the classes.
Given a class label, varimp allows one to identify which predictor variables are most important
in separating the corresponding class from the other classes. Sometimes it may be of interest to
pose the complement question. Given a predictor variable, to which classes does it contribute
most to their separation. This can be done with the command

> classimp(var=2)

producing the plot shown in Fig. 20. Here one sees that x2 contributes considerably more
towards separating class 5 than in separating the other classes.
After identifying those predictor variables that are most relevant to separating each of the
respective classes, it can be informative to see the nature of the dependence of each class on
those variables. The MART model for classification consists of a set of real valued functions
{fk (x)}k∈K , one for each class label k in the complete set of labels K. Each function is related
to the probability pk (x) = Pr[y = k | x] by
1 X
fk (x) = log pk (x) − log pl (x) (8)
#K
l∈K

where #K is the total number of classes. Thus, larger values of fk (x) correspond to increased
probability of realizing the class label k for the set of predictor values x. Partial dependence
plots of fk (x) on the predictor variables most influential in separating that class can sometimes

20
Contributions of variable 2

3
Class label

0 20 40 60 80 100

Contribution

Figure 20: Output of the command classimp(var=2).

provide insight into what values of those variables tend to increase or decrease the odds of
observing that class.
As in regression, partial dependence plots are made with singleplot and pairplot. The com-
mand

> singleplot(1,class=5)

plots the partial dependence of f5 (x) (8) on x1 , as shown in Fig. 21.

The command

> pairplot(1,3,class=5)

plots the partial dependence of f5 (x) (8) on joint values of x1 and x3 , as shown in Fig. 22.
The vague resemblance of Figs. 21 and 22 to Figs. 5 and 8 respectively is not a coincidence.
The class labels in the file “classlab”, used here for illustration, were obtained by partitioning
the values of the “target” function (5) at their 20%, 40%, 60%, and 80% points. Class labels
were assigned the to the data within each successive interval in ascending values from 1 to 5.

4.2 Prediction
The classification MART model can be used to predict either a class label, or class probabilities,
given one or more sets of predictor variable values {x1 , x2, · ··, xn }. Class label prediction is
done using martpred as described in Section 3.2. The only difference is that for classification the
procedure returns a predicted class label of the same type as “y” (numeric or character) that
was input to mart. For example

> xp c(0.5,0.5,1,2,rep(0.5,6))

> martpred(xp)

21
4
3
partial dependence
2
1
0
-1

| | | | | | | | |

0.0 0.2 0.4 0.6 0.8 1.0

var 1

Figure 21: Output of the command singleplot(1,class=5).

[1] 3

predicts class 3 for the indicated vector “xp”.

Probability prediction is also done with martpred

> martpred(xp, probs=T)

[1] 0.001129717 0.203446046 0.448028862 0.330960423 0.016434953

The estimated probabilities are reported in the order of their internal mart numeric codes. These
can be obtained with the command

> classlabels(y)

[1] 1 2 3 4 5
attr(,’’type’’)
[1] ’’num’’

which returns the class labels in internal mart order.

If “xp” is an nobs by nvar matrix representing nobs observations to be predicted, martpred
returns a vector of predicted labels of length nobs, or an nobs by nclass matrix of probabilities
(probs=T). Here nvar is the number of predictor variables (columns of “x”), and nclass is the
number of classes (distinct values of y).

5 Saving the MART model

Each mart or moremart command causes the resulting model to be written to an internal working
storage buffer, over writing the model currently stored there. All subsequent MART commands
associated with this model (moremart, progress, martstat, martpred, varimp, singleplot, pairplot,
classimp, classerrors) reference this buffer. The martsave command can be used to save the
contents of the current working buffer. It can then be retrieved in its present state at a later
time for further analysis or to make predictions. For example,

22
var 3 = 0 var 3 = 1
6

6
4

4
partial dependence

partial dependence
2

2
0

0
-2

-2
| | | | | | | | | | | | | | | | | |

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
var 1 var 1

var 3 = 2 var 3 = 3
6

6
4

4
partial dependence

partial dependence
2

2
0

0
-2

-2

| | | | | | | | | | | | | | | | | |

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
var 1 var 1

Figure 22: Output of the command pairplot(1,3,class=5).

23
> martsave(’tutorial’)

saves the MART model developed for the present illustrative example under the name “tutorial”.
If that name is the same as a currently saved model, the latter will be overwritten by the current
buffer.
The command martget transfers a previously saved model to the current working buffer. It
can then be analyzed in the same manner as before it was saved (with martsave). For example,

> martget(’tutorial’)

can be used to retrieve the illustrative example at a later time. Note that this over writes the
model stored in the current working buffer, which should be saved (if needed) before issuing the
martget command.
The command martlist prints a list of all currently saved MART models:

> martlist()

$tutorial

The leading character (here $) is for internal use and should be ignored when referencing the
corresponding saved models.
Finally, the marterase command can be used to irrevocably remove a previously saved MART
model:

> marterase(’tutorial’)

> martlist()

No MART models saved

References
[1] Friedman, J. H. (1999a). Greedy function approximation: a gradient boosting machine.
http://www-stat.stanford.edu/˜jhf/ftp/trebst.ps

[2] Friedman, J. H. (1999b). Stochastic gradient boosting.

http://www-stat.stanford.edu/˜jhf/ftp/stobst.ps

Companion Applied Regression R
100% (13)
Companion Applied Regression R
802 pages
Week Two Assignment A
No ratings yet
Week Two Assignment A
1 page
Assignment 2: Introduction To R: Text Like This Will Be Problems For You To Do and Turn In. (There Are 7 in All.)
No ratings yet
Assignment 2: Introduction To R: Text Like This Will Be Problems For You To Do and Turn In. (There Are 7 in All.)
15 pages
An R Companion To Applied Regression 2nd Edition
No ratings yet
An R Companion To Applied Regression 2nd Edition
538 pages
KP Medical Astrology
80% (5)
KP Medical Astrology
68 pages
Tutorial: Getting Started With MART in R: Jerome H. Friedman Stanford University May 13, 2002
No ratings yet
Tutorial: Getting Started With MART in R: Jerome H. Friedman Stanford University May 13, 2002
24 pages
MultivariateRGGobi PDF
No ratings yet
MultivariateRGGobi PDF
60 pages
Lecture 10 R
No ratings yet
Lecture 10 R
117 pages
R Tutorial
No ratings yet
R Tutorial
15 pages
R Manual PDF
No ratings yet
R Manual PDF
78 pages
Uni T - 2 - R Programming
No ratings yet
Uni T - 2 - R Programming
10 pages
Linear Regression Analysis HUDM 5122: Introduction To R Johnny Wang
No ratings yet
Linear Regression Analysis HUDM 5122: Introduction To R Johnny Wang
17 pages
R Lab File Deepak
No ratings yet
R Lab File Deepak
27 pages
Statistics With R Unit 1: Divya Arun Kumar
No ratings yet
Statistics With R Unit 1: Divya Arun Kumar
65 pages
Week 5
No ratings yet
Week 5
5 pages
R Programming Slides
No ratings yet
R Programming Slides
73 pages
Mars 05
No ratings yet
Mars 05
28 pages
Introduction To R
No ratings yet
Introduction To R
20 pages
Rintro
No ratings yet
Rintro
14 pages
Introduction To R: 1 Getting Started
No ratings yet
Introduction To R: 1 Getting Started
14 pages
R For Introductory Econometrics-1
No ratings yet
R For Introductory Econometrics-1
4 pages
Package Rminer': R Topics Documented
No ratings yet
Package Rminer': R Topics Documented
43 pages
R Examples
No ratings yet
R Examples
56 pages
Theory 1. R Basics
No ratings yet
Theory 1. R Basics
43 pages
Solution Manual for Using Multivariate Statistics 7th Edition Barbara G. Tabachnick, Linda S. Fidell - Read Online Or Download Now
100% (6)
Solution Manual for Using Multivariate Statistics 7th Edition Barbara G. Tabachnick, Linda S. Fidell - Read Online Or Download Now
35 pages
1.1.4 Introduction To R FW
No ratings yet
1.1.4 Introduction To R FW
13 pages
R_intro2021
No ratings yet
R_intro2021
23 pages
R Handout Statistics and Data Analysis Using R
No ratings yet
R Handout Statistics and Data Analysis Using R
91 pages
Final Cost Practical
No ratings yet
Final Cost Practical
29 pages
Howtouser: 1 What Is R
No ratings yet
Howtouser: 1 What Is R
6 pages
Econometrics I - R Summary (Maite Cabeza-Gutes)
No ratings yet
Econometrics I - R Summary (Maite Cabeza-Gutes)
77 pages
MDPN460 Lecture05
No ratings yet
MDPN460 Lecture05
32 pages
Brief Introduction To R Kaustav Banerjee: Decision Sciences Area, IIM Lucknow
No ratings yet
Brief Introduction To R Kaustav Banerjee: Decision Sciences Area, IIM Lucknow
7 pages
R Unit 4th and 5th
No ratings yet
R Unit 4th and 5th
17 pages
An R Tutorial Starting Out
No ratings yet
An R Tutorial Starting Out
9 pages
Download full Solution Manual for Using Multivariate Statistics 7th Edition Barbara G. Tabachnick, Linda S. Fidell all chapters
100% (20)
Download full Solution Manual for Using Multivariate Statistics 7th Edition Barbara G. Tabachnick, Linda S. Fidell all chapters
43 pages
Applied Multivariate Statistics in R
100% (1)
Applied Multivariate Statistics in R
562 pages
DA_Lab_Week-1
No ratings yet
DA_Lab_Week-1
7 pages
Solution Manual for Using Multivariate Statistics 7th Edition Barbara G. Tabachnick, Linda S. Fidell pdf download
100% (2)
Solution Manual for Using Multivariate Statistics 7th Edition Barbara G. Tabachnick, Linda S. Fidell pdf download
40 pages
Prerequisites: R Installation
No ratings yet
Prerequisites: R Installation
11 pages
Download full Solution Manual for Using Multivariate Statistics 7th Edition Barbara G. Tabachnick, Linda S. Fidell all chapters
100% (13)
Download full Solution Manual for Using Multivariate Statistics 7th Edition Barbara G. Tabachnick, Linda S. Fidell all chapters
43 pages
R Short Tutorial
No ratings yet
R Short Tutorial
5 pages
Problem Set 1: Introduction To R - Solutions With R Output: 1 Install Packages
No ratings yet
Problem Set 1: Introduction To R - Solutions With R Output: 1 Install Packages
24 pages
R- language
No ratings yet
R- language
23 pages
Da Session 4
No ratings yet
Da Session 4
75 pages
R Programming
No ratings yet
R Programming
50 pages
Econ 1-2
No ratings yet
Econ 1-2
33 pages
Mendenhall R
No ratings yet
Mendenhall R
14 pages
A Brief Guide To R For Beginners in Econometrics: Department of Economics, Stockholm University
No ratings yet
A Brief Guide To R For Beginners in Econometrics: Department of Economics, Stockholm University
33 pages
A Brief Introduction To R
No ratings yet
A Brief Introduction To R
17 pages
Untitled
No ratings yet
Untitled
59 pages
Modeling and Visulizing Data Using R: A Practical Introduction
No ratings yet
Modeling and Visulizing Data Using R: A Practical Introduction
106 pages
A Brief Guide To R For Beginners in Econometrics: Department of Economics, Stockholm University
No ratings yet
A Brief Guide To R For Beginners in Econometrics: Department of Economics, Stockholm University
31 pages
saurabh
No ratings yet
saurabh
22 pages
RStudio Exercices
No ratings yet
RStudio Exercices
8 pages
Tutorial 1
No ratings yet
Tutorial 1
29 pages
Session Set Working Directory Choose Directlry
No ratings yet
Session Set Working Directory Choose Directlry
17 pages
Essential R
No ratings yet
Essential R
261 pages
Introduction To R and Rstudio, R Script, Calling Functions, Running Code
No ratings yet
Introduction To R and Rstudio, R Script, Calling Functions, Running Code
10 pages
Ordered Weighted Averaging Aggregation Operator: Fundamentals and Applications
From Everand
Ordered Weighted Averaging Aggregation Operator: Fundamentals and Applications
Fouad Sabry
No ratings yet
Random Sample Consensus: Robust Estimation in Computer Vision
From Everand
Random Sample Consensus: Robust Estimation in Computer Vision
Fouad Sabry
No ratings yet
Advanced C Concepts and Programming: First Edition
From Everand
Advanced C Concepts and Programming: First Edition
Gayatri
3/5 (1)
Carbon Border Mechanism Cbam Adjustment
No ratings yet
Carbon Border Mechanism Cbam Adjustment
10 pages
Marriage Prediction in KP Astrology
100% (2)
Marriage Prediction in KP Astrology
12 pages
Enginectra - HR - Project Coordinator 1-2
No ratings yet
Enginectra - HR - Project Coordinator 1-2
1 page
Elec Rev0 A
No ratings yet
Elec Rev0 A
3 pages
Associate Fellow
No ratings yet
Associate Fellow
3 pages
Practicing Lean Management - DESL Presentation
No ratings yet
Practicing Lean Management - DESL Presentation
28 pages
Lean Management - The Role of Lean Leaders - Planview
No ratings yet
Lean Management - The Role of Lean Leaders - Planview
7 pages
A Political Career in Astrology
No ratings yet
A Political Career in Astrology
31 pages
CV-DR Arivalagan Arumugam-September 2021
No ratings yet
CV-DR Arivalagan Arumugam-September 2021
6 pages
Check Career in Acting Reference Guide
No ratings yet
Check Career in Acting Reference Guide
1 page
How To Write & Launch Your Book To $10,000 in 90 Days
100% (1)
How To Write & Launch Your Book To $10,000 in 90 Days
84 pages
1633928266
100% (1)
1633928266
11 pages
The FPSO Contractors Guidebook
100% (3)
The FPSO Contractors Guidebook
20 pages
Online Training On Virtual Training Skills May 2020
No ratings yet
Online Training On Virtual Training Skills May 2020
1 page
Managing Customer Relationships Through Price and Service Quality
No ratings yet
Managing Customer Relationships Through Price and Service Quality
26 pages
Sector Key Drivers Areas of Opportunities: Title D/Img Fees Status Acquisition
No ratings yet
Sector Key Drivers Areas of Opportunities: Title D/Img Fees Status Acquisition
1 page
Centralized Ordering Policies in A Multi-Warehouse System With Lead Times and Random Demand
No ratings yet
Centralized Ordering Policies in A Multi-Warehouse System With Lead Times and Random Demand
16 pages
Management Research Methodology Integration of Principles Methods and Techniques by K N Krishnaswamy Appa Iyer Sivakumar M Mathirajan B00GO552JE
No ratings yet
Management Research Methodology Integration of Principles Methods and Techniques by K N Krishnaswamy Appa Iyer Sivakumar M Mathirajan B00GO552JE
5 pages
Overcoming Post COVID19 Challenges Through Design Alternatives
No ratings yet
Overcoming Post COVID19 Challenges Through Design Alternatives
16 pages
Data Mining Pertemuan 6
No ratings yet
Data Mining Pertemuan 6
28 pages
DM Assg 041
No ratings yet
DM Assg 041
9 pages
Project: Creditworthiness: Step 1: Business and Data Understanding
No ratings yet
Project: Creditworthiness: Step 1: Business and Data Understanding
12 pages
Download full Sampling Theory and Practice 1st Edition Changbao Wu Mary E Thompson ebook all chapters
No ratings yet
Download full Sampling Theory and Practice 1st Edition Changbao Wu Mary E Thompson ebook all chapters
50 pages
Fundamentals of Data Science
No ratings yet
Fundamentals of Data Science
53 pages
Acupuncture For Uterine Fibroids
100% (1)
Acupuncture For Uterine Fibroids
17 pages
BRM Unit-4
No ratings yet
BRM Unit-4
18 pages
ADANCO 2.0.1 User Manual
No ratings yet
ADANCO 2.0.1 User Manual
61 pages
Bank Additional Names
No ratings yet
Bank Additional Names
2 pages
Regression Modeling Strategies - With Applications To Linear Models by Frank E. Harrell
100% (4)
Regression Modeling Strategies - With Applications To Linear Models by Frank E. Harrell
598 pages
Apds Presantion
No ratings yet
Apds Presantion
24 pages
BMSP-ML: Big Mart Sales Prediction Using Different Machine Learning Techniques
No ratings yet
BMSP-ML: Big Mart Sales Prediction Using Different Machine Learning Techniques
10 pages
Inequalities by Sex On Remaining Teeth in Adults - A Decomposition Analysis
No ratings yet
Inequalities by Sex On Remaining Teeth in Adults - A Decomposition Analysis
7 pages
IDF cURVE OF pOKHARA KUSHAMA
No ratings yet
IDF cURVE OF pOKHARA KUSHAMA
28 pages
[FREE PDF sample] Statistical and Machine-Learning Data Mining, Third Edition: Techniques for Better Predictive Modeling and Analysis of Big Data, Third Edition Bruce Ratner ebooks
100% (4)
[FREE PDF sample] Statistical and Machine-Learning Data Mining, Third Edition: Techniques for Better Predictive Modeling and Analysis of Big Data, Third Edition Bruce Ratner ebooks
55 pages
Dodge 2015
No ratings yet
Dodge 2015
12 pages
Bank Loan Case Study
No ratings yet
Bank Loan Case Study
2 pages
Quantitative Analysis Using Spss
No ratings yet
Quantitative Analysis Using Spss
42 pages
Answers To Problems For Data Mining and Predictive Analytics (2nd Edition) by Larose
No ratings yet
Answers To Problems For Data Mining and Predictive Analytics (2nd Edition) by Larose
12 pages
Unit 2 PPT (BA)
No ratings yet
Unit 2 PPT (BA)
33 pages
The Effect of Fraud Risk Management, Risk Culture, On The Performance of Nigerian Banking Sector Preliminary Analysis2
No ratings yet
The Effect of Fraud Risk Management, Risk Culture, On The Performance of Nigerian Banking Sector Preliminary Analysis2
14 pages
rubin1976
No ratings yet
rubin1976
12 pages
Dance Movement Therapy For Depressed Clients Profiles of The Level and Changes in Depression
No ratings yet
Dance Movement Therapy For Depressed Clients Profiles of The Level and Changes in Depression
18 pages
AWS Machine Learning Specialty
100% (1)
AWS Machine Learning Specialty
67 pages
PRELIS Examples Guide PDF
No ratings yet
PRELIS Examples Guide PDF
78 pages
FDS - 3 SOLVED
No ratings yet
FDS - 3 SOLVED
21 pages
Session 2 - Data Pre-Processing
No ratings yet
Session 2 - Data Pre-Processing
19 pages
Da Public Slides Ch02 v3 2023
No ratings yet
Da Public Slides Ch02 v3 2023
49 pages
Knowledge Management in International Development: Conference Paper
No ratings yet
Knowledge Management in International Development: Conference Paper
5 pages