Teaching Decision Tree Classification Using Microsoft Excel
INFORMS Transactions on Education
Vol. 11, No. 3, May 2011, pp. 123–131
ISSN 1532-0545 | DOI 10.1287/ited.1100.0060
© 2011 INFORMS
Data mining is concerned with the extraction of useful patterns from data. With the collection, storage,
and processing of data becoming easier and more affordable by the day, decision makers increasingly
view data mining as an essential analytical tool. Unfortunately, data mining does not get as much attention
in the OR/MS curriculum as other more popular areas such as linear programming and decision theory. In
this paper, we discuss our experiences in teaching a popular data mining method (decision tree classification)
in an undergraduate management science course, and we outline a procedure to implement the decision tree
algorithm in Microsoft Excel.
Key words: data mining; decision tree classifier; spreadsheet modeling
History: Received: January 2010; accepted: August 2010.
as linear programming, Monte Carlo simulation, time-series forecasting, aggregate planning, and inventory management. With this foundation, students are able to tackle the more advanced material taught in our course.

Data mining is a broad field of study, and it is not possible to cover the entire field in about a quarter of a semester. Because students will likely see data mining techniques used for predictive purposes, we focus primarily on one such predictive technique called decision tree classification. Decision tree classification algorithms are covered in many data mining textbooks, such as those by Witten and Frank (2005), Tan et al. (2006), and Olson and Shi (2007).

The contribution of our work is a self-contained teaching module that OR/MS educators can incorporate directly into or adapt for their own courses. Our use of Microsoft Excel to implement the decision tree algorithm eliminates the need to devote class time to teaching specialized data mining software. Also, the data set used in the case has issues commonly seen in real-life data, such as missing or incomplete data. By having students go through the process of verifying and cleaning their data, they learn how to deal with missing or noisy data should they encounter it in the future.

The outline of the paper is as follows. A brief review of the data mining literature is provided in the next section. When teaching this data mining topic in our course, we begin by illustrating the decision tree classification algorithm using a simple example of assessing whether a loan applicant is likely to default on the loan. This example, adapted from Tan et al. (2006), and the decision tree algorithm are described in §3. In §4, we illustrate an implementation of the decision tree algorithm in Microsoft Excel. In §5, we discuss our experience of teaching this data mining topic in the management science elective course and provide details of the case study assigned to the students. Concluding remarks follow in §6.

2. Data Mining
Data mining is concerned with the extraction of useful patterns from data. The identification of patterns can be performed in an ad hoc manner if the number of records (or entries) in the database and the number of fields (or attributes) per record is small. However, with many practical databases such as point-of-sale data, Web logs, and e-commerce data containing millions of records with hundreds of attributes (Fayyad et al. 1996, Witten and Frank 2005), a more systematic approach is needed.

Generally speaking, data mining tasks are predictive (identifying patterns for predictive purposes), explanatory (identifying patterns to help explain relationships in the data), or both. Classification, which is a predictive task, looks at assigning objects to one of several predefined categories or class values. Common applications of classification algorithms include categorization of customers as loyal or risky based on their recorded historical behavior (Wei and Chiu 2002), detection of spam e-mail messages based on the message header and content (Pantel and Lin 1998), categorization of cells as malignant or benign based on the results of various tests (Mangasarian et al. 1995), and classification of galaxies based on their shapes (Bershady et al. 2000).

Well-known classification algorithms include decision trees (Quinlan 1986), artificial neural networks (Rosenblatt 1958, Rumelhart et al. 1986), the naive Bayes classifier (Domingos and Pazzani 1997), nearest neighbor algorithms (Dasarathy 1990), and support vector machines (Vapnik 1995).

3. Decision Tree Induction Algorithm
The basic concept behind the decision tree classification algorithm is the partitioning of records into "purer" subsets of records based on the attribute values. A pure subset is one in which all the records have the same class label. The end result of the decision tree algorithm is a set of classification rules that are simple to understand and interpret. This interpretability property is a strength of decision tree algorithms.

In general, these algorithms find the attribute that best splits a set of records into a collection of subsets with the greatest overall purity measure. The purity of a subset can be quantified by entropy, which measures the amount of information loss. Entropy is a real number between zero and one, where an entropy value of zero indicates that the data set is perfectly classified, while a value of one indicates that no information has been gained. The algorithm recursively operates on each newly generated subset to find the next attribute with which to split the data set. The algorithm stops when all subsets are pure or some other stopping criterion has been met. Examples of stopping criteria include the exhaustion of attributes with which to split the impure nodes, a predefined node size, and a minimum purity level.

To illustrate the decision tree algorithm, we use a database of previous loan borrowers adapted from Tan et al. (2006). The data set, shown in Table 1, contains 10 records, and each record has 4 attributes: home owner, marital status, annual income, and defaulted. The defaulted attribute is the class or prediction variable, and it has two labels: yes and no. Three borrowers defaulted on their loans while the remaining seven did not. The possible values for the other three attributes are yes or no for home owner; single, married, or divorced for marital status; and low, average, or high for annual income.
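Before turning to the impurity calculation, it may help to see the two quantities the algorithm repeatedly computes. The following is a minimal Python sketch of entropy and of the weighted split impurity described above; it is our illustration, not part of the original Excel module, and the function and parameter names are our own.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, with 0 * log2(0) taken as 0."""
    n = len(labels)
    result = 0.0
    for count in Counter(labels).values():
        p = count / n
        if p > 0:
            result -= p * math.log2(p)
    return result

def split_impurity(records, attribute, class_attr="defaulted"):
    """Weighted average entropy of the child nodes obtained by
    splitting `records` (a list of dicts) on `attribute`."""
    children = {}
    for r in records:
        children.setdefault(r[attribute], []).append(r[class_attr])
    n = len(records)
    return sum(len(labels) / n * entropy(labels)
               for labels in children.values())
```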
Table 1 Data Set of Previous Loan Borrowers
Figure 1 Splitting on the Home Owner Attribute
By weighting the entropy of each child node by the number of records in each child node relative to the total number of records in the parent node, we obtain the impurity measure for that particular split. Using Equation (2), the impurity value of the split using the home owner attribute is

(3/10) × Entropy(home owner = yes) + (7/10) × Entropy(home owner = no)
= (3/10) × 0 + (7/10) × 0.985 = 0.690.   (3)

Figure 3 Completed Decision Tree (root node: Y = 3, N = 7; child nodes: income = low with Y = 0, N = 3; income = average with Y = 3, N = 1; income = high with Y = 0, N = 3)
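As a numerical check (ours, using the sketch functions above), the three candidate splits of the root node evaluate as follows; income produces the purest children, which is why the tree in Figure 3 splits on it first. The record list below is transcribed from the loan data set.

```python
records = [
    {"home owner": "yes", "marital status": "single",   "income": "high",    "defaulted": "no"},
    {"home owner": "no",  "marital status": "married",  "income": "average", "defaulted": "no"},
    {"home owner": "no",  "marital status": "single",   "income": "low",     "defaulted": "no"},
    {"home owner": "yes", "marital status": "married",  "income": "high",    "defaulted": "no"},
    {"home owner": "no",  "marital status": "divorced", "income": "average", "defaulted": "yes"},
    {"home owner": "no",  "marital status": "married",  "income": "low",     "defaulted": "no"},
    {"home owner": "yes", "marital status": "divorced", "income": "high",    "defaulted": "no"},
    {"home owner": "no",  "marital status": "single",   "income": "average", "defaulted": "yes"},
    {"home owner": "no",  "marital status": "married",  "income": "low",     "defaulted": "no"},
    {"home owner": "no",  "marital status": "single",   "income": "average", "defaulted": "yes"},
]

for attr in ("home owner", "marital status", "income"):
    print(attr, round(split_impurity(records, attr), 3))
# home owner 0.69, marital status 0.6, income 0.325
```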
4. Implementing the Decision Tree in Excel
A significant portion of the work in building a decision tree is in the calculation of the impurity values for each possible split at each nonpure node in the tree. There are many software packages that automatically perform these calculations, such as the open-source software package Weka (Witten and Frank 2005), which is available from http://www.cs.waikato.ac.nz/~ml/. However, it is pedagogically instructive for students to manually build the decision tree to better understand the mechanics of the algorithm and the issues that may arise when constructing the decision tree.

Our choice of Microsoft Excel for this exercise is primarily because of its ability to quickly perform complex calculations. Once students understand how entropy and impurity are calculated, using Excel to perform these calculations frees them from this mechanical process so they may focus more on the structure of the tree. We discuss this in further detail in §5. In addition, Excel allows us to work on larger data sets, which brings more realism to the topic while keeping the problem tractable without being too encumbered by the entropy and impurity calculations.

The number of entropy and impurity calculations remains an issue with the decision tree algorithm. In the worst case, the number of entropy values that have to be calculated is of the order O(an^a), where a is the number of attributes in the data set and n = max_{i=1,…,a} {n_i}, where n_i is the number of classes in attribute i. This problem is similar to the curse of dimensionality issue with the implementation of dynamic programming in Excel (Raffensperger and Pascal 2005). Though the issue of the exponential number of impurity calculations has yet to be resolved, we have designed an implementation procedure for problem sets with binary class variables that requires the modeler to perform several copy-and-paste operations and needs only some subsequent minor modifications to the pasted cells.

Figure 4 shows the loan data set in Excel, with the annual salary attribute converted into a nominal variable. We have this data located in a sheet titled "Data."

Figure 4 Loan Data Set in Excel

     A    B          C               D        E
1    ID   Homeowner  Marital status  Income   Default
2    1    Yes        Single          High     No
3    2    No         Married         Average  No
4    3    No         Single          Low      No
5    4    Yes        Married         High     No
6    5    No         Divorced        Average  Yes
7    6    No         Married         Low      No
8    7    Yes        Divorced        High     No
9    8    No         Single          Average  Yes
10   9    No         Married         Low      No
11   10   No         Single          Average  Yes

Figure 5 shows our implementation of the first level of the decision tree (see the sheet labeled Level 1). At the first level, the algorithm evaluates the splitting of the root node using each of the three attributes. The top table in Figure 5 corresponds to splitting the root node using the home owner attribute, the middle table to splitting using the marital status attribute, and the bottom table to the income attribute.

The top table within Figure 5 illustrates that the split of the root node using the home owner attribute contains two pairs of rows, each row representing the scenario where the home owner attribute takes on the value of yes or no. The formulas in Row 4 of Figure 5 extract the number of records containing home owner = yes and calculate the entropy value for this child node. The formulas in Cells F4 to K4 are as below.

Cell   Formula                                Copy to
F4     =DCOUNT(Data!$A$1:$E$11,"ID",A3:D4)    G4
H4     =SUM(F4:G4)
I4     =IF(F4=0,0,(F4/$H4)*LOG(F4/$H4,2))     J4
K4     =-SUM(I4:J4)

The DCOUNT formula is a query function that counts the number of records in the database (in this case, the loan data set table) that match the criteria default = no, home owner = yes, marital status = (blank), income = (blank). The DCOUNT formula ignores attributes that have blank values (i.e., marital status and income). When the formula in Cell F4 is copied to Cell G4, the DCOUNT formula in Cell G4 is updated to contain the default = yes criterion and drops the default = no criterion. The IF function in Cells I4 and J4 is used to return a value of zero instead of an error value when calculating log2 0. (Recall that we define 0 log2 0 = 0.) The formula in Cell K4 is Equation (1).

The formulas in Row 6 of Figure 5 perform the same calculations for the home owner = no child node. Finally, in Cell B2, we calculate the impurity value of this split (see Equation (3)) using the formula

=SUMPRODUCT(H4:H6,K4:K6)/SUM(H4:H6)

When one of the tables shown in Figure 5 is completed, one can create a copy of the table to evaluate other splits. For example, to evaluate splitting the root node using the marital status attribute, we copy the home owner table and make the necessary changes to the criteria used in the DCOUNT formula by leaving the home owner cells blank and entering the different class values for the marital status attribute. As shown in Figure 5, an additional row has to be added to the table because the marital status attribute has three possible values: single, married, and divorced. The SUMPRODUCT impurity formula must be updated to include any newly added pairs of rows.

Figure 5 Calculating the Impurity Values of Splitting the Root Node Using the Home Owner, Marital Status, and Income Attributes

Figure 5 shows that splitting the root node by annual income provides the lowest impurity value, with only the income = average child node being an impure node. We can use the same setup as before to split this node. Consider splitting the income = average node by the home owner attribute. As a shortcut, we can use a copy of the home owner table from the Level 1 sheet. In that table, we simply set income = average, and the formulas will automatically recalculate to provide the impurity value of this split. Figure 6 shows the splitting of the income = average node by the home owner and marital status attributes.

Figure 6 Calculating the Impurity Values of Splitting the Income = Average Node Using the Home Owner and Marital Status Attributes
Source. Microsoft product screen shot reprinted with permission from Microsoft Corporation.

5. Classroom Experience
We have taught this material in an undergraduate management science elective course every year since 2008. The class met twice a week for 75 minutes each class period. The students were business majors of junior or senior standing, and they would have already taken the introductory management science course. In this course, we cover four topics: decision theory, revenue management, data mining, and mathematical programming. For each topic, we typically spend two to four 75-minute class meetings discussing basic theory and working through examples, one class period outlining the details and setting expectations for the case the students will complete in groups, and one class period for the student groups to present their work on the case and the instructor to summarize the topic.

For the data mining module, we spent two class periods motivating the need to understand data mining, working through the bank loan example presented in §3, which includes calculating the entropy and impurity values by hand, implementing the decision tree induction algorithm in Microsoft Excel (see §4), and discussing some implementation issues with the decision tree induction algorithm.
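For readers who want to sanity-check the Figure 5 and Figure 6 worksheets outside of Excel, the same count-then-weight logic can be mirrored with pandas. This sketch is our addition, not part of the original module; it reuses the `records` list and `entropy` helper from the earlier sketches, and by our calculation marital status splits the income = average node into perfectly pure children.

```python
import pandas as pd

df = pd.DataFrame(records)  # the loan data set from the earlier sketch

def impurity(frame, attribute, class_attr="defaulted"):
    """Weighted child-node entropy of splitting `frame` on `attribute`,
    mirroring the DCOUNT/SUMPRODUCT tables in Figures 5 and 6."""
    total = len(frame)
    return sum(len(child) / total * entropy(list(child[class_attr]))
               for _, child in frame.groupby(attribute))

# Level 1: evaluate all three splits of the root node (income wins).
for attr in ("home owner", "marital status", "income"):
    print(attr, round(impurity(df, attr), 3))

# Level 2: split the impure income = average node, as in Figure 6.
average = df[df["income"] == "average"]
print(round(impurity(average, "home owner"), 3))      # 0.811 (no improvement)
print(round(impurity(average, "marital status"), 3))  # 0.0 (pure children)
```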
In the third class period, we discussed a case that we had prepared based on the "A Well-Known Business School" case by Bell (1998). The original intent of this case is to create expert systems. In our version of the case, the focus is on data mining and classification, and the task is to extract rules for admitting students into an MBA program from a database of previous MBA applicants. The database contains records for 73 previous MBA applicants, and each record contains an applicant's GPA and GMAT scores, the number of months of relevant work experience, an extracurricular activity score, an essay score, and whether or not the applicant was offered admission into the program. The GPA and GMAT scores are continuous variables. The work experience variable has integer values ranging from 0 to 84 months. The activity and essay variables are rated as A, B, C, or D, with A being the highest score and D the lowest.

This database has two attributes that are continuous variables and several records with missing values. The continuous-variable and missing-value issues provide an excellent opportunity to discuss data preparation, which is one of the KDD processes. Bell (2008, p. 30) refers to this data preparation step as "pre-O.R.," which he defines as "the grunt work that O.R. people have to do before they can apply O.R. methods and models." Based on his experience, Bell (2008) estimates that there is an "80/20 rule of analytics": Analysts usually spend a lot more time doing data processing than building the actual OR/MS models. As such, Bell believes that students should be given more exposure to this exercise in their O.R. coursework and that this MBA database provides an excellent opportunity to do so.

We began the data preparation process by instructing students to check the validity of their data. For example, GPA scores should be between 1 and 4. This can be verified easily using the MIN and MAX functions in Excel. We also showed the students how to perform this data verification process using the sorting tool and the AutoFilter tool in Excel.

While performing the data validation process, many students noticed that there were records with missing or incomplete GPA values. When we asked them to suggest ways to deal with these records, a common suggestion was to remove these records from the database. We responded by posing a question based on the bank loan problem they had previously seen: If a customer were to refuse to provide information about his income on the loan application form, what could that imply about the customer? The students inevitably realized that a missing value could itself prove to be useful information. We then discussed possible solutions to handling missing values, for example, treating a missing value as a valid stand-alone class value for the attribute or replacing the missing value with an appropriate estimate such as the average, median, or mode value, which can be based on the whole database or a sample of records with similar values for the other attributes.
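Both remedies are easy to demonstrate in code as well. The pandas sketch below is our illustrative addition; the column name `GPA`, the toy values, and the sentinel label `missing` are our assumptions, not from the case data.

```python
import pandas as pd

applicants = pd.DataFrame({"GPA": [3.2, None, 2.8, None, 3.9]})  # toy stand-in data

# Option 1: treat "missing" as a stand-alone class value.
applicants["GPA_class"] = applicants["GPA"].apply(
    lambda g: "missing" if pd.isna(g) else ("high" if g >= 3.0 else "low"))

# Option 2: replace missing values with an estimate (mean shown here;
# the median or mode, or estimates from similar records, work the same way).
applicants["GPA_imputed"] = applicants["GPA"].fillna(applicants["GPA"].mean())
```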
Students usually run into a minor roadblock when verifying the GMAT variable because they are unaware of the range for GMAT scores. This leads to another important lesson: Understand your data. From a quick Web search, students found that GMAT scores ranged from 200 to 800. Recalling that decision trees work better with categorical instead of continuous variables, students were asked to suggest rules to discretize the GMAT variable. Without fail, they suggested using "nice" ranges such as 200 to 300, 300 to 400, and so on. Even though such a discretization rule seems reasonable, a more important factor is whether the rule is sensible. When we explained that GMAT scores are interpreted in a similar fashion to the ACT and SAT scores with which they are more familiar, they realized that each GMAT score corresponds to a percentile score and that a more sensible approach is to discretize the GMAT variable based on percentiles instead of the raw scores. Again, this reinforced the need to understand the data. At this point, the students are reminded about how the discretization decision could affect the final decision tree in terms of its size: broad versus narrow decision trees (based on the number of class values for each attribute) and shallow versus deep decision trees (based on the number of attributes).

With the necessary foundation and background from the discussion of the case and data, the students gathered in their groups to decide how to preprocess their data and build their decision trees. While working on their decision trees, the groups found that a handful of the nodes at the end of their decision trees were not pure, and they did not have any other attributes that they could use to split the impure child nodes. We discussed some ways to deal with these nodes, for example, implementing a "majority rules" or "flip a coin" rule to classify all the records in an impure child node, or using an approach called pruning that helps simplify the final decision trees so that more interpretable classification rules can be obtained.

On the day of the group presentations, we invited the MBA director to attend the class to provide real-life feedback and comments about the students' work and participate in the question-and-answer period. The students particularly enjoyed discussing the admission process with the MBA director. One interesting question asked by a student was how the director decided which applicants to admit if the number of qualified candidates exceeded the number of available spots in the program. From a data mining perspective, we mentioned that many classification algorithms, including decision trees, can also
rank the records within the database (Ataman et al. 2006, Caruana et al. 1996, Crammer and Singer 2002, Rakotomamonjy 2004). Ranking using decision tree algorithms is typically done by modifying the algorithm so that, in addition to providing the predicted class label for a record, the tree provides the probability of the record belonging to a class. These probabilities can then be used to order the data points (Provost and Domingos 2003).

As part of the summary of the data mining topic, the student groups were provided with a test or out-of-sample set of 20 applicants. The groups then assessed which of these 20 applicants would be accepted into the MBA program based on their decision trees. After the groups had classified the applicants in the test set, they were informed which applicants were accepted and which were not. Based on these results, we evaluated the accuracy of their decision tree models, where accuracy is defined as the ratio of correctly classified records to the total number of records:

Accuracy = (TP + TN) / (TP + FP + TN + FN),

where TP, TN, FP, and FN are the number of true positive, true negative, false positive, and false negative records, respectively. These four metrics also allowed us to revisit the Type I and Type II error measures that the students had seen in their statistics courses.
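To make these definitions concrete, here is a small sketch (ours, with toy labels, not part of the original exercise) that computes the four counts and the accuracy ratio from paired lists of actual and predicted class labels:

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Return the TP, TN, FP, FN counts for paired class labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    tn = sum(1 for a, p in zip(actual, predicted) if a != positive and p != positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    return tp, tn, fp, fn

actual    = ["yes", "no", "no", "yes"]   # toy example labels
predicted = ["yes", "no", "yes", "no"]
tp, tn, fp, fn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / (tp + tn + fp + fn)   # here 2/4 = 0.5
```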
We ended the lesson by instructing the student groups to construct a confusion matrix (Kohavi and Provost 1998) that displays the four metrics TP, TN, FP, and FN in a 2-by-2 table as shown in Figure 7. The students recognized that the confusion matrix is similar to the table of joint probabilities that they had seen in sequential decision-making problems, which is covered in the decision theory portion of the course. Recall that in sequential decision making, a decision maker is faced with choosing from two or more competing options, where each option will result in different payoffs or rewards depending on the realization of the random event following the choice. The decision-making process may also include an option where the decision maker can enlist the help of an external source (for example, an expert) to provide better information about the likelihood of the occurrences of future random events. Typically, this information is presented in the form of conditional probabilities or in the form of joint probabilities similar to those in the confusion matrix table. This classification exercise illustrated to the students one approach to obtaining this expert information.

Figure 7 Confusion Matrix

                      Predicted
                      Negative   Positive
Actual   Negative     TN         FP
         Positive     FN         TP

6. Conclusions
In this paper, we introduce a classification algorithm called decision tree induction that can be used for data mining. We show how one can implement the algorithm in Microsoft Excel and discuss our experiences teaching this material within an undergraduate management science elective course.

Data mining is usually not covered in the typical OR/MS curriculum. However, we believe that data mining is a very useful and practical tool that students should have in their OR toolbox and, therefore, it is worth dedicating a handful of class hours to this increasingly important topic. We envision that this material can be a supplemental topic to forecasting (e.g., classification as a predictive task), regression (e.g., an alternative approach to binary logistic regression for identifying the key independent variables that help explain the dependent variable), or decision theory (e.g., creating the confusion matrix for use in sequential decision-making problems).

Acknowledgments
We thank the editor and anonymous referees for their thoughtful comments on earlier versions of the paper. Their suggestions helped improve the content and presentation of the paper.

References
Albright, S. C., W. Winston, C. Zappe. 2008. Data Analysis and Decision Making with Microsoft Excel. South-Western College Publishers, Cincinnati, OH.
Anderson, D. R., D. J. Sweeney, T. A. Williams, J. D. Camm, R. K. Martin. 2010. An Introduction to Management Science. South-Western College Publishers, Cincinnati, OH.
Ataman, K., W. N. Street, Y. Zhang. 2006. Learning to rank by maximizing AUC with linear programming. Proc. IEEE Internat. Joint Conf. Neural Networks (IJCNN), Vancouver, British Columbia, Canada, 123–129.
Babcock, C. 2006. Data, data, everywhere. InformationWeek. Accessed July 6, 2010, http://www.informationweek.com/news/global-cio/showArticle.jhtml?articleID=175801775.
Bell, P. C. 1998. Management Science/Operations Research: A Strategic Perspective. South-Western College Publishers, Cincinnati, OH.
Bell, P. C. 2008. Riding the analytics wave. OR/MS Today 35(4) 28–32.
Bershady, M. A., A. Jangren, C. J. Conselice. 2000. Structural and photometric classification of galaxies. I. Calibration based on a nearby galaxy sample. Astronomical J. 119 2645–2663.
Caruana, R., S. Baluja, T. Mitchell. 1996. Using the future to "sort out" the present: Rankprop and multitask learning for medical risk evaluation. Adv. Neural Inform. Processing Systems 8 959–965.
Crammer, K., Y. Singer. 2002. Pranking with ranking. Adv. Neural Inform. Processing Systems 14 641–647.
Dasarathy, B. V. 1990. Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. IEEE Computer Society Press, Los Alamitos, CA.
Domingos, P., M. Pazzani. 1997. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learn. 29(2–3) 103–130.
Fayyad, U., G. Piatetsky-Shapiro, P. Smyth. 1996. From data mining to knowledge discovery in databases. AI Magazine 17(3) 37–54.
Hillier, F. S., G. J. Lieberman. 2009. Introduction to Operations Research. McGraw-Hill, New York.
Kohavi, R., F. Provost. 1998. Glossary of terms. Machine Learn. 30 271–274.
MacKay, D. J. C. 2003. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge, UK.
Mangasarian, O. L., W. N. Street, W. H. Wolberg. 1995. Breast cancer diagnosis and prognosis via linear programming. Oper. Res. 43(4) 570–577.
Netflix. 2006. Netflix prize. Accessed August 18, 2009, http://www.netflixprize.com.
Olson, D., Y. Shi. 2007. Introduction to Business Data Mining. McGraw-Hill, New York.
Pantel, P., D. Lin. 1998. Spamcop: A spam classification and organization program. Proc. Fifteenth Natl. Conf. Artificial Intelligence, Madison, WI, 95–98.
Provost, F., P. Domingos. 2003. Tree induction for probability-based ranking. Machine Learn. 52 199–215.
Quinlan, J. R. 1986. Induction of decision trees. Machine Learn. 1 81–106.
Raffensperger, J. F., R. Pascal. 2005. Implementing dynamic programs in spreadsheets. INFORMS Trans. Ed. 5(2). http://ite.pubs.informs.org/Vol5No2/RaffenspergerRichard/.
Ragsdale, C. 2007. Spreadsheet Modeling & Decision Analysis: A Practical Introduction to Management Science. South-Western College Publishers, Cincinnati, OH.
Rakotomamonjy, A. 2004. Optimizing area under ROC curve with SVMs. First Workshop ROC Analysis in AI, Valencia, Spain, 71–80.
Rosenblatt, F. 1958. The perceptron: A probabilistic model for information storage and organization in the brain. Psych. Rev. 65 386–408.
Rumelhart, D. E., G. E. Hinton, R. J. Williams. 1986. Learning internal representations by error propagation. D. E. Rumelhart, J. L. McClelland, eds. Parallel Distributed Processing, Vol. 1. MIT Press, Cambridge, MA, 318–362.
Tan, P.-N., M. Steinbach, V. Kumar. 2006. Introduction to Data Mining. Addison Wesley, Boston.
Vapnik, V. 1995. The Nature of Statistical Learning Theory. Springer-Verlag, New York.
Wei, C., I. Chiu. 2002. Turning telecommunications call details to churn prediction: A data mining approach. Expert Systems Appl. 23 103–112.
Witten, I. H., E. Frank. 2005. Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. Morgan Kaufmann, San Francisco.