CH 06
POWELL
KENNETH R. BAKER
• k-Nearest Neighbor
• Naïve Bayes
• Classification and Prediction Trees
• Multiple Linear Regression
• Logistic Regression
• Neural Networks
k-Nearest Neighbor
• Strengths:
– Simplicity: requires no assumptions about the form of the
model and few assumptions about its parameters
– Only one parameter estimated (k)
– Performs well where there is a large training database, or many
combinations of predictor variables
• Weaknesses:
– Provides no information on which predictors are most effective
in making a good classification
– Long computational times; number of records required
increases faster than linearly
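Since k is the only parameter to estimate, tuning reduces to trying a few values and comparing cross-validated accuracy. A minimal sketch using scikit-learn (the library and the iris data set are illustrative assumptions, not part of the slides):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# k is the single estimated parameter; pick it by cross-validation
for k in (1, 3, 5, 7):
    model = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"k={k}: mean CV accuracy {score:.3f}")
```

Note that every prediction scans the training database, which is why computation time grows quickly with the number of records.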
Naïve Bayes
• Strengths:
– Remarkably simple, but often gives classification accuracy as
good as or better than more sophisticated algorithms
– Requires no assumptions other than class-conditional
independence
• Weaknesses:
– Requires large number of records for good results
– Estimates a probability of zero for new cases with a predictor
value missing from the training database
– Suitable only for classification, not for estimating class
probabilities
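The zero-probability weakness above is commonly patched with Laplace smoothing, which adds a small count to every predictor value. A sketch on toy categorical data, assuming scikit-learn's CategoricalNB (the data are invented for illustration):

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# toy training database: two categorical predictors, binary class
X = np.array([[0, 1], [1, 1], [0, 0], [1, 0], [0, 1], [1, 1]])
y = np.array([1, 1, 0, 0, 1, 1])

# alpha > 0 applies Laplace smoothing, so a predictor value unseen
# in the training data no longer forces a probability of zero
model = CategoricalNB(alpha=1.0).fit(X, y)
print(model.predict(np.array([[1, 0]])))
```

The class-conditional independence assumption is what lets the model multiply one estimated probability per predictor instead of estimating every combination.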
Classification and Prediction Trees
• Strengths:
– Easy to understand and explain
– Transparent results, can be interpreted as explicit If-Then rules
– Based on few assumptions, works well even with missing data
and outliers
• Weaknesses:
– Accurate results require very large databases
– Allows partitioning of only individual variables, not pairs or
groups
– Specific to XLMiner: only binary categorical variables allowed.
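The transparency claimed above can be shown directly: a fitted tree prints as explicit If-Then rules. A minimal sketch, assuming scikit-learn rather than XLMiner (so the binary-only restriction does not apply):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

# a shallow tree keeps the rule set small and readable
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# the fitted tree reads as explicit If-Then rules
rules = export_text(tree, feature_names=list(data.feature_names))
print(rules)
```

Each split partitions on a single variable at a time, which is exactly the limitation noted in the weaknesses.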
Multiple Linear Regression
• Strengths:
– Well-known and well-accepted model for prediction
– Easy to implement and interpret
– Inferential statistics (p-values and R²) are available
• Weaknesses:
– Possible for regression model to exhibit high R² but low predictive
accuracy
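The high-R²-but-low-accuracy weakness is easy to demonstrate by overfitting many noise predictors. A sketch on simulated data, assuming scikit-learn (the data-generating setup is invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# 30 records but 20 predictors, only the first of which matters;
# the surplus predictors inflate in-sample R^2
X = rng.normal(size=(30, 20))
y = X[:, 0] + rng.normal(size=30)

model = LinearRegression().fit(X, y)
print("in-sample R^2:      ", model.score(X, y))
print("cross-validated R^2:", cross_val_score(model, X, y, cv=5).mean())
```

The in-sample R² looks impressive while the cross-validated R² collapses, which is why predictive accuracy should be judged on held-out data.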
Logistic Regression
• Strengths:
– Well-known, widely used, especially in marketing
– Easy to implement and fairly straightforward
• Weaknesses:
– A facility with the concept of odds is often necessary
– With a large number of predictor variables, it is often necessary
to reduce them to the most important through pre-processing,
inferential statistics, or best-subset selection
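The "facility with odds" amounts to reading exponentiated coefficients as odds ratios. A minimal sketch, assuming scikit-learn and its breast-cancer data set (neither is part of the slides):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)   # scaling aids convergence

model = LogisticRegression(max_iter=1000).fit(X, y)

# exp(coefficient) is the odds ratio for a one-unit (here, one
# standard deviation) change in the corresponding predictor
odds_ratios = np.exp(model.coef_[0])
print(odds_ratios[:5])
```

An odds ratio above 1 means the predictor raises the odds of the positive class; below 1, it lowers them.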
Neural Networks
• Strengths:
– Highly successful in many applications (though unsuccessful in
many more)
– Very flexible because the fundamental structure (the number of
hidden layers and nodes) is chosen by the user
– Capture complex relationships between inputs and outputs
• Weaknesses:
– Difficult to interpret, thus hard to justify
– Limited insight into underlying relationships
– Requires the modeler to carefully pre-process predictor variables
and experiment with different sets of predictors
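Both points above — the user-chosen architecture and the need for careful pre-processing — show up directly in code. A sketch using scikit-learn's MLPClassifier on its digits data set (both are illustrative assumptions):

```python
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)

# the modeler chooses the structure (hidden_layer_sizes) and must
# scale the predictors for stable training
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```

The fitted weights themselves resist interpretation, which is the interpretability weakness listed above: the model predicts well without revealing which relationships drive the prediction.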