
Operations Research

&
Data Mining

Siggi Olafsson
Associate Professor
Department of Industrial Engineering
Iowa State University

20th European Conference on Operational Research


Rhodes, Greece, July 4-7, 2004
Purpose of Talk
Give a definition and an overview of data mining as it relates to operations research
Present some examples to give a flavor for the type of work that is possible
My views on the future of OR and data mining
Aim for the talk to be accessible without prior knowledge of data mining

20th European Conference on Operational Research, July 4-7, 2004
Overview
Background
Intersection of OR and Data Mining
Optimization algorithms used for data mining
Data visualization
Attribute selection
Classification
Unsupervised learning
Data mining used in OR applications
Production scheduling
Optimization methods applied to output of standard data
mining algorithms
Selecting and improving decision trees
Open research areas

Background
Rapidly growing interest in data mining among
operations research academics and practitioners
Evidenced, for example, by the increased data mining presence in professional organizations
New INFORMS Section on Data Mining
Large number of data mining sessions at INFORMS and
IIE research conferences
Special issues in Computers & Operations Research, IIE
Transactions, Discrete Applied Mathematics, etc.
Numerous presentations/sessions at this conference

What is Data Mining?

What is Data Mining, Really?
Extracting meaningful, previously unknown
patterns or knowledge from large databases

The knowledge discovery process:

Define Objective → Prepare Data → Mine Knowledge → Interpret Results

Define Objective: business/scientific objective; data mining objective
Prepare Data: data cleaning; data selection; attribute selection; visualization
Mine Knowledge: classification; association rule discovery; clustering
Interpret Results: predictive models; structural insights

Interdisciplinary Field
Data mining lies at the intersection of statistics, machine learning, databases, and optimization.

Input Engineering
Preparing the data may take as much as 70% of
the entire effort
Numerous steps, including
Combining data sources
Transforming attributes
Data cleaning
Data selection
Attribute selection
Data visualization
Many of these steps have connections with operations research, and optimization in particular

Overview
Background
Intersection of OR and Data Mining
Optimization algorithms used for data mining
Data visualization
Attribute selection
Classification
Unsupervised learning
Data mining used in OR applications
Production scheduling
Optimization methods applied to output of standard data
mining algorithms
Selecting and improving decision trees
Open research areas

Data Visualization
Visualizing the data is important in any
data mining project
Generally difficult because the data is typically high-dimensional, i.e., hundreds or thousands of attributes (variables)
How can we best visualize such data in 2
or 3 dimensions?
Traditional techniques include
multidimensional scaling, which uses
nonlinear optimization

Optimization Formulation
Recent combinatorial optimization formulation by Abbiw-Jackson, Golden, Raghavan, and Wasil (2004)
Map a set M of m points from R^r to R^q, q = 2, 3
Approximate the q-dimensional space by a lattice N

$$\min \sum_{i \in M} \sum_{j \in M} \sum_{k \in N} \sum_{l \in N} F\big(d_{\mathrm{original}}(i,j),\, d_{\mathrm{new}}(k,l)\big)\, x_{ik} x_{jl}$$

$$\text{s.t.} \quad \sum_{k \in N} x_{ik} = 1, \;\; i \in M; \qquad x_{ik} \in \{0,1\}$$

where
$d_{\mathrm{original}}(i,j)$ = distance measure in $R^r$
$d_{\mathrm{new}}(k,l)$ = distance measure in $R^q$
$F$ = function such as least squares, Sammon map, etc.
Solution Methods
Quadratic Assignment Problem (QAP)
Not possible to solve exactly for large scale problems
Local search procedure proposed

Key to the formulation is the selection of the objective function, e.g., the Sammon map:

$$\min \; \frac{1}{\displaystyle\sum_{i \in M} \sum_{\substack{j \in M \\ j \neq i}} d_{\mathrm{original}}(i,j)} \; \sum_{i \in M} \sum_{\substack{j \in M \\ j \neq i}} \sum_{k \in N} \sum_{l \in N} \frac{\big(d_{\mathrm{original}}(i,j) - d_{\mathrm{new}}(k,l)\big)^2}{d_{\mathrm{original}}(i,j)}\, x_{ik} x_{jl}$$
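The Sammon stress can be computed directly for any candidate mapping of points to low-dimensional positions. A minimal sketch in Python; the function name and sample points are illustrative, not from the cited paper:

```python
from itertools import combinations
from math import dist

def sammon_stress(points, positions):
    """Sammon stress between original points (in R^r) and their assigned
    low-dimensional positions (e.g., lattice cells in R^2).
    Lower is better; 0 means all pairwise distances are preserved."""
    pairs = list(combinations(range(len(points)), 2))
    scale = sum(dist(points[i], points[j]) for i, j in pairs)
    total = 0.0
    for i, j in pairs:
        d_orig = dist(points[i], points[j])
        d_new = dist(positions[i], positions[j])
        total += (d_orig - d_new) ** 2 / d_orig
    return total / scale

# Three points in R^3 mapped to the plane
points = [(0, 0, 0), (3, 0, 0), (0, 4, 0)]
perfect = [(0, 0), (3, 0), (0, 4)]   # preserves all pairwise distances
assert sammon_stress(points, perfect) == 0.0
assert sammon_stress(points, [(0, 0), (1, 0), (0, 1)]) > 0.0
```

A local search over lattice assignments would evaluate this stress for each candidate move.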

Overview
Background
Intersection of OR and Data Mining
Optimization algorithms used for data mining
Data visualization
Attribute selection
Classification
Unsupervised learning
Data mining used in OR applications
Production scheduling
Optimization methods applied to output of standard data
mining algorithms
Selecting and improving decision trees
Open research areas

Attribute Selection
Usually large number of attributes
Some attributes are redundant or
irrelevant and should be removed
Benefits:
Faster subsequent induction
Simpler models (important in data mining)
Better (predictive) performance of models
Discover which attributes are important
(descriptive or structural knowledge)

Optimization Formulation
Define the decision variable

$$x_j = \begin{cases} 1, & \text{if attribute } j \text{ is selected,} \\ 0, & \text{otherwise.} \end{cases}$$

Combinatorial optimization problem:

$$\max_{x} \; f(x) = f(x_1, x_2, \ldots, x_n) \qquad \text{s.t.} \;\; x_j \in \{0,1\} \;\; \forall j$$

Number of solutions is $2^n - 1$
How should the objective function be defined?
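The combinatorial nature of the problem is easy to see in code. Below, a brute-force sketch that enumerates all 2^n − 1 nonempty subsets; the relevance scores and the objective are made up, standing in for a real wrapper measure such as cross-validated accuracy:

```python
from itertools import combinations

def best_subset(attributes, objective):
    """Enumerate all 2^n - 1 nonempty attribute subsets and return the
    one maximizing the (user-supplied) objective function."""
    best, best_val = None, float("-inf")
    for r in range(1, len(attributes) + 1):
        for subset in combinations(attributes, r):
            val = objective(subset)
            if val > best_val:
                best, best_val = subset, val
    return best, best_val

# Hypothetical objective: rewards two informative attributes and
# penalizes subset size (a stand-in for model complexity)
relevance = {"a1": 0.9, "a2": 0.7, "a3": 0.1, "a4": 0.0}
obj = lambda s: sum(relevance[a] for a in s) - 0.2 * len(s)
subset, value = best_subset(list(relevance), obj)
assert set(subset) == {"a1", "a2"}
```

Exhaustive enumeration is only feasible for small n, which is exactly why the metaheuristics on the next slide are needed.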
Solution Methods
Non-linear objective function
(Defining a good objective is a major issue)
Mathematical programming approach (Bradley,
Mangasarian and Street, 1998)
Metaheuristics have been applied extensively
Genetic algorithms, simulated annealing
Nested partitions method (Olafsson and Yang, 2004)
Intelligent partitioning: take advantage of what is known in
data mining about evaluating attributes
Random instance sampling: in each step the algorithm
uses a sample of instances, which improves scalability

Learning from Data
Each data point (instance) represents an
example from which we can learn
The instances are either
Labeled (supervised learning)
One attribute is of special interest (called the class or
target) and each instance is labeled by its class value
Unlabeled (unsupervised learning)
Instances are assumed to be independent
(However, spatial and temporal data mining
are active areas of research)
Learning Tasks in Data Mining
Classification (supervised learning)
Learn how to classify data in one of a given
number of categories or classes
Clustering (unsupervised learning)
Learn natural groupings (clusters) of data
Association Rule Discovery
Learn correlations (associations) among the
data instances
Also called market basket analysis
Overview
Background
Intersection of OR and Data Mining
Optimization algorithms used for data mining
Data visualization
Attribute selection
Classification
Unsupervised learning
Data mining used in OR applications
Production scheduling
Optimization methods applied to output of standard data
mining algorithms
Selecting and improving decision trees
Open research areas

Classification
Classification is the most common learning
task in data mining
Many methods have been proposed
Decision trees, neural networks, support
vector machines, Bayesian networks, etc.
The algorithm is trained on part of the data and its accuracy is tested on independent data (or using cross-validation)
Optimization is relevant to many
classification methods

Optimization Formulation
Suppose we have n attributes and each instance has been
labeled as belonging to one of two classes
Represent by two matrices A and B
Need to learn what separates the points in the two sets (if
they can be separated)
In a 1965 Operations Research article, Olvi Mangasarian
studied the case where the two sets can be separated with
a hyperplane:

$$Aw \geq e\gamma + e, \qquad Bw \leq e\gamma - e,$$

with separating hyperplane $\{x : w^{\mathsf T} x = \gamma\}$, where $e$ is a vector of ones

Separating Hyperplane
Closest points in
convex hulls Class A
x2
Class B
d
c

Separating hyperplane

x1
20th European Conference on Operational 22
Research, July 4-7, 2004
Finding the Closest Points
Formulate as a QP:

$$\min_{c,\,d,\,\alpha} \; \frac{1}{2}\lVert c - d \rVert^2$$

$$\text{s.t.} \quad c = \sum_{i:\,\text{Class } A} \alpha_i x_i, \qquad d = \sum_{i:\,\text{Class } B} \alpha_i x_i,$$

$$\sum_{i:\,\text{Class } A} \alpha_i = 1, \qquad \sum_{i:\,\text{Class } B} \alpha_i = 1, \qquad \alpha_i \geq 0$$
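One simple way to approximate this QP is the conditional-gradient (Frank-Wolfe) method, since linear minimization over a simplex just selects one data point per class. A sketch under those assumptions, not a method from the cited literature:

```python
def closest_convex_points(A, B, iters=500):
    """Frank-Wolfe on min 0.5*||c - d||^2 with c in conv(A), d in conv(B).
    Linear minimization over a simplex selects a single vertex (data point)."""
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    # Start from the centroids of the two classes
    c = tuple(sum(p[i] for p in A) / len(A) for i in range(len(A[0])))
    d = tuple(sum(p[i] for p in B) / len(B) for i in range(len(B[0])))
    for k in range(iters):
        g = tuple(x - y for x, y in zip(c, d))   # grad w.r.t. c is (c-d), w.r.t. d is -(c-d)
        s = min(A, key=lambda p: dot(p, g))      # vertex minimizing the linearization
        t = max(B, key=lambda p: dot(p, g))
        gamma = 2.0 / (k + 2)
        c = tuple((1 - gamma) * ci + gamma * si for ci, si in zip(c, s))
        d = tuple((1 - gamma) * di + gamma * ti for di, ti in zip(d, t))
    return c, d

A = [(0, 0), (0, 1), (1, 0), (1, 1)]
B = [(3, 0), (3, 1), (4, 0), (4, 1)]
c, d = closest_convex_points(A, B)
gap = sum((x - y) ** 2 for x, y in zip(c, d)) ** 0.5
assert abs(gap - 2.0) < 0.1   # the two hulls are 2 apart along x1
```

The hyperplane bisecting the segment from c to d then separates the two classes.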

Support Vector Machines

[Figure: the two classes in the (x1, x2) plane with the separating hyperplane; the support vectors of Class A and Class B lie closest to it.]
Limitations
The points (instances) may not be separable by a hyperplane
Add error terms to the objective and minimize those as well
A linear separation is quite limited

[Figure: Class A points surrounded by Class B points in the (x1, x2) plane, a pattern no single hyperplane can separate.]

The solution is to map the data to a higher-dimensional space
Wolfe Dual Problem
First formulate the Wolfe dual:

$$\max_{\alpha} \; \sum_i \alpha_i \;-\; \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j\, \mathbf{x}_i \cdot \mathbf{x}_j$$

$$\text{subject to} \quad 0 \leq \alpha_i \leq C, \qquad \sum_i \alpha_i y_i = 0.$$

Now the data only appears in the dot product in the objective function

Kernel Functions
Use kernel functions to map the data and replace the dot product with

$$K(\mathbf{x}, \mathbf{y}) = \Phi(\mathbf{x}) \cdot \Phi(\mathbf{y}), \qquad \Phi: \mathbb{R}^n \to H$$

For example,

$$K(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y} + 1)^p$$

$$K(\mathbf{x}, \mathbf{y}) = e^{-\lVert \mathbf{x} - \mathbf{y} \rVert^2 / 2\sigma^2}$$

$$K(\mathbf{x}, \mathbf{y}) = \tanh(\kappa\, \mathbf{x} \cdot \mathbf{y} - \delta)$$
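The three example kernels translate directly to code; the parameter names (p, sigma, kappa, delta) follow the formulas above, and the default values are illustrative:

```python
from math import exp, tanh

dot = lambda x, y: sum(a * b for a, b in zip(x, y))

def poly_kernel(x, y, p=2):
    """Polynomial kernel (x . y + 1)^p."""
    return (dot(x, y) + 1) ** p

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-sq / (2 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=1.0, delta=0.0):
    """Sigmoid kernel tanh(kappa * x . y - delta)."""
    return tanh(kappa * dot(x, y) - delta)

assert poly_kernel((1, 0), (1, 0)) == 4      # (1*1 + 0*0 + 1)^2
assert rbf_kernel((1, 2), (1, 2)) == 1.0     # zero distance gives kernel value 1
assert -1.0 < sigmoid_kernel((1, 0), (0, 1)) < 1.0
```

In the Wolfe dual, each dot product x_i . x_j is simply replaced by one of these kernel evaluations.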

Other Classification Work
Extensive publications on SVMs and mathematical programming for classification
Several other approaches also relevant, e.g.
Logical Analysis of Data (LAD) learns logical
expressions to classify the target attribute (series of
papers by Hammer, Boros, et al.)
A related approach is the logic data miner Lsquare (e.g., talk by Felici, Truemper, and Paola last Monday)
Bayesian networks are often used, and finding the
best structure of such networks is a combinatorial
optimization problem
Further discussed in the next talk

Overview
Background
Intersection of OR and Data Mining
Optimization algorithms used for data mining
Data visualization
Attribute selection
Classification
Unsupervised learning
Data mining used in OR applications
Production scheduling
Optimization methods applied to output of standard data
mining algorithms
Selecting and improving decision trees
Open research areas

Data Clustering
Now we do not have labeled data to train
(unsupervised learning)
Want to identify natural clusters or
groupings of data instances
Many possible sets of clusters

What makes a set of clusters good?

Optimization Formulation
Given a set A of m points, find the centers $C_j$ of $k$ clusters that minimize the 1-norm:

$$\min_{C,\,D} \; \sum_{i=1}^{m} \; \min_{j=1,\ldots,k} \; e^{\mathsf T} D_{ij}$$

$$\text{s.t.} \quad -D_{ij} \leq A_i^{\mathsf T} - C_j \leq D_{ij}, \qquad i = 1, \ldots, m; \; j = 1, \ldots, k$$

This formulation is due to Bradley, Mangasarian, and Street (1997)
Much more work is needed in this area
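A common heuristic for 1-norm clustering alternates assignment and coordinate-wise medians (the 1-norm analogue of k-means). A sketch with made-up data and starting centers, not the algorithm of the cited paper:

```python
from statistics import median

def k_medians(points, centers, iters=20):
    """Alternating heuristic for 1-norm clustering: assign each point to
    its nearest center in the 1-norm, then move each center to the
    coordinate-wise median of its assigned points."""
    l1 = lambda p, c: sum(abs(a - b) for a, b in zip(p, c))
    dim = len(points[0])
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: l1(p, centers[j]))
            clusters[nearest].append(p)
        centers = [tuple(median(p[i] for p in cl) for i in range(dim)) if cl else c
                   for cl, c in zip(clusters, centers)]
    return centers

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = k_medians(points, centers=[(0, 0), (10, 10)])
assert set(centers) == {(0, 0), (10, 10)}
```

The coordinate-wise median is exactly the minimizer of the 1-norm distances in the formulation above, which is why it replaces the mean used in k-means.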

Association Rule Discovery
Find strong associations among instances (e.g.,
high support and confidence)
Originally used in market basket analysis, e.g.,
what products are candidates for cross-sell, up-
sell, etc.
Define an item as an attribute-value pair
Algorithm approach (Agrawal et al., 1992, Apriori
and related methods):
Generate frequent item sets with high support
Generate rules from these sets with high confidence
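The two-phase idea can be illustrated by brute force on a toy basket; Apriori's candidate pruning is omitted here, and the transactions and thresholds are made up:

```python
from itertools import combinations

transactions = [
    {"bread", "milk"}, {"bread", "butter"},
    {"bread", "milk", "butter"}, {"milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in the set."""
    return sum(itemset <= t for t in transactions) / len(transactions)

items = set().union(*transactions)

# Phase 1: frequent itemsets (support >= 50%), by brute-force enumeration
frequent = [frozenset(s) for r in (1, 2, 3)
            for s in combinations(sorted(items), r) if support(set(s)) >= 0.5]
assert frozenset({"bread", "milk"}) in frequent

# Phase 2: rule confidence = support(X union Y) / support(X)
confidence = support({"bread", "milk"}) / support({"bread"})
assert abs(confidence - 2 / 3) < 1e-9   # bread => milk holds in 2 of the 3 bread baskets
```

Apriori avoids the full enumeration by only extending itemsets whose subsets are already known to be frequent.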

Objectives for Association Rules
Want high support and high confidence
Maximizing support would lead to discovering only a few trivial rules (those that occur very frequently)
Maximizing confidence leads to obvious rules (those that
are 100% accurate)
Support and confidence are usually treated as
constraints (user specified minimum)
Still need measures for good rules (i.e., rules that
add insights and are hence interesting)
Significant opportunities for optimizing the rules
that are obtained (not much work, yet)

Overview
Background
Intersection of OR and Data Mining
Optimization algorithms used for data mining
Data visualization
Attribute selection
Classification
Unsupervised learning
Data mining used in OR applications
Production scheduling
Optimization methods applied to output of standard data
mining algorithms
Selecting and improving decision trees
Open research areas

Data Mining for OR Applications
Data mining can be used to complement
traditional OR methods in many areas
Example application areas:
E-commerce
Supply chain management (e.g., to enable
customer-value management in the chain)
Production scheduling

Data Mining for Scheduling
Production scheduling is often ad-hoc in practice
Experience and intuition of human schedulers

Li and Olafsson (2004) propose a method to learn


directly from production data
Benefits
Make scheduling practices explicit
Incorporate in automatic scheduling system
Insights into operations
Improve schedules

Background
Scheduling task
Given a finite set of jobs, sequence the jobs in
order of priority
Many simple dispatching rules available
Machine learning in scheduling
Considerable work over two decades
Expert systems
Inductive learning
Select dispatching rules from simulated data
Has not been applied directly to scheduling data
(which would be data mining)
Simple Example: Dispatching List
Job ID   Release Time   Start Time   Processing Time   Completion Time
J5       0              0            17                17
J1       10             17           15                32
J3       18             32           20                52
J4       0              52           7                 59
J2       30             59           5                 64

How were these five jobs scheduled?
Longest processing time first (LPT)
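The table can be reproduced by list scheduling with LPT over the currently released jobs. A sketch using the job data from the table:

```python
def lpt_schedule(jobs):
    """Single-machine list scheduling: at each decision point, dispatch the
    released job with the longest processing time (LPT)."""
    remaining, t, schedule = dict(jobs), 0, []
    while remaining:
        released = [j for j, (r, p) in remaining.items() if r <= t]
        if not released:                     # idle until the next release
            t = min(r for r, p in remaining.values())
            continue
        job = max(released, key=lambda j: remaining[j][1])
        r, p = remaining.pop(job)
        schedule.append((job, t, t + p))     # (job, start, completion)
        t += p
    return schedule

jobs = {"J1": (10, 15), "J2": (30, 5), "J3": (18, 20),
        "J4": (0, 7), "J5": (0, 17)}         # (release time, processing time)
schedule = lpt_schedule(jobs)
assert [j for j, s, c in schedule] == ["J5", "J1", "J3", "J4", "J2"]
assert schedule[-1] == ("J2", 59, 64)
```

The simulated start and completion times match the table, confirming that LPT explains the observed dispatching.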
Data Mining Formulation
Determine the target concept
Dispatching rules amount to a pair-wise comparison between jobs
Learning task: Given two jobs, which job
should be dispatched first?

Data preparation
Construct a flat file
Each line (instance/data object) is an example
of the target concept
Prepared Data File
Job1   ProcessingTime1   Release1   Job2   ProcessingTime2   Release2   Job1ScheduledFirst
J1     15                10         J2     5                 30         Yes
J1     15                10         J3     20                18         Yes
J1     15                10         J4     7                 0          Yes
J1     15                10         J5     17                0          No
J2     5                 30         J1     15                10         No
J2     5                 30         J3     20                18         No
J2     5                 30         J4     7                 0          No
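Given the dispatch sequence, the flat file is generated mechanically, one instance per ordered pair of jobs. A sketch using the same example jobs (the function name is illustrative):

```python
def pairwise_instances(jobs, dispatch_order):
    """Turn a dispatch sequence into pairwise training examples: one
    instance per ordered job pair, labeled by which job was scheduled first."""
    rank = {j: i for i, j in enumerate(dispatch_order)}
    rows = []
    for j1 in dispatch_order:
        for j2 in dispatch_order:
            if j1 != j2:
                r1, p1 = jobs[j1]
                r2, p2 = jobs[j2]
                label = "Yes" if rank[j1] < rank[j2] else "No"
                rows.append((j1, p1, r1, j2, p2, r2, label))
    return rows

jobs = {"J1": (10, 15), "J2": (30, 5), "J3": (18, 20),
        "J4": (0, 7), "J5": (0, 17)}         # (release time, processing time)
rows = pairwise_instances(jobs, ["J5", "J1", "J3", "J4", "J2"])
assert ("J1", 15, 10, "J2", 5, 30, "Yes") in rows   # J1 dispatched before J2
assert ("J1", 15, 10, "J5", 17, 0, "No") in rows    # J5 went first
assert len(rows) == 5 * 4                            # all ordered pairs
```

Each row matches a line of the prepared data file above; a classifier trained on such rows learns the pairwise dispatching concept.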

Input Engineering
Attribute creation (i.e., composite attributes) and attribute selection are an important part of data mining
Add attributes:
ProcessingTimeDifference
ReleaseDifference
Job1Longer
Job1ReleasedFirst
Select the best subset of attributes
Apply the C4.5 decision tree algorithm

Decision Tree

Job 1 Longer?
  Yes: Job 1 Released First?
    Yes: Yes (LPT for released jobs)
    No:  Processing Time Difference <= 5 → No
         Processing Time Difference > 5 → Yes
         (do not wait for Job 1 if it is not much longer than Job 2;
          wait for Job 1 to be released if it is much longer than Job 2)
  No: Job 1 Released First?
    Yes: Processing Time Difference <= -8 → No
         Processing Time Difference > -8 → Yes
    No:  No
Structural Knowledge
The dispatching rule is LPT
Mine data that use this rule and the processing
time and release time data
The induced model takes into account:
Possible range of processing times
Largest delay caused by a not-yet-released job
New structural patterns, not explicitly known by the dispatcher, were discovered
Next step is to improve schedules
Instance selection: learn from best practices
Optimize the decision tree

Overview
Background
Intersection of OR and Data Mining
Optimization algorithms used for data mining
Data visualization
Attribute selection
Classification
Unsupervised learning
Data mining used in OR applications
Production scheduling
Optimization methods applied to output of standard data
mining algorithms
Selecting and improving decision trees
Open research areas

Optimizing Decision Trees
Decision tree induction is often unstable
Genetic algorithms have been used to
select the best tree from a set of trees
Kennedy et al. (1997) encode decision trees
and define crossover and mutation operators
The accuracy of the tree is the fitness function
A series of papers by Fu, Golden, et al. (2003;
2004a; 2004b) builds further on this approach
Other optimization methods could also
apply and other outputs can be optimized
Overview
Background
Intersection of OR and Data Mining
Optimization algorithms used for data mining
Data visualization
Attribute selection
Classification
Unsupervised learning
Data mining used in OR applications
Production scheduling
Optimization methods applied to output of standard data
mining algorithms
Selecting and improving decision trees
Open research areas

Conclusions
Although data-mining-related optimization work dates back to the 1960s, most problems are still open or need more research
Need to be aware of the key concerns of data
mining: extracting meaningful, previously
unknown patterns or knowledge from large
databases
Algorithms should handle massive data sets, that is, be scalable with respect to both time and memory use
Results often focus on simple-to-interpret, meaningful patterns that provide structural insights
"Previously unknown" means few modeling assumptions that restrict what can be discovered

Open Problems
Many data mining problems can be formulated as
optimization problems
Seen numerous examples, e.g., classification and
attribute selection (most work for these problems)
Many areas have not been addressed or need more work
(in particular, clustering and association rule mining)
Optimizing model outputs is very promising
Use of data mining in OR applications has so far been investigated very little
Supply chain management
Logistics and transportation
Planning and scheduling
Questions?
For more information after today:
Email me at olafsson@iastate.edu
Visit my homepage at http://www.public.iastate.edu/~olafsson
Consult Dilbert

Select References
The following surveys on optimization and data mining are available:
1. Padmanabhan, B. and A. Tuzhilin (2003). On the Use of Optimization for Data Mining: Theoretical Interactions and
eCRM Opportunities, Management Science 49: 1327-1343.
2. Bradley, P.S., U.M. Fayyad, and O.L. Mangasarian (1999). Mathematical Programming for Data Mining: Formulations and Challenges, INFORMS Journal on Computing 11: 217-238.

Work mentioned in presentation:


3. Abbiw-Jackson, B. Golden, S. Raghavan, and E. Wasil (2004). A Divide-and-Conquer Local Search Heuristic for Data
Visualization, Working Paper, University of Maryland.
4. Boros, E., P.L. Hammer, T. Ibaraki, and A. Kogan (1997). Logical Analysis of Numerical Data, Mathematical Programming 79: 163-190.
5. Bradley, P.S., O.L. Mangasarian, and W.N. Street (1997). Clustering via Concave Minimization, in M.C. Mozer, M.I.
Jordan, T. Petsche (eds.) Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA.
6. Bradley, P.S., O.L. Mangasarian, and W.N. Street (1998). Feature Selection via Mathematical Programming, INFORMS Journal on Computing 10: 209-217.
7. Fu, Z., B. Golden, S. Lele, S. Raghavan, and E. Wasil (2003). A Genetic Algorithm-Based Approach for Building Accurate Decision Trees, INFORMS Journal on Computing 15: 3-22.
8. Kennedy, H., C. Chinniah, P. Bradbeer, and L. Morss (1997). The Construction and Evaluation of Decision Trees: A
Comparison of Evolutionary and Concept Learning Methods, in D. Corne and J.L. Shapiro (eds.) Evolutionary
Computing, Lecture Notes in Computer Science, Springer-Verlag, 147-161.
9. Li, X. and S. Olafsson (2004). Discovering Dispatching Rules using Data Mining, Journal of Scheduling, to appear.
10. Mangasarian, O.L. (1965). Linear and Nonlinear Separation of Patterns by Linear Programming, Operations
Research 13: 455-461.
11. Olafsson, S. and J. Yang (2004). Intelligent Partitioning for Feature Selection, INFORMS Journal on Computing, to
appear.

