
Operations Research

&
Data Mining

Siggi Olafsson
Associate Professor
Department of Industrial Engineering
Iowa State University

20th European Conference on Operational Research


Rhodes, Greece, July 4-7, 2004
Purpose of Talk
Give a definition and an overview of data mining as it relates to operations research
Present some examples to give a flavor for the type of work that is possible
My views on the future of OR and data mining
Aim for the talk to be accessible without prior knowledge of data mining

20th European Conference on Operational Research, July 4-7, 2004
Overview
Background
Intersection of OR and Data Mining
Optimization algorithms used for data mining
Data visualization
Attribute selection
Classification
Unsupervised learning
Data mining used in OR applications
Production scheduling
Optimization methods applied to output of standard data
mining algorithms
Selecting and improving decision trees
Open research areas

Background
Rapidly growing interest in data mining among
operations research academics and practitioners
Evidenced, for example, by the increased data mining presence in professional organizations
New INFORMS Section on Data Mining
Large number of data mining sessions at INFORMS and
IIE research conferences
Special issues in Computers & Operations Research, IIE
Transactions, Discrete Applied Mathematics, etc.
Numerous presentations/sessions at this conference

What is Data Mining?

What is Data Mining, Really?
Extracting meaningful, previously unknown
patterns or knowledge from large databases

The knowledge discovery process:

Define Objective → Prepare Data → Mine Knowledge → Interpret Results

Define Objective: business/scientific objective; data mining objective
Prepare Data: data cleaning; data selection; attribute selection; visualization
Mine Knowledge: classification; association rule discovery; clustering
Interpret Results: predictive models; structural insights

Interdisciplinary Field
Data mining lies at the intersection of statistics, machine learning, databases, and optimization.

Input Engineering
Preparing the data may take as much as 70% of
the entire effort
Numerous steps, including
Combining data sources
Transforming attributes
Data cleaning
Data selection
Attribute selection
Data visualization
Many of these steps have connections with operations research, and optimization in particular

Overview
Background
Intersection of OR and Data Mining
Optimization algorithms used for data mining
Data visualization
Attribute selection
Classification
Unsupervised learning
Data mining used in OR applications
Production scheduling
Optimization methods applied to output of standard data
mining algorithms
Selecting and improving decision trees
Open research areas

Data Visualization
Visualizing the data is important in any
data mining project
Generally difficult because the data is typically high-dimensional, i.e., hundreds or thousands of attributes (variables)
How can we best visualize such data in 2
or 3 dimensions?
Traditional techniques include
multidimensional scaling, which uses
nonlinear optimization

Optimization Formulation
Recent combinatorial optimization formulation by Abbiw-Jackson, Golden, Raghavan, and Wasil (2004)
Map a set M of m points from R^r to R^q, q = 2, 3
Approximate the q-dimensional space by a lattice N

$$\min \sum_{i \in M} \sum_{j \in M} \sum_{k \in N} \sum_{l \in N} F\big(d_{\mathrm{original}}(i,j),\, d_{\mathrm{new}}(k,l)\big)\, x_{ik} x_{jl}$$

$$\text{s.t.} \quad \sum_{k \in N} x_{ik} = 1, \;\; i \in M; \qquad x_{ik} \in \{0,1\}$$

where
$d_{\mathrm{original}}(i,j)$ = distance measure in $R^r$
$d_{\mathrm{new}}(k,l)$ = distance measure in $R^q$
$F$ = function such as least squares, Sammon map, etc.
Solution Methods
Quadratic Assignment Problem (QAP)
Not possible to solve exactly for large scale problems
Local search procedure proposed

Key to the formulation is the selection of the objective function, e.g., the Sammon map:

$$\min \; \frac{1}{\displaystyle\sum_{i \in M} \sum_{\substack{j \in M \\ j \neq i}} d_{\mathrm{original}}(i,j)} \; \sum_{i \in M} \sum_{\substack{j \in M \\ j \neq i}} \sum_{k \in N} \sum_{l \in N} \frac{\big(d_{\mathrm{original}}(i,j) - d_{\mathrm{new}}(k,l)\big)^2}{d_{\mathrm{original}}(i,j)}\, x_{ik} x_{jl}$$
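The Sammon stress can be computed directly for any candidate mapping of points to low-dimensional positions. A minimal sketch in Python; the function name and sample points are illustrative, not from the cited paper:

```python
from itertools import combinations
from math import dist

def sammon_stress(points, positions):
    """Sammon stress between original points (in R^r) and their assigned
    low-dimensional positions (e.g., lattice cells in R^2).
    Lower is better; 0 means all pairwise distances are preserved."""
    pairs = list(combinations(range(len(points)), 2))
    scale = sum(dist(points[i], points[j]) for i, j in pairs)
    total = 0.0
    for i, j in pairs:
        d_orig = dist(points[i], points[j])
        d_new = dist(positions[i], positions[j])
        total += (d_orig - d_new) ** 2 / d_orig
    return total / scale

# Three points in R^3 mapped to the plane
points = [(0, 0, 0), (3, 0, 0), (0, 4, 0)]
perfect = [(0, 0), (3, 0), (0, 4)]   # preserves all pairwise distances
assert sammon_stress(points, perfect) == 0.0
assert sammon_stress(points, [(0, 0), (1, 0), (0, 1)]) > 0.0
```

A local search over lattice assignments would evaluate this stress for each candidate move.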

Overview
Background
Intersection of OR and Data Mining
Optimization algorithms used for data mining
Data visualization
Attribute selection
Classification
Unsupervised learning
Data mining used in OR applications
Production scheduling
Optimization methods applied to output of standard data
mining algorithms
Selecting and improving decision trees
Open research areas

Attribute Selection
Usually large number of attributes
Some attributes are redundant or
irrelevant and should be removed
Benefits:
Faster subsequent induction
Simpler models (important in data mining)
Better (predictive) performance of models
Discover which attributes are important
(descriptive or structural knowledge)

Optimization Formulation
Define the decision variable

$$x_j = \begin{cases} 1, & \text{if attribute } j \text{ is selected,} \\ 0, & \text{otherwise.} \end{cases}$$

Combinatorial optimization problem:

$$\max_{x} \; f(x) = f(x_1, x_2, \ldots, x_n) \qquad \text{s.t.} \;\; x_j \in \{0,1\} \;\; \forall j$$

Number of solutions is $2^n - 1$
How should the objective function be defined?
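The combinatorial nature of the problem is easy to see in code. Below, a brute-force sketch that enumerates all 2^n − 1 nonempty subsets; the relevance scores and the objective are made up, standing in for a real wrapper measure such as cross-validated accuracy:

```python
from itertools import combinations

def best_subset(attributes, objective):
    """Enumerate all 2^n - 1 nonempty attribute subsets and return the
    one maximizing the (user-supplied) objective function."""
    best, best_val = None, float("-inf")
    for r in range(1, len(attributes) + 1):
        for subset in combinations(attributes, r):
            val = objective(subset)
            if val > best_val:
                best, best_val = subset, val
    return best, best_val

# Hypothetical objective: rewards two informative attributes and
# penalizes subset size (a stand-in for model complexity)
relevance = {"a1": 0.9, "a2": 0.7, "a3": 0.1, "a4": 0.0}
obj = lambda s: sum(relevance[a] for a in s) - 0.2 * len(s)
subset, value = best_subset(list(relevance), obj)
assert set(subset) == {"a1", "a2"}
```

Exhaustive enumeration is only feasible for small n, which is exactly why the metaheuristics on the next slide are needed.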
Solution Methods
Non-linear objective function
(Defining a good objective is a major issue)
Mathematical programming approach (Bradley,
Mangasarian and Street, 1998)
Metaheuristics have been applied extensively
Genetic algorithms, simulated annealing
Nested partitions method (Olafsson and Yang, 2004)
Intelligent partitioning: take advantage of what is known in
data mining about evaluating attributes
Random instance sampling: in each step the algorithm
uses a sample of instances, which improves scalability

Learning from Data
Each data point (instance) represents an
example from which we can learn
The instances are either
Labeled (supervised learning)
One attribute is of special interest (called the class or
target) and each instance is labeled by its class value
Unlabeled (unsupervised learning)
Instances are assumed to be independent
(However, spatial and temporal data mining
are active areas of research)
Learning Tasks in Data Mining
Classification (supervised learning)
Learn how to classify data in one of a given
number of categories or classes
Clustering (unsupervised learning)
Learn natural groupings (clusters) of data
Association Rule Discovery
Learn correlations (associations) among the
data instances
Also called market basket analysis
Overview
Background
Intersection of OR and Data Mining
Optimization algorithms used for data mining
Data visualization
Attribute selection
Classification
Unsupervised learning
Data mining used in OR applications
Production scheduling
Optimization methods applied to output of standard data
mining algorithms
Selecting and improving decision trees
Open research areas

Classification
Classification is the most common learning
task in data mining
Many methods have been proposed
Decision trees, neural networks, support
vector machines, Bayesian networks, etc.
The algorithm is trained on part of the data and its accuracy is tested on independent data (or using cross-validation)
Optimization is relevant to many
classification methods

Optimization Formulation
Suppose we have n attributes and each instance has been
labeled as belonging to one of two classes
Represent by two matrices A and B
Need to learn what separates the points in the two sets (if
they can be separated)
In a 1965 Operations Research article, Olvi Mangasarian
studied the case where the two sets can be separated with
a hyperplane:

$$Aw \geq e\gamma + e, \qquad Bw \leq e\gamma - e,$$

with separating hyperplane $\{x : w^{\mathsf T} x = \gamma\}$, where $e$ is a vector of ones

Separating Hyperplane
Closest points in
convex hulls Class A
x2
Class B
d
c

Separating hyperplane

x1
20th European Conference on Operational 22
Research, July 4-7, 2004
Finding the Closest Points
Formulate as a QP:

$$\min_{c,\,d,\,\alpha} \; \frac{1}{2}\lVert c - d \rVert^2$$

$$\text{s.t.} \quad c = \sum_{i:\,\text{Class } A} \alpha_i x_i, \qquad d = \sum_{i:\,\text{Class } B} \alpha_i x_i,$$

$$\sum_{i:\,\text{Class } A} \alpha_i = 1, \qquad \sum_{i:\,\text{Class } B} \alpha_i = 1, \qquad \alpha_i \geq 0$$
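One simple way to approximate this QP is the conditional-gradient (Frank-Wolfe) method, since linear minimization over a simplex just selects one data point per class. A sketch under those assumptions, not a method from the cited literature:

```python
def closest_convex_points(A, B, iters=500):
    """Frank-Wolfe on min 0.5*||c - d||^2 with c in conv(A), d in conv(B).
    Linear minimization over a simplex selects a single vertex (data point)."""
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    # Start from the centroids of the two classes
    c = tuple(sum(p[i] for p in A) / len(A) for i in range(len(A[0])))
    d = tuple(sum(p[i] for p in B) / len(B) for i in range(len(B[0])))
    for k in range(iters):
        g = tuple(x - y for x, y in zip(c, d))   # grad w.r.t. c is (c-d), w.r.t. d is -(c-d)
        s = min(A, key=lambda p: dot(p, g))      # vertex minimizing the linearization
        t = max(B, key=lambda p: dot(p, g))
        gamma = 2.0 / (k + 2)
        c = tuple((1 - gamma) * ci + gamma * si for ci, si in zip(c, s))
        d = tuple((1 - gamma) * di + gamma * ti for di, ti in zip(d, t))
    return c, d

A = [(0, 0), (0, 1), (1, 0), (1, 1)]
B = [(3, 0), (3, 1), (4, 0), (4, 1)]
c, d = closest_convex_points(A, B)
gap = sum((x - y) ** 2 for x, y in zip(c, d)) ** 0.5
assert abs(gap - 2.0) < 0.1   # the two hulls are 2 apart along x1
```

The hyperplane bisecting the segment from c to d then separates the two classes.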

Support Vector Machines

[Figure: the two classes in the (x1, x2) plane with the separating hyperplane; the support vectors of Class A and Class B lie closest to it.]
Limitations
The points (instances) may not be separable by a hyperplane
Add error terms to the objective and minimize those as well
A linear separation is quite limited

[Figure: Class A points surrounded by Class B points in the (x1, x2) plane, a pattern no single hyperplane can separate.]

The solution is to map the data to a higher-dimensional space
Wolfe Dual Problem
First formulate the Wolfe dual:

$$\max_{\alpha} \; \sum_i \alpha_i \;-\; \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j\, \mathbf{x}_i \cdot \mathbf{x}_j$$

$$\text{subject to} \quad 0 \leq \alpha_i \leq C, \qquad \sum_i \alpha_i y_i = 0.$$

Now the data only appears in the dot product in the objective function

Kernel Functions
Use kernel functions to map the data and replace the dot product with

$$K(\mathbf{x}, \mathbf{y}) = \Phi(\mathbf{x}) \cdot \Phi(\mathbf{y}), \qquad \Phi: \mathbb{R}^n \to H$$

For example,

$$K(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y} + 1)^p$$

$$K(\mathbf{x}, \mathbf{y}) = e^{-\lVert \mathbf{x} - \mathbf{y} \rVert^2 / 2\sigma^2}$$

$$K(\mathbf{x}, \mathbf{y}) = \tanh(\kappa\, \mathbf{x} \cdot \mathbf{y} - \delta)$$
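The three example kernels translate directly to code; the parameter names (p, sigma, kappa, delta) follow the formulas above, and the default values are illustrative:

```python
from math import exp, tanh

dot = lambda x, y: sum(a * b for a, b in zip(x, y))

def poly_kernel(x, y, p=2):
    """Polynomial kernel (x . y + 1)^p."""
    return (dot(x, y) + 1) ** p

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-sq / (2 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=1.0, delta=0.0):
    """Sigmoid kernel tanh(kappa * x . y - delta)."""
    return tanh(kappa * dot(x, y) - delta)

assert poly_kernel((1, 0), (1, 0)) == 4      # (1*1 + 0*0 + 1)^2
assert rbf_kernel((1, 2), (1, 2)) == 1.0     # zero distance gives kernel value 1
assert -1.0 < sigmoid_kernel((1, 0), (0, 1)) < 1.0
```

In the Wolfe dual, each dot product x_i . x_j is simply replaced by one of these kernel evaluations.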

Other Classification Work
Extensive publications on SVMs and mathematical programming for classification
Several other approaches also relevant, e.g.
Logical Analysis of Data (LAD) learns logical
expressions to classify the target attribute (series of
papers by Hammer, Boros, et al.)
A related approach is the logic data miner Lsquare (e.g., talk by Felici, Truemper, and Paola last Monday)
Bayesian networks are often used, and finding the
best structure of such networks is a combinatorial
optimization problem
Further discussed in the next talk

Overview
Background
Intersection of OR and Data Mining
Optimization algorithms used for data mining
Data visualization
Attribute selection
Classification
Unsupervised learning
Data mining used in OR applications
Production scheduling
Optimization methods applied to output of standard data
mining algorithms
Selecting and improving decision trees
Open research areas

Data Clustering
Now we do not have labeled data to train
(unsupervised learning)
Want to identify natural clusters or
groupings of data instances
Many possible sets of clusters

What makes a set of clusters good?

Optimization Formulation
Given a set A of m points, find the centers $C_j$ of $k$ clusters that minimize the 1-norm:

$$\min_{C,\,D} \; \sum_{i=1}^{m} \; \min_{j=1,\ldots,k} \; e^{\mathsf T} D_{ij}$$

$$\text{s.t.} \quad -D_{ij} \leq A_i^{\mathsf T} - C_j \leq D_{ij}, \qquad i = 1, \ldots, m; \; j = 1, \ldots, k$$

This formulation is due to Bradley, Mangasarian, and Street (1997)
Much more work is needed in this area
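A common heuristic for 1-norm clustering alternates assignment and coordinate-wise medians (the 1-norm analogue of k-means). A sketch with made-up data and starting centers, not the algorithm of the cited paper:

```python
from statistics import median

def k_medians(points, centers, iters=20):
    """Alternating heuristic for 1-norm clustering: assign each point to
    its nearest center in the 1-norm, then move each center to the
    coordinate-wise median of its assigned points."""
    l1 = lambda p, c: sum(abs(a - b) for a, b in zip(p, c))
    dim = len(points[0])
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: l1(p, centers[j]))
            clusters[nearest].append(p)
        centers = [tuple(median(p[i] for p in cl) for i in range(dim)) if cl else c
                   for cl, c in zip(clusters, centers)]
    return centers

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = k_medians(points, centers=[(0, 0), (10, 10)])
assert set(centers) == {(0, 0), (10, 10)}
```

The coordinate-wise median is exactly the minimizer of the 1-norm distances in the formulation above, which is why it replaces the mean used in k-means.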

Association Rule Discovery
Find strong associations among instances (e.g.,
high support and confidence)
Originally used in market basket analysis, e.g.,
what products are candidates for cross-sell, up-
sell, etc.
Define an item as an attribute-value pair
Algorithm approach (Agrawal et al., 1992, Apriori
and related methods):
Generate frequent item sets with high support
Generate rules from these sets with high confidence
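The two-phase idea can be illustrated by brute force on a toy basket; Apriori's candidate pruning is omitted here, and the transactions and thresholds are made up:

```python
from itertools import combinations

transactions = [
    {"bread", "milk"}, {"bread", "butter"},
    {"bread", "milk", "butter"}, {"milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in the set."""
    return sum(itemset <= t for t in transactions) / len(transactions)

items = set().union(*transactions)

# Phase 1: frequent itemsets (support >= 50%), by brute-force enumeration
frequent = [frozenset(s) for r in (1, 2, 3)
            for s in combinations(sorted(items), r) if support(set(s)) >= 0.5]
assert frozenset({"bread", "milk"}) in frequent

# Phase 2: rule confidence = support(X union Y) / support(X)
confidence = support({"bread", "milk"}) / support({"bread"})
assert abs(confidence - 2 / 3) < 1e-9   # bread => milk holds in 2 of the 3 bread baskets
```

Apriori avoids the full enumeration by only extending itemsets whose subsets are already known to be frequent.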

Objectives for Association Rules
Want high support and high confidence
Maximizing support would lead to discovering only a few trivial rules (those that occur very frequently)
Maximizing confidence leads to obvious rules (those that
are 100% accurate)
Support and confidence are usually treated as
constraints (user specified minimum)
Still need measures for good rules (i.e., rules that
add insights and are hence interesting)
Significant opportunities for optimizing the rules
that are obtained (not much work, yet)

Overview
Background
Intersection of OR and Data Mining
Optimization algorithms used for data mining
Data visualization
Attribute selection
Classification
Unsupervised learning
Data mining used in OR applications
Production scheduling
Optimization methods applied to output of standard data
mining algorithms
Selecting and improving decision trees
Open research areas

Data Mining for OR Applications
Data mining can be used to complement
traditional OR methods in many areas
Example application areas:
E-commerce
Supply chain management (e.g., to enable
customer-value management in the chain)
Production scheduling

Data Mining for Scheduling
Production scheduling is often ad-hoc in practice
Experience and intuition of human schedulers

Li and Olafsson (2004) propose a method to learn


directly from production data
Benefits
Make scheduling practices explicit
Incorporate in automatic scheduling system
Insights into operations
Improve schedules

Background
Scheduling task
Given a finite set of jobs, sequence the jobs in
order of priority
Many simple dispatching rules available
Machine learning in scheduling
Considerable work over two decades
Expert systems
Inductive learning
Select dispatching rules from simulated data
Has not been applied directly to scheduling data
(which would be data mining)
Simple Example: Dispatching List
Job ID   Release Time   Start Time   Processing Time   Completion Time
J5       0              0            17                17
J1       10             17           15                32
J3       18             32           20                52
J4       0              52           7                 59
J2       30             59           5                 64

How were these five jobs scheduled?
Longest processing time first (LPT)
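The table can be reproduced by list scheduling with LPT over the currently released jobs. A sketch using the job data from the table:

```python
def lpt_schedule(jobs):
    """Single-machine list scheduling: at each decision point, dispatch the
    released job with the longest processing time (LPT)."""
    remaining, t, schedule = dict(jobs), 0, []
    while remaining:
        released = [j for j, (r, p) in remaining.items() if r <= t]
        if not released:                     # idle until the next release
            t = min(r for r, p in remaining.values())
            continue
        job = max(released, key=lambda j: remaining[j][1])
        r, p = remaining.pop(job)
        schedule.append((job, t, t + p))     # (job, start, completion)
        t += p
    return schedule

jobs = {"J1": (10, 15), "J2": (30, 5), "J3": (18, 20),
        "J4": (0, 7), "J5": (0, 17)}         # (release time, processing time)
schedule = lpt_schedule(jobs)
assert [j for j, s, c in schedule] == ["J5", "J1", "J3", "J4", "J2"]
assert schedule[-1] == ("J2", 59, 64)
```

The simulated start and completion times match the table, confirming that LPT explains the observed dispatching.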
Data Mining Formulation
Determine the target concept
Dispatching rules amount to a pair-wise comparison between jobs
Learning task: Given two jobs, which job
should be dispatched first?

Data preparation
Construct a flat file
Each line (instance/data object) is an example
of the target concept
Prepared Data File
Job1   ProcessingTime1   Release1   Job2   ProcessingTime2   Release2   Job1ScheduledFirst
J1     15                10         J2     5                 30         Yes
J1     15                10         J3     20                18         Yes
J1     15                10         J4     7                 0          Yes
J1     15                10         J5     17                0          No
J2     5                 30         J1     15                10         No
J2     5                 30         J3     20                18         No
J2     5                 30         J4     7                 0          No
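Given the dispatch sequence, the flat file is generated mechanically, one instance per ordered pair of jobs. A sketch using the same example jobs (the function name is illustrative):

```python
def pairwise_instances(jobs, dispatch_order):
    """Turn a dispatch sequence into pairwise training examples: one
    instance per ordered job pair, labeled by which job was scheduled first."""
    rank = {j: i for i, j in enumerate(dispatch_order)}
    rows = []
    for j1 in dispatch_order:
        for j2 in dispatch_order:
            if j1 != j2:
                r1, p1 = jobs[j1]
                r2, p2 = jobs[j2]
                label = "Yes" if rank[j1] < rank[j2] else "No"
                rows.append((j1, p1, r1, j2, p2, r2, label))
    return rows

jobs = {"J1": (10, 15), "J2": (30, 5), "J3": (18, 20),
        "J4": (0, 7), "J5": (0, 17)}         # (release time, processing time)
rows = pairwise_instances(jobs, ["J5", "J1", "J3", "J4", "J2"])
assert ("J1", 15, 10, "J2", 5, 30, "Yes") in rows   # J1 dispatched before J2
assert ("J1", 15, 10, "J5", 17, 0, "No") in rows    # J5 went first
assert len(rows) == 5 * 4                            # all ordered pairs
```

Each row matches a line of the prepared data file above; a classifier trained on such rows learns the pairwise dispatching concept.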

Input Engineering
Attribute creation (i.e., composite attributes) and attribute selection are an important part of data mining
Add attributes:
ProcessingTimeDifference
ReleaseDifference
Job1Longer
Job1ReleasedFirst
Select the best subset of attributes
Apply the C4.5 decision tree algorithm

Decision Tree

Job 1 Longer?
  Yes: Job 1 Released First?
    Yes: Yes (LPT for released jobs)
    No:  Processing Time Difference <= 5 → No
         Processing Time Difference > 5 → Yes
         (do not wait for Job 1 if it is not much longer than Job 2;
          wait for Job 1 to be released if it is much longer than Job 2)
  No: Job 1 Released First?
    Yes: Processing Time Difference <= -8 → No
         Processing Time Difference > -8 → Yes
    No:  No
Structural Knowledge
The dispatching rule is LPT
Mine data that use this rule and the processing
time and release time data
The induced model takes into account:
Possible range of processing times
Largest delay caused by a not-yet-released job
New structural patterns, not explicitly known by the dispatcher, were discovered
Next step is to improve schedules
Instance selection: learn from best practices
Optimize the decision tree

Overview
Background
Intersection of OR and Data Mining
Optimization algorithms used for data mining
Data visualization
Attribute selection
Classification
Unsupervised learning
Data mining used in OR applications
Production scheduling
Optimization methods applied to output of standard data
mining algorithms
Selecting and improving decision trees
Open research areas

Optimizing Decision Trees
Decision tree induction is often unstable
Genetic algorithms have been used to
select the best tree from a set of trees
Kennedy et al. (1997) encode decision trees
and define crossover and mutation operators
The accuracy of the tree is the fitness function
A series of papers by Fu, Golden, et al. (2003;
2004a; 2004b) builds further on this approach
Other optimization methods could also
apply and other outputs can be optimized
Overview
Background
Intersection of OR and Data Mining
Optimization algorithms used for data mining
Data visualization
Attribute selection
Classification
Unsupervised learning
Data mining used in OR applications
Production scheduling
Optimization methods applied to output of standard data
mining algorithms
Selecting and improving decision trees
Open research areas

Conclusions
Although data-mining-related optimization work dates back to the 1960s, most problems are still open or need more research
Need to be aware of the key concerns of data
mining: extracting meaningful, previously
unknown patterns or knowledge from large
databases
Algorithms should handle massive data sets, that is, be scalable with respect to both time and memory use
Results often focus on simple-to-interpret, meaningful patterns that provide structural insights
"Previously unknown" means few modeling assumptions that restrict what can be discovered

Open Problems
Many data mining problems can be formulated as
optimization problems
Seen numerous examples, e.g., classification and
attribute selection (most work for these problems)
Many areas have not been addressed or need more work
(in particular, clustering and association rule mining)
Optimizing model outputs is very promising
Use of data mining in OR applications has so far been investigated very little
Supply chain management
Logistics and transportation
Planning and scheduling
Questions?
For more information after today:
Email me at olafsson@iastate.edu
Visit my homepage at http://www.public.iastate.edu/~olafsson
Consult Dilbert

Select References
The following surveys on optimization and data mining are available:
1. Padmanabhan, B. and A. Tuzhilin (2003). On the Use of Optimization for Data Mining: Theoretical Interactions and
eCRM Opportunities, Management Science 49: 1327-1343.
2. Bradley, P.S., U.M. Fayyad, and O.L. Mangasarian (1999). Mathematical Programming for Data Mining: Formulations and Challenges, INFORMS Journal on Computing 11: 217-238.

Work mentioned in presentation:


3. Abbiw-Jackson, B. Golden, S. Raghavan, and E. Wasil (2004). A Divide-and-Conquer Local Search Heuristic for Data
Visualization, Working Paper, University of Maryland.
4. Boros, E., P.L. Hammer, T. Ibaraki, and A. Kogan (1997). Logical Analysis of Numerical Data, Mathematical Programming 79: 163-190.
5. Bradley, P.S., O.L. Mangasarian, and W.N. Street (1997). Clustering via Concave Minimization, in M.C. Mozer, M.I.
Jordan, T. Petsche (eds.) Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA.
6. Bradley, P.S., O.L. Mangasarian, and W.N. Street (1998). Feature Selection via Mathematical Programming, INFORMS Journal on Computing 10: 209-217.
7. Fu, Z., B. Golden, S. Lele, S. Raghavan, and E. Wasil (2003). A Genetic Algorithm-Based Approach for Building Accurate Decision Trees, INFORMS Journal on Computing 15: 3-22.
8. Kennedy, H., C. Chinniah, P. Bradbeer, and L. Morss (1997). The Construction and Evaluation of Decision Trees: A
Comparison of Evolutionary and Concept Learning Methods, in D. Corne and J.L. Shapiro (eds.) Evolutionary
Computing, Lecture Notes in Computer Science, Springer-Verlag, 147-161.
9. Li, X. and S. Olafsson (2004). Discovering Dispatching Rules using Data Mining, Journal of Scheduling, to appear.
10. Mangasarian, O.L. (1965). Linear and Nonlinear Separation of Patterns by Linear Programming, Operations
Research 13: 455-461.
11. Olafsson, S. and J. Yang (2004). Intelligent Partitioning for Feature Selection, INFORMS Journal on Computing, to
appear.

