Operations Research & Data Mining: Siggi Olafsson
Siggi Olafsson
Associate Professor
Department of Industrial Engineering
Iowa State University
[Figure: diagram relating Data Mining to Machine Learning, Databases, and Optimization]
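\[
\begin{aligned}
\min\quad & \sum_{i \in M} \sum_{\substack{j \in M \\ j > i}} \sum_{k \in N} \sum_{l \in N} \bigl(\,\cdots\, d_{\text{original}}(i, j) \,\cdots\bigr) \\
\text{s.t.}\quad & \sum_{k \in N} x_{ik} = 1, \quad i \in M, \\
& x_{ik} \in \{0, 1\}.
\end{aligned}
\]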
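\[
x_j = \begin{cases} 1, & \text{if attribute } j \text{ is selected,} \\ 0, & \text{otherwise.} \end{cases}
\]
Combinatorial optimization problem
\[
\begin{aligned}
\max_{x}\quad & f(x), \quad x = (x_1, x_2, \ldots, x_n) \\
\text{s.t.}\quad & x_j \in \{0, 1\} \quad \forall j
\end{aligned}
\]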
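As a minimal sketch of the attribute selection formulation above, the following enumerates every binary vector x and keeps the one that maximizes f. The quality function `toy_quality`, its `relevance` scores, and its `cost` parameter are hypothetical stand-ins, since the slides do not specify f. Full enumeration is only practical for small n.

```python
from itertools import product

def select_attributes(f, n):
    """Enumerate every x in {0,1}^n and return the maximizer of f with its value."""
    best_x, best_value = None, float("-inf")
    for x in product((0, 1), repeat=n):
        value = f(x)
        if value > best_value:
            best_x, best_value = x, value
    return best_x, best_value

def toy_quality(x, relevance=(0.9, 0.1, 0.6, 0.05), cost=0.2):
    """Hypothetical f(x): total relevance of selected attributes minus a per-attribute cost."""
    return sum(r * xj for r, xj in zip(relevance, x)) - cost * sum(x)

best_x, best_value = select_attributes(toy_quality, 4)
print(best_x, best_value)
```

With these assumed scores, only the first and third attributes have relevance above the cost, so they are the ones selected.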
[Figure: separating hyperplane $w \cdot x = 0$, with $Aw \ge e$ for Class A and $Bw \le -e$ for Class B; horizontal axis $x_1$]
20th European Conference on Operational Research, July 4-7, 2004
Finding the Closest Points
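Formulate as QP:
\[
\begin{aligned}
\min_{c,d}\quad & \tfrac{1}{2} \lVert c - d \rVert^2 \\
\text{s.t.}\quad & c = \sum_{i:\,\text{Class A}} \alpha_i x_i, \qquad d = \sum_{i:\,\text{Class B}} \alpha_i x_i, \\
& \sum_{i:\,\text{Class A}} \alpha_i = 1, \qquad \sum_{i:\,\text{Class B}} \alpha_i = 1, \\
& \alpha_i \ge 0.
\end{aligned}
\]
[Figure: separating hyperplane; horizontal axis $x_1$]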
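The closest-points problem can be illustrated with a small solver. The sketch below is one possible approach, a Frank-Wolfe iteration with exact line search over the convex-combination weights of each class; it is not necessarily the method used in the talk. The function names, the toy data, and the use of `beta` for the Class B weights are assumptions made for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def combo(weights, points):
    """Convex combination sum_i weights[i] * points[i]."""
    dim = len(points[0])
    return [sum(w * p[k] for w, p in zip(weights, points)) for k in range(dim)]

def closest_points(class_a, class_b, iters=1000, tol=1e-12):
    """Minimize 0.5 * ||c - d||^2 with c, d in the convex hulls of the two classes."""
    alpha = [1.0] + [0.0] * (len(class_a) - 1)   # weights for Class A (sum to 1)
    beta = [1.0] + [0.0] * (len(class_b) - 1)    # weights for Class B (sum to 1)
    for _ in range(iters):
        c, d = combo(alpha, class_a), combo(beta, class_b)
        u = [ck - dk for ck, dk in zip(c, d)]    # gradient with respect to c; -u for d
        # Linear minimization: best vertex of each hull for the current gradient
        i = min(range(len(class_a)), key=lambda r: dot(u, class_a[r]))
        j = max(range(len(class_b)), key=lambda r: dot(u, class_b[r]))
        target = [a - b for a, b in zip(class_a[i], class_b[j])]
        v = [t - uk for t, uk in zip(target, u)]
        gap = -dot(u, v)                         # Frank-Wolfe optimality gap
        if gap <= tol:
            break
        step = min(1.0, gap / dot(v, v))         # exact line search, clipped to [0, 1]
        alpha = [(1 - step) * w for w in alpha]
        alpha[i] += step
        beta = [(1 - step) * w for w in beta]
        beta[j] += step
    return combo(alpha, class_a), combo(beta, class_b), alpha, beta

# Toy data (assumed): two triangles whose closest points are the vertices (1, 0) and (3, 0)
class_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
class_b = [(3.0, 0.0), (3.0, 1.0), (4.0, 0.0)]
c, d, alpha, beta = closest_points(class_a, class_b)
print(c, d)
```

The separating hyperplane can then be taken as the perpendicular bisector of the segment from c to d.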
Limitations
The points (instances) may not be separable by a
hyperplane
Add error terms to minimize
A linear separation is quite limited
[Figure: Class A and Class B points plotted on axes x1 and x2]
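\[
\begin{aligned}
\max_{\alpha}\quad & w(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \\
\text{subject to}\quad & 0 \le \alpha_i \le C, \\
& \sum_i \alpha_i y_i = 0.
\end{aligned}
\]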
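To make the dual problem concrete, the sketch below evaluates the objective (the sum of the multipliers minus half the quadratic term in the class labels and inner products) and checks the box constraint 0 ≤ α_i ≤ C and the balance constraint Σ α_i y_i = 0 for a given α. It does not solve the problem. The function names and the two-point data set are assumptions for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dual_objective(alpha, y, xs):
    """sum_i alpha_i - 0.5 * sum_{i,j} alpha_i alpha_j y_i y_j (x_i . x_j)."""
    n = len(alpha)
    quad = sum(
        alpha[i] * alpha[j] * y[i] * y[j] * dot(xs[i], xs[j])
        for i in range(n)
        for j in range(n)
    )
    return sum(alpha) - 0.5 * quad

def is_feasible(alpha, y, C, tol=1e-9):
    """Check 0 <= alpha_i <= C and sum_i alpha_i y_i = 0."""
    in_box = all(-tol <= a <= C + tol for a in alpha)
    balanced = abs(sum(a * yi for a, yi in zip(alpha, y))) <= tol
    return in_box and balanced

# Toy data (assumed): one point per class, mirrored about the origin
xs = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]
alpha = [0.5, 0.5]
value = dual_objective(alpha, y, xs)
print(is_feasible(alpha, y, C=1.0), value)
```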
\[
K(x, y) = e^{-\lVert x - y \rVert^2 / 2\sigma^2}
\]
\[
K(x, y) = \tanh(\kappa \, x \cdot y - \delta)
\]
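The two kernels above, a Gaussian of the squared distance between x and y and a hyperbolic tangent of their inner product, are short to write out. In this sketch the parameter names `sigma`, `kappa`, and `delta` and their default values are assumptions, since the Greek symbols did not survive in the slide text.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=1.0, delta=0.0):
    """tanh(kappa * (x . y) - delta)."""
    inner = sum(a * b for a, b in zip(x, y))
    return math.tanh(kappa * inner - delta)

def gram_matrix(points, kernel):
    """Matrix of kernel values K(x_i, x_j) that replaces the inner products x_i . x_j."""
    return [[kernel(p, q) for q in points] for p in points]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = gram_matrix(points, gaussian_kernel)
```

Replacing each inner product with a kernel value is what lets the separating surface be nonlinear in the original attributes.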
\[
\min_{C, D} \; \sum_{i=1}^{m} \min_{j} \left\{ e^{T} D_{ij} \right\}
\]
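The clustering objective above can be evaluated directly under one assumption that is not visible in the extracted formula: that D_ij bounds the componentwise absolute deviation between point i and center j, so that e^T D_ij at the optimum equals the 1-norm distance. The sketch below computes that objective and applies a simple alternating heuristic (assign each point to its nearest center, then move each center to the coordinate-wise median). This heuristic is a stand-in for illustration, not necessarily the method from the talk, and the data and starting centers are made up.

```python
from statistics import median

def l1(x, c):
    """1-norm distance, assumed to equal e^T D_ij at the optimum."""
    return sum(abs(a - b) for a, b in zip(x, c))

def objective(points, centers):
    """Sum over points of the distance to the nearest center."""
    return sum(min(l1(x, c) for c in centers) for x in points)

def k_median(points, centers, iters=20):
    """Alternate nearest-center assignment and coordinate-wise median updates."""
    centers = list(centers)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for x in points:
            j = min(range(len(centers)), key=lambda r: l1(x, centers[r]))
            clusters[j].append(x)
        new_centers = [
            tuple(median(col) for col in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
        if new_centers == centers:   # no center moved, so assignments are stable
            break
        centers = new_centers
    return centers

points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
start = [(1, 0), (11, 10)]           # hypothetical starting centers
final = k_median(points, start)
print(objective(points, start), objective(points, final))
```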
Data preparation
Construct a flat file
Each line (instance/data object) is an example
of the target concept
Prepared Data File
Job 1   Processing Time 1   Release 1   Job 2   Processing Time 2   Release 2   Job 1 Scheduled First
J1      15                  10          J2      5                   30          Yes
J1      15                  10          J3      20                  18          Yes
J1      15                  10          J4      7                   0           Yes
J1      15                  10          J5      17                  0           No
J2      5                   30          J1      15                  10          No
J2      5                   30          J3      20                  18          No
J2      5                   30          J4      7                   0           No
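A flat file of this shape can be generated from a dispatch sequence by pairing every job with every other job and labelling whether the first was scheduled earlier. The sketch below uses the processing and release times from the table. The `sequence` is an assumption: the rows shown fix J5 before J1 and J2 last, but the relative order of J3 and J4 is not visible and is chosen here only for illustration.

```python
from itertools import permutations

# (processing time, release time) per job, taken from the table
jobs = {"J1": (15, 10), "J2": (5, 30), "J3": (20, 18), "J4": (7, 0), "J5": (17, 0)}

# Assumed dispatch order; only partly determined by the rows shown in the table
sequence = ["J5", "J1", "J3", "J4", "J2"]

def flat_file(jobs, sequence):
    """One row per ordered job pair, labelled Yes if the first job was scheduled first."""
    position = {job: k for k, job in enumerate(sequence)}
    rows = []
    for a, b in permutations(jobs, 2):
        label = "Yes" if position[a] < position[b] else "No"
        rows.append((a, *jobs[a], b, *jobs[b], label))
    return rows

rows = flat_file(jobs, sequence)
for row in rows[:7]:
    print(*row)
```

Each row is one example of the target concept, so a classifier can be induced with the last column as the class attribute.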
[Figure: induced decision tree with Yes/No branches, annotated as follows]
Do not wait for Job 1 if not much longer than Job 2
Wait for Job 1 to be released if it is much longer than Job 2
Structural Knowledge
The dispatching rule is LPT
Mine schedule data generated with this rule, together with the
processing time and release time data
The induced model takes into account:
Possible range of processing times
Largest delay caused by a not released job
New structural patterns, not explicitly known by
the dispatcher, are discovered
Next step is to improve schedules
Instance selection: learn from best practices
Optimize the decision tree