Top 10 Data Mining Mistakes
5. Accept Leaks from the Future
10. Believe the Best Model
Model 1
Delaunay Triangles
• Ex: The last quintile of customers is 4 times more expensive to obtain than
the first quintile (10% vs. 40% to gain 20%)
• Decision Tree provides relatively few decision points.
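The quintile figures above can be checked with a little arithmetic. The sketch below assumes one reading of "10% vs. 40% to gain 20%": contacting 10% of the ranked list captures the first 20% of customers, while capturing the last 20% takes 40% of the list. The function name and that interpretation are assumptions, not taken from the slides.

```python
def relative_cost(pct_contacted_first, pct_contacted_last, pct_gained=20.0):
    """Ratio of effort per customer gained: last quintile vs. first quintile.

    Assumption: 'cost' is proportional to the share of the list that must
    be contacted to gain a fixed share (pct_gained) of the customers.
    """
    cost_first = pct_contacted_first / pct_gained  # effort per point gained
    cost_last = pct_contacted_last / pct_gained
    return cost_last / cost_first


ratio = relative_cost(10.0, 40.0)
print(f"Last quintile costs {ratio:g}x the first quintile")
```

Under that reading, the ratio 40% / 10% reproduces the "4 times more expensive" figure quoted in the bullet.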
© 2005 Elder Research, Inc.
Bundling 5 Trees improves accuracy and smoothness
[Figure: "Missed" (y-axis, ticks 55 to 70) vs. "No. Models in combination" (x-axis, 1 to 5); panel label: MPN]
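As a rough illustration of bundling, the sketch below averages five one-split regression trees (stumps), each fit to a bootstrap resample of a toy dataset. Bootstrap averaging (bagging) is an assumed stand-in here; the slides do not say how the five trees were combined, and the data, function names, and seed are hypothetical. A single stump can output only two values, while the average of five can output up to six, which is one way to see the added smoothness.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimizing squared error."""
    best = None
    order = sorted(set(xs))
    for lo, hi in zip(order, order[1:]):
        t = (lo + hi) / 2.0  # candidate threshold between neighboring x values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # degenerate sample with a single distinct x value
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr


def bundle(xs, ys, n_models=5, seed=0):
    """Average n_models stumps, each fit to a bootstrap resample (bagging)."""
    rng = random.Random(seed)  # hypothetical seed, for repeatability only
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)


# Toy data (hypothetical): a linear ramp on [0, 1].
xs = [i / 39 for i in range(40)]
ys = list(xs)

single = fit_stump(xs, ys)
bundled = bundle(xs, ys, n_models=5)

levels_single = {single(x) for x in xs}
levels_bundle = {bundled(x) for x in xs}
print("distinct output levels, single tree:", len(levels_single))
print("distinct output levels, 5-tree bundle:", len(levels_bundle))
```

Whether the bundle is also more accurate depends on the data; this toy example only demonstrates the mechanism, not the result plotted on the slide.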
PATH to success:
• Persistence - Attack repeatedly, from different angles.
Automate essential steps. Externally check work.
John obtained a BS and MEE in Electrical Engineering from Rice University, and a PhD in Systems
Engineering from the University of Virginia, where he’s an adjunct professor, teaching Optimization.
Prior to a decade leading ERI, he spent 5 years in aerospace defense consulting, 4 heading research at an
investment management firm, and 2 in Rice's Computational & Applied Mathematics department.
Dr. Elder has authored innovative data mining tools, is active on Statistics, Engineering, and Finance
conferences and boards, is a frequent keynote conference speaker, and was a Program co-chair of the
2004 Knowledge Discovery and Data Mining conference. John’s courses on data analysis techniques --
taught at dozens of universities, companies, and government labs -- are noted for their clarity and
effectiveness. Dr. Elder holds a top secret clearance, and since the Fall of 2001, has been honored to
serve on a panel appointed by Congress to guide technology for the National Security Agency.