Data Mining - Practical Machine Learning Tools and Techniques with JAVA
Implementations

Article in ACM SIGMOD Record · March 2002


Data Mining: Practical Machine Learning Tools and
Techniques with Java Implementations
by Ian H. Witten and Eibe Frank

Morgan Kaufmann Publishers, 2000

416 pages, Paper, $49.95
ISBN 1-55860-552-5

Review by:
James Geller, New Jersey Institute of Technology
CS Department, 323 Dr. King Blvd., Newark, NJ 07102
geller@oak.njit.edu
http://web.njit.edu/~geller/

Summary of the book

Witten and Frank's textbook was one of two books that I used for a data mining
class in the Fall of 2001. The book covers all major methods of data mining
that produce a knowledge representation as output. Knowledge representation is
hereby understood as a representation that can be studied, understood, and
interpreted by human beings, at least in principle. Thus, neural networks and
genetic algorithms are excluded from the topics of this textbook. We need to
say “can be understood in principle” because a large decision tree or a large
rule set may be as hard to interpret as a neural network.

The book first develops the basic machine learning and data mining methods.
These include decision trees, classification and association rules, support
vector machines, instance-based learning, Naive Bayes classifiers, clustering,
and numeric prediction based on linear regression, regression trees, and model
trees. It then goes deeper into evaluation and implementation issues. Next it
moves on to deeper coverage of issues such as attribute selection,
discretization, data cleansing, and combinations of multiple models (bagging,
boosting, and stacking). The final chapter deals with advanced topics such as
visual machine learning, text mining, and Web mining.

A walk through the contents

The greatest strength of this Data Mining book lies outside of the book itself.
All the algorithms described in this book are implemented and freely available
through the WEKA (Waikato Environment for Knowledge Analysis) Website
(www.cs.waikato.ac.nz/ml/weka). Chapter 8 of the book is a tutorial to the
implemented algorithms. The integration between the book and the Web site is
excellent, and the Web site is alive, thriving and growing. Thus, the number of
data mining algorithms available on the Web site goes far beyond what is
described in the book. Indeed, even Neural Networks have been added to the Web
site since the book was first published. While many books offer an associated
Web site by now, the close linkage between book and Web site and the rapid
growth of the Web site are highly commendable.

Another pleasant feature of the WEKA implementation is that it is done in Java.
This makes it possible to construct systems, based on Java, that capitalize on
the other strengths of Java, such as access to relational databases through
JDBC and easy access to Web pages from within Java programs.

Target audience

The book is written for academics and practitioners and I believe it can be
well understood, even by undergraduate students. In fact, it is probably the
most accessible survey of data mining in print, without sacrificing too much of
precision and rigor. The book is written in a highly redundant style, which I
would like to describe as an exercise in iterative deepening. Basic concepts
are repeated in several chapters, but covered to a deeper level in the later
chapters. This should make it easy for students to keep reading it, without
having to refer back to earlier chapters at every step of the way. On the other
hand, for a person that is already familiar with the basics of data mining,
this makes boring reading at some places. However, I do not recommend a
streamlining of the book. Instead, I recommend that readers with some knowledge
of the topic may skip paragraphs that sound familiar without any guilty
feelings.

Reviewer's appreciation

The book goes to great lengths to avoid “formula shock”. Formulas are developed
step-by-step and well explained. Only absolutely necessary formulas are
included. In many cases, where the derivation of a complex result is irrelevant
to the actual data mining issues, the authors defer to statistics textbooks.
While I am greatly in favor of both these approaches in writing textbooks, I
feel that they have gone too far at a few places. At a number of places, the
authors avoid introducing “one more letter” to keep the text readable. However,
the price they pay for that is that many of their formulas have no equal signs.
Thus, a sentence is terminated with a colon and followed with a formula, which
is presumably equal to the quantity described by the sentence. This is done on
many pages, e.g., 132-135, 137, 196, 207, 222, etc. Not in my wildest dreams
would I have thought that I could ever criticize a book author for having too
few formulas and too few variables. But this is exactly what I need to do here.
While I do not recommend eliminating the previously mentioned redundancy of
description, I do recommend for the next edition (which this book will
undoubtedly have) to strengthen the formulas, without necessarily adding new
ones.

At a few places, the book could also be improved by adding more explanations to
figures. Figure 3.6 is a prime example for this issue. I found myself spending
time verifying that instance counts in two subfigures truly add to the same
total (of 209). They do. The reader could be spared this effort by a better
caption or a better description in the body of the text. Similarly, the Apriori
algorithm is introduced in a figure, but only in the “Further Reading”
subsection (following much later) is the name of the algorithm mentioned. A
better figure caption would help the scholarly advancement of students who
might not take the “Further Reading” section that seriously.

In America we say “Actions speak louder than words”. Thus, instead of
summarizing the book I will describe some actions that I intend to take (or
that I am already taking).
(1) I am using WEKA for my research.
(2) If I teach the same course again, I will use Witten and Frank's book again.
(3) If the book appears in a second edition, I will acquire it.
