Data Mining - Practical Machine Learning Tools and Techniques with Java Implementations
2 authors, including: Ian Witten (The University of Waikato)
All content following this page was uploaded by Ian Witten on 04 November 2014.
Review by:
James Geller, New Jersey Institute of Technology
CS Department, 323 Dr. King Blvd., Newark, NJ 07102
geller@oak.njit.edu
http://web.njit.edu/~geller/
Witten and Frank's textbook was one of two books that I used for a data mining class in the Fall of 2001. The book covers all major methods of data mining that produce a knowledge representation as output. Knowledge representation is understood here as a representation that can be studied, understood, and interpreted by human beings, at least in principle. Thus, neural networks and genetic algorithms are excluded from the topics of this textbook. We need to say “can be understood in principle” because a large decision tree or a large rule set may be as hard to interpret as a neural network.

The book first develops the basic machine learning and data mining methods. These include decision trees, classification and association rules, support vector machines, instance-based learning, Naive Bayes classifiers, clustering, and numeric prediction based on linear regression, regression trees, and model trees. It then goes deeper into evaluation and implementation issues. Next, it moves on to deeper coverage of issues such as attribute selection, discretization, data cleansing, and combinations of multiple models (bagging, boosting, and stacking). The final chapter deals with advanced topics such as visual machine learning, text mining, and Web mining.

The greatest strength of this data mining book lies outside the book itself. All the algorithms described in the book are implemented and freely available through the WEKA (Waikato Environment for Knowledge Analysis) Web site (www.cs.waikato.ac.nz/ml/weka). Chapter 8 of the book is a tutorial on the implemented algorithms. The integration between the book and the Web site is excellent, and the Web site is alive, thriving, and growing. As a result, the number of data mining algorithms available on the Web site goes far beyond what is described in the book. Indeed, even neural networks have been added to the Web site since the book was first published. While many books offer an associated Web site by now, the close linkage between book and Web site and the rapid growth of the Web site are highly commendable.

Another pleasant feature of the WEKA implementation is that it is written in Java. This makes it possible to construct Java-based systems that capitalize on the other strengths of Java, such as access to relational databases through JDBC and easy access to Web pages from within Java programs.

Target audience

The book is written for academics and practitioners, and I believe it can be well understood even by undergraduate students.
In fact, it is probably the most accessible survey of data mining in print, without sacrificing too much precision and rigor.

The book is written in a highly redundant style, which I would like to describe as an exercise in iterative deepening. Basic concepts are repeated in several chapters but covered in greater depth in the later chapters. This should make it easy for students to keep reading without having to refer back to earlier chapters at every step of the way. On the other hand, for a reader who is already familiar with the basics of data mining, this makes for boring reading in some places. However, I do not recommend streamlining the book. Instead, I recommend that readers with some knowledge of the topic skip paragraphs that sound familiar, without any guilty feelings.

… edition (which this book will undoubtedly have) to strengthen the formulas, without necessarily adding new ones.

In a few places, the book could also be improved by adding more explanations to figures. Figure 3.6 is a prime example of this issue. I found myself spending time verifying that the instance counts in two subfigures truly add up to the same total (of 209). They do. The reader could be spared this effort by a better caption or a better description in the body of the text. Similarly, the Apriori algorithm is introduced in a figure, but the name of the algorithm is mentioned only in the “Further Reading” subsection, which follows much later. A better figure caption would help the scholarly advancement of students who might not take the “Further Reading” section that seriously.