Principles of Data Mining
February 2013
Publisher: Springer Publishing Company, Incorporated
ISBN: 978-1-4471-4883-8
Published: 21 February 2013
Pages: 454
Abstract

Data Mining, the automatic extraction of implicit and potentially useful information from data, is increasingly used in commercial, scientific and other application areas. Principles of Data Mining explains and explores the principal techniques of Data Mining: classification, association rule mining and clustering. Each topic is clearly explained and illustrated by detailed worked examples, with a focus on algorithms rather than mathematical formalism. It is written for readers without a strong background in mathematics or statistics, and any formulae used are explained in detail. This second edition has been expanded to include additional chapters on using frequent pattern trees for Association Rule Mining, comparing classifiers, ensemble classification and dealing with very large volumes of data. Principles of Data Mining aims to help general readers develop the necessary understanding of what is inside the 'black box' so they can use commercial data mining packages discriminatingly, as well as enabling advanced readers or academic researchers to understand or contribute to future technical advances in the field. It is suitable as a textbook to support courses at undergraduate or postgraduate level in a wide range of subjects, including Computer Science, Business Studies, Marketing, Artificial Intelligence, Bioinformatics and Forensic Science.

Cited By

  1. Kaweesinsakul K, Nuchitprasitchai S and Pearce J. Open source disease analysis system of cactus by artificial intelligence and image processing. Proceedings of the 12th International Conference on Advances in Information Technology, (1-7)
  2. Hedar A, Ibrahim A, Abdel-Hakim A and Sewisy A. Modulated clustering using integrated rough sets and scatter search attribute reduction. Proceedings of the Genetic and Evolutionary Computation Conference Companion, (1394-1401)
  3. Mirsky Y, Shabtai A, Rokach L, Shapira B and Elovici Y. SherLock vs Moriarty. Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security, (1-12)
  4. Faed A, Chang E, Saberi M, Hussain O and Azadeh A (2016). Intelligent customer complaint handling utilising principal component and data envelopment analysis (PDA), Applied Soft Computing, 47:C, (614-630), Online publication date: 1-Oct-2016.
  5. Erdem M, Boran F and Akay D (2016). Classification of Risks of Occupational Low Back Disorders with Support Vector Machines, Human Factors in Ergonomics & Manufacturing, 26:5, (550-558), Online publication date: 1-Sep-2016.
  6. Ruiz-Muñoz J, Orozco-Alzate M and Castellanos-Dominguez G. Multiple instance learning-based birdsong classification using unsupervised recording segmentation. Proceedings of the 24th International Conference on Artificial Intelligence, (2632-2638)
  7. Freitas A (2014). Comprehensible classification models, ACM SIGKDD Explorations Newsletter, 15:1, (1-10), Online publication date: 17-Mar-2014.
  8. Heinemann L. Facilitating reuse in model-based development with context-dependent model element recommendations. Proceedings of the Third International Workshop on Recommendation Systems for Software Engineering, (16-20)
  9. Qamar S and Adil S. Comparative analysis of data mining techniques for financial data using parallel processing. Proceedings of the 7th International Conference on Frontiers of Information Technology, (1-6)
  10. Soria E, Martín J, Caravaca J, Serrano A, Martínez M, Magdalena R, Gómez J, Heras M and Sanz G. Survival prediction in patients undergoing ischemic cardiopathy. Proceedings of the 2009 international joint conference on Neural Networks, (1817-1820)
Contributors
  • University of Portsmouth

Reviews

Alexis Leon

Data mining is one of the most popular and effective tools for knowledge discovery. It involves the analysis and summary of data from different perspectives and the automatic extraction of useful information. Data mining reveals trends, patterns, and other information hidden within huge volumes of data. Today, it is used in commercial, medical, scientific, geographical, meteorological, and other areas that generate volumes of information too large to be of real use without automatic processing. This book introduces the concept of data mining and explains the various techniques involved.

The author starts with an introduction to data mining and its importance, and succinctly explains the fundamental concepts of data mining and the principal techniques for classification, association rule mining, and clustering. Classification is a data mining technique that assigns items in a collection to target categories or classes. The book introduces the various classification techniques (naive Bayes, nearest neighbor, decision trees), and explains the top-down induction of decision trees (TDIDT) algorithm and the various criteria for attribute selection (entropy, Gini index of diversity, chi-square statistic, gain ratio). This is followed by discussions of related topics, including classifier predictive accuracy estimation, classifier performance measurement, classifier comparison, conversion of continuous attributes to categorical ones (discretization), overfitting reduction of decision trees, modular rules for classification, dealing with large volumes of data, and ensemble classification (use of a set of classifiers instead of a single one to classify unseen data).

Association rules are if/then statements that help uncover relationships between seemingly unrelated data items in an information repository. The book covers the basic concepts of association rule mining, along with the various algorithms and criteria for selecting the best algorithms.
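To make the if/then idea concrete, the sketch below computes support and confidence, two measures commonly used to rank association rules, over a toy set of transactions. The transactions and the function names `support` and `confidence` are invented for illustration; they are not taken from the book or the review.

```python
from typing import Iterable, List, FrozenSet

# Hypothetical toy transactions, invented for illustration only.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    matches = sum(1 for t in transactions if items <= t)
    return matches / len(transactions)


def confidence(
    antecedent: Iterable[str],
    consequent: Iterable[str],
    transactions: List[FrozenSet[str]],
) -> float:
    """Confidence of the rule `if antecedent then consequent`.

    Defined as support(antecedent and consequent) / support(antecedent).
    """
    left = frozenset(antecedent)
    both = left | frozenset(consequent)
    left_support = support(left, transactions)
    if left_support == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return support(both, transactions) / left_support


if __name__ == "__main__":
    # Rule: if a basket contains butter, then it contains bread.
    print(support({"butter", "bread"}, TRANSACTIONS))
    print(confidence({"butter"}, {"bread"}, TRANSACTIONS))
```

A rule with high confidence but very low support holds in only a handful of transactions, which is why the two measures are usually read together.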
There is also a comprehensive discussion of association rule mining topics, such as market basket analysis and the Apriori and frequent pattern growth (FP-growth) algorithms. The author presents a detailed exploration of the two most popular data clustering methods, k-means clustering and hierarchical clustering, followed by a discussion of text mining, a type of classification where the objects are text documents. Other chapters examine the bag-of-words representation for document classification, automatic classification of web pages (hypertext categorization), and the difference between hypertext and standard text classification.

Each topic discussion begins with the basics, and the book assumes that the reader has no prior knowledge of data mining. All explanations are clear and supported with detailed illustrations, examples, and solved problems. The focus on algorithms helps those who do not have a strong mathematical background to better understand the concepts, and the learning process is enhanced with self-assessment exercises and a list of references at the end of each chapter.

The book has five appendices that add value. The first explains the mathematical notation and techniques used in the book and would especially help those with limited mathematical exposure. The second gives basic information about the different datasets used in the book. The third lists sources for further reading. The fourth is a comprehensive glossary of data mining terms and mathematical notation, and the last provides solutions to the self-assessment exercises.

This book is written primarily as a text for a course on data mining. The rich pedagogical features, including illustrations, examples, solved problems, exercises and solutions, a glossary, and references, make it an ideal choice for that purpose. It would be very useful for any reader who wants to gain a good understanding of data mining concepts and techniques.

Online Computing Reviews Service
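The bag-of-words representation mentioned in the review can be illustrated with a short sketch: each document becomes a vector of word counts over a shared vocabulary, and word order is discarded. The tokenizer, the sample documents, and the names `tokenize` and `bag_of_words` are assumptions made for this example, not the book's own code.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(documents: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return a sorted vocabulary and one count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary is the set of all tokens seen in any document.
    vocabulary = sorted(set().union(*counts))
    # Counter returns 0 for words a document does not contain.
    vectors = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, vectors


if __name__ == "__main__":
    # Hypothetical sample documents, invented for illustration only.
    docs = ["Data mining finds patterns", "Text mining mines text data"]
    vocab, vecs = bag_of_words(docs)
    print(vocab)
    for vec in vecs:
        print(vec)
```

The resulting count vectors can be passed to any classifier that accepts numeric attributes, which is what lets text mining be treated as a classification task.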
