Weka Tutorial

Weka is an open-source machine learning software suite containing algorithms for data pre-processing, classification, regression, clustering, association rules, and visualization. It provides a graphical user interface for easily applying these algorithms to datasets in ARFF format. Weka supports standard data mining tasks and contains implementations of many common machine learning algorithms such as decision trees, naive Bayes classifiers, and k-means clustering.

Uploaded by

Vairavasundaram Vairam


WEKA: Waikato Environment for Knowledge Analysis

Weka is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. Weka is free software available under the GNU General Public License.

The Weka (pronounced Way-Kuh) workbench[1] contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces for easy access to this functionality. The original non-Java version of Weka was a Tcl/Tk front-end to (mostly third-party) modeling algorithms implemented in other programming languages, plus data preprocessing utilities in C and a Makefile-based system for running machine learning experiments. This original version was primarily designed as a tool for analyzing data from agricultural domains,[2][3] but the more recent, fully Java-based version (Weka 3), for which development started in 1997, is now used in many different application areas, in particular for education and research.

Advantages of Weka include:
- Free availability under the GNU General Public License
- Portability, since it is fully implemented in the Java programming language and thus runs on almost any modern computing platform
- A comprehensive collection of data preprocessing and modeling techniques
- Ease of use due to its graphical user interfaces

Weka supports several standard data mining tasks: data preprocessing, clustering, classification, regression, visualization, and feature selection. All of Weka's techniques assume that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes (normally numeric or nominal attributes, but some other attribute types are also supported). Weka provides access to SQL databases using Java Database Connectivity (JDBC) and can process the result returned by a database query.

Weka is not capable of multi-relational data mining, but there is separate software for converting a collection of linked database tables into a single table that is suitable for processing with Weka.[4] Another important area not currently covered by the algorithms included in the Weka distribution is sequence modeling.

Weka's main user interface is the Explorer, but essentially the same functionality can be accessed through the component-based Knowledge Flow interface and from the command line. There is also the Experimenter, which allows the systematic comparison of the predictive performance of Weka's machine learning algorithms on a collection of datasets.

The Explorer interface features several panels providing access to the main components of the workbench:
- The Preprocess panel has facilities for importing data from a database, a CSV file, etc., and for preprocessing this data using a so-called filtering algorithm. These filters can be used to transform the data (e.g., turning numeric attributes into discrete ones) and make it possible to delete instances and attributes according to specific criteria.
- The Classify panel enables the user to apply classification and regression algorithms (both simply called classifiers in Weka) to the resulting dataset, to estimate the accuracy of the resulting predictive model, and to visualize erroneous predictions, ROC curves, etc., or the model itself (if the model is amenable to visualization, as a decision tree is).
- The Associate panel provides access to association rule learners that attempt to identify all important interrelationships between attributes in the data.
- The Cluster panel gives access to the clustering techniques in Weka, e.g., the simple k-means algorithm. There is also an implementation of the expectation maximization algorithm for learning a mixture of normal distributions.
- The Select attributes panel provides algorithms for identifying the most predictive attributes in a dataset.
- The Visualize panel shows a scatter plot matrix, where individual scatter plots can be selected, enlarged, and analyzed further using various selection operators.
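The accuracy estimate mentioned for the Classify panel can be illustrated with k-fold cross-validation. The sketch below is plain Python, not Weka code: `cross_validate`, `train_majority`, and the nine labels are hypothetical names and data chosen for illustration, and the folds are taken in simple round-robin order without shuffling or stratification.

```python
from collections import Counter


def cross_validate(rows, labels, train, folds):
    """Estimate accuracy by k-fold cross-validation.

    `train` takes (rows, labels) and returns a predict(row) function.
    Fold f holds out every instance whose index i satisfies i % folds == f.
    """
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, len(rows), folds))
        train_rows = [r for i, r in enumerate(rows) if i not in test_idx]
        train_labels = [y for i, y in enumerate(labels) if i not in test_idx]
        predict = train(train_rows, train_labels)
        # Count held-out instances that the fold's model gets right.
        correct += sum(1 for i in test_idx if predict(rows[i]) == labels[i])
    return correct / len(rows)


def train_majority(train_rows, train_labels):
    """Baseline learner: always predict the most frequent training class."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda row: majority


# Hypothetical labels; the baseline ignores the rows, so they are placeholders.
labels = ["yes", "no", "yes", "yes", "yes", "no", "yes", "yes", "yes"]
rows = [None] * len(labels)
accuracy = cross_validate(rows, labels, train_majority, folds=3)
print(f"estimated accuracy: {accuracy:.3f}")
```

Any learner that follows the same `train` convention can be passed in place of the baseline, which makes it easy to compare a real model against the majority-class accuracy.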

Open source ML algorithms: pre-processing, classifiers, clustering, association rules

Installation
Download the software from http://www.cs.waikato.ac.nz/ml/weka/. If you are interested in modifying or extending Weka, there is a developer version that includes the source code.

Set the Weka environment variables for Java (csh/tcsh syntax):

    setenv WEKAHOME /usr/local/weka/weka-3-0-2
    setenv CLASSPATH $WEKAHOME/weka.jar:$CLASSPATH

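- Routines are implemented as classes and logically arranged in packages.
- Weka comes with an extensive GUI.
- Weka routines can be used stand-alone via the command line.

E.g.:

    java weka.classifiers.j48.J48 -t $WEKAHOME/data/iris.arff

The class name above matches the Weka 3.0 layout used in this tutorial; in later Weka releases J48 lives in weka.classifiers.trees.J48.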

Data format
- Weka uses flat text files to describe the data.
- It can work with a wide variety of data files, including its own .arff format and C4.5 file formats.
- Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary.
- Data can also be read from a URL or from an SQL database (using JDBC).

ARFF file
Attribute-Relation File Format (ARFF) is the text file format used by Weka to store a dataset. Such a file is structured as follows (header of the "weather" dataset):

    @relation weather
    @attribute outlook {sunny, overcast, rainy}
    @attribute temperature real
    @attribute humidity real
    @attribute windy {TRUE, FALSE}
    @attribute play {yes, no}

The ARFF file contains two sections: the header and the data section. The first line of the header defines the relation name. It is followed by the list of attributes (@attribute ...). Each attribute has a unique name and a type. The type describes the kind of data the attribute holds and what values it can take. The attribute types are numeric, nominal, string, and date (the keyword real used above is treated as numeric). By default, the class attribute is the last one in the list. The header can also contain comment lines, which begin with '%' and can describe the dataset or give the reader information about its author. The data section follows (@data): each line stores the attribute values of a single instance, separated by commas.
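To make the header/data structure concrete, here is a minimal reading sketch in Python. It is not Weka's own loader: `parse_arff` is a hypothetical helper that handles only the simple cases described above (no quoted names, sparse rows, or date formats). The header is the weather header from the text; the three data rows are illustrative.

```python
NUMERIC_TYPES = {"numeric", "real", "integer"}


def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows).

    `attributes` is a list of (name, type) pairs, where type is either a
    type keyword such as "real" or a list of nominal values.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank and comment lines
            continue
        lower = line.lower()
        if in_data:
            row = []
            values = [v.strip() for v in line.split(",")]
            for (name, spec), value in zip(attributes, values):
                if value == "?":  # "?" marks a missing value
                    row.append(None)
                elif isinstance(spec, str) and spec.lower() in NUMERIC_TYPES:
                    row.append(float(value))
                else:
                    row.append(value)
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):  # nominal attribute: list its values
                spec = [v.strip() for v in spec.strip("{}").split(",")]
            attributes.append((name, spec))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
% weather header from the text; the data rows below are illustrative
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
overcast,83,86,FALSE,yes
rainy,70,?,FALSE,yes
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation)
print(attributes[0])
print(rows[2])
```

Nominal attributes come back as lists of allowed values and numeric ones as floats, so the class attribute can be picked up as `attributes[-1]`, following the last-attribute default described above.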

Sample ARFF file

    @relation heart-disease-simplified
    @attribute age numeric
    @attribute sex { female, male}
    @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
    @attribute cholesterol numeric
    @attribute exercise_induced_angina { no, yes}
    @attribute class { present, not_present}
    @data
    63,male,typ_angina,233,no,not_present
    67,male,asympt,286,yes,present
    67,male,asympt,229,yes,present
    38,female,non_anginal,?,no,not_present

A "?" in the data section marks a missing value.

Explorer: Preprocessing
- Pre-processing tools in WEKA are called filters.
- WEKA contains filters for discretization, normalization, resampling, attribute selection, transforming and combining attributes, etc.
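Two of the filter types listed above, normalization and discretization, can be sketched in a few lines of Python. This is an illustration of the idea only, not Weka's filter implementation: `normalize` and `discretize` are hypothetical helpers, the binning is simple equal-width binning, and missing values (the "?" entries, represented here as `None`) are passed through unchanged. The input is the cholesterol column of the sample file.

```python
def normalize(values):
    """Min-max scale values to the range [0, 1]; None stays None."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    scaled = []
    for v in values:
        if v is None:
            scaled.append(None)
        elif span == 0:  # all values identical: map everything to 0.0
            scaled.append(0.0)
        else:
            scaled.append((v - lo) / span)
    return scaled


def discretize(values, bins):
    """Equal-width binning: map each value to a bin index in 0..bins-1."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins
    binned = []
    for v in values:
        if v is None:
            binned.append(None)
        elif width == 0:
            binned.append(0)
        else:
            # The maximum value would land in bin `bins`; clamp it into the last bin.
            binned.append(min(int((v - lo) / width), bins - 1))
    return binned


# Cholesterol column of the sample file; the fourth instance is missing ("?").
cholesterol = [233, 286, 229, None]
print(normalize(cholesterol))
print(discretize(cholesterol, bins=3))
```

Discretization turns a numeric attribute into a nominal one, which matters for learners that only accept nominal inputs.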

Explorer: building classifiers
- Classifiers in WEKA are models for predicting nominal or numeric quantities.
- Implemented learning schemes include decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, and Bayes nets, among others.
- Meta-classifiers include bagging, boosting, stacking, error-correcting output codes, and locally weighted learning, among others.

Explorer: Clustering
Example showing simple k-means on the Iris dataset
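For readers who want to see what k-means does, the following is a minimal Python sketch of the standard assign-then-update loop. It is not Weka's implementation: `kmeans` and `sq_dist` are hypothetical helpers, the first k points serve as initial centroids so the run is deterministic (an assumption made for illustration), and the six two-dimensional points are made up rather than taken from the Iris data.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100):
    """Cluster points into k groups; returns (centroids, assignment)."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:  # no point changed cluster: converged
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


# Hypothetical points forming two well-separated groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8), (0.8, 1.2), (4.8, 5.2)]
centroids, assignment = kmeans(points, k=2)
print(centroids)
print(assignment)
```

The result depends on the initial centroids, which is why k-means implementations usually expose a seed or an initialization option.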

Algorithm for decision tree learning

Algorithm decisionTree(D, A, T)
    if D contains only training examples of the same class cj ∈ C then
        make T a leaf node labeled with class cj
    elseif A = ∅ then
        make T a leaf node labeled with cj, which is the most frequent class in D
    else // D contains examples belonging to a mixture of classes. We select a single
         // attribute to partition D into subsets so that each subset is purer
        p0 = impurityEval-1(D);
        for each attribute Ai ∈ {A1, A2, ..., Ak} do
            pi = impurityEval-2(Ai, D)
        end
        Select Ag ∈ {A1, A2, ..., Ak} that gives the biggest impurity reduction, computed using p0 - pi;
        if p0 - pg < threshold then // Ag does not significantly reduce impurity p0
            make T a leaf node labeled with cj, the most frequent class in D
        else // Ag is able to reduce impurity p0
            Make T a decision node on Ag;
            Let the possible values of Ag be v1, v2, ..., vm. Partition D into m disjoint subsets D1, D2, ..., Dm based on the m values of Ag.
            for each Dj in {D1, D2, ..., Dm} do
                if Dj ≠ ∅ then
                    create a branch (edge) node Tj for vj as a child node of T;
                    decisionTree(Dj, A - {Ag}, Tj) // Ag is removed
                end
            end
        end
    end
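The decisionTree pseudocode above leaves the two impurity functions unspecified. The Python sketch below fills them in with entropy, which is one common choice and an assumption here, not something the text prescribes. All function names, the nested-tuple tree representation, and the six training rows are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def entropy(rows, target):
    """Plays the role of impurityEval-1: entropy of the class distribution."""
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def partition(rows, attr):
    """Group rows by their value of attr; only non-empty subsets are created."""
    parts = {}
    for r in rows:
        parts.setdefault(r[attr], []).append(r)
    return parts


def split_entropy(rows, attr, target):
    """Plays the role of impurityEval-2: weighted entropy after splitting on attr."""
    total = len(rows)
    return sum(len(p) / total * entropy(p, target)
               for p in partition(rows, attr).values())


def decision_tree(rows, attrs, target, threshold=1e-9):
    """Return a class label (leaf) or a tuple (attribute, {value: subtree})."""
    classes = Counter(r[target] for r in rows)
    majority = classes.most_common(1)[0][0]
    if len(classes) == 1 or not attrs:  # pure node, or no attributes left
        return majority
    p0 = entropy(rows, target)
    best = min(attrs, key=lambda a: split_entropy(rows, a, target))
    if p0 - split_entropy(rows, best, target) < threshold:
        return majority  # best split does not reduce impurity enough
    remaining = [a for a in attrs if a != best]  # the chosen attribute is removed
    return (best, {value: decision_tree(subset, remaining, target, threshold)
                   for value, subset in partition(rows, best).items()})


def classify(tree, row):
    """Follow the branches matching row until a leaf label is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree


# Hypothetical training rows using two nominal attributes.
rows = [
    {"outlook": "sunny", "windy": "FALSE", "play": "no"},
    {"outlook": "sunny", "windy": "TRUE", "play": "no"},
    {"outlook": "overcast", "windy": "FALSE", "play": "yes"},
    {"outlook": "rainy", "windy": "FALSE", "play": "yes"},
    {"outlook": "rainy", "windy": "TRUE", "play": "no"},
    {"outlook": "overcast", "windy": "TRUE", "play": "yes"},
]
tree = decision_tree(rows, ["outlook", "windy"], "play")
print(tree)
print(classify(tree, {"outlook": "rainy", "windy": "TRUE"}))
```

Because `partition` only creates groups for values that occur in the data, the non-empty subset check from the pseudocode is satisfied implicitly; the cost is that `classify` raises a `KeyError` for an attribute value never seen during training.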
