Data Pre-processing: Hands-on Data Mining & Machine Learning with Weka

Step1. Open the data/bank‐data.csv Dataset

Click the “Open file…” button to open a data set, then double-click the “data” directory.
Select the “bank‐data.csv” file to load the bank dataset.

The dataset contains the following attributes:

Attribute      Description
id             a unique identification number
age            age of customer in years (numeric)
sex            MALE / FEMALE
region         inner_city / rural / suburban / town
income         income of customer (numeric)
married        is the customer married (YES/NO)
children       number of children (numeric)
car            does the customer own a car (YES/NO)
save_acct      does the customer have a savings account (YES/NO)
current_acct   does the customer have a current account (YES/NO)
mortgage       does the customer have a mortgage (YES/NO)
pep            did the customer buy a PEP (Personal Equity Plan) after the last mailing (YES/NO)

Step2. Selecting or Filtering Attributes

In our sample data file, each record is uniquely identified by a customer id (the "id" attribute). We
need to remove this attribute before the data mining step. There are two ways to do this:
• Select the attribute in the attribute list and click the “Remove” button.

• Use the attribute filters in WEKA. In the "Filter" panel, click the "Choose" button.
This shows a popup window with a list of available filters. Scroll down the list and select
"weka.filters.unsupervised.attribute.Remove". Click the text box to the right of the "Choose"
button, enter 1 (the index of the "id" attribute) as the attribute index, and click "Apply".

Whichever way is used, save the result under the name "bank‐data‐R1.arff".
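
To make the effect of removing an attribute concrete, here is a minimal sketch in plain Python (not Weka code) that drops a column by its 1-based index, the same indexing convention used in the Remove filter dialog. The sample rows and the helper name remove_attribute are made up for illustration; they are not taken from the real bank‐data.csv.

```python
import csv
import io


def remove_attribute(rows, index):
    """Drop the column at 1-based `index` from every row (header included)."""
    return [row[:index - 1] + row[index:] for row in rows]


# Made-up sample in the layout of the bank data (values are illustrative only).
SAMPLE = """id,age,sex,region,income
ID001,48,FEMALE,inner_city,17546.0
ID002,40,MALE,town,30085.1
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
without_id = remove_attribute(rows, 1)  # attribute 1 is "id"

for row in without_id:
    print(",".join(row))
```

After the removal, "age" becomes attribute 1, which is why the later discretization step refers to it by that index.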

Dr. G. Bhardwaja Kumar / Prof. Tulasi Prasad Sariki, SCSE, VIT University, Chennai.

Step3. Discretization
Some techniques, such as association rule mining, can only be performed on categorical data, so
numeric or continuous attributes must first be discretized. There are three such attributes in this
data set: "age", "income", and "children".

The "children" attribute takes only the values 0, 1, 2, and 3, and we have opted to keep all of
them. We can therefore discretize it simply by replacing the keyword "numeric" in its type
declaration in the ARFF file with the set of discrete values. We do this directly in a text editor
and save the resulting relation in a separate file, "bank‐data2.arff".

Load the bank‐data2.arff dataset into Weka. If we select the "children" attribute in this new data
set, we see that it is now a categorical attribute with four possible discrete values.
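
The text-editor change to the "children" declaration can also be sketched as a small script. This is only an illustration of the edit, assuming the header declares the attribute as "@attribute children numeric"; the relation name and the helper make_nominal are hypothetical, and the header shown is a shortened, made-up fragment rather than the real file.

```python
import re


def make_nominal(arff_text, attribute, values):
    """Replace `@attribute <name> numeric` with a nominal value set."""
    pattern = re.compile(
        r"^(@attribute[ \t]+" + re.escape(attribute) + r")[ \t]+numeric[ \t]*$",
        re.IGNORECASE | re.MULTILINE,
    )
    nominal = "{" + ",".join(values) + "}"
    return pattern.sub(lambda m: m.group(1) + " " + nominal, arff_text)


# Shortened, made-up header fragment (the relation name is an assumption).
HEADER = """@relation bank-data
@attribute age numeric
@attribute children numeric
@attribute pep {YES,NO}
"""

edited = make_nominal(HEADER, "children", ["0", "1", "2", "3"])
print(edited)
```

Only the type declaration changes; the data rows stay as they are, because the values 0 to 3 already match the new nominal labels.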
To discretize "age", we once again click "Choose" in the Filter panel, but this time select
Discretize from the list. To change the defaults for this filter, click the box immediately to the
right of the "Choose" button. This opens the Discretize filter dialog box, where we enter the
indices of the attributes to be discretized. In this case we enter 1, corresponding to the
attribute "age", and 3 as the number of bins. (It is possible to discretize more than one attribute
at the same time by entering a list of attribute indices.)
Click "Apply", then save the file as bank‐data3.arff and check the dataset in a text editor.
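
As a rough picture of what binning into 3 intervals means, the sketch below implements simple equal-width binning in plain Python. It assumes the filter is left on equal-width binning and does not reproduce Weka's own code or its generated labels; the ages and the helper name equal_width_bins are made up for illustration.

```python
def equal_width_bins(values, bins):
    """Return the interior cut points and a 0-based bin index per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    cuts = [lo + width * i for i in range(1, bins)]  # interior cut points

    def bin_of(value):
        # A value belongs to the first interval whose upper cut it does not exceed.
        for i, cut in enumerate(cuts):
            if value <= cut:
                return i
        return bins - 1

    return cuts, [bin_of(v) for v in values]


ages = [18, 23, 35, 40, 48, 51, 57, 66, 67]  # made-up ages, not from the dataset
cuts, assigned = equal_width_bins(ages, 3)

for age, b in zip(ages, assigned):
    print(age, "-> bin", b)
```

The cut points depend only on the minimum and maximum of the column, so a single extreme value can leave the other bins sparsely populated. That is worth checking when inspecting the discretized file.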
Next, we apply the same process to discretize the "income" attribute (index 4 once "id" has been
removed) into 3 bins. Again, Weka automatically performs the binning and replaces the values in the
"income" column with the appropriate automatically generated labels. We save the result to
"bank‐data3.arff", replacing the older version. Further editing in a text editor produces the final
file, bank‐data‐final.arff.

Step4. Missing Values

1. Open the file “bank‐data.arff”.
2. Check whether any attribute has missing values.
3. Edit the data to create some missing values.
4. Delete some values in the “region” (nominal) and “children” (numeric) attributes. Click the “OK”
button when finished.
5. Make a note of the label with the highest count in “region” and the mean of “children”.
6. Choose the “ReplaceMissingValues” filter
(weka.filters.unsupervised.attribute.ReplaceMissingValues), then click the “Apply” button.

7. Look at the data. How were the missing values replaced?
8. Edit “bank‐data.arff” with a text editor. Make some values missing by replacing them with ‘?’
(try both nominal and numeric attributes). Save the result as “bank‐data‐missing.arff”.
9. Load “bank‐data‐missing.arff” into WEKA and observe the data and attribute information.
10. Replace the missing values using the same procedure as before.
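
The replacement rule that the steps above ask us to observe (most frequent label for a nominal attribute, mean for a numeric one) can be sketched in plain Python as follows. This is an illustration of the rule, not Weka's implementation; the column values and the helper name replace_missing are made up, and ‘?’ marks a missing value as in ARFF.

```python
from collections import Counter

MISSING = "?"  # ARFF marker for a missing value


def replace_missing(column, numeric):
    """Fill '?' with the mean (numeric) or the most frequent label (nominal)."""
    present = [v for v in column if v != MISSING]
    if numeric:
        fill = sum(float(v) for v in present) / len(present)
        return [fill if v == MISSING else float(v) for v in column]
    fill = Counter(present).most_common(1)[0][0]
    return [fill if v == MISSING else v for v in column]


# Made-up columns, not taken from the real dataset.
region = ["town", "?", "inner_city", "town", "rural", "?"]
children = ["1", "3", "?", "0", "2", "?"]

print(replace_missing(region, numeric=False))
print(replace_missing(children, numeric=True))
```

Comparing the filled-in values with the label count and mean noted in step 5 is a quick way to confirm how the filter behaved on the real data.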
