Step1: Open the bank‐data CSV dataset
Click the “Open file…” button to open a data set and double click on the “data” directory.
Select the “bank‐data.csv” file to load the bank dataset.
region    inner_city/rural/suburban/town
pep       did the customer buy a PEP (Personal Equity Plan) after the last mailing? (YES/NO)
• Use the attribute filters in WEKA. In the "Filter" panel, click the "Choose" button.
This will show a popup window with a list of available filters. Scroll down the list, select
"weka.filters.unsupervised.attribute.Remove", and click "Apply".
• Save the file with the name "bank‐data‐R1.arff".
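The effect of the Remove filter can be sketched outside WEKA as dropping one column by its index. The following is a minimal Python sketch, not WEKA's implementation. The helper name remove_attribute, the sample rows and the "id" column are assumptions made for illustration, and the index is 1-based to mirror how WEKA numbers attributes.

```python
import csv
import io

def remove_attribute(rows, index):
    """Drop the column at a 1-based index from every row."""
    pos = index - 1
    return [row[:pos] + row[pos + 1:] for row in rows]

# Hypothetical sample rows; the real file has more attributes and records.
sample = io.StringIO(
    "id,age,region,pep\n"
    "ID001,48,inner_city,YES\n"
    "ID002,40,town,NO\n"
)
rows = list(csv.reader(sample))

# Remove the first attribute (assumed here to be an identifier column).
reduced = remove_attribute(rows, 1)
for row in reduced:
    print(",".join(row))
```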
Dr. G. Bhardwaja Kumar / Prof. Tulasi Prasad Sariki, SCSE, VIT University, Chennai.
Data pre-processing Hands on Datamining & Machine Learning with Weka
Step3: Discretization
Some techniques, such as association rule mining, can only be performed on categorical data. This
requires discretizing numeric or continuous attributes. There are three such attributes
in this data set: "age", "income", and "children". The "children" attribute takes only the values
0, 1, 2, and 3, so we have opted to keep all of these values in
the data. This means we can discretize it simply by removing the keyword "numeric" as the type for
the "children" attribute in the ARFF file and replacing it with the set of discrete values. We do this
directly in a text editor and save the resulting relation in a separate file,
"bank‐data2.arff".
Load the bank‐data2.arff dataset into Weka.
If we select the "children" attribute in this new data set, we see that it is now a categorical attribute
with four possible discrete values.
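The manual header edit for "children" can also be sketched as a small script. This is only an illustration of the text-editor change described above. The function name make_nominal and the shortened header (including the relation name "bank") are assumptions, not the contents of the real file.

```python
import re

def make_nominal(arff_text, attribute, values):
    """Rewrite '@attribute <name> numeric' as a nominal declaration."""
    pattern = re.compile(
        r"^(@attribute[ \t]+" + re.escape(attribute) + r")[ \t]+numeric[ \t]*$",
        re.IGNORECASE | re.MULTILINE,
    )
    nominal = "{" + ",".join(values) + "}"
    # Keep the '@attribute <name>' part and swap only the type.
    return pattern.sub(lambda m: m.group(1) + " " + nominal, arff_text)

# Hypothetical, shortened ARFF header.
header = (
    "@relation bank\n"
    "@attribute age numeric\n"
    "@attribute children numeric\n"
    "@data\n"
)
edited = make_nominal(header, "children", ["0", "1", "2", "3"])
print(edited)
```

Only the declaration of the named attribute changes; the data rows stay as they are, because the values 0 to 3 are already valid labels of the new nominal type.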
Once again we open the Filter dialog box, but this time we select Discretize from the list.
Next, to change the defaults for this filter, click on the box immediately to the right of the "Choose"
button. This will open the Discretize filter dialog box. Enter the index of the attribute to be
discretized; in this case we enter 1, corresponding to the attribute "age". We also enter 3 as the number
of bins. (Note that it is possible to discretize more than one attribute at the same time by using a list
of attribute indices.)
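To see what binning into 3 bins means, the sketch below splits a range into equal-width intervals, which is the default behaviour of WEKA's unsupervised Discretize filter. It is a simplified illustration rather than WEKA's code. The function name equal_width_bins and the sample ages are made up, and the bins are returned as numbers instead of WEKA's generated interval labels.

```python
def equal_width_bins(values, bins):
    """Return a bin number (0 .. bins-1) for each value, using equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    # Interior cut points; a value equal to a cut point stays in the lower bin.
    cuts = [low + width * i for i in range(1, bins)]
    return [sum(1 for c in cuts if v > c) for v in values]

ages = [18, 25, 34, 35, 50, 51, 67]   # made-up ages
labels = equal_width_bins(ages, 3)
print(labels)
```

With these sample values the range 18 to 67 is cut at roughly 34.33 and 50.67, so the first three ages fall in bin 0, the next two in bin 1, and the last two in bin 2.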
Save the file as bank‐data3.arff and check the dataset in a text editor.
Next, we apply the same process to discretize the "income" attribute into 3 bins. Again, Weka
automatically performs the binning and replaces the values in the "income" column with the
appropriate automatically generated labels. We save the new file as "bank‐data3.arff", replacing
the older version. Further editing in a text editor produced "bank‐data‐final.arff".
7. Look into the data. How were those missing values replaced?
8. Edit “bank‐data.arff” with a text editor. Make some data missing by replacing values with ‘?’ (try
this with both nominal and numeric attributes). Save the result as “bank‐data‐missing.arff”.
9. Load “bank‐data‐missing.arff” into WEKA and observe the data and attribute information.
10. Replace the missing values using the same procedure as before.
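As a way of checking your observations, the sketch below fills ‘?’ entries with the column mean for numeric attributes and with the most frequent value for nominal ones. It assumes the procedure used earlier is WEKA's ReplaceMissingValues filter, which works on that mean/mode principle. The function name replace_missing and the sample columns are hypothetical.

```python
from collections import Counter

def replace_missing(column, numeric):
    """Fill '?' with the mean (numeric) or the most frequent value (nominal)."""
    known = [v for v in column if v != "?"]
    if numeric:
        fill = sum(float(v) for v in known) / len(known)
        return [fill if v == "?" else float(v) for v in column]
    fill = Counter(known).most_common(1)[0][0]
    return [fill if v == "?" else v for v in column]

# Made-up columns with one missing entry each.
ages = replace_missing(["30", "?", "50"], numeric=True)
regions = replace_missing(["town", "?", "rural", "town"], numeric=False)
print(ages)
print(regions)
```

Comparing the filled values with the attribute statistics shown in WEKA's Preprocess panel is a quick way to confirm how the filter behaved on your own file.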