MC0717 Lab Manual
Index
S.No  Experiment  Page no  Signature
1.  Demonstration of preprocessing on dataset student.arff
2.  Demonstration of preprocessing on dataset labor.arff
3.  Demonstration of Association rule process on dataset contactlenses.arff using apriori algorithm
4.  Demonstration of Association rule process on dataset test.arff using apriori algorithm
5.  Demonstration of classification rule process on dataset student.arff using j48 algorithm
6.  Demonstration of classification rule process on dataset employee.arff using j48 algorithm
7.  Demonstration of classification rule process on dataset employee.arff using id3 algorithm
8.  Demonstration of classification rule process on dataset employee.arff using naive bayes algorithm
9.  Demonstration of clustering rule process on dataset iris.arff using simple k-means
10. Demonstration of clustering rule process on dataset student.arff using simple k-means
Discretization
1) Sometimes association rule mining can only be performed on categorical data. This requires
discretizing numeric or continuous attributes. In the following example, we discretize the
age attribute.
Let us divide the values of the age attribute into three bins (intervals).
First, load the dataset (student.arff) into Weka.
Select the age attribute.
Open the filter dialog box and select weka.filters.unsupervised.attribute.Discretize from
the list.
To change the defaults for the filter, click on the box immediately to the right of the
Choose button.
Enter the index of the attribute to be discretized. In this case the attribute is age, so we
enter 1, which corresponds to the age attribute.
Enter 3 as the number of bins. Leave the remaining field values as they are.
Click the OK button.
Click Apply in the filter panel. This results in a new working relation with the selected
attribute partitioned into 3 bins.
Save the new working relation in a file called student-data-discretized.arff.
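The idea behind splitting a numeric attribute into three bins can be sketched outside Weka. The following is a minimal Python illustration of equal-width binning, not Weka's own implementation. The function name equal_width_bins and the age values are hypothetical, and the handling of values that fall exactly on a bin boundary is an assumption that may differ from the Discretize filter.

```python
def equal_width_bins(values, num_bins):
    """Return a 0-based bin index for each value, using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    if width == 0:
        # All values are identical, so everything falls into the first bin.
        return [0 for _ in values]
    labels = []
    for v in values:
        # The maximum value would otherwise land in bin num_bins, so clamp it.
        labels.append(min(int((v - lo) / width), num_bins - 1))
    return labels


# Hypothetical ages for illustration only (not read from student.arff).
ages = [18, 21, 24, 30, 35, 42, 48]
print(equal_width_bins(ages, 3))  # prints [0, 0, 0, 1, 1, 2, 2]
```

With a range of 18 to 48 and three bins, each interval is 10 wide, so the nominal attribute that replaces age has three possible values.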
Discretization
1) Sometimes association rule mining can only be performed on categorical data. This requires
discretizing numeric or continuous attributes. In the following example, we discretize the
duration attribute.
Let us divide the values of the duration attribute into three bins (intervals).
First, load the dataset (labor.arff) into Weka.
Select the duration attribute.
Open the filter dialog box and select weka.filters.unsupervised.attribute.Discretize from
the list.
To change the defaults for the filter, click on the box immediately to the right of the
Choose button.
Enter the index of the attribute to be discretized. In this case the attribute is duration, so
we enter 1, which corresponds to the duration attribute.
Enter 3 as the number of bins. Leave the remaining field values as they are.
Click the OK button.
Click Apply in the filter panel. This results in a new working relation with the selected
attribute partitioned into 3 bins.
Save the new working relation in a file called labor-data-discretized.arff.
Dataset labor.arff
The following screenshot shows the association rules that were generated when the Apriori
algorithm was applied to the given dataset.
The following screenshot shows the association rules that were generated when the Apriori
algorithm was applied to the given dataset.
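Association rules are usually ranked by support and confidence. The sketch below shows how those two measures are computed for a single rule. It is a simplified illustration and does not perform the full Apriori search. The rows and the item names are made up for the example and are not taken from any of the datasets used in this manual.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    ante = support(transactions, antecedent)
    if ante == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / ante


# Hypothetical rows, each written as a set of attribute=value items.
rows = [
    {"tear=reduced", "lenses=none"},
    {"tear=reduced", "lenses=none"},
    {"tear=normal", "lenses=soft"},
    {"tear=normal", "lenses=none"},
]
print(support(rows, {"tear=reduced"}))                        # prints 0.5
print(confidence(rows, {"tear=reduced"}, {"lenses=none"}))    # prints 1.0
```

A rule such as tear=reduced ==> lenses=none is only reported when both values clear the minimum thresholds configured for the algorithm.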
The following screenshot shows the classification rules that were generated when the J48
algorithm was applied to the given dataset.
6. Demonstration of classification rule process on dataset employee.arff using j48 algorithm
Aim: This experiment illustrates the use of the J48 classifier in Weka. The sample data set used in
this experiment is the employee data, available in ARFF format. This document assumes that
appropriate data preprocessing has been performed.
Steps involved in this experiment:
Step 1: We begin the experiment by loading the data (employee.arff) into Weka.
Step 2: Next we select the Classify tab and click the Choose button to select the
J48 classifier.
Step 3: Now we specify the various parameters. These can be specified by clicking in the text
box to the right of the Choose button. In this example, we accept the default values. The default
version does perform some pruning but does not perform reduced-error pruning.
Step 4: Under Test options in the main panel, we select 10-fold cross-validation as
our evaluation approach. Since we do not have a separate evaluation data set, this is necessary to
get a reasonable idea of the accuracy of the generated model.
Step 5: We now click Start to generate the model. The ASCII version of the tree, as well as the
evaluation statistics, will appear in the right panel when the model construction is complete.
Step 6: Note that the classification accuracy of the model is about 69%. This indicates that more
work may be needed (either in preprocessing or in selecting the parameters for the
classification).
Step 7: Weka also lets us view a graphical version of the classification tree. This can
be done by right-clicking the last result set and selecting Visualize tree from the pop-up
menu.
Step 8: We will use our model to classify new instances.
Step 9: In the main panel, under Test options, click the Supplied test set radio button and
then click the Set button. This will pop up a window that allows you to open the file
containing the test instances.
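The 10-fold cross-validation used as the evaluation approach in the steps above can be sketched as follows. This is a simplified illustration, not Weka's procedure: the folds here are a plain shuffled split with no stratification by class, which is an assumption. The names k_fold_indices, cross_validate and majority_baseline, the labels, and the seed value are all hypothetical. A majority-class baseline stands in for the decision tree so that the example stays self-contained.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=1):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n, k, evaluate, seed=1):
    """Hold out each fold once; evaluate returns the number of correct predictions."""
    folds = k_fold_indices(n, k, seed)
    correct = 0
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        correct += evaluate(train_idx, test_idx)
    return correct / n


# Hypothetical class labels for 14 instances.
labels = ["yes"] * 9 + ["no"] * 5


def majority_baseline(train_idx, test_idx):
    """Predict the most common training label for every test instance."""
    guess = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return sum(1 for i in test_idx if labels[i] == guess)


print(cross_validate(len(labels), 10, majority_baseline))
```

Every instance is used for testing exactly once, so the reported accuracy is an estimate over the whole data set even though no separate test file exists.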
The following screenshot shows the classification rules that were generated when the J48
algorithm was applied to the given dataset.
The following screenshot shows the classification rules that were generated when the ID3
algorithm was applied to the given dataset.
The following screenshot shows the classification rules that were generated when the naive Bayes
algorithm was applied to the given dataset.
The following screenshot shows the clustering results that were generated when the simple
k-means algorithm was applied to the given dataset.
10. Demonstration of clustering rule process on dataset student.arff using simple k-means
Aim: This experiment illustrates the use of simple k-means clustering with the Weka Explorer.
The sample data set used for this example is based on the student data available in ARFF
format. This document assumes that appropriate preprocessing has been performed. This
student dataset includes 14 instances.
Steps involved in this Experiment
Step 1: Run the Weka Explorer and load the data file student.arff in the Preprocess interface.
Step 2: In order to perform clustering, select the Cluster tab in the Explorer and click on the
Choose button. This step results in a dropdown list of available clustering algorithms.
Step 3: In this case we select simple k-means.
Step 4: Next, click the text box to the right of the Choose button to get the pop-up window shown
in the screenshots. In this window we enter 6 as the number of clusters and leave the
value of the seed as it is. The seed value is used in generating a random number, which is
used for making the initial assignments of instances to clusters.
Step 5: Once the options have been specified, we run the clustering algorithm. We
must make sure that, in the Cluster mode panel, the Use training set option is
selected, and then we click the Start button. This process and the resulting window are shown in the
following screenshots.
Step 6: The result window shows the centroid of each cluster as well as statistics on the
number and the percentage of instances assigned to the different clusters. Here the cluster centroids are
the mean vectors of each cluster. These centroids can be used to characterize the clusters.
Step 7: Another way of understanding the characteristics of each cluster is through visualization.
To do this, right-click the result set in the Result list panel and select
Visualize cluster assignments.
Interpretation of the above visualization
From the above visualization, we can understand the distribution of age and instance number
in each cluster. For instance, each cluster is dominated by particular age values. By changing
the color dimension to other attributes, we can see their distribution within each
cluster.
Step 8: We can also save the resulting dataset, which includes each instance along with its
assigned cluster. To do so, we click the Save button in the visualization window and save the
result as student k-mean. The top portion of this file is shown in the following figure.
The following screenshot shows the clustering results that were generated when the simple k-means algorithm was applied to the given dataset.
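The roles of the seed, the centroids, and the per-cluster counts described in this experiment can be illustrated with a small k-means sketch. It is a minimal version written for this manual, assuming numeric attributes only and squared Euclidean distance, and it is not Weka's SimpleKMeans code. The function names, the seed value of 10, and the data points are hypothetical.

```python
import random


def k_means(points, k, seed=10, max_iter=100):
    """Cluster tuples of numbers into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # The seed decides which instances serve as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(pt, centroids[c])))
            for pt in points
        ]
        if new_labels == labels:
            break  # Assignments stopped changing, so the run has converged.
        labels = new_labels
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                # Each centroid is the mean vector of the instances in its cluster.
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


def cluster_summary(labels, k):
    """Return (cluster, count, percent) for each cluster."""
    n = len(labels)
    return [(c, labels.count(c), 100.0 * labels.count(c) / n) for c in range(k)]


# Hypothetical (age, score) pairs for illustration only.
data = [(18.0, 55.0), (19.0, 60.0), (25.0, 82.0), (26.0, 80.0), (40.0, 35.0), (42.0, 30.0)]
centroids, labels = k_means(data, 3)
for cluster, count, percent in cluster_summary(labels, 3):
    print(cluster, centroids[cluster], count, percent)
```

Running the sketch with a different seed may pick different starting instances and therefore produce a different clustering, which is why the seed is left unchanged when results need to be repeatable.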