JOURNAL OF SELÇUK UNIVERSITY NATURAL AND APPLIED SCIENCE 2012 VOL.1 NO.2
ONLINE ISSN:2147-3781
RULE LEARNING WITH MACHINE LEARNING ALGORITHMS AND
ARTIFICIAL NEURAL NETWORKS
Yusuf Uzun1, Gülay Tezel2
1 Seydişehir Vocational School of Higher Education, Selçuk University, Konya 42370, Turkey, yuzun@selcuk.edu.tr
2 Department of Computer Engineering, Selçuk University, Konya 42370, Turkey, gtezel@selcuk.edu.tr
ABSTRACT
Machine learning, a branch of artificial intelligence, is a scientific discipline that is
concerned with the design and development of algorithms that allow computers to
evolve behaviors based on empirical data, such as from sensor data or databases.
Artificial neural networks are composed of interconnecting artificial neurons
(programming constructs that mimic the properties of biological neurons). Artificial
neural networks may either be used to gain an understanding of biological neural
networks, or for solving artificial intelligence problems without necessarily creating a
model of a real biological system. In this paper, we analyse and classify instances in a data
set using machine learning algorithms and artificial neural networks. Furthermore,
classification rules are constructed with the machine learning algorithms and artificial
neural networks.
Keywords: classification, learning rules, machine learning, neural networks.
1. INTRODUCTION
Machine learning is about learning rules from data, that is, about making computers
learn from experience. These techniques include concept learning.
Machine learning researchers have grouped some of the techniques into three
categories. One is active learning which deals with interaction and asking questions
during learning, the second is learning from prior knowledge, and the third is learning
incrementally (Mitchell, 1997). Machine Learning is generally taken to encompass
automatic computing procedures based on logical or binary operations that learn a task
from a series of examples. Here we are just concerned with classification, and it is
arguable what should come under the Machine Learning umbrella (Langley and Simon,
1995).
In this paper, classification and analysis are performed on the data set using machine
learning algorithms: Naive Bayes, JRip (an implementation of the RIPPER rule
learner), Ridor, SMO (support vector machine), J48 decision tree (an implementation
of C4.5), LMT (logistic model tree), Conjunctive Rule, Decision Tables, NNge
(non-nested generalized exemplars), KStar (an instance-based classifier), IBk (k-nearest
neighbours) and PART (Witten and Frank, 2000). Furthermore, classification is
performed using an artificial neural network.
Artificial neural networks (ANNs) provide a general, practical method for learning real-valued, discrete-valued, and vector-valued functions from examples. Algorithms such as
backpropagation use gradient descent to tune network parameters to best fit a training
set of input-output pairs. ANN learning is robust to errors in the training data and has
been successfully applied to problems such as interpreting visual scenes, speech
recognition, and learning robot control strategies (Michie et al., 1994).
In this paper, the machine learning algorithms listed above and ANNs are applied to
medical data taken from breast cancer patients. To decide on postoperative treatment,
the following information is used: the patient's internal temperature, the patient's
surface temperature, oxygen saturation in %, the last measurement of blood pressure,
the stability of the patient's surface temperature, the stability of the patient's core
temperature, the stability of the patient's blood pressure, and the patient's perceived
comfort at discharge, measured as an integer between 0 and 20. Based on this
information, the discharge decision is made: the patient is sent to the intensive care
unit, prepared to go home, or sent to the general hospital floor.
2. MATERIALS AND METHODS
This breast cancer domain was obtained from the University Medical Centre, Institute
of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for
providing the data. The breast cancer data and attributes taken from the patients in the
data set are given below. A total of ten attributes is used, listed in order in Table 1:
Table 1. Breast Cancer Data Set.

Attribute     Values
age           10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90-99
menopause     lt40, ge40, premeno
tumor-size    0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59
inv-nodes     0-2, 3-5, 6-8, 9-11, 12-14, 15-17, 18-20, 21-23, 24-26, 27-29, 30-32, 33-35, 36-39
node-caps     yes, no
deg-malig     1, 2, 3
breast        left, right
breast-quad   left-up, left-low, right-up, right-low, central
irradiat      yes, no
Class         no-recurrence-events, recurrence-events
This is one of three domains provided by the Oncology Institute that has repeatedly
appeared in the machine learning literature (see also lymphography and primary-tumor).
This data set includes 201 instances of one class and 85 instances of another class. The
instances are described by 9 attributes, some of which are linear and some nominal.
Number of instances: 286. Number of attributes: 9 + the class attribute. Values of the
class attribute: 1. no-recurrence-events, 2. recurrence-events. Missing attribute values
are denoted by "?". Class distribution: no-recurrence-events: 201 instances,
recurrence-events: 85 instances.
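The attribute layout above can be checked programmatically. The sketch below, in Python rather than the WEKA toolkit used in the paper, counts the class distribution and missing values for rows in this nine-attribute-plus-class format; the two rows shown are illustrative, not taken from the data set.

```python
from collections import Counter

# Hypothetical rows in the breast cancer format described above: nine
# nominal attributes followed by the class label; "?" marks a missing value.
rows = [
    ["40-49", "premeno", "20-24", "0-2", "no", "2", "left", "left-low", "no",
     "no-recurrence-events"],
    ["50-59", "ge40", "30-34", "3-5", "yes", "3", "left", "left-low", "yes",
     "recurrence-events"],
]

def class_distribution(rows):
    """Count instances per class; the class label is the last field."""
    return Counter(row[-1] for row in rows)

def missing_count(rows):
    """Count attribute values denoted by '?'."""
    return sum(value == "?" for row in rows for value in row)

print(class_distribution(rows))
print(missing_count(rows))
```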
WEKA (The Waikato Environment for Knowledge Analysis) software is used to apply
the machine learning algorithms. WEKA is a workbench for machine learning that
is intended to aid in the application of machine learning techniques to a variety of real-world problems (Witten and Frank, 2000). The WEKA machine learning workbench
provides a general-purpose environment for automatic classification, regression,
clustering and feature selection-common data mining problems in bioinformatics
research. It contains an extensive collection of machine learning algorithms and data
pre-processing methods complemented by graphical user interfaces for data exploration
and the experimental comparison of different machine learning techniques on the same
problem (Frank et al., 2004).
3. MACHINE LEARNING ALGORITHMS
In this paper, machine learning algorithms developed for data mining is used. These
algorithms have been determined below.
OneR, described by Holte (1993), learns a one-level decision tree; it is a simple
error-based rule induction approach (Quinlan, 1993). Naive Bayes is a
statistical learning algorithm that applies a simplified version of Bayes rule in order to
compute the posterior probability of a category given the input attribute values of an
example situation. Prior probabilities for categories and attribute values conditioned on
categories are estimated from frequency counts computed from the training data. Naive
Bayes is a simple and fast learning algorithm that often outperforms more sophisticated
methods (John et al, 1995).
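The Naive Bayes computation described above, priors and conditional attribute frequencies estimated from counts, can be sketched in a few lines. The toy training set below is hypothetical, and Laplace smoothing is added (an assumption, not stated in the paper) to avoid zero probabilities.

```python
from collections import Counter

# Minimal Naive Bayes sketch: the posterior of a class is proportional to
# its prior times the per-class frequencies of the observed attribute
# values. The toy training data is illustrative, not the paper's data.
train = [
    ({"deg-malig": "3", "node-caps": "yes"}, "recurrence-events"),
    ({"deg-malig": "1", "node-caps": "no"}, "no-recurrence-events"),
    ({"deg-malig": "2", "node-caps": "no"}, "no-recurrence-events"),
]

def naive_bayes_predict(train, example):
    classes = Counter(label for _, label in train)
    n = len(train)
    scores = {}
    for c, count in classes.items():
        score = count / n  # prior P(c) from frequency counts
        for attr, value in example.items():
            # frequency of this attribute value within class c,
            # with Laplace smoothing (an assumption) against zero counts
            matches = sum(1 for x, label in train
                          if label == c and x.get(attr) == value)
            score *= (matches + 1) / (count + 2)
        scores[c] = score
    return max(scores, key=scores.get)

print(naive_bayes_predict(train, {"deg-malig": "1", "node-caps": "no"}))
```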
JRip (Weka's implementation of the RIPPER rule learner) is a fast algorithm for
learning "IF-THEN" rules. Like decision trees, rule learning algorithms are popular
because the knowledge representation is very easy to interpret. Ridor is the
implementation of a RIpple-DOwn Rule learner. It generates the default rule first and
then the exceptions for the default rule with the least (weighted) error rate (Witten and
Frank, 2000). SMO implements John C. Platt's sequential minimal optimization
algorithm for training a support vector classifier, for example using RBF kernels
(Platt, 1998). J48 is a class for generating an unpruned or pruned C4.5 decision
tree (Quinlan, 1992). LMT is a class for the "logistic model tree" classifier
(Witten and Frank, 2000). Conjunctive Rule is a class that implements a single
conjunctive rule learner able to predict numeric and nominal class labels.
Decision Table is a class for building and using a simple decision table majority
classifier (Kohavi, 1995). The NNge classifier is a nearest-neighbour-like algorithm
using non-nested generalized exemplars (Witten and Frank, 2000). KStar is an
instance-based classifier: the class of a test instance is based upon the class of those
training instances similar to it, as determined by some similarity function (Aha and
Kibler, 1991). IBk is a k-nearest neighbours classifier. The PART classification
algorithm constitutes rule sets from C4.5 decision trees: PART is a rule generator
that uses J48 to generate pruned decision trees from which rules are extracted
(Frank and Witten, 1998) (Witten and Frank, 2000).
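The instance-based idea behind KStar and IBk, classifying a test instance by the class of similar training instances, can be illustrated with a minimal 1-nearest-neighbour sketch. The attribute-overlap similarity and the toy instances below are illustrative assumptions; they are not the entropic distance that KStar actually uses.

```python
# Instance-based classification sketch: a test instance takes the class of
# its most similar training instance. Similarity here is a simple count of
# matching attribute positions; the instances are toy triples of the form
# (deg-malig, node-caps, breast).
train = [
    (("3", "yes", "left"), "recurrence-events"),
    (("1", "no", "right"), "no-recurrence-events"),
    (("2", "no", "left"), "no-recurrence-events"),
]

def overlap(a, b):
    """Number of attribute positions on which two instances agree."""
    return sum(x == y for x, y in zip(a, b))

def nearest_neighbour_predict(train, example):
    """Return the class of the training instance most similar to example."""
    best_instance, best_label = max(train, key=lambda t: overlap(t[0], example))
    return best_label

print(nearest_neighbour_predict(train, ("3", "yes", "right")))
```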
4. ARTIFICIAL NEURAL NETWORKS (ANNs) CLASSIFIER
In this paper, multi layer perceptron networks (MLPs) classifier developed for data
mining is used. The kind of multilayer networks learned by the backpropagation
algorithm are capable of expressing a rich variety of nonlinear decision surfaces. For
example, a typical multilayer network and decision surface is depicted in Figure 1.
Figure 1. A feedforward Multi-Layer Perceptron (MLP)
An MLP network is composed of a number of identical units called neurons organized
in layers, with those on one layer connected to those on the next layer, except for the
last layer, the output layer. Indeed, the MLP architecture is structured into an input layer of
neurons, one or more hidden layers and one output layer. Neurons belonging to adjacent
layers are usually fully connected and the activation function of the neurons is generally
sigmoid or linear. In fact, the various types and architectures are identified both by the
different topologies adopted for the connections and by the choice of the activation
function. A complete network for a 3 x 5 x 2 functional mapping is shown by Figure 1
as an example (Taleb et al, 2009).
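A forward pass through the fully connected 3 x 5 x 2 mapping of Figure 1 might be sketched as follows; the random weights are placeholders for the values that backpropagation would learn.

```python
import math
import random

# Forward pass of a fully connected 3 x 5 x 2 MLP: 3 inputs, one hidden
# layer of 5 sigmoid neurons, 2 sigmoid outputs. The weights are random
# placeholders, not trained values.
random.seed(0)

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def layer(inputs, weights, biases):
    """One fully connected layer of sigmoid neurons."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def make_layer(n_in, n_out):
    """Random weight matrix and bias vector for a layer."""
    weights = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [random.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases

w1, b1 = make_layer(3, 5)   # input -> hidden
w2, b2 = make_layer(5, 2)   # hidden -> output

hidden = layer([0.2, 0.5, 0.9], w1, b1)
output = layer(hidden, w2, b2)
print(len(hidden), len(output))
```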
Figure 2. The sigmoid threshold unit
The sigmoid unit is illustrated in Figure 2. Like the perceptron, the sigmoid unit first
computes a linear combination of its inputs, then applies a threshold to the result. In the
case of the sigmoid unit, however, the threshold output is a continuous function of its
input. More precisely, the sigmoid unit computes its output o as

o = σ(w · x)                                    (1)

where

σ(y) = 1 / (1 + e^(-y))                         (2)

Here w is the weight vector and x the input vector. σ is often called the sigmoid
function or, alternatively, the logistic function. Note its
output ranges between 0 and 1, increasing monotonically with its input (see the
threshold function plot in Figure 2) (Tan and Eshelman, 1988).
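The logistic function described above is easy to verify numerically. The sketch below also includes its derivative, σ(y)(1 - σ(y)), which backpropagation relies on.

```python
import math

# The sigmoid (logistic) function: output in (0, 1), increasing
# monotonically with its input.
def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

# Its derivative has the convenient closed form sigma(y) * (1 - sigma(y)).
def sigmoid_derivative(y):
    s = sigmoid(y)
    return s * (1.0 - s)

print(sigmoid(0.0))     # 0.5, the midpoint of the output range
print(sigmoid(10.0))    # close to 1
print(sigmoid(-10.0))   # close to 0
```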
5. STUDY RESULTS
Here, analysis is made using the experimental results obtained from classifications on
the breast cancer data set with machine learning algorithms and ANNs.
Each machine learning algorithm has been applied separately to the whole data set, and
the classification results are denoted in Table 2. 10-fold cross-validation has been
performed on the data set: the data set is divided into 10 parts; 1 part is used for testing
and the other 9 parts are used for training.
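The 10-fold procedure just described might be sketched as follows, with a simple interleaved split standing in for WEKA's stratified folds (an assumption):

```python
# 10-fold cross-validation sketch: the data set is split into 10 parts;
# each part serves once as the test set while the remaining 9 parts are
# used for training.
def cross_validation_folds(instances, k=10):
    folds = []
    for i in range(k):
        test = instances[i::k]   # every k-th instance, offset by i
        train = [x for j, x in enumerate(instances) if j % k != i]
        folds.append((train, test))
    return folds

data = list(range(286))          # 286 instances, as in the data set above
folds = cross_validation_folds(data)
print(len(folds))
```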
The kappa statistic (KS) in Table 2 is used as a means of classifying agreement in
categorical data. A kappa coefficient of 1 means statistically perfect modeling, whereas
0 means every model value was different from the actual value. The KS value is
calculated with the formula given below (Eq. 3).
KS = (P(A) - P(E)) / (1 - P(E))                 (3)
where P(A) is the proportion of times the model value was equal to the actual value
and P(E) is the expected proportion by chance.
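As a sketch, the kappa statistic of Eq. (3) can be computed directly from a confusion matrix; the matrix below is illustrative, not one of the paper's results.

```python
# Kappa statistic from a confusion matrix: P(A) is the observed agreement
# (the diagonal, i.e. accuracy) and P(E) the agreement expected by chance
# from the row and column marginal totals.
def kappa(confusion):
    n = sum(sum(row) for row in confusion)
    p_a = sum(confusion[i][i] for i in range(len(confusion))) / n
    p_e = sum(
        (sum(confusion[i]) / n) * (sum(row[i] for row in confusion) / n)
        for i in range(len(confusion))
    )
    return (p_a - p_e) / (1.0 - p_e)

# Illustrative 2 x 2 matrix (rows = actual class, columns = predicted):
m = [[160, 41],   # no-recurrence-events
     [45, 40]]    # recurrence-events
print(round(kappa(m), 4))
```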
Table 2. Performance analysis results of machine learning algorithms on data set.

Algorithm          Correctly Classified   Kappa       Mean Absolute
                   Instances (%)          Statistic   Error
Conjunctive Rule   66.43                  0.0986      0.4074
Naive Bayes        71.67                  0.3857      0.3272
Decision Tables    73.42                  0.2462      0.3748
JRip               70.97                  0.2409      0.3798
Ridor              70.97                  0.1866      0.2902
SMO                69.58                  0.1983      0.3042
J48                75.52                  0.2826      0.3676
LMT                75.17                  0.3042      0.3589
NNge               65.03                  0.1212      0.3497
KStar              73.42                  0.2864      0.3354
PART               71.32                  0.1995      0.3650
Table 3. Performance analysis results of ANN algorithm on data set.
Kappa Mean
Correctly
Hidden Epoch Learning Statistic Absolute Classified
Layer
Rate
Error
Instances
(%)
1
500
0.3
0.2637 0.3402
71.32
1
1000 0,3
0.2816 0.3397
72.02
2
500
0.3
0.2851 0.317
73.07
2
1000 0.3
0.2971 0.3194
73.42
2
1000 0.2
0.3258 0.3188
73.07
3
500
0.3
0.2775 0.3133
72.37
3
1000 0.3
0.2828 0.3138
72.37
3
1000 0.2
0.3386 0.3198
73.77
4
500
0.2
0.245
0.3181
70.27
4
500
0.3
0.2322 0.327
69.23
4
1000 0.3
0.2316 0.3293
68.88
Statistical results of the algorithms for the breast cancer data set are given in Table 2
and Table 3. The Naive Bayes algorithm achieved the best modeling, with KS = 0.3857,
in classifying the data set; the Conjunctive Rule algorithm achieved the worst modeling,
with KS = 0.0986.
The classification accuracy rates of the JRip and Ridor algorithms are equal (70.97%),
as are those of the Decision Tables and KStar algorithms (73.42%). The J48 algorithm
achieved the best classification accuracy on the data set, 75.52%, and the NNge
algorithm the worst, 65.03%. The mean absolute error of the Conjunctive Rule
algorithm is the highest
value (0.4074), while that of the Ridor algorithm is the lowest (0.2902).
The best classification rate of the ANN algorithm was found by varying its parameters.
The best classification accuracy of the ANN (multilayer perceptron) algorithm on the
data set, 73.77%, is shown in Table 3. Here, the number of hidden layers of the ANN
is 3, the number of epochs is 1000, and the learning rate is 0.2.
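The parameter selection behind Table 3 amounts to a grid search over hidden-layer setting, epochs and learning rate. In the sketch below, evaluate() is a hypothetical stand-in for training the network and measuring its cross-validated accuracy; only two of its return values are filled in from Table 3.

```python
from itertools import product

# Grid-search sketch over the parameter space of Table 3. evaluate() is a
# placeholder: in practice it would train the MLP with the given settings
# and return classification accuracy. The two filled-in scores come from
# Table 3; every other combination defaults to a dummy value.
def evaluate(hidden, epochs, learning_rate):
    scores = {(3, 1000, 0.2): 73.77,   # best configuration in Table 3
              (2, 1000, 0.3): 73.42}
    return scores.get((hidden, epochs, learning_rate), 70.0)

grid = product([1, 2, 3, 4], [500, 1000], [0.2, 0.3])
best = max(grid, key=lambda params: evaluate(*params))
print(best)
```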
Below, in Table 4 and Table 5, rules obtained by some of the algorithms from the
breast cancer data and attributes in the data set are given.
Table 4. JRip algorithm rule list.
JRIP algorithm rules
(deg-malig = 3) and (node-caps = yes) => Class=recurrence-events
(inv-nodes = 3-5) and (breast = left) => Class=recurrence-events
=> Class=no-recurrence-events
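The ordered rule list in Table 4 can be read as a first-match interpreter: the two conjunctive rules fire first, and the final rule is the default.

```python
# The JRip rule list of Table 4, written out as an ordered first-match
# rule interpreter. The first two rules predict recurrence-events; the
# final default rule predicts no-recurrence-events.
def jrip_predict(instance):
    if instance.get("deg-malig") == "3" and instance.get("node-caps") == "yes":
        return "recurrence-events"
    if instance.get("inv-nodes") == "3-5" and instance.get("breast") == "left":
        return "recurrence-events"
    return "no-recurrence-events"

print(jrip_predict({"deg-malig": "3", "node-caps": "yes"}))
print(jrip_predict({"deg-malig": "1", "node-caps": "no"}))
```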
Table 5. PART algorithm rule list.
PART algorithm rules
node-caps = no AND inv-nodes = 0-2 AND tumor-size = 10-14: no-recurrence-events
node-caps = no AND inv-nodes = 0-2 AND deg-malig = 1: no-recurrence- events
deg-malig = 2 AND inv-nodes = 0-2 AND breast-quad = left_low: no-recurrence-events
deg-malig = 2 AND inv-nodes = 0-2 AND breast-quad = left_up: no-recurrence-events
deg-malig = 2 AND tumor-size = 20-24 AND irradiat = no: no-recurrence-events
deg-malig = 2 AND tumor-size = 25-29: no-recurrence-events
node-caps = no AND tumor-size = 20-24 AND inv-nodes = 0-2: no-recurrence-events
deg-malig = 1: no-recurrence-events
deg-malig = 2 AND tumor-size = 0-4: no-recurrence-events
deg-malig = 2 AND tumor-size = 35-39: no-recurrence-events
tumor-size = 20-24: recurrence-events
deg-malig = 2 AND tumor-size = 30-34 AND irradiat = no: no-recurrence-events
tumor-size = 40-44 AND breast-quad = left_up: no-recurrence-events
node-caps = yes AND breast-quad = left_low AND deg-malig = 3: recurrence-events
tumor-size = 30-34: recurrence-events
tumor-size = 25-29 AND breast = left: recurrence-events
tumor-size = 15-19: no-recurrence-events
tumor-size = 25-29 AND menopause = ge40: no-recurrence-events
tumor-size = 35-39 AND menopause = premeno: recurrence-events
: no-recurrence-events
6. RESULTS AND DISCUSSIONS
During the number of instances decreased fall in performance of algorithms observed. Performances
of algorithms have been increased on large number instances. Data mining indicated high
performance on large dimension databases. Used machine learning algorithms have been developed
for data mining. Therefore, used machine learning algorithms have been indicated high performance
on large dimension databases.
Their ability to learn by example makes neural networks very flexible and powerful:
there is no need to devise an algorithm to perform a specific task, i.e. there is no need
to understand the internal mechanisms of that task. Along with the various advantages
of neural networks there are disadvantages too: they cannot be programmed to perform
a specific task, and the examples must be selected carefully, otherwise useful time is
wasted or, even worse, the network might function incorrectly.
Some of the algorithms showed high performance and produced the best models, while
others showed poor performance and produced weaker models.
REFERENCES
Langley, P. & Simon, H. 1995. Applications of machine learning and rule induction.
Communications of the ACM, 38 (11), 11-46.
Quinlan, J.R. 1992. C4.5: Programs for Machine Learning, Morgan Kaufmann.
Platt, J. 1998. Fast Training of Support Vector Machines using Sequential Minimal
Optimization. Advances in Kernel Methods - Support Vector Learning, MIT Press.
Holte, R.C. 1993. Very simple classification rules perform well on most commonly used
datasets. Machine Learning, Vol. 11, pp. 63-91.
Quinlan, R. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann
Publishers, San Mateo, CA.
Kohavi, R. 1995. The Power of Decision Tables. In Proc European Conference on
Machine Learning.
Mitchell, T.M. 1997. Machine Learning. McGraw-Hill Science.
Cleary, J.G. & Trigg, L.E. 1995. K*: An Instance-based Learner Using an
Entropic Distance Measure. In: 12th International Conference on Machine Learning,
108-114.
Aha, D. & Kibler, D. 1991. Instance-based learning algorithms. Machine Learning.
6:37-66.
Frank, E. & Witten, I.H. 1998. Generating accurate rule sets without global
optimization. In Proceedings of the 15th International Conference on Machine Learning.
pp. 144-151. Morgan Kaufmann.
Michie, D., Spiegelhalter, D.J. & Taylor, C.C. 1994. Machine Learning, Neural and
Statistical Classification, Ellis Horwood.
Frank, E. 2000. Machine Learning Techniques For Data Mining, University of Waikato,
New Zealand.
Witten, I.H. & Frank, E. 2000. Weka machine learning algorithms in Java, in Data
Mining: Practical Machine Learning Tools and Techniques with Java
Implementations, Morgan Kaufmann Publishers, pp. 265-320.
Tan, M. & Eshelman, L. 1988. Using weighted networks to represent classification
knowledge in noisy domains. Proceedings of the Fifth International Conference on
Machine Learning, 121-134, Ann Arbor, MI.
Taleb, R., Meroufel A. & Wira, P. 2009. Harmonic elimination control of an inverter
based on an artificial neural network strategy, IFAC International Conference on
Intelligent Control Systems and Signal Processing, Istanbul, Turkey.
Frank, E., Hall, M., Trigg, L., Holmes, G. & Witten, I.H. 2004. Data mining in
bioinformatics using Weka. Bioinformatics Applications Note, 2479-2481.