JOURNAL OF SELÇUK UNIVERSITY NATURAL AND APPLIED SCIENCE 2012 VOL.1 NO.2
ONLINE ISSN:2147-3781
RULE LEARNING WITH MACHINE LEARNING ALGORITHMS AND
ARTIFICIAL NEURAL NETWORKS
Yusuf Uzun1, Gülay Tezel2
1 Seydişehir Vocational School of Higher Education, Selçuk University, Konya 42370, Turkey, yuzun@selcuk.edu.tr
2 Department of Computer Engineering, Selçuk University, Konya 42370, Turkey, gtezel@selcuk.edu.tr
ABSTRACT
Machine learning, a branch of artificial intelligence, is a scientific discipline that is
concerned with the design and development of algorithms that allow computers to
evolve behaviors based on empirical data, such as from sensor data or databases.
Artificial neural networks are composed of interconnecting artificial neurons
(programming constructs that mimic the properties of biological neurons). Artificial
neural networks may either be used to gain an understanding of biological neural
networks, or for solving artificial intelligence problems without necessarily creating a
model of a real biological system. In this paper, we analyse and classify instances in a data
set using machine learning algorithms and artificial neural networks. Furthermore,
classification rules are constructed with the machine learning algorithms and artificial
neural networks.
Keywords: classification, learning rules, machine learning, neural networks.
1. INTRODUCTION
Machine learning is about learning rules from data, that is, about making computers
learn from experience. These techniques include concept learning.
Machine learning researchers have grouped some of the techniques into three
categories. One is active learning which deals with interaction and asking questions
during learning, the second is learning from prior knowledge, and the third is learning
incrementally (Mitchell, 1997). Machine Learning is generally taken to encompass
automatic computing procedures based on logical or binary operations that learn a task
from a series of examples. Here we are just concerned with classification, and it is
arguable what should come under the Machine Learning umbrella (Langley and Simon,
1995).
In this paper, classification and analysis are performed on the data set using machine
learning algorithms: Naive Bayes, JRip (an implementation of the RIPPER rule
learner), Ridor, SMO (support vector machine), J48 decision tree (an implementation
of C4.5), LMT (logistic model tree), Conjunctive Rule, Decision Tables, NNge
(non-nested generalized exemplars), KStar (an instance-based classifier), IBk (k-nearest
neighbours) and PART (Witten and Frank, 2000). Furthermore, classification is
performed using an artificial neural network.
Artificial neural networks (ANNs) provide a general, practical method for learning real-valued, discrete-valued, and vector-valued functions from examples. Algorithms such as
backpropagation use gradient descent to tune network parameters to best fit a training
set of input-output pairs. ANN learning is robust to errors in the training data and has
been successfully applied to problems such as interpreting visual scenes, speech
recognition, and learning robot control strategies (Michie et al., 1994).
In this paper, the machine learning algorithms listed above and ANNs are applied to
medical data taken from breast cancer patients. To decide on postoperative treatment,
the following information is used: the patient's internal temperature, the patient's
surface temperature, oxygen saturation in %, the last measurement of blood pressure,
the stability of the patient's surface temperature, the stability of the patient's core
temperature, the stability of the patient's blood pressure, and the patient's perceived
comfort at discharge, measured as an integer between 0 and 20. Based on this
information, the discharge decision is made: the patient is sent to the intensive care
unit, prepared to go home, or sent to the general hospital floor.
2. MATERIALS AND METHODS
This breast cancer domain was obtained from the University Medical Centre, Institute
of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for
providing the data. The breast cancer data and attributes taken from the patients in the
data set are given below. A total of ten attributes is used, listed in order in Table 1:
Table 1. Breast Cancer Data Set.

Attribute     Values
age           10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90-99
menopause     lt40, ge40, premeno
tumor-size    0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59
inv-nodes     0-2, 3-5, 6-8, 9-11, 12-14, 15-17, 18-20, 21-23, 24-26, 27-29, 30-32, 33-35, 36-39
node-caps     yes, no
deg-malig     1, 2, 3
breast        left, right
breast-quad   left-up, left-low, right-up, right-low, central
irradiat      yes, no
Class         no-recurrence-events, recurrence-events
This is one of three domains provided by the Oncology Institute that has repeatedly
appeared in the machine learning literature (see also lymphography and primary-tumor).
This data set includes 201 instances of one class and 85 instances of another class. The
instances are described by 9 attributes, some of which are linear and some nominal.
Number of instances: 286. Number of attributes: 9 + the class attribute. Values of the
class attribute: 1. no-recurrence-events, 2. recurrence-events. Missing attribute values
are denoted by "?". Class distribution: no-recurrence-events: 201 instances,
recurrence-events: 85 instances.
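The attribute layout above can be checked programmatically. The sketch below, in Python rather than the WEKA toolkit used in the paper, counts the class distribution and missing values for rows in this nine-attribute-plus-class format; the two rows shown are illustrative, not taken from the data set.

```python
from collections import Counter

# Hypothetical rows in the breast cancer format described above: nine
# nominal attributes followed by the class label; "?" marks a missing value.
rows = [
    ["40-49", "premeno", "20-24", "0-2", "no", "2", "left", "left-low", "no",
     "no-recurrence-events"],
    ["50-59", "ge40", "30-34", "3-5", "yes", "3", "left", "left-low", "yes",
     "recurrence-events"],
]

def class_distribution(rows):
    """Count instances per class; the class label is the last field."""
    return Counter(row[-1] for row in rows)

def missing_count(rows):
    """Count attribute values denoted by '?'."""
    return sum(value == "?" for row in rows for value in row)

print(class_distribution(rows))
print(missing_count(rows))
```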
WEKA (The Waikato Environment for Knowledge Analysis) software is used to apply
the machine learning algorithms. WEKA is a workbench for machine learning that
is intended to aid in the application of machine learning techniques to a variety of real-world problems (Witten and Frank, 2000). The WEKA machine learning workbench
provides a general-purpose environment for automatic classification, regression,
clustering and feature selection-common data mining problems in bioinformatics
research. It contains an extensive collection of machine learning algorithms and data
pre-processing methods complemented by graphical user interfaces for data exploration
and the experimental comparison of different machine learning techniques on the same
problem (Frank et al., 2004).
3. MACHINE LEARNING ALGORITHMS
In this paper, machine learning algorithms developed for data mining is used. These
algorithms have been determined below.
OneR, described by Holte (1993), learns a one-level decision tree; it is a simple
error-based rule induction approach (Quinlan, 1993). Naive Bayes is a
statistical learning algorithm that applies a simplified version of Bayes rule in order to
compute the posterior probability of a category given the input attribute values of an
example situation. Prior probabilities for categories and attribute values conditioned on
categories are estimated from frequency counts computed from the training data. Naive
Bayes is a simple and fast learning algorithm that often outperforms more sophisticated
methods (John et al, 1995).
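The Naive Bayes computation described above, priors and conditional attribute frequencies estimated from counts, can be sketched in a few lines. The toy training set below is hypothetical, and Laplace smoothing is added (an assumption, not stated in the paper) to avoid zero probabilities.

```python
from collections import Counter

# Minimal Naive Bayes sketch: the posterior of a class is proportional to
# its prior times the per-class frequencies of the observed attribute
# values. The toy training data is illustrative, not the paper's data.
train = [
    ({"deg-malig": "3", "node-caps": "yes"}, "recurrence-events"),
    ({"deg-malig": "1", "node-caps": "no"}, "no-recurrence-events"),
    ({"deg-malig": "2", "node-caps": "no"}, "no-recurrence-events"),
]

def naive_bayes_predict(train, example):
    classes = Counter(label for _, label in train)
    n = len(train)
    scores = {}
    for c, count in classes.items():
        score = count / n  # prior P(c) from frequency counts
        for attr, value in example.items():
            # frequency of this attribute value within class c,
            # with Laplace smoothing (an assumption) against zero counts
            matches = sum(1 for x, label in train
                          if label == c and x.get(attr) == value)
            score *= (matches + 1) / (count + 2)
        scores[c] = score
    return max(scores, key=scores.get)

print(naive_bayes_predict(train, {"deg-malig": "1", "node-caps": "no"}))
```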
JRip (Weka's implementation of the RIPPER rule learner) is a fast algorithm for
learning "IF-THEN" rules. Like decision trees, rule learning algorithms are popular
because the knowledge representation is very easy to interpret. Ridor is the
implementation of a RIpple-DOwn Rule learner. It generates the default rule first and
then the exceptions for the default rule with the least (weighted) error rate (Witten and
Frank, 2000). SMO implements John C. Platt's sequential minimal optimization
algorithm for training a support vector classifier, for example using RBF kernels
(Platt, 1998). J48 is a class for generating an unpruned or pruned C4.5 decision
tree (Quinlan, 1992). LMT is a class for the "logistic model tree" classifier
(Witten and Frank, 2000). Conjunctive Rule is a class that implements a single
conjunctive rule learner able to predict numeric and nominal class labels.
Decision Table is a class for building and using a simple decision table majority
classifier (Kohavi, 1995). The NNge classifier is a nearest-neighbour-like algorithm
using non-nested generalized exemplars (Witten and Frank, 2000). KStar is an
instance-based classifier: the class of a test instance is based upon the class of those
training instances similar to it, as determined by some similarity function (Aha and
Kibler, 1991). IBk is a k-nearest neighbours classifier. The PART classification
algorithm constitutes rule sets from C4.5 decision trees: PART is a rule generator
that uses J48 to generate pruned decision trees from which rules are extracted
(Frank and Witten, 1998) (Witten and Frank, 2000).
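The instance-based idea behind KStar and IBk, classifying a test instance by the class of similar training instances, can be illustrated with a minimal 1-nearest-neighbour sketch. The attribute-overlap similarity and the toy instances below are illustrative assumptions; they are not the entropic distance that KStar actually uses.

```python
# Instance-based classification sketch: a test instance takes the class of
# its most similar training instance. Similarity here is a simple count of
# matching attribute positions; the instances are toy triples of the form
# (deg-malig, node-caps, breast).
train = [
    (("3", "yes", "left"), "recurrence-events"),
    (("1", "no", "right"), "no-recurrence-events"),
    (("2", "no", "left"), "no-recurrence-events"),
]

def overlap(a, b):
    """Number of attribute positions on which two instances agree."""
    return sum(x == y for x, y in zip(a, b))

def nearest_neighbour_predict(train, example):
    """Return the class of the training instance most similar to example."""
    best_instance, best_label = max(train, key=lambda t: overlap(t[0], example))
    return best_label

print(nearest_neighbour_predict(train, ("3", "yes", "right")))
```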
4. ARTIFICIAL NEURAL NETWORKS (ANNs) CLASSIFIER
In this paper, multi layer perceptron networks (MLPs) classifier developed for data
mining is used. The kind of multilayer networks learned by the backpropagation
algorithm are capable of expressing a rich variety of nonlinear decision surfaces. For
example, a typical multilayer network and decision surface is depicted in Figure 1.
Figure 1. A feedforward Multi-Layer Perceptron (MLP)
An MLP network is composed of a number of identical units called neurons organized
in layers, with those on one layer connected to those on the next layer, except for the
last layer, the output layer. Indeed, the MLP architecture is structured into an input layer of
neurons, one or more hidden layers and one output layer. Neurons belonging to adjacent
layers are usually fully connected and the activation function of the neurons is generally
sigmoid or linear. In fact, the various types and architectures are identified both by the
different topologies adopted for the connections and by the choice of the activation
function. A complete network for a 3 x 5 x 2 functional mapping is shown by Figure 1
as an example (Taleb et al, 2009).
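A forward pass through the fully connected 3 x 5 x 2 mapping of Figure 1 might be sketched as follows; the random weights are placeholders for the values that backpropagation would learn.

```python
import math
import random

# Forward pass of a fully connected 3 x 5 x 2 MLP: 3 inputs, one hidden
# layer of 5 sigmoid neurons, 2 sigmoid outputs. The weights are random
# placeholders, not trained values.
random.seed(0)

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def layer(inputs, weights, biases):
    """One fully connected layer of sigmoid neurons."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def make_layer(n_in, n_out):
    """Random weight matrix and bias vector for a layer."""
    weights = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [random.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases

w1, b1 = make_layer(3, 5)   # input -> hidden
w2, b2 = make_layer(5, 2)   # hidden -> output

hidden = layer([0.2, 0.5, 0.9], w1, b1)
output = layer(hidden, w2, b2)
print(len(hidden), len(output))
```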
Figure 2. The sigmoid threshold unit
The sigmoid unit is illustrated in Figure 2. Like the perceptron, the sigmoid unit first
computes a linear combination of its inputs, then applies a threshold to the result. In the
case of the sigmoid unit, however, the threshold output is a continuous function of its
input. More precisely, the sigmoid unit computes its output o as

o = σ(w · x)                                    (1)

where

σ(y) = 1 / (1 + e^(-y))                         (2)

Here w is the weight vector and x the input vector. σ is often called the sigmoid
function or, alternatively, the logistic function. Note its
output ranges between 0 and 1, increasing monotonically with its input (see the
threshold function plot in Figure 2) (Tan and Eshelman, 1988).
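The logistic function described above is easy to verify numerically. The sketch below also includes its derivative, σ(y)(1 - σ(y)), which backpropagation relies on.

```python
import math

# The sigmoid (logistic) function: output in (0, 1), increasing
# monotonically with its input.
def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

# Its derivative has the convenient closed form sigma(y) * (1 - sigma(y)).
def sigmoid_derivative(y):
    s = sigmoid(y)
    return s * (1.0 - s)

print(sigmoid(0.0))     # 0.5, the midpoint of the output range
print(sigmoid(10.0))    # close to 1
print(sigmoid(-10.0))   # close to 0
```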
5. STUDY RESULTS
Here, analysis is made using the experimental results obtained from classifications on
the breast cancer data set with machine learning algorithms and ANNs.
Each machine learning algorithm has been applied separately to the whole data set, and
the classification results are denoted in Table 2. 10-fold cross-validation has been
performed on the data set: the data set is divided into 10 parts; 1 part is used for testing
and the other 9 parts are used for training.
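The 10-fold procedure just described might be sketched as follows, with a simple interleaved split standing in for WEKA's stratified folds (an assumption):

```python
# 10-fold cross-validation sketch: the data set is split into 10 parts;
# each part serves once as the test set while the remaining 9 parts are
# used for training.
def cross_validation_folds(instances, k=10):
    folds = []
    for i in range(k):
        test = instances[i::k]   # every k-th instance, offset by i
        train = [x for j, x in enumerate(instances) if j % k != i]
        folds.append((train, test))
    return folds

data = list(range(286))          # 286 instances, as in the data set above
folds = cross_validation_folds(data)
print(len(folds))
```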
The kappa statistic (KS) in Table 2 is used as a means of classifying agreement in
categorical data. A kappa coefficient of 1 means statistically perfect modeling, whereas
0 means every model value was different from the actual value. The KS value is
calculated with the formula given below (Eq. 3).
KS = (P(A) - P(E)) / (1 - P(E))                 (3)
where P(A) is the proportion of times the model value was equal to the actual value
and P(E) is the expected proportion by chance.
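As a sketch, the kappa statistic of Eq. (3) can be computed directly from a confusion matrix; the matrix below is illustrative, not one of the paper's results.

```python
# Kappa statistic from a confusion matrix: P(A) is the observed agreement
# (the diagonal, i.e. accuracy) and P(E) the agreement expected by chance
# from the row and column marginal totals.
def kappa(confusion):
    n = sum(sum(row) for row in confusion)
    p_a = sum(confusion[i][i] for i in range(len(confusion))) / n
    p_e = sum(
        (sum(confusion[i]) / n) * (sum(row[i] for row in confusion) / n)
        for i in range(len(confusion))
    )
    return (p_a - p_e) / (1.0 - p_e)

# Illustrative 2 x 2 matrix (rows = actual class, columns = predicted):
m = [[160, 41],   # no-recurrence-events
     [45, 40]]    # recurrence-events
print(round(kappa(m), 4))
```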
Table 2. Performance analysis results of machine learning algorithms on data set.

Algorithm          Correctly Classified   Kappa       Mean Absolute
                   Instances (%)          Statistic   Error
Conjunctive Rule   66.43                  0.0986      0.4074
Naive Bayes        71.67                  0.3857      0.3272
Decision Tables    73.42                  0.2462      0.3748
JRip               70.97                  0.2409      0.3798
Ridor              70.97                  0.1866      0.2902
SMO                69.58                  0.1983      0.3042
J48                75.52                  0.2826      0.3676
LMT                75.17                  0.3042      0.3589
NNge               65.03                  0.1212      0.3497
KStar              73.42                  0.2864      0.3354
PART               71.32                  0.1995      0.3650
Table 3. Performance analysis results of ANN algorithm on data set.
Kappa Mean
Correctly
Hidden Epoch Learning Statistic Absolute Classified
Layer
Rate
Error
Instances
(%)
1
500
0.3
0.2637 0.3402
71.32
1
1000 0,3
0.2816 0.3397
72.02
2
500
0.3
0.2851 0.317
73.07
2
1000 0.3
0.2971 0.3194
73.42
2
1000 0.2
0.3258 0.3188
73.07
3
500
0.3
0.2775 0.3133
72.37
3
1000 0.3
0.2828 0.3138
72.37
3
1000 0.2
0.3386 0.3198
73.77
4
500
0.2
0.245
0.3181
70.27
4
500
0.3
0.2322 0.327
69.23
4
1000 0.3
0.2316 0.3293
68.88
Statistical results of the algorithms for the breast cancer data set are given in Table 2
and Table 3. The Naive Bayes algorithm achieved the best modeling, with KS = 0.3857,
in classifying the data set; the Conjunctive Rule algorithm achieved the worst modeling,
with KS = 0.0986.
The classification accuracy rates of the JRip and Ridor algorithms are equal (70.97%),
as are those of the Decision Tables and KStar algorithms (73.42%). The J48 algorithm
achieved the best classification accuracy on the data set, 75.52%, and the NNge
algorithm the worst, 65.03%. The mean absolute error of the Conjunctive Rule
algorithm is the highest
value (0.4074), while that of the Ridor algorithm is the lowest (0.2902).
The best classification rate of the ANN algorithm was found by varying its parameters.
The best classification accuracy of the ANN (multilayer perceptron) algorithm on the
data set, 73.77%, is shown in Table 3. Here, the number of hidden layers of the ANN
is 3, the number of epochs is 1000, and the learning rate is 0.2.
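The parameter selection behind Table 3 amounts to a grid search over hidden-layer setting, epochs and learning rate. In the sketch below, evaluate() is a hypothetical stand-in for training the network and measuring its cross-validated accuracy; only two of its return values are filled in from Table 3.

```python
from itertools import product

# Grid-search sketch over the parameter space of Table 3. evaluate() is a
# placeholder: in practice it would train the MLP with the given settings
# and return classification accuracy. The two filled-in scores come from
# Table 3; every other combination defaults to a dummy value.
def evaluate(hidden, epochs, learning_rate):
    scores = {(3, 1000, 0.2): 73.77,   # best configuration in Table 3
              (2, 1000, 0.3): 73.42}
    return scores.get((hidden, epochs, learning_rate), 70.0)

grid = product([1, 2, 3, 4], [500, 1000], [0.2, 0.3])
best = max(grid, key=lambda params: evaluate(*params))
print(best)
```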
Below, in Table 4 and Table 5, rules obtained by some of the algorithms from the
breast cancer data and attributes in the data set are given.
Table 4. JRip algorithm rule list.
JRIP algorithm rules
(deg-malig = 3) and (node-caps = yes) => Class=recurrence-events
(inv-nodes = 3-5) and (breast = left) => Class=recurrence-events
=> Class=no-recurrence-events
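The ordered rule list in Table 4 can be read as a first-match interpreter: the two conjunctive rules fire first, and the final rule is the default.

```python
# The JRip rule list of Table 4, written out as an ordered first-match
# rule interpreter. The first two rules predict recurrence-events; the
# final default rule predicts no-recurrence-events.
def jrip_predict(instance):
    if instance.get("deg-malig") == "3" and instance.get("node-caps") == "yes":
        return "recurrence-events"
    if instance.get("inv-nodes") == "3-5" and instance.get("breast") == "left":
        return "recurrence-events"
    return "no-recurrence-events"

print(jrip_predict({"deg-malig": "3", "node-caps": "yes"}))
print(jrip_predict({"deg-malig": "1", "node-caps": "no"}))
```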
Table 5. PART algorithm rule list.
PART algorithm rules
node-caps = no AND inv-nodes = 0-2 AND tumor-size = 10-14: no-recurrence-events
node-caps = no AND inv-nodes = 0-2 AND deg-malig = 1: no-recurrence- events
deg-malig = 2 AND inv-nodes = 0-2 AND breast-quad = left_low: no-recurrence-events
deg-malig = 2 AND inv-nodes = 0-2 AND breast-quad = left_up: no-recurrence-events
deg-malig = 2 AND tumor-size = 20-24 AND irradiat = no: no-recurrence-events
deg-malig = 2 AND tumor-size = 25-29: no-recurrence-events
node-caps = no AND tumor-size = 20-24 AND inv-nodes = 0-2: no-recurrence-events
deg-malig = 1: no-recurrence-events
deg-malig = 2 AND tumor-size = 0-4: no-recurrence-events
deg-malig = 2 AND tumor-size = 35-39: no-recurrence-events
tumor-size = 20-24: recurrence-events
deg-malig = 2 AND tumor-size = 30-34 AND irradiat = no: no-recurrence-events
tumor-size = 40-44 AND breast-quad = left_up: no-recurrence-events
node-caps = yes AND breast-quad = left_low AND deg-malig = 3: recurrence-events
tumor-size = 30-34: recurrence-events
tumor-size = 25-29 AND breast = left: recurrence-events
tumor-size = 15-19: no-recurrence-events
tumor-size = 25-29 AND menopause = ge40: no-recurrence-events
tumor-size = 35-39 AND menopause = premeno: recurrence-events
: no-recurrence-events
6. RESULTS AND DISCUSSIONS
During the number of instances decreased fall in performance of algorithms observed. Performances
of algorithms have been increased on large number instances. Data mining indicated high
performance on large dimension databases. Used machine learning algorithms have been developed
for data mining. Therefore, used machine learning algorithms have been indicated high performance
on large dimension databases.
Their ability to learn by example makes neural networks very flexible and powerful:
there is no need to devise an algorithm to perform a specific task, i.e. there is no need
to understand the internal mechanisms of that task. Along with the various advantages
of neural networks there are disadvantages too: they cannot be programmed to perform
a specific task, and the examples must be selected carefully, otherwise useful time is
wasted or, even worse, the network might function incorrectly.
Some of the algorithms showed high performance and produced the best models, while
others showed poor performance and produced weaker models.
REFERENCES
Langley, P. & Simon, H. 1995. Applications of machine learning and rule induction.
Communications of the ACM, 38 (11), 11-46.
Quinlan, J.R. 1992. C4.5: Programs for Machine Learning, Morgan Kaufmann.
Platt, J. 1998. Fast Training of Support Vector Machines using Sequential Minimal
Optimization. Advances in Kernel Methods - Support Vector Learning, MIT Press.
Holte, R.C. 1993. Very simple classification rules perform well on most commonly used
datasets. Machine Learning, Vol. 11, pp. 63-91.
Quinlan, R. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann
Publishers, San Mateo, CA.
Kohavi, R. 1995. The Power of Decision Tables. In Proc European Conference on
Machine Learning.
Mitchell, T.M. 1997. Machine Learning. McGraw-Hill Science.
Cleary, J.G. & Trigg, L.E. 1995. K*: An Instance-based Learner Using an
Entropic Distance Measure. In: 12th International Conference on Machine Learning,
108-114.
Aha, D. & Kibler, D. 1991. Instance-based learning algorithms. Machine Learning.
6:37-66.
Frank, E. & Witten, I.H. 1998. Generating accurate rule sets without global
optimization. In Proceedings of the 15th International Conference on Machine Learning.
pp. 144-151. Morgan Kaufmann.
Michie, D., Spiegelhalter, D.J. & Taylor, C.C. 1994. Machine Learning, Neural and
Statistical Classification, Ellis Horwood.
Frank, E. 2000. Machine Learning Techniques For Data Mining, University of Waikato,
New Zealand.
Witten, I.H. & Frank, E. 2000. Weka machine learning algorithms in Java, in Data
Mining: Practical Machine Learning Tools and Techniques with Java
Implementations, Morgan Kaufmann Publishers, pp. 265-320.
Tan, M. & Eshelman, L. 1988. Using weighted networks to represent classification
knowledge in noisy domains. Proceedings of the Fifth International Conference on
Machine Learning, 121-134, Ann Arbor, MI.
Taleb, R., Meroufel A. & Wira, P. 2009. Harmonic elimination control of an inverter
based on an artificial neural network strategy, IFAC International Conference on
Intelligent Control Systems and Signal Processing, Istanbul, Turkey.
Frank, E., Hall, M., Trigg, L., Holmes, G. & Witten, I.H. 2004. Data mining in
bioinformatics using Weka. Bioinformatics Applications Note, 2479-2481.