
Brain Data Analysis and Management

2015, GISAP:Medical Science, Pharmacology

R. Ilieva1, PhD, Associate Professor in Automated Systems for Data Processing and Management
P. Georgieva2, PhD, Head of Signal Processing Lab (SPL), IEETA
S. Petrova3, MEng Student in Electronic Management, ELFE
Technical University of Sofia, Bulgaria1,3
University of Aveiro, Portugal2

Machine Learning (ML) techniques have been extensively applied in bioinformatics. In this paper, we used the RapidMiner software to analyze brain data (EEG signals) in order to discriminate human emotions while subjects were observing images. Five ML classification algorithms were studied: k-Nearest Neighbor (kNN), Naive Bayes, Support Vector Machine, Artificial Neural Networks and Decision Tree. kNN and ensemble classifiers achieved above 80% accuracy on test data. This is an encouraging result, given that brain signals are highly non-stationary and noisy and are therefore challenging data to analyze and manage.

Keywords: bioinformatics, brain, Artificial Neural Networks, k-Nearest Neighbor (kNN), Naive Bayes, Support Vector Machine, Decision Tree.

Conference participants

Digital Object Identification: http://dx.doi.org/10.18007/gisap:msp.v0i7.1071

I. Introduction

Machine Learning (ML) is a subarea of artificial intelligence concerned with the design, analysis, implementation and application of programs that learn from examples [2]. Learning from data is commercially and scientifically important. ML consists of methods that automatically extract interesting knowledge (patterns, models, relationships) from large databases. The goal of this work is to find a reliable ML algorithm (or a combination of ML techniques) able to discriminate positive and negative human emotions based on Electroencephalogram (EEG) signals. The EEG data was collected while subjects were exposed to images that typically provoke positive and negative emotions.

This paper is organized as follows.
Section 2 briefly introduces the ML techniques studied. Section 3 describes the dataset, the acquisition process and the metrics used to analyze classifier performance. In Section 4 the classification results are summarized, and our conclusions are presented in Section 5.

II. Machine Learning Techniques

The ML classification techniques applied in the present study are briefly introduced.

1. K-Nearest Neighbor (k-NN)

k-NN is a widely applied classifier [1]. The class of a new example (object) is decided by majority vote among the K nearest training examples with known class labels (Fig. 1): kNN finds the most common class among the K nearest neighbors of the new example and assigns that class to it. K is usually a small odd positive integer, which avoids tied votes between two classes. For K=1 the object is simply assigned to the class of its nearest neighbor.

Fig. 1. kNN classification (K=3)

2. Support Vector Machine (SVM)

SVM is a classification technique that defines the hyperplane maximizing the margin between two classes [1]. There may be several options (lines) that separate two classes (+/– in Fig. 2). SVM determines the closest objects between the two classes (termed support vectors). The optimal separating line is the one that maximizes the distance between the classes. For classes that are not linearly separable (Fig. 3), the concept of a kernel-induced feature space is used. The kernel SVM transforms the original data into a higher-dimensional feature space where the data become linearly separable, and then applies the same procedure as described above.

Fig. 2. Linear SVM
Fig. 3. Nonlinear SVM

3. Artificial Neural Network (ANN)

ANN is a mathematical architecture inspired by the structure and functionality of biological neural networks [1]. An ANN consists of layers: typically one input, one hidden and one output layer (Fig. 4). Each layer has a number of parallel processing elements (PE), termed neurons (or nodes), which are mathematical functions that mimic the dynamical behavior of biological neurons at a macro scale. Due to their properties of adaptation, noise filtering and parallel processing, ANNs are a powerful framework for classification and regression. Their main disadvantage is the usually long processing time for networks of high dimension (a high number of PE).

Fig. 4. General ANN architecture

4. Decision Tree (DT)

DT is a classification technique based on recursively dividing a complex problem into a sequence of simpler sub-problems, which generates a decision tree (Fig. 5).

III. EEG Signal Classification

1. EEG signals acquisition

The goal of this study is to distinguish emotional bio-signals evoked by viewing selected affective pictures from the International Affective Picture System (IAPS) [5]. EEG-based brain-computer interfaces consist of a set of typical components, each of which performs its own critical function. Fig. 6 shows the process cycle. Firstly, a stimulus set and test protocol are needed (1). During testing (2), the test subject is exposed to the stimuli according to the test protocol. The resulting voltage changes in the brain are then recorded (3) as an electroencephalogram, from which noise and artifacts are removed (4). The resulting data is analyzed (5) and relevant features (such as power spectra) are computed. A classifier is trained (6) on a training set drawn from these features, and the rest of the data is classified using this classifier. This step provides an interpretation of the original raw brain signals. The feedback step is not used in this research; it is shown for the sake of completeness.

A total of 26 female volunteers participated in the study. 21 channels of EEG were recorded: Frontopolar (Fp), Frontal (F), Temporal (T), Central (C), Parietal (P) and Occipital (O) channels.
A total of 24 high-arousal (> 6) images with positive valence (7.29 +/– 0.65) and negative valence (1.47 +/– 0.24) were selected. Each image was shown 3 times in pseudo-random order and each trial lasted 3500 ms: during the first 750 ms a fixation cross was presented, then one of the images was shown for 500 ms, and finally a black screen was shown for 2250 ms.

Three feature extraction schemes were implemented by choosing three different filters and detecting n maxima and n minima at the output of each filter.

i) Butterworth filter of fourth order with passband [0.5 – 15] Hz. 12 features are stored: the latency (time of occurrence) and amplitude of the first 3 maxima and 3 minima (Fig. 7a).
ii) Butterworth filter of fourth order with Delta band [0.5 – 4] Hz. 8 features are stored: the latency and amplitude of the first 2 maxima and 2 minima (Fig. 7b).
iii) Butterworth filter of fourth order with Theta band [4 – 8] Hz. 12 features are stored: the latency and amplitude of the first 3 maxima and 3 minima (Fig. 7c).

Fig. 5. Example of Decision Tree
Fig. 6. Brain Study Spiral Model
Fig. 7. Filtered signal (with fourth-order Butterworth filter) and feature detection: positive (line) and negative (dot). a) Passband 0.5 - 15 Hz; b) Delta passband 0.5 - 4 Hz; c) Theta passband 4 - 8 Hz

2. Classifier performance metrics

The classifier performance is analyzed by means of the Confusion Matrix [3]. The basic structure of a Confusion Matrix for a two-class problem is presented in Table 1, where TP (true positive) and TN (true negative) are the numbers of examples correctly classified as positive and negative, respectively, while FP (false positive) and FN (false negative) are the numbers of examples wrongly classified as positive and negative, respectively. The following performance measures are determined from the Confusion Matrix.
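The measures in this subsection (accuracy, precision, specificity and recall) reduce to simple ratios of the four counts TP, FN, FP and TN. The following is a minimal sketch of that arithmetic using the standard definitions; the function name `confusion_metrics` and the example counts are illustrative assumptions, not values from the study. Returning 0.0 for an empty denominator is also an assumed convention.

```python
def confusion_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Two-class performance measures from confusion-matrix counts."""

    def ratio(num: int, den: int) -> float:
        # Assumed convention: report 0.0 when the denominator is empty,
        # e.g. when no example at all was predicted as positive.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fn + fp + tn),
        "precision": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
        "recall": ratio(tp, tp + fn),
    }


# Illustrative counts only (hypothetical, not taken from the experiments).
m = confusion_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in m.items():
    print(f"{name}: {value:.2%}")
```

A classifier that assigns every example to one class yields an empty denominator for the precision of the other class, which is why the guard in `ratio` is needed.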
Accuracy is the fraction of all (positive and negative) examples that are correctly classified [3]:

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \]

Precision is the fraction of correctly classified positive examples among all examples classified as positive:

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \]

Specificity is the fraction of negative examples correctly classified as negative among all negative examples:

\[ \mathrm{Specificity} = \frac{TN}{TN + FP} \]

Recall is the fraction of positive examples correctly classified as positive among all positive examples:

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \]

While accuracy is a performance measure better suited to balanced data (data with a similar number of examples in each class), the other measures (precision, specificity and recall) are more adequate for unbalanced data.

Tab. 1. Confusion Matrix

                   Predicted class
Actual class       Class = Yes    Class = No
Class = Yes        TP             FN
Class = No         FP             TN

Fig. 8. The process of data loading, filtering and storing
Fig. 9. X-Validation of five classifiers applied to ClassAB_Data

IV. EEG Classification Results with RapidMiner (RM)

RM is an open-source data mining system [4]. It is available as a standalone application for data analysis and as a data mining engine for integration into other products. RM is an environment for machine learning, data mining, text mining, predictive analytics and business analytics.

1. Data Preprocessing

Before information extraction and classification the data was preprocessed, filtered and stored as follows.

Data normalization: the data was normalized before loading into RM in order to avoid computational problems.

Data cleaning and storing: Fig. 8 depicts the process of: i) loading the data into RM with the Read CSV operator; ii) filtering out (removing) the lines with missing values (which correspond to lines with zeros) with the Filter Examples operator; and iii) storing the cleaned data into the Brain Study Repository using the Store operator. The same process was applied to the three datasets corresponding to Filter 1, Filter 2 and Filter 3.

2. Classification based on all attributes and all channels

The goal is to find the best binary (two-class) classifier based on all attributes and all channels. This scenario is implemented for three data sets: Filter 1 (ClassAB_Data), Filter 2 (ClassABdelta_Data) and Filter 3 (ClassABteta_Data). Five classifiers were compared. Figure 9 shows the summarized process of simultaneous training and testing of the five classifiers with the same data set, loaded by the Retrieve Data operator. The Multiply operator provides the same data to each classifier, represented by the X-Validation operator. After running the process, we obtained the values for accuracy, recall and precision from the confusion matrix. These performance measures are summarized in Table 2 for the 5 classifiers applied to the 3 data sets. The highest classification rate is obtained for the KNN classifier trained with the ClassABdelta dataset.

Tab. 2. Performance measures of 5 classifiers, applied to 3 data sets

                                                   Precision             Recall
Data set / classifier        Accuracy              Predic1   Predic0     Predic1   Predic0
ClassAB Data (0.5-10) Hz
  KNN classifier             80.16% +/- 3.89%      78.87%    81.53%      81.96%    78.38%
  Neural Net                 70.82% +/- 5.06%      67.21%    76.08%      80.39%    61.39%
  Decision Tree              50.39% +/- 0.48%      0.00%     50.39%      0.00%     100.00%
  SVM                        62.26% +/- 4.47%      60.41%    64.71%      69.41%    55.21%
  Naive Bayes                61.28% +/- 3.93%      59.59%    63.51%      68.24%    54.44%
ClassABdelta Data (0.5-4) Hz
  KNN classifier             82.42% +/- 2.93%      79.00%    85.14%      80.84%    83.63%
  Neural Net                 70.86% +/- 4.95%      68.30%    72.62%      63.22%    76.87%
  Decision Tree              55.92% +/- 0.95%      0.00%     55.92%      0.00%     100.00%
  SVM                        64.29% +/- 4.66%      58.71%    68.47%      64.05%    64.50%
  Naive Bayes                59.92% +/- 4.64%      53.90%    66.29%      62.81%    57.65%
ClassABteta Data (4-8) Hz
  KNN classifier             77.60% +/- 3.54%      78.46%    79.77%      76.55%    78.67%
  Neural Net                 66.07% +/- 4.16%      65.21%    67.08%      69.98%    62.10%
  Decision Tree              50.38% +/- 0.51%      50.38%    0.00%       100.00%   0.00%
  SVM                        59.64% +/- 4.15%      58.49%    61.29%      68.48%    50.67%
  Naive Bayes                58.70% +/- 3.73%      57.29%    61.00%      70.73%    46.48%

C. Classifier optimization by feature selection of data set with all channels

The three datasets considered have between 8 and 12 features. We now explore whether the classification can be improved by applying feature selection (reduction) procedures. Forward Selection and Backward Elimination are among the most typical operators in RM for extracting the most influential features.

KNN classifier optimization

In order to illustrate the plausibility of the feature selection procedure, the KNN classifier is optimized with both operators. The performance results in terms of the confusion matrix and the associated values for accuracy, precision and recall are summarized in Fig. 10. Here, Forward Selection leads to higher accuracy than Backward Elimination.

Fig. 10. KNN classifier optimization after Forward Selection procedure

NN (neural network) classifier optimization

The effect of the feature selection procedure strongly depends on the classification algorithm. The results of the NN classifier optimization are presented in Fig. 11. In this case latency 2 is the removed feature, and Backward Elimination leads to higher accuracy than Forward Selection.

Fig. 11. NN classifier optimization after Backward Elimination

D. Classification based on all attributes and selected channels

Taking into account that Parietal and Occipital channels are responsible for visual processing, we isolated three channel groups from the complete ABdelta dataset: only Parietal channels in Group 1, only Occipital channels in Group 2, and the combination of Parietal and Occipital channels in Group 3 (Table 3, Table 4, Table 5).
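The three channel groups can be pictured as simple row filters over the feature table. The sketch below is only an illustration under stated assumptions: the row layout, the channel labels (`P3`, `O1`, and so on) and the helper `select_channels` are hypothetical, since the actual file layout of the dataset is not described here.

```python
# Hypothetical feature rows: one row per (trial, channel) with a class label.
# Channel names and feature values are placeholders, not data from the study.
rows = [
    {"channel": "Fp1", "label": 1, "features": [0.21, 3.1]},
    {"channel": "P3", "label": 1, "features": [0.18, 2.7]},
    {"channel": "Pz", "label": 0, "features": [0.25, 1.9]},
    {"channel": "O1", "label": 0, "features": [0.30, 2.2]},
    {"channel": "O2", "label": 1, "features": [0.27, 2.9]},
    {"channel": "C3", "label": 0, "features": [0.22, 2.0]},
]


def select_channels(rows, prefixes):
    """Keep only the rows whose channel label starts with one of the prefixes."""
    return [r for r in rows if r["channel"].startswith(tuple(prefixes))]


group1 = select_channels(rows, ["P"])       # Parietal channels only
group2 = select_channels(rows, ["O"])       # Occipital channels only
group3 = select_channels(rows, ["P", "O"])  # Parietal and Occipital channels

print([r["channel"] for r in group3])
```

Matching on the leading letter is case-sensitive, so a frontopolar label such as `Fp1` is not mistaken for a Parietal channel under this assumed naming.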
The highest classification rate was obtained for the KNN classifier trained with the ClassABdelta dataset.

Tab. 3. Group 1 (data subset Parietal Channels in ABdelta data)

ClassABdelta_P_Data                                Precision             Recall
(0.5-4) Hz                   Accuracy              Predic1   Predic0     Predic1   Predic0
KNN classifier               87.36% +/- 11.86%     86.11%    88.46%      83.78%    90.20%
Neural Net                   84.31% +/- 12.33%     82.86%    84.91%      78.38%    88.24%
Decision Tree                56.94% +/- 5.69%      40.00%    57.83%      5.41%     94.12%
SVM                          62.92% +/- 18.55%     60.00%    63.24%      32.43%    84.31%
Naive Bayes                  52.36% +/- 17.21%     40.74%    57.38%      29.73%    68.63%

Tab. 4. Group 2 (data subset Occipital Channels in ABdelta data)

ClassABdelta_O_Data                                Precision             Recall
(0.5-4) Hz                   Accuracy              Predic1   Predic0     Predic1   Predic0
KNN classifier               84.67% +/- 17.46%     82.86%    87.50%      90.62%    77.78%
Neural Net                   71.33% +/- 25.74%     74.19%    67.86%      71.88%    70.37%
Decision Tree                79.33% +/- 23.80%     79.41%    80.00%      84.38%    74.07%
SVM                          59.67% +/- 14.79%     58.33%    63.64%      87.50%    25.93%
Naive Bayes                  49.00% +/- 15.21%     52.27%    40.00%      71.88%    22.22%

Tab. 5. Group 3 (data subset Occipital and Parietal Channels in ABdelta data)

ClassABdelta_OP_Data                               Precision             Recall
(0.5-4) Hz                   Accuracy              Predic1   Predic0     Predic1   Predic0
KNN classifier               84.71% +/- 11.13%     82.19%    87.84%      86.96%    83.33%
Neural Net                   82.14% +/- 9.17%      81.16%    83.33%      81.16%    83.33%
Decision Tree                53.10% +/- 2.90%      0.00%     53.06%      0.00%     100.00%
SVM                          61.67% +/- 11.61%     60.00%    63.41%      56.52%    66.67%
Naive Bayes                  61.05% +/- 19.16%     58.33%    64.00%      60.87%    61.54%

E. Ensemble classification

Ensemble classification uses a combination of n learned classifiers M1, M2, …, Mn in order to build an improved composite model [2]. In this study we applied the ensemble method called “Bagging”, where the final result is obtained by the majority vote principle. The sequence of nested processes implementing ensemble Bagging classification with five classifiers is depicted in Fig. 12.
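To make the majority vote principle concrete, here is a minimal, generic sketch of bagging: each model is trained on a bootstrap resample of the training set and the ensemble returns the most frequent prediction. It is not the RapidMiner Bagging operator and does not reproduce the study's mix of classifiers; it assumes a single 1-nearest-neighbor base learner, toy two-dimensional data, and hypothetical function names.

```python
import random
from collections import Counter


def nearest_neighbor_predict(train, x):
    """1-NN base learner: return the label of the closest training point."""
    closest = min(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )
    return closest[1]


def bagging_predict(train, x, n_models=11, seed=0):
    """Majority vote over base learners trained on bootstrap resamples."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        # Bootstrap: draw len(train) examples with replacement.
        sample = [rng.choice(train) for _ in train]
        votes.append(nearest_neighbor_predict(sample, x))
    # An odd n_models avoids tied votes in a two-class problem.
    return Counter(votes).most_common(1)[0][0]


# Toy two-class data (illustrative only, not EEG features).
train = [
    ((0.0, 0.1), 0), ((0.2, 0.0), 0), ((0.1, 0.3), 0),
    ((1.0, 0.9), 1), ((0.8, 1.1), 1), ((1.2, 1.0), 1),
]
low = bagging_predict(train, (0.1, 0.1))
high = bagging_predict(train, (1.0, 1.0))
print(low, high)
```

Voting over heterogeneous classifiers, as in the ensemble of this study, follows the same final step: collect one prediction per model and take the most common label.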
In the previous experiments the DT and NB classifiers exhibited the worst performance (see Table 2), therefore they were removed from the ensemble. The results are summarized in Fig. 13 a) and b). The best ensemble classification (SVM, NN, KNN) does not provide significant advantages in comparison with the KNN from Table 3.

Fig. 12. Implementation of ensemble classification with Bagging operator
Fig. 13. Ensemble classifications with a) DT, NB, SVM, NN and KNN; b) SVM, NN and KNN

V. Conclusions

In this paper a number of Machine Learning methods are studied and applied to the challenging classification problem of discriminating human emotions based on EEG brain data. Among the five classifiers, K-Nearest Neighbor (kNN) provided the best discrimination (84% accuracy) as an individual classification model. Ensemble classification (a combination of SVM, NN and KNN classifiers) achieved slightly better results (85%). This study has shown that the preprocessing of the raw data collected from the EEG machine is crucial for extracting discriminative patterns. First, we need to choose the frequency band that adequately reflects the affective brain states. Our conclusion is that [0.5-4] Hz is apparently the most suitable band. Then, the most suitable features need to be identified. The classification is clearly affected by the choice of temporal (selected amplitudes and latencies) and spatial (selected channels) features. Last but not least, the classification algorithm is also important for recognizing emotional patterns. If no individual classifier is satisfactory, a mixture of weak classifiers (termed ensemble classification) may be a reasonably good alternative. All required data analysis steps fit nicely into the modular structure of the RM software platform and make the pipeline of procedures quite clear and intuitive.
We are confident that the methodology studied in this paper can be easily adapted to other problems in bioinformatics or in other private and public sectors, such as banking, insurance and medicine.

Acknowledgement

The research described in this paper was carried out within the framework of the contract № IUNF 13000. All experiments were conducted during the Erasmus study period of the third author at the University of Aveiro (UA) under the supervision of the other two authors. The Erasmus scholarship provided by the Technical University of Sofia and the excellent working conditions at UA are gratefully acknowledged. We would like to express our gratitude to the PsyLab at UA, and particularly to Dr. Isabel Santos, for providing the data set.

References:

1. Han, J., M. Kamber, Data Mining: Concepts and Techniques, University of Illinois at Urbana-Champaign, San Francisco, CA 94111, Second Edition, 2006.
2. Bramer, M., Principles of Data Mining, Springer, London, 2007.
3. Pang-Ning, T., M. Steinbach, V. Kumar, Introduction to Data Mining, 2003.
4. RapidMiner 5 software platform. Website: http://rapid-i.com/content/view/181/190/
5. Frantzidis, C., et al., “On the classification of emotional biosignals evoked while viewing affective pictures: An integrated Data-Mining-Based approach for Healthcare applications”.
6. Shoikova, E., A. Peshev, M. Krumova, “ePortfolio – Identity and Professional Development”, ICEIRD 2011, 5-7 May 2011, Ohrid, Macedonia, Proceedings CD, pp. 1061-068.
7. Nikolova, I., W. Jahn, Die Einführung und Konkurenzvorteile des TQM, Wissenschaftliche Konferenz Innovationen und Wettbewerbsfähigkeit, Karlsruher Institut für Technologie, TU Sofia, TU Braunschweig, FOM Essen.
8. Tamošiūnienė, R., K. Angelov, Project and Programme Management and Evaluation, С., 2011.
Information about authors:

1. Roumiana Ilieva - PhD, Associate Professor in Automated Systems for Data Processing and Management, Technical University of Sofia; address: Bulgaria, Sofia city; e-mail: rilieva@tu-sofia.bg
2. Petia Georgieva - PhD, Head of Signal Processing Lab (SPL), IEETA, University of Aveiro; address: Portugal, Aveiro city; e-mail: petia@ua.pt
3. Stanislava Petrova - MEng Student in Electronic Management, ELFE, Technical University of Sofia; address: Bulgaria, Sofia city; e-mail: stanislava_asparuhova@abv.bg