Comparative Analysis of Classification Algorithms for Email Spam Detection

Shafi'i Muhammad ABDULHAMID


I. J. Computer Network and Information Security, 2018


I. J. Computer Network and Information Security, 2018, 1, 60-67
Published Online January 2018 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijcnis.2018.01.07
Copyright © 2018 MECS

Shafi'i Muhammad Abdulhamid, Maryam Shuaib, Oluwafemi Osho
Department of Cyber Security, Federal University of Technology, Minna, Nigeria
E-mail: shafii.abdulhamid@futminna.edu.ng, maryambobi@gmail.com, femi.osho@futminna.edu.ng

Idris Ismaila and John K. Alhassan
Department of Cyber Security, Federal University of Technology, Minna, Nigeria
E-mail: ismi.idris@futminna.edu.ng and jkalhassan@futminna.edu.ng

Received: 24 June 2017; Accepted: 10 August 2017; Published: 08 January 2018

Abstract: The growing use of email in everyday transactions, both for business and for general communication, owing to its cost effectiveness and efficiency, has made email vulnerable to attacks, including spamming. Spam emails, also called junk emails, are unsolicited, almost identical messages sent to multiple recipients at random. In this study, a performance analysis is carried out on a set of classification algorithms: Bayesian Logistic Regression, Hidden Naïve Bayes, Radial Basis Function (RBF) Network, Voted Perceptron, Lazy Bayesian Rule, Logit Boost, Rotation Forest, NNge, Logistic Model Tree, REP Tree, Naïve Bayes, Multilayer Perceptron, Random Tree and J48. The performance of the algorithms was measured in terms of Accuracy, Precision, Recall, F-Measure, Root Mean Squared Error, Receiver Operator Characteristics Area and Root Relative Squared Error using the WEKA data mining tool. To give a balanced view of the algorithms' performance, no feature selection or performance boosting method was employed. The research showed that a number of classification algorithms exist that, if properly explored through feature selection, can yield more accurate results for email classification. Rotation Forest was found to be the classifier with the best accuracy, 94.2%. Although none of the algorithms achieved 100% accuracy in sorting spam emails, Rotation Forest came closest.

Index Terms: Email spam, classification algorithms, Bayesian Logistic Regression, Hidden Naïve Bayes, Rotation Forest.

I. INTRODUCTION

Email is an extremely fast and cost-effective means of transferring information from any part of the world, and it can be used from personal computers, smartphones and other recent electronic gadgets [1], [2]. Despite the growth of other forms of online communication such as instant messaging and social networking, email continues to lead in business communication and still serves as a requirement for other forms of communication and e-transactions. Email use is very widespread: it was estimated that there would be over 2.6 billion email account holders worldwide by the end of 2016, and that nearly half of the world population would be using email by the end of 2020 [3].

The increase in the popularity and use of email for transactions has led to a rise in the amount of spam email globally. Spam emails, also called junk emails, are unsolicited, almost identical messages sent to multiple recipients by email. The sender of spam has no prior relationship with the receivers but gathers addresses from different sources such as phone books and filled forms. Spam is fast becoming one of the most serious threats to email users because it is a major means of delivering threats, including viruses, worms and phishing attacks [4], [5], [6], [7].

Recently, data mining has drawn attention in the knowledge and information industry because of the wide availability of big data and the pressing need to convert such data into useful information and knowledge. According to [8], data mining is an emerging field concerned with extracting implicit, previously unknown and potentially useful information from data, and it is used to build software that automatically sifts through databases in search of regularities or patterns. Strong patterns, once identified, are likely to generalize and give accurate predictions. According to [9], classification or prediction tasks, which are supervised methods that seek to discover hidden associations between the target class and the independent variables, are widely used in data mining. In supervised learning, classifiers allow labels to be assigned to observations so that unseen data can be categorized on the basis of the training data. Spam detection systems are built using classification algorithms that group emails as spam or non-spam [10], [11].

The aim of this paper is to evaluate the performance of classification algorithms used for grouping emails as spam or not spam, including Bayesian Logistic Regression, Hidden Naïve Bayes, RBF Network, Voted Perceptron, Lazy Bayesian Rule, Logit Boost, Rotation Forest, NNge, Logistic Model Tree, REP Tree, Naïve Bayes, J48, Multilayer Perceptron and Random Tree. The remainder of the paper is organized as follows: Section II presents related literature on the comparative analysis of classification algorithms for email spam detection and filtering. Section III describes the materials and methods employed in the research. Section IV presents the results obtained in the analysis of the classification algorithms, and Section V gives the conclusion and recommendations for future work.

II. RELATED WORKS

The rise in the number of email users, and with it the rise in spam over recent years, has made handling large volumes of email a challenging task for data mining and machine learning. This has led a number of researchers to carry out comparative studies on how well classification algorithms classify emails, using a combination of performance metrics. It is therefore vital to determine which algorithm performs best on a chosen metric in order to support the proper classification of emails as spam or non-spam. Many studies have compared the performance of classification algorithms in grouping emails. Algorithms compared so far include Naïve Bayes [1], [12]–[17]; C-PLS, ANN, C-RT, CS-CRT, CS-MC4, CS-SVC, Continuous PLS-DA, PLS-LDA and LDA [1]; BayesNet [4], [12], [13]; Multilayer Perceptron [1], [15]; and SVM [1], [4], [12]–[14], [16], [17]. Table 1 summarizes the algorithms used in previous comparative research.

In [18], Particle Swarm Optimization and an Artificial Neural Network were combined for feature selection, and a Support Vector Machine was used to classify and separate spam. The method was compared with other methods, such as Self-Organizing Map and K-Means, on the Area Under Curve (AUC) criterion, which served as the benchmark for performance evaluation. The results indicate that the AUC of the proposed method is better than that of the other methods.

In their paper Spam Mail Detection using Classification, the authors of [19] applied several data mining techniques to a spam dataset in search of the most suitable classifier for labelling email as spam or non-spam. They checked the performance of the classifiers with a feature selection algorithm and found that Naïve Bayes gave the best accuracy, 76%, compared with the other two classifiers, support vector machine and J48, and that Naïve Bayes also took less time than the other two. They therefore judged Naïve Bayes the best of the three classifiers for classifying spam mail.

A number of conventional techniques for combating spam, such as Bayesian-based sorting, rule-based systems, IP blacklists, heuristic-based filters, whitelists and DNS black holes, were identified by [20]. The authors used RBF, a neural network technique in which neurons were trained. The proposed mechanism improved accuracy, precision, recall, FRR and FAR, and its results were comparatively better than those of SVM.

In Spam Mail Detection through Data Mining – A Comparative Performance Analysis, [12] applied various data mining approaches to a spam dataset in order to find the best classifier for email classification. They analyzed the performance of the classifiers with and without a feature selection algorithm. The Best-First feature selection algorithm was applied to select the desired features, after which the classifiers were applied. They found that accuracy improved when feature selection was included in the experiment, and that Random Tree was the best classifier for spam mail classification, with an accuracy of 99.72%. None of the algorithms achieved 100% accuracy in classifying spam emails, but Random Tree came very close.

The paper Content-Based Spam Filtering and Detection Algorithms - An Efficient Analysis & Comparison [21] treats spam as one of the major problems faced by the Internet community. The content of each item is represented as a set of descriptors or terms, typically the words that occur in a document. User profiles are represented with the same terms and are built up by analyzing the content of items seen by the user. The paper contributes a comprehensive study of spam detection algorithms in the category of content-based filtering. The implemented results were benchmarked to examine how accurately items were classified into their original categories. Of the techniques discussed, the Bayesian method was chosen as the most efficient for creating a spam filter.

The paper Comparative Study on Email Spam Classifier using Data Mining Techniques [1] analyzed a spam dataset with the TANAGRA data mining tool to identify an efficient classifier for email spam classification. First, feature construction and feature selection were performed to extract the relevant features. Various classification algorithms were then applied to the dataset, with cross-validation carried out for each classifier. Finally, the Rnd tree classifier was identified as the best for email spam on the basis of error rate, precision and recall.

Table 1. Summary of Relevant Algorithms Compared in Related Research Works

Table 2. Summary of Relevant Performance Metrics Used for Comparison in Related Research Work

The authors of [14], in Machine Learning Methods for Spam E-Mail Classification, reviewed some of the most popular machine learning methods (Bayesian classification, k-NN, ANNs, SVMs, artificial immune systems and rough sets) and their applicability to the problem of spam email classification. They described the algorithms and compared their performance on the SpamAssassin spam corpus.

Previous researchers have used combinations of performance metrics, including Correctly Classified Instances, Kappa Statistic, Mean Absolute Error, Root Mean Squared Error, Relative Absolute Error and Root Relative Squared Error [12]. Other performance metrics used are TP Rate, FP Rate, Precision, Recall, F-Measure and ROC [4], [13]. A few researchers also considered the time taken to load models when assessing the algorithms [15], [22]. Table 2 shows the performance metrics employed in previous research.

Spam classifiers are built and tested on publicly available datasets. For example, Naïve Bayes, BayesNet, SMO/SVM, ID3, FT, J48, Random Forest, Random Tree, C-PLS, C-RT, CS-CRT, CS-MC4, CS-SVC, Continuous PLS-DA and PLS-LDA have been used on the Spambase dataset from the UCI repository [1], [12], [23]. In some studies, two or more datasets are used for comparative analysis [16], [22]. The datasets are publicly available and normally contain a reasonable ratio of ham to spam.

There are still a number of classification algorithms whose performance and accuracy in email spam classification have yet to be compared, including SPegasos, Voted Perceptron, IB1, MIWrapper, LWL, CitationKNN, AdaBoostM1, HyperPipes, Dagging, Decorate, END, FilteredClassifier, Grading, LogitBoost, MetaCost, MultiBoostAB, DecisionTable, MultiScheme, OrdinalClassClassifier, RacedIncrementalLogitBoost, RandomCommittee, RandomSubSpace, MIBoost, MISMO, IBk, KStar, SimpleMI, Bagging, VFI, ConjunctiveRule, MultiClassClassifier, DTNB, JRip, NNge, OneR, PART, Ridor and ZeroR.
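Several of the measures listed in this section (precision, recall, F-measure, the Kappa statistic, RMSE and RRSE) recur throughout the comparison. As an illustration only, the sketch below shows one common way to compute them in Python for a two-class spam/ham problem. The function names, the toy labels and the single-probability form of RMSE and RRSE are assumptions made for this example; they are not taken from WEKA or from any of the cited papers, and WEKA's own implementation may differ in detail.

```python
from math import sqrt


def confusion_counts(actual, predicted, positive="spam"):
    """Return (tp, fp, fn, tn) with `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summary_measures(tp, fp, fn, tn):
    """Accuracy, precision, recall, F-measure and Cohen's kappa."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    # Agreement expected by chance, from the row and column totals.
    chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - chance) / (1 - chance) if chance < 1 else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kappa": kappa,
    }


def rmse_and_rrse(actual, spam_probability, positive="spam"):
    """RMSE of the predicted spam probability, and the same squared error
    relative to a simple predictor that always outputs the mean target."""
    targets = [1.0 if truth == positive else 0.0 for truth in actual]
    mean_target = sum(targets) / len(targets)
    squared = sum((p - t) ** 2 for p, t in zip(spam_probability, targets))
    baseline = sum((mean_target - t) ** 2 for t in targets)
    rmse = sqrt(squared / len(targets))
    rrse = sqrt(squared / baseline) if baseline else float("inf")
    return rmse, rrse


if __name__ == "__main__":
    # Toy labels for illustration; these are not Spambase results.
    actual = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
    predicted = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]
    probabilities = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1, 0.1, 0.3]
    print(summary_measures(*confusion_counts(actual, predicted)))
    print(rmse_and_rrse(actual, probabilities))
```

Precision, recall and the F-measure here treat spam as the positive class; swapping the `positive` argument gives the corresponding values for ham.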
III. MATERIALS AND METHODS

This research involved three steps: dataset preparation; pre-processing and application of various machine learning classifiers; and evaluation of the performance of those classifiers.

A. Dataset Preparation, Pre-Processing and Algorithm Application

The Spambase dataset obtained from the UCI Machine Learning Repository was used. The dataset has 57 attributes of different variable types in 4601 instances. The Spambase dataset was converted into the .arff format, the input format supported by the WEKA tool that was used for the analysis. To classify the Spambase dataset, Bayesian Logistic Regression, Hidden Naïve Bayes, RBFNetwork, Voted Perceptron, Lazy Bayesian Rule, Logit Boost, Rotation Forest, NNge, Logistic Model Tree, REPTree, Naïve Bayes, J48, Multilayer Perceptron and Random Tree were used, with 10-fold cross-validation. Ten folds were chosen because extensive tests on various datasets, with different learning procedures, have shown that 10 is about the right number of folds to get the best estimate of error [8].

In cross-validation, a number of folds is chosen (here 10) and the data is partitioned randomly into that many parts, in each of which the classes are represented in approximately the same proportions as in the full dataset. Each part is held out in turn and the learning scheme is trained on the remaining nine-tenths; its error rate is then calculated on the holdout set. The learning procedure is therefore carried out a total of 10 times on different training sets, each of which has a lot in common with the others. Finally, the 10 error estimates are averaged to give an overall error estimate. For comparison, the dataset was also run using a percentage split, which holds out a certain percentage of the data for testing; a 66% split was employed in this work.
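The cross-validation and percentage-split procedures described above can be sketched in a few lines of Python. This is an illustrative sketch, not the WEKA implementation used in the study: the function names, the fixed random seed, the toy labels and the majority-class baseline are all hypothetical, and the 66% figure is read here as the training share, which is how WEKA's percentage-split option is usually understood. The sketch also assumes there are at least as many instances as folds.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Split row indices into k folds with similar class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        offset += len(indices)
    return folds


def cross_validated_error(train, predict, rows, labels, k=10, seed=1):
    """Hold out each fold in turn and average the k error rates."""
    error_rates = []
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(rows)) if i not in held]
        model = train([rows[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        wrong = sum(predict(model, rows[i]) != labels[i] for i in held_out)
        error_rates.append(wrong / len(held_out))
    return sum(error_rates) / k


def percentage_split(n_rows, train_fraction=0.66, seed=1):
    """Shuffle the row indices and cut them into train and test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = round(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


# Hypothetical stand-in for a real learner: always predict the majority class.
def train_majority(rows, labels):
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, row):
    return model


if __name__ == "__main__":
    labels = ["ham"] * 60 + ["spam"] * 40  # toy data, not Spambase
    rows = [[0.0]] * len(labels)
    print(cross_validated_error(train_majority, predict_majority, rows, labels))
    train_idx, test_idx = percentage_split(len(labels))
    print(len(train_idx), len(test_idx))
```

Any classifier can be substituted for the baseline by passing a different pair of `train` and `predict` functions with the same shape.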
IV. RESULTS

The entire dataset was used for the experiment, with 10-fold cross-validation and a 66% split. The comparison of performance in terms of Accuracy, Precision, Recall, F-Measure, Root Mean Squared Error, Receiver Operator Characteristics Area and Root Relative Squared Error is summarised here.

A. Accuracy

Accuracy shows the proportion of correct predictions. It does not consider positives and negatives separately, so other performance measures are used alongside it. The largest possible value is 1, indicating the highest accuracy. In this work, the highest accuracy was 0.942, obtained when 10-fold cross-validation was applied to the Rotation Forest algorithm, and the lowest was 0.891, obtained when the 66% split was used with the REPTree algorithm. Fig. 1 and Table 4 show the accuracy values.

Fig. 1. Comparison of Accuracy

B. Precision, Recall and F-Measure

Precision is the fraction of retrieved instances that are relevant, while recall is the fraction of relevant instances that are retrieved. For spam filtering, precision is the share of emails flagged as spam that really are spam, and recall is the share of spam emails that are flagged. Both depend on an understanding and measure of relevance. Precision and recall scores are usually discussed either by comparing the values of one measure at a fixed level of the other, or by combining both into a single measure. In this research the F-measure is used. A high F-measure is desirable, since both precision and recall should be high; Rotation Forest has the highest F-measure, 0.942. The values are presented in Table 4 and Fig. 2 to Fig. 4.

Fig. 2. Comparison of Precision
Fig. 3. Comparison of Recall
Fig. 4. Comparison of the F-Measure

C. ROC Area

The ROC area (AUC) of a classifier is equal to the probability that the classifier ranks a randomly selected positive instance higher than a randomly selected negative instance. Fig. 5 shows the areas under the ROC curves of the classifiers used in this research; Rotation Forest has the highest, 0.98, and Random Tree the lowest, 0.905.

Fig. 5. Comparison of ROC Area

D. Kappa Statistics

The Kappa statistic gives the level of agreement between the true classes and the classifications. The highest value, 1, indicates total agreement. In this study, the highest Kappa statistic was 0.879, obtained for Rotation Forest with 10-fold cross-validation. Table 4 and Fig. 6 show the respective Kappa statistics.

Fig. 6. Kappa Statistics

E. Root Mean Squared Error

For the root mean squared error, a low value indicates an excellent classifier. A low root mean squared error of 0.216 was recorded for Rotation Forest using 10-fold cross-validation. Fig. 7 and Table 4 show the Root Mean Squared Error.

Fig. 7. Root Mean Squared Error

F. Root Relative Squared Error

The relative squared error normalizes the total squared error by dividing it by the total squared error of the simple predictor. Taking the square root of the relative squared error reduces the error to the same dimensions as the quantity being predicted. Fig. 8 and Table 4 give the respective values of the Root Relative Squared Error.

Fig. 8. Root Relative Squared Error

Table 3. Results of Accuracy, Precision, Recall, F-Measure, ROC Area, Kappa Statistic, RMSE and RRSE

V. CONCLUSION AND RECOMMENDATIONS

This research was motivated by the increasing volume of spam email across the globe and by the finding, from the literature review, that there are classification algorithms whose performance on email datasets has not been compared.
From the experiments and results obtained by running fourteen different classification algorithms (including commonly used algorithms) under two test options in WEKA, the testing environment, it has been established that some less common algorithms perform relatively well on the Spambase dataset, which served as our training and testing dataset, with Rotation Forest emerging as the best classifier. The results show that even without feature selection, the Rotation Forest classification algorithm, with an accuracy of 0.942, performs relatively well in email classification. It performs better than some commonly used classification algorithms, including J48, which records an accuracy of 0.923, Naïve Bayes with 0.885 and Multilayer Perceptron with 0.932.

We recommend that the results obtained be compared with more spam datasets from different sources and using different machine learning tools. Also, more classification algorithms should be analysed with email spam datasets.

ACKNOWLEDGEMENTS

The authors would like to acknowledge and appreciate the Department of Cyber Security, Federal University of Technology, Minna, Nigeria for their support.

Authors' Profiles

Shafi'i Muhammad Abdulhamid received his PhD in Computer Science from Universiti Teknologi Malaysia (UTM), his MSc in Computer Science from Bayero University Kano (BUK), Nigeria and a Bachelor of Technology in Mathematics/Computer Science from the Federal University of Technology Minna, Nigeria. His current research interests are in Cyber Security, Cloud Computing, Soft Computing and Big Data. He has published many academic papers in reputable international journals, conference proceedings and book chapters. He has been appointed as an editorial board member for UPI JCSIT and IJTRD.
He has also been appointed as a reviewer for several ISI and Scopus indexed international journals such as JNCA Elsevier, ASOC Elsevier, EIJ Elsevier, JKSU-CIS Elsevier, NCAA Springer, BJST Springer, IJNS, IJST, IJCT, JITE:Research, JITE:IIP, JAIT, IJAER and JCEIT SciTechnol. He is a member of IEEE, the International Association of Computer Science and Information Technology (IACSIT), the Computer Professionals Registration Council of Nigeria (CPN), the International Association of Engineers (IAENG), The Internet Society (ISOC), the Cyber Security Experts Association of Nigeria (CSEAN) and the Nigerian Computer Society (NCS). Presently he is a lecturer at the Department of Cyber Security Science, Federal University of Technology Minna, Nigeria.

Maryam Shuaib is a postgraduate student in the Department of Cyber Security Science, Federal University of Technology, Minna, Nigeria. She has a B.Tech. degree in Mathematics with Computer Science. She was the Special Assistant to the Governor of Niger State, Nigeria on ICT Development between 2012 and 2015. Her research interests include cybersecurity, IoT and database security. Maryam is an Oracle Database 11g Certified Associate and a member of the Association of Nigerian Authors.

Oluwafemi Osho is currently a lecturer in the Department of Cyber Security Science, Federal University of Technology, Minna, Nigeria. He holds an M.Tech. degree in Mathematics and a B.Tech. degree in Mathematics/Computer Science. Before joining the institution, he served as Head of the IT Department of one of the leading mortgage banks in Nigeria. His current research interests include cybersecurity, mobile security, and security analysis. Oluwafemi is a Certified Ethical Hacker (CEH) and a member of the Cyber Security Experts Association of Nigeria (CSEAN) and a host of other professional associations.

Dr. Ismaila Idris is with the Department of Cyber Security Science. He obtained his Bachelor's degree from the Federal University of Technology, Minna, his M.Sc. from the University of Ilorin and his PhD from Universiti Teknologi Malaysia. His research interests are Information Security, Data Mining, Machine Learning and Evolutionary Algorithms.

Dr. J. K. Alhassan was born at Ganmu-Alhaeri, in Kwara State, Nigeria on 9th January, 1974. He obtained a Bachelor of Technology in Mathematics/Computer Science at the Federal University of Technology, Minna, Niger State, Nigeria in 2000, a Master of Science in Computer Science at the University of Ibadan, Nigeria in 2006, and a Doctor of Philosophy in Computer Science at the Federal University of Technology, Minna, Niger State, Nigeria in 2014. His major field of study is computer science. He carried out part of his PhD research at the United Institute of Informatics Problems, National Academy of Sciences of Belarus (UIIP NASB), Minsk, Republic of Belarus. He is currently the Ag. Head of the Department of Cyber Security Science, Federal University of Technology, Minna, Niger State, Nigeria. He has published twelve journal articles and four conference proceedings. His research interests include Artificial Intelligence, Data Mining, Internet Technology, Database Management Systems, Software Architecture, Machine Learning, Human Computer Interaction and Computer Security. Dr. Alhassan is a member of the Computer Professionals Registration Council of Nigeria (CPN).

How to cite this paper: Shafi'i Muhammad Abdulhamid, Maryam Shuaib, Oluwafemi Osho, Idris Ismaila, John K. Alhassan, "Comparative Analysis of Classification Algorithms for Email Spam Detection", International Journal of Computer Network and Information Security (IJCNIS), Vol.10, No.1, pp.60-67, 2018. DOI: 10.5815/ijcnis.2018.01.07

How do you calculate a creature's Armor Class (AC)?

Chapter 1 of the Player's Handbook (p. 14) describes how to calculate AC, yet the calculation is a frequent source of questions. That is not surprising, given the many ways the game lets you change your AC!

Here are some ways to calculate your base AC:
- Unarmored: 10 + your Dexterity modifier.
- Armored: Use the AC value given for the armor you are wearing (Player's Handbook, p. 145). For example, in leather armor you calculate your AC as 11 + your Dexterity modifier, and in chain mail your AC is simply 16.
- Unarmored Defense (Barbarian): 10 + your Dexterity modifier + your Constitution modifier.
- Unarmored Defense (Monk): 10 + your Dexterity modifier + your Wisdom modifier.
- Draconic Resilience (Sorcerer): 13 + your Dexterity modifier.
- Natural Armor: 10 + your Dexterity modifier + your natural armor bonus. This calculation method is typically used only for monsters and NPCs, although it is relevant to a druid or another class that can assume a form that has natural armor.

These methods, along with any others that give a formula for calculating your AC, are mutually exclusive; you can benefit from only one at a time. If you have access to more than one, you must choose which one to use. For example, if you are a sorcerer/monk, you can use either Unarmored Defense or Draconic Resilience, but not both. Similarly, a druid/barbarian who transforms into a creature with natural armor can use either the creature's natural armor or Unarmored Defense (you are not considered to be wearing armor when using the creature's natural armor).

What about a shield? A shield increases your AC by 2 while you use it. For example, if you are unarmored and using a shield, your AC is 12 + your Dexterity modifier. Keep in mind that some ways of calculating AC, such as the monk's Unarmored Defense, prohibit the use of a shield.

Once you have your base AC, it can be temporarily modified by situational bonuses or penalties. For example, having half cover gives you a +2 bonus to your AC, and three-quarters cover gives a +5 bonus. Spells can also modify your AC. Shield of faith, for example, grants a target creature a +2 bonus to its AC until the spell ends.

Magic items can also improve your AC. Here are some examples: +1 chain mail gives you an AC of 17, a ring of protection gives a +1 bonus to AC regardless of what you are wearing, and bracers of defense give a +2 bonus to AC if you are not wearing armor or using a shield.

RACIAL TRAITS

Does the Trance trait allow an elf to finish a long rest in 4 hours?

The intent is that it does not. The Trance racial trait lets an elf meditate for 4 hours and then feel the way a human does after resting for 8 hours, but the intent was not to shorten an elf's long rest. A long rest is a period of relaxation that lasts at least 8 hours. It can contain sleep, reading, talking, eating, and other restful activity. Standing watch is even possible during a long rest, but for no more than 2 hours; keeping a focused watch for longer than that is not restful. In short, a long rest and sleep are not the same thing; you can sleep when you are not taking a long rest, and you can take a long rest without sleeping.

Here is what all this means for an elf. An elf can spend 4 hours in a trance during a long rest and then has 4 additional hours of light activity. While an elf's companions are snoozing, the elf can be awake and engaged in a variety of activities, such as crafting a lovely trinket, composing a sonnet, reading a tome of ancient lore, or trying to remember something experienced centuries ago, while staying alert to the

The parrot is a well-known bird in Persian literature, and its various features have served as material for many themes and turns of phrase. A close examination of literary works shows that Persian-language poets, living in different geographical regions, were well acquainted with the natural and biological features of the parrot, a bird not native to Iran, and used them to create similes and images and to shape the networks of rhetorical association in their works. In this article, by examining works of poetry, we show how far knowledge of the parrot's natural features is useful in studying poems, clarifies their subtleties and fine points, and helps in understanding their meaning. An examination of numerous cases from poetic texts shows that, among the natural features of the parrot, physical and visual ones play the major role in constructing rhetorical images based on similarity. Among these, the color green has created one of the strongest and most diverse networks of associations around the parrot.

