
Edible and Poisonous Mushrooms Classification by Machine Learning Algorithms

2022 11th MEDITERRANEAN CONFERENCE ON EMBEDDED COMPUTING, 2022
2022 11th MEDITERRANEAN CONFERENCE ON EMBEDDED COMPUTING (MECO), 7-10 JUNE 2022, BUDVA, MONTENEGRO

Kemal TUTUNCU
Department of Electrical & Electronic Engineering, Selcuk University, Konya, TURKEY
ktutuncu@selcuk.edu.tr

Ilkay CINAR
Department of Computer Engineering, Selcuk University, Konya, TURKEY
ilkay.cinar@selcuk.edu.tr

Ramazan KURSUN
Guneysinir Vocational School, Selcuk University, Konya, TURKEY
rkursun@selcuk.edu.tr

Murat KOKLU
Department of Computer Engineering, Selcuk University, Konya, TURKEY
mkoklu@selcuk.edu.tr

Abstract: Of the millions of mushroom species growing around the world, some are edible while others are poisonous. Distinguishing edible from poisonous mushrooms is not easy and requires expertise, so classifying them reliably is important. Machine learning algorithms offer an alternative method for classifying poisonous and edible mushrooms using morphological or physical features of fungi. The dataset used in this study is the Mushroom dataset available in the UC Irvine Machine Learning Repository. Models for classifying edible and poisonous fungi were built from the 22 features in the Mushroom dataset using four different machine learning algorithms. The classification accuracies obtained with the Naive Bayes, Decision Tree, Support Vector Machine and AdaBoost algorithms were 90.99%, 98.82%, 99.98% and 100%, respectively. Using only the physical appearance features of the mushrooms, the AdaBoost model determined whether mushrooms were edible or poisonous with 100% accuracy.

Keywords: edible mushrooms, poisonous mushrooms, machine learning, mushroom dataset, UCI machine learning repository.

I. INTRODUCTION

Edible mushrooms, which grow spontaneously in nature depending on the season, are an important foodstuff for people living in rural areas. Some mushrooms are edible and some are poisonous, and poisonous mushrooms cause many deaths every year. Determining from its appearance whether a mushroom is poisonous requires expertise. A large part of the mushrooms consumed in the world is still collected from nature. The inability to recognize poisonous mushrooms among those collected from nature leads to serious problems and can even lead to death. This situation makes people more cautious about mushroom consumption [1-4].

Methods for identifying poisonous mushrooms are primarily based on visual identification and biochemical analysis [5]. For people who are not specialists, it is difficult to conduct biochemical analysis in everyday life or to judge a mushroom from its morphological features. For this reason, many researchers have worked on different models and methods. Examples include machine learning algorithms [6, 7], deep learning algorithms [8, 9], rule inference algorithms [10, 11] and image processing algorithms [12-14].

In this study, the 22 features of the mushroom dataset were used for classification with machine learning algorithms. The contributions of the study to the literature are as follows:

- A dataset consisting of edible mushrooms (4208) and poisonous mushrooms (3916) was used.
- Decision Tree (DT), Naive Bayes (NB), AdaBoost (AB), and Support Vector Machine (SVM) machine learning algorithms were used for classification.
- The performance metrics were calculated for each classification algorithm and compared with each other.
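As a quick arithmetic check of the class counts listed above, the short sketch below adds them up and reports the share of each class. The variable names are illustrative and not taken from the paper; the only inputs are the two counts the authors report.

```python
# Class counts as reported in the contribution list of the paper.
n_edible, n_poisonous = 4208, 3916
n_total = n_edible + n_poisonous

# Share of each class in percent.
share_edible = 100 * n_edible / n_total
share_poisonous = 100 * n_poisonous / n_total

print(f"total samples: {n_total}")
print(f"edible: {share_edible:.1f}%  poisonous: {share_poisonous:.1f}%")
```

The total agrees with the 8124 samples stated for the dataset, and the two classes are close to balanced, which suggests that plain accuracy is a reasonable headline metric here.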
The study is organized as follows: Section II describes the dataset, cross-validation, confusion matrix, performance metrics, and machine learning methods used in the study. The experimental results are presented in Section III, and the discussion and conclusions are given in Section IV.

II. MATERIAL AND METHODS

In this section, the mushroom dataset, k-fold cross-validation, the confusion matrix, performance metrics, and machine learning methods are explained.

A. Mushroom Dataset

The mushroom dataset consists of a total of 8124 samples with 22 features obtained from edible and poisonous mushrooms. This dataset was obtained by [15] and shared in the UC Irvine Machine Learning Repository [16]. The features of the mushroom dataset are presented in Table 1.

TABLE I. FEATURES FOR THE MUSHROOM DATASET

978-1-6654-6827-5/22/$31.00 ©2022 IEEE

B. k-Fold Cross-Validation

This method was developed to measure the success of classification models objectively. It divides the dataset into k subsets of equal size. Each time, one of the k subsets is used as the test set and the remaining k-1 subsets are used as the training set. The average success over all k trials is then calculated. Each subset is used once as a test set and k-1 times as part of a training set. The disadvantage of this method is that the training algorithm has to be run from scratch k times [17, 18]. In this study, k was set to 10. The operation of the 10-fold cross-validation used in the study is shown in Fig. 1.

C. Confusion Matrix and Performance Metrics

Accuracy, precision, recall and F1-score metrics are used to evaluate a classifier model. These metrics are calculated from the confusion matrix [19, 20]. The confusion matrix is presented in Table 2.
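The evaluation protocol described in Sections B and C can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: the folds are contiguous and unshuffled (the paper does not say whether shuffling or stratification was used), the function names `k_fold_indices` and `confusion_counts` are hypothetical, the toy labels are invented, and the majority-class "classifier" is only a placeholder for a real model. Edible is taken as the positive class.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def confusion_counts(actual, predicted, positive="edible"):
    """Return (TP, FN, FP, TN), treating `positive` as the positive class."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


# Fold sizes for the dataset size and the k value used in the study.
fold_sizes = [len(test) for _, test in k_fold_indices(8124, 10)]

# Hypothetical toy labels; a real run would use the 8124 dataset rows.
labels = ["edible", "poisonous", "edible", "edible", "poisonous",
          "edible", "poisonous", "edible", "edible", "poisonous"]

pooled = [0, 0, 0, 0]  # TP, FN, FP, TN summed over all test folds
for train_idx, test_idx in k_fold_indices(len(labels), 5):
    train_labels = [labels[i] for i in train_idx]
    # Placeholder model: predict the majority class of the training fold
    # (ties go to "edible" so that the result is deterministic).
    n_pos = train_labels.count("edible")
    majority = "edible" if n_pos >= len(train_labels) - n_pos else "poisonous"
    actual = [labels[i] for i in test_idx]
    counts = confusion_counts(actual, [majority] * len(actual))
    pooled = [x + y for x, y in zip(pooled, counts)]

print(fold_sizes)
print(dict(zip(("TP", "FN", "FP", "TN"), pooled)))
```

Because 8124 is not divisible by 10, the folds cannot all be the same size; in this sketch the first four folds hold 813 samples and the remaining six hold 812. Every sample lands in exactly one test fold, so the pooled counts always sum to the number of samples.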
The performance metrics derived from the confusion matrix are shown in Table 3.

TABLE II. CONFUSION MATRIX (MUSHROOM DATASET)

                    Predicted: Edible                             Predicted: Poisonous
Actual: Edible      TP: edible mushrooms predicted as edible      FN: edible mushrooms predicted as poisonous
Actual: Poisonous   FP: poisonous mushrooms predicted as edible   TN: poisonous mushrooms predicted as poisonous

TABLE III. FORMULAS FOR PERFORMANCE METRICS

D. Machine Learning Methods

The Decision Tree, Naive Bayes, AdaBoost, and Support Vector Machine algorithms were chosen from among the machine learning algorithms extensively used in the literature to classify the Mushroom dataset.

Naive Bayes (NB): Naive Bayes classifiers are probability-based classifiers built on Bayes' theorem. The classifier calculates a probability for each class and assigns each sample to the most likely class. This algorithm is widely used due to its high computational speed, good performance, and simple structure [21].

Decision Tree (DT): It is a non-parametric supervised machine learning approach for classification and regression. Decision trees build a model that predicts the value of a target variable by learning simple decision rules drawn from the data attributes. The learned model takes the form of a series of if-then-else decision rules. As the tree grows deeper, the decision rules become more complex and the model fits the training data more closely [22-24].

Support Vector Machine (SVM): The SVM is a classification algorithm based on statistical learning theory. It works by estimating the best decision function that can separate the two classes, or, in other words, by defining a hyperplane that optimally separates the two classes [25, 26].

AdaBoost (AB): The AdaBoost algorithm aims to create a more powerful learning algorithm by combining many learning algorithms. It is therefore a preferred algorithm among the frequently used boosting algorithms [27].

Figure 1. 10-fold cross validation

III. EXPERIMENTAL RESULTS

Four different machine learning algorithms were used to classify edible and poisonous mushrooms using the 22 features in the Mushroom dataset. The classification models were trained and evaluated with cross-validation, with k=10. The mushroom classification steps are given in Fig. 2.

Figure 2. Mushroom classification flow chart

Training the models on the Mushroom dataset produced the confusion matrices in Fig. 3. The performance metrics of the models were calculated from the data in the confusion matrices in Fig. 3 and are given in Table 4.

Figure 3. Confusion matrices of the algorithms: a) Naive Bayes, b) Decision Tree, c) SVM, d) AdaBoost

Looking at the confusion matrices, the model with the fewest correct predictions (TP and TN together) is NB, whereas the model with the most is AB. When the performance metrics calculated from the confusion matrices in Figure 3 are examined, the most successful models are SVM and AB. In addition to the highest accuracy, the AB model also achieved the highest recall, precision, and F1-score. The classification accuracies for the Mushroom dataset are 90.99%, 98.82%, 99.98%, and 100% for the NB, DT, SVM, and AB algorithms, respectively. The graph obtained from the results in Table 4 is given in Fig. 4.

TABLE IV. PERFORMANCE METRICS OF ALGORITHMS (%)

Algorithm        Accuracy   Recall   Precision   F1-Score
Naive Bayes         90.99    98.60       86.04      91.89
Decision Tree       98.82    97.72      100.00      98.85
SVM                 99.98   100.00       99.95      99.98
AdaBoost           100.00   100.00      100.00     100.00

Figure 4. Performance evaluation of algorithms
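The rows of Table IV can be checked for internal consistency with a little arithmetic. The sketch below assumes that edible is the positive class, as in the confusion matrix of Table II, and uses the standard textbook definitions of recall, precision, accuracy and F1-score. From the class sizes (4208 edible, 3916 poisonous) and the reported recall and precision it infers the four confusion-matrix counts, then recomputes accuracy and F1-score. The inferred counts are estimates derived from rounded percentages, not values read from Fig. 3, and the function names are hypothetical.

```python
# Class sizes from the paper; edible is assumed to be the positive class.
N_EDIBLE, N_POISONOUS = 4208, 3916

# Rows of Table IV: (accuracy, recall, precision, F1), all in percent.
TABLE_IV = {
    "Naive Bayes": (90.99, 98.60, 86.04, 91.89),
    "Decision Tree": (98.82, 97.72, 100.00, 98.85),
    "SVM": (99.98, 100.00, 99.95, 99.98),
    "AdaBoost": (100.00, 100.00, 100.00, 100.00),
}


def implied_counts(recall_pct, precision_pct):
    """Infer (TP, FN, FP, TN) from recall and precision; estimates only."""
    tp = round(recall_pct / 100 * N_EDIBLE)        # recall = TP / (TP + FN)
    fn = N_EDIBLE - tp
    fp = round(tp / (precision_pct / 100) - tp)    # precision = TP / (TP + FP)
    tn = N_POISONOUS - fp
    return tp, fn, fp, tn


def accuracy_and_f1(tp, fn, fp, tn):
    """Standard definitions of accuracy and F1-score, returned in percent."""
    accuracy = 100 * (tp + tn) / (tp + fn + fp + tn)
    f1 = 100 * 2 * tp / (2 * tp + fp + fn)
    return accuracy, f1


consistent = {}
for name, (acc, rec, prec, f1) in TABLE_IV.items():
    counts = implied_counts(rec, prec)
    acc_calc, f1_calc = accuracy_and_f1(*counts)
    # Reported values are rounded to two decimals, so allow half a unit
    # in the last reported digit.
    consistent[name] = abs(acc_calc - acc) <= 0.005 and abs(f1_calc - f1) <= 0.005
    print(f"{name}: counts={counts}, accuracy={acc_calc:.2f}, F1={f1_calc:.2f}")
```

Under these assumptions all four rows reproduce the reported accuracy and F1-score to two decimals. The inferred counts also show where the errors fall: the Naive Bayes row implies several hundred poisonous mushrooms predicted as edible, while the Decision Tree row implies none, with its errors falling on edible mushrooms predicted as poisonous.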
IV. DISCUSSION AND CONCLUSIONS

In this study, four algorithms frequently used in the literature were applied to the 22 features of the 8124-sample mushroom dataset in order to detect edible and poisonous mushrooms. The classification accuracies obtained with the NB, DT, SVM, and AB algorithms were 90.99%, 98.82%, 99.98%, and 100%, respectively. The results of the four machine learning algorithms applied to the mushroom dataset were compared with each other. The best classification result was obtained with AdaBoost, whereas the lowest was obtained with the NB model.

Using the physical appearance characteristics of mushrooms, the AdaBoost model predicted whether a mushroom is edible or poisonous with 100% accuracy. A review of the literature shows that some other algorithms have also achieved 100% accuracy in this classification. This indicates that the study is consistent with previous studies and that such models can make it possible to determine whether mushrooms are edible or poisonous. High-performance machine learning models that take the health and safety of consumers into account can help dispel people's doubts about fungi in nature. In addition to appearance features determined by an expert, deep learning algorithms that extract features from mushroom images could also be applied to this problem. Mobile applications that identify mushrooms instantly and automatically from images could be developed.

ACKNOWLEDGMENT

Thanks to Selcuk University Scientific Research Coordinatorship for their support.

REFERENCES

[1] C. Lei, et al., “Mushroom poisoning surveillance analysis, Yunnan province, China, 2001-2006,” OSIR Journal, vol. 1(1), p. 8-11, 2016.
[2] J. White, et al., “Mushroom poisoning: A proposed new clinical classification,” Toxicon, vol. 157, p. 53-65, 2019.
[3] Y. Wang, J. Du, H. Zhang, and X. Yang, “Mushroom toxicity recognition based on multigrained cascade forest,” Scientific Programming, (Special Issue), 2020.
[4] H. Li, et al., “Reviewing the world's edible mushroom species: A new evidence-based classification system,” Comprehensive Reviews in Food Science and Food Safety, vol. 20(2), p. 1982-2014, 2021.
[5] H. Zhao, F. Ge, P. Yu, and H. Li, “Identification of Wild Mushroom Based on Ensemble Learning,” IEEE, In 2021 IEEE 4th International Conference on Big Data and Artificial Intelligence (BDAI), Qingdao, China, p. 43-47, July 2021.
[6] S.B. Kotsiantis, I.D. Zaharakis, and P.E. Pintelas, “Machine learning: a review of classification and combining techniques,” Artificial Intelligence Review, vol. 26(3), p. 159-190, 2006.
[7] I. Cinar, and M. Koklu, “Identification of Rice Varieties Using Machine Learning Algorithms,” Journal of Agricultural Sciences, vol. 28(2), p. 307-325, 2022. DOI: 10.15832/ankutbd.862482.
[8] N. Zahan, M.Z. Hasan, M.A. Malek, and S.S. Reya, “A Deep Learning-Based Approach for Edible, Inedible and Poisonous Mushroom Classification,” In 2021 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD), Bangladesh, pp. 440-444, 2021.
[9] K. Sabanci, M.F. Aslan, E. Ropelewska, and M.F. Unlersen, “A convolutional neural network-based comparative study for pepper seed classification: Analysis of selected deep features with support vector machine,” Journal of Food Process Engineering, 2021.
[10] M. Koklu, H. Kahramanli, and N. Allahverdi, “A New Approach to Classification Rule Extraction Problem by the Real Value Coding,” International Journal of Innovative Computing, Information and Control, vol. 8(9), p. 6303-6315, 2012.
[11] M. Koklu, H. Kahramanli, and N. Allahverdi, “A new accurate and efficient approach to extract classification rules,” Journal of the Faculty of Engineering and Architecture of Gazi University, vol. 29(3), p. 477-486, 2014. DOI: 10.17341/gummfd.89433.
[12] A.D. Arjun, S.K. Chakraborty, N.K. Mahanti and N. Kotwaliwale, “Non-destructive assessment of quality parameters of white button mushrooms (Agaricus bisporus) using image processing techniques,” Journal of Food Science and Technology, p. 1-13, 2021.
[13] I. Cinar, M. Koklu, and S. Tasdemir, “Classification of raisin grains using machine vision and artificial intelligence methods,” Gazi Mühendislik Bilimleri Dergisi (GMBD), vol. 6(3), p. 200-209, 2020. DOI: 10.30855/gmbd.2020.03.03.
[14] M. Koklu, S. Sarigil, and O. Ozbek, “The Use of Machine Learning Methods in Classification of Pumpkin Seeds (Cucurbita Pepo L.),” Genetic Resources and Crop Evolution, vol. 68(6), p. 2713-2726, 2021. DOI: 10.1007/s10722-021-01226-0.
[15] J. Schlimmer, “Mushroom records drawn from the Audubon Society field guide to North American mushrooms,” GH Lincoff (Pres), New York, 1981.
[16] A. Frank and A. Asuncion, “Mushroom Dataset,” UCI Machine Learning Repository, Irvine, CA: University of California, 2022, [online] Available: http://archive.ics.uci.edu/ml
[17] M. Koklu, R. Kursun, Y.S. Taspinar, and I. Cinar, “Classification of Date Fruits into Genetic Varieties Using Image Analysis,” Mathematical Problems in Engineering, vol. 2021, 2021. DOI: 10.1155/2021/4793293.
[18] Y.S. Taspinar, I. Cinar, and M. Koklu, “Classification by a stacking model using CNN features for COVID-19 infection diagnosis,” Journal of X-ray Science and Technology, vol. 30(1), p. 73-88, 2022. DOI: 10.3233/XST211031.
[19] D. Singh, et al., “Classification and Analysis of Pistachio Species with Pre-Trained Deep Learning Models,” Electronics, vol. 11(7), p. 981, 2022. DOI: 10.3390/electronics11070981.
[20] M. Koklu, Y.S. Taspinar, and I. Cinar, “Classification of rice varieties with deep learning methods,” Computers and Electronics in Agriculture, vol. 187, p. 106285, 2021. DOI: 10.1016/j.compag.2021.106285.
[21] P. Liu, and L. Lei, “Missing data treatment methods and NBI model,” Proceedings - ISDA 2006, Sixth International Conference on Intelligent Systems Design and Applications, 1, p. 633-638, 2006. DOI: 10.1109/ISDA.2006.194.
[22] L. Breiman, J. Friedman, R. Olshen, and C. Stone, “Classification and Regression Trees,” Wadsworth, Belmont, CA, 1984.
[23] J.R. Quinlan, “C4.5: programs for machine learning,” Morgan Kaufmann, 1993.
[24] I.A. Ozkan, and M. Koklu, “Skin lesion classification using machine learning algorithms,” International Journal of Intelligent Systems and Applications in Engineering, vol. 5(4), p. 285-289, 2017. DOI: 10.18201/ijisae.2017534420.
[25] V.N. Vapnik, “The Nature of Statistical Learning Theory,” Springer-Verlag, New York, 2000.
[26] M. Koklu, M.F. Unlersen, I.A. Ozkan, M.F. Aslan, and K. Sabanci, “A CNN-SVM study based on selected deep features for grapevine leaves classification,” Measurement, vol. 188, p. 110425, 2022. DOI: 10.1016/j.measurement.2021.110425.
[27] Y. Freund, and R.E. Schapire, “Experiments with a new boosting algorithm,” In ICML, vol. 96, p. 148-156, 1996.