2022 11th MEDITERRANEAN CONFERENCE ON EMBEDDED COMPUTING
(MECO), 7-10 JUNE 2022, BUDVA, MONTENEGRO
Edible and Poisonous Mushrooms Classification by
Machine Learning Algorithms
Kemal TUTUNCU
Department of Electrical &
Electronic Engineering Selcuk
University
Konya, TURKEY
ktutuncu@selcuk.edu.tr
Ilkay CINAR
Department of Computer
Engineering
Selcuk University
Konya, TURKEY
ilkay.cinar@selcuk.edu.tr
Ramazan KURSUN
Guneysinir Vocational School
Selcuk University
Konya, TURKEY
rkursun@selcuk.edu.tr
Murat KOKLU
Department of Computer
Engineering
Selcuk University
Konya, TURKEY
mkoklu@selcuk.edu.tr
Abstract— Of the many mushroom species growing around the world,
some are edible while others are poisonous. Distinguishing edible
from poisonous mushrooms is not easy and requires expertise, so
classifying them correctly is important. Machine learning
algorithms are an alternative method for classifying poisonous and
edible mushrooms using the morphological or physical features of
fungi. The dataset used in this study is the Mushroom dataset
available in the UC Irvine Machine Learning Repository. Based on
the 22 features in the Mushroom dataset and four different machine
learning algorithms, models were created for the classification of
edible and poisonous fungi. The Naive Bayes, Decision Tree, Support
Vector Machine and AdaBoost models achieved classification success
rates of 90.99%, 98.82%, 99.98% and 100%, respectively. These
results show that, using only the physical appearance features of
the mushrooms, the AdaBoost model determined whether a mushroom
was edible or poisonous with 100% accuracy.
Keywords-edible mushrooms, poisonous mushrooms, machine
learning, mushroom dataset, UCI machine learning repository.
I. INTRODUCTION
Edible mushrooms, which grow spontaneously in nature depending
on the season, are an important foodstuff for people living in
rural areas. Some mushrooms are edible, and some are poisonous.
Many deaths are caused by eating poisonous mushrooms every year.
Determining whether a mushroom is poisonous based on its
appearance requires expertise. A large part of the mushrooms
consumed in the world is still collected from nature. The
inability to recognize the poisonous mushrooms among those
collected from nature leads to serious problems and can even lead
to death. This situation makes people more cautious about
mushroom consumption [1-4].
In general, the methods for identifying poisonous mushrooms are
primarily based on visual identification and biochemical analysis
[5]. It is difficult to conduct biochemical analysis in everyday
life, and it is difficult for people who are not specialists to
judge a mushroom from its morphological features. For this reason,
many researchers have worked on different models and methods.
Examples of these studies include machine learning algorithms
[6, 7], deep learning algorithms [8, 9], rule inference
algorithms [10, 11] and image processing algorithms [12-14].
In this study, 22 features from the mushroom dataset were used
for classification with machine learning algorithms. The
contributions of the study to the literature are as follows:
• A dataset consisting of edible mushrooms (4208) and
poisonous mushrooms (3916) was used.
• Decision Tree (DT), Naive Bayes (NB), AdaBoost
(AB), and Support Vector Machine (SVM) machine
learning algorithms were used for classification.
• The performance metrics were calculated for each
classification algorithm and compared with each other.
The study is organized as follows: In Section 2, the dataset,
cross-validation, confusion matrix, performance metrics, and
machine learning methods used in the study are explained. The
results of the experiments are presented in Section 3. The
discussion and conclusions are given in Section 4.
II. MATERIAL AND METHODS
In this section, the mushroom dataset, confusion matrix,
performance metrics, k-fold cross-validation and machine
learning methods are explained.
A. Mushroom Dataset
The mushroom dataset consists of a total of 8124 records with
22 features, obtained from edible and poisonous mushrooms.
This dataset was prepared by [15] and shared in the UC Irvine
Machine Learning Repository [16]. The features of the
mushroom dataset are presented in Table 1.
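As a rough illustration of how such records can be read, the sketch below assumes the repository's comma-separated layout: one row per mushroom, the class label first ('e' for edible, 'p' for poisonous), then 22 single-letter feature codes. The function name `load_mushrooms` and the two sample rows are illustrative and are not part of this study.

```python
import csv
import io
from collections import Counter

N_FEATURES = 22  # number of features reported for the Mushroom dataset

# Two illustrative rows in the assumed layout: class label, then 22 codes.
SAMPLE = (
    "p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u\n"
    "e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g\n"
)


def load_mushrooms(stream):
    """Split each CSV row into a class label and a list of feature codes."""
    features, labels = [], []
    for row in csv.reader(stream):
        if not row:  # skip blank lines
            continue
        if len(row) != N_FEATURES + 1:
            raise ValueError(f"expected {N_FEATURES + 1} fields, got {len(row)}")
        labels.append(row[0])
        features.append(row[1:])
    return features, labels


features, labels = load_mushrooms(io.StringIO(SAMPLE))
print(Counter(labels))   # class counts in the sample
print(len(features[0]))  # features per record
```

For the full file, an open file handle can be passed instead of the in-memory sample, for example `open("agaricus-lepiota.data", newline="")`; that local path is an assumption. On the full dataset the class counts should match the 4208 edible and 3916 poisonous records reported in this paper.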
978-1-6654-6827-5/22/$31.00 ©2022 IEEE
TABLE I. FEATURES FOR THE MUSHROOM DATASET
B. k-Fold Cross-Validation
k-fold cross-validation was developed to measure the success of
classification models objectively. The method divides the dataset
into k subsets of equal size. One of the k subsets is used as the
test set each time, and the remaining k-1 subsets are used as the
training set. The average success over all k trials is then
calculated. Each subset is used once as the test set and k-1
times as part of the training set. This method has the
disadvantage of having to run the training algorithm from scratch
k times [17, 18]. In this study, k was set to 10. The operating
mode of the 10-fold cross-validation used in the study is shown in Fig. 1.
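The splitting procedure described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the folds are contiguous index ranges, and no shuffling or stratification is applied, since the paper does not state whether either was used.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder so that fold sizes differ by at most one sample.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validate(n_samples, k, evaluate):
    """Average a caller-supplied evaluate(train, test) score over the k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(8124, k=10))
print([len(test) for _, test in folds])
```

For the 8124 records used here, k = 10 gives four folds of 813 samples and six of 812. The `evaluate` callback stands for training a model on the training indices and scoring it on the test indices, which is the step that has to be repeated k times.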
C. Confusion Matrix and Performance Metrics
Accuracy, precision, recall and F1-score metrics are used to
evaluate a classifier model. These metrics are calculated from
the confusion matrix [19, 20]. The
confusion matrix is presented in Table 2. The performance
metrics derived from the confusion matrix are shown in Table 3.
TABLE II. CONFUSION MATRIX
                      Predicted Edible                        Predicted Poisonous
Actual Edible         TP (edible predicted as edible)         FN (edible predicted as poisonous)
Actual Poisonous      FP (poisonous predicted as edible)      TN (poisonous predicted as poisonous)
TABLE III. FORMULAS FOR PERFORMANCE METRICS
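The four metrics can be computed directly from the confusion-matrix counts. The sketch below uses the standard definitions, with edible taken as the positive class as in the confusion matrix above. The function name and the example counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1-score as fractions in [0, 1]."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total                      # correct / all predictions
    precision = tp / (tp + fp) if tp + fp else 0.0    # TP / predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0       # TP / actual positive
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts chosen only to exercise the formulas.
example = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in example.items():
    print(f"{name}: {100 * value:.2f}%")
```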
D. Machine Learning Methods
The Decision Tree, Naive Bayes, AdaBoost, and Support
Vector Machine algorithms were chosen from among the
machine learning algorithms extensively used in the literature to
categorize the Mushroom dataset.
Naive Bayes (NB): Naive Bayes classifiers are probability-based
classifiers built on Bayes' theorem. The classifier calculates
the probability of each class for a given sample and assigns the
sample to the most likely class. This algorithm is widely used
due to its high computational speed, good performance, and simple
structure [21].
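To make the "most likely class" step concrete, here is a minimal categorical Naive Bayes sketch with Laplace smoothing. It is an illustration under stated assumptions, not the implementation used in this study: the class name, the smoothing parameter `alpha`, and the two-feature toy data are all hypothetical.

```python
import math
from collections import Counter


class CategoricalNaiveBayes:
    def __init__(self, alpha=1.0):
        if alpha <= 0:
            raise ValueError("alpha must be positive")
        self.alpha = alpha  # Laplace smoothing strength (an assumption)

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.class_counts_ = Counter(y)
        n_features = len(X[0])
        # counts_[c][j][v]: number of class-c samples whose feature j equals v
        self.counts_ = {c: [Counter() for _ in range(n_features)] for c in self.classes_}
        self.values_ = [set() for _ in range(n_features)]
        for row, label in zip(X, y):
            for j, value in enumerate(row):
                self.counts_[label][j][value] += 1
                self.values_[j].add(value)
        return self

    def log_posteriors(self, row):
        """Unnormalised log P(class) + sum_j log P(feature_j | class)."""
        n = sum(self.class_counts_.values())
        scores = {}
        for c in self.classes_:
            score = math.log(self.class_counts_[c] / n)
            for j, value in enumerate(row):
                num = self.counts_[c][j][value] + self.alpha
                den = self.class_counts_[c] + self.alpha * len(self.values_[j])
                score += math.log(num / den)
            scores[c] = score
        return scores

    def predict(self, row):
        scores = self.log_posteriors(row)
        return max(scores, key=scores.get)


# Toy data with two made-up categorical features; 'e' = edible, 'p' = poisonous.
X = [["n", "w"], ["n", "b"], ["f", "w"], ["f", "b"], ["n", "w"]]
y = ["e", "e", "p", "p", "e"]
model = CategoricalNaiveBayes().fit(X, y)
print(model.predict(["n", "b"]), model.predict(["f", "w"]))
```

Working in log space avoids numerical underflow when the per-feature probabilities of 22 features are multiplied together.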
Decision Tree (DT): It is a non-parametric supervised
machine learning approach for classification and regression. By
learning simple decision rules drawn from data attributes,
decision trees are used to develop a model that predicts the value
of a target variable. A decision tree approximates the target
with a series of if-then-else decision rules. The decision rules
become more complex as the tree grows deeper, and the model fits
the training data more closely [22-24].
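A single if-then-else rule of the kind described above can be chosen by searching for the test that leaves the purest subsets. The sketch below uses Gini impurity and equality tests on categorical features. It is an illustration only: the paper does not state which split criterion was used, and the function names and toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Return (feature_index, value, weighted_gini) for the best 'feature == value' test."""
    best = None
    n = len(y)
    for j in range(len(X[0])):
        for value in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] == value]
            right = [label for row, label in zip(X, y) if row[j] != value]
            if not left or not right:  # the test does not separate anything
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, value, score)
    return best


# Toy data with two made-up categorical features; 'e' = edible, 'p' = poisonous.
X = [["n", "w"], ["n", "b"], ["f", "w"], ["f", "b"], ["n", "w"]]
y = ["e", "e", "p", "p", "e"]
feature, value, impurity = best_split(X, y)
print(f"if feature[{feature}] == {value!r}: ... else: ...  (weighted Gini {impurity:.3f})")
```

A full tree repeats this search recursively on each resulting subset until the nodes are pure or a depth limit is reached.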
Support Vector Machine (SVM): The SVM is a statistical
learning theory-based classification algorithm. The SVM's
operation is based on the estimation of the best decision function
that can separate the two classes, or, in other words, the
definition of a hyperplane that can optimally separate the two
classes [25, 26].
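For the linear case, the separating hyperplane is the set of points where w·x + b = 0, and a sample is classified by the side it falls on. The sketch below only illustrates that idea; the weights are hypothetical rather than learned, the paper does not state which kernel was used, and categorical mushroom features would first need a numeric encoding.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Margin width 2 / ||w||, assuming support vectors satisfy |w.x + b| = 1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical hyperplane parameters, not fitted to any data.
w, b = [1.0, 1.0], -3.0
print(classify(w, b, [1.0, 1.0]), classify(w, b, [3.0, 2.0]), margin_width(w))
```

Training an SVM amounts to choosing w and b so that this margin is as wide as possible while the training samples stay on the correct side.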
Figure 1. 10-fold cross validation
AdaBoost (AB): The AdaBoost algorithm aims to create a
more powerful learner by combining many weak learners.
It is therefore one of the most frequently preferred
boosting algorithms [27].
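The combination step can be illustrated with a minimal discrete AdaBoost over single-feature decision stumps. This is a sketch under stated assumptions, not the configuration used in this study: features are binary (0/1), labels are -1/+1, and the function names and toy data are hypothetical.

```python
import math


def train_adaboost(X, y, n_rounds=3):
    """Return a list of (alpha, feature_index, polarity) weak learners."""
    n = len(y)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        # Weak learner: the single-feature stump with the lowest weighted error.
        best = None
        for j in range(len(X[0])):
            for polarity in (1, -1):
                preds = [polarity if row[j] == 1 else -polarity for row in X]
                err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, polarity, preds)
        err, j, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, polarity))
        # Raise the weight of misclassified samples, then renormalise.
        weights = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, row):
    """Weighted vote of all stumps."""
    score = sum(alpha * (pol if row[j] == 1 else -pol) for alpha, j, pol in ensemble)
    return 1 if score >= 0 else -1


# Toy problem: the label is the majority vote of three binary features,
# so no single stump can classify every row correctly.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [1 if sum(row) >= 2 else -1 for row in X]

for rounds in (1, 3):
    model = train_adaboost(X, y, n_rounds=rounds)
    correct = sum(predict(model, row) == label for row, label in zip(X, y))
    print(rounds, "round(s):", correct, "of", len(y), "correct")
```

On this toy problem a single stump misclassifies two of the eight rows, while three boosting rounds combine one stump per feature and classify all rows correctly.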
III. EXPERIMENTAL RESULTS
Four different machine learning algorithms were used to
classify edible and poisonous mushrooms using the 22 features
in the Mushroom dataset. The models were trained and evaluated
with 10-fold cross-validation (k = 10). The mushroom
classification steps are given in Fig. 2.
Figure 2. Mushroom classification flow chart
As a result of the training of models using the Mushroom
dataset, the confusion matrices in Fig. 3 were obtained. The
performance metrics of the models were calculated using the
data in the confusion matrices in Fig. 3 and are given in Table 4.
Figure 3. Confusion matrices of the algorithms: a) Naive Bayes, b) Decision Tree, c) SVM, d) AdaBoost
When looking at the confusion matrices, the model with the
lowest number of correct predictions (TP + TN) is NB, whereas
the model with the highest is AB. When the performance metrics
calculated from the confusion matrices in Figure 3 are examined,
it is seen that the most successful models are SVM and AB. In
addition to the highest classification success, AB also achieved
the highest recall, precision, and F1-score. The classification
success rates of the NB, DT, SVM, and AB algorithms on the
Mushroom dataset are 90.99%, 98.82%, 99.98%, and 100%,
respectively. The graph obtained from the results
in Table 4 is given in Fig. 4.
Figure 4. Performance evaluation of algorithms
TABLE IV. PERFORMANCE METRICS OF ALGORITHMS (%)
Algorithm        Accuracy   Recall   Precision   F1-Score
Naive Bayes        90.99     98.60      86.04      91.89
Decision Tree      98.82     97.72     100.00      98.85
SVM                99.98    100.00      99.95      99.98
AdaBoost          100.00    100.00     100.00     100.00
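As a consistency check on Table IV, the F1-score of each model can be recomputed from its reported recall and precision as their harmonic mean. The values below are copied from Table IV; the helper name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision, reported F1-score) in percent, copied from Table IV.
table_iv = {
    "Naive Bayes": (98.60, 86.04, 91.89),
    "Decision Tree": (97.72, 100.00, 98.85),
    "SVM": (100.00, 99.95, 99.98),
    "AdaBoost": (100.00, 100.00, 100.00),
}

for name, (recall, precision, reported) in table_iv.items():
    recomputed = f1_score(precision, recall)
    print(f"{name}: recomputed {recomputed:.3f}, reported {reported:.2f}")
```

The recomputed values agree with the reported F1-scores to within 0.01 percentage points; the small differences come from the recall and precision inputs already being rounded to two decimals.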
IV. DISCUSSION AND CONCLUSIONS
In this study, four machine learning algorithms frequently used
in the literature were applied to the 22 features of the 8124
records in the mushroom dataset in order to detect edible and
poisonous mushrooms. The NB, DT, SVM, and AB models achieved
classification success rates of 90.99%, 98.82%, 99.98%, and 100%,
respectively. The results of the four machine learning algorithms
applied to the mushroom dataset were compared with each other.
The best classification result was obtained from the AdaBoost
model, whereas the lowest was obtained from the NB model.
Using the physical appearance characteristics of mushrooms, the
AdaBoost model predicted whether a mushroom is edible or
poisonous with 100% success. A review of the literature shows
that some other algorithms have also achieved 100% success in
this classification task. This indicates that the results of this
study are consistent with previous studies and that such models
make it possible to determine whether mushrooms are edible or
poisonous. High-performance machine learning models that take
the health and safety of consumers into account can help to
dispel people's doubts about fungi found in nature.
In addition to appearance features that can only be determined
by an expert, deep learning algorithms that extract features
from mushroom images can also be applied to this task. Mobile
applications that classify mushroom images instantly and
automatically can be developed.
ACKNOWLEDGMENT
Thanks to Selcuk University Scientific Research
Coordinatorship for their support.
REFERENCES
[1] C. Lei, et al., “Mushroom poisoning surveillance analysis, Yunnan
province, China, 2001-2006,” OSIR Journal, vol. 1(1), p. 8-11, 2016.
[2] J. White, et al., “Mushroom poisoning: A proposed new clinical
classification,” Toxicon, vol. 157, p. 53-65, 2019.
[3] Y. Wang, J. Du, H. Zhang, and X. Yang, “Mushroom toxicity recognition
based on multigrained cascade forest,” Scientific Programming, (Special
Issue), 2020.
[4] H. Li, et al., “Reviewing the world's edible mushroom species: A new
evidence‐based classification system,” Comprehensive Reviews in Food
Science and Food Safety, vol. 20(2), p.1982-2014, 2021.
[5] H. Zhao, F. Ge, P. Yu, and H. Li, “Identification of Wild Mushroom
Based on Ensemble Learning,” IEEE, In 2021 IEEE 4th International
Conference on Big Data and Artificial Intelligence (BDAI), Qingdao,
China, p. 43-47, July 2021.
[6] S.B. Kotsiantis, I.D. Zaharakis, and P.E. Pintelas, “Machine learning: a
review of classification and combining techniques,” Artificial Intelligence Review, vol. 26(3), p.159-190, 2006.
[7] I. Cinar, and M. Koklu, “Identification of Rice Varieties Using Machine
Learning Algorithms,” Journal of Agricultural Sciences, vol. 28(2), p.
307-325, 2022. DOI: 10.15832/ankutbd.862482.
[8] N. Zahan, M.Z. Hasan, M.A. Malek, and S.S. Reya, “A Deep Learning-Based Approach for Edible, Inedible and Poisonous Mushroom
Classification,” In 2021 International Conference on Information and
Communication Technology for Sustainable Development (ICICT4SD),
Bangladesh, pp. 440-444, 2021.
[9] K. Sabanci, M.F. Aslan, E. Ropelewska, and M.F. Unlersen, “A
convolutional neural network‐based comparative study for pepper seed
classification: Analysis of selected deep features with support vector
machine,” Journal of Food Process Engineering, 2021.
[10] M. Koklu, H. Kahramanli, and N. Allahverdi, “A New Approach to
Classification Rule Extraction Problem by the Real Value Coding,”
International Journal of Innovative Computing, Information and Control,
vol. 8(9), p.6303-6315, 2012.
[11] M. Koklu, H. Kahramanli, and N. Allahverdi, “A new accurate and
efficient approach to extract classification rules,” Journal of the Faculty
of Engineering and Architecture of Gazi University, vol. 29(3), p.477-486, 2014. DOI: 10.17341/gummfd.89433.
[12] A.D. Arjun, S.K. Chakraborty, N.K. Mahanti and N. Kotwaliwale, “Non-destructive assessment
of quality parameters of white button mushrooms (Agaricus bisporus) using image processing
techniques,” Journal of Food Science and Technology, p. 1-13, 2021.
[13] I. Cinar, M. Koklu, and S. Tasdemir, “Classification of raisin grains using
machine vision and artificial intelligence methods,” Gazi Mühendislik
Bilimleri Dergisi (GMBD), vol. 6(3), p.200-209, 2020. DOI:
10.30855/gmbd.2020.03.03
[14] M. Koklu, S. Sarigil, and O. Ozbek, “The Use of Machine Learning
Methods in Classification of Pumpkin Seeds (Cucurbita Pepo
L.),” Genetic Resources and Crop Evolution, vol. 68(6), p.2713-2726,
2021. DOI:10.1007/s10722-021-01226-0.
[15] J. Schlimmer, “Mushroom records drawn from the Audubon
Society field guide to North American mushrooms,” GH Lincoff (Pres),
New York, 1981.
[16] A. Frank and A. Asuncion, “Mushroom Dataset,” UCI Machine Learning
Repository, Irvine, CA: University of California, 2022, [online]
Available: http://archive.ics.uci.edu/ml
[17] M. Koklu, R. Kursun, Y.S. Taspinar, and I. Cinar, “Classification of Date
Fruits into Genetic Varieties Using Image Analysis,” Mathematical
Problems in Engineering, vol. 2021, 2021. DOI: 10.1155/2021/4793293.
[18] Y.S. Taspinar, I. Cinar, and M. Koklu, “Classification by a stacking model
using CNN features for COVID-19 infection diagnosis,” Journal of X-Ray
Science and Technology, vol. 30(1), p. 73-88, 2022. DOI: 10.3233/XST-211031.
[19] D. Singh, et al., “Classification and Analysis of Pistachio Species with
Pre-Trained Deep Learning Models,” Electronics, vol. 11(7), p. 981,
2022. DOI: 10.3390/electronics11070981.
[20] M. Koklu, Y.S. Taspinar, and I. Cinar, “Classification of rice varieties
with deep learning methods,” Computers and Electronics in Agriculture,
vol. 187, p. 106285, 2021. DOI: 10.1016/j.compag.2021.106285.
[21] P. Liu, and L. Lei, “Missing data treatment methods and NBI model,”
Proceedings - ISDA 2006, Sixth International Conference on Intelligent
Systems Design and Applications, 1, p.633–638, 2006. DOI:
10.1109/ISDA.2006.194.
[22] L. Breiman, J. Friedman, R. Olshen, and C. Stone, “Classification and
Regression Trees,” Wadsworth, Belmont, CA, 1984.
[23] J.R. Quinlan, “C4.5: programs for machine learning,” Morgan
Kaufmann, 1993.
[24] I.A. Ozkan, and M. Koklu, “Skin lesion classification using machine
learning algorithms,” International Journal of Intelligent Systems and
Applications in Engineering, vol. 5(4), p.285-289, 2017. DOI:
10.18201/ijisae.2017534420.
[25] V.N. Vapnik, “The Nature of Statistical Learning Theory,” Springer-Verlag, New York, 2000.
[26] M. Koklu, M.F. Unlersen, I.A. Ozkan, M.F. Aslan, and K. Sabanci, “A
CNN-SVM study based on selected deep features for grapevine leaves
classification,” Measurement, vol. 188, p. 110425, 2022.
DOI:10.1016/j.measurement.2021.110425.
[27] Y. Freund, and R.E. Schapire, “Experiments with a new boosting
algorithm,” In ICML, vol. 96, p. 148-156, 1996.