(IJCST-V11I2P16) :shikha, Jatinder Singh Saini

International Journal of Computer Science Trends and Technology (IJCST) – Volume 11 Issue 2, Mar-Apr 2023
RESEARCH ARTICLE OPEN ACCESS
A Review of Various Machine Learning Models for Email Spam Prediction
Shikha [1], Jatinder Singh Saini [2]
Student [1], Head of Department [2]
Baba Banda Singh Bahadur Engineering College, Fatehgarh Sahib - Punjab.
ABSTRACT
Email users often face a large number of spam emails arriving daily in their mailboxes from unfamiliar senders. Spamming also triggers online cyber fraud based on social engineering. Most of these frauds start with an email from an unauthenticated source that contains a URL; opening it compromises the recipient's personal data. Email spam can be detected in several stages, such as pre-processing the data, extracting the attributes, and classifying the emails. Researchers have constructed several ML (Machine Learning) algorithms to detect email spam. This paper reviews the diverse methods used to detect email spam.
Keywords: Email Spam, Machine Learning, Supervised Learning
I. INTRODUCTION

Email is a robust, effective, and private mode of communication, which is exactly why spammers are interested in using it to disseminate spam. Now that almost everyone has access to email, businesses must deal with the spam issue, and both users and Internet service providers (ISPs) struggle with it. The contributing factors are the pace of innovation in electronic communication on the one hand and the equally rapid innovation in spamming techniques on the other [1]. Because email is openly accessible, it is exposed to a number of threats from attackers. Spam poses a serious threat to email and is a problem for email users worldwide. Spam refers to unwanted emails and messages sent to internet users' inboxes; email spam can thus be defined as the act of transmitting unsolicited data to mailboxes. Spammers benefit greatly from being able to send a large number of messages to a large number of recipients quickly and cheaply, which makes the issue relevant to everyone who uses the internet and regularly receives unsolicited email. Spam emails ultimately lower productivity, take up space in mailboxes, spread viruses, trojans, and material that may be harmful to particular recipients, and disrupt the reliable delivery of mail. As a result, users waste valuable time organizing incoming mail and deleting unwanted messages.

One of the fundamental processes in semantic-based spam detection is the classification of spam using a set of semantic features. Each set of semantic features then provides the fundamental properties required to build a domain-specific classifier for spam identification. Semantic analysis takes place at two levels. At the first level, a classification technique automatically divides the emails of a large training dataset into the five categories under consideration [2]. At the second level, a set of semantic features is automatically mined from each domain's dataset. These semantic features are then used to build specialized classifiers that detect spam specific to a given topic. To classify emails by domain, each email in the global training dataset is assigned a category. Note that suitable email pre-processing must be carried out before the information in an email's subject and content can be used successfully for classification. Figure 1 depicts the overall classification process for emails.

Emails → Pre-processing → Feature Selection → Classification

Figure 1: An outline of the different steps used for email classification

The majority of the real-world data that is currently available is imperfect: it is heterogeneous, noisy, and contains missing values. Before beginning the mining process, it is crucial to prepare the dataset for data
mining [3]. The main objective of this stage is to remove terms that are not crucial for classification, such as conjunctions and articles, from the email text. Typical pre-processing activities are keyword identification, tokenization, stop-word removal, stemming, and spell checking. To reduce the amount of data after pre-processing, a subset of features is chosen during feature selection, typically by minimizing a certain cost function. Feature selection, as opposed to feature extraction, does not transform the data and is used to clean the data before a classifier model is trained. This procedure is also known as variable selection, feature reduction, and variable subset selection [4].

Some of the most useful characteristics for email spam identification are the mail body and subject, word count, word size, circadian rhythms, recipient age, gender, and country, whether the recipient responded to the message, mature content, and the bag of words from the mail content. Spam emails frequently contain many semantic anomalies. The last step of the framework for categorizing emails by domain involves training classification algorithms on the features selected in the previous stage. The learned models are then used to classify new email documents (test data) into one of the predefined categories, such as Health, Education, Money, Adult, or Computing. Several methods have been compared experimentally for classifying emails into different domains [5].

The Bayes classifier, often known as naive Bayes, is one of the most frequently used statistical spam classifiers. It is regarded as "naive" because it ignores any dependencies or correlations among the inputs and breaks a multivariate problem down into a series of univariate problems. These classifiers operate on probabilities: if specific terms occur regularly in spam but not in ham, an incoming email that contains them is most likely spam. This classification approach has become very common in mail filtering software [6]. A Bayesian filter must be trained well; its database stores, for every word, a probability of that word appearing in spam email.

A decision tree is a finite tree whose branches represent tests and whose leaves represent categories. The tests are frequently Boolean conditions on the term weights in the document. A document is categorized by starting at the root of the tree and following, at each node, the branch whose condition holds; once a leaf is reached, the document is placed in the category that annotates that leaf. Decision trees are induced with algorithms such as ID3, C4.5, and C5 [7].

The k-nearest neighbour (K-NN) classifier is an example-based classifier: it compares a new document with the training documents rather than building an explicit description of each category. Consequently, this method usually has no training phase. To classify a new document, the k most similar training documents are found, and the new document is assigned to the class to which the majority of them belong. The nearest neighbours can be found quickly with the help of traditional indexing techniques. When deciding whether a message is spam or ham, the classes of the messages closest to it are therefore taken into account, and the vector comparison can be performed in real time.

1.1 Email Semantic Features Extraction

This stage involves extracting semantic features from the email text. Email semantics refers to a group of latent concepts that characterize an email's content. The ultimate goal is to create a semantic representation that allows highly accurate spam identification. An effective method for automatically extracting semantic information in this setting is CN2-SD [8]. The classification rule learner CN2 and Subgroup Discovery (SD) are the two most often employed techniques for semantic feature extraction. CN2's induction of classification rules is used to predict the class labels, while SD inspects the training data for interesting patterns. Subgroup discovery differs from classification in that finding a subgroup is a descriptive task, whereas classification is a predictive one. The two algorithms are described as follows:

• Subgroup discovery algorithm: The descriptive induction performed by the subgroup discovery algorithm makes it possible to look for patterns that most closely match the data [9]. The semantic concepts in email messages are explained
using this technique. A vital function of semantic concept description in data mining is to condense the features of a target population (domain) into a set of understandable patterns. SD is a data mining technique for finding relations between different objects (such as emails) and particular characteristics of a target variable (class). These relations are encoded as rules of the form:

r : cond → y

where cond is a conjunction of properties (conditions on the attributes) and y is a value of the target variable (in our case spam or ham). The objective of SD is not to generate a global model. Instead, it makes it possible to spot particular patterns of interest and to extract knowledge that can then be analysed and evaluated for descriptive purposes.

• CN2 rule induction algorithm: The CN2 algorithm is one of the conventional rule-based learning methods for producing propositional classification rules. It is made up of two fundamental parts. The low-level component, usually referred to as the search procedure, searches for a single rule that covers many examples [10]. The high-level component, also referred to as the control procedure, repeatedly executes the low-level search to induce a set of rules. Many heuristic measures are used in the literature to assess the quality of an induced rule at the low level. CN2 can employ two high-level control procedures: one that produces an ordered list of rules and one that produces an unordered set of rules. For an ordered list, the low-level search uses a heuristic measure to choose the best rule on the current training set, and in each iteration the control procedure deletes all examples covered by the induced (learned) rule, until every example in the training data is covered [11]. For an unordered set, the control procedure is repeated to learn the rules for each class separately; with each learned rule, only the covered examples that belong to the rule's class are deleted, rather than all covered examples as in the ordered case. CN2 removes the examples that learned rules cover in order to stop the same rule from being induced in later rounds.

1.2 Generation of Domain-specific Classifiers

For the purpose of developing a domain-specific classifier for each distinct domain, the semantic features extracted in the preceding stage are used as learning attributes [12]. The classification of email messages is a supervised learning task that seeks to build a probabilistic model of a function for email classification. In supervised learning on email text, the learning algorithm is presented with a set of pre-classified, or labelled, patterns, in which each email of the dataset serves as one example of a message to be classified. This is referred to as the training set. Some of the labelled messages are held out from the training set before the model is built and are used to test its efficacy; this collection serves as the testing set. To evaluate the classification accuracy of the obtained model, several models are created using different partitions of the instances into training and testing sets [13], and the classification error is then averaged over all models. The procedure is performed n times, where n is the number of partitions of the instance set. Several models are thus created through this cycle of repeated cross-validation. Once developed, the model can be used to classify incoming emails.

II. LITERATURE REVIEW

N. Saidani, et al. (2020) focused on analyzing the semantics of email text to enhance the accuracy of spam detection [14]. A technique based on two levels of semantic analysis was investigated. First, the emails were classified into particular domains, such as the healthcare, education, and commercial sectors, so that a separate conceptual view of spam was obtained for every domain. Subsequently, a set of manually and automatically derived semantic attributes was incorporated to detect the spam in every domain. These features helped to summarize the email content into compact topics that distinguish spam from authentic emails efficiently. The results showed that the investigated technique offered higher efficiency than the traditional techniques and provided more interpretable results.
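
The two-level scheme reported in [14], which also underlies the semantic framework of Section I, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the small `NaiveBayes` class, the `classify` helper, and the toy training emails are all assumptions made for illustration, and plain word counts stand in for the semantic features. The first level routes an email to a domain; the second level applies a spam/ham classifier trained only on that domain.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case whitespace tokenizer (a stand-in for real pre-processing)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (invented for this sketch): (text, domain, label).
TRAIN = [
    ("cheap pills cure everything buy now", "Health", "spam"),
    ("your doctor appointment is confirmed for monday", "Health", "ham"),
    ("miracle diet pills lose weight fast", "Health", "spam"),
    ("clinic test results are ready", "Health", "ham"),
    ("win cash prize claim your money now", "Money", "spam"),
    ("your bank statement for march is attached", "Money", "ham"),
    ("double your money guaranteed investment", "Money", "spam"),
    ("invoice payment received thank you", "Money", "ham"),
]

# Level 1: a single classifier routes each email to a domain.
domain_clf = NaiveBayes().fit([t for t, _, _ in TRAIN], [d for _, d, _ in TRAIN])

# Level 2: one spam/ham classifier per domain, trained on that domain only.
spam_clfs = {}
for domain in {d for _, d, _ in TRAIN}:
    rows = [(t, y) for t, d, y in TRAIN if d == domain]
    spam_clfs[domain] = NaiveBayes().fit([t for t, _ in rows], [y for _, y in rows])


def classify(text):
    """Return (domain, spam-or-ham) for one email text."""
    domain = domain_clf.predict(text)
    return domain, spam_clfs[domain].predict(text)
```

With this toy data, `classify("buy miracle pills now")` returns `("Health", "spam")` and `classify("your bank statement is attached")` returns `("Money", "ham")`. Keeping one classifier per domain lets each model specialize in the vocabulary of its own topic, at the cost of depending on the first-level routing being correct.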

G. Andresini, et al. (2022) developed a novel technique known as EUPHORIA for distinguishing spam reviews from authentic ones [15]. It integrated MVL (multi-view learning) with DL (deep learning) to attain higher accuracy by exploiting different information about the content of reviews and the behavior of reviewers. Two Yelp.com datasets, Hotel and Restaurant, were employed to conduct the experiments. The results validated that the developed technique enhanced the efficacy of the DL algorithm in detecting spam reviews. Moreover, this technique achieved an AUC-ROC of around 0.813 on the first dataset and 0.708 on the second.

C. Kumar, et al. (2023) formulated a hybrid mechanism called SMOTE-ENN (Synthetic Minority Oversampling Technique-Edited Nearest Neighbor) for detecting spam on Twitter [16]. The two algorithms were combined to generate balanced data. Different DL (deep learning) methods were then applied to this data to recognize a tweet as spam or genuine. Moreover, classifiers such as DT (Decision Tree), SVM (Support Vector Machine), and LR (Logistic Regression) were implemented. Simulation and comparative analysis were conducted to quantify the formulated mechanism with respect to different parameters. The formulated mechanism performed well, and the RF (Random Forest) algorithm yielded an accuracy of 99.26%, recall of 99.07%, and precision of 99.49%.

X. Liu, et al. (2021) suggested a modified Transformer algorithm to detect SMS spam messages [17]. The SMS Spam Collection v.1 dataset and UtkMl's dataset were used to evaluate the suggested algorithm against diverse ML (machine learning) algorithms. The experimental results showed that the suggested algorithm was more effective, yielding an accuracy of 98.92%, recall of up to 94.51%, and F1-score of 96.13%. Moreover, the suggested algorithm also outperformed the other methods on the second dataset, which indicates its adaptability to similar problems.

Z. Zhang, et al. (2020) focused on analyzing Twitter spam attributes such as user attributes, content, activity, and relationships [18]. A new spam detection algorithm based on RELM (regularized extreme learning machine), called I2FELM (Improved Incremental Fuzzy-kernel-regularized Extreme Learning Machine), was introduced to detect Twitter spam accurately. The experimental results revealed the effectiveness of the introduced algorithm on both balanced and unbalanced datasets. Additionally, based on a few characteristics, the introduced algorithm detected spam more successfully than the conventional methods.

G. Al-Rawashdeh, et al. (2019) devised a hybrid approach combining WC (Water Cycle) and SA (Simulated Annealing) to optimize the results of spam detection [19]. The approach comprised groundwork, introduction, enhancement, estimation, and quality comparison. The data was trained and tested using cross-validation, and the devised approach was evaluated on 7 datasets for classifying spam. This work exploited a meta-heuristic called WCFS (water cycle feature selection) and 3 schemes of hybridization with SA as feature selection techniques. The experimental results confirmed that the devised approach attained an accuracy of 96.3% while reducing the number of attributes.

S. A. A. Ghaleb, et al. (2022) designed a wrapper technique based on MOGOA (multi-objective grasshopper optimization algorithm) to improve the efficiency of an SDS (spam detection system) [20]. In this way, the attributes were selected. Moreover, the recently revised EGOA algorithm was utilized to train an MLP (multilayer perceptron). The SpamBase, SpamAssassin, and UK-2011 datasets were used to evaluate the designed technique. The simulation outcomes demonstrated that the designed technique outperformed other methods. In addition, its accuracy was measured at 97.5% on the first dataset, 98.3% on the second, and 96.4% on the last.

D. Liu, et al. (2020) proposed an innovative detection technique that took the viewpoint of users into account and captured screenshots of malicious webpages in order to invalidate Web spam [21]. A CNN (Convolutional Neural Network), a form of DNN (deep neural network), was implemented as the classifier. The proposed technique was evaluated experimentally. Initially, it was compared with other ML (machine learning) based methods. Subsequently, it was tested on detecting malicious websites in a real-time Web environment. The experimental outcomes showed that, in contrast to the traditional methods, the proposed technique was applicable to a practical Web environment.

J. D. Rosita, et al. (2022) recommended the MOGA–CNN–DLAS (Multi-Objective Genetic Algorithm and a CNN-based Deep Learning Architectural Scheme) method to detect Twitter spam [22]. The MO (multi-objective optimization) procedure was integrated with selection, mutation, and cross-layer to assist in classifying tweets as genuine or malicious spam tweets. The experimental outcomes showed that the recommended method improved accuracy by up to 0.17, precision by around 0.13, recall by 0.10, and F-score by 0.19, and reduced RMSE by around 19%, MAD by 16%, and MAE by 21%.

X. Tong, et al. (2021) established a CapsNet (capsule network) model that adopted an LSA (long-short attention) mechanism to detect Chinese spam more effectively [23]. The text was represented using an MCS (multi-channel structure) based on the LSA mechanism in order to capture the complex text attributes of spam and generate contextual word vectors carrying more semantic information. The attributes were then mined and classified by the model, which improved the structure of the classic CapsNet and optimized the dynamic routing algorithm. Hence, the established model offered higher accuracy at a higher running speed. Experimental results reported the superiority of the established model over the existing methods, with an accuracy of 98.72% on an unbalanced dataset and 99.30% on a balanced dataset.

A. S. Mashaleh, et al. (2022) introduced a new method in which the HHO (Harris Hawks optimizer) algorithm was combined with the KNN (K-Nearest Neighbor) algorithm for classifying spam [24]. The HHO algorithm is based on the cooperative behaviour of Harris' hawks. The introduced algorithm helped to handle high-dimensional data, and its accuracy was higher than that of the traditional methods. According to the experimental results, the introduced method yielded an accuracy of 94.3% in classifying and detecting spam.
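
The KNN classifier that appears in [24], and that Section I describes as an example-based method, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the method of any reviewed paper: the HHO optimization step is not shown, bag-of-words vectors with cosine similarity are an assumed representation, and the function names and toy emails are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Word-count vector of a text, stored sparsely as a Counter."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3):
    """Label a text by majority vote among its k most similar training emails."""
    query = bag_of_words(text)
    ranked = sorted(
        train,
        key=lambda item: cosine(query, bag_of_words(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy labelled emails (invented for this sketch).
TRAIN = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("cheap loans approved instantly", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the attached report", "ham"),
    ("lunch on monday with the team", "ham"),
]
```

There is no training step: `knn_predict(TRAIN, "claim your free prize")` compares the query with every stored email at prediction time and returns `"spam"`, while `knn_predict(TRAIN, "monday team meeting")` returns `"ham"`. Because every prediction scans the whole training set, the cost grows with the number and dimensionality of stored emails, which is one reason feature selection is often paired with KNN.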
2.1 Comparison Table

| Author | Year | Technique Used | Results | Limitations |
|---|---|---|---|---|
| N. Saidani, et al. | 2020 | A two semantic level analysis-based technique | The investigated technique offered higher efficiency than the traditional techniques and provided more interpretable results. | The major challenge was maintaining the efficacy of the spam filter in the long run. |
| G. Andresini, et al. | 2022 | EUPHORIA | The developed technique enhanced the efficacy of the DL (deep learning) algorithm in detecting spam reviews and achieved an AUC-ROC of around 0.813 on the first dataset and 0.708 on the second. | This technique had no online learning phase, so it could not periodically update the trained classifier as new reviews were recorded over time. |
| C. Kumar, et al. | 2023 | A hybrid mechanism called SMOTE-ENN | The formulated mechanism performed well, and the RF algorithm yielded an accuracy of 99.26%, recall of 99.07%, and precision of 99.49%. | When the number of spam tweets increased, the efficiency of the formulated mechanism was affected. |
| X. Liu, et al. | 2021 | Modified Transformer algorithm | The suggested algorithm was more effective than diverse ML algorithms and yielded an accuracy of 98.92%, recall of up to 94.51%, and F1-score of 96.13%. | The datasets used contained only thousands of messages, which led to false predictions in diverse scenarios. |
| Z. Zhang, et al. | 2020 | I2FELM (Improved Incremental Fuzzy-kernel-regularized Extreme Learning Machine) | The experimental results revealed the effectiveness of the introduced algorithm on both balanced and unbalanced datasets. | This algorithm was ineffective at analyzing semantic and emotional data. |
| G. Al-Rawashdeh, et al. | 2019 | A hybrid approach of WC (Water Cycle) and SA (Simulated Annealing) | The devised approach attained an accuracy of 96.3% and reduced the number of attributes. | The devised approach was not applicable to all applications. |
| S. A. A. Ghaleb, et al. | 2022 | Wrapper method | The accuracy of the designed technique was measured at 97.5% on the first dataset, 98.3% on the second, and 96.4% on the last. | This method was not useful for detecting malicious attacks such as phishing and botnets. |
| D. Liu, et al. | 2020 | An innovative detection technique | The experimental outcomes showed that, in contrast to the traditional methods, the proposed technique was applicable to a practical Web environment. | This technique worked slowly and was inflexible for large-scale detection. |
| J. D. Rosita, et al. | 2022 | MOGA–CNN–DLAS (Multi-Objective Genetic Algorithm and a CNN-based Deep Learning Architectural Scheme) method | The recommended method improved accuracy by up to 0.17, precision by around 0.13, recall by 0.10, and F-score by 0.19, and reduced RMSE by around 19%, MAD by 16%, and MAE by 21%. | The multi-objective optimization was not possible using the recommended method. |
| X. Tong, et al. | 2021 | CapsNet (capsule network) model | The established model was superior to the existing methods, with an accuracy of 98.72% on an unbalanced dataset and 99.30% on a balanced dataset. | The employed dataset was relatively old and did not reflect the attributes of the latest spam. |
| A. S. Mashaleh, et al. | 2022 | A new method combining HHO and KNN | The introduced method yielded an accuracy of 94.3% in classifying and detecting spam. | Some of its metrics were not optimized, due to which the performance was found to be poor. |
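
The Results column above mixes accuracy, precision, recall, and F1-score. As a reading aid, the following Python sketch shows how these four figures follow from the counts of a confusion matrix using the standard definitions. The function names and the example labels are assumptions made for illustration; they do not reproduce any experiment from the reviewed papers.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Count true/false positives and negatives for the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive="spam"):
    """Return accuracy, precision, recall and F1 as a dictionary."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 3 spam and 5 ham emails, with one error of each kind.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]
scores = evaluate(y_true, y_pred)
```

Here `scores["accuracy"]` is 0.75, while precision, recall, and F1 are all 2/3. The gap matters on unbalanced data: a filter that labels 95 ham and 5 spam emails all as ham reaches 0.95 accuracy with a recall of 0, which is why results reported on balanced and unbalanced datasets are not directly comparable on accuracy alone.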
CONCLUSION

A surge in the number of spammers and spam emails has been noticed in recent years, as the investment required for the spamming business is minimal. This has led to systems that treat every email as suspicious, causing substantial investments in defence mechanisms. The most commonly used mail filtering schemes are Knowledge Engineering (KE) and Machine Learning (ML). The approaches based on KE generate a set of rules to classify messages as spam or genuine mail. Email spam detection has various phases, such as feature extraction and classification. Various schemes for email spam detection are analyzed in this paper. The analysis indicates that machine learning algorithms perform better than content filtering techniques.

REFERENCES

[1] K. Debnath and N. Kar, "Email Spam Detection using Deep Learning Approach," 2022 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COM-IT-CON), Faridabad, India, 2022, pp. 37-41

[2] S. Suryawanshi, A. Goswami and P. Patil, "Email Spam Detection: An Empirical Comparative Study of Different ML and Ensemble Classifiers," 2019 IEEE 9th International Conference on Advanced Computing (IACC), Tiruchirappalli, India, 2019, pp. 69-74

[3] N. A. Farahisya and F. A. Bachtiar, "Spam Email Detection with Affect Intensities using Recurrent Neural Network Algorithm," 2022 2nd International Conference on Information Technology and Education (ICIT&E), Malang, Indonesia, 2022, pp. 206-211

[4] P. Thakur, K. Joshi, P. Thakral and S. Jain, "Detection of Email Spam using Machine Learning Algorithms: A Comparative Study," 2022 8th International Conference on Signal Processing and Communication (ICSC), Noida, India, 2022, pp. 349-352

[5] S. Nandhini and J. Marseline K.S., "Performance Evaluation of Machine Learning Algorithms for Email Spam Detection," 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), Vellore, India, 2020, pp. 1-4

[6] R. Amin, M. M. Rahman and N. Hossain, "A Bangla Spam Email Detection and Datasets Creation Approach based on Machine Learning Algorithms," 2019 3rd International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE), Rajshahi, Bangladesh, 2019, pp. 169-172

[7] S. Shrivastava and R. Anju, "Spam mail detection through data mining techniques," 2017 International Conference on Intelligent Communication and Computational Techniques (ICCT), Jaipur, India, 2017, pp. 61-64

[8] W. Peng, L. Huang, J. Jia and E. Ingram, "Enhancing the Naive Bayes Spam Filter Through Intelligent Text Modification Detection," 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE), New York, NY, USA, 2018, pp. 849-854

[9] S. E. Rahman and S. Ullah, "Email Spam Detection using Bidirectional Long Short Term Memory with Convolutional Neural Network," 2020 IEEE Region 10 Symposium (TENSYMP), Dhaka, Bangladesh, 2020, pp. 1307-1311

[10] R. P. Cota and D. Zinca, "Comparative Results of Spam Email Detection Using Machine Learning Algorithms," 2022 14th International Conference on Communications (COMM), Bucharest, Romania, 2022, pp. 1-5

[11] N. Nisar, N. Rakesh and M. Chhabra, "Voting-Ensemble Classification for Email Spam Detection," 2021 International Conference on Communication information and Computing Technology (ICCICT), Mumbai, India, 2021, pp. 1-6

[12] V. Vishagini and A. K. Rajan, "An Improved Spam Detection Method with Weighted Support Vector Machine," 2018 International Conference on Data Science and Engineering (ICDSE), Kochi, India, 2018, pp. 1-5

[13] T. Toma, S. Hassan and M. Arifuzzaman, "An Analysis of Supervised Machine Learning Algorithms for Spam Email Detection," 2021 International Conference on Automation, Control and Mechatronics for Industry 4.0 (ACMI), Rajshahi, Bangladesh, 2021, pp. 1-5

[14] N. Saidani, K. Adi and M. S. Allili, "A semantic-based classification approach for an enhanced spam detection," Computers & Security, vol. 11, no. 2, pp. 6594-6609, 9 January 2020

[15] G. Andresini, A. Iovine and A. Appice, "EUPHORIA: A neural multi-view approach to combine content and behavioral features in review spam detection," Journal of Computational Mathematics and Data Science, vol. 7, no. 4, pp. 170003-170011, 22 April 2022

[16] C. Kumar, T. S. Bharti and S. Prakash, "A hybrid Data-Driven framework for Spam detection in Online Social Network," Procedia Computer Science, vol. 218, pp. 124-132, 31 January 2023

[17] X. Liu, H. Lu and A. Nayak, "A Spam Transformer Model for SMS Spam Detection," in IEEE Access, vol. 9, pp. 80253-80263, 2021

[18] Z. Zhang, R. Hou and J. Yang, "Detection of Social Network Spam Based on Improved Extreme Learning Machine," in IEEE Access, vol. 8, pp. 112003-112014, 2020

[19] G. Al-Rawashdeh, R. Mamat and N. Hafhizah Binti Abd Rahim, "Hybrid Water Cycle Optimization Algorithm With Simulated Annealing for Spam E-mail Detection," in IEEE Access, vol. 7, pp. 143721-143734, 2019

[20] S. A. A. Ghaleb et al., "Feature Selection by Multiobjective Optimization: Application to Spam Detection System by Neural Networks and Grasshopper Optimization Algorithm," in IEEE Access, vol. 10, pp. 98475-98489, 2022

[21] D. Liu and J.-H. Lee, "CNN Based Malicious Website Detection by Invalidating Multiple Web Spams," in IEEE Access, vol. 8, pp. 97258-97266, 2020

[22] J. D. Rosita P and W. S. Jacob, "Multi-Objective Genetic Algorithm and CNN-Based Deep Learning Architectural Scheme for effective spam detection," International Journal of Intelligent Networks, vol. 10, no. 2, pp. 5207-5222, 2 February 2022

[23] X. Tong et al., "A Content-Based Chinese Spam Detection Method Using a Capsule Network With Long-Short Attention," in IEEE Sensors Journal, vol. 21, no. 22, pp. 25409-25420, 15 Nov. 2021

[24] A. S. Mashaleh, N. F. B. Ibrahim and Q. M. Yaseen, "Detecting Spam Email with Machine Learning Optimized with Harris Hawks optimizer (HHO) Algorithm," Procedia Computer Science, vol. 201, pp. 659-664, 27 April 2022


ISSN: 2347-8578 www.ijcstjournal.org