Final ML Report
NIKITA LAWANDE(16CE1045)
PRAKARSHA DAHAT(16CE1057)
RIYA THAKUR (16CE2025)
Under the guidance of
Mrs PUJA PADIA
CERTIFICATE
This is to certify that the Mini Project entitled “FRAUD DETECTION SYSTEM” is a bonafide
work done by PRAKARSHA DAHAT, NIKITA LAWANDE, AND RIYA THAKUR under the
supervision of Mrs. PUJA PADIA. This Mini Project has been approved for Third Year Computer
Engineering.
Internal Examiner:
1...............................
2...............................
External Examiners:
1...............................
2...............................
Date :. . . /. . . /. . . . . .
Place :. . . . . . . . . . . .
DECLARATION
I declare that this written submission represents my ideas and does not involve plagiarism. I
have adequately cited and referenced the original sources wherever others' ideas or words have
been included. I also declare that I have adhered to all principles of academic honesty and
integrity and have not misrepresented or fabricated or falsified any idea/data/fact/source in my
submission. I understand that any violation of the above will be cause for disciplinary action
against me by the Institute and can also evoke penal action from the sources which have thus
not been properly cited or from whom proper permission has not been taken when needed.
Date:
NIKITA LAWANDE(16CE1045)
RIYA THAKUR (16CE2025)
PRAKARSHA DAHAT (16CE1057)
Abstract
List of Tables
List of Figures
1 Introduction
1.1 Overview
1.2 Motivation
1.3 Problem Definition
1.4 Objectives
1.5 Organization of Report
2 Literature Survey
2.1 Existing Systems
2.2 Outcome of Literature Survey
3 Proposed System
3.1 Proposed Work
3.1.1 SVM
3.1.2 Naive Bayes
3.2 Proposed Methodology/Techniques
3.2.1 Sample and Data
3.2.2 Variables Measurement
3.3 Design of the System
3.4 Hardware/Software Requirement
3.4.1 Hardware System Configuration
3.4.2 Software System Configuration
Chapter 1
Introduction
1.1 Overview
Machine learning is an application of artificial intelligence that gives systems the ability to
learn and improve from experience automatically, without being explicitly programmed. This
project requires a supervised classification algorithm for predicting fraudulent emails. Many
algorithms are available for this task; the Naive Bayes algorithm was found to be the most
accurate, outperforming the other algorithms considered. Naive Bayes classifiers are a
collection of classification algorithms based on Bayes' theorem. It is not a single algorithm
but a family of algorithms that share a common principle: every pair of features being
classified is assumed to be independent of the others.
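As a short illustration of this independence assumption (a standard textbook formulation, given here for clarity rather than taken from a specific source in this report), the class chosen for an email with features x_1, ..., x_n is:

\[
P(C \mid x_1,\dots,x_n) \propto P(C)\prod_{i=1}^{n} P(x_i \mid C),
\qquad
\hat{C} = \arg\max_{C}\; P(C)\prod_{i=1}^{n} P(x_i \mid C).
\]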
1.2 Motivation
The harmful effects of spam emails could be extent to access the user's
confidential details, which could result in financial losses for users and even
prevent them from access their own accounts. Therefore, we will quantify and
qualify the spam email features to prevent and mitigate the risk of fraudulent
emails. Hence the need of developing a fraud email detector is necessary to avoid
loss of confidential data.
1.4 Objectives
The objectives of this project are as follows:
Determine and evaluate the best set of features to be used for fraud email detection, using
manual feature selection based on the email structure as well as automated selection
techniques. Determine the best classification algorithm for email fraud detection. The idea is
to learn the differences between legitimate and malicious emails and then predict the nature
of an unseen email.
1.5 Organization of Report
The remainder of this report discusses a fraud detection platform trained on the open datasets
provided by Enron, LingSpam, and the Nigerian fraud email collection. Section 2 describes
related work in the field of fraud detection and the work that has been done to classify
fraudulent emails. Section 3 describes the implementation of the system and the pipeline for
data processing, and provides an in-depth look at the machine learning framework used in the
final version of the fraud detection program. Section 4 provides an overall assessment of the
results of this project and suggests future work that can be done to improve it. The report
concludes with Section 5, which offers concluding remarks on the project as a whole.
Chapter 2
Literature Survey
Some work has been done on detecting spam and fraudulent emails on the internet using various
algorithms. The most famous fraud email detection filter, “Spambayes”, used by Microsoft
Outlook as a plug-in, applies Bayes' theorem and uses a keyword-based approach for fraud email
detection. The authors note that the internet is an effective tool for the solicitation of
fraudulent emails, and that there are systematic patterns in the way the contents of these
messages are disseminated. Thus, we can ascertain that there may be patterns in such emails
that are detectable by some form of classifier.
ALGORITHM COMPARISON
In this section, our results are compared with some of the related work. Of the references
mentioned throughout this paper, we only compare our results with the commensurate ones, i.e.,
those that used the same dataset. Moreover, the compared results are evaluated with a Student's
t-test with the significance level set to 5% (i.e., α = 0.05) to report whether the differences
are significant. We have mixed results for the CSDMC2010 dataset. Qaroush et al. [19], for
instance, investigated the performance of several learning algorithms on this dataset and
concluded that RF outperforms the rest. The spam recall reported in their paper is 0.958, which
is significantly better than what we found (0.912) (Table VIa). Whereas their precision is
similar to that of our approach, because of their high recall, their 0.958 F-score also
outperforms our F-score of 0.922 (Table VIa). Surprisingly, we outperform them if we do a
cost-sensitive analysis of our data: the AUC that we found for the dataset is 0.988 (Table VIa),
which is better than what they found (0.981). An SVM-based spam filter developed by Yang et
al. [28], on the other hand, reported 0.943 precision, 0.965 recall, and a promising AUC of
0.995. Among the three measures, we only obtained a better precision. Their second anti-spam
filter uses an NB classifier. This filter, interestingly, achieved 100% recall; its precision
of 0.935 and AUC of 0.976, however, were outperformed by our approach (Table VIa). Note that
the differences in the results are statistically significant. By using 328 features, the filter
developed by Ma et al. generates a Neural Network classifier. On the SpamAssassin dataset, they
reported that both their precision and accuracy were 0.920. On the other hand, our approach
achieved 0.948 precision and 0.957 accuracy. Both of these results are statistically
significant. Another Neural Network based filter, developed by Srisanyalak and Sornil [29],
uses immunity-based features from emails. The filter has been reported to be accurate 92.4% of
the time; our reported accuracy is better than this (Table VIb). The phenomenal FPR and FNR
achieved by the filter developed by Bratko et al. (FPR = 0.001 and FNR = 0.012) indicate that
our approach needs further improvement in these measures; our reported FPR and FNR are 0.023
and 0.079, respectively (Table VIb).
From previous studies, we found that the performance of the filters is relatively low on the
LingSpam dataset. Prabhakar and Basavaraju [10], for instance, applied K-NNC and a data
clustering algorithm called BIRCH to this dataset. Their filter achieved 0.698 precision, 0.637
recall, 0.828 specificity, and an accuracy of 0.755. In contrast, the data in Table VIc show
that our approach has a precision of 0.944 with 0.838 recall, 0.990 specificity (1 − FPR), and
0.960 accuracy. Our reported AUC on LingSpam also outperformed that reported by Cormack and
Bratko [30]; our AUC of 0.986 is significantly better than their AUC of 0.960. The recall we
have on this dataset is much better than that reported by Yang et al. [28]; the precisions,
however, are similar. Their NB-based filter achieved 0.943 precision and 0.820 recall.
Surprisingly, the AUC of their filter (0.992) significantly outperformed the AUC of our
approach (Table VIc). As mentioned in Section VI-B, our results with the Enron-Spam dataset are
not satisfactory because of the well-balanced nature of the dataset. The curators of the
dataset, however, reported a spectacular spam recall of 0.975 [9], while our best spam recall
on the dataset is 0.929 with BOOSTED RF. Moreover, their reported ham recall is 0.972; ours is
a mere 0.842 (Table VId). However, we have recently surpassed the results reported by Metsis
et al. [9] using an anti-spam filter named SENTINEL [31] that we have developed using the ideas
presented in this paper.
Various research papers on this topic were studied. The techniques used in them included the
Naive Bayes classifier, a convolutional neural network based multimodal fraud email detection
algorithm, and the backpropagation algorithm. On studying the research papers, we found that
the best-suited algorithm was the Naive Bayes classifier, since it works well with large
amounts of data and can predict the result accurately.
Chapter 3
Proposed System
3.1 Proposed Work
In this project, an efficient multinomial Naive Bayes algorithm is used for the prediction of
illegitimate emails by training it on a set of data before deployment. Fraudulent emails can
result in the loss of confidential information and cause harm to the recipient. The algorithm
helps to identify malicious emails and warns the recipient of the fraud that could take place
by opening the email or by clicking on the links or websites attached to the email sent by the
attacker.
The first approach taken used a Naive Bayes algorithm. Naive Bayes assigns class labels in
classification problems by looking at the relationships between features and classes; the
algorithm leverages Bayes' theorem and conditional probabilities. For each email text T, we
look for the class label that maximizes the posterior probability. Our dataset involves only
two potential labels in a complex corpus. Many of the conventional methods for improving Naive
Bayes performance, such as the inclusion of tf-idf scores, were very effective for this
implementation because each individual document had many tokens. Additionally, the number of
tokens varied between emails, and some emails had an unnecessarily high number of tokens. Thus,
tf-idf scores were considered irrespective of the length of the email.
The Naive Bayes algorithm is a conditional-probability classifier based on Bayes' theorem,
which describes the probability of an event based on prior knowledge of conditions related to
that event. For example, if phishing emails are identified by the presence of phishing
keywords, then a particular keyword can be used to assess the probability that a particular
email is indeed a phishing email more accurately than an assessment made without considering
that keyword.
(i) Posterior probability P(C | x): the probability of the target C (in our case, the
probability that an email is fraudulent) given the predictor x (a keyword fed to the
classifier).
(ii) P(C): the prior probability of the target.
(iii) P(x | C): the likelihood, i.e. the probability of the predictor given the target.
(iv) P(x): the probability of the predictor.
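Combining these four quantities, Bayes' theorem gives the posterior used by the classifier:

\[
P(C \mid x) = \frac{P(x \mid C)\, P(C)}{P(x)}.
\]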
Notebook: Jupyter
Programming language: Python
Frontend: Tkinter
Dataset: CSV file format
STEP 1: Access the legitimate and fraudulent text files and combine them into a .csv file.
Fig. 3.6.1
The individual fraudulent and legitimate emails are stored as .txt files in the spam and ham folders. Using
the os module in Python, the individual .txt files are fetched iteratively. The fetched emails are
stored in a dataframe with the column names Text and Class. The emails from the ham folder are
given class 0, whereas emails from the spam folder are assigned class 1. This dataframe is randomized
and then saved as a .csv file. The top 5 entries of the dataframe are viewed as shown in Fig. 3.6.1.
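A minimal sketch of this step is given below. The folder names ham and spam come from the report; the exact paths, the output file name emails_raw.csv, and the encoding choice are our assumptions, not the report's exact code.

import os
import pandas as pd

def load_emails(folder, label):
    # Read every .txt file in the folder and tag it with the given class label.
    rows = []
    for name in os.listdir(folder):
        with open(os.path.join(folder, name), encoding="latin-1") as f:  # lenient encoding for raw emails
            rows.append({"Text": f.read(), "Class": label})
    return rows

# ham -> class 0, spam -> class 1
df = pd.DataFrame(load_emails("ham", 0) + load_emails("spam", 1))
df = df.sample(frac=1, random_state=42).reset_index(drop=True)   # randomize the row order
df.to_csv("emails_raw.csv", index=False)
print(df.head())                                                  # top 5 entries, as in Fig. 3.6.1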
STEP 2: Extract the email bodies from the raw message strings and save the modified dataframe as a .csv file.
Fig. 3.6.2
The headers of the columns, i.e. 'Text' and 'Class', are fetched and stored in a dictionary, and the
header information is retrieved. Only the Text column of the dataframe is stored in a variable named
message. The email strings in the Text column are converted into message objects. Finally, the body of
each message is retrieved using the get_payload() function, and this body replaces the raw email in the
Text column. The top 5 entries of this modified dataframe are viewed as shown in Fig. 3.6.2, and the
modified dataframe is saved on the machine as a .csv file.
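A sketch of this body-extraction step using the standard email module; the input and output file names continue from the previous sketch and are assumptions rather than the report's exact code.

import email
import pandas as pd

df = pd.read_csv("emails_raw.csv")

def extract_body(raw_text):
    # Parse the raw email string and return only the message body.
    msg = email.message_from_string(str(raw_text))
    payload = msg.get_payload()
    if isinstance(payload, list):                     # multipart messages return a list of parts
        return " ".join(str(p.get_payload()) for p in payload)
    return str(payload)

df["Text"] = df["Text"].apply(extract_body)
print(df.head())                                      # top 5 entries, as in Fig. 3.6.2
df.to_csv("emails_body.csv", index=False)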
STEP 3: Combine the second sample of emails, randomize, and store as a new .csv file.
Remove the null values.
Fig.3.6.3
The .csv file with the Nigerian fraud emails is combined with the previously modified .csv file using
the command line and saved as a new .csv file on our system. The final .csv file containing both email
datasets is loaded into a dataframe, which is then randomized. The number of null values in the final
dataframe is inspected, and the null rows are dropped in place using the dropna() command with its
inplace argument, as shown in Fig. 3.6.3.
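The same step can also be done entirely in pandas, as sketched below; the report performs the concatenation on the command line instead, and the file names used here are assumptions.

import pandas as pd

combined = pd.concat(
    [pd.read_csv("emails_body.csv"), pd.read_csv("nigerian_fraud.csv")],
    ignore_index=True,
)
combined = combined.sample(frac=1, random_state=42).reset_index(drop=True)  # randomize
print(combined.isnull().sum())        # count null values per column
combined.dropna(inplace=True)         # drop the null rows in place, as in Fig. 3.6.3
combined.to_csv("emails_combined.csv", index=False)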
STEP 4: Remove duplicate rows and extra class labels from the dataframe.
Fig. 3.6.4
Next, the number of duplicate rows in the dataframe is checked. As shown in Fig. 3.6.4, the dataframe
has 1884 duplicate rows; these are dropped using the drop_duplicates() command. We then check for extra
labels in the dataframe. Our dataset has only two classes, '1' and '0', but Fig. 3.6.4 shows that the
dataframe contains a third class label, so the extra label is dropped and the dataframe is saved.
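A sketch of the de-duplication and label clean-up; the column names Text and Class come from the report, while the file names are assumptions.

import pandas as pd

df = pd.read_csv("emails_combined.csv")

print(df.duplicated().sum())               # the report finds 1884 duplicate rows (Fig. 3.6.4)
df = df.drop_duplicates()

print(df["Class"].value_counts())          # reveals the unexpected third label
df = df[df["Class"].isin([0, 1])]          # keep only the two valid classes
df.to_csv("emails_clean.csv", index=False)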
STEP 5: Save the cleaned data in a new .csv file. Split the cleaned data into training and
testing samples with 70% and 30% of the data respectively.
Fig. 3.6.5
Here the cleaning of our data is complete, and we save the dataframe as a .csv file. The
cleaned data is divided into training and test sets. The email bodies and class labels of the
train and test sets are stored separately in x_train, x_test, y_train, and y_test. The
splitting is done using the train_test_split() function from scikit-learn in the ratio 7:3.
The lengths of the train and test sets are printed as shown in Fig. 3.6.5.
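A sketch of the 70/30 split with scikit-learn; the input file name and the random_state are our choices for illustration and reproducibility.

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("emails_clean.csv")

x_train, x_test, y_train, y_test = train_test_split(
    df["Text"], df["Class"], test_size=0.3, random_state=42
)
print(len(x_train), len(x_test))           # sizes of the 70%/30% split, as in Fig. 3.6.5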
STEP 6: Preprocess the training and testing data to convert it to lower case, remove HTML tags,
punctuation, and stop words, and stem the data.
Fig.3.6.6
We create a preprocess function. This function recognizes the punctuation marks in the email body and
strips them out iteratively. The whole body is converted to lowercase using the lower() function. Words
are reduced to their root form using the Snowball stemmer in Python, and the stemmed words are then
lemmatized using the WordNet lemmatizer. All unnecessary stop words are removed from the body. As shown
in Fig. 3.6.6, the training and test datasets are preprocessed using this function and the first 5
entries are viewed.
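A sketch of such a preprocess function using NLTK; this is our reconstruction of the step described above, not the report's exact code, and it continues from the x_train/x_test series of the previous sketch.

import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import SnowballStemmer, WordNetLemmatizer

nltk.download("stopwords", quiet=True)     # one-time downloads
nltk.download("wordnet", quiet=True)

stemmer = SnowballStemmer("english")
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def preprocess(text):
    # Lowercase, strip punctuation, drop stop words, then stem and lemmatize each token.
    text = str(text).lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [w for w in text.split() if w not in stop_words]
    return " ".join(lemmatizer.lemmatize(stemmer.stem(w)) for w in tokens)

x_train = x_train.apply(preprocess)
x_test = x_test.apply(preprocess)
print(x_train.head())                      # first 5 preprocessed entries, as in Fig. 3.6.6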
STEP 7: Vectorize the training and testing data. Train the Naive Bayes model with the
vectorized training data and training labels. Test the model with the vectorized testing data
and find its accuracy and confusion matrix.
Fig.3.6.7
The preprocessed training data is vectorized using the CountVectorizer function with a maximum of 8000
features. The tf-idf values of these top 8000 feature vectors are then calculated using the
TfidfTransformer function. The MultinomialNB() classifier is then fitted on the tf-idf values and the
training classes to train our model. The test set is vectorized in the same way as the training set and
given to the trained model to calculate its accuracy. From Fig. 3.6.7 we can see that the accuracy of
our model is 95.832%.
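A sketch of this vectorization and training step with scikit-learn, continuing from the preprocessed x_train/x_test above; CountVectorizer, TfidfTransformer, and MultinomialNB are the functions named in the report, while the variable names are ours.

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix

vectorizer = CountVectorizer(max_features=8000)
tfidf = TfidfTransformer()

x_train_tfidf = tfidf.fit_transform(vectorizer.fit_transform(x_train))  # fit on training data only
x_test_tfidf = tfidf.transform(vectorizer.transform(x_test))

model = MultinomialNB()
model.fit(x_train_tfidf, y_train)

y_pred = model.predict(x_test_tfidf)
print(accuracy_score(y_test, y_pred))      # the report obtains about 0.958 (Fig. 3.6.7)
print(confusion_matrix(y_test, y_pred))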
STEP 8: Generate the classification report of the trained model.
Fig. 3.6.8
To find the classification metrics of our trained model, we use the classification_report function
available in scikit-learn. As shown in Fig. 3.6.8, this function takes the test labels and the predicted
test labels as input and outputs the classification metrics, which include the precision, recall,
f1-score, and support of our trained model.
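For example, with the y_test and y_pred objects from the previous sketch:

from sklearn.metrics import classification_report

# Precision, recall, f1-score, and support per class, as in Fig. 3.6.8.
print(classification_report(y_test, y_pred))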
STEP 9: Deploy the trained model with a Tkinter interface that accepts an email from the user.
Fig. 3.6.9
Finally, we deploy our model using the tkinter module of Python. As shown in Fig. 3.6.9, the user is
asked to enter an email in the text box field. This input is fetched from the tkinter text box widget
and stored in a variable, which is then forwarded to our trained model.
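A minimal sketch of such a Tkinter front end, reusing the preprocess, vectorizer, tfidf, and model objects from the earlier sketches; the widget layout and labels are our assumptions, not the report's exact GUI.

import tkinter as tk

def classify_email():
    # Run the pasted email through the same preprocessing/vectorization pipeline, then predict.
    raw = text_box.get("1.0", tk.END)
    features = tfidf.transform(vectorizer.transform([preprocess(raw)]))
    result_var.set("Fraud" if model.predict(features)[0] == 1 else "Legitimate")

root = tk.Tk()
root.title("Email Fraud Detector")

text_box = tk.Text(root, height=15, width=80)
text_box.pack()

result_var = tk.StringVar(value="")
tk.Button(root, text="Check Email", command=classify_email).pack()
tk.Label(root, textvariable=result_var).pack()

root.mainloop()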
STEP 10: Display the result of the classification.
Fig. 3.6.10
The input email from the user goes through all the steps that our training and testing datasets went
through. Finally, as shown in Fig. 3.6.10, our trained model predicts the class of the user-entered
email and displays the result as either Legitimate or Fraud.
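The same prediction path can be exercised without the GUI; the helper below is hypothetical and simply reuses the pipeline objects from the earlier sketches, and the sample email text is made up for illustration.

def predict_label(raw_email):
    # Apply the training-time preprocessing and vectorization, then map the class to a string.
    features = tfidf.transform(vectorizer.transform([preprocess(raw_email)]))
    return "Fraud" if model.predict(features)[0] == 1 else "Legitimate"

print(predict_label("Dear friend, I urgently need your help to transfer $10,000,000 ..."))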
Chapter 4
When the user receives any email, the Naive Bayes algorithm is applied to it and it separates
the fraudulent emails (spam) from the normal ones. The spam is predicted based on the
calculated probability. It was found that Naive Bayes works well on the emails selected by the
user and gives accurate predictions of fraudulent emails.
The interdependent nature of these spam attributes disagrees with the conditional independence
assumption of the algorithm, which can sometimes cause mistakes when classifying emails and
thus affects the accuracy of Naive Bayes classifiers. Furthermore, the values of core
parameters, such as the default probability of spam and the threshold values, may lead to
different results in NBC, and corresponding issues come up. To address these issues and enhance
the performance of spam filtering, machine learning (ML) is used in recent research and
applications to achieve higher accuracy. The first feature of ML is the supervision mechanism:
the outputs can be used as references for the inputs, which is called feedback. For example,
the backpropagation algorithm is one of the most famous methods in image recognition [6]. In
spam filtering, ML could help refine the prior parameters to tune the filtering results.
Chapter 5
5.1 Conclusion
To sum up, we consider the task of email classification as a supervised machine-learning
problem. The novelty of this work is the use of a set of features related to the readability of
email texts. Because the features are language-independent, the method reported in this paper
is potentially able to classify emails written in any language. The aforementioned features, as
well as the traditional ones, are used to generate binary classifiers with five well-known
learning algorithms. We then evaluate the classifier performances on four benchmark email
datasets. The evidence from this study suggests that although traditional features are
individually more important than the other feature types, the combination of all of the
features produces the optimal results. Extensive experiments also imply that classifiers
generated using meta-learning algorithms perform better than trees, functions, and
probabilistic methods. Finally, we compare the results of our method with those of many
state-of-the-art anti-spam filters. Although the performance of our method is not always
superior to other filter-dataset instances, we find that our approach surpasses a number of
them. Taken together, the results suggest that the method described in this paper can be a good
means to classify spam emails. Because our results suggest that meta-learning algorithms
perform the best, further tests should be carried out to see the performance of classifiers
generated by stacking several algorithms.
5.2 Future Work
Even though this algorithm is sophisticated and effective in real life, it still has certain
limitations, which account for the mistakes made by these anti-spam filters: users can
sometimes find a spam email in their “Inbox” while finding certain legitimate emails in the
“Spam” section once in a while. One significant weakness of this algorithm is the conditional
independence assumption, which is the premise for using Bayes' theorem. Often, in reality,
events or attributes are interrelated and dependent on each other. Thus, the presence or
absence of each word or phrase has an impact on the presence or absence of other words or
phrases, since words on related topics tend to appear together, and vice versa.
Looking towards future developments, with ever-advancing artificial intelligence technology
and the efforts of countless mathematicians, computer scientists, and researchers, the Naive
Bayes classifier will certainly evolve and improve over time to cater to the needs of email
users. One possible direction is better classification criteria. Currently, the Naive Bayes
classifier simply computes the probability of an email being spam based solely on the presence
and absence of words and phrases. However, as the field of AI makes more progress on the
ability of machines to understand the semantic meaning of language, better classification
decisions can be made by considering the more complex defining properties of spam emails as
determined by the meanings of sentences. Therefore, a future with enhanced and more
user-friendly spam email filters is in prospect.
Bibliography
[1] http://airccse.org/journal/jcsit/0211ijcsit12.pdf
[2] https://meu.edu.jo/libraryTheses/590422b4d5dd8_1.pdf
[3] https://aip.scitation.org/doi/abs/10.1063/1.5038979
[4] https://www.sciencedirect.com/science/article/pii/S1110866514000280
[5] https://su-plus.strathmore.edu/handle/11071/5616
[6] Yuanyuan Grace Zeng, “Identifying Email Threats Using Predictive Analysis”, Rep. No. 08074848.
[7] Rushdi Shams and Robert E. Mercer, “Classifying Spam Emails using Text and Readability Features”, Department of Computer Science, University of Western Ontario.
[8] Reshma Varghese and Dhanya K. A., “Efficient Feature Set for Spam Email Filtering”, IEEE 7th International Advance Computing Conference, 2017.
[9] Ion Androutsopoulos, John Koutsias, Konstantinos V. Chandrinos, George Paliouras and Constantine D. Spyropoulos, “An Evaluation of Naive Bayesian Anti-Spam Filtering”, 11th European Conference on Machine Learning, Barcelona, Spain, pp. 9-17, 2000.
Acknowledgements
We would like to express our deepest appreciation to all those who made it possible for us to
complete this report. We wish to express sincere gratitude to Dr. Ramesh Vasappanavara,
Principal, R.A.I.T. College, Nerul. We owe a deep sense of gratitude to Dr. Leena Ragha, Head
of Department, Computer Engineering, R.A.I.T. College, Nerul. We give special gratitude to our
guide Mrs. Puja Padiya and evaluator Mrs. Trupti Patil, whose stimulating suggestions and
encouragement helped us to coordinate the project, especially in writing this report. Lastly,
we would like to extend our thanks to the faculties of the college who have directly or
indirectly helped us in exploring this topic and completing the study of ‘Email Fraud
Detector’.
Date: