
Fake News Detection Using Machine Learning Report PDF

The document discusses developing a model to detect fake news using machine learning techniques. It aims to use datasets of real and fake news to test the accuracy of different machine learning algorithms in identifying fake news. The project focuses on natural language processing and clustering techniques to build a fake news detection system.

FAKE NEWS DETECTION

A PROJECT REPORT

Submitted in partial fulfilment of the requirements for the

Award of the degree of

BACHELOR OF TECHNOLOGY
In

ARTIFICIAL INTELLIGENCE & DATA SCIENCE

Submitted By: Under the Supervision of

Gaurav Khandelwal (20EMCAD010) Mr. Dipayan Kumar Ghosh

Mohit Bhatia (20EMCAD018) (Assistant Professor)

BIKANER TECHNICAL UNIVERSITY, BIKANER

MODERN INSTITUTE OF TECHNOLOGY & RESEARCH CENTRE

Alwar

May 2024

i
BIKANER TECHNICAL UNIVERSITY, RAJASTHAN
CERTIFICATE

Certified that this project report “FAKE NEWS DETECTION” is the original work of
“GAURAV KHANDELWAL, MOHIT BHATIA” students of B. Tech. Final Year VIII
Semester (Artificial Intelligence & Data Science Branch) who carried out the project work
under my supervision.

SIGNATURE SIGNATURE

Prof. Dr. J.R Arun Kumar Mr. Dipayan Kumar Ghosh

HEAD OF THE DEPARTMENT ASSISTANT PROFESSOR

Computer Science Department Computer Science Department

MITRC College, Alwar MITRC College, Alwar

ii
ACKNOWLEDGEMENT
Firstly, we would like to express our gratitude to our advisor for the beneficial
comments and remarks. We express our sincere thanks to Prof. S. K. Sharma
(Director) of the Modern Institute of Technology & Research Centre, Alwar. We pay
our deep sense of gratitude to Prof. J.R Arun Kumar, Head of the Computer Science
& Engineering Department of Modern Institute of Technology & Research Centre,
Alwar for encouraging us to the highest peak and providing us the opportunity to present
the Project.

We acknowledge a deep sense of gratitude to Mr. Dipayan Kumar Ghosh (Asst.
Professor), Supervisor, Department of Computer Science & Engineering, Modern
Institute of Technology & Research Centre, Alwar (Rajasthan) for his constant support
and guidance during this work. His honesty, thoroughness, and perseverance have been a
constant source of inspiration for us. We would like to thank our institute and faculty
members, without whom this project would have been a distant reality. We also extend
our heartfelt thanks to our families and well-wishers. Last but not least, our parents are
also an important inspiration for us. With due respect, we express our gratitude to them.

Gaurav Khandelwal (20EMCAD010)

Mohit Bhatia (20EMCAD018)

iii
TABLE OF CONTENTS

LIST OF FIGURES VI

ABSTRACT VIII

CHAPTER 1 INTRODUCTION 1

Introduction 1

Natural Language Processing 2

Fake News Detection 3

Problem Statement 4

Objectives 5

Methodology 6

Dataset 6

Flowchart 10

Algorithm 7

CHAPTER 2 LITERATURE SURVEY 10

CHAPTER 3 SYSTEM DEVELOPMENT 12

System Configuration 12

Data Pre-processing 12

Design of Project 14

Sample Code 16

CHAPTER 4 RESULT AND EXPERIMENTAL ANALYSES 32

Models applied and their result 32

CONCLUSIONS 41

Conclusions 41

Future Scope 41

REFERENCES 43

v
LIST OF FIGURES

FIGURE NO. TITLE PAGE NO.

Fig. 1: Deep Learning vs Machine Learning vs Artificial Intelligence 2

Fig. 2: Comparison of Fake and Real news 5

Fig. 3: Flowchart 7

Fig. 4: Fake.csv and True.csv 13

Fig. 5: Design of the Project 15

Fig. 6: Importing Libraries 16

Fig. 7: Mounting Google Drive 16

Fig. 8: Fake.csv 16

Fig. 9: True.csv 17

Fig. 10: Comparing Fake and True Dataset 17

Fig. 11: Describing Fake and True Dataset 18

Fig. 12: Inserting a column “Outcome” 18

Fig. 13: Removing last 10 rows from both dataset for manual testing 19

Fig. 14: Merging the manual data frame 20

Fig. 15: Manual testing dataset 20

Fig. 16: Merging the main fake and true data frame 20

Fig. 17: Whitespace Tokenizer 22

Fig. 18: Checking the columns 23

Fig. 19: Removing “title”, “subject” and “date” columns 23

Fig. 20: Randomly Shuffling the data frame 24

Fig. 21: Count Vectorizer 24

Fig. 22: Pre-processing task of words 25

Fig. 23: Train-Test Split 25

Fig. 24: Importing for Confusion Matrix 26

Fig. 25: Logistic Regression 27

Fig. 26: Support Vector Machine 27

Fig. 27: Decision Tree Classifier 28

Fig. 28: Gradient Boosting Classifier 28

Fig. 29: Random Forest Classifier 29

Fig. 30: Testing 30

Fig. 32: Support Vector Machine (SVM) 32

Fig. 33: Confusion Matrix from Support Vector Machine 33

Fig. 34: Logistic Regression 34

Fig. 35: Confusion matrix from Logistic Regression 35

Fig. 36: Decision Tree 36

Fig. 37: Confusion Matrix from Decision Tree Classification 37

Fig. 38: Confusion Matrix from Gradient Boosting Classifier 38

Fig. 39: Confusion Matrix from Random Forest Classifier 39

Fig. 40: Web Browser Output 40

vii
ABSTRACT
Fake news has become one of the major problems in today's society. It has a high
potential to change opinions and distort facts, and it can be a dangerous weapon for
influencing society.

The proposed project uses NLP techniques to detect 'fake news', that is, misleading
news stories that come from non-reputable sources. By building a model based on a
K-Means clustering algorithm, the fake news can be detected. The data science
community has responded to the problem by taking action against it. It is difficult to
determine by hand, with complete accuracy, whether a news item is real or fake. So, the
proposed project uses datasets that are vectorized using the count vectorizer method for
the detection of fake news, and its accuracy is tested using machine learning algorithms.

In this research, we concentrate on how to spot fake news in internet news sources. Our
contribution is twofold. First, we use multiple datasets of real and fake news to measure
the proportion of news that is correctly identified as fake. We provide a thorough
description of the collection, annotation, and validation process, as well as a few
exploratory analyses of the observable linguistic differences between false and
legitimate news material. Second, we conduct a set of learning experiments in order to
build accurate fake news detectors. Additionally, we provide close comparisons of the
automatic and manual identification of fake news. Python can be used to spot fake news
posted on social media.

viii
CHAPTER-1
INTRODUCTION
1.1) Introduction

Machine learning (ML) is the study of the statistical models and methods that
computers use to perform specific tasks without explicit instructions, relying instead on
patterns and inference. It is viewed as a subset of artificial intelligence. Machine
learning algorithms construct a mathematical model from sample data, or "training
data," in order to make predictions or decisions without being explicitly programmed to
do so. Machine learning has a lot in common with computational statistics, which
focuses on computer-aided prediction. The study of mathematical optimisation
contributes methods, theory, and fields of application to machine learning.

The word "deep" in "deep learning" refers to the number of layers through which the
data is transformed. More precisely, deep learning systems have a substantial credit
assignment path (CAP) depth. The CAP is the chain of transformations from input to
output, and it describes the potentially causal connections between input and output.
For a feed-forward neural network, the depth of the CAPs equals the number of hidden
layers plus one, since the output layer is also parameterized. Since a signal can pass
through a layer more than once in recurrent neural networks, the CAP depth is
potentially unlimited.

Fake news, to put it simply, is information that is untrue but presented as if it were
accurate. It contains verifiably false information. Many significant companies, and even
government agencies, are working to address issues related to false news. However,
these efforts rely on manual human detection, which is neither accountable nor humanly
feasible in an age when millions of articles are published or removed every minute. A
machine learning algorithm that produces a trustworthy automated index score or rating
for the authenticity of various publications, and that can assess whether a news item is
true or misleading, may provide a solution to this problem.

1
Fig. 1: Deep Learning vs Machine Learning vs Artificial Intelligence

1.1.1) Natural Language Processing (NLP)

Natural language processing (NLP) is a branch of linguistics, computer science,
information engineering, and artificial intelligence that studies how computers interact
with human (natural) languages. Its major goal is to program computers to process and
analyse massive volumes of natural language data efficiently.

1.1.2) Fake News Detection

With the rising use of social media platforms, false news has become a severe problem
in recent years. Finding fake news is a difficult problem that necessitates the use of
several computational techniques, such as data mining, machine learning, and natural
language processing. This section discusses the current state of false news detection,
along with its challenges and potential solutions. It also considers how cutting-edge
technology like blockchain and artificial intelligence may be used in the future to
improve the efficiency and precision of fake news detection.

As a result, there is a greater need than ever for accurate and reliable techniques to
identify fake news. The field of fake news detection has evolved rapidly, with
researchers and engineers developing a number of techniques and tactics to identify and
combat misleading information. These methods range from human fact-checking by
trained professionals to automated systems that use machine learning to examine and
classify news content.

It is important to research and create fake news detection, but it is also a challenging and
complex problem. The ability to recognise fake news requires knowledge of linguistic
nuance, social and cultural contexts, and the complex network dynamics of online
communication. Despite these challenges, work has been done to establish effective
methods for spotting false news, and the area is still developing as new tools and
technology are created.

1.2) Problem Statement

Reading the news comes with both benefits and drawbacks. On the one hand, news is
actively sought and consumed because it is easily available, inexpensive, and quickly
spread. On the other hand, the same properties make it possible for "fake news," that is,
low-quality news with intentionally false content, to be widely disseminated.

As a result, research into the detection of fake news has recently made significant
strides. Even so, identifying fake news on the basis of content alone is challenging and
nontrivial, since it is purposefully written to lead people to accept incorrect information.

1.3) Objective

Our project's primary goal is to assess the veracity of news in order to determine
whether it is real or fake, by developing a machine learning model that allows us to
recognise false information.

Identifying fake news based on its content alone can be difficult, since it is
intentionally produced to influence readers to believe false information.

By applying a range of methods and models, machine learning makes it easier to detect
fake news. Additionally, to examine the relationship between two words, we will apply
deep learning-based NLP.

Stop words can also be eliminated as part of this process.

1.4) Methodology

1.4.1) Dataset

Two datasets are used, one of fake news and one of true news, and they are combined
into a single data frame. The CSV files contain 44898 news stories in total, which is a
sizable quantity: the fake dataset has 23481 stories, while the true dataset comprises
21417. This data collection is accessible at:

The dataset contains the following attributes:

▪ Id: unique ID of the news article.
▪ title
▪ text
▪ Subject: It describes the topic of the news.
▪ Date: It provides the news's publication date.
▪ Outcome: the label indicating whether the information might not be trustworthy:

0: Untrustworthy or False News

1: Reliable or Accurate News

The dataset is quite balanced, as shown above: it contains 21417 true news items and
23481 fake news items. This is a beneficial feature of the dataset, as it helps the
models make unbiased judgements.
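The balance can be checked with a few lines of arithmetic. The sketch below uses only the two counts reported above; the variable names are ours, and the "majority-class baseline" is simply the accuracy that a trivial classifier would reach by always predicting the larger class.

```python
# Article counts as reported in the text above.
counts = {"fake": 23481, "true": 21417}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the larger class.
majority_baseline = max(shares.values())

print(f"total articles: {total}")
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```

Any model worth keeping should score clearly above this baseline, which is close to one half because the two classes are nearly equal in size.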

4
Fig. 2: Comparison of Fake and Real news

The dataset has undergone some processing but, as indicated, stop words are still
included at this stage. The most common words in the dataset are therefore "the," "to,"
"of," "and," etc. The top 20 terms in each file before stop words were eliminated were
as follows:

Fake.csv

Graph 1: Frequent words in Fake news

5
True.csv

Graph 2: Frequent words in Real News

After stop words are removed, the most frequent terms are "said," "mr," "trump," "new,"
"people," and "year," which can provide the models with important information.

We also examined the bigrams in the dataset to gain a better understanding of the news
story subjects. Before stop words are removed, the topics of the news stories are not at
all clear. Removing stop words makes it simpler to comprehend the themes of the news
reports.

The graph below displays the top 20 bigrams from the dataset before stop words are
removed. As one can see, frequently used phrases like "of the," "in the," and "to the" do
not help one comprehend the content of the story.
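The effect of stop words on bigram counts can be reproduced on a toy example. This is a minimal sketch using only the Python standard library: the stop-word set and the sample sentence are hypothetical stand-ins, not the list or the data used in the project.

```python
from collections import Counter

# Tiny illustrative stop-word set (an assumption); a real run would use a fuller list.
STOP_WORDS = {"the", "of", "in", "to", "and", "a", "is", "on"}


def top_bigrams(text, n=3, remove_stop_words=False):
    """Return the n most frequent pairs of adjacent words in text."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return Counter(zip(tokens, tokens[1:])).most_common(n)


sample = (
    "the president of the senate said in the statement that "
    "the president of the senate will travel to the capital"
)

print(top_bigrams(sample))                          # dominated by stop-word pairs
print(top_bigrams(sample, remove_stop_words=True))  # topical pairs surface
```

With stop words kept, every top pair contains a function word; once they are removed, the pair ("president", "senate") becomes the most frequent bigram.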

6
Graph 3: Frequent bigrams

To display the data, we plotted the frequencies of subject of the news:

Graph 4: Frequency of subject of the news

1.4.2) Flowchart:

Fig. 3: Flowchart

7
1.4.3) Algorithm for The Proposed System

Step 1: Pre-processing


▪ Load the dataset of news items with their labels, whether they are true or false.
▪ Clean the text by eliminating punctuation and stopwords.
▪ Divide the dataset into training and testing sets.

Step 2: Count Vectorization


▪ Use the Count Vectorizer from the Sklearn toolkit to transform text data into
numerical data.
▪ Produce a document-term matrix showing the frequency of each word used in each
document.
▪ Fit the Count Vectorizer on the training set and transform the training data.
▪ Transform the testing set with the fitted vectorizer.

Step 3: TFIDF Vectorization


▪ Use the Tfidf Vectorizer from the Sklearn package to turn the text data into
numerical data.
▪ Fit the Tfidf Vectorizer on the training set and transform the training data.
▪ Create a document-term matrix that reflects the significance of each word in each
document.
▪ Transform the testing set with the fitted vectorizer.

Step 4: Training the Models


▪ Use the data transformed by Count Vectorizer and Tfidf Vectorizer to train a variety
of models, including Naive Bayes, Logistic Regression, Support Vector Machines
(SVM), Random Forest, etc.
▪ Fit the models using the training set.
▪ Use the testing set to predict the news article labels.
▪ Determine each model's accuracy score using the actual and predicted labels.
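Steps 1 to 4 can be sketched end to end with scikit-learn. The snippet below is an illustration under stated assumptions rather than the project's actual notebook: the eight-sentence corpus and its labels are invented for the example (the report uses the text of Fake.csv and True.csv), and Logistic Regression stands in for any of the listed models.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy corpus: 0 = fake, 1 = true (same coding as the Outcome label).
texts = [
    "aliens secretly control the senate says anonymous blogger",
    "miracle pill cures every disease overnight doctors stunned",
    "celebrity clone spotted shopping on the moon",
    "shocking secret the government hides about the weather",
    "parliament passed the annual budget after a long debate",
    "the central bank kept interest rates unchanged on thursday",
    "officials said the bridge will reopen next month",
    "the ministry published new figures on school enrolment",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Step 1: divide the data into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# Step 2: fit the vectorizer on the training set only, then transform both sets.
vectorizer = CountVectorizer(stop_words="english")
X_train_counts = vectorizer.fit_transform(X_train)
X_test_counts = vectorizer.transform(X_test)

# Step 4: train a model and score it on the held-out articles.
model = LogisticRegression(max_iter=1000)
model.fit(X_train_counts, y_train)
predictions = model.predict(X_test_counts)
accuracy = accuracy_score(y_test, predictions)
print(f"test accuracy: {accuracy:.2f}")
```

Swapping `CountVectorizer` for `TfidfVectorizer` (Step 3) or `LogisticRegression` for another scikit-learn classifier requires no other changes, which is what makes comparing several models on the same split straightforward. On a corpus this small the accuracy figure itself is not meaningful.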

Step 5: Confusion Matrix


▪ The confusion matrix displays the number of true positives, true negatives, false
positives, and false negatives for each model, allowing you to assess each one's
performance.
▪ Measurements like accuracy, precision, recall, and F1-score may be calculated using
the confusion matrix.
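The relationship between the confusion matrix and these measurements can be made concrete with a small pure-Python sketch. The function names and the eight example labels are ours, chosen only for illustration; in practice the same numbers come from the scikit-learn metrics functions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, true negatives and false negatives."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Derive accuracy, precision, recall and F1 from the four counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = true news, 0 = fake news.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
scores = metrics(*counts)
print(counts)  # (tp, fp, tn, fn)
print(scores)
```

Here one true article is missed and one fake article is accepted as true, so all four measurements come out at 0.75.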

Step 6: Accuracy
▪ Determine each model's accuracy by comparing its predicted labels to its actual
labels.
▪ The accuracy measures the proportion of news stories that were accurately identified
as being true or false.
▪ Evaluate the accuracy of various models to find which one is most effective at
spotting fake news.

Step 7: Representing the Output in Web Browser using Streamlit


▪ Use the Streamlit Python module to build an interactive web application for
showcasing the outcomes of false news detection models.
▪ Create a user interface that clearly displays the confusion matrices, accuracy of each
model, and other performance indicators.

▪ Provide tools that allow users to submit their own content for categorization and
display the key terms and phrases used to categorise news items, among other
capabilities.

9
CHAPTER-2

LITERATURE SURVEY
The paper by A. S. A. Ahmed, A. Abidin, M. A. Maarof, and R. A. Rashid [1] is purely
a survey and does not contain any experiments or findings of its own. Instead, the study
offers a thorough analysis of the many false news detection techniques proposed in the
literature, their advantages and disadvantages, and the datasets employed for testing.
The authors examine and contrast the methodologies used by various studies in terms of
feature selection, feature extraction, classification algorithms, and assessment
measures. They also highlight the difficulties in the area of false news identification
and potential avenues for further study. The article covers a number of datasets,
including those from BuzzFeed, LIAR, FakeNewsNet, and PolitiFact.

S. Asghar, S. Mahmood, and H. Kamran, "Fake news detection using machine learning
[2]: the article also addresses a number of datasets that have been used in studies on
the identification of fake news, including the LIAR dataset, the Fake News Challenge
dataset, and the BuzzFeed News dataset. According to the authors, ensemble
learning-based algorithms achieved the best results on the LIAR dataset, with accuracy
rates of up to 78%. On the BuzzFeed News dataset, on the other hand, deep
learning-based methods perform better, achieving an accuracy of up to 91%.

J. H. Kim, S. H. Lee, and H. J. Kim, "Fake news detection using ensemble learning with
context and attention mechanism," [3]: for their experiments, the authors employ two
datasets, the Celebrity dataset and the LIAR dataset. To capture both local and global
aspects of news items, the proposed model combines convolutional neural networks
(CNNs) with recurrent neural networks (RNNs). The experimental findings demonstrate
that the suggested model outperforms numerous baseline models and achieves
state-of-the-art performance on both the LIAR and Celebrity datasets, with an accuracy
of up to 73.7%.

M. F. Hossain, M. M. Islam, M. A. H. Khan, and J. J. Jung, "Fake news detection using
hybrid machine learning algorithms," [4]: the authors use the LIAR dataset, a gold
standard for research on fake news identification. It consists of statements that are
labelled as either true or false and that also carry extra labels for the degree of
falsehood. The suggested hybrid method combines the Support Vector Machine (SVM),
Multinomial Naive Bayes (MNB), and Random Forest (RF) machine learning techniques.
To choose the most pertinent features for each algorithm, the authors employ the
Chi-Square feature selection technique. They then integrate the results of the three
algorithms and arrive at a final prediction using a weighted voting system. According to
the experimental findings, the suggested hybrid technique works better than each
individual algorithm and a number of baseline models, obtaining an accuracy of up to
72.28%.

S. S. Ghosh, A. Mukherjee, and N. Ganguly, "A multi-perspective approach to fake news
detection," [5]: in their research, the authors used the FakeNewsNet dataset together
with the BuzzfeedNews dataset. Word embedding and term frequency-inverse document
frequency (TF-IDF) approaches are used by the authors to extract content-based features
from the news articles. To determine the veracity of the news stories based on these
features, the authors use a support vector machine (SVM) classifier. The experimental
findings demonstrate that the proposed multi-perspective strategy outperforms numerous
baseline models and achieves state-of-the-art performance on the FakeNewsNet and
BuzzfeedNews datasets, attaining an accuracy of up to 94.7%.

11
CHAPTER-3

SYSTEM DEVELOPMENT
3.1) System Configuration

This project runs on standard hardware. To complete the project we used an Intel i5
CPU with 8 GB of RAM, a 2 GB Nvidia graphics processor, and 2 cores that have a
frequency of 1.7 GHz and 2.1 GHz, respectively. The training phase lasts for around
10-15 minutes; the test phase that follows is quick, so predictions can be made and
accuracy determined rapidly.

3.2) Data Pre-processing

Data Missing Imputation

Missing values in datasets can be a difficulty for some machine learning techniques.
Therefore, any missing values in each column of the input data must be found and
replaced before we model the prediction problem. Missing data imputation is used for
this. A space (' ') is used in place of the null value for each attribute, rather than
removing the tuples that contain null values.
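A minimal sketch of this imputation is shown below, using plain Python dictionaries in place of a data frame. The function name and the three sample rows are hypothetical; with pandas, the equivalent operation would be a call to `fillna(" ")` on the data frame.

```python
def fill_missing(rows, columns, placeholder=" "):
    """Return copies of rows in which None or absent values become the placeholder."""
    return [
        {
            col: (row.get(col) if row.get(col) is not None else placeholder)
            for col in columns
        }
        for row in rows
    ]


# Hypothetical records with a null title and a missing text field.
rows = [
    {"title": "Budget approved", "text": "The council approved the budget."},
    {"title": None, "text": "Untitled report on rainfall."},
    {"title": "Empty body"},
]

cleaned = fill_missing(rows, ["title", "text"])
for row in cleaned:
    print(row)
```

All three rows are kept: only the missing cells change, so no article is lost to a single empty attribute.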

Removal of Stop Words

Stop words like "if," "the," "is," "a," and "an," among others, should not be given much
weight by a machine learning model because they are common English words and do not
add to the distinctiveness or credibility of any story. Because they occur so often,
leaving them in the dataset may affect the model's predictions.

Removal of Special Characters

The use of special characters in a sentence has no bearing on whether a piece of news is
accurate or not. We therefore eliminate all punctuation from the dataset using regular
expressions. A custom function was developed to remove special characters, links, extra
spaces, underlines, etc.
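A cleaning function of this kind might look as follows. This is a sketch under our own assumptions about which patterns to strip (the project's exact function is shown only as a figure): it lowercases the text, removes links, replaces every ASCII punctuation character (including underscores) with a space, and collapses repeated whitespace.

```python
import re
import string

# Character class matching any single ASCII punctuation character.
PUNCTUATION_PATTERN = f"[{re.escape(string.punctuation)}]"


def clean_text(text):
    """Lowercase text and strip links, punctuation and extra whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # links
    text = re.sub(PUNCTUATION_PATTERN, " ", text)       # punctuation and underscores
    text = re.sub(r"\s+", " ", text)                    # runs of whitespace
    return text.strip()


print(clean_text("BREAKING!!! Visit https://example.com/story_1 now...   (really)"))
```

Links are removed before punctuation so that a URL disappears as a whole instead of leaving fragments such as "https" and "com" behind.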

Lemmatization

The word "play" serves as the root of other words, including "playing" and "plays." By
replacing words in other tenses and participle forms with their root word, a more
comprehensive analysis of the term's frequency can be carried out. As a result, we
substitute the root word for every word that derives from a single root.

Count Vectorization

For machine learning algorithms to accept the preprocessed text as input, it must next be
encoded as integers or floating-point values. This process is called feature extraction
(or vectorization).

Each document is represented by a vector with as many dimensions as there are words in
our vocabulary. If a vocabulary word is present in the text, we add one to the
corresponding dimension of the vector, and we add one more for each additional
occurrence of that word. Dimensions for words that never occur are left at zero.
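The counting rule can be written out by hand in a few lines. The sketch below is a pure-Python illustration of what a count vectorizer computes, not the Sklearn implementation itself; the helper names and the two sample documents are ours.

```python
from collections import Counter


def build_vocabulary(documents):
    """Collect every distinct word in the corpus, sorted alphabetically."""
    return sorted({word for doc in documents for word in doc.lower().split()})


def count_vector(document, vocabulary):
    """Return one count per vocabulary word; unseen words stay at zero."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]


docs = [
    "the vote was fake",
    "the vote was real and the count was real",
]

vocab = build_vocabulary(docs)
matrix = [count_vector(doc, vocab) for doc in docs]

print(vocab)
for row in matrix:
    print(row)
```

Each row of `matrix` is one document and each column one vocabulary word, which is exactly the document-term matrix layout used by the models later on.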

TF-IDF Transformation

To create a matrix with TF-IDF values for each feature, we apply a TF-IDF
transformation to the count-vectorized matrix.

The acronym TF-IDF stands for term frequency-inverse document frequency. Term
Frequency (TF) is the same quantity that we previously saw in the Count Vectorizer,
while Inverse Document Frequency (IDF) reflects how rare a word is across the
documents of the dataset.

Because some very frequent words may prove to be unimportant, word frequency alone
might not be informative. Thus, we employ TF-IDF to balance a word's frequency within
a text against its significance.
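A worked example makes the weighting visible. The sketch below uses the textbook definition, weight = (count of the word in the document / words in the document) × log(N / number of documents containing the word), where N is the number of documents. This is an assumption made for clarity: the Sklearn Tfidf Vectorizer uses a smoothed and normalised variant by default, so its numbers will differ, although the ranking of words behaves similarly. The three sample documents are invented.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append(
            {
                word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
                for word, count in counts.items()
            }
        )
    return weights


docs = [
    "the senate passed the bill",
    "the aliens passed the moon",
    "the bill was signed",
]

weights = tf_idf(docs)
for word, weight in sorted(weights[1].items(), key=lambda item: -item[1]):
    print(f"{word}: {weight:.3f}")
```

The word "the" appears in every document, so its weight is zero even though it is the most frequent word, while a word found in only one document, such as "aliens", receives the highest weight in that document.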

Fake.csv and True.csv

13
Fig. 4: Fake.csv and True.csv

3.3) Design of Project

Dataset: The first step is to collect or obtain a dataset of news articles, labeled as "fake"
or "real". This dataset will be used to train and evaluate the performance of different fake
news detection models.

Preprocessing: The dataset must now be cleaned up by eliminating any extraneous or
irrelevant data, including stop words, punctuation, and digits. Additionally, the text may
need to be normalised by making all characters lowercase and eliminating any special
characters or symbols.

Count Vectorizer (BOW): The Bag-of-Words (BOW) format can be used to transform
textual data into numerical features after preprocessing the text. This entails
building a matrix where each row represents a news item and each column represents a
distinct term from the dataset. The value in each cell indicates how often the term
appears in the corresponding article.

Train-Test Split: Once we have the BOW matrix, we can split the data into training and
testing sets. The training set will be used to train the fake news detection model, while
the testing set will be used to evaluate the model's performance on new, unseen data.
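What the split does can be shown without any library. The sketch below is a simplified stand-in for the scikit-learn `train_test_split` helper, written with the standard library only; the function name, the 25% test fraction, and the seed are our own illustrative choices.

```python
import random


def train_test_split_rows(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of rows and cut it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical (article, label) pairs standing in for rows of the BOW matrix.
rows = [(f"article {i}", i % 2) for i in range(20)]

train, test = train_test_split_rows(rows)
print(len(train), "training rows,", len(test), "testing rows")
```

Shuffling before cutting matters here because the merged data frame initially holds all fake articles followed by all true ones; without it, the test set would contain a single class.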

Text-to-vectors (TF-IDF): In addition to BOW, we can also express the textual data
using the Term Frequency-Inverse Document Frequency (TF-IDF) representation. The
frequency of the terms in each article as well as their frequency throughout the whole
dataset is taken into consideration in this representation. This helps to downplay terms
that are prevalent across the whole dataset and to emphasise words that are exclusive to a
certain article.

14
Models: After obtaining the numerical features from the text data, several machine
learning methods such as logistic regression, decision trees, or neural networks can be
employed to train a fake news detection model. The objective of the model is to learn a
function that can accurately classify news stories as either "real" or "fake" based on the
derived attributes from the text.

Accuracy and Confusion Matrix: It's crucial to assess the false news detection model's
performance on the testing set after we've trained it. By assessing its accuracy, precision,
recall, and F1 score, we may do this. To see how many true positives, true negatives,
false positives, and false negatives the model produces, we may also develop a confusion
matrix.

Testing: After assessing the model's performance, we may use the model to categorise
fresh and previously unseen news articles as "real" or "fake". This entails applying the
same feature extraction and preprocessing operations to the fresh data that we did during
training. After that, we can apply the trained model to the cleaned-up data to provide a
categorization label.

Result: The Streamlit library for Python is used to present the result in a web browser,
where the user inputs a news item and the algorithm reports whether the news is “Real”
or “Fake”.

Fig. 5: Design of the Project

3.4) Sample Code

15
Fig. 6: Importing Libraries

Fig. 7: Mounting Google Drive

Fig. 8: Fake.csv

16
Fig. 9: True.csv

Fig. 10: Comparing Fake and True Dataset

17
Fig. 11: Describing Fake and True Dataset

Pre-processing of Dataset

Fig. 12: Inserting a column “Outcome”

18
Fig. 13: Removing last 10 rows from both dataset for manual testing

19
Fig. 14: Merging the manual data frame

Fig. 15: Manual_testing dataset

Fig. 16: Merging the main fake and true dataframe

20
Graph 5: Frequency of subject of the news

21
Graph 6: Fake and Real News

Fig. 17: Whitespace Tokenizer

22
Fig. 18: Checking the column

Fig. 19: Removing “title”, “subject” and “date” columns

23
Fig. 20: Randomly Shuffling the data frame

Fig. 21: Count Vectorizer

24
Fig. 22: Pre-processing task of words

Train-Test Split

Fig. 23: Train-Test Split

25
Fig. 24: Importing for Confusion Matrix
Models

26
Fig. 25: Logistic Regression

Fig. 26: Support Vector Machine

27
Fig. 27: Decision Tree Classifier

Fig. 28: Gradient Boosting Classifier

28
Fig. 29: Random Forest Classifier

29
Graph 8: Comparison of the accuracies of different models

Testing

30
Fig. 30: Testing

Sample Input

Fig. 31: Sample Input

31
CHAPTER-4
RESULTS AND EXPERIMENTAL ANALYSIS
4.1) Models Applied And their Results

Support Vector Machine (SVM)


▪ Support Vector Machine, or SVM, is one of the most widely used supervised learning
techniques for classification and regression problems. In machine learning it is,
however, primarily used for classification problems.
▪ SVM chooses the extreme points (vectors) that help build the hyperplane. These
extreme cases are called support vectors, and they are the foundation of the SVM
approach. The image below shows two separate categories being classified using a
decision boundary or hyperplane:

Fig. 32: Support Vector Machine (SVM) [6]

32
Below are the Results from applying Support Vector Machine model:

Table 1: Classification Report of SVM

Confusion Matrix:

Fig. 33: Confusion Matrix from Support Vector Machine

33
Logistic regression


In binary classification problems, where the goal is to predict one of two outcomes,
logistic regression is a frequently used approach. Through the use of a sigmoid
function, it converts a linear combination of the input features into a probability value
between 0 and 1, which can then be turned into a class label by applying a threshold.

With applications in many areas, including credit scoring, spam filtering, and medical
diagnosis, this simple yet reliable algorithm trains efficiently on large datasets.
However, because it depends on certain assumptions, such as linearity and independence
of the features, it may not work well with highly correlated or nonlinear data.
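The sigmoid-and-threshold step is small enough to write out directly. In the sketch below the weights, the bias, and the two-feature inputs are hypothetical values chosen for illustration; a fitted model would learn them from the training data.

```python
import math


def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(features, weights, bias, threshold=0.5):
    """Return (probability, label) for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    return probability, int(probability >= threshold)


# Hypothetical learned parameters for two word-count features.
weights = [1.2, -0.8]
bias = -0.1

for features in ([2, 0], [0, 3]):
    probability, label = predict(features, weights, bias)
    print(features, f"p={probability:.3f}", "label:", label)
```

Raising the threshold above 0.5 makes the classifier more reluctant to output label 1, which trades recall for precision on that class.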

Fig. 34: Logistic Regression [7]

Below are the Results from applying Logistic Regression model:

Table 2: Classification Report of LR

34
Confusion Matrix:

Fig. 35: Confusion matrix from Logistic Regression

Decision Tree Classification

For both binary and multi-class classification tasks, decision tree classification is a
popular machine learning approach. The input data are recursively divided into
subgroups based on the most informative feature.

Decision trees can handle categorical and numerical data and are simple to understand
and use. Additionally, they are often regarded as tolerant of noise and missing data and
are capable of capturing intricate non-linear relationships between features.

Fig. 36: Decision Tree [8]

Below are the Results from applying the Decision Tree Classification model:

Table 3: Classification Report from Decision Tree

36
Confusion Matrix:

Fig. 37: Confusion Matrix from Decision Tree Classification

Gradient Boosting Classifier


• Gradient Boosting Classifier is a powerful algorithm for both classification and
regression problems. It works by combining multiple weak models, such as decision
trees, to create a strong ensemble model.
• One of the advantages of Gradient Boosting Classifier is that it can handle complex
non-linear relationships between features and the target variable. Additionally, some
implementations have a built-in mechanism for handling missing data, and the
algorithm can automatically select important features for better accuracy. However,
it can be computationally expensive and prone to overfitting if not tuned properly.

Below are the Results from applying Gradient boosting classifier model:

37
Table 4: Classification Report of GBC

Confusion Matrix:

Fig. 38: Confusion Matrix from Gradient Boosting Classifier

Random Forest Classifier

As the name implies, a Random Forest consists of numerous individual decision trees
that work together as an ensemble. Each tree in the Random Forest outputs a class
prediction, and the class that receives the most votes becomes the prediction of our
model.
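The voting step can be illustrated in isolation. The sketch below assumes the individual tree predictions are already available as a list; the function name and the five example votes are ours, and a real Random Forest also handles the training of the trees on random subsets of rows and features.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical predictions from five trees for a single news article.
votes = ["fake", "real", "fake", "fake", "real"]
print(majority_vote(votes))
```

Because individual trees make different mistakes, the vote tends to cancel out errors that any single tree would make on its own.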

38
Below are the Results from applying Random Forest Classifier model:

Table 5: Classification Report of RFC

Confusion Matrix:

Fig. 39: Confusion Matrix from Random Forest Classifier

Sample Input:

Fig. 40: Web Browser Output

CONCLUSIONS
5.1) Conclusions

Judging by the accuracy scores obtained for the various models, all of them do a good
job of identifying fake news items. The Decision Tree and Gradient Boosting classifiers
achieved the highest accuracy, at 99.5%, followed by the SVM at 99.31% and the Random
Forest Classifier at 98.71%.

All things considered, these results suggest that a range of classifiers can be used with
comparable success on this dataset and that machine learning techniques can be highly
effective in spotting fake news. It is important to keep in mind that accuracy is only
one measure and that the models should be evaluated using multiple metrics, including
precision, recall, and F1-score, in addition to factors like interpretability,
scalability, and processing requirements. Investigating different feature extraction and
selection methods, classifier types, and ensemble approaches may also be useful to see
whether even better results can be achieved.
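
As a brief illustration of the metrics mentioned above, the sketch below derives accuracy, precision, recall, and F1-score from the four cells of a binary confusion matrix. The counts are hypothetical and are not results from this project; "positive" is assumed to mean the fake class.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from binary confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives. Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for illustration only
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=870)
```

With these counts the accuracy is 0.96 while recall is only 0.75, which shows how a high accuracy can hide a substantial share of missed fake articles when the classes are imbalanced.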

We used two datasets, real and fake, containing 21417 and 23481 entries, respectively.
We converted the text into numerical feature vectors using the TF-IDF Vectorizer and
applied the following models:

• Support Vector Machine: accuracy of 99.31%
• Decision Tree: accuracy of 99.5%
• Gradient Boosting Classifier: accuracy of 99.5%
• Random Forest Classifier: accuracy of 98.7%
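
To make the TF-IDF step mentioned above concrete, here is a minimal pure-Python sketch of how text can be turned into weighted numeric vectors. It assumes whitespace tokenisation, a smoothed inverse document frequency, and L2 normalisation. The vectorizer used in the project may differ in its details, and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return (vocabulary, vectors) with one L2-normalised TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocabulary = sorted(df)
    # Smoothed IDF: rarer terms receive larger weights
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocabulary}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = [counts[term] * idf[term] for term in vocabulary]
        norm = math.sqrt(sum(w * w for w in weights))
        vectors.append([w / norm if norm else 0.0 for w in weights])
    return vocabulary, vectors


# Hypothetical sample headlines
docs = [
    "shocking miracle cure revealed",
    "government publishes budget report",
    "miracle budget cure",
]
vocabulary, vectors = tfidf(docs)
```

Within the first headline, "shocking" occurs in only one document and therefore gets a higher weight than "miracle", which occurs in two. The resulting vectors are what a classifier such as those listed above would receive as input.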

5.2) Future Scope


There is ample room for further research and advancement in the field of fake news
detection. Future efforts to identify fake news may go in the following directions:

Including more varied and subtle features: For the most part, current methods for
detecting fake news rely on simple text-based features like TF-IDF vectors or
bag-of-words. Future research could concentrate on more complex and diverse features,
such as sentiment analysis, network analysis, or multimedia analysis (for instance,
identifying fake images or videos).

Creating more interpretable models: Existing methods for spotting fake news
sometimes rely on complex machine learning algorithms that can be difficult to
interpret. In the future, it would be beneficial to develop more interpretable models
that provide more information on how the model reaches its decisions.

Combining information from other sources: In addition to social media, news articles,
and videos, fake news is regularly spread through other media channels and platforms.
The development of methods that can incorporate data from several sources may be
crucial in the future to improve false news identification.

Adapting to shifting strategies: It will be crucial for fake news detection technologies
to develop alongside the tactics used by those who create and spread it. For this, the
detection methods might need to be regularly reviewed and improved.

REFERENCES
[1] A. S. A. Ahmed, A. Abidin, M. A. Maarof, and R. A. Rashid, "Fake news
detection: A survey," IEEE Access, vol. 9, pp. 113051-113071, 2021. doi:
10.1109/ACCESS.2021.3104178

[2] S. Asghar, S. Mahmood, and H. Kamran, "Fake news detection using machine
learning: A survey," IEEE Access, vol. 9, pp. 57613-57639, 2021. doi:
10.1109/ACCESS.2021.3075392

[3] J. H. Kim, S. H. Lee, and H. J. Kim, "Fake news detection using ensemble
learning with context and attention mechanism," IEEE Access, vol. 9, pp. 27569-27579,
2021. doi: 10.1109/ACCESS.2021.3057736

[4] M. F. Hossain, M. M. Islam, M. A. H. Khan, and J. J. Jung, "Fake news detection
using hybrid machine learning algorithms," IEEE Access, vol. 8, pp. 233350-233364,
2020. doi: 10.1109/ACCESS.2020.3041149

[5] S. S. Ghosh, A. Mukherjee, and N. Ganguly, "A multi-perspective approach to
fake news detection," IEEE Intelligent Systems, vol. 35, no. 5, pp. 31-39, 2020. doi:
10.1109/MIS.2020.3012915

[6] https://www.google.co.in/imgres?imgurl=https%3A%2F%2Fdata-
flair.training%2Fblogs%2Fwpcontent%2Fuploads%2Fsites%2F2%2F2019%2F07%2Fin
troductiontoSVM.png&tbnid=p7ua2IdzmLsjqM&vet=12ahUKEwjf26Kfru
DAhW6JrcAHdMIAagQMygCegUIARDlAQ..i&imgrefurl=https%3A%2F%2Fdata-
flair.training%2Fblogs%2Fsvm-support-vector-machine-
tutorial%2F&docid=7oy5_irTaN4UfM&w=801&h=420&q=svm&ved=2ahUKE
wjf26KfruD-AhW6JrcAHdMIAagQMygCegUIARDlAQ
[7] https://www.google.co.in/imgres?imgurl=https%3A%2F%2Fstatic.javatpoint.com%2
Ftutorial%2Fmachine-learning%2Fimages%2Flogistic-regression-
inmachinelearning.png&tbnid=LuaHnfur76i8eM&vet=12ahUKEwjFoPGSruDAh
VNnNgFHUjLCl8QMygCegUIARDjAQ..i&imgrefurl=https%3A%2F%2Fwww.j
avatpoint.com%2Flogisticregressioinmachinelearning&docid=makIlDmuc8naW
M&w=500&h=300&itg=1&q=logistic%20regression&ved=2ahUKEwjFoPGSruD-
AhVNnNgFHUjLCl8QMygCegUIARDjAQ

[8] https://www.google.co.in/url?sa=i&url=https%3A%2F%2Fwww.geeksforgeek
s.org%2Fdecision-tree%2F&psig=AOvVaw0sYuRq-TZe0WWhW-
9YQUnl&ust=1683450911500000&source=images&cd=vfe&ved=0CBEQjRxqF
woTCLDwi7qt4P4CFQAAAAAdAAAAABAE
