Yogeesh

Online bullying detection using machine
learning and deep learning

1Dr.T Subburaj and 2Yogeesh D R
1,2Department of Master of Computer Applications, RajaRajeswari College of Engineering
Bangalore
Abstract
Due to the rise of cyberbullying, data innovation has increased, and websites now provide a
big portion of it rather than mobile devices, gaming consoles, and informational platforms.
Cyberbullying can manifest in various ways, including sexual insults, threats, hate letters, and
uploading fake information about someone that millions of people can see and read.In
comparison to traditional bullying, cyberbullying has a longer lasting effect on the victim,
which can harm them physically, emotionally, psychologically, or in any combination of these
ways. Suicides due to cyberbullying have surged in recent years, and India is one among four
nations with the highest number of occurrences. Owing to an increase in incidents since 2015,
colleges and institutions have made cyberbullying prevention mandatory.The motivation
behind this project's goal is to automatically identify cyberbullying comments using Profound
Learning and AI techniques.
Catchphrases: AI, regular language processing, cyberbullying, and web-based
entertainment.
I. INTRODUCTION
Social networking is a platform that allows users to upload anything they want, such as images,
videos, and documents, and communicate with others [1]. People use computers or cell phones
to access social media. Facebook1, Twitter2, Instagram3, TikTok4, and among the most
popular virtual entertainment stages are others. Virtual entertainment is now used for a variety
of purposes, such as training [2, 3], business [4], and charitable causes [5].Social networking
is also benefiting the global economy by offering several new job possibilities.
Whereas social networking offers many advantages, it also has significant disadvantages.
Malevolent users of this medium commit unethical and deceptive behaviours to be able to harm
others' feelings and destroy their reputation.
In recent years, cyberbullying has likely been one of the biggest concerns with social media.
Cyberbullying, often known as cyber-harassment, is an electronic form of bullying or
harassment. Online bullying refers to cyberbullying and cyber-harassment.
1
Cyberbullying has become very widespread as the digital domain has evolved and technology
has advanced, particularly among teens.
Cyberbullying affects almost 50% of American youths. Bullying has a physical and mental
effects on the victim. Considering the trauma of cyberbullying, victims adopt self-destructive
behaviours such as suicide. Thus, identifying and preventing cyberbullying is essential for
safeguarding. In this case, we propose a machine learning-based cyberbullying detection model
capable determine if a communication is associated with cyberbullying or not. In the proposed
cyberbullying detection model, we investigated a variety of machine learning techniques, such
as Naive Bayes, Vector Machines for Support, Decision Tree, and Random Forest. Research is
conducted using the help of two datasets derived from Twitter and Facebook comments and
posts. We employ two separate feature vectors for performance analysis: BoW and TF-IDF.
The findings demonstrate the TF-IDF feature outperforms BoW in comes to accuracy, while
SVM outperforms regarding the performance employed in this article
The frequency of cyberbullying incidents has risen due to the increased use of social media
platforms. which can have serious psychological and emotional consequences for victims. The
purpose of this study is to develop a deep learning framework for detecting instances of
cyberbullying in social media posts using networks with Long Short-Term Memory (LSTM).
existing research focuses on established languages, highlighting a significant void in recently
adopted resource-poor languages. For recognising and combating cyberbullying, this suggested
system employs the model of Long Short Term Memory (LSTM), deep learning techniques.
The project is separated into aspects include data gathering, data preparation, model creation,
and evaluation.
II. LITERATURE SURVEY:
Nearly all publications have taken information obtained from a single source and conducted a
comparison analysis on several machine learning or deep learning approaches in conjunction
with various word vectors or feature extraction techniques, determining the optimal
combination. There were just a few studies that targeted on optimising the detection model by
either developing ensemble ml models or stacking alternative feature preprocessing strategies.
Even in those studies, the emphasis was on testing the model on the dataset, with no real-time
detection.
Significant preprocessing was done on Roman Urdu microtext, including the construction of a
Roman Urdu slang-dictionary and the mapping of slangs after tokenization. The unstructured
The information then processed further account for encoded text formats and metadata/non-
linguistic aspects. Following the preprocessing step, extensive testing were performed using
RNN-LSTM, RNN-BiLSTM, and CNN models. Several criteria were previously utilised to
analyse the performance and accuracy of models to provide a study for inspection. On Roman
Urdu text, RNN-LSTM and RNN-BiLSTM performed the best.
2
Cyberbully location was taken into account using a BiGRU-CNN feeling characterization
model, which consists of a BiGRU layer, consideration component layer, CNN layer, complete
association layer, and grouping layer.
This project's dataset was taken from Kaggle, a prominent dataset source, and was preprocessed
to eliminate unnecessary data and transform the text into a mathematical structure that may be
used as input for an LSTM model. A variety of measurements, including exactness, correctness,
review, and F1 score, were used to evaluate the feasibility of the LSTM model that was created
using the gone back over information. The model attained an accuracy of 95.6% on the test set.
indicating which it is capable efficiently recognise instances of cyberbullying in social media
messages. The trained model was stored and may now be used with forecast fresh data. This
initiative has the potential to be utilised as a tool to prevent cyberbullying and assist victims.
The present system in use both models of Deep learning and machine learning, and they
discovered that SVM performed better in the machine learning technique while GRU
performed better in deep learning is a method of learning derived through experience. Deep
learning methodologies, nonetheless, clearly outperform AI techniques.
In the existing system, it was discovered that, of all the algorithms Gated Recurrent In the
ongoing test, units performed the best. With an accuracy of 95.47%, Gated Repeated Units
performed best in the accessible explorations.
The two approaches based on deep learning and AI were used to assemble the order models in
the current framework. and a pre-processing work was performed to enhance the model's
performance.
In this approach, utilising count vectorizer for embedding words and It uses machine learning
for deep learning. As a component of the pre-processing, all sentences are transformed from
title case or capital case to lower case to confirm the dataset's consistency. Furthermore,
tokenization is performed to produce tokens from the text that might help the model grasp the
context. Finally, stopwords and punctuation were eliminated from the text, which is regarded
an essential duty in pre-processing because these elements do not contribute to the model
development process.
GRUs have a restricted modelling capacity when compared in comparison to more complex
models such as Long Short-Term Memory (LSTM), which might result in poorer performance
for complex tasks.Difficulty learning long-term dependencies: Although GRUs are meant to
capture long-term relationships, they may struggle to learn and store information over lengthy
sequences.
GRUs are sensitive to initialization, which can have an impact on their capacity to learn well.
GRUs can be tough to interpret because to their complicated design, making it harder to
understand how the model makes predictions and discover areas for improvement. Difficulty
in dealing with noisy data: GRUs can be prone to data noise, which can impair their capacity
to produce correct predictions.
3
III. PROPOSED SYSTEM:
In the proposed system, The LSTM, or Long Short Term Memory model is use deep learning
method, to identify cyberbullying. This framework was presented with the intention of creating
and fostering a cyberbullying identification system that could be used to examine and identify
instances of online harassment by users of virtual entertainment.
The suggested method will function as an ideal model for the proper identification of
cyberbullying posts through virtual entertainment platforms, eliminating many flaws in the
previous location cycle. Furthermore, the suggested system employs efficient training models
and word embedding approaches, which distinguishes the system. The technique is beneficial
for analysing cyberbullying rates on various social media platforms so that appropriate
safeguards and measures may be taken.
The suggested system is created initially by collecting data: A dataset of social media posts is
compiled, with posts labelled as either non-cyberbullying or bullying online. Preprocessing the
data involves eliminating unnecessary information and translating the text into a numerical
representation appropriate for input into an LSTM model. The Model Development follows:
On the preprocessed data, an LSTM model is created and trained. The model is intended to
analyse the word sequence in each post and forecast whether it is or not cyberbullying. The
model's name is then evaluated: On a different test dataset, the model's performance is assessed
using several metrics like as F1 score, Precision, Recall, and Accuracy. The assessment
measures are employed to assess how effectively the model performs.
Fig. 1. Proposed Architecture
Fig 1 shows proposed architecture when the design has been trained and assessed, it might be
used to identify cyberbullying in a real-world setting.
4
a) IMPLEMENTATION
When applied to certain domains, a single machine learning or deep learning form may predict
the outcome quite well. However, each provides a unique set of benefits. Also negative aspects.
LSTM typically outperforms CNN, despite the fact that processing requires a longer time, has
fewer hyperparameters, and requires less supervision. Meanwhile, The LSTM has greater
accuracy for lengthier phrases but requires more time to analyse. Because RNN suffers from
significant gradient loss when processing sequences, the perception of nodes at the front
reduces as nodes go further behind. BiLSTM used for the gradient's solution vanishing
problem.
Fig.2. Precision
Fig 2 precision fixes the problem sequence to sequence prediction issue. RNN is limited by the
requirement that the input and output have a comparable size.
Natural Language Processing :
A variety of unnecessary characters or text can be found in the helpful postings or messages.
For example, numbers and language slightly alter the identification of harassment. Before
doing AI calculations on the notes for the place, we should organise and tidy them.
Machine Learning:
The subject makes use of a variety of AI calculations, such as Choice Tree(DT), Random
Support Vector Device, Forest, and Naive Bayes to recognise bullying messages and text.
Regarding a certain public cyberbullying dataset, the classifier with the best accuracy is
identified. In the following section, several well-known using Investigated are AI methods for
separating online entertainment content from cyberbullying.
Because LSTMs are designed to identify long-term patterns in sequential information, they are
suitable for deciphering online entertainment communications, which frequently feature
confused express examples and lengthy dialogues.
Enhanced precision: LSTMs are capable of producing high precision rates when trained on
large datasets, making them effective for accurately detecting instances of cyberbullying.
5
Because LSTMs are adaptable and teachable on a variety of variables in a continuous request,
they are appropriate for use in a range of applications other than identifying cyberbullying.
LSTMs automatically extract relevant features from input data, eliminating the need for human
feature engineering and allowing the model to learn more complicated correlations between
input and output.
Fig.3. Confusion matrix
Fig 3 shows Real-time monitoring: Once trained and deployed, the model may be used to
monitor social media posts in real-time, showing examples of cyberbullying as they happen
and enabling victims to get immediate intervention and assistance. Over existing methods,
utilising LSTM for cyberbullying detection provides significant advantages.
b) RESULTS
Fig.4. Entering the text Fig.5.Predicating the text

Fig 4 and Fig 5 shows to enter the text and predicates the text whether the text it is cyberbulling
or not.
6
IV. CONCLUSIONS
With the increasing prominence of social media sites and growing social media use by youths,
cyberbullying has grown more frequent and has begun to pose important societal difficulties.
To avoid the negative impacts of cyberbullying, an automatic cyberbullying detection system
must be developed. Given the importance of cyberbullying detection, we studied the automated
identification of postings on social media linked to cyberbullying using two characteristics,
BoW and TF-IDF, in this study. There are four machine learning algorithms that are used to
recognise harassing text. adding SVM for both TF-IDF and BoW. In order to create a system
for programmed ID and grouping of cyberbullying in Bengali compositions later on, we must
use sophisticated learning techniques.
REFERENCES
[1] C. Fuchs, Social media: A critical introduction. Sage, 2017.
[2] N. Selwyn, “Social media in higher education,” The Europa world oflearning, vol. 1, no. 3,
pp. 1–10, 2012.
[3] H. Karjaluoto, P. Ulkuniemi, H. Kein¨anen, and O. Kuivalainen, “An-tecedents of social
media b2b use in industrial marketing context:customers’ view,” Journal of Business &
Industrial Marketing, 2015.
[4] W. Akram and R. Kumar, “A study on positive and negative effects ofsocial media on
society,” International Journal of Computer Sciencesand Engineering, vol. 5, no. 10, pp. 351–
354, 2017.
[5] D. Tapscott et al.,The digital economy. McGraw-Hill Education,, 2015.
[6] S. Bastiaensens, H. Vandebosch, K. Poels, K. Van Cleemput, A. Desmet,and I. De
Bourdeaudhuij, “Cyberbullying on social network sites. anexperimental study into bystanders’
behavioural intentions to help thevictim or reinforce the bully,” Computers in Human
Behavior, vol. 31,pp. 259–271, 2014.
[7] D. L. Hoff and S. N. Mitchell, “Cyberbullying: Causes, effects, andremedies,” Journal of
Educational Administration, 2009.
[8] S. Hinduja and J. W. Patchin, “Bullying, cyberbullying, and suicide,”Archives of suicide
research, vol. 14, no. 3, pp. 206–221, 2010.
[9] D. Yin, Z. Xue, L. Hong, B. D. Davison, A. Kontostathis, and L. Ed-wards, “Detection of
harassment on web 2.0,” Proceedings of theContent Analysis in the WEB, vol. 2, pp. 1–7, 2009.
[10] K. Dinakar, R. Reichart, and H. Lieberman, “Modeling the detectionof textual
cyberbullying,” in In Proceedings of the Social Mobile Web.Citeseer, 2011.
7
8
9

Yogeesh

Uploaded by

Copyright:

Available Formats

Yogeesh

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Yogeesh

Uploaded by

Copyright:

Available Formats

Online bullying detection using machine

learning and deep learning

II. LITERATURE SURVEY:

Fig. 1. Proposed Architecture

Natural Language Processing :

Fig.3. Confusion matrix

Fig.4. Entering the text Fig.5.Predicating the text

You might also like