International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 11 Issue: 04 | April 2024 www.irjet.net p-ISSN: 2395-0072

Crime Analysis and Prediction Using Machine Learning

Shradha Rajput1, Minal Thombare2, Sawan Kumar3, Aachal Gupta4, Dr. Radhika Nanda5

1,2,3,4Student, Department of CSE(IOT, Cybersecurity and Blockchain), Smt. Indira Gandhi College of Engineering,
Ghansoli, Navi Mumbai, Maharashtra, India
5Professor, Department of CSE (IOT, Cybersecurity and Blockchain), Smt. Indira Gandhi College of Engineering,
Ghansoli, Navi Mumbai, Maharashtra, India

Abstract - Crime represents a significant and pervasive challenge in our society, making the prevention of criminal activities a paramount responsibility. To address this, it is essential to maintain comprehensive records of all offenses and establish a database for future reference. The central issue is maintaining a dependable crime database and leveraging data analysis to assist in forecasting and resolving potential future crimes. The primary aim of this project is to assess a dataset containing various criminal incidents and predict the type of crimes that might occur in the future based on different factors. In this project, we apply machine learning and data science techniques to forecast crimes using Indian crime data. Crime analysis and prediction involve a systematic approach to identifying patterns of criminal activity. This algorithmic approach can anticipate and delineate areas susceptible to criminal incidents. By employing machine learning, we can uncover valuable insights from unstructured data, revealing previously unknown information. The extraction of new insights relies on the analysis of current datasets. Crime is a grave and widespread societal issue that impacts individuals globally. It influences people's well-being, economic prosperity, and a nation's reputation. To protect our communities from crime, we must employ cutting-edge technology and innovative crime analytics techniques. We introduce a system capable of analyzing, identifying, and predicting various crime probabilities within a given location. This project delves into various forms of criminal analysis and crime prediction using machine learning methodologies.

Key Words: Crime Datasets, Crime Prediction, Machine Learning, Prophet.

1. INTRODUCTION

The frequency of criminal activities is on the rise due to the continuous advancement of technology, which provides criminals with more sophisticated tools to carry out their unlawful actions. Various types of crimes such as burglary, arson, and others, as reported by the Crime Record Bureau, have seen an increase. This includes more severe offenses like murder, rape, abuse, and gang rape, among others. Crime-related data is gathered from a wide range of sources, including blogs, news websites, and online platforms. This extensive data is utilized to construct a comprehensive crime report database [8]. The insights derived through data mining techniques play a pivotal role in reducing crime by facilitating the identification of culprits and areas most affected by criminal activities [7].

Crime analysis serves as the initial phase in the examination of criminal activities. It involves the exploration, analysis, and identification of connections among various crimes and crime-related variables. A machine learning algorithm is trained on the data to make predictions based on the provided dataset [2]. We can train models on the data to effectively analyze and predict crime in a certain area. This analytical process aids in generating real-time statistics, queries, and maps. It also helps in determining whether a crime has taken place in a particular, well-defined location. The prediction of crime helps in securing the area and thus lowers the crime rate, increasing the safety of citizens.

1.1 OBJECTIVES

• The primary objective is to provide valuable insights into crime trends and patterns using historical crime data.
• It will also predict future crime locations, etc. on the basis of historical crime data.
• The web application will display the crime rate in various forms such as graphs, etc.
• It will also display the predictions in the same manner.

The primary goal of this crime evaluation is to successfully track the crime rate in various areas. It will display the crime rate of a particular area. This will help the police forces and other related defense forces to effectively track and stop crime in a particular area and eventually in society. It will also predict the crime rate of different types of crime using various machine learning algorithms. The prediction will be based on historical crime data. This will also help the defense personnel to keep checking areas with a high crime rate to prevent crimes in the future.

Crime analysis and future prediction using machine learning have garnered significant attention from both the academic and law enforcement communities in recent years. These applications have the potential to revolutionize the way we approach crime prevention and public safety. A significant body of literature exists in this field, highlighting various methodologies, algorithms, and the integration of predictive analytics into web applications. Machine learning algorithms, such as support vector machines, decision trees, and neural networks, have been widely employed in the development of these web applications [1]. For instance, one methodology used a logistic regression model for crime classification, followed by k-means clustering to group districts based on their crime rates, demonstrating the feasibility of this technology [2]. Additionally, deep learning techniques, particularly convolutional neural networks, have been employed for image-based crime prediction, further expanding the scope of ML applications in this domain [3]. The integration of spatial and temporal data has been a major focus in the literature. Researchers have investigated the correlation between environmental factors, urban development, and crime patterns, allowing the development of predictive models that consider not only historical data but also contextual information. Recent studies have also explored the fusion of real-time data, such as social media updates and weather conditions, to enhance prediction.

The use of data mining helps to find hidden patterns in large crime datasets quickly and efficiently [4]. An essential aspect of these web applications is the interpretability of ML models. Researchers have explored methods to make these models more transparent and interpretable to law enforcement personnel and the public [5]. This includes research on explainable AI and feature importance analysis to understand the factors contributing to predictions, ensuring accountability and trust in the technology. Moreover, ethical considerations and potential biases in crime prediction algorithms have gained prominence in recent literature [6]. Scholars have emphasized the importance of fairness, accountability, and transparency in the development of these applications, addressing issues related to bias in historical crime data and the potential for reinforcing existing inequalities in law enforcement practices. Community engagement and collaboration have been another key area of research. Developing web applications that empower communities to participate in crime reporting and raise safety concerns fosters a more comprehensive and inclusive approach to crime prevention. Researchers have explored ways to facilitate information sharing and feedback mechanisms between law enforcement agencies and the public.

Fig.3.1. System Design

Machine learning systems architecture involves creating a blueprint for the software, infrastructure, algorithms, and data required to fulfill specific requirements. This blueprint guides the development of software for web applications by detailing the intricacies of how the program should be constructed. The system uses various technologies and methods for gathering, processing, and displaying data and for making predictions based on the given data.

4. METHODOLOGY

The working of the system is based on data analysis, data visualization, and machine learning algorithms, which are used to analyze the data accurately and to make predictions based on the given data. We will also look at the tools and algorithms used in this system.

4.1. DATA COLLECTION

The data is collected from various government and non-government websites, available datasets, and other websites. Some of the datasets are also collected from other resources, such as online news, using web scraping, etc.

4.2. DATA PREPROCESSING

The data is cleaned and pre-processed to remove redundancy and fill the gaps in the data, yielding a smooth and complete dataset. Such a dataset supports smooth and accurate prediction. The data is arranged as required.

4.3. DATA ANALYSIS

The data is analyzed for the required information, which will later become an input to the prediction algorithm. Data analysis helps in understanding the data and taking the measures required for the machine learning model to perform accurately.
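
The preprocessing and analysis steps in Sections 4.2 and 4.3 can be illustrated with a minimal pandas sketch. The column names (`date`, `area`, `crime_type`), the sample rows, and the helper functions `preprocess` and `monthly_counts` are assumptions made for illustration only; they are not taken from the dataset or code used in this project.

```python
import pandas as pd

# Hypothetical raw records; column names and values are illustrative assumptions.
raw = pd.DataFrame(
    {
        "date": ["2023-01-05", "2023-01-05", "2023-01-20", "2023-02-11", None, "2023-02-15"],
        "area": ["Ghansoli", "Ghansoli", "Vashi", "Ghansoli", "Vashi", "Vashi"],
        "crime_type": ["Burglary", "Burglary", "Theft", None, "Theft", "Theft"],
    }
)


def preprocess(df):
    """Remove redundant rows and fill gaps (one possible reading of Section 4.2)."""
    out = df.drop_duplicates().copy()                            # drop exact duplicate records
    out["date"] = pd.to_datetime(out["date"], errors="coerce")   # unparsable dates become NaT
    out = out.dropna(subset=["date"])                            # a record without a date cannot be placed in time
    out["crime_type"] = out["crime_type"].fillna("Unknown")      # fill missing categories
    return out.reset_index(drop=True)


def monthly_counts(df):
    """Count incidents per area and calendar month (one possible reading of Section 4.3)."""
    month = df["date"].dt.to_period("M").rename("month")
    return df.groupby(["area", month]).size().rename("incidents").reset_index()


clean = preprocess(raw)
summary = monthly_counts(clean)
print(summary)
```

A table of this shape (one row per area and month) is the kind of aggregated input that a time series forecaster can consume later; how the gaps should actually be filled depends on the real data.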

4.4. DATA PREDICTION

The data is then fed to the Prophet tool, which predicts the crime rate of certain crimes in a specific area. This tool works on the date-time column, i.e., the time series, to produce its output.

4.5. DATA VISUALIZATION

The website provides various forms in which the data can be visualized, such as heat maps, pie charts, bar graphs, etc. This helps in understanding large datasets.

4.6. TOOLS

4.6.1. PROPHET

Prophet is a method for predicting time series data using an additive model that accommodates non-linear trends alongside yearly, weekly, and daily seasonality, as well as holiday impacts. It performs most effectively with time series characterized by robust seasonal patterns and ample historical data spanning multiple seasons. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well. Prophet is open source software released by Facebook’s Core Data Science team [9].

Prophet's input always consists of a dataframe containing two columns: 'ds' and 'y'. The 'ds' (datestamp) column should adhere to the format anticipated by Pandas, preferably YYYY-MM-DD for dates or YYYY-MM-DD HH:MM:SS for timestamps. The 'y' column must be numeric, and represents the measurement we wish to forecast [10].

4.7.1. FLASK

Flask is a web framework that allows developers to build lightweight web applications quickly and easily with Flask libraries. It was developed by Armin Ronacher, leader of Pocoo, an international group of Python enthusiasts. It is based on the WSGI toolkit and the Jinja2 templating engine [11].

4.8. LIBRARIES

4.8.1. PANDAS

Pandas is a Python library designed for data manipulation and analysis, providing specialized data structures and functions tailored for working with numerical tables and time series data. The library is built upon another library, NumPy [12].

4.8.2. NUMPY

In 2005, Travis Oliphant amalgamated features from Numarray into Numeric, implementing extensive modifications to create NumPy, a Python library. NumPy enhances Python with support for large, multi-dimensional arrays and matrices, complemented by a vast array of high-level mathematical functions tailored for manipulating these arrays. NumPy is open-source software and has many contributors [13].

4.8.3. MATPLOTLIB

Matplotlib is a plotting library for Python and its numerical mathematics extension NumPy, offering extensive capabilities for creating visualizations. Making statistical inferences requires visualizing the data, and Matplotlib is well suited to this purpose. It provides a MATLAB-like interface; the only difference is that it uses Python and is open source [14].

4.8.4. SEABORN

Seaborn, built upon matplotlib, is a Python data visualization library offering a high-level interface for crafting visually appealing and informative statistical graphics. Expanding upon matplotlib and tightly integrating with pandas data structures, Seaborn's plotting functions are tailored to operate seamlessly on dataframes and arrays encompassing entire datasets. Internally, they execute the essential semantic mapping and statistical aggregation needed to produce insightful plots. Its dataset-oriented, declarative API lets us focus on what the different elements of our plots mean, rather than on the details of how to draw them [16].

This system uses a supervised learning algorithm for prediction. Supervised learning falls within the realm of machine learning, where labeled datasets are employed to train algorithms, enabling them to predict outcomes and identify patterns. Labeled data refers to input data that has been pre-assigned corresponding correct output values. In supervised learning, the training data provided to the machine works as the supervisor that teaches the machine to predict the output correctly [15].

Fig.4.9.1. Supervised Machine Learning

Prophet utilizes a form of piecewise linear regression to model the trend component of the time-series data. But Prophet’s approach to linear regression is not the same as traditional linear regression models. In Prophet, the trend component is modeled as a piecewise linear function that allows for changes in the trend direction at specific change points. These change points are automatically selected based on historical data and represent times when the trend undergoes significant shifts.

The piecewise linear regression model in Prophet captures the overall trend in the data while allowing for flexibility and adaptability to changes over time. This approach differs from traditional linear regression models, which assume a single linear relationship between the predictor variables and the target variable.
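
To make the idea of a trend with change points concrete, the following is a minimal sketch of a continuous piecewise linear function. It is not Prophet's internal implementation: Prophet estimates the change points and slope adjustments from the data, whereas here the function name `piecewise_linear_trend` and all numeric values are hypothetical and fixed by hand.

```python
def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Evaluate a continuous piecewise linear trend at time t.

    k            -- base slope before any change point
    m            -- base offset (value of the trend at t = 0)
    changepoints -- times s_j at which the slope changes
    deltas       -- slope adjustment delta_j that applies from s_j onwards
    """
    slope, offset = k, m
    for s, delta in zip(changepoints, deltas):
        if t >= s:
            slope += delta
            # Shift the offset by -s * delta so the segments meet at s.
            offset -= s * delta
    return slope * t + offset


# Hypothetical example: the trend rises by 2 per step, then falls by 1 per step after t = 5.
values = [piecewise_linear_trend(t, k=2, m=10, changepoints=[5], deltas=[-3]) for t in range(8)]
print(values)
```

The offset correction inside the loop is what keeps the line unbroken at each change point; without it, every slope change would introduce a jump in the trend.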

By incorporating piecewise linear regression, Prophet can capture complex trends and patterns in the time-series data, making it particularly suitable for forecasting tasks where the trend may exhibit non-linear behavior or undergo changes over time.

Fig. Piecewise Linear Regression

The required data is collected and preprocessed as required. We then created a heat map depicting areas with a high crime rate.

Fig.5.1. Heat Map

We have also created a crime intensity metrices graph to depict the intensity of crime in a specific area.

Fig.5.2. Crime Intensity Metrices

We have also created graphs predicting the future crime rate using Prophet.

Fig.5.3. Prediction Graph
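
A forecast like the one plotted in the prediction graph could be produced along the following lines. This is a sketch under stated assumptions, not the project's actual code: the helper names `to_prophet_frame` and `forecast_crime_rate`, the monthly frequency, and the synthetic counts are all hypothetical. The Prophet import is placed inside the function so that the data preparation runs even where the `prophet` package is not installed.

```python
import pandas as pd


def to_prophet_frame(dates, counts):
    """Build the two-column frame Prophet expects: 'ds' (datestamp) and 'y' (numeric)."""
    return pd.DataFrame(
        {"ds": pd.to_datetime(dates), "y": [float(c) for c in counts]}
    )


def forecast_crime_rate(history, periods=6):
    """Fit Prophet on a monthly history and forecast `periods` months ahead."""
    from prophet import Prophet  # lazy import; requires the `prophet` package

    model = Prophet()
    model.fit(history)
    # "MS" asks for month-start dates, matching the monthly history assumed here.
    future = model.make_future_dataframe(periods=periods, freq="MS")
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]


# Hypothetical monthly incident counts for one area and one crime type.
months = pd.date_range("2021-01-01", periods=24, freq="MS")
history = to_prophet_frame(months, [30 + (i % 12) for i in range(24)])
print(history.head())
```

Calling `forecast_crime_rate(history)` returns the predicted value `yhat` together with its uncertainty bounds for both the historical and the future dates, which is the information a prediction graph would typically display.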

Thus we have created a system which predicts the crime rate and visualizes the crime intensity of different areas in various ways.

6. CONCLUSION

Crime is an unlawful act which disturbs the peace and harmony of society. This project aims to predict crimes and their locations based on historical crime data. The project uses machine learning to make these predictions. The web application will display the crime rate in various areas. It is useful both for the higher investigating authorities and for officers designated to handle low-level crime in tracking and stopping crime. The predictions will help to ensure increased security and thus could help in lowering the crime rate. Overall, the project demonstrates the potential of data analysis and mapping technologies to improve public safety and inform decision-making. Proactive measures can be taken to prevent crime and improve public safety by using data to identify crime hotspots and trends. Although more work is needed to enhance the precision and breadth of the project, it marks a significant stride towards employing data-driven strategies to tackle intricate social challenges.

Every project requires the guidance and support of experts for its completion. Therefore, we would like to take a moment to express our gratitude to all the individuals who played a crucial role in shaping and realizing this project.

Our heartfelt appreciation goes out to our project mentor, Dr. Radhika Nanda, from the Department of Computer Science and Engineering (with a specialization in IOT and Cybersecurity including Blockchain Technology) at Smt. Indira Gandhi College of Engineering, University of Mumbai. Her guidance, unwavering support, and dedication throughout the project have been invaluable.

We'd also like to extend our thanks to our project coordinator, Prof. Sarita Bhopalkar, for her assistance in selecting this project and providing us with essential insights on how to present it effectively.

Our sincere recognition goes to all our professors at Smt. Indira Gandhi College of Engineering, who provided us with valuable guidance and tips during the project's design phase. Their contributions have been so significant that it's challenging to individually acknowledge each one.

We are grateful for the support, both direct and indirect, from our Head of Department, Dr. Madhu Nashipudimath. Her assistance was instrumental in various aspects of our project, and we genuinely appreciate her contributions through various channels.

I express my heartfelt gratitude for the warm and caring environment provided by Smt. Indira Gandhi College of Engineering. This nurturing atmosphere and the excellent working conditions have been truly encouraging and appreciated.

8. REFERENCES

[1] Raza, D. M. & Victor, D. B., “Data mining and region prediction based on crime using random forest”, 980–987 (IEEE, 2021).

[2] Prakash Maurya, Tahir Shaikh, Imran Ahmed, Amaan Firdosi, Prof. Kiran Deshmukh, “Crime Analysis and Prediction Using Machine Learning”, IJRASET, Volume 11, Issue 4.

[3] Varun Mandalapu, Lavanya Elluri, Piyush Vyas, and Nirmala Roy, “Crime Prediction Using Machine Learning and Deep Learning: A Systematic Review and Future Directions”, IEEE, Volume 11.

[4] Suhong Kim, Param Joshi, Parminder Singh Kalsi, and Pooya Taheri, “Crime Analysis Through Machine Learning”, IEEE, Conference: November 2018.

[5] Yujunrong Ma, Kiminori Nakamura, Eung-Joo Lee, Shuvra S. Bhattacharyya, “EADTC: An Approach to Interpretable and Accurate Crime Prediction”, IEEE 2022.

[6] Tzu-Wei Hung, Chun-Ping Yen, “Predictive policing and algorithmic fairness”, Synthese (2023).

[7] Ruaa Mohammed Saeed, Husam Ali Abdulmohsin, “A study on predicting crime rates through machine learning and data mining using text”, Journal of Intelligent Systems, Volume 32, Issue 1.

[8] Shanjana A.S, Dr. R. Porkodi, “CRIME ANALYSIS AND PREDICTION USING DATAMINING: A REVIEW”, 2021 IJCRT; Volume 9, Issue 2; February 2021.

[9] Prashant Banerjee, “Tutorial: Time Series Forecasting with Prophet”, Kaggle.

URL: https://www.kaggle.com/code/prashant111/tutorial-time-series-forecasting-with-prophet

[10] Prophet Documentation, facebook.github.

URL: https://facebook.github.io/prophet/docs/quick_start.ht

[11] “Flask Tutorial”, geeksforgeeks.org.

URL: https://www.geeksforgeeks.org/flask-tutorial/

[12] “pandas (software)”, en.wikipedia.org

URL: https://en.wikipedia.org/wiki/Pandas_(software)

[13] “NumPy”, en.wikipedia.org

URL: https://en.wikipedia.org/wiki/NumPy

[14] “Matplotlib”, en.wikipedia.org

URL: https://en.wikipedia.org/wiki/Matplotlib

[15] “Supervised Machine Learning”, javatpoint.com

URL: https://www.javatpoint.com/supervised-machine-

[16] “An introduction to seaborn”, seaborn.pydata.org

URL: https://seaborn.pydata.org/tutorial/introduction.html

