
CHAPTER 1

INTRODUCTION

1.1 DOMAIN INTRODUCTION

Machine Learning is a widely used technique for predicting future outcomes or classifying
information to help people make necessary decisions. Machine Learning
algorithms are trained over instances or examples through which they learn from past
experience and analyze historical data. As an algorithm trains over these examples
repeatedly, it identifies patterns that allow it to make predictions about the future.
Data is the backbone of machine learning algorithms. With the help of historical data,
trained models can even generate new data. For example, Generative Adversarial
Networks are an advanced Machine Learning concept that learns from historical images
and can then generate new images; the same idea has been applied to speech and text
synthesis. Machine Learning has therefore opened up vast potential for data science
applications.

1.1.1 MACHINE LEARNING

Machine Learning combines computer science, mathematics, and statistics. Statistics
is essential for drawing inferences from the data. Mathematics is useful for developing
machine learning models and finally, computer science is used for implementing
algorithms. However, simply building models is not enough. You must also optimize
and tune the model appropriately so that it provides you with accurate results.
Optimization techniques involve tuning the hyperparameters to reach an optimum
result. The world today is evolving and so are the needs and requirements of people.
Furthermore, we are witnessing a fourth industrial revolution of data. In order to derive
meaningful insights from this data and learn from the way in which people and the
system interface with the data, we need computational algorithms that can churn the
data and provide us with results that would benefit us in various ways. Machine
Learning has revolutionized industries like medicine, healthcare, manufacturing,
banking, and several other industries. Therefore, Machine Learning has become an
essential part of modern industry. Data is expanding exponentially, and in order to
harness the power of this data, aided by the massive increase in computation power,
Machine Learning has added another dimension to the way we perceive information.
Machine Learning is being utilized everywhere. The electronic devices you use, the
applications that are part of your everyday life are powered by powerful machine
learning algorithms. With an exponential increase in data, there is a need for a system
that can handle this massive load of data. Machine Learning models such as Deep
Learning allow these vast amounts of data to be handled while still generating accurate
predictions. Machine Learning has revolutionized the way we perceive information and
the various insights we can gain out of it. These machine learning algorithms use the
patterns contained in the training data to perform classification and future predictions.
Whenever any new input is introduced to the ML model, it applies its learned patterns
over the new data to make future predictions. Based on the final accuracy, one can
optimize their models using various standardized approaches. In this way, the Machine
Learning model learns to adapt to new examples and produce better results.

1.1.2. TYPES OF MACHINE LEARNING

Machine Learning Algorithms can be classified into three types as follows:

 Supervised Learning
 Unsupervised Learning
 Reinforcement Learning

1.1.3. SUPERVISED LEARNING

In the majority of supervised learning applications, the ultimate goal is to develop a
finely tuned predictor function h(x) (sometimes called the “hypothesis”). “Learning”
consists of using sophisticated mathematical algorithms to optimize this function so
that, given input data x about a certain domain (say, the square footage of a house), it
will accurately predict some interesting value h(x) (say, the market price of that house).
In practice, such a function may take input in several dimensions and include a variety
of polynomial terms, making it a significant challenge to derive a normal equation for
it. Many modern machine learning problems take thousands or even millions of
dimensions of data to build predictions using hundreds of coefficients. Predicting how
an organism’s genome will be expressed, or what the climate will be like in fifty years,
are examples of such complex problems. Under supervised ML, two major
subcategories are:

 Regression machine learning systems: Systems where the value being predicted falls
somewhere on a continuous spectrum.

 Classification machine learning systems: Systems where we seek a yes-or-no
prediction.

In practice, x almost always represents multiple data points. So, for example, a housing
price predictor might take not only square-footage (x1) but also number of bedrooms
(x2), number of bathrooms (x3), number of floors (x4), year built (x5), zip code (x6),
and so forth. Determining which inputs to use is an important part of ML design.
However, for the sake of explanation, it is easiest to assume a single input value is used.

Steps Involved in Supervised Learning:

 First, determine the type of training dataset.

 Collect/Gather the labelled training data.

 Split the dataset into a training set, a test set, and a validation set.

 Determine the input features of the training dataset, which should have enough
knowledge so that the model can accurately predict the output.

 Determine the suitable algorithm for the model, such as support vector machine,
decision tree, etc.

 Execute the algorithm on the training dataset. Sometimes we need validation
sets as control parameters; these are a subset of the training dataset.

 Evaluate the accuracy of the model by providing the test set. A minimal code
sketch of this workflow is given below.
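
The sketch below illustrates the steps above using scikit-learn. The synthetic dataset and the choice of a decision tree classifier are illustrative assumptions only, not the algorithms proposed later in this project.

# Illustrative supervised learning workflow (synthetic data, assumed parameters)
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Gather labelled data (synthetic stand-in for a real labelled dataset)
X, y = make_classification(n_samples=500, n_features=6, random_state=42)

# Split into training, validation, and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

# Choose an algorithm and execute it on the training set
model = DecisionTreeClassifier(max_depth=5)
model.fit(X_train, y_train)

# Tune against the validation set, then evaluate on the held-out test set
print("Validation accuracy:", accuracy_score(y_val, model.predict(X_val)))
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))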

1.1.4. REGRESSION

Regression algorithms are used when there is a relationship between the input variable
and the output variable. They are used for the prediction of continuous variables, such
as weather forecasting and market trends. Common regression algorithms include the
following; a short example follows the list.

 Linear Regression

 Regression Trees

 Non-Linear Regression

 Bayesian Linear Regression

 Polynomial Regression
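
A short illustrative example of regression on synthetic data is given below; the data and model choice are assumptions for demonstration only.

# Illustrative linear regression on synthetic, continuous-valued data
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.arange(20).reshape(-1, 1)                             # single input feature
y = 3.0 * X.ravel() + np.random.normal(scale=2.0, size=20)   # noisy continuous target

reg = LinearRegression().fit(X, y)
print("slope:", reg.coef_[0], "intercept:", reg.intercept_)
print("prediction for x = 25:", reg.predict([[25]])[0])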

1.1.5. CLASSIFICATION

Classification algorithms are used when the output variable is categorical, meaning
there are two classes such as Yes-No, Male-Female, or True-False, as in spam filtering.
Common classification algorithms include:

 Random Forest

 Decision Tree

 Logistic Regression

 Support vector Machines

1.2 PROPOSED ALGORITHMS

1.2.1 HYBRID MODELS

Combining clustering algorithms (like K-means or DBSCAN) with predictive
models (like Random Forests or Neural Networks) can provide comprehensive
insights. For example, clustering can be used to identify hotspots, and predictive models
can forecast future crime occurrences within these clusters.
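
One possible form of such a hybrid is sketched below, assuming a dataframe of incidents with latitude, longitude, hour, and crime type columns (the column names and synthetic values are assumptions); K-means hotspot labels are fed as an extra feature to a Random Forest.

# Hybrid sketch: K-means hotspot labels used as a feature for a Random Forest
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

# Assumed incident data (synthetic coordinates roughly covering one city)
df = pd.DataFrame({
    "latitude": np.random.uniform(41.6, 42.0, 300),
    "longitude": np.random.uniform(-87.9, -87.5, 300),
    "hour": np.random.randint(0, 24, 300),
    "crime_type": np.random.randint(0, 3, 300),
})

# Step 1: cluster incidents into spatial hotspots
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
df["hotspot"] = kmeans.fit_predict(df[["latitude", "longitude"]])

# Step 2: use the hotspot label as an additional feature for the predictive model
X = df[["latitude", "longitude", "hour", "hotspot"]]
y = df["crime_type"]
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))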

1.2.2 INTEGRATING ALGORITHMS FOR COMPREHENSIVE ANALYSIS

To effectively analyze and predict crime hotspots, it is often beneficial to integrate
multiple algorithms. For instance, combining K-means clustering for spatial analysis
with time-series forecasting (e.g., ARIMA) can provide a more comprehensive view of
crime trends. Additionally, using ensemble methods can enhance prediction accuracy
by leveraging the strengths of various models.
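
As a hedged illustration of the time-series part, the sketch below forecasts monthly incident counts for one spatial cluster with ARIMA from statsmodels; the synthetic counts and the (1, 1, 1) order are assumptions.

# Illustrative ARIMA forecast of monthly incident counts for one cluster
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

counts = pd.Series(
    np.random.poisson(lam=40, size=36),                        # assumed monthly counts
    index=pd.date_range("2021-01-01", periods=36, freq="MS"),
)

fitted = ARIMA(counts, order=(1, 1, 1)).fit()
print(fitted.forecast(steps=6))                                # next six months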

1.3 REALTIME IMPLEMENTATION IN HYBRID ALGORITHM

1.3.1 DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

DBSCAN is an unsupervised learning algorithm used to identify clusters in data with
varying densities. It is particularly useful in crime hotspot analysis for detecting
irregularly shaped clusters of crime incidents and distinguishing noise (outliers) from
clusters.

1.3.2 KEY CONCEPTS

 Density: DBSCAN defines clusters based on the density of points in a region.


 Eps (ε): The radius within which points are considered neighbors.
 Min Pts: The minimum number of points required to form a dense region (a
cluster).
 Core Points: Points that have at least Min Pts neighbors within Eps.
 Border Points: Points that are within Eps of a core point but do not have
enough neighbors to be a core point themselves.
 Noise Points: Points that do not belong to any cluster.

1.3.3 STEPS IN USING DBSCAN FOR CRIME HOTSPOT ANALYSIS

 Data Preparation:
 Collect spatial data on crime incidents, including coordinates (latitude
and longitude) and possibly timestamps.
 Preprocess data to handle missing values and normalize the spatial
coordinates.

 Parameter Selection:
 Determine appropriate values for Eps and Min Pts. This often requires
domain knowledge and experimentation. Methods like the k-distance
graph can help in selecting Eps.
 Cluster Detection:
 Apply DBSCAN to the preprocessed crime data to identify clusters of
crime incidents. Clusters represent crime hotspots, and noise points
represent isolated incidents.
 Analysis:
 Analyze the identified clusters to understand the spatial distribution of
crime hotspots.
 Visualize clusters using tools like QGIS or GeoPandas (a minimal code
sketch of this workflow follows).
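
The sketch below illustrates these steps with scikit-learn's DBSCAN; the synthetic coordinates, eps, and min_samples values are assumptions, and a real analysis would tune eps (for example with a k-distance graph) or use haversine distances on raw latitude/longitude.

# Illustrative DBSCAN hotspot detection on assumed incident coordinates
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Data preparation: drop missing values and normalize the spatial coordinates
df = pd.DataFrame({
    "latitude": np.random.uniform(41.6, 42.0, 500),
    "longitude": np.random.uniform(-87.9, -87.5, 500),
}).dropna()
coords = StandardScaler().fit_transform(df[["latitude", "longitude"]])

# Cluster detection: eps (Eps) and min_samples (MinPts) chosen for illustration
df["cluster"] = DBSCAN(eps=0.2, min_samples=10).fit_predict(coords)

# Analysis: label -1 marks noise (isolated incidents); other labels are hotspots
labels = df["cluster"]
print("hotspots found:", labels[labels != -1].nunique())
print(labels.value_counts())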

1.3.4 ADVANTAGES OF DBSCAN

 Can detect clusters of arbitrary shape.


 Identifies noise and outliers effectively.
 Does not require specifying the number of clusters in advance.

1.3.5 USE CASE EXAMPLE:

 A city police department uses DBSCAN to analyze historical crime data and
identify areas with high crime density. The resulting clusters help in deploying
patrol units more effectively and targeting hotspot areas for community policing
efforts.

1.3.6 NEURAL NETWORKS

Neural Networks, particularly deep learning models like Convolutional Neural
Networks (CNNs) and Recurrent Neural Networks (RNNs), are powerful tools for
capturing complex patterns in data. In the context of crime hotspot prediction, they can
handle both spatial and temporal data, making them highly effective.

1.3.7 KEY CONCEPTS

 Artificial Neurons: Basic units that mimic biological neurons, processing
inputs and producing outputs based on learned weights.
 Layers: Neural networks consist of input, hidden, and output layers. Deep
networks have multiple hidden layers.
 Activation Functions: Functions like ReLU, Sigmoid, or Tanh that introduce
non-linearity into the network.
 Training: The process of learning weights using backpropagation and
optimization algorithms like gradient descent.

1.3.8 STEPS IN USING NEURAL NETWORKS FOR CRIME HOTSPOT PREDICTION

 Data Preparation
 Collect and preprocess spatial (location) and temporal (time) crime data.
Normalize features and handle missing values.
 If using CNNs, spatial data might be represented as images or grids. If
using RNNs, data sequences need to be prepared.
 Model Selection
 CNNs: Effective for spatial data, capturing spatial dependencies by
applying convolutional filters. Useful for grid-based representations of
crime data.
 RNNs (e.g., LSTM, GRU): Effective for temporal data, capturing
sequential dependencies. Useful for time series forecasting of crime
incidents.
 Model Architecture
 Design the neural network architecture based on the problem. For
CNNs, this involves convolutional and pooling layers. For RNNs, this
involves recurrent layers like LSTM or GRU units.
 Add fully connected layers at the end for final prediction.
 Training and Validation
 Train the model using historical crime data, splitting the data into
training and validation sets.

 Use loss functions (e.g., mean squared error for regression tasks) and
optimization algorithms to update weights.
 Regularize the model to prevent overfitting (e.g., dropout layers, early
stopping).
 Prediction and Analysis:
 Use the trained model to predict future crime incidents and identify
potential hotspots.
 Analyze predictions to understand temporal trends and spatial
distributions. A minimal code sketch of such a forecaster follows this list.
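
The following is a minimal sketch of such a forecaster using an LSTM in Keras; the synthetic weekly counts, window length, and layer sizes are assumptions for illustration, not tuned values.

# Illustrative LSTM forecaster for weekly incident counts (assumed data and sizes)
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

# Data preparation: sliding windows of the last 8 weeks -> next week's count
series = np.random.poisson(lam=30, size=200).astype("float32")
window = 8
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X.reshape((-1, window, 1))            # (samples, timesteps, features)

# Model architecture: one recurrent layer, dropout for regularization, dense output
model = Sequential([
    LSTM(32, input_shape=(window, 1)),
    Dropout(0.2),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Training and validation
model.fit(X, y, epochs=10, batch_size=16, validation_split=0.2, verbose=0)

# Prediction: forecast the next value from the most recent window
print(model.predict(series[-window:].reshape(1, window, 1), verbose=0))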

1.3.9 ADVANTAGES OF NEURAL NETWORKS:

 Can model complex, non-linear relationships in data.


 Effective for large datasets with high dimensionality.
 Capable of capturing spatial and temporal dependencies.

1.3.10 USE CASE EXAMPLE:

 A city police department uses an LSTM network to forecast future crime
incidents based on historical crime data and socio-economic indicators. The
predictions help in proactive resource allocation and strategic planning to
prevent crime.

1.4 DEEP LEARNING

Deep learning is an artificial intelligence function that imitates the workings of
the human brain in processing data and creating patterns for use in decision making.
Deep learning is a subset of machine learning in artificial intelligence (AI) that has
networks capable of learning unsupervised from data that is unstructured or unlabeled.
It is also known as deep neural learning or a deep neural network.

Deep learning has evolved hand-in-hand with the digital era, which has brought
about an explosion of data in all forms and from every region of the world. This data,
known simply as big data, is drawn from sources like social media, internet search
engines, e-commerce platforms, and online cinemas, among others. This enormous
amount of data is readily accessible and can be shared through applications such as
cloud computing. However, the data, which is normally unstructured, is so vast that it
could take decades for humans to comprehend it and extract relevant information.

Companies realize the incredible potential that can result from unraveling this
wealth of information and are increasingly adopting AI systems for automated
support. One of the most common AI techniques used for processing big data is machine
learning, a self-adaptive approach whose analysis and pattern detection improve with
experience or with newly added data.

Deep learning, a subset of machine learning, utilizes a hierarchical level of artificial
neural networks to carry out the process of machine learning. The artificial neural
networks are built like the human brain, with neuron nodes connected together like a
web. While traditional programs build analysis with data in a linear way, the
hierarchical function of deep learning systems enables machines to process data with a
nonlinear approach.

 Fundamental concepts of Deep Learning, including various Neural Networks
for supervised and unsupervised learning.

 Use of popular Deep Learning libraries such as Keras, PyTorch, and
TensorFlow applied to industry problems.

 Build, train, and deploy different types of Deep Architectures, including
Convolutional Networks, Recurrent Networks, and Autoencoders.

 Application of Deep Learning to real-world scenarios such as object recognition
and Computer Vision, image and video processing, text analytics, Natural
Language Processing, recommender systems, and other types of classifiers.

 Master Deep Learning at scale with accelerated hardware and GPUs.

1.5 TYPES OF DEEP LEARNING METHODS

Feedforward Neural Networks (FNN): These are the simplest form of neural
networks, where information flows in one direction, from input nodes through hidden
layers to output nodes. They're commonly used for tasks like classification and
regression.

Convolutional Neural Networks (CNN): CNNs are designed for processing
structured grid data such as images. They utilize convolutional layers to automatically
and adaptively learn spatial hierarchies of features from input data. CNNs are widely
used in image recognition and computer vision tasks.

Recurrent Neural Networks (RNN): Unlike feedforward networks, RNNs have
connections that form directed cycles, allowing them to exhibit dynamic temporal
behavior. This makes them suitable for sequence data like time series, text, and speech.
Variants like Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) are
popular for handling long-term dependencies.

Generative Adversarial Networks (GAN): GANs consist of two neural networks, a
generator and a discriminator, trained simultaneously. The generator learns to produce
data that is similar to the training data, while the discriminator learns to distinguish
between real and fake data. GANs are used for generating realistic synthetic data,
image-to-image translation, and more.

Autoencoders: Autoencoders are a type of neural network used for unsupervised
learning. They aim to learn efficient representations of data by compressing the input
into a latent-space representation and then reconstructing the input from this
representation. Variants like Variational Autoencoders (VAE) also learn probabilistic
distributions over the latent space.
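
A minimal sketch of a dense autoencoder in Keras is shown below; the input width and latent size are arbitrary assumptions used only to illustrate the compress-then-reconstruct idea.

# Illustrative autoencoder: compress 20 features to an 8-dimensional latent space
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

X = np.random.rand(1000, 20).astype("float32")       # unlabeled data

autoencoder = Sequential([
    Dense(8, activation="relu", input_shape=(20,)),   # encoder -> latent representation
    Dense(20, activation="sigmoid"),                  # decoder -> reconstruction
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=10, batch_size=32, verbose=0)    # target equals input
print("reconstruction error:", autoencoder.evaluate(X, X, verbose=0))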

Recursive Neural Networks (RecNN): These networks are designed to handle
hierarchical structures by recursively applying the same set of weights to different parts
of the input. They are commonly used in tasks involving tree-structured data such as
parsing and semantic compositionality.

Deep Belief Networks (DBN): DBNs are probabilistic generative models composed of
multiple layers of stochastic, latent variables. They are typically trained greedily, layer
by layer, using unsupervised learning techniques such as Restricted Boltzmann
Machines (RBMs).

Capsule Networks: Capsule networks are a recent advancement in deep learning that
aim to overcome some of the limitations of CNNs, particularly in handling hierarchical
relationships and spatial hierarchies within images.

1.6 INTRODUCTION OF CONVOLUTIONAL NEURAL NETWORKS (CNN):

Convolutional Neural Networks (CNNs) represent a groundbreaking advancement in
the realm of artificial intelligence, specifically within the domain of computer vision.
Their inception stemmed from the need to automate the process of feature extraction
and pattern recognition from visual data, tasks that were traditionally challenging for
conventional algorithms. At the heart of CNNs lies the concept of convolution, a
mathematical operation that involves combining input data with a predetermined filter
or kernel to produce a feature map. This operation is inspired by the human visual
system, where different parts of the retina are sensitive to different areas of the visual
field. By applying convolution operations across multiple layers, CNNs can effectively
learn hierarchical representations of features, gradually transforming raw pixel values
into higher-level abstractions.

The architecture of a CNN typically consists of several layers, each serving a specific
purpose in the feature extraction process. The first layer, known as the input layer,
receives the raw pixel values of an image. Subsequent layers, called convolutional
layers, apply convolution operations to extract various features such as edges, textures,
and shapes. These layers are followed by activation functions, such as the Rectified
Linear Unit (ReLU), which introduce non-linearities to the network, allowing it to learn
more complex relationships within the data. Pooling layers are another crucial
component of CNNs, often inserted after convolutional layers. Pooling operations, such
as max pooling or average pooling, reduce the spatial dimensions of the feature maps,
thereby decreasing the computational complexity of the network while retaining
important information. This downsampling process also helps in making the network
more robust to variations in the input data, such as changes in scale or orientation.
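
The layer sequence described above can be sketched in Keras as follows; the input size, filter counts, and class count are assumptions chosen only to make the structure concrete.

# Illustrative CNN: convolution + ReLU, pooling, then fully connected layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(16, (3, 3), activation="relu", input_shape=(32, 32, 1)),  # feature extraction
    MaxPooling2D((2, 2)),                                            # downsampling
    Conv2D(32, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation="relu"),      # fully connected stage
    Dense(10, activation="softmax"),   # class scores for the final prediction
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()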

Beyond the convolutional and pooling layers, CNN architectures may also include fully
connected layers, which serve as the final stages of the network for classification or
regression tasks. These layers take the high-level features extracted by the preceding
layers and map them to the desired output, whether it be class labels in the case of image
classification or numerical values for regression tasks. One of the key strengths of
CNNs lies in their ability to automatically learn features from raw data, alleviating the
need for manual feature engineering. Through the process of backpropagation and
gradient descent, CNNs adjust their internal parameters, known as weights and biases,
to minimize the discrepancy between predicted outputs and ground truth labels. This
iterative learning process allows CNNs to adapt to various datasets and tasks, making
them highly versatile across a wide range of applications.

CNNs have demonstrated remarkable performance across numerous computer vision
tasks, surpassing human-level accuracy in tasks such as image classification, object
detection, and semantic segmentation. Their success can be attributed to several factors,
including their ability to capture hierarchical representations of features, their
parameter-sharing scheme, which enables efficient learning from limited data, and their
translational invariance properties, which make them robust to shifts and distortions in
input images.

Moreover, the widespread availability of pre-trained CNN models, such as VGG,
ResNet, and Inception, has facilitated the adoption of CNNs in both academia and
industry. These pre-trained models, often trained on large-scale image datasets like
ImageNet, can be fine-tuned or used as feature extractors for downstream tasks,
enabling rapid development and deployment of computer vision systems.

CHAPTER 2

LITERATURE SURVEY

2.1 TITLE: CRIME HOTSPOT IDENTIFICATION USING SVM IN
MACHINE LEARNING

AUTHOR: K VINOTHKUMAR; KUMAR S RANJITH; RAJ R VIKRAM; N.
MEKALA; R. RESHMA; S.P. SASIREKHA
DESCRIPTION:
This study aims to identify and predict criminal hotspots in crime data. Crime is one
of society’s most pressing issues and preventing it is critical. This necessitates
keeping track of all crimes and maintaining a record of them for future reference
based on traditional literature. This work uses the Support Vector Machine algorithm
in machine learning to identify crime hotspots. Crime mapping is an important area of
research for crime analysis because it allows analysts to visualize, analyze, and track
crime or illegal activities. The analyst can use crime mapping to identify high-crime
areas, trends, and patterns. This paper summarizes the latest spatiotemporal crime
datasets that are available publicly. The primary objective is to present recent advances
in crime hotspot prediction and identification.
DISADVANTAGES:
 Evaluation Metrics: The paper may lack a thorough discussion on evaluation
metrics used to assess the performance of the SVM model. Without proper
evaluation, it's difficult to ascertain the reliability and effectiveness of the
proposed approach in comparison to existing methods.
 Generalizability: The paper may not sufficiently address the generalizability
of the findings. Crime patterns can vary significantly across different regions
and time periods, and it's essential to discuss the transferability of the SVM
model to diverse contexts.
 Ethical Considerations: There might be ethical implications associated with
the use of machine learning in crime prediction, such as issues related to
privacy, bias, and potential misuse of predictive models. The paper should
ideally address these concerns and discuss strategies for mitigating them.

2.2 TITLE: CRIME PREDICTION USING MACHINE LEARNING AND DEEP
LEARNING: A SYSTEMATIC REVIEW AND FUTURE DIRECTIONS

AUTHOR: VARUN MANDALAPU, LAVANYA ELLURI, PIYUSH VYAS, AND
NIRMALYA ROY
DESCRIPTION:
Crime prediction is a complex problem requiring advanced analytical tools to
effectively address the gaps in existing detection mechanisms. With the increasing
availability of crime data and through the advancement of existing technology,
researchers were provided with a unique opportunity to study and research crime
detection using machine learning and deep learning methodologies. Based on the
recent advances in this field this article will explore current trends in machine
learning and deep learning for crime prediction and discuss how these cutting-edge
technologies are being used to detect criminal activities, predict crime patterns, and
prevent crime. Our primary goal is to provide a comprehensive overview of recent
advancements in this field and contribute to future research efforts.
DISADVANTAGES:
 Scope Limitations: Depending on the scope of the review, the paper may not
cover all relevant studies in the field. Exclusion of certain research works
could lead to gaps in the understanding of the current state-of-the-art in crime
prediction using machine learning and deep learning.
 Bias in Selection: There might be a risk of bias in the selection of studies
included in the review, which could impact the comprehensiveness and
objectivity of the analysis. Ensuring transparency in the selection criteria and
methodology is essential to mitigate this risk.
 Limited Insights into Implementation: While the paper likely discusses
various methodologies, it may provide limited insights into the practical
implementation of machine learning and deep learning models for crime
prediction. Practical considerations such as data preprocessing, feature
engineering, and model deployment are crucial for real-world applications but
may not be adequately addressed.

2.3 TITLE: AN EMPIRICAL ANALYSIS OF MACHINE LEARNING
ALGORITHMS FOR CRIME PREDICTION USING STACKED
GENERALIZATION: AN ENSEMBLE APPROACH
AUTHOR: SAPNA SINGH KSHATRI, DEEPAK SINGH, BHAVANA NARAIN,
SURBHI BHATIA, MOHAMMAD TABREZ QUASIM
DESCRIPTION:
Recently, a lot of research and predictions have been attempted on how to curb crime
by various criminologists and researchers using different modelling and statistical tools.
As the rate of crime is still on the rise, there is a pressing need for research that can
inform policy makers and the concerned departments about challenges and issues in the
area of crime prediction and control mechanisms. Human skills alone fail to keep track
of criminal records when they are handled manually, so a novel approach is needed to
help analyse crime-related information. Analysis of crime prediction is currently based
on two significant aspects: prediction of the crime risk field [1], [2] and crime hotspot
forecasting [3]. Data processing techniques are applied to facilitate this task. The
expanded accessibility of computers and data technologies has empowered law
enforcement agencies to build broad databases with detailed information about major
felonies such as murder, rape, and arson. In recent years, a huge number of crimes has
been reported around the world.
DISADVANTAGES:
 Complexity and Interpretability: Ensemble methods like stacked
generalization can be complex, making it challenging to interpret the resulting
models and understand the underlying relationships between predictors and
outcomes. This lack of interpretability may limit the practical utility of the
approach.
 Data Availability and Quality: The effectiveness of machine learning
algorithms heavily depends on the availability and quality of the input data.
Without sufficient discussion on data preprocessing, feature engineering, and
data quality assessment, the validity and generalizability of the results may be
compromised.
 Evaluation Metrics: The paper may lack a thorough discussion on the
evaluation metrics used to assess the performance of the machine learning.

2.4 TITLE: REVIEW OF CRIME PREDICTION THROUGH MACHINE
LEARNING
AUTHOR: ABDULRAHMAN ABDULLAH ALSUBAYHIN, BANDER
ALZAHRANI, MUHAMMAD SHER RAMZAN
DESCRIPTION:
In every society, crime is a pervasive concern. Criminality is a deleterious global
phenomenon in both developed and developing countries. It influences a society’s
quality of life and economic prosperity. It is essential to determine whether people
should travel to a city or nation at a particular time or, if they choose to, which places
they should avoid [3]. Crime is also an important indicator of a nation's social and
economic development. Crime analysis is a crucial aspect of criminology that focuses
on studying patterns of conduct and detecting criminals. Nearly every sector of society,
including law enforcement, has reaped the benefits of artificial intelligence, particularly
data science and machine learning. Consequently, reducing criminal activity has always
been a government priority.
DISADVANTAGES:
 Methodological Limitations: Depending on the scope of the paper, there may
be limitations in the methodology employed for crime prediction. Lack of
detail on the data sources, feature selection, model evaluation techniques, and
validation procedures could undermine the rigor and reproducibility of the
results.
 Generalizability: The effectiveness of machine learning models in crime
prediction may vary across different geographical locations, time periods, and
crime types. Without addressing the generalizability of the findings, the
paper's conclusions may be limited in their applicability to diverse contexts.
 Ethical Considerations: The use of predictive algorithms in crime prediction
raises ethical concerns related to privacy, fairness, accountability, and potential
biases in the data and models. It's essential for the paper to address these
ethical considerations and discuss strategies for mitigating risks and ensuring
responsible use of technology.

2.5 TITLE: STUDY ON CRIME EXAMINATION AND FORECASTING
USING MACHINE LEARNING
AUTHOR: PANKAJ SHINDE, ANCHAL SHUKLA, ROHIT PATIL, GAYATRI
MALI, MAHIMA KAKAD, PRASAD DHORE
DESCRIPTION:
Using existing data on crime scenes, crime identification can be carried out to find
when and where most crimes occur. By analysing the crimes that have occurred most
often in the past, we can predict what type of crime is most likely to occur. The
increasing use of computer-operated systems to track crimes may improve the process
of detecting and predicting crimes. Crime examination is an important application area
for techniques such as KNN, since a huge amount of crime is happening at present that
needs to be handled efficiently so that the crime rate will decrease. A solution to this
can be proposed using various techniques such as KNN, SVM, clustering, and many
others. Automated data collection has encouraged the use of KNN for intrusion and
crime examination. Indeed, in many cities, states, and countries, crime such as murder
and robbery is rapidly increasing.
DISADVANTAGES:
 Data Limitations: The effectiveness of machine learning models in crime
examination and forecasting depends heavily on the availability, quality, and
representativeness of the input data. Without addressing potential data
limitations and biases, the study's conclusions may be limited in their validity
and generalizability.
 Model Interpretability: Machine learning models, particularly complex ones
like neural networks, may lack interpretability, making it challenging to
understand the factors driving their predictions. Lack of model interpretability
could hinder the study's ability to provide actionable insights for law
enforcement agencies and policymakers.
 Ethical Considerations: The use of predictive algorithms in crime analysis
raises ethical concerns related to privacy, fairness, and potential biases in the
data and models. It's essential for the study to address these ethical
considerations and discuss strategies for mitigating risks and ensuring
responsible use of technology.

2.6 TITLE: CRIME PREDICTION USING MACHINE LEARNING: A
COMPARATIVE ANALYSIS
AUTHOR: ABDULRAHMAN ALSUBAYHIN, MUHAMMAD RAMZAN AND
BANDER ALZAHRANI
DESCRIPTION:
Generally, crimes are rather common social issues, influencing a country's reputation,
economic growth, and quality of life. They are perhaps a prime factor in influencing
several critical decisions in a person's life, such as avoiding dangerous areas, visiting
at the right time, and moving to a new place (ToppiReddy et al., 2018). Crimes define
and affect the impact and reputation of a community while placing a rather large
financial burden on a country due to the need for courts and additional police forces
(Saraiva et al., 2022). With an increase in crimes, there is an increased need to reduce
them systematically. In recent times, there has been a record increase in crime rates
throughout the world. It is possible to reduce these figures by analysing and predicting
crime occurrences. In such a situation, preventive measures can be taken quickly
(ToppiReddy et al., 2018). Crime forecasting in real-time is capable of helping save
lives and prevent crimes, gradually decreasing the crime rate (Wang et al., 2019).
With a comprehensive crime data analysis and modern techniques, crimes can be
predicted and support can be deployed without delay.
DISADVANTAGES:
 Methodological Limitations: Depending on the study design and
methodology, there may be limitations in the selection of machine learning
techniques, choice of evaluation metrics, and data preprocessing procedures.
These methodological limitations could impact the validity and
generalizability of the study's findings.
 Data Quality: The effectiveness of machine learning models in crime
prediction hinges on the quality of the input data. Without sufficient discussion
on data quality assessment and preprocessing techniques, the study's
conclusions may be compromised by issues such as missing data, outliers, and
biases.
 Interpretation of Results: Comparative analyses of machine learning
techniques can be complex, requiring careful interpretation of results and
consideration of various factors influencing performance. Lack of clarity in
interpreting the comparative analysis may hinder the study's impact and
practical utility.

2.7 TITLE: A STUDY ON PREDICTING CRIME RATES THROUGH
MACHINE LEARNING AND DATA MINING USING TEXT
AUTHOR: RUAA MOHAMMED SAEED, HUSAM ALI ABDULMOHSIN
DESCRIPTION:
Violations of the law pose a danger to the administration of justice and should be
curtailed. Computational crime prediction and forecasting can help improve the safety
of metropolitan areas. The inability of humans to process large amounts of complicated
data from big data makes it difficult to make early and accurate predictions about
criminal activity. Computational problems and opportunities arise from accurately
predicting crime rates, types, and hot locations based on historical patterns. Still, there
is a need for stronger prediction algorithms that target police patrols toward criminal
events, despite extensive research efforts [1]. Crime analysis is a methodology used to
identify crime spots, and it is not an easy task. In 2020, Geographical Information
Systems (GIS) was still the main non-machine-learning tool used for temporal and
spatial data. GIS used a crime-spot technique that mainly depends on crime type to help
reduce crime rates.
DISADVANTAGES:
 Incomplete or Biased Data: Crime data might be incomplete, inaccurate, or
biased, leading to skewed predictions. For example, certain crimes might be
underreported or overrepresented in the data.
 Noise in Text Data: Text data can be noisy and unstructured, requiring
extensive preprocessing to extract meaningful information.
 Bias and Discrimination: Predictive models might inadvertently perpetuate
existing biases in the data, leading to discriminatory policing practices.
 Privacy Issues: Using personal data, especially from social media or other
sources, raises significant privacy concerns and requires careful handling to
avoid misuse.

2.8 TITLE: PERFORMANCE ANALYSIS FOR CRIME PREDICTION AND
DETECTION USING MACHINE LEARNING ALGORITHMS
AUTHOR: R. GANESAN, DR. SUBAN RAVICHANDRAN
DESCRIPTION:
In recent times, crime rates have increased day by day in various forms such as robbery,
drug offences, and murder. Further, crime activities vary from zone to zone. Hence, it is
essential to address crime activities very quickly. Nowadays, obtaining and analysing
crime data is critical. Crime data can be characterized by various factors such as the
location of occurrence and the time the crime was detected, and predicting their future
relationship is essential in a crime prevention system [1]. In this research, time and
place are considered the main aspects in identifying the crime pattern. Machine
Learning (ML) techniques make it possible to extract information from the collected
datasets and find the relationship between the crime, place, and time. Many researchers
have stated that identifying crime patterns is a very critical and time-consuming task
[2]. It can be addressed by ML techniques.
DISADVANTAGES:
 Incomplete or Inaccurate Data: The effectiveness of machine learning models
is heavily dependent on the quality of the data. Incomplete or inaccurate data
can lead to incorrect predictions.
 Data Integration Challenges: Combining data from multiple sources can be
complex and may require significant preprocessing to ensure consistency and
accuracy.
 High Computational Requirements: Machine learning models, especially those
involving deep learning, require significant computational resources and time
for training and deployment.

2.9 TITLE: CRIME ANALYSIS AND PREDICTION
AUTHOR: K. SIREESHA, B. RAMYA, P. SRIJA, A. VAISHNAVI
DESCRIPTION:
Crime activities have increased at a faster rate, and it is the responsibility of the
police department to control and reduce them. Crime prediction and criminal
identification are major problems for the police department, as a tremendous amount of
crime data exists. There is a need for technology through which cases could be solved
faster. The rate of crime is rising on a daily basis as current technologies and high-tech
methods assist criminals in carrying out their unlawful activities. Crimes are neither
systematic nor random; otherwise, crime could not be analysed. While crimes like
robbery and firebombing have decreased, crimes like murder, sexual abuse, and gang
rape have increased. We cannot analyse the victims of crime, but we can analyse the
place where the crime occurred. Data about crime is gathered from a variety of blogs,
news outlets, and websites. This massive data is used to create a crime report database
as a record.
DISADVANTAGES:
 Incomplete Data: The effectiveness of predictions depends on the quality and
completeness of the data. Incomplete or inaccurate data can lead to unreliable
predictions.
 Data Integration: Combining data from multiple sources can be complex and
may require extensive preprocessing.
 High Resource Demand: Advanced machine learning models often require
significant computational resources and time for training and deployment.
 Complexity: The complexity of these models can make them difficult to
understand and interpret for non-specialists.

2.10 TITLE: INTELLIGENT AUTOMATION OF CRIME PREDICTION
USING DATA MINING
KEYWORDS: Intelligent automation, Industrial electronics, Machine learning
algorithms, Urban areas, Boosting, Knowledge discovery, Linear discriminant
analysis
DESCRIPTION:
Crime characterizes the act of felony or grave offense against society or someone else's
property, or any illegal activity which is prohibited by law and happens almost
everywhere and at every possible time. However, crime studies have revealed that
crime does not happen evenly across all places and that specific types of crime tend to
occur more often in certain areas that are called crime hotspots for those types of
crimes. So, the spatial analysis of different types of crimes and their areas of occurrence
are of immense help to predict the types of crime that will occur in such areas in the
future, and to some extent predict the timing and the day of crime. That means, higher
percentage of crime occurs in hotspots and predicting them beforehand can be effective
for law enforcement, by helping law enforcement agencies to assign more resources to
the areas with higher probability of crime occurrence, and that way, the residents can
feel safer in their cities.
DISADVANTAGES:
 Bias in Data: Data used for training predictive models may contain inherent
biases, leading to skewed predictions or reinforcing existing disparities in
policing practices.
 Privacy Concerns: Analyzing large volumes of personal data to predict crime
may raise privacy concerns among the public, especially if there are doubts
about how the data is collected, stored, and used.

CHAPTER 3

SYSTEM ANALYSIS

3.1 EXISTING SYSTEM

Data mining in the study and analysis of criminology can be categorized into two main
areas: crime control and crime suppression. De Bruin et al. introduced a framework
for crime trends using a new distance measure for comparing all individuals based on
their profiles and then clustering them accordingly. Manish Gupta et al. highlight the
existing systems used by Indian police as e-governance initiatives and also propose an
interactive query-based interface as a crime analysis tool to assist police in their
activities. The proposed interface is used to extract useful information from the vast
crime database maintained by the National Crime Records Bureau (NCRB) and to find
crime hot spots using crime data mining techniques such as clustering. The
effectiveness of the proposed interface has been illustrated on Indian crime records.
Sutapat Thiprungsri examines the application of cluster analysis in the accounting
domain, particularly discrepancy detection in audit. The purpose of his study is to
examine the use of clustering technology to automate fraud filtering during an audit.
He used cluster analysis to help auditors focus their efforts when evaluating group life
insurance claims.

3.2 PROPOSED SYSTEM

In this project, we will be using machine learning and data science techniques for crime
prediction on a crime dataset. The crime data is extracted from the official portal of the
police. It consists of crime information such as the location description, type of crime,
date, time, latitude, and longitude. Before training the model, data preprocessing will be
done, followed by feature selection and scaling, so that the accuracy obtained will be
high. Logistic Regression classification and various other algorithms (Decision Tree
and Random Forest) will be tested for crime prediction, and the one with better accuracy
will be used for training (a comparison sketch is given below). Visualization of the
dataset will be done in terms of graphical representations of many cases, for example,
at which time the criminal rates are high or in which month the criminal activities are
high. The whole purpose of this project is to give an idea of how machine learning can
be used by law enforcement agencies to detect, predict, and solve crimes at a much
faster rate and thus reduce the crime rate. This approach can be used in other states or
countries depending upon the availability of the dataset.
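
A hedged sketch of the model comparison described above is given below; the feature matrix X and labels y are assumed to come from the preprocessed crime dataset, and the hyperparameters are illustrative defaults.

# Compare Logistic Regression, Decision Tree, and Random Forest on one split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def compare_models(X, y):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
    candidates = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "DecisionTree": DecisionTreeClassifier(),
        "RandomForest": RandomForestClassifier(n_estimators=100),
    }
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores   # the algorithm with the highest accuracy is used for training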

3.2.1. ADVANTAGES

 Utilizes HYBRID algorithm technology for precise crime prediction in real-
time video feeds, enhancing security effectiveness.
 Integrates Flask web application for seamless scalability and performance
optimization, ensuring reliable operation under heavy loads.
 Regular updates to deep learning algorithms mitigate false positives,
maintaining high trust levels and bolstering security measures.
 Cutting-edge technology enables instant identification of firearms, knives, or
other threats, facilitating swift responses.

CHAPTER 4

SYSTEM SPECIFICATION
4.1 HARDWARE REQUIREMENTS
 Processor : Intel Core processor, 2.6 GHz
 RAM : 4 GB
 Hard disk : 320 GB
 Compact Disk : 650 MB
 Keyboard : Standard keyboard
 Monitor : 15-inch color monitor

4.2 SOFTWARE REQUIREMENTS


 Operating system : WINDOWS OS
 Front End : PYTHON
 IDE : JUPYTER LAB
 Application : Windows Application

CHAPTER 5

SYSTEM IMPLEMENTATION

5.1 MODULE LIST

LIST OF MODULES

 Data Collection Module


 Data Preprocessing Module
 Feature selection Module
 Building and Training Model
 Prediction Module
 Visualization Module

Data collection Module

A crime dataset from Kaggle containing 8,000 entries of crime data is used in CSV format.

Data Preprocessing Module

8000 entries are present in the dataset. The null values are removed using df =
df.dropna(), where df is the data frame. The categorical attributes (Location, Block,
Crime Type, and Community Area) are converted into numeric form using a Label
Encoder. The date attribute is split into new attributes such as month and hour, which
can be used as features for the model, as sketched below.
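
The sketch below illustrates this preprocessing; the file name and column names (Date, Location, Block, Crime Type, Community Area) are assumptions about the layout of the Kaggle CSV.

# Illustrative preprocessing of the crime CSV (assumed file and column names)
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("crime.csv")           # roughly 8000 rows
df = df.dropna()                        # remove null values

# Convert categorical attributes to numeric codes
le = LabelEncoder()
for col in ["Location", "Block", "Crime Type", "Community Area"]:
    df[col] = le.fit_transform(df[col].astype(str))

# Split the date attribute into month and hour features
df["Date"] = pd.to_datetime(df["Date"])
df["Month"] = df["Date"].dt.month
df["Hour"] = df["Date"].dt.hour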

Feature selection Module

Feature selection is performed to choose the attributes used to build the model. The
attributes used are Block, Location, District, Community Area, X Coordinate, Y
Coordinate, Latitude, Longitude, Hour, and Month.

Building and Training Model

After feature selection, the location and month attributes are used for training. The
dataset is divided into xtrain, ytrain and xtest, ytest pairs. The algorithm's model is
imported from sklearn, and the model is built using model.fit(xtrain, ytrain).

Prediction Module

After the model is built using the above process, prediction is done using
model.predict(xtest). The accuracy is calculated using accuracy_score imported from
sklearn.metrics, i.e. accuracy_score(ytest, predicted), as sketched below.
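
A minimal sketch of building, training, and evaluating the model is shown below, continuing from the preprocessing sketch above; the feature list mirrors the attributes named earlier and the Random Forest choice is illustrative.

# Illustrative model building, training, and prediction (assumed column names)
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

features = ["Block", "Location", "District", "Community Area", "X Coordinate",
            "Y Coordinate", "Latitude", "Longitude", "Hour", "Month"]
X = df[features]
y = df["Crime Type"]

xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestClassifier(n_estimators=100)
model.fit(xtrain, ytrain)

predicted = model.predict(xtest)
print("Accuracy:", accuracy_score(ytest, predicted))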

Visualization Module

Using the matplotlib library, analysis of the crime dataset is done by plotting various
graphs, for example as sketched below.
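
The sketch below plots crimes per hour with matplotlib, assuming the Hour column created during preprocessing.

# Illustrative visualization: number of reported crimes per hour of the day
import matplotlib.pyplot as plt

crimes_per_hour = df.groupby("Hour").size()
crimes_per_hour.plot(kind="bar")
plt.xlabel("Hour of day")
plt.ylabel("Number of reported crimes")
plt.title("Crime frequency by hour")
plt.show()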

CHAPTER 6

SYSTEM DESIGN

6.1 SYSTEM ARCHITECTURE

Figure no : 6.1.SYSTEM ARCHITECTURE

6.2 DATA FLOW DIAGRAM

A data flow diagram shows the way information flows through a process or system. It
includes data inputs and outputs, data stores, and the various sub processes the data
moves through. DFDs are built using standardized symbols and notation to describe
various entities and their relationships. Data flow diagrams visually represent systems
and processes that would be hard to describe in a chunk of text. You can use these
diagrams to map out an existing system and make it better or to plan out a new system
for implementation. Visualizing each element makes it easy to identify inefficiencies
and produce the best possible system.
It is also known as a context diagram. It’s designed to be an abstraction view, showing
the system as a single process with its relationship to external entities. It represents the
entire system as a single bubble with input and output data indicated by
incoming/outgoing arrows.

In 1-level DFD, the context diagram is decomposed into multiple bubbles/processes. In
this level, we highlight the main functions of the system and breakdown the high-level
process of 0-level DFD into sub processes.
2-level DFD goes one step deeper into parts of 1-level DFD. It can be used to plan or
record the specific/necessary detail about the system’s functioning.

6.2.1 LEVEL 0

It is also known as a context diagram. It’s designed to be an abstraction view, showing


the system as a single process with its relationship to external entities. It represents the
entire system as a single bubble with input and output data indicated by
incoming/outgoing arrows.

[Level 0 DFD: Training Phase, Trained Data, Stored on Database]

6.2.2 LEVEL 1

In 1-level DFD, the context diagram is decomposed into multiple bubbles/processes. In


this level, we highlight the main functions of the system and breakdown the high-level
process of 0-level DFD into sub processes.

[Level 1 DFD: Testing Phase, Testing Data, Pre-processing, Crime Prediction using
DBSCAN algorithm, Matched with trained dataset using CNN, Performance Evaluation]

6.3 UML DIAGRAMS

6.3.1 USE CASE DIAGRAM

A use case diagram is a dynamic or behavior diagram in UML. Use case
diagrams model the functionality of a system using actors and use cases. Use cases are
a set of actions, services, and functions that the system needs to perform.

Figure no:6.3. UML diagram

6.3.2 CLASS DIAGRAM

In software engineering, a class diagram in the Unified Modeling Language
(UML) is a type of static structure diagram that describes the structure of a system by
showing the system's classes, their attributes, operations (or methods), and the
relationships among objects.

Figure no : 6.4.CLASS DIAGRAM

6.3.3 SEQUENCE DIAGRAM

A sequence diagram simply depicts interaction between objects in
a sequential order, i.e. the order in which these interactions take place. We can also use
the terms event diagram or event scenario to refer to a sequence diagram.

Figure no :6.5 .SEQUENCE DIAGRAM

6.3.4 COLLABORATION DIAGRAM

A collaboration diagram, also known as a communication diagram, is an
illustration of the relationships and interactions among software objects in the Unified
Modeling Language (UML). These diagrams can be used to portray the dynamic
behavior of a particular use case and define the role of each object.

Figure no : 6.6 .COLLABORATION DIAGRAM

6.3.5 ACTIVITY DIAGRAM

An activity diagram is a behavioral diagram, i.e. it depicts the behavior of a
system. An activity diagram portrays the control flow from a start point to a finish point,
showing the various decision paths that exist while the activity is being executed.

Figure no : 6.7.ACTIVITY DIAGRAM

CHAPTER 7
SOFTWARE DESCRIPTION
7.1 PYTHON

Python is an interpreted high-level programming language for general-purpose
programming. Created by Guido van Rossum and first released in 1991, Python has a
design philosophy that emphasizes code readability, notably using significant
whitespace. It provides constructs that enable clear programming on both small and
large scales. In July 2018, Van Rossum stepped down as the leader in the language
community. Python features a dynamic type system and automatic memory
management. It supports multiple programming paradigms, including object-oriented,
imperative, functional and procedural, and has a large and comprehensive standard
library.

Python interpreters are available for many operating systems. CPython, the
reference implementation of Python, is open source software and has a community-
based development model, as do nearly all of Python's other implementations. Python
and CPython are managed by the non-profit Python Software Foundation. Rather than
having all of its functionality built into its core, Python was designed to be highly
extensible. This compact modularity has made it particularly popular as a means of
adding programmable interfaces to existing applications.

Van Rossum's vision of a small core language with a large standard library and
easily extensible interpreter stemmed from his frustrations with ABC, which espoused
the opposite approach. While offering choice in coding methodology, the Python
philosophy rejects exuberant syntax (such as that of Perl) in favor of a simpler, less-
cluttered grammar. As Alex Martelli put it: "To describe something as 'clever' is not
considered a compliment in the Python culture." Python's philosophy rejects the Perl
"there is more than one way to do it" approach to language design in favour of "there
should be one—and preferably only one—obvious way to do it".

Python's developers strive to avoid premature optimization, and reject patches
to non-critical parts of CPython that would offer marginal increases in speed at the cost
of clarity. When speed is important, a Python programmer can move time-critical
functions to extension modules written in languages such as C, or use PyPy, a just-in-
time compiler. Cython is also available, which translates a Python script into C and
makes direct C-level API calls into the Python interpreter. An important goal of
Python's developers is keeping it fun to use. This is reflected in the language's name, a
tribute to the British comedy group Monty Python, and in occasionally playful
approaches to tutorials and reference materials, such as examples that refer to spam and
eggs (from a famous Monty Python sketch) instead of the standard foo and bar.

A common neologism in the Python community is pythonic, which can have a
wide range of meanings related to program style. To say that code is pythonic is to say
that it uses Python idioms well, that it is natural or shows fluency in the language, that
that it uses Python idioms well, that it is natural or shows fluency in the language, that
it conforms with Python's minimalist philosophy and emphasis on readability. In
contrast, code that is difficult to understand or reads like a rough transcription from
another programming language is called unpythonic. Users and admirers of Python,
especially those considered knowledgeable or experienced, are often referred to as
Pythonists, Pythonistas, and Pythoneers.

Python is an interpreted, object-oriented, high-level programming language
with dynamic semantics. Its high-level built-in data structures, combined with dynamic
typing and dynamic binding, make it very attractive for Rapid Application
typing and dynamic binding, make it very attractive for Rapid Application
Development, as well as for use as a scripting or glue language to connect existing
components together. Python's simple, easy to learn syntax emphasizes readability and
therefore reduces the cost of program maintenance. Python supports modules and
packages, which encourages program modularity and code reuse. The Python
interpreter and the extensive standard library are available in source or binary form
without charge for all major platforms, and can be freely distributed. Often,
programmers fall in love with Python because of the increased productivity it provides.
Since there is no compilation step, the edit-test-debug cycle is incredibly fast.
Debugging Python programs is easy: a bug or bad input will never cause a segmentation
fault. Instead, when the interpreter discovers an error, it raises an exception. When the
program doesn't catch the exception, the interpreter prints a stack trace. A source level
debugger allows inspection of local and global variables, evaluation of arbitrary
expressions, setting breakpoints, stepping through the code a line at a time, and so on.
The debugger is written in Python itself, testifying to Python's introspective power. On
the other hand, often the quickest way to debug a program is to add a few print
statements to the source: the fast edit-test-debug cycle makes this simple approach very
effective.

Python’s initial development was spearheaded by Guido van Rossum in the late
1980s. Today, it is developed by the Python Software Foundation. Because Python is a
multipara digmlanguage, Python programmers can accomplish their tasks using
different styles of programming: object oriented, imperative, functional or reflective.
Python can be used in Web development, numeric programming, game development,
serial port access and more.

There are two attributes that make development time in Python faster than in other
programming languages:

1. Python is an interpreted language, which precludes the need to compile code
before executing a program because Python does the compilation in the
background. Because Python is a high-level programming language, it abstracts
many sophisticated details from the programming code. Python focuses so
much on this abstraction that its code can be understood by most novice
programmers.

2. Python code tends to be shorter than comparable code. Although Python offers
fast development times, it lags slightly in terms of execution time. Compared to
fully compiling languages like C and C++, Python programs execute slower. Of
course, with the processing speeds of computers these days, the speed
differences are usually only observed in benchmarking tests, not in real-world
operations. In most cases, Python is already included in Linux distributions and
Mac OS X machines.

Python Use Cases

• Creating web applications on a server


• Building workflows that can be used in conjunction with software
• Connecting to database systems
• Reading and modifying files
• Performing complex mathematics
• Processing big data

• Fast prototyping
• Developing production-ready software
Professionally, Python is great for backend web development, data analysis, artificial
intelligence, and scientific computing. Developers also use Python to build productivity
tools, games, and desktop apps.

Features and Benefits of Python

 Compatible with a variety of platforms including Windows, Mac, Linux,
Raspberry Pi, and others
 Uses a simple syntax comparable to the English language that lets developers
use fewer lines than other programming languages
 Operates on an interpreter system that allows code to be executed
immediately, fast-tracking prototyping
 Can be handled in a procedural, object-orientated, or functional way

Python Syntax

 Somewhat similar to the English language, with a mathematical influence,
Python is built for readability
 Unlike other languages that use semicolons and/or parentheses to complete a
command, Python uses new lines for the same function
 Defines scope (i.e., loops, functions, classes) by relying on indentation
(whitespace) rather than braces (aka curly brackets)

Python Flexibility

Python, a dynamically typed language, is especially flexible, eliminating hard rules for
building features and offering more problem-solving flexibility with a variety of
methods. It also allows users to run a program right up to a problematic point, because
it uses run-time type checking rather than compile-time checking.

The Less Great Parts of Python

On the down side, Python isn’t easy to maintain. One command can have multiple
meanings depending on context because Python is a dynamically typed language. And
maintaining a Python app as it grows in size and complexity can be increasingly
difficult, especially finding and fixing errors. Users will need experience to design code
or write unit tests that make maintenance easier. Speed is another weakness in Python.
Its flexibility, because it is dynamically typed, requires a significant amount of
referencing to land on a correct definition, slowing performance. This can be mitigated
by using alternative implementations of Python.

Python and AI

AI researchers are fans of Python. Libraries such as Google’s TensorFlow, scikit-learn
and Keras establish a foundation for AI development because of the usability and
flexibility they offer Python users. These libraries, and their availability, are critical
because they enable developers to focus on growth and building.

Good to Know

The Python Package Index (PyPI) is a repository of software for the Python
programming language. PyPI helps users find and install software developed and shared
by the Python community.

How Easy Is It to Learn Python?

Is Python easy to learn? Yes and no. When compared to other programming languages,
such as Java or C, Python is easy to learn. One aspect of the language that makes Python
easy to learn is that its syntax mimics human-readable language. The fact that its syntax
is designed to be clear, concise and easy to read eliminates many errors and softens the
learning curve.

Python also has a large standard library with prewritten code, which reduces the need
to write every line of code. Python’s supportive community and abundance of learning
resources also help make the language more friendly to newcomers.

But while many coders consider Python easy to learn, becoming proficient in any
programming language is challenging. Software development takes time, patience and
problem-solving skills.

Python Is a Scripting Language

A scripting language is a programming language that’s designed for automating tasks.
Scripting languages, such as Python, are interpreted and executed directly by an
interpreter or runtime environment. Scripting languages are not compiled into machine
code before being run, whereas traditional programming languages like C++ or Java
are compiled ahead of time. Because Python is a scripting language, it excels at tasks
such as file operations, system administration, web scraping, network programming and
network automation, as well as big data operations such as data processing, scientific
computing and data analysis.
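As a small illustration of this scripting style, the sketch below reads a text file and reports how many lines it contains; the file name is a placeholder chosen only for the example.

from pathlib import Path

log_file = Path("example.log")  # hypothetical input file

if log_file.exists():
    # read the whole file and count its lines
    lines = log_file.read_text().splitlines()
    print(f"{log_file} contains {len(lines)} lines")
else:
    print(f"{log_file} not found")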

Usage of Python

Python is a popular language choice for web and software development. Frameworks
like Django and Flask make it easier to create robust and scalable web applications. But
in many cases, other tools can work as a replacement for Python.

Where Python really stands out is in the big data ecosystem. Python is often used for
data science and analytics, scientific computing, machine learning (ML) and artificial
intelligence (AI). Because of this, Python’s ecosystem is rich with libraries such as
Pandas, NumPy and Matplotlib that enable data manipulation and analysis. Tools such
as TensorFlow, PyTorch, scikit-learn and Keras dominate the ML/AI space.
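For instance, a few lines of Pandas are enough to load and summarise tabular records such as the crime data used in this project; the file name and column names below are assumptions made only for illustration.

import pandas as pd

df = pd.read_csv("crime_data.csv")  # hypothetical CSV of crime records

print(df.head())                        # first few rows
print(df["crime_type"].value_counts())  # frequency of each crime type
print(df.groupby("month").size())       # number of records per month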

Introduction to Python Programming

The Python language is dynamically typed, meaning variable types don’t have to
be explicitly specified. However, variables do have types and their types matter. The
Python interpreter checks variable types at runtime, which also makes the language
strongly typed. The Python interpreter is a program that reads and executes Python
code. The interpreter translates the developer-written source code into a form the
computer hardware can execute. There are several implementations of the Python
interpreter, the standard-bearer and most popular being CPython.
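A short sketch of this behaviour: the variable below is never declared with a type, yet the interpreter still enforces types at run time and raises a TypeError when they are mixed incorrectly.

x = 10        # x currently holds an integer
x = "ten"     # the same name may later hold a string (dynamic typing)

try:
    result = x + 5   # adding a string and an integer is rejected (strong typing)
except TypeError as err:
    print("Type error caught:", err)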

Python is effectively a single-threaded language in its reference implementation. The
Global Interpreter Lock (GIL) is essentially a lock that keeps only one thread of Python
code in a state of execution at a time. Since the GIL is specific to CPython, there are
implementations of Python that do not include the GIL.

Many Macs include a version of Python with the operating system. To confirm
whether a Mac has Python included, open the Terminal and type “python --version”.
Either a version number or “command not found” will appear. To install Python on a
Mac, visit the Python website at python.org.

Introduction to Python Coding Fundamentals

Variables and Data Types

Python stores data in variables.

Python programming fundamentals include several different data types and data
structures. Python’s primitive data types are basic data types that represent single values
with no methods or attributes.

Python’s data structures organize complex information and store varied types of data.

Lists are ordered collections of data and can include data of any type. The list’s “order”
refers to the list’s indices, not the arrangement of elements inside the list.

Tuples are immutable lists. Tuples can’t be changed after they’re created.

Dictionaries store data in key-value pairs.

Sets are collections of unique elements. Sets automatically remove repeated terms if
they were previously included in the set. Try copying and pasting the following code
into an IDE to see how duplicate items are removed.
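The short example below (the values are chosen only for illustration) shows each of these structures, and in particular how a set silently drops duplicate items:

crimes = ["theft", "assault", "theft"]       # list: ordered, mutable, may repeat
location = (12.97, 77.59)                    # tuple: ordered but immutable
counts = {"theft": 2, "assault": 1}          # dictionary: key-value pairs
crime_types = {"theft", "assault", "theft"}  # set: unique elements only

print(crime_types)  # {'theft', 'assault'} -- the duplicate 'theft' has been removed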

7.2 MYSQL

MySQL is the world's most used open source relational database management
system (RDBMS) as of 2008 that runs as a server providing multi-user access to a
number of databases. The MySQL development project has made its source code
available under the terms of the GNU General Public License, as well as under a variety
of proprietary agreements. MySQL was owned and sponsored by a single for-profit
firm, the Swedish company MySQL AB, now owned by Oracle Corporation.

MySQL is a popular choice of database for use in web applications, and is a
central component of the widely used LAMP open source web application software
stack (LAMP is an acronym for "Linux, Apache, MySQL, Perl/PHP/Python"). Free
software and open source projects that require a full-featured database management
system often use MySQL. For commercial use, several paid editions are available, and
offer additional functionality. Applications which use MySQL databases include: TYPO3,
Joomla, WordPress, phpBB, MyBB, Drupal and other software built on this software
stack. MySQL is also used in many high-profile, large-scale World Wide Web products,
including Wikipedia, Google (though not for searches), Facebook, Twitter, Flickr,
Nokia.com, and YouTube.

Interfaces

MySQL is primarily an RDBMS and ships with no GUI tools to administer
MySQL databases or manage data contained within the databases. Users may use the
included command line tools, or use MySQL "front-ends", desktop software and web
applications that create and manage MySQL databases, build database structures, back
up data, inspect status, and work with data records. The official set of MySQL front-
end tools, MySQL Workbench, is actively developed by Oracle, and is freely available
for use.

Graphical

The official MySQL Workbench is a free integrated environment developed by
MySQL AB that enables users to graphically administer MySQL databases and
visually design database structures. MySQL Workbench replaces the previous package
of software, MySQL GUI Tools. Similar to other third-party packages, but still considered the
authoritative MySQL front-end, MySQL Workbench lets users manage database design
& modeling, SQL development (replacing MySQL Query Browser) and database
administration (replacing MySQL Administrator). MySQL Workbench is available in
two editions, the regular free and open source Community Edition which may be
downloaded from the MySQL website, and the proprietary Standard Edition which
extends and improves the feature set of the Community Edition.

PhpMyAdmin and MySQL

phpMyAdmin is a web-based software tool that allows you to interact with your
MySQL databases. phpMyAdmin is an easy way to run MySQL commands as well as
database operations, like browsing and changing data tables, or importing, exporting,
or deleting data. It is especially useful when you want to perform maintenance on data
and back up or edit information in case WordPress itself is not functioning properly.
However, keep in mind that phpMyAdmin won’t work properly if the database is
misconfigured or broken.

MySQL Features

MySQL is a relational database management system (RDBMS) based on SQL
(Structured Query Language) queries. It is one of the most popular languages for
accessing and managing the records in a table. MySQL is open-source and free
software under the GNU license, and it is supported by Oracle Corporation.

The following are the most important features of MySQL:

Relational Database Management System (RDBMS)

MySQL is a relational database management system. This database language is based
on SQL queries to access and manage the records of the table.

Easy to use: MySQL is easy to use. We need only a basic knowledge of SQL.
We can build and interact with MySQL by using only a few simple SQL statements.

It is secure: MySQL consists of a solid data security layer that protects sensitive data
from intruders. Also, passwords are encrypted in MySQL.

Client/Server Architecture: MySQL follows the working of a client/server
architecture. There is a database server (MySQL) and arbitrarily many clients
(application programs), which communicate with the server; that is, they can query
data, save changes, etc.
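A minimal sketch of this client/server interaction from Python follows, using the mysql-connector-python package; the host, credentials, database and table names are assumptions made only for illustration.

import mysql.connector

# hypothetical connection details; replace with the real server credentials
conn = mysql.connector.connect(
    host="localhost",
    user="crime_user",
    password="secret",
    database="crime_db"
)

cursor = conn.cursor()
# a simple query sent by the client and answered by the MySQL server
cursor.execute("SELECT crime_type, COUNT(*) FROM crimes GROUP BY crime_type")
for crime_type, count in cursor.fetchall():
    print(crime_type, count)

cursor.close()
conn.close()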

Free to download: MySQL is free to use so that we can download it from MySQL
official website without any cost.

It is scalable: MySQL supports multi-threading that makes it easily scalable. It can
handle almost any amount of data, up to as much as 50 million rows or more. The
default file size limit is about 4 GB. However, we can increase this number to a
theoretical limit of 8 TB of data.

Speed: MySQL is considered one of the very fast database languages, backed by a large
number of benchmark tests.

High Flexibility: MySQL supports a large number of embedded applications, which
makes MySQL very flexible.

Compatible with many operating systems: MySQL is compatible to run on many
operating systems, like Novell NetWare, Windows, Linux, many varieties of UNIX
(such as Sun Solaris, AIX, and DEC UNIX), OS/2, FreeBSD, and others. MySQL
also provides a facility that the clients can run on the same computer as the server or on
another computer (communication via a local network or the Internet).

Allows roll-back: MySQL supports transaction rollback, commit, and crash
recovery.

Memory efficiency: Its efficiency is high because it has a very low memory leakage
problem.

High Performance: MySQL is faster, more reliable, and cheaper because of its unique
storage engine architecture. It provides very high-performance results in comparison to
other databases without losing essential functionality of the software. It also has fast-
loading utilities because of its different cache memories.

High Productivity: MySQL uses triggers, stored procedures, and views that allow
the developer to achieve higher productivity.

Platform Independent: It can download, install, and execute on most of the available
operating systems.

Partitioning: This feature improves the performance and provides fast management of
the large database.

Disadvantages/Drawback of MySQL

Following are the few disadvantages of MySQL:

 MySQL versions earlier than 5.0 don't support ROLE, COMMIT, and stored
procedures.
 MySQL does not support very large database sizes as efficiently.
 MySQL doesn't handle transactions very efficiently, and it is prone to data
corruption.
 MySQL is sometimes criticized for not having development and debugging tools
as good as those of paid databases.
 MySQL doesn't support SQL check constraints.

Architecture of MySQL

MySQL is a relational database management system which is free open source
software under the GNU License, and it is also supported by Oracle Corporation. It is a
fast, scalable, easy-to-use database management system. MySQL supports many
operating systems like Windows, Linux, macOS etc.

MySQL uses Structured Query Language (SQL) to manipulate, manage and
retrieve data with the help of various queries.

MySQL was developed and supported by MySQL AB, which is a Swedish company,
and is written in the C and C++ programming languages. It was developed by Michael
Widenius and David Axmark. It is often said that MySQL is named after the
daughter of the co-founder Michael Widenius, whose name is ‘My’.

Architecture of MySQL:

Architecture of MySQL describes the relation among the different components of the
MySQL system. MySQL follows a client-server architecture. It is designed so that the
end users, that is the clients, can access resources from the computer acting as the server
using various networking services. The architecture of MySQL contains the following
major layers:

Client Layer:

This layer is the topmost layer of the MySQL architecture. The client gives request
instructions to the server with the help of the client layer. The client makes requests
through the command prompt or through a GUI screen by using valid MySQL commands
and expressions. If the expressions and commands are valid then the output is obtained
on the screen. Some important services of the client layer are:

Connection Handling:

When a client sends a request to the server, the server accepts the request and the client
is connected. When the client is connected to the server, the client gets its own
thread for its connection. With the help of this thread, all the queries from the client side
are executed.

Authentication:

Authentication is performed on the server side when the client is connected to the
MySQL server. Authentication is done with the help of a username and password.

Security:

After authentication, when the client gets connected successfully to the MySQL server,
the server checks whether that particular client has the privileges to issue certain queries
against the MySQL server.

Server Layer:

The second layer of the MySQL architecture is responsible for all the logical
functionalities of the relational database management system of MySQL. This layer of
the MySQL system is also known as the “Brain of MySQL Architecture”. The client
gives request instructions to the server, and the server gives the output as soon as the
instruction is matched. The various subcomponents of the MySQL server are:

Thread Handling:

When a client sends a request to the server, the server accepts the request and the client
is connected. When the client is connected to the server, the client gets its own
thread for its connection. This thread is provided by the thread handling of the server
layer. The queries from the client side which are executed by the thread are also handled
by the thread handling module.

Parser:

A parser is a type of software component that builds a data structure (parse tree) from a
given input. Before parsing, lexical analysis is done, i.e. the input is broken into a number
of tokens. After the data is available as these smaller elements, the parser performs syntax
analysis and semantic analysis, after which the parse tree is generated as output.

Optimizer:

As soon as the parsing is done, various types of optimization techniques are applied
at the optimizer block. These techniques may include rewriting the query, changing the
order of scanning of tables, choosing the right indexes to use, etc.
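The plan chosen by the optimizer can be inspected with an EXPLAIN statement; the sketch below issues one through the same mysql-connector-python client used earlier, with table and column names that are again only illustrative.

import mysql.connector

conn = mysql.connector.connect(host="localhost", user="crime_user",
                               password="secret", database="crime_db")
cursor = conn.cursor()

# ask the optimizer how it would execute this query (indexes, scan order, etc.)
cursor.execute("EXPLAIN SELECT * FROM crimes WHERE crime_type = 'theft'")
for row in cursor.fetchall():
    print(row)

cursor.close()
conn.close()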

Query Cache:

The query cache stores the complete result set for an inputted query statement. Even
before parsing, the MySQL server consults the query cache. When a client writes a query,
if the query written by the client is identical to one in the cache, then the server simply
skips the parsing, optimization and even execution; it just displays the output from the
cache.

Buffer and Cache:

The cache and buffer store the previous query or problem asked by the user. When the
user writes a query, it first goes to the query cache, and the query cache checks whether
the same query or problem is available in the cache. If the same query is available then
it provides the output without involving the parser and optimizer.

Table Metadata Cache:

The metadata cache is a reserved area of memory used for tracking information on
databases, indexes, or objects. The greater the number of open databases, indexes, or
objects, the larger the metadata cache size.

Key Cache:

The key cache (key buffer) is a reserved area of memory that holds frequently used index
blocks, so that index lookups can be served from memory rather than from disk.

CHAPTER 8

SYSTEM TESTING

8.1. TESTING PROCESS

Testing is a set of activities that can be planned and conducted systematically.
Testing begins at the module level and works towards the integration of the entire
computer-based system. Nothing is complete without testing, as it is vital to the success
of the system.

Testing Objectives:

There are several rules that can serve as testing objectives, they are

1. Testing is a process of executing a program with the intent of finding an
error.

2. A good test case is one that has high probability of finding an undiscovered
error.

3. A successful test is one that uncovers an undiscovered error.

If testing is conducted successfully according to the objectives stated above,
it will uncover errors in the software. Testing also demonstrates that the software
functions appear to be working according to the specification and that performance
requirements appear to have been met.

8.2 STRATEGIC APPROACH TO SOFTWARE TESTING

The development process involves various types of testing. Each test type
addresses a specific testing requirement. The most common types of testing involved
in the development process are:

 Unit Test
 Functional Test
 Integration Test
 White box Test
 Black box Test
 System Test
 Validation Test
 Acceptance Test

8.2.1 Unit Testing:

The first test in the development process is the unit test. The source code is
normally divided into modules, which in turn are divided into smaller parts called units.
These units have specific behavior. The test done on these units of code is called a unit
test. Unit testing depends upon the language in which the project is developed. Unit tests
ensure that each unique path of the project performs accurately to the documented
specifications and contains clearly defined inputs and expected results.
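As a small illustration, a minimal unit-test sketch is given below. It assumes the Flask application from Appendix 1 is saved in a module named app.py (a hypothetical file name) and that model.pkl is available; it checks a single unit of behaviour, namely that the landing route responds successfully.

from app import app  # hypothetical module holding the Appendix 1 Flask app

def test_first_page_loads():
    # Flask's built-in test client lets one route be exercised in isolation
    client = app.test_client()
    response = client.get('/first')
    assert response.status_code == 200

if __name__ == "__main__":
    test_first_page_loads()
    print("Unit test passed")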

8.2.2 Functional Testing:

Functional test can be defined as testing two or more modules together with
the intent of finding defects, demonstrating that defects are not present, verifying that
the module performs its intended functions as stated in the specification and
establishing confidence that a program does what it is supposed to do.

8.2.3 Integration Testing:

In integration testing, modules are combined and tested as a group. Modules are
typically code modules, individual applications, source and destination applications on
a network, etc. Integration testing follows unit testing and precedes system testing.
Beta testing is carried out after the product is code complete; betas are often widely
distributed, or even distributed to the public at large, in the hope that users will buy the
final product when it is released.

8.2.4 White Box Testing:

Testing based on an analysis of the internal workings and structure of a piece of
software. This testing can be done using the percentage value of load and energy. The
tester should know what exactly is done in the internal program. It includes techniques
such as branch testing and path testing. White box testing is also called structural
testing or glass box testing.

8.2.5 Black Box Testing:

In black box testing, the item is tested without knowledge of its internal workings.
Tests are usually functional. This testing can be done by a user who has
no knowledge of how the result is produced internally.

8.2.6 System Testing

System testing is defined as testing of a complete and fully integrated software product.
This testing falls in black-box testing wherein knowledge of the inner design of the
code is not a pre-requisite and is done by the testing team. It is the final test to verify
that the product to be delivered meets the specifications mentioned in the requirement
document. It should investigate both functional and non-functional requirements.

8.2.7 Validation Testing

The process of evaluating software during the development process or at the end of
the development process to determine whether it satisfies specified business
requirements. Validation testing ensures that the product actually meets the client's
needs. It can also be defined as demonstrating that the product fulfils its intended use
when deployed in the appropriate environment.

8.2.8 Acceptance Testing

This is a type of testing done by users, customers, or other authorised entities to
determine application/software needs and business processes. Acceptance testing is the
most important phase of testing as it decides whether the client approves the
application/software or not. It may involve functionality, usability, performance, and
the UI of the application. It is also known as user acceptance testing (UAT), operational
acceptance testing (OAT), and end-user testing.

CHAPTER 9

CONCLUSION AND FUTURE ENHANCEMENT

9.1 CONCLUSION

This project focused on building predictive models for crime frequencies per crime
type per month. Crime rates in India are increasing day by day due to many factors
such as increasing poverty, corruption, etc. The proposed model is
very useful for both the investigating agencies and police officials in taking the necessary
steps to reduce crime. The project helps crime analysts to analyze these crime
networks by means of various interactive visualizations. A future enhancement of this
research is to work on training bots to predict crime-prone areas by using machine
learning techniques. Since machine learning is similar to data mining, advanced concepts
of machine learning can be used for better prediction. Data privacy, reliability, and
accuracy can be improved for enhanced prediction.

9.2 FUTURE ENHANCEMENT

Crime analysis takes past crime data to predict future crime
locations and times. Crime prediction for future crime is a process that finds out how the
crime rate changes from one year to the next and projects those changes into the future.
Crime predictions can be made through both qualitative and quantitative methods.

APPENDIX 1
(SAMPLE CODE)
import numpy as np
import pandas as pd
from flask import Flask, request, jsonify, render_template, redirect, flash, send_file
from sklearn.preprocessing import MinMaxScaler
from sklearn.naive_bayes import GaussianNB
import pickle

app = Flask(__name__)  # Initialize the Flask app
model = pickle.load(open('model.pkl', 'rb'))  # Load the trained crime prediction model

@app.route('/')
@app.route('/first')
def first():
    return render_template('first.html')

@app.route('/login')
def login():
    return render_template('login.html')

@app.route('/upload')
def upload():
    return render_template('upload.html')

@app.route('/preview', methods=["POST"])
def preview():
    # Read the uploaded dataset and show a preview of its contents
    if request.method == 'POST':
        dataset = request.files['datasetfile']
        df = pd.read_csv(dataset, encoding='unicode_escape')
        df.set_index('Id', inplace=True)
        return render_template("preview.html", df_view=df)

#@app.route('/home')
#def home():
#    return render_template('home.html')

@app.route('/prediction', methods=['GET', 'POST'])
def prediction():
    return render_template('prediction.html')

#@app.route('/upload')
#def upload_file():
#    return render_template('BatchPredict.html')

@app.route('/predict', methods=['POST'])
def predict():
    # Collect the form values, convert them to floats and predict the crime type
    int_feature = [x for x in request.form.values()]
    print(int_feature)
    int_feature = [float(i) for i in int_feature]
    final_features = [np.array(int_feature)]
    prediction = model.predict(final_features)
    output = format(prediction[0])
    print(output)
    return render_template('prediction.html', prediction_text=output)

@app.route('/chart')
def chart():
    return render_template('chart.html')

@app.route('/performance')
def performance():
    return render_template('performance.html')

if __name__ == "__main__":
    app.run(debug=True)

APPENDIX 2
(SCREEN SHOT)

REFERENCES
1. Heinold, Brian. "A practical introduction to Python programming." (2021).
2. Kneusel, Ronald T. Practical deep learning: A Python-based introduction. No
Starch Press, 2021.
3. Dhruv, Akshit J., Reema Patel, and Nishant Doshi. "Python: the most advanced
programming language for computer science applications." Science and
Technology Publications, Lda (2021): 292-299.
4. Sundnes, Joakim. Introduction to scientific programming with Python. Springer
Nature, 2020.
5. Hill, Christian. Learning scientific programming with Python. Cambridge
University Press, 2020.
