Mini Project

Crime Analysis and Detection using Data Mining
A PROJECT REPORT
Submitted by
Anuj Sharma (22BCS17012)

Himanshu (22BCS16602)
Vinay Pratap (22BCS13693)
in partial fulfillment for the award of the degree of
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE ENGINEERING
Chandigarh University
MAY 2024
TABLE OF CONTENTS
List of Figures
List of Tables
Abstract
Graphical Abstract
CHAPTER 1. INTRODUCTION
1.1 Client Identification/Need Identification/Identification of relevant Contemporary issue
1.2 Existing solutions
2.1 Bibliometric analysis
2.2 Review Summary
2.3 Problem Definition
2.4 Goals/Objectives
List of Figures
Figure 3.1
Figure 3.2
Figure 4.1

ABSTRACT
Data mining can be used to model crime detection problems. Crimes are a social nuisance and cost our
society dearly in several ways. Any research that can help solve crimes faster will pay for itself. About
10% of criminals commit about 50% of crimes. Here we look at the use of a clustering algorithm in a
data mining approach to help detect crime patterns and speed up the process of solving crimes. We
look at k-means clustering with some enhancements to aid in the identification of crime patterns.
We applied these techniques to real crime data from a sheriff’s office and validated our results.
We also use a semi-supervised learning technique for knowledge discovery from the crime records and
to help increase the predictive accuracy. We also developed a weighting scheme for attributes to deal
with the limitations of various out-of-the-box clustering tools and techniques. This easy-to-implement
data mining framework works with the geospatial plot of crime and helps to improve the productivity of
detectives and other law enforcement officers. It can also be applied to counter-terrorism for homeland
security.
CHAPTER 1.
INTRODUCTION
1.1. Client Identification/Need Identification/Identification of relevant Contemporary issue
1.1.1 Client Identification: Data Mining Overview

Data mining, the database industry’s latest buzzword, can be explained with a simple analogy: it is
about finding the proverbial needle in the haystack. In this case the needle is that single piece of
intelligence your business needs and the haystack is the large data warehouse you’ve built up over a
long period of time. Data mining, also known as Knowledge Discovery in Databases (KDD), is the
nontrivial extraction of implicit, previously unknown, and potentially useful information from data.
It is concerned with the analysis of data and the use of software techniques (data mining tools) for
finding patterns and regularities in sets of data.
1.1.2 Need Identification: The Crime Analysis

The current problem lies in the inefficiency and inaccuracy of existing face detection algorithms,
particularly in complex environments. Traditional face detection methods, which often rely on
predefined features and static models, may struggle to accurately identify faces under varying
conditions such as different lighting, angles, and occlusions. The need for more reliable, adaptive,
and cost-effective face detection solutions is pressing, as these improvements would enhance
applications in security, user interaction, and social media.
Justification through Research and Documentation

Much of the current work is focused in two major directions:
• Predicting surges and hotspots of crime, and
• Understanding patterns of criminal behavior that could help in solving criminal investigations.
The Role of Data Mining in Crime Analysis
The increase in crime data recording coupled with data analytics resulted in the growth of research
approaches aimed at extracting knowledge from crime records to better understand criminal behavior
and ultimately prevent future crimes. While many of these approaches make use of clustering and
association rule mining techniques, there are fewer approaches focusing on predictive models of crime.
In this paper, we explore models for predicting the frequency of several types of crimes by LSOA code
(Lower Layer Super Output Areas — an administrative system of areas used by the UK police) and the
frequency of anti-social behavior crimes. Three algorithms are used from different categories of
approaches: instance-based learning, regression and decision trees. The data are from the UK police and
contain over 600,000 records before preprocessing. The results, looking at predictive performance as
well as processing time, indicate that decision trees (M5P algorithm) can be used to reliably predict
crime frequency in general as well as anti-social behavior frequency. The experiments were conducted
using the SCIAMA High Performance Computer Cluster at the University of Portsmouth.
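As a rough illustration of the instance-based category mentioned above, the sketch below predicts a crime count for an area as the average of its most similar past records (k-nearest-neighbour regression). It is not the M5P model tree used in the cited experiments, and the feature layout and all numbers are invented for illustration only.

```python
import math


def knn_predict(train, query, k=3):
    """Predict a crime count as the mean target of the k nearest training rows."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def mean_absolute_error(train, test, k=3):
    """Average absolute gap between predicted and recorded counts."""
    errors = [abs(knn_predict(train, x, k) - y) for x, y in test]
    return sum(errors) / len(errors)


# Hypothetical rows: ((count last month, count two months ago), count this month)
train = [
    ((10, 12), 11), ((11, 10), 10), ((12, 11), 12),
    ((40, 42), 41), ((41, 39), 40), ((42, 41), 43),
]
test = [((11, 11), 11), ((41, 40), 41)]

prediction = knn_predict(train, (11, 11), k=3)
mae = mean_absolute_error(train, test, k=3)
```

Comparing the error of such a baseline against regression and decision-tree models on held-out records is one way to reproduce the kind of comparison the paragraph describes.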
Survey Support for Predictive Modeling

Data mining is the procedure of evaluating and examining large pre-existing databases in order to
generate new information that may be essential to the organization. The new information is predicted
from the existing datasets. Many approaches to analysis and prediction in data mining have been
developed, but very few efforts have been made in the field of criminology, and fewer still have
compared the information these approaches produce. Police stations and other similar criminal justice
agencies hold many large databases of information that can be used to predict or analyze criminal
movements and involvement in criminal activity in society. Criminals can also be predicted based on
the crime data. The main aim of this work is to survey the supervised and unsupervised learning
techniques that have been applied to criminal identification. This paper presents a survey of crime
analysis and crime prediction using several data mining techniques. The quantitative analysis produced
results that show an increase in the accuracy of classification from using the GA to optimize the
parameters.
1.2. Identification of Problem
The increase in crime data recording coupled with data analytics resulted in the growth of research
approaches aimed at extracting knowledge from crime records to better understand criminal behavior
and ultimately prevent future crimes. While many of these approaches make use of clustering and
association rule mining techniques, there are fewer approaches focusing on predictive models of crime.
In this paper, we explore models for predicting the frequency of several types of crimes by LSOA code
(Lower Layer Super Output Areas — an administrative system of areas used by the UK police) and the
frequency of anti-social behavior crimes. Three algorithms are used from different categories of
approaches: instance-based learning, regression and decision trees. The data are from the UK police
and contain over 600,000 records before preprocessing. The results, looking at predictive performance
as well as processing time, indicate that decision trees (M5P algorithm) can be used to reliably predict
crime frequency in general as well as anti-social behavior frequency. The experiments were conducted
using the SCIAMA High Performance Computer Cluster at the University of Portsmouth.
1.3. Identification of Tasks
The Economic Impact of Face Detection Technology

Data mining in the study and analysis of criminology can be categorized into two main areas: crime
control and crime suppression. De Bruin et al. introduced a framework for crime trends using a new
distance measure for comparing all individuals based on their profiles and then clustering them
accordingly. Manish Gupta et al. highlight the existing systems used by Indian police as e-governance
initiatives and also propose an interactive query-based interface as a crime analysis tool to assist
police in their activities. The proposed interface is used to extract useful information from the vast
crime database maintained by the National Crime Records Bureau (NCRB) and to find crime hot spots
using crime data mining techniques such as clustering. The effectiveness of the proposed interface has
been illustrated on Indian crime records. Another study examines the application of cluster analysis in
the accounting domain, particularly discrepancy detection in audits. Its purpose is to examine the use
of clustering technology to automate fraud filtering during an audit. The author used cluster analysis
to help auditors focus their efforts when evaluating group life insurance claims.
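To make the clustering idea concrete, here is a minimal sketch of k-means with a per-attribute weight in the distance function, in the spirit of the attribute weighting mentioned in the abstract. The report does not specify its weighting scheme or distance measure, so the weights, the record layout (x, y, hour of day) and the sample values below are assumptions chosen only for illustration.

```python
import math
import random


def weighted_distance(a, b, weights):
    """Euclidean distance in which each attribute is scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def weighted_kmeans(points, k, weights, iterations=50, seed=0):
    """Cluster points with k-means using an attribute-weighted distance."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each record to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: weighted_distance(p, centroids[c], weights))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:  # keep an empty cluster's centroid where it was
                new_centroids.append(centroids[c])
                continue
            new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical records: (x position, y position, hour of day)
records = [
    (1.0, 1.0, 23.0), (1.2, 0.9, 2.0), (0.8, 1.1, 22.0),
    (8.0, 8.0, 1.0), (8.1, 7.9, 23.0), (7.9, 8.2, 3.0),
]
weights = (1.0, 1.0, 0.0)  # assumed weights: cluster on location, ignore the hour
labels, centroids = weighted_kmeans(records, k=2, weights=weights)
```

Raising the weight on an attribute makes differences in that attribute count for more when records are grouped, which is how an analyst could steer the clusters toward location, time, or any other field of interest.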
There are three main components in a machine learning algorithm: representation, evaluation,
and optimization.
REPRESENTATION: This describes how we want to express and organize our data.
EVALUATION: Here, we assess the accuracy of the designed model, either through scoring or
manual evaluation.
OPTIMIZATION: We identify the learner with the highest rating from the evaluation function
using various optimization techniques.
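The three components can be seen side by side in a deliberately tiny example. The sketch below uses a one-threshold classifier; the model, the records and the meaning of the feature are all hypothetical and stand in for whatever representation a real study would choose.

```python
# Representation: each record is (feature value, label); the model is one threshold.
def predict(threshold, value):
    return 1 if value >= threshold else 0


# Evaluation: score a candidate model by its accuracy on labelled records.
def accuracy(threshold, records):
    hits = sum(predict(threshold, value) == label for value, label in records)
    return hits / len(records)


# Optimization: search the candidate thresholds for the best-scoring one.
def fit(records):
    candidates = sorted({value for value, _ in records})
    return max(candidates, key=lambda t: accuracy(t, records))


# Hypothetical records: (incidents reported nearby last week, 1 if flagged as a hotspot)
records = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (9, 1)]
best_threshold = fit(records)
best_accuracy = accuracy(best_threshold, records)
```

Real learners differ mainly in how rich the representation is and how the search is carried out, but the same three roles are present.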
Machine learning is divided into two main fields: supervised and unsupervised learning.
Supervised learning accounts for approximately 70% of machine learning applications, while
unsupervised learning constitutes about 10-20%.
The dataset selected for our studies was initially inconsistent. We have since standardized it to
enhance reliability. In this paper, we explored various machine learning algorithms, including
Naïve Bayes, Logistic Regression, Multilayer Perceptron, SMO, IBk, KStar, MultiScheme,
Random Tree, and Random Forest. While performance improved, we recognized that this is a
sensitive area that can significantly impact user experience and security. Through feature
selection, we achieved desirable results that raised accuracy to a commendable level.
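The report does not say which feature selection method was used. As one possible illustration, the sketch below ranks attributes by the absolute Pearson correlation between each column and the class label and keeps the strongest ones. The column names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, labels, keep):
    """Return the names of the `keep` columns most correlated with the labels."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], labels)),
        reverse=True,
    )
    return ranked[:keep]


# Hypothetical columns and labels, invented for illustration only.
columns = {
    "prior_incidents": [0, 1, 1, 4, 5, 6],
    "hour_of_day": [3, 22, 14, 9, 21, 2],
    "station_id": [7, 7, 7, 7, 7, 7],
}
labels = [0, 0, 0, 1, 1, 1]
selected = select_features(columns, labels, keep=1)
```

A filter of this kind is cheap to run before training any of the classifiers listed above, although it only captures linear relationships between a single attribute and the label.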
The process begins with inputting data into the selected algorithms, followed by testing the
latest input data for compatibility. We then check predictions against outcomes; if they do not
align, the algorithm can be refined through multiple iterations. Early and accurate face
detection can dramatically reduce costs associated with security breaches and enhance user
interactions. Unfortunately, current detection methods can be slow and inefficient, highlighting
the critical need for streamlined and effective screening procedures. Thus, a time-efficient and
convenient face detection system is on the horizon, enabling professionals to make informed
decisions regarding security measures and user engagement.
1.4. Timeline
1.5. Organization of the Report
Chapter 1: Introduction
The opening chapter will lay the groundwork for the research by exploring the critical need for
enhanced accuracy in face detection technology. This section will identify key stakeholders, such
as developers, businesses, security professionals, and end-users, emphasizing the real-world
implications of reliable and efficient face detection systems. It will also delve into contemporary
issues surrounding current face detection practices, highlighting the gaps that necessitate this
research.
The problem statement will be clearly articulated, setting the stage for the research objectives and
tasks. Additionally, a detailed timeline will be provided, offering a roadmap for the project’s
progression. The chapter will conclude with an overview of the report's structure, guiding the
reader on what to expect in subsequent sections.
