CHAPTER 1
INTRODUCTION
Machine Learning is the most popular technique of predicting the future or classifying
information to help people in making necessary decisions. Machine Learning
algorithms are trained over instances or examples through which they learn from past
experiences and also analyze the historical data. Therefore, as it trains over the
examples, again and again, it is able to identify patterns in order to make predictions
about the future. Data is the core backbone of machine learning algorithms. With the
help of the historical data, we are able to create more data by training these machine
learning algorithms. For example, Generative Adversarial Networks are an advanced
concept of Machine Learning that learns from historical images and is thereby capable
of generating new images. This is also applied to speech and text
synthesis. Therefore, Machine Learning has opened up a vast potential for data science
applications.
1.1.1 MACHINE LEARNING
Machine Learning has become an essential part of modern industry. Data is expanding
exponentially, and in order to harness the power of this data, aided by the massive
increase in computation power, Machine Learning has added another dimension to the
way we perceive information.
Machine Learning is being utilized everywhere. The electronic devices you use, the
applications that are part of your everyday life are powered by powerful machine
learning algorithms. With an exponential increase in data, there is a need for having a
system that can handle this massive load of data. Machine Learning models like Deep
Learning allow the vast majority of data to be handled with an accurate generation of
predictions. Machine Learning has revolutionized the way we perceive information and
the various insights we can gain out of it. These machine learning algorithms use the
patterns contained in the training data to perform classification and future predictions.
Whenever any new input is introduced to the ML model, it applies its learned patterns
over the new data to make future predictions. Based on the final accuracy, one can
optimize their models using various standardized approaches. In this way, the Machine
Learning model learns to adapt to new examples and produce better results. The main
types of Machine Learning are:
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Predicting how a genome will be expressed, or what the climate will be like in fifty years,
are examples of such complex problems. Under supervised ML, two major subcategories are:
Regression machine learning systems: systems where the value being predicted falls
somewhere on a continuous spectrum.
Classification machine learning systems: systems where the value being predicted is a
discrete class label, such as yes or no.
In practice, x almost always represents multiple data points. So, for example, a housing
price predictor might take not only square-footage (x1) but also number of bedrooms
(x2), number of bathrooms (x3), number of floors (x4), year built (x5), zip code (x6),
and so forth. Determining which inputs to use is an important part of ML design.
However, for the sake of explanation, it is easiest to assume a single input value is used.
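As a small illustrative sketch (not taken from the project), such a multi-feature predictor can be fitted with scikit-learn; every feature name and number below is invented purely for illustration.

# Hypothetical illustration: a housing-price regressor over several input features.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: [square_footage, bedrooms, bathrooms, floors, year_built]
X = np.array([
    [1400, 3, 2, 1, 1995],
    [1600, 3, 2, 2, 2001],
    [1700, 4, 3, 2, 2010],
    [1100, 2, 1, 1, 1980],
])
y = np.array([245000, 312000, 355000, 178000])  # invented sale prices

model = LinearRegression().fit(X, y)
print(model.predict([[1500, 3, 2, 1, 2000]]))   # predicted price for a new, unseen house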
Split the dataset into a training dataset, test dataset, and validation dataset.
Determine the input features of the training dataset, which should have enough
knowledge so that the model can accurately predict the output.
Determine the suitable algorithm for the model, such as support vector machine,
decision tree, etc.
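These steps can be sketched roughly as follows; the file name, the Target column, and the choice of a decision tree are placeholders rather than the project's actual configuration.

# Sketch of the supervised workflow: load data, choose features, split, and fit a model.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("crime.csv")                 # placeholder file name
X = df.drop(columns=["Target"])               # input features chosen during design
y = df["Target"]

# Carve out a test set first, then split the remainder into training and validation sets.
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)

model = DecisionTreeClassifier().fit(X_train, y_train)   # any suitable algorithm could be used
print("validation accuracy:", model.score(X_val, y_val))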
1.1.4. REGRESSION
Regression algorithms are used if there is a relationship between the input variable and
the output variable. It is used for the prediction of continuous variables, such as Weather
forecasting, Market Trends, etc.
Linear Regression
Regression Trees
Non-Linear Regression
Polynomial Regression
1.1.5. CLASSIFICATION
Classification algorithms are used when the output variable is categorical, which means
there are two classes such as Yes-No, Male-Female, True-False, etc. Spam filtering is a
typical example. Commonly used classification algorithms include:
Random Forest
Decision Tree
Logistic Regression
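As an illustrative sketch only, a simple yes/no classifier using logistic regression might look like this; the feature values and labels are invented.

# Toy binary classification sketch: logistic regression on made-up data.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.2, 1.0], [0.4, 0.9], [2.1, 0.1], [2.5, 0.3]])   # invented features
y = np.array([0, 0, 1, 1])                                       # two classes: 0 = No, 1 = Yes

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.3, 0.8], [2.2, 0.2]]))   # expected output: [0 1]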
1.2.2 INTEGRATING ALGORITHMS FOR COMPREHENSIVE ANALYSIS
Data Preparation:
Collect spatial data on crime incidents, including coordinates (latitude
and longitude) and possibly timestamps.
Preprocess data to handle missing values and normalize the spatial
coordinates.
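A possible preprocessing sketch for such spatial data is shown below; the file name and column names are assumptions, not the project's exact schema.

# Sketch: load crime incidents, drop rows with missing coordinates, normalize latitude/longitude.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("crime_incidents.csv")               # assumed file with Latitude/Longitude columns
df = df.dropna(subset=["Latitude", "Longitude"])      # handle missing values

coords = df[["Latitude", "Longitude"]].to_numpy()
coords_scaled = MinMaxScaler().fit_transform(coords)  # normalize the spatial coordinates to [0, 1]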
Parameter Selection:
Determine appropriate values for Eps and Min Pts. This often requires
domain knowledge and experimentation. Methods like the k-distance
graph can help in selecting Eps.
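A rough sketch of the k-distance method is given below; the MinPts value and the random stand-in coordinates are arbitrary assumptions made only to keep the example self-contained.

# Sketch: k-distance graph to help choose DBSCAN's Eps parameter.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors

coords_scaled = np.random.rand(300, 2)    # stand-in for the normalized incident coordinates
min_pts = 4                               # assumed MinPts value

neighbors = NearestNeighbors(n_neighbors=min_pts).fit(coords_scaled)
distances, _ = neighbors.kneighbors(coords_scaled)

# Sort each point's distance to its MinPts-th neighbour; the "elbow" of the curve suggests Eps.
k_distances = np.sort(distances[:, -1])
plt.plot(k_distances)
plt.xlabel("points sorted by distance")
plt.ylabel("distance to the %d-th nearest neighbour" % min_pts)
plt.show()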
Cluster Detection:
Apply DBSCAN to the preprocessed crime data to identify clusters of
crime incidents. Clusters represent crime hotspots, and noise points
represent isolated incidents.
Analysis:
Analyze the identified clusters to understand the spatial distribution of
crime hotspots.
Visualize clusters using tools like QGIS or GeoPandas.
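Putting the steps together, a minimal DBSCAN sketch could look like the following; the Eps and MinPts values are placeholders to be tuned as described above, and the coordinates are random stand-ins.

# Sketch: run DBSCAN on normalized crime coordinates; label -1 marks noise (isolated incidents).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN

coords_scaled = np.random.rand(300, 2)            # stand-in for normalized incident coordinates
db = DBSCAN(eps=0.05, min_samples=4).fit(coords_scaled)
labels = db.labels_

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("hotspot clusters found:", n_clusters)
print("noise (isolated) incidents:", int(np.sum(labels == -1)))

# Quick visual check of the hotspots.
plt.scatter(coords_scaled[:, 0], coords_scaled[:, 1], c=labels, s=10)
plt.title("Crime hotspots found by DBSCAN (illustrative data)")
plt.show()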
A city police department uses DBSCAN to analyze historical crime data and
identify areas with high crime density. The resulting clusters help in deploying
patrol units more effectively and targeting hotspot areas for community policing
efforts.
1.3.7 KEY CONCEPTS
Data Preparation
Collect and preprocess spatial (location) and temporal (time) crime data.
Normalize features and handle missing values.
If using CNNs, spatial data might be represented as images or grids. If
using RNNs, data sequences need to be prepared.
Model Selection
CNNs: Effective for spatial data, capturing spatial dependencies by
applying convolutional filters. Useful for grid-based representations of
crime data.
RNNs (e.g., LSTM, GRU): Effective for temporal data, capturing
sequential dependencies. Useful for time series forecasting of crime
incidents.
Model Architecture
Design the neural network architecture based on the problem. For
CNNs, this involves convolutional and pooling layers. For RNNs, this
involves recurrent layers like LSTM or GRU units.
Add fully connected layers at the end for final prediction.
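By way of illustration only, and not as the project's final architecture, a small Keras CNN over a grid-based crime map might be defined as follows; the grid size and layer widths are arbitrary assumptions.

# Sketch: small CNN mapping a 32x32 crime-density grid to a single predicted value.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(32, 32, 1)),           # one-channel crime-density grid
    layers.Conv2D(16, 3, activation="relu"),   # convolutional feature extraction
    layers.MaxPooling2D(),                     # downsample the feature maps
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),       # fully connected layers for the final prediction
    layers.Dense(1),                           # predicted crime count (regression output)
])
model.summary()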
Training and Validation
Train the model using historical crime data, splitting the data into
training and validation sets.
Use loss functions (e.g., mean squared error for regression tasks) and
optimization algorithms to update weights.
Regularize the model to prevent overfitting (e.g., dropout layers, early
stopping).
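A training sketch along these lines, using an LSTM over synthetic sequences with MSE loss, dropout, and early stopping, is given below; all shapes and data are invented stand-ins, not the project's real dataset.

# Sketch: LSTM trained on synthetic crime-count sequences with MSE loss, dropout and early stopping.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.random.rand(500, 12, 1)    # 500 samples, 12 past time steps, 1 feature (crime count)
y = np.random.rand(500, 1)        # next-step crime count (stand-in values)

model = keras.Sequential([
    layers.Input(shape=(12, 1)),
    layers.LSTM(32),
    layers.Dropout(0.2),           # regularization against overfitting
    layers.Dense(1),
])
model.compile(loss="mse", optimizer="adam")

early_stop = keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=50, callbacks=[early_stop], verbose=0)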
Prediction and Analysis:
Use the trained model to predict future crime incidents and identify
potential hotspots.
Analyze predictions to understand temporal trends and spatial
distributions.
Deep learning has evolved hand-in-hand with the digital era, which has brought
about an explosion of data in all forms and from every region of the world. This data,
known simply as big data, is drawn from sources like social media, internet search
engines, e-commerce platforms, and online cinemas, among others. This enormous
amount of data is readily accessible and can be shared through fintech applications like
cloud computing. However, the data, which normally is unstructured, is so vast that it
could take decades for humans to comprehend it and extract relevant information.
Companies realize the incredible potential that can result from unraveling this
wealth of information and are increasingly adapting to AI systems for automated
support. One of the most common AI techniques used for processing big data is machine
learning, a self-adaptive algorithm that gets increasingly better analysis and patterns
with experience or with newly added data.
Use of popular Deep Learning libraries such as Keras, PyTorch, and TensorFlow applied
to industry problems.
Feedforward Neural Networks (FNN): These are the simplest form of neural
networks, where information flows in one direction, from input nodes through hidden
layers to output nodes. They're commonly used for tasks like classification and
regression.
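As a minimal, hypothetical example, such a feedforward network with one hidden layer can be defined in Keras as follows; the input size and layer widths are arbitrary.

# Sketch: simplest feedforward network, input -> hidden layer -> output.
from tensorflow import keras
from tensorflow.keras import layers

fnn = keras.Sequential([
    layers.Input(shape=(10,)),                # 10 input features
    layers.Dense(16, activation="relu"),      # hidden layer
    layers.Dense(1, activation="sigmoid"),    # binary classification output
])
fnn.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])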
Convolutional Neural Networks (CNN): CNNs are designed for processing
structured grid data such as images. They utilize convolutional layers to automatically
and adaptively learn spatial hierarchies of features from input data. CNNs are widely
used in image recognition and computer vision tasks.
Recursive Neural Networks (Rec NN): These networks are designed to handle
hierarchical structures by recursively applying the same set of weights to different parts
of the input. They are commonly used in tasks involving tree-structured data such as
parsing and semantic compositionality.
Deep Belief Networks (DBN): DBNs are probabilistic generative models composed of
multiple layers of stochastic, latent variables. They are typically trained greedily, layer
by layer, using unsupervised learning techniques such as Restricted Boltzmann
Machines (RBMs).
Capsule Networks: Capsule networks are a recent advancement in deep learning that
aim to overcome some of the limitations of CNNs, particularly in handling hierarchical
relationships and spatial hierarchies within images.
1.6 INTRODUCTION OF CONVOLUTIONAL NEURAL NETWORKS (CNN):
The architecture of a CNN typically consists of several layers, each serving a specific
purpose in the feature extraction process. The first layer, known as the input layer,
receives the raw pixel values of an image. Subsequent layers, called convolutional
layers, apply convolution operations to extract various features such as edges, textures,
and shapes. These layers are followed by activation functions, such as the Rectified
Linear Unit (ReLU), which introduce non-linearities to the network, allowing it to learn
more complex relationships within the data. Pooling layers are another crucial
component of CNNs, often inserted after convolutional layers. Pooling operations, such
as max pooling or average pooling, reduce the spatial dimensions of the feature maps,
thereby decreasing the computational complexity of the network while retaining
important information. This downsampling process also helps in making the network
more robust to variations in the input data, such as changes in scale or orientation.
Beyond the convolutional and pooling layers, CNN architectures may also include fully
connected layers, which serve as the final stages of the network for classification or
regression tasks. These layers take the high-level features extracted by the preceding
layers and map them to the desired output, whether it be class labels in the case of image
classification or numerical values for regression tasks. One of the key strengths of
CNNs lies in their ability to automatically learn features from raw data, alleviating the
need for manual feature engineering. Through the process of backpropagation and
gradient descent, CNNs adjust their internal parameters, known as weights and biases,
to minimize the discrepancy between predicted outputs and ground truth labels. This
iterative learning process allows CNNs to adapt to various datasets and tasks, making
them highly versatile across a wide range of applications.
CHAPTER 2
LITERATURE SURVEY
2.2 TITLE: CRIME PREDICTION USING MACHINE LEARNING AND DEEP
LEARNING: A SYSTEMATIC REVIEW AND FUTURE DIRECTIONS
2.3 TITLE: AN EMPIRICAL ANALYSIS OF MACHINE LEARNING
ALGORITHMS FOR CRIME PREDICTION USING STACKED
GENERALIZATION: AN ENSEMBLE APPROACH
AUTHOR: SAPNA SINGH KSHATRI, DEEPAK SINGH, BHAVANA NARAIN,
SURBHI BHATIA, MOHAMMAD TABREZ QUASIM
DESCRIPTION:
Recently, a lot of research and predictions have been attempted on how to curb crimes
by various criminologists and researchers using different modelling and statistical tools.
As the rate of crime is still on the rise, there is a potential need for research that can help
policy makers and the concerned departments with the challenges and issues in the area of
crime prediction and control mechanisms. Human skills fail to keep track of criminal
records if they are handled manually, so there is a need for a novel approach that will help
in analysing crime-related information.
Analysis of crime prediction is currently based on two significant aspects: prediction of the
crime risk field [1], [2] and crime hotspot forecasting [3]. Data processing techniques are
applied to facilitate this task. The expanded accessibility of computers and information
technologies has empowered law enforcement agencies to build broad databases with
detailed information about major felonies, such as murder, rape, arson, etc. In recent years,
a huge number of crimes have been reported across the world.
DISADVANTAGES:
Complexity and Interpretability: Ensemble methods like stacked
generalization can be complex, making it challenging to interpret the resulting
models and understand the underlying relationships between predictors and
outcomes. This lack of interpretability may limit the practical utility of the
approach.
Data Availability and Quality: The effectiveness of machine learning
algorithms heavily depends on the availability and quality of the input data.
Without sufficient discussion on data preprocessing, feature engineering, and
data quality assessment, the validity and generalizability of the results may be
compromised.
Evaluation Metrics: The paper may lack a thorough discussion on the
evaluation metrics used to assess the performance of the machine learning models.
2.4 TITLE: REVIEW OF CRIME PREDICTION THROUGH MACHINE
LEARNING
AUTHOR: ABDULRAHMAN ABDULLAH ALSUBAYHIN, BANDER
ALZAHRANI, MUHAMMAD SHER RAMZAN
DESCRIPTION:
In every society, crime is a pervasive concern. Criminality is a
deleterious global phenomenon in developed and developing countries. It
influences a society’s quality of life and economic prosperity. It is essential to
determine whether people should travel to a city or nation at a particular time
or, if they choose to, which places they should avoid [3]. It is also an important
indicator of a nation's social and economic development. Crime analysis is a crucial
aspect of criminology that focuses on studying patterns of conduct and
detecting criminals. Nearly every sector of society, including law
enforcement, has reaped the benefits of artificial intelligence, particularly
data science and machine learning. Consequently, reducing criminal activity has
always been a government priority.
DISADVANTAGES:
Methodological Limitations: Depending on the scope of the paper, there may
be limitations in the methodology employed for crime prediction. Lack of
detail on the data sources, feature selection, model evaluation techniques, and
validation procedures could undermine the rigor and reproducibility of the
results.
Generalizability: The effectiveness of machine learning models in crime
prediction may vary across different geographical locations, time periods, and
crime types. Without addressing the generalizability of the findings, the
paper's conclusions may be limited in their applicability to diverse contexts.
Ethical Considerations: The use of predictive algorithms in crime prediction
raises ethical concerns related to privacy, fairness, accountability, and potential
biases in the data and models. It's essential for the paper to address these
ethical considerations and discuss strategies for mitigating risks and ensuring
responsible use of technology.
2.5 TITLE: STUDY ON CRIME EXAMINATION AND FORECASTING
USING MACHINE LEARNING
AUTHOR: PANKAJ SHINDE, ANCHAL SHUKLA, ROHIT PATIL, GAYATRI
MALI, MAHIMA KAKAD, PRASAD DHORE
DESCRIPTION:
Using existing crime-scene data, crime identification can be done to find when and where
most crimes are occurring; one can analyse the crimes that have occurred most in the past
and predict what type of crime is most likely to occur. The increasing use of
computer-operated systems to track crimes may improve the process of detecting and
predicting crimes. Crime examination is an important application of KNN, as a large
amount of crime happening at present needs to be handled efficiently so that the crime
rate will decrease. A solution to this can be
proposed using various techniques such as KNN, SVM, clustering, and many others.
Automated data collection has encouraged the use of KNN for invasion and crime
examination. Indeed, in many cities, states, and countries, crime such as murder and
robbery is rapidly increasing.
DISADVANTAGES:
Data Limitations: The effectiveness of machine learning models in crime
examination and forecasting depends heavily on the availability, quality, and
representativeness of the input data. Without addressing potential data
limitations and biases, the study's conclusions may be limited in their validity
and generalizability.
Model Interpretability: Machine learning models, particularly complex ones
like neural networks, may lack interpretability, making it challenging to
understand the factors driving their predictions. Lack of model interpretability
could hinder the study's ability to provide actionable insights for law
enforcement agencies and policymakers.
Ethical Considerations: The use of predictive algorithms in crime analysis
raises ethical concerns related to privacy, fairness, and potential biases in the
data and models. It's essential for the study to address these ethical
considerations and discuss strategies for mitigating risks and ensuring
responsible use of technology.
2.6 TITLE: CRIME PREDICTION USING MACHINE LEARNING: A
COMPARATIVE ANALYSIS
AUTHOR: ABDULRAHMAN ALSUBAYHIN, MUHAMMAD RAMZAN AND
BANDER ALZAHRANI
DESCRIPTION:
Generally, crimes are rather common social issues, influencing a country's reputation,
economic growth, and quality of life. They are perhaps a prime factor in influencing
several critical decisions in a person's life, such as avoiding dangerous areas, visiting
at the right time, and moving to a new place (ToppiReddy et al., 2018). Crimes define
and affect the impact and reputation of a community while placing a rather large
financial burden on a country due to the need for courts and additional police forces
(Saraiva et al., 2022). With an increase in crimes, there is an increased need to reduce
them systematically. In recent times, there has been a record increase in crime rates
throughout the world. It is possible to reduce these figures by analysing and predicting
crime occurrences. In such a situation, preventive measures can be taken quickly
(ToppiReddy et al., 2018). Crime forecasting in real-time is capable of helping save
lives and prevent crimes, gradually decreasing the crime rate (Wang et al., 2019).
With a comprehensive crime data analysis and modern techniques, crimes can be
predicted and support can be deployed without delay.
DISADVANTAGES:
Methodological Limitations: Depending on the study design and
methodology, there may be limitations in the selection of machine learning
techniques, choice of evaluation metrics, and data preprocessing procedures.
These methodological limitations could impact the validity and
generalizability of the study's findings.
Data Quality: The effectiveness of machine learning models in crime
prediction hinges on the quality of the input data. Without sufficient discussion
on data quality assessment and preprocessing techniques, the study's
conclusions may be compromised by issues such as missing data, outliers, and
biases.
Interpretation of Results: Comparative analyses of machine learning
techniques can be complex, requiring careful interpretation of results and
consideration of various factors influencing performance. Lack of clarity in
interpreting the comparative analysis may hinder the study's impact and
practical utility.
2.8 TITLE: PERFORMANCE ANALYSIS FOR CRIME PREDICTION AND
DETECTION USING MACHINE LEARNING ALGORITHMS
AUTHOR: R. GANESAN, DR. SUBAN RAVICHANDRAN
DESCRIPTION:
In recent times, crime rates have increased day by day in various forms such as robbery,
drugs, and murder. Further, crime activities vary from zone to zone. Hence, it is essential
to address crime activities in a very fast manner. Nowadays, obtaining and analysing crime
data are critical. Crime data can be characterized by various factors such as the location of
occurrence and the time the crime was detected, and predicting their future relationship is
essential in a crime-prevention system [1]. In this research, time and place are considered
the main aspects in identifying the crime pattern. Machine Learning (ML) techniques offer
a way to extract information from the collected datasets and to find the relationship
between crime, place, and time. Many researchers have stated that identifying crime
patterns is a very critical and time-consuming task [2]. It can be resolved by ML techniques.
DISADVANTAGES:
Incomplete or Inaccurate Data: The effectiveness of machine learning models
is heavily dependent on the quality of the data. Incomplete or inaccurate data
can lead to incorrect predictions.
Data Integration Challenges: Combining data from multiple sources can be
complex and may require significant preprocessing to ensure consistency and
accuracy.
High Computational Requirements: Machine learning models, especially those
involving deep learning, require significant computational resources and time
for training and deployment.
2.9 TITLE: CRIME ANALYSIS AND PREDICTION
AUTHOR: K SIREESHA, B.RAMYA, P.SRIJA, A.VAISHNAVI
DESCRIPTION:
Crime activities have increased at a faster rate, and it is the responsibility of the police
department to control and reduce them. Crime prediction and criminal identification are
major problems for the police department, as there is a tremendous amount of crime data.
There is a need for technology through which case solving could be faster. The rate of
crime is rising on a daily basis as current technologies and high-tech methods assist
criminals in carrying out their unlawful activities. Crimes are neither fully systematic nor
random; otherwise, crime could not be analysed. While crimes like robbery and
firebombing have decreased, crimes like murder, sexual abuse, and gang rape have
increased. We cannot analyse the victims of crime, but we can analyse the place where the
crime occurred. Data about crime will be gathered from a variety of blogs, news outlets,
and websites. This massive data is used to create a crime report database as a record.
DISADVANTAGES:
Incomplete Data: The effectiveness of predictions depends on the quality and
completeness of the data. Incomplete or inaccurate data can lead to unreliable
predictions.
Data Integration: Combining data from multiple sources can be complex and
may require extensive preprocessing.
High Resource Demand: Advanced machine learning models often require
significant computational resources and time for training and deployment.
Complexity: The complexity of these models can make them difficult to
understand and interpret for non-specialists.
2.10 TITLE: INTELLIGENT AUTOMATION OF CRIME PREDICTION
USING DATA MINING
KEYWORDS: Intelligent automation, industrial electronics, machine learning algorithms,
urban areas, boosting, knowledge discovery, linear discriminant analysis
DESCRIPTION:
Crime characterizes the act of felony or grave offense against society or someone else's
property, or any illegal activity which is prohibited by law and happens almost
everywhere and at every possible time. However, crime studies have revealed that
crime does not happen evenly across all places and that specific types of crime tend to
occur more often in certain areas that are called crime hotspots for those types of
crimes. So, the spatial analysis of different types of crimes and their areas of occurrence
are of immense help to predict the types of crime that will occur in such areas in the
future, and to some extent predict the timing and the day of crime. That means, higher
percentage of crime occurs in hotspots and predicting them beforehand can be effective
for law enforcement, by helping law enforcement agencies to assign more resources to
the areas with higher probability of crime occurrence, and that way, the residents can
feel safer in their cities.
DISADVANTAGES:
Bias in Data: Data used for training predictive models may contain inherent
biases, leading to skewed predictions or reinforcing existing disparities in
policing practices.
Privacy Concerns: Analyzing large volumes of personal data to predict crime
may raise privacy concerns among the public, especially if there are doubts
about how the data is collected, stored, and used.
CHAPTER 3
SYSTEM ANALYSIS
Data mining in the study and analysis of criminology can be categorized into two main
areas: crime control and crime suppression. De Bruin et al. introduced a framework for
crime trends using a new distance measure for comparing all individuals based on their
profiles and then clustering them accordingly. Manish Gupta et al. highlight the existing
systems used by the Indian police as e-governance initiatives and also propose an
interactive query-based interface as a crime analysis tool to assist police in their activities.
The proposed interface is used to extract useful information from the vast crime database
maintained by the National Crime Records Bureau (NCRB) and to find crime hot
spots using crime data mining techniques such as clustering etc. The effectiveness of
the proposed interface has been illustrated on Indian crime records. Sutapat Thiprungsri
examines the application of cluster analysis in the accounting domain, particularly
discrepancy detection in audit. The purpose of his study is to examine the use of
clustering technology to automate fraud filtering during an audit. He used cluster
analysis to help auditors focus their efforts when evaluating group life insurance claims.
thus reduces the crime rate. This can be used in other states or countries depending upon
the availability of the dataset.
3.2.1. ADVANTAGES
CHAPTER 4
SYSTEM SPECIFICATION
4.1 HARDWARE REQUIREMENTS
Processor : Intel Core processor, 2.6 GHz
RAM : 4 GB
Hard disk : 320 GB
Compact Disk : 650 MB
Keyboard : Standard keyboard
Monitor : 15 inch color monitor
CHAPTER 5
SYSTEM IMPLEMENTATION
LIST OF MODULES
A crime dataset from Kaggle, having 8000 entries of crime data, is used in CSV format.
The null values are removed using df = df.dropna(), where df is the data frame. The
categorical attributes (Location, Block, Crime Type, and Community Area) are converted
into numeric form using LabelEncoder. The date attribute is split into new attributes like
month and hour, which can be used as features for the model.
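A rough sketch of this preprocessing is given below; the column names follow the description above, but the exact dataset schema is an assumption.

# Sketch of the preprocessing module: drop nulls, encode categorical columns, derive month and hour.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("crime.csv")      # Kaggle crime dataset in CSV format (assumed path)
df = df.dropna()                   # remove null values

le = LabelEncoder()
for col in ["Location", "Block", "Crime Type", "Community Area"]:
    df[col] = le.fit_transform(df[col].astype(str))    # categorical -> numeric

df["Date"] = pd.to_datetime(df["Date"])
df["Month"] = df["Date"].dt.month   # new features split from the date attribute
df["Hour"] = df["Date"].dt.hour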
Feature selection is done to decide which attributes are used to build the model. The
attributes considered for feature selection are Block, Location, District, Community Area,
X coordinate, Y coordinate, Latitude, Longitude, Hour, and Month.
After feature selection, the Location and Month attributes are used for training. The dataset
is divided into the pairs (xtrain, ytrain) and (xtest, ytest). The algorithm's model is imported
from sklearn, and the model is built using model.fit(xtrain, ytrain).
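A sketch of this training module is shown below, continuing from the data frame df built in the preprocessing sketch above; the KNN classifier is only a stand-in for whichever sklearn model is chosen.

# Sketch of the training module: split the data and fit a scikit-learn model.
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X = df[["Location", "Month"]]      # features used for training, as described above
y = df["Crime Type"]               # assumed target column (already label-encoded)

xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2, random_state=0)
model = KNeighborsClassifier()
model.fit(xtrain, ytrain)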
Prediction Module
After the model is built using the above process, prediction is done using
model.predict(xtest). The accuracy is calculated using accuracy_score imported from
sklearn.metrics: metrics.accuracy_score(ytest, predicted).
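Continuing from the variables above, the prediction module reduces to a few lines:

# Sketch of the prediction module: predict on the test split and report accuracy.
from sklearn import metrics

predicted = model.predict(xtest)
print("accuracy:", metrics.accuracy_score(ytest, predicted))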
Visualization Module
Using the matplotlib library, analysis of the crime dataset is done by plotting various
graphs.
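For example, crimes per month could be plotted as a bar chart (a small illustrative sketch, assuming the Month column created earlier):

# Sketch of the visualization module: bar chart of crime counts per month.
import matplotlib.pyplot as plt

counts = df["Month"].value_counts().sort_index()
counts.plot(kind="bar")
plt.xlabel("month")
plt.ylabel("number of crimes")
plt.title("Crimes per month")
plt.show()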
CHAPTER 6
SYSTEM DESIGN
A data flow diagram shows the way information flows through a process or system. It
includes data inputs and outputs, data stores, and the various sub processes the data
moves through. DFDs are built using standardized symbols and notation to describe
various entities and their relationships. Data flow diagrams visually represent systems
and processes that would be hard to describe in a chunk of text. You can use these
diagrams to map out an existing system and make it better or to plan out a new system
for implementation. Visualizing each element makes it easy to identify inefficiencies
and produce the best possible system.
It is also known as a context diagram. It’s designed to be an abstraction view, showing
the system as a single process with its relationship to external entities. It represents the
entire system as a single bubble with input and output data indicated by
incoming/outgoing arrows.
In 1-level DFD, the context diagram is decomposed into multiple bubbles/processes. In
this level, we highlight the main functions of the system and breakdown the high-level
process of 0-level DFD into sub processes.
2-level DFD goes one step deeper into parts of 1-level DFD. It can be used to plan or
record the specific/necessary detail about the system’s functioning.
6.2.1 LEVEL 0
[Level 0 DFD: the system is shown as a single process, with crime data stored on the database.]
6.2.2 LEVEL 1
[Level 1 DFD: the main processes are crime prediction using the DBSCAN algorithm, followed by performance evaluation.]
6.3 UML DIAGRAMS
6.3.2 CLASS DIAGRAM
6.3.3 SEQUENCE DIAGRAM
6.3.4 COLLABORATION DIAGRAM
6.3.5 ACTIVITY DIAGRAM
CHAPTER 7
SOFTWARE DESCRIPTION
7.1 PYTHON
Python interpreters are available for many operating systems. CPython, the
reference implementation of Python, is open source software and has a community-
based development model, as do nearly all of Python's other implementations. Python
and CPython are managed by the non-profit Python Software Foundation. Rather than
having all of its functionality built into its core, Python was designed to be highly
extensible. This compact modularity has made it particularly popular as a means of
adding programmable interfaces to existing applications.
Van Rossum's vision of a small core language with a large standard library and
easily extensible interpreter stemmed from his frustrations with ABC, which espoused
the opposite approach. While offering choice in coding methodology, the Python
philosophy rejects exuberant syntax (such as that of Perl) in favor of a simpler, less-
cluttered grammar. As Alex Martelli put it: "To describe something as 'clever' is not
considered a compliment in the Python culture." Python's philosophy rejects the Perl
"there is more than one way to do it" approach to language design in favour of "there
should be one—and preferably only one—obvious way to do it".
Python code can also be run by a just-in-time compiler. Cython is also available, which
translates a Python script into C and makes direct C-level API calls into the Python
interpreter. An important goal of Python's developers is keeping it fun to use. This is
reflected in the language's name, a tribute to the British comedy group Monty Python, and
in occasionally playful approaches to tutorials and reference materials, such as examples
that refer to spam and eggs (from a famous Monty Python sketch) instead of the standard
foo and bar.
Often, the quickest way to debug a program is to add a few print statements to the source:
the fast edit-test-debug cycle makes this simple approach very effective.
Python’s initial development was spearheaded by Guido van Rossum in the late
1980s. Today, it is developed by the Python Software Foundation. Because Python is a
multi-paradigm language, Python programmers can accomplish their tasks using different
styles of programming: object-oriented, imperative, functional, or reflective.
Python can be used in Web development, numeric programming, game development,
serial port access and more.
There are two attributes that make development time in Python faster than in other
programming languages:
1. There is no separate compilation step, so the edit-test-debug cycle is very fast.
2. Python code tends to be shorter than comparable code. Although Python offers
fast development times, it lags slightly in terms of execution time. Compared to
fully compiling languages like C and C++, Python programs execute slower. Of
course, with the processing speeds of computers these days, the speed
differences are usually only observed in benchmarking tests, not in real-world
operations. In most cases, Python is already included in Linux distributions and
Mac OS X machines.
Python is commonly used for:
• Fast prototyping
• Developing production-ready software
Professionally, Python is great for backend web development, data analysis, artificial
intelligence, and scientific computing. Developers also use Python to build productivity
tools, games, and desktop apps.
Python Syntax
Python Flexibility
Python, a dynamically typed language, is especially flexible, eliminating hard rules for
building features and offering more problem-solving flexibility with a variety of
methods. It also allows users to compile and run programs right up to a problematic area
because it uses run-time type checking rather than compile-time checking.
On the down side, Python isn’t easy to maintain. One command can have multiple
meanings depending on context because Python is a dynamically typed language. And,
maintaining a Python app as it grows in size and complexity can be increasingly
difficult, especially finding and fixing errors. Users will need experience to design code
or write unit tests that make maintenance easier. Speed is another weakness in Python.
Its flexibility, because it is dynamically typed, requires a significant amount of
referencing to land on a correct definition, slowing performance. This can be mitigated
by using alternative implementations of Python.
Python and AI
AI researchers are fans of Python. Google TensorFlow, as well as other libraries
(scikit-learn, Keras), establish a foundation for AI development because of the usability
and flexibility they offer Python users. These libraries, and their availability, are critical
because they enable developers to focus on growth and building.
Good to Know
The Python Package Index (PyPI) is a repository of software for the Python
programming language. PyPI helps users find and install software developed and shared
by the Python community.
Is Python easy to learn? Yes and no. When compared to other programming languages,
such as Java or C, Python is easy to learn. One aspect of the language that makes Python
easy to learn is that its syntax mimics human-readable language. The fact that its syntax
is designed to be clear, concise and easy to read eliminates many errors and softens the
learning curve.
Python also has a large standard library with prewritten code, which reduces the need
to write every line of code. Python’s supportive community and abundance of learning
resources also help make the language more friendly to newcomers.
But while many coders consider Python easy to learn, becoming proficient in any
programming language is challenging. Software development takes time, patience and
problem-solving skills.
Python Is a Scripting Language
Usage of Python
Python is a popular language choice for web and software development. Frameworks
like Django and Flask make it easier to create robust and scalable web applications. But
in many cases, other tools can work as a replacement for Python.
Where Python really stands out is in the big data ecosystem. Python is often used for
data science and analytics, scientific computing, machine learning (ML) and artificial
intelligence (AI). Because of this, Python’s ecosystem is rich with libraries such as
Pandas, NumPy and Matplotlib that enable data manipulation and analysis. Tools such
as TensorFlow, PyTorch, scikit-learn and Keras dominate the ML/AI space.
The Python language is dynamically typed, meaning the variable types don’t have to
be explicitly specified. However, variables do have types and their types matter. The
Python interpreter checks variable types at runtime: this makes the language also
strongly typed. The Python interpreter is a program that reads and executes Python
code. The interpreter interprets the developer-written source code into computer
hardware readable form. There are several implementations of the Python interpreter,
the standard-bearer and most popular being CPython.
Macs now include the Python language with every operating system. To confirm
whether a Mac has Python included, open the Terminal and type “python --version”.
Either a version number or “file not found” will appear. To install Python on a Mac,
visit the Python website at python.org.
Python programming fundamentals include several different data types and data
structures. Python’s primitive data types are basic data types that represent single values
with no methods or attributes.
Python’s data structures organize complex information and store varied types of data.
Lists are ordered collections of data and can include data of any type. The list’s “order”
refers to the list’s indices, not the arrangement of elements inside the list.
Tuples are immutable lists. Tuples can’t be changed after they’re created.
Sets are collections of unique elements. Sets automatically remove repeated terms if
they were previously included in the set. Try copying and pasting the following code
into an IDE to see how duplicate items are removed.
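A minimal illustration of this behaviour (the values are arbitrary):

# Duplicates are removed automatically when a set is created.
crimes = ["theft", "assault", "theft", "robbery", "assault"]
unique_crimes = set(crimes)
print(unique_crimes)   # each item appears only once, e.g. {'theft', 'assault', 'robbery'}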
7.2 MYSQL
MySQL is, as of 2008, the world's most used open-source relational database management
system (RDBMS); it runs as a server providing multi-user access to a number of databases.
The MySQL development project has made its source code available under the terms of
the GNU General Public License, as well as under a variety of proprietary agreements.
MySQL was owned and sponsored by a single for-profit firm, the Swedish company
MySQL AB, now owned by Oracle Corporation.
MySQL is a central component of the LAMP open-source web application software
stack—LAMP is an acronym for "Linux, Apache, MySQL, Perl/PHP/Python."
Free-software open-source projects that require a full-featured database management
system often use MySQL. For commercial use, several paid editions are available and offer
additional functionality. Applications which use MySQL databases include: TYPO3,
Joomla, WordPress, phpBB, MyBB, Drupal and other software built on the software
stack. MySQL is also used in many high-profile, large-scale World Wide Web products,
including Wikipedia, Google (though not for searches), Facebook, Twitter, Flickr,
Nokia.com, and YouTube.
Interfaces
Graphical
phpMyAdmin is web-based software that allows you to interact with your MySQL
databases. phpMyAdmin is an easy way to run MySQL commands as well as
database operations, like browsing and changing data tables, or importing, exporting,
or deleting data. It especially is useful when you want to perform maintenance on data
and backup or edit information in the case of WordPress itself not functioning properly.
However, keep in mind that phpMyAdmin won’t work properly if the database is
misconfigured or broken.
MySQL Features
Easy to use: MySQL is easy to use. We have to get only the basic knowledge of SQL.
We can build and interact with MySQL by using only a few simple SQL statements.
It is secure: MySQL consists of a solid data security layer that protects sensitive data
from intruders. Also, passwords are encrypted in MySQL.
Free to download: MySQL is free to use so that we can download it from MySQL
official website without any cost.
Scalable: MySQL can handle almost any amount of data; the default file size limit is about
4 GB. However, we can increase this number to a theoretical limit of 8 TB of data.
Speed: MySQL is considered one of the very fast database languages, backed by a large
number of the benchmark test.
Allows roll-back: MySQL allows transactions to be rolled back, commit, and crash
recovery.
Memory efficiency: Its efficiency is high because it has a very low memory leakage
problem.
High Performance: MySQL is faster, more reliable, and cheaper because of its unique
storage engine architecture. It provides very high-performance results in comparison to
other databases without losing an essential functionality of the software. It has fast
loading utilities because of the different cache memory.
High Productivity: MySQL uses Triggers, Stored procedures, and views that allow
the developer to give higher productivity.
Platform Independent: It can download, install, and execute on most of the available
operating systems.
Partitioning: This feature improves the performance and provides fast management of
the large database.
Disadvantages/Drawback of MySQL
MySQL versions earlier than 5.0 do not support ROLE, COMMIT, and stored
procedures.
MySQL does not support a very large database size as efficiently.
MySQL doesn't handle transactions very efficiently, and it is prone to data
corruption.
It is often said that MySQL does not have good development and debugging tools
compared to paid databases.
MySQL doesn't support SQL check constraints.
Architecture of MySQL
MySQL was developed and supported by MySQL AB, a Swedish company, and is written
in the C and C++ programming languages. It was created by Michael Widenius and David
Axmark. It is often said that MySQL is named after ‘My’, the daughter of co-founder
Michael Widenius.
Architecture of MySQL:
Client Layer:
This layer is the topmost layer in the architecture. The client gives request instructions to
the server with the help of the client layer. The client makes requests through the command
prompt or through a GUI screen by using valid MySQL commands and expressions. If the
expressions and commands are valid, then the output is obtained on the screen. Some
important services of the client layer are:
Connection Handling:
When a client sends a request, the server accepts the request and the client is connected.
When the client is connected to the server, the client gets its own thread for its connection.
With the help of this thread, all the queries from the client side are executed.
Authentication:
The server authenticates the connecting client by checking the username, the host it
connects from, and the password.
Security:
After authentication, when the client is connected successfully to the MySQL server, the
server checks whether that particular client has the privileges to issue certain queries
against the MySQL server.
Server Layer:
The second layer of the MySQL architecture is responsible for all the logical functionality
of the relational database management system. This layer is also known as the “brain of
the MySQL architecture”. The client gives request instructions to the server, and the server
returns the output as soon as the instruction is matched. The various subcomponents of the
MySQL server are:
Thread Handling:
When a client sends a request, the server accepts the request and the client is connected.
When the client is connected to the server, the client gets its own thread for its connection.
This thread is provided by the thread-handling component of the server layer, which also
manages the queries executed by that thread.
Parser:
The parser checks the syntax of the incoming query and builds a parse tree for the later
stages to work on.
Optimizer:
As soon as the parsing is done, various types of optimization techniques are applied at the
optimizer block. These techniques may include rewriting the query, deciding the order in
which tables are scanned, and choosing the right indexes to use.
Query Cache:
The query cache stores the complete result set for an input query statement. Even before
parsing, the MySQL server consults the query cache. When a client writes a query, if an
identical query is present in the cache, the server simply skips parsing, optimization, and
even execution, and just displays the output from the cache. The cache and buffer store the
previous queries asked by the user. When the user writes a query, it first goes to the query
cache, which checks whether the same query is available in the cache; if it is, the result is
returned without involving the parser or optimizer.
The metadata cache is a reserved area of memory used for tracking information on
databases, indexes, or objects. The greater the number of open databases, indexes, or
objects, the larger the metadata cache size.
Key Cache:
The key cache (key buffer) holds frequently used index blocks in memory so that repeated
index lookups do not have to go to disk; the larger the key cache, the faster index-based
queries can run.
CHAPTER 8
SYSTEM TESTING
Testing Objectives:
There are several rules that can serve as testing objectives; they are:
1. Testing is a process of executing a program with the intent of finding an error.
2. A good test case is one that has a high probability of finding an undiscovered
error.
The development process involves various types of testing. Each test type
addresses a specific testing requirement. The most common types of testing involved
in the development process are:
Unit Test
Functional Test
Integration Test
White box Test
Black box Test
System Test
Validation Test
Acceptance Test
The first test in the development process is the unit test. The source code is
normally divided into modules, which in turn are divided into smaller units called units.
These units have specific behavior. The test done on these units of code is called unit
test. Unit test depends upon the language on which the project is developed. Unit tests
ensure that each unique path of the project performs accurately to the documented
specifications and contains clearly defined inputs and expected results.
Functional test can be defined as testing two or more modules together with
the intent of finding defects, demonstrating that defects are not present, verifying that
the module performs its intended functions as stated in the specification and
establishing confidence that a program does what it is supposed to do.
In integration testing modules are combined and tested as a group. Modules are
typically code modules, individual applications, source and destination applications on
a network, etc. Integration Testing follows unit testing and precedes system testing.
Beta testing is testing done after the product is code complete. Betas are often widely
distributed or even distributed to the public at large in hopes that they will buy the final
product when it is released.
8.2.5 Black Box Testing:
In black box testing, the item is tested without knowledge of its internal workings. Tests
are usually functional. This testing can be done by a user who has no knowledge of how
the prediction is computed internally.
System testing is defined as testing of a complete and fully integrated software product.
This testing falls in black-box testing wherein knowledge of the inner design of the
code is not a pre-requisite and is done by the testing team. It is the final test to verify
that the product to be delivered meets the specifications mentioned in the requirement
document. It should investigate both functional and non-functional requirements.
The process of evaluating software during the development process or at the end of
the development process to determine whether it satisfies specified business
requirements. Validation Testing ensures that the product actually meets the client's
needs. It can also be defined as to demonstrate that the product fulfils its intended use
when deployed on appropriate environment.
CHAPTER 9
9.1 CONCLUSION
This project focused on building predictive models for crime frequencies per crime type
per month. The crime rates in India are increasing day by day due to many factors such as
the increase in poverty, corruption, etc. The proposed model is very useful for both the
investigating agencies and police officials in taking the necessary steps to reduce crime.
The project helps crime analysts to analyse these crime networks by means of various
interactive visualizations. A future enhancement of this research work is training bots to
predict crime-prone areas using machine learning techniques. Since machine learning is
closely related to data mining, advanced machine learning concepts can be used for better
prediction. Data privacy, reliability, and accuracy can be improved for enhanced prediction.
9.2 FUTURE ENHANCEMENT
Crime analysis takes past crime data to predict future crime locations and times. Crime
prediction is a process that finds out how the crime rate changes from one year to the next
and projects those changes into the future. Crime predictions can be made through both
qualitative and quantitative methods.
APPENDIX 1
(SAMPLE CODE)
import numpy as np
import pandas as pd
from flask import Flask, request, jsonify, render_template, redirect, flash, send_file
from sklearn.preprocessing import MinMaxScaler
from sklearn.naive_bayes import GaussianNB
import pickle

app = Flask(__name__)  # Initialize the Flask app
model = pickle.load(open('model.pkl', 'rb'))  # trained model saved earlier with pickle

@app.route('/')
@app.route('/first')
def first():
    return render_template('first.html')

@app.route('/login')
def login():
    return render_template('login.html')

@app.route('/upload')
def upload():
    return render_template('upload.html')

@app.route('/preview', methods=["POST"])
def preview():
    # Read the uploaded CSV file and show it on the preview page.
    if request.method == 'POST':
        dataset = request.files['datasetfile']
        df = pd.read_csv(dataset, encoding='unicode_escape')
        df.set_index('Id', inplace=True)
        return render_template("preview.html", df_view=df)

#@app.route('/home')
#def home():
#    return render_template('home.html')

@app.route('/prediction', methods=['GET', 'POST'])
def prediction():
    return render_template('prediction.html')

#@app.route('/upload')
#def upload_file():
#    return render_template('BatchPredict.html')

@app.route('/predict', methods=['POST'])
def predict():
    # Collect the submitted form values, cast them to floats and run a single prediction.
    int_feature = [x for x in request.form.values()]
    print(int_feature)
    int_feature = [float(i) for i in int_feature]
    final_features = [np.array(int_feature)]
    prediction = model.predict(final_features)
    output = format(prediction[0])
    print(output)
    return render_template('prediction.html', prediction_text=output)

@app.route('/chart')
def chart():
    return render_template('chart.html')

@app.route('/performance')
def performance():
    return render_template('performance.html')

if __name__ == "__main__":
    app.run(debug=True)
APPENDIX 2
(SCREEN SHOT)
REFERENCES
1. Heinold, Brian. "A practical introduction to Python programming." (2021).
2. Kneusel, Ronald T. Practical deep learning: A Python-based introduction. No
Starch Press, 2021.
3. Dhruv, Akshit J., Reema Patel, and Nishant Doshi. "Python: the most advanced
programming language for computer science applications." Science and
Technology Publications, Lda (2021): 292-299.
4. Sundnes, Joakim. Introduction to scientific programming with Python. Springer
Nature, 2020.
5. Hill, Christian. Learning scientific programming with Python. Cambridge
University Press, 2020.