Data Mining Final

Data Mining Analysis for Caste-Based Crimes

PROJECT
INTRODUCTION
The data mining analysis on caste-based crimes in India encompasses a comprehensive exploration of crime patterns,
clustering, and forecasting over the period 2001-2012. The multi-step analysis involves data preparation, exploratory data
analysis (EDA), feature engineering, data mining techniques, association rule mining, time series analysis, and geospatial
analysis.
KEY FINDINGS
• Data Preparation and Feature Engineering
• Exploratory Data Analysis (EDA)
• Data Mining Techniques
• Association Rule Mining
• Time Series Analysis
• Geospatial Analysis
• Conclusion
ALGORITHMIC APPROACH
• Data Preparation
• Random Forest Model
• Cluster Assignment
• Visualization
• Conclusion
DATA PREPROCESSING
• Collected the data sheets.
• Normalized the numerical variables.
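
The normalization step can be sketched as below. The source does not say which scaling was used, so min-max scaling to [0, 1] is an assumption, and the function name and sample values are hypothetical placeholders rather than the project's data.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1] (assumed scaling method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical case counts for one crime category (placeholder values).
murder_cases = [10, 25, 40, 55]
normalized = min_max_normalize(murder_cases)
print(normalized)
```

Scaling each numerical column this way keeps large categories from dominating distance-based steps such as clustering.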

FEATURE ENGINEERING
The twelve-year overview of caste-based crimes in India from 2001 to 2012 reveals varying patterns in crime rates. Rates were
broadly stable between 2001 and 2004, apart from a slight dip in 2002 and a substantial 24% decrease in 2003. From 2006 onwards, a
general upward trend emerged, with the year-over-year increase peaking at 12.92% in 2008. Fluctuations persisted in subsequent
years. The highest number of caste-based crime cases occurred in 2011, and the lowest in 2003. Uttar Pradesh reported the highest
overall number of cases, with Andhra Pradesh also among the leading states at 43,419 incidents. States and union territories such as
Arunachal Pradesh, Manipur, Meghalaya, Mizoram, Lakshadweep, and A & N Islands reported minimal or no cases. The Excel sheet
"Crime Rate State-wise" provides detailed year-wise data on crime statistics across Indian states and union territories, offering
insights for policymakers and law enforcement to formulate targeted strategies for crime reduction.
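
The year-over-year percentages and the highest and lowest years quoted above are the kind of figures the following sketch produces. The yearly totals are placeholders, not the project's figures, and the function name is hypothetical.

```python
def year_over_year_change(totals_by_year):
    """Return {year: percent change from the previous year}.

    Assumes every previous-year total is non-zero.
    """
    years = sorted(totals_by_year)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        before, after = totals_by_year[prev], totals_by_year[curr]
        changes[curr] = (after - before) / before * 100.0
    return changes


# Placeholder totals, NOT the values from the "Crime Rate State-wise" sheet.
totals = {2001: 1000, 2002: 950, 2003: 760, 2004: 800}
changes = year_over_year_change(totals)
for year, pct in changes.items():
    print(f"{year}: {pct:+.2f}%")

# Years with the highest and lowest totals.
peak_year = max(totals, key=totals.get)
low_year = min(totals, key=totals.get)
print("highest:", peak_year, "lowest:", low_year)
```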
EXPLORATORY DATA ANALYSIS
The mean number of caste-based crime cases in India from 2001 to 2012 is 30,852.83, with a median of 30,031. An analysis of the mean of
various crime types reveals that, on average, there were 19 murder cases, 38 cases of assault on women, 11 cases of kidnapping
and abduction, 1 case of dacoity, 2 cases of robbery, 6 cases of arson, 117 cases of hurt, 296 cases under the Prevention of
Atrocities (POA) Act, 10 cases under the Protection of Civil Rights (PCR) Act, and 380 cases of other crimes against Scheduled
Castes (SCs). The overall mean of all these crimes is 88. State-wise data and histograms indicate variations in crime rates, with
Goa reporting the lowest mean values for several crime categories, and Uttar Pradesh standing out with the highest mean values for
murder and the Prevention of Atrocities Act. Madhya Pradesh records the highest mean for hurt and assault on women. These
insights emphasize the need for targeted law enforcement and intervention strategies to address varying crime trends across regions
in India.
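
A minimal sketch of how such means and medians might be computed, grouped by crime type and by state. The records are hypothetical placeholders, and the row layout (state, crime type, cases) is an assumption about how the data sheet is organized.

```python
from statistics import mean, median

# Placeholder rows: (state, crime type, cases). Hypothetical values only.
records = [
    ("State A", "Murder", 30), ("State A", "Hurt", 200),
    ("State B", "Murder", 10), ("State B", "Hurt", 100),
    ("State C", "Murder", 20), ("State C", "Hurt", 60),
]


def mean_by(rows, key_index):
    """Average the case counts, grouped by the column at key_index."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key_index], []).append(row[2])
    return {key: mean(vals) for key, vals in groups.items()}


mean_per_crime = mean_by(records, 1)   # average cases for each crime type
mean_per_state = mean_by(records, 0)   # average cases for each state
overall_median = median(row[2] for row in records)

# State with the highest murder count in the placeholder data.
murder_rows = [row for row in records if row[1] == "Murder"]
highest_murder_state = max(murder_rows, key=lambda row: row[2])[0]

print(mean_per_crime, mean_per_state, overall_median, highest_murder_state)
```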
DATA MINING TECHNIQUE
The provided Python code uses the Random Forest clustering algorithm to categorize hate
crime cases across states and union territories. After reshaping the dataset, the code applies Random
Forest clustering with three clusters, assigning each state a label indicative of its hate crime profile.
The original data, comprising states and their total cases, is printed alongside the assigned
cluster labels, and the cluster centres are identified. The code thus classifies states
into three groups: High, Medium, and Low hate crime cases. For instance, Bihar and Uttar Pradesh
fall into the High category, while Maharashtra, Madhya Pradesh, and others are in the Medium category.
States such as Andhra Pradesh and Arunachal Pradesh are classified as Low, with specific case counts
provided.
Although the code implies a visualization step, the plotting code itself is not included. The data variable
must be defined with the relevant state information before the code is run, or it will raise errors.
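
The original code is not reproduced in this report, so the following is only a sketch of the three-group assignment. Random Forest is ordinarily a supervised method and does not itself yield cluster centres; the behaviour described (three clusters on total cases, with centres) matches a centroid-based method, so this sketch swaps in a small one-dimensional k-means (Lloyd's algorithm). That substitution, the function name, and the state totals are all assumptions.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Lloyd's algorithm on a list of numbers; returns (labels, centres).

    Requires k >= 2. Swapped-in technique, not the project's own code.
    """
    ordered = sorted(values)
    # Deterministic start: spread the initial centres across the sorted values.
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break  # converged
        centres = new_centres
    return labels, centres


# Placeholder totals per state (hypothetical, not the project's data).
total_cases = {
    "State A": 70000, "State B": 68000,
    "State C": 30000, "State D": 28000, "State E": 26000,
    "State F": 500, "State G": 300, "State H": 0,
}
labels, centres = kmeans_1d(list(total_cases.values()), k=3)

# Rank clusters by centre so the group names follow the case counts.
ranked = sorted(range(3), key=lambda c: centres[c])
names = dict(zip(ranked, ["Low", "Medium", "High"]))
groups = {state: names[lab] for state, lab in zip(total_cases, labels)}

for state, group in groups.items():
    print(state, total_cases[state], group)
print("Cluster centres:", sorted(centres))
```

Naming the clusters by ranking their centres avoids relying on arbitrary label numbers, which can differ from run to run in library implementations.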
OUTPUT AFTER APPLYING CLUSTERING ALGORITHM
• States with the highest number of cases reported: Bihar, Uttar Pradesh
• States with a medium number of cases reported: Maharashtra, Madhya Pradesh, Rajasthan, Gujarat,
Karnataka, Odisha, Tamil Nadu
• States with the lowest number of cases reported: Andhra Pradesh, Arunachal Pradesh, Assam, Chhattisgarh, Goa,
Haryana, Himachal Pradesh, Jammu & Kashmir, Jharkhand, Kerala, Manipur, Meghalaya, Mizoram, Nagaland, Punjab,
Sikkim, Tripura, Uttarakhand, West Bengal, A & N Islands, Chandigarh, D & N Haveli, Daman & Diu, Delhi,
ASSOCIATION RULE MINING
The Python code employs the mlxtend library to conduct association rule mining on a dataset
featuring crime statistics across various states or union territories. The dataset encompasses
diverse crime types, including arson, assault, robbery, kidnapping, and murder. The code
preprocesses the data by converting numeric values to binary form and subsequently applies
the Apriori algorithm for association rule mining. Frequent itemsets and associated rules are
generated based on specified support and confidence thresholds, revealing patterns of
co-occurring crimes. The code categorizes states into High, Medium, and Low crime cases,
presenting the most frequently occurring crimes and their support values for each category. For
instance, in states with High crime cases, assaults, arson, and other crimes against SCs are
prominent, while states with Medium and Low crime cases exhibit different combinations of
crimes, providing insights into crime associations within the dataset.
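
The binarize-then-mine procedure described above can be illustrated without external dependencies. This sketch is a plain-Python, Apriori-style level-wise search, not the mlxtend calls the project used; the function name, the "count greater than zero" binarization rule, the threshold, and the counts are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search; returns {frozenset: support}."""
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    result = {}
    while current:
        kept = {s: support(s) for s in current}
        kept = {s: sup for s, sup in kept.items() if sup >= min_support}
        result.update(kept)
        # Candidates one item larger, built only from surviving itemsets.
        current = list({a | b for a, b in combinations(kept, 2)
                        if len(a | b) == len(a) + 1})
    return result


# Placeholder counts per state (hypothetical values).
counts = {
    "State A": {"Murder": 12, "Arson": 3, "Hurt": 40},
    "State B": {"Murder": 5, "Arson": 0, "Hurt": 22},
    "State C": {"Murder": 0, "Arson": 1, "Hurt": 9},
    "State D": {"Murder": 7, "Arson": 2, "Hurt": 0},
}
# Binarize: a crime type is "present" in a state when its count is above zero.
transactions = [{crime for crime, c in row.items() if c > 0} for row in counts.values()]

itemsets = frequent_itemsets(transactions, min_support=0.5)
for itemset, sup in sorted(itemsets.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), sup)

# Confidence of the rule Murder -> Hurt: support(both) / support(antecedent).
confidence = itemsets[frozenset({"Murder", "Hurt"})] / itemsets[frozenset({"Murder"})]
print("confidence(Murder -> Hurt) =", round(confidence, 3))
```

Running the same search separately on the High, Medium, and Low groups of states would give per-group lists of frequent crimes with their support values.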
OUTPUT AFTER APPLYING ASSOCIATION RULE MINING
• The most frequently occurring crimes in states with High crime cases:
(Assault), (Arson), (Other Crimes Against SCs), (Prevention of Atrocities (POA) Act)
• The most frequently occurring crimes in states with Medium crime cases:
(Assault), (Arson), (Murder), (Other Crimes Against SCs), (Hurt)
• The most frequently occurring crimes in states with Low crime cases:
(Other Crimes Against SCs), (Murder), (Hurt), (Assault), (Prevention of Atrocities (POA) Act)
TIME SERIES ANALYSIS
For time series forecasting, model selection involves considering the characteristics of the data, and ARIMA
models, encompassing Autoregressive (AR), Integrated (I), and Moving Average (MA) components, are
commonly employed. The order of the ARIMA model can be determined using tools like autocorrelation
function (ACF) and partial autocorrelation function (PACF) plots. Following model selection, the data is split
into training and testing sets, with the former utilized for model training and the latter for performance
evaluation. Metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared
Error (RMSE) are commonly used for model evaluation. The trained model is then applied to forecast future
data points.
Forecasting for the next three years
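
The workflow described above (chronological split, error metrics, then a three-step forecast) can be sketched as follows. For brevity the sketch fits only an AR(1) model, that is ARIMA(1,0,0), by ordinary least squares. Order selection from ACF and PACF plots, differencing, and moving-average terms are left out and would normally be handled by a statistics library. The yearly totals are placeholders and all function names are hypothetical.

```python
from math import sqrt


def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi)."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx
    return my - phi * mx, phi


def forecast(last_value, c, phi, steps):
    """Iterate the fitted recursion forward from the last observed value."""
    out = []
    for _ in range(steps):
        last_value = c + phi * last_value
        out.append(last_value)
    return out


def error_metrics(actual, predicted):
    """Mean absolute error, mean squared error, and root mean squared error."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return {"MAE": mae, "MSE": mse, "RMSE": sqrt(mse)}


# Placeholder yearly totals for 2001-2012 (hypothetical values).
series = [300, 310, 290, 305, 320, 335, 330, 345, 360, 350, 365, 380]

# Chronological split: earlier years train the model, later years test it.
train, test = series[:9], series[9:]
c, phi = fit_ar1(train)
metrics = error_metrics(test, forecast(train[-1], c, phi, len(test)))
print(metrics)

# Refit on the full series and forecast the next three years.
c_full, phi_full = fit_ar1(series)
future = forecast(series[-1], c_full, phi_full, steps=3)
print(future)
```

The split is chronological rather than random because shuffling a time series would let the model see values from after the period it is being tested on.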
GEOSPATIAL ANALYSIS
To visually represent the geographical distribution of reported crime cases across different States and
Union Territories (UT) in India, latitude and longitude data can be incorporated to create heat maps or
choropleth maps. These maps provide an intuitive way to identify crime hotspots and regional
variations. For instance, the State of Uttar Pradesh has reported the highest number of cases, totaling
76,473, making it a potential hotspot. Similarly, Rajasthan, with 55,002 cases, and Madhya Pradesh,
with 49,810 cases, stand out as regions of concern. On the other hand, some areas, such as Arunachal
Pradesh, Manipur, Meghalaya, Mizoram, and Lakshadweep, report relatively low or no crime cases.
By leveraging geographical data, these visualizations can offer valuable insights into the spatial
distribution of crime, aiding law enforcement.
MAP ANALYSIS
CONCLUSION
In conclusion, the extensive data mining analysis on caste-based crimes in India has
illuminated significant insights into the dynamics of criminal activities over the past decade.
Through a multifaceted approach encompassing clustering, association rule mining, time
series analysis, and geospatial examination, the study revealed nuanced patterns in crime
rates, regional variations, and potential hotspots. The findings not only offer a
comprehensive understanding of the prevailing trends but also provide a solid foundation for
informed decision-making by policymakers, law enforcement agencies, and researchers. By
synthesizing statistical analyses, machine learning methodologies, and geographical
perspectives, this study contributes to the development of targeted strategies and
interventions aimed at mitigating the impact of caste-based crimes and fostering a safer and
more equitable society.
