DOI: 10.1145/3331076.3331114
research-article

Exploratory data analysis and crime prediction for smart cities

Published: 10 June 2019

Abstract

Crime has long been prevalent in society and remains so today. Many cities now release crime-related data as part of open data initiatives. Using this data as input, analytics can be applied to predict, and hopefully prevent, future crime. In this work, we apply big data analytics to the San Francisco crime dataset, collected by the San Francisco Police Department and available through the Open Data initiative. The main focus is an in-depth analysis of the major types of crime that occurred in the city, the trends over the years, and how various attributes contribute to specific crimes. We then use the results of this exploratory data analysis to inform data preprocessing before training several machine learning models for crime type prediction. More specifically, the model predicts the type of crime that will occur in each district of the city. We observe that the dataset is highly imbalanced, so the metrics used in previous research mainly reflect performance on the majority class and disregard how the classifiers perform on minority classes; we propose a methodology to address this issue. The proposed model has applications in allocating law enforcement resources in a Smart City.
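The point about imbalance can be illustrated with a small sketch. This is not the methodology proposed in the paper, which the abstract does not detail; it only shows, on made-up labels, why a metric dominated by the majority class can hide poor minority-class performance. The category names, the class counts, and the helper functions `accuracy`, `per_class_recall`, and `macro_recall` are all assumptions chosen for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall for each true class: correct predictions / samples of that class."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / support[label] for label in support}


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall, so every class counts equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical, heavily imbalanced toy labels (not taken from the dataset).
y_true = ["THEFT"] * 90 + ["ASSAULT"] * 8 + ["ARSON"] * 2

# A degenerate classifier that always predicts the majority class.
y_pred = ["THEFT"] * len(y_true)

print("accuracy:        ", accuracy(y_true, y_pred))
print("per-class recall:", per_class_recall(y_true, y_pred))
print("macro recall:    ", macro_recall(y_true, y_pred))
```

In this toy setting the always-majority classifier scores high accuracy while never identifying a single minority-class incident, which the per-class and macro-averaged figures make visible.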


Published In

IDEAS '19: Proceedings of the 23rd International Database Applications & Engineering Symposium
June 2019
364 pages
ISBN:9781450362498
DOI:10.1145/3331076
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. crime prediction model
  2. multiclass classification
  3. predictive analytics
  4. smart city

Qualifiers

  • Research-article

Conference

IDEAS 2019

Acceptance Rates

Overall Acceptance Rate 74 of 210 submissions, 35%


Cited By

  • (2023) A Design of Crime Category Detection Framework using Stacking Ensemble Model (Suç Kategorisi Tespiti için Yığınlama Topluluk Öğrenimi Modeli Kullanan Çatı Tasarımı). Çukurova Üniversitesi Mühendislik Fakültesi Dergisi 38:4, pp. 1035-1048. DOI: 10.21605/cukurovaumfd.1410642. Online publication date: 28-Dec-2023
  • (2023) Mitigating Imbalanced Data in Online Social Networks using Stratified K-Means Sampling. 2023 8th International Conference on Business and Industrial Research (ICBIR), pp. 883-888. DOI: 10.1109/ICBIR57571.2023.10147677. Online publication date: 18-May-2023
  • (2023) UniMHe: Unified Multi Hyperedge Prediction A Case Study on Crime Dataset. 2023 IEEE International Conference on Big Data (BigData), pp. 5134-5139. DOI: 10.1109/BigData59044.2023.10386448. Online publication date: 15-Dec-2023
  • (2022) Prognostication of Crime Using Bagging Regression Model. Handbook of Research on Technological Advances of Library and Information Science in Industry 5.0, pp. 462-478. DOI: 10.4018/978-1-6684-4755-0.ch023. Online publication date: 14-Oct-2022
  • (2022) Naïve Bayes–AdaBoost Ensemble Model for Classifying Sexual Crimes. Data Intelligence and Cognitive Informatics, pp. 393-405. DOI: 10.1007/978-981-16-6460-1_30. Online publication date: 1-Feb-2022
  • (2022) Artificial Intelligence, Big Data Analytics, and Smart Cities. Building on Smart Cities Skills and Competences, pp. 315-326. DOI: 10.1007/978-3-030-97818-1_19. Online publication date: 7-Jul-2022
  • (2021) A Thorough Analysis of Machine Learning and Deep Learning Methods for Crime Data Analysis. Inventive Computation and Information Technologies, pp. 795-812. DOI: 10.1007/978-981-33-4305-4_58. Online publication date: 28-Mar-2021
  • (2021) Open Data in Prediction Using Machine Learning: A Systematic Review. Innovative Systems for Intelligent Health Informatics, pp. 536-553. DOI: 10.1007/978-3-030-70713-2_50. Online publication date: 6-May-2021
  • (2020) Spatial Modeling for Homicide Rates Estimation in Pernambuco State-Brazil. ISPRS International Journal of Geo-Information 9:12, article 740. DOI: 10.3390/ijgi9120740. Online publication date: 11-Dec-2020
  • (2020) Safety App: Crime Prediction Using GIS. 2020 3rd International Conference on Communication System, Computing and IT Applications (CSCITA), pp. 120-124. DOI: 10.1109/CSCITA47329.2020.9137772. Online publication date: Apr-2020
