research-article

Multi-layer data integration technique for combining heterogeneous crime data

Published: 01 May 2022
    Abstract

    Analysis of publicly available human and drug trafficking crime data faces the challenge of finding a comprehensive dataset that includes a sufficiently large number of crime incidents. Our proposed methodology attempts to address this challenge by using entity resolution techniques to merge multiple statewide crime datasets and a countywide incident report dataset, giving a clearer picture of a category of criminal activity in a geographical area. The methodology combines incident reports, crime reports, and court records to close gaps that may be present in any single data source. We apply it to create a dataset of drug- and human-trafficking-related crimes and incidents from three distinct sources (Louisville Open Data Crime Reports, Federal Bureau of Investigation Kentucky Crime Incidents, and the Kentucky Online Offender Lookup website), providing researchers with data to study the link between drug- and human-trafficking-related crimes. In a case study performed with the new merged dataset, an XGBoost classifier labeled each 7-day sliding time window within a given county as containing a human-trafficking-related incident or not, with a Matthews correlation coefficient of 0.86.
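The case-study target can be sketched as follows, under assumptions the abstract does not state: each merged record is reduced to a hypothetical `Incident` with a county, a date, and a human-trafficking flag; windows advance one day at a time; and a window is positive if it contains at least one flagged incident. The `mcc` helper implements the standard Matthews correlation coefficient, MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), the metric reported above. The XGBoost model and its features are not reproduced here.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from math import sqrt


@dataclass(frozen=True)
class Incident:
    # Hypothetical record shape for one row of the merged dataset.
    county: str
    day: date
    human_trafficking: bool


def window_labels(incidents, start, end, width=7):
    """Label each (county, window_start) pair: True if any human-trafficking
    incident falls in [window_start, window_start + width - 1]. Windows slide
    by one day (an assumption) and must fit entirely within [start, end]."""
    counties = sorted({i.county for i in incidents})
    ht_days = {(i.county, i.day) for i in incidents if i.human_trafficking}
    labels = {}
    day = start
    while day + timedelta(days=width - 1) <= end:
        for county in counties:
            labels[(county, day)] = any(
                (county, day + timedelta(days=k)) in ht_days for k in range(width)
            )
        day += timedelta(days=1)
    return labels


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels.
    Returns 0.0 when any marginal count is zero (a common convention)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

With a single Jefferson County incident on 3 January 2021 and a 1 to 10 January range, the windows starting 1, 2, and 3 January are labeled positive and the one starting 4 January negative. For real evaluations, scikit-learn's `sklearn.metrics.matthews_corrcoef` computes the same quantity.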

    Highlights

    Novel multi-layer data integration technique for combining crime datasets.
    Robust system addresses data fragmentation issues prevalent in human trafficking data.
    New dataset of crimes related to human and drug trafficking for further research.
    Case study showing spatio-temporal link between drug and human trafficking crimes.
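The abstract does not spell out the matching rules, so the following is only a minimal sketch of a generic pairwise entity-resolution step, not the authors' multi-layer technique. It assumes every source (for example, the Louisville and FBI extracts) has already been mapped to a common, hypothetical `Record` schema, blocks candidate pairs by county, requires dates within a tolerance, and compares offense text with Python's standard-library `difflib.SequenceMatcher` in place of whatever similarity measure the paper uses. The `max_days=2` and `threshold=0.85` values are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Record:
    # Hypothetical common schema that every source is mapped into first.
    source: str
    county: str
    day: date
    offense: str


def normalize(text):
    # Lowercase and collapse whitespace so formatting differences do not count.
    return " ".join(text.lower().split())


def is_match(a, b, max_days=2, threshold=0.85):
    """Treat two records as the same event if they share a county, fall
    within max_days of each other, and have similar offense descriptions."""
    if a.county != b.county or abs((a.day - b.day).days) > max_days:
        return False
    ratio = SequenceMatcher(None, normalize(a.offense), normalize(b.offense)).ratio()
    return ratio >= threshold


def merge(primary, secondary, **kwargs):
    """Keep every primary record; append a secondary record only if it matches
    no primary record in its county block. Duplicates within `secondary`
    itself are not collapsed here."""
    blocks = defaultdict(list)
    for p in primary:
        blocks[p.county].append(p)
    merged = list(primary)
    for rec in secondary:
        if not any(is_match(rec, p, **kwargs) for p in blocks[rec.county]):
            merged.append(rec)
    return merged
```

An FBI entry describing the same Jefferson County offense one day after a Louisville record is absorbed, while an entry from another county is kept, so the merged list fills gaps without double-counting. Blocking by county keeps each secondary record's comparisons to its own county rather than the whole primary dataset.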



          Published In

          Information Processing and Management: an International Journal  Volume 59, Issue 3
          May 2022
          760 pages

          Publisher

          Pergamon Press, Inc.

          United States


          Author Tags

          1. Entity resolution
          2. Data integration
          3. Trafficking crimes
          4. XGBoost
          5. Binary classification
          6. Natural language processing

