Toward Safer Roads: Predicting the Severity of Traffic Accidents in Montreal Using Machine Learning
Abstract
1. Introduction
- Novelty and Contributions of the Research:
2. Related Work
- Analysis of comparative studies:
3. Data Overview
3.1. Data Source
3.2. Data Description
- Feature Selection Process:
- street_name (RUE_ACCDN): Name of the street where the collision occurred.
- collision_near (ACCDN_PRES_DE): Landmark near the collision site.
- collision_type (CD_GENRE_ACCDN): Type of collision.
- surface_condition (CD_ETAT_SURFC): Condition of the road surface.
- road_category (CD_CATEG_ROUTE): Category of the road.
- longitudinal_location (CD_LOCLN_ACCDN): Longitudinal location.
- weather_conditions (CD_COND_METEO): Weather conditions.
- light_cars_trucks_count (nb_automobile_camion_leger): Number of light cars and trucks involved.
- heavy_trucks_count (nb_camionLourd_tractRoutier): Number of heavy trucks involved.
- bicycle_count (nb_bicyclette): Number of bikes involved.
- motorcycle_count (nb_motocyclette): Number of motorcycles involved.
- emergency_vehicle_count (nb_urgence): Number of emergency vehicles involved.
- unspecified_vehicle_count (nb_veh_non_precise): Number of unspecified vehicles involved.
- authorized_speed (VITESSE_AUTOR): Authorized speed on the road.
- x_coordinate (LOC_X): X coordinate (Nad83 MTM8).
- y_coordinate (LOC_Y): Y coordinate (Nad83 MTM8).
- hour (HR_ACCDN): Hour of the collision.
4. Methodology
4.1. Setup and Application Design
- Hardware and Software Configuration:
- Graphics processor (GPU): NVIDIA GeForce GTX 1650 (manufactured by NVIDIA Corporation, Santa Clara, CA, USA).
- RAM: 32 GB.
- Storage: 1 TB SSD.
- Operating System: Windows 11.
- Programming Language: Python 3.11.
- Libraries: Matplotlib 3.9.1, Seaborn 0.13.2, Pandas 2.0.2, NumPy 1.23.5, Scikit-learn 1.2.1, XGBoost 2.1.0, CatBoost 1.2.5, Flask 3.0.3, Angular 14.2.0, Swagger-UI (OpenAPI 3.0.3), NodeJS v18.16.0, Pickle-Mixin 1.0.2, Requests 2.32.3.
- Data Preprocessing Steps:
- Data cleaning: Removal of duplicates and irrelevant columns using Pandas.
- Handling missing values: Implementing imputation strategies for categorical and numerical data.
- Data balancing: Employing the SMOTE-ENN algorithm to address class imbalances.
- Feature selection: Utilizing the chi-square statistical method to identify the most relevant features.
- Machine Learning Models:
- Models used: XGBoost, CatBoost, Random Forest, Gradient Boosting.
- Training and testing split: 80% training, 20% testing.
- Evaluation metrics: accuracy, precision, recall, F1 score.
- Training Environment:
- Software: Jupyter 1.0.0, Notebook 7.0.8, Anaconda 23.3.1
- Web Application Design:
- Backend:
- Framework: Flask
- API management: Swagger-UI for API documentation and testing.
- Model deployment: Integration of the trained XGBoost model for real-time predictions.
- Frontend:
- Framework: Angular
- User interface: Interactive forms for data input and real-time feedback on predictions.
4.2. Data Preprocessing
4.3. Dealing with Missing Values
- Columns with more than 50% missing values are deleted. Table 4 lists the attributes removed from the dataset for this reason.
- For numerically coded columns that represent categorical variables, missing values are replaced with the value from the previous row (the fillna method of the Python Pandas library with method="ffill"). The choice rests on the assumption that adjacent entries are likely to share the same or a similar category, which is common in time series and other ordered datasets. Table 5 lists the attributes imputed this way, with the number and percentage of missing values.
- For purely numeric columns, missing values are replaced with the column mean, which keeps the central tendency of each column unchanged. Because mean imputation can still introduce bias, it is applied only to columns where the mean is a representative summary statistic of the underlying distribution. Table 6 lists the attributes imputed this way.
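The three rules above can be sketched in a few lines of Pandas. This is a minimal illustration, not the authors' code: the function name `impute_collisions`, the toy values, and the way columns are assigned to each rule are assumptions, and `DataFrame.ffill()` is used as the equivalent of `fillna` with forward fill.

```python
import numpy as np
import pandas as pd

def impute_collisions(df, coded_cols, numeric_cols, max_missing=0.5):
    """Apply the three missing-value rules to a copy of df (hypothetical helper)."""
    out = df.copy()
    # Rule 1: drop columns whose share of missing values exceeds the threshold.
    missing_share = out.isna().mean()
    out = out.drop(columns=missing_share[missing_share > max_missing].index)
    # Rule 2: forward-fill numerically coded categorical columns.
    coded = [c for c in coded_cols if c in out.columns]
    if coded:
        out[coded] = out[coded].ffill()
    # Rule 3: replace gaps in purely numeric columns with the column mean.
    numeric = [c for c in numeric_cols if c in out.columns]
    if numeric:
        out[numeric] = out[numeric].fillna(out[numeric].mean())
    return out

# Toy frame with made-up values; column names follow the attribute tables.
toy = pd.DataFrame({
    "kilometer_marker": [np.nan, np.nan, np.nan, 4.0],   # 75% missing, dropped
    "weather_conditions": [11.0, np.nan, 13.0, np.nan],  # coded categorical
    "authorized_speed": [50.0, np.nan, 30.0, 40.0],      # purely numeric
})
clean = impute_collisions(toy, ["weather_conditions"], ["authorized_speed"])
print(clean)
```

Note that forward fill leaves a gap when the very first row is missing, so a real pipeline would need a fallback for that case.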
- Solving the Data Imbalance Problem:
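The paper balances the classes with SMOTE-ENN. The sketch below is a simplified NumPy illustration of the two stages, oversampling by interpolation (SMOTE) followed by edited-nearest-neighbour cleaning (ENN). It is not the implementation used in the study: the function names, the majority-vote ENN rule, the brute-force distance search, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def _knn_indices(points, k):
    """Indices of the k nearest other points for every row (brute force)."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    return np.argsort(dists, axis=1)[:, :k]

def smote(X, y, k=3, seed=0):
    """Oversample every class up to the size of the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = counts.max() - count
        if n_new == 0:
            continue
        members = X[y == cls]  # assumes at least two samples per class
        neighbours = _knn_indices(members, min(k, len(members) - 1))
        base = rng.integers(0, len(members), size=n_new)
        column = rng.integers(0, neighbours.shape[1], size=n_new)
        picked = neighbours[base, column]
        gap = rng.random((n_new, 1))
        # Each synthetic point lies on the segment between a sample and a neighbour.
        X_parts.append(members[base] + gap * (members[picked] - members[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)

def edited_nearest_neighbours(X, y, k=3):
    """Drop samples whose label disagrees with the majority of their k neighbours."""
    neighbours = _knn_indices(X, k)
    n_classes = int(y.max()) + 1  # labels are assumed to be non-negative integers
    keep = np.array([
        np.bincount(y[row], minlength=n_classes).argmax() == label
        for row, label in zip(neighbours, y)
    ])
    return X[keep], y[keep]

# Toy imbalanced data: 40 majority samples and 8 minority samples.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(5, 1, (8, 2))])
y = np.array([0] * 40 + [1] * 8)
X_bal, y_bal = smote(X, y)
X_res, y_res = edited_nearest_neighbours(X_bal, y_bal)
print(np.bincount(y), np.bincount(y_bal), np.bincount(y_res))
```

Resampling of this kind is normally fitted on the training portion only, so that synthetic points do not leak into the evaluation data.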
4.4. Feature Selection Using the Chi-Square Statistical Method
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
- $\chi^2$ is the chi-square statistic.
- $n$ is the number of observation categories.
- $O_i$ is the observed frequency in category $i$.
- $E_i$ is the expected frequency in category $i$ under the null hypothesis that the feature and the target class are independent.
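As a worked example, the statistic can be computed from a small contingency table of observed counts, with the expected counts derived from the row and column totals under independence. The counts and the function name `chi_square` are hypothetical, and the paper does not state which implementation produced its scores, so this is only a sketch of the formula.

```python
import numpy as np

def chi_square(observed):
    """Pearson chi-square statistic for a feature-by-class contingency table."""
    observed = np.asarray(observed, dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    # Expected count of each cell if feature and class were independent.
    expected = row_totals * col_totals / observed.sum()
    return float(((observed - expected) ** 2 / expected).sum())

# Hypothetical counts: rows are two surface conditions, columns two severity classes.
table = [[10, 20],
         [20, 10]]
score = chi_square(table)
print(round(score, 4))
```

Here every expected count is 15, each of the four cells contributes 25/15, and the statistic is about 6.67. A larger score indicates a stronger departure from independence between the feature and the severity class.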
4.5. Exploratory Data Analysis
- Hourly Accident Severity Distribution: This chart illustrates the distribution of accident severity throughout the day, categorized by each hour.
- Weekly Accident Severity Distribution: This chart shows the distribution of accident severity across the days of the week and provides insight into daily patterns.
- Monthly Accident Severity Distribution: This chart shows how accident severity varies from month to month and highlights possible seasonal trends.
- Yearly Accident Severity Distribution: This chart shows annual accident severity.
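The counts behind charts of this kind can be produced with a single cross-tabulation. The sketch below assumes a column named `severity` holding the numerical coding of the severity classes and an `hour` column as in the feature list. The helper name `severity_by` and the toy records are illustrative only.

```python
import pandas as pd

# Numerical coding of the severity classes, as given in the severity table.
SEVERITY_LABELS = {
    0: "Damage Below Reporting Threshold",
    1: "Property Damage Only",
    2: "Minor",
    3: "Serious",
    4: "Fatal",
}

def severity_by(df, key, severity_col="severity"):
    """Count collisions per value of `key`, split by severity class."""
    counts = pd.crosstab(df[key], df[severity_col])
    return counts.rename(columns=SEVERITY_LABELS)

# Toy records with made-up values.
toy = pd.DataFrame({
    "hour": [8, 8, 17, 17, 17, 23],
    "severity": [1, 2, 1, 1, 3, 4],
})
hourly = severity_by(toy, "hour")
print(hourly)
```

The same helper would serve the weekly, monthly, and yearly views once a day-of-week, month, or year column has been derived from the collision date, and `hourly.plot(kind="bar", stacked=True)` would turn the table into a stacked bar chart.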
Impact of Exploratory Data Analysis on Data Preparation and Model Performance
4.6. Development of the Predictive Model
4.6.1. Gradient Boosting (GB)
4.6.2. Extreme Gradient Boosting (XGBoost)
4.6.3. Categorical Boosting (CatBoost)
4.6.4. Random Forest (RF)
5. Results and Discussion
5.1. Results
5.1.1. Interpretation of Results
5.1.2. Key Factors Influencing Accident Severity
- Key factors:
- Detailed Analysis:
5.1.3. Comparison of the Results with a Previous Study in the Literature
5.1.4. Real-Time Prediction Web Application
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Zhang, G.; Yau, K.K.; Chen, G. Risk factors associated with traffic violations and accident severity in China. Accid. Anal. Prev. 2013, 59, 18–25.
- World Health Organization. Global Status Report on Road Safety 2023. Available online: https://www.who.int/teams/social-determinants-of-health/safety-and-mobility/global-status-report-on-road-safety-2023 (accessed on 20 December 2023).
- Transport Canada. Canadian Motor Vehicle Traffic Collision Statistics 2021. Available online: https://tc.canada.ca/en/road-transportation/statistics-data/canadian-motor-vehicle-traffic-collision-statistics-2021 (accessed on 20 December 2023).
- Alkheder, S.; Taamneh, M.; Taamneh, S. Severity prediction of traffic accident using an artificial neural network. J. Forecast. 2017, 36, 100–108.
- Çeven, S.; Albayrak, A. Traffic accident severity prediction with ensemble learning methods. Comput. Electr. Eng. 2024, 114, 109101.
- Hashmienejad, S.H.A.; Hasheminejad, S.M.H. Traffic accident severity prediction using a novel multi-objective genetic algorithm. Int. J. Crashworthiness 2017, 22, 425–440.
- Sameen, M.I.; Pradhan, B. Severity prediction of traffic accidents with recurrent neural networks. Appl. Sci. 2017, 7, 476.
- Yan, M.; Shen, Y. Traffic accident severity prediction based on random forest. Sustainability 2022, 14, 1729.
- Dhanya, K.; Vajipayajula, S.; Srinivasan, K.; Tibrewal, A.; Kumar, T.S.; Kumar, T.G. Detection of Network Attacks using Machine Learning and Deep Learning Models. Procedia Comput. Sci. 2023, 218, 57–66.
- Filali, A.; Mlika, Z.; Cherkaoui, S.; Kobbane, A. Preemptive SDN load balancing with machine learning for delay sensitive applications. IEEE Trans. Veh. Technol. 2020, 69, 15947–15963.
- Hammouri, A.; Hammad, M.; Alnabhan, M.; Alsarayrah, F. Software bug prediction using machine learning approach. Int. J. Adv. Comput. Sci. Appl. 2018, 9, 78–83.
- Kumar, R.; Kumar, P.; Kumar, Y. Time series data prediction using IoT and machine learning technique. Procedia Comput. Sci. 2020, 167, 373–381.
- Muktar, B.; Fono, V.; Zongo, M. Predictive Modeling of Signal Degradation in Urban VANETs Using Artificial Neural Networks. Electronics 2023, 12, 3928.
- Ahmed, S.; Hossain, M.A.; Ray, S.K.; Bhuiyan, M.M.I.; Sabuj, S.R. A study on road accident prediction and contributing factors using explainable machine learning models: Analysis and performance. Transp. Res. Interdiscip. Perspect. 2023, 19, 100814.
- Wu, P.; Meng, X.; Song, L. A novel ensemble learning method for crash prediction using road geometric alignments and traffic data. J. Transp. Saf. Secur. 2020, 12, 1128–1146.
- Gan, J.; Li, L.; Zhang, D.; Yi, Z.; Xiang, Q. An alternative method for traffic accident severity prediction: Using deep forests algorithm. J. Adv. Transp. 2020, 2020, 1257627.
- Dong, C.; Shao, C.; Li, J.; Xiong, Z. An improved deep learning model for traffic crash prediction. J. Adv. Transp. 2018, 2018, 3869106.
- Zhang, C.; He, J.; Wang, Y.; Yan, X.; Zhang, C.; Chen, Y.; Liu, Z.; Zhou, B. A crash severity prediction method based on improved neural network and factor Analysis. Discret. Dyn. Nat. Soc. 2020, 2020, 4013185.
- Yang, J.; Han, S.; Chen, Y. Prediction of Traffic Accident Severity Based on Random Forest. J. Adv. Transp. 2023, 2023, 7641472.
- Gupta, U.; Varun, M.; Srinivasa, G. A Comprehensive Study of Road Traffic Accidents: Hotspot Analysis and Severity Prediction Using Machine Learning. In Proceedings of the 2022 IEEE Bombay Section Signature Conference (IBSSC), Mumbai, India, 8–10 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6.
- Paul, A.K.; Boni, P.K.; Islam, M.Z. A Data-Driven Study to Investigate the Causes of Severity of Road Accidents. In Proceedings of the 2022 13th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kharagpur, India, 3–5 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–7.
- Gatarić, D.; Ruškić, N.; Aleksić, B.; Đurić, T.; Pezo, L.; Lončar, B.; Pezo, M. Predicting Road Traffic Accidents—Artificial Neural Network Approach. Algorithms 2023, 16, 257.
- Sowdagur, J.A.; Rozbully-Sowdagur, B.T.B.; Suddul, G. An Artificial Neural Network Approach for Road Accident Severity Prediction. In Proceedings of the 2022 IEEE Zooming Innovation in Consumer Technologies Conference (ZINC), Novi Sad, Serbia, 25–26 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 267–270.
- Meocci, M.; Branzi, V.; Martini, G.; Arrighi, R.; Petrizzo, I. A predictive pedestrian crash model based on artificial intelligence techniques. Appl. Sci. 2021, 11, 11364.
- Islam, M.K.; Reza, I.; Gazder, U.; Akter, R.; Arifuzzaman, M.; Rahman, M.M. Predicting road crash severity using classifier models and crash hotspots. Appl. Sci. 2022, 12, 11354.
- Aldhari, I.; Almoshaogeh, M.; Jamal, A.; Alharbi, F.; Alinizzi, M.; Haider, H. Severity Prediction of Highway Crashes in Saudi Arabia Using Machine Learning Techniques. Appl. Sci. 2022, 13, 233.
- Shen, Y.; Zheng, C.; Wu, F. Study on Traffic Accident Forecast of Urban Excess Tunnel Considering Missing Data Filling. Appl. Sci. 2023, 13, 6773.
- Zhang, J.; Li, Z.; Pu, Z.; Xu, C. Comparing prediction performance for crash injury severity among various machine learning and statistical methods. IEEE Access 2018, 6, 60079–60087.
- Infante, P.; Jacinto, G.; Afonso, A.; Rego, L.; Nogueira, V.; Quaresma, P.; Saias, J.; Santos, D.; Nogueira, P.; Silva, M.; et al. Comparison of statistical and machine-learning models on road traffic accident severity classification. Computers 2022, 11, 80.
- Mansoor, U.; Ratrout, N.T.; Rahman, S.M.; Assi, K. Crash severity prediction using two-layer ensemble machine learning model for proactive emergency management. IEEE Access 2020, 8, 210750–210762.
- Vijithasena, R.; Herath, W. Data Visualization and Machine Learning Approach for Analyzing Severity of Road Accidents. In Proceedings of the 2022 International Conference for Advancement in Technology (ICONAT), Goa, India, 21–22 January 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6.
- Wahab, L.; Jiang, H. A comparative study on machine learning based algorithms for prediction of motorcycle crash severity. PLoS ONE 2019, 14, e0214966.
- Ville de Montréal. Collisions Routières, [Jeu de données]. Dans Données Québec, 2018. Mis à jour le 19 Décembre 2022. 2022. Available online: https://www.donneesquebec.ca/recherche/dataset/vmtl-collisions-routieres (accessed on 19 December 2023).
- Licenses, Creative Commons. Attribution 4.0 International (CC BY 4.0). Creative Commons License. 2013. Available online: https://creativecommons.org/licenses/by/4.0/deed.en (accessed on 20 December 2023).
- McKinney, W. An improved air quality index machine learning-based forecasting with multivariate data imputation approach. Atmosphere. Sci. Comput. 2022, 13, 1144.
- Emmanuel, T.; Maupong, T.; Mpoeleng, D.; Semong, T.; Mphago, B.; Tabona, O. A survey on missing data in machine learning. J. Big Data 2021, 8, 140.
- Nijman, S.; Leeuwenberg, A.; Beekers, I.; Verkouter, I.; Jacobs, J.; Bots, M.; Asselbergs, F.; Moons, K.; Debray, T. Missing data is poorly handled and reported in prediction model studies using machine learning: A literature review. J. Clin. Epidemiol. 2022, 142, 218–229.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- Swana, E.F.; Doorsamy, W.; Bokoro, P. Tomek link and SMOTE approaches for machine fault classification with an imbalanced dataset. Sensors 2022, 22, 3246.
- Muntasir Nishat, M.; Faisal, F.; Jahan Ratul, I.; Al-Monsur, A.; Ar-Rafi, A.M.; Nasrullah, S.M.; Reza, M.T.; Khan, M.R.H. A comprehensive investigation of the performances of different machine learning classifiers with SMOTE-ENN oversampling technique and hyperparameter optimization for imbalanced heart failure dataset. Sci. Program. 2022, 2022, 3649406.
- He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 1322–1328.
- Ray, S.; Alshouiliy, K.; Roy, A.; AlGhamdi, A.; Agrawal, D.P. Chi-squared based feature selection for stroke prediction using AzureML. In Proceedings of the 2020 Intermountain Engineering, Technology and Computing (IETC), Orem, UT, USA, 2–3 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6.
- Spencer, R.; Thabtah, F.; Abdelhamid, N.; Thompson, M. Exploring feature selection and classification methods for predicting heart disease. Digit. Health 2020, 6, 2055207620914777.
- Thaseen, I.S.; Kumar, C.A. Intrusion detection model using fusion of chi-square feature selection and multi class SVM. J. King Saud Univ.-Comput. Inf. Sci. 2017, 29, 462–472.
- Guo, M.; Yuan, Z.; Janson, B.; Peng, Y.; Yang, Y.; Wang, W. Older pedestrian traffic crashes severity analysis based on an emerging machine learning XGBoost. Sustainability 2021, 13, 926.
- Dong, S.; Khattak, A.; Ullah, I.; Zhou, J.; Hussain, A. Predicting and analyzing road traffic injury severity using boosting-based ensemble learning models with SHAPley Additive exPlanations. Int. J. Environ. Res. Public Health 2022, 19, 2925.
- Lu, P.; Zheng, Z.; Ren, Y.; Zhou, X.; Keramati, A.; Tolliver, D.; Huang, Y. A gradient boosting crash prediction approach for highway-rail grade crossing crash analysis. J. Adv. Transp. 2020, 2020, 6751728.
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
- Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
- Bentéjac, C.; Csörgo, A.; Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967.
- Sarveshvar, M.; Gogoi, A.; Chaubey, A.K.; Rohit, S.; Mahesh, T. Performance of different machine learning techniques for the prediction of heart diseases. In Proceedings of the 2021 International Conference on Forensics, Analytics, Big Data, Security (FABS), Bengaluru, India, 21–22 December 2021; IEEE: Piscataway, NJ, USA, 2021; Volume 1, pp. 1–4.
- Hébert, A.; Guédon, T.; Glatard, T.; Jaumard, B. High-resolution road vehicle collision prediction for the city of montreal. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1804–1813.
- Mufid, M.R.; Basofi, A.; Al Rasyid, M.U.H.; Rochimansyah, I.F. Design an mvc model using python for flask framework development. In Proceedings of the 2019 International Electronics Symposium (IES), Surabaya, Indonesia, 27–28 September 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 214–219.
Study | Focus | Data Used | Models Evaluated | Key Findings |
---|---|---|---|---|
[14] | Prediction of traffic accidents | New Zealand dataset (2016–2020) | RF, DJ, AdaBoost, XGBoost, LGBM, CatBoost | RF most effective with 81.45% accuracy. Importance of road category and vehicle number. |
[15] | Accident prediction based on road and traffic data | Not specified | Ensemble learning CPM-GAs | Improved accuracy and reduced variance in predictions. |
[16] | Predicting the severity of a traffic accident | UK road safety dataset | Deep Forests | Superior stability and accuracy with minimal hyperparameters. |
[17] | Traffic accident prediction with deep learning | Data from Knox County, Tennessee | Improved deep learning model, MVNB | The model is characterized by prediction accuracy and dimensionality reduction. |
[18] | Accident severity prediction | I5 interstate highway, Washington State (2011–2015) | Improved neural network | The focus is on vehicle-related versus road-related factors. |
[19] | Predicting the severity of a traffic accident | Chinese National Car Accident In-Depth Investigation System (2018–2020) | RF | The RF algorithm is superior in predicting severity. |
[20] | Analysis of traffic accidents | UK dataset (2005–2017) | Naive-Bayes, LR, AdaBoost, XGBoost, RF | Insights into accident severity and hotspot identification. |
[21] | Causes of the severity of a traffic accident | UK road accident database | NCA, k-nearest neighbors, Individual Conditional Expectation | Identified significant factors influencing the severity of the accident. |
[22] | Traffic accident prediction with ANN | Serbia and Bosnia and Herzegovina | ANN | High accuracy in predicting accident events and severity. |
[23] | Predicting the severity of road accidents in Mauritius | Not specified | ANN (MLP) | MLP outperforms other models with an accuracy of 84.1%. |
[24] | Pedestrian crash model | Italy, ISTAT dataset (5 years) | Gradient Boosting | Effective in predicting the risk of pedestrian accidents |
[25] | Analysis of crash severity and hotspots | Al-Ahsa, Saudi Arabia (2016–2018) | Gradient Boosting, RF, logistic regression | Identified factors and hotspots for severe road traffic crashes. |
[26] | Severity of highway accident in Saudi Arabia | Qassim Province (2017–2019) | RF, XGBoost, logistic regression | XGBoost is the most accurate at predicting accident severity. |
[27] | Traffic accident forecast in tunnels | YingTian Street Tunnel, Nanjing | GCN-LSTM, BP neural network, RF | The RF model excels at predicting the duration of an accident. |
[28] | Predicting the severity of injuries in an accident | Highway divergence areas, Florida | K-Nearest Neighbor, Decision Tree, RF, SVM | RF most effective; highlights overfitting problems. |
[29] | Classification of the severity of a traffic accident | Setúbal, Portugal (2016–2019) | Logistic regression, machine learning models | Comparing performance between models on balanced datasets. |
[30] | Accident severity prediction for emergency management | Great Britain (2011–2016) | Two-layer ensemble model | Superior performance in accuracy and F1 score. |
[31] | Analysis of the severity of traffic accidents | USA (2016–2019) | Random Forest | High accuracy in predicting accident severity. |
[32] | Predicting the severity of a motorcycle accident in Ghana | Ghana (2011–2015) | J48 Decision Tree, RF, IBk | RF is the most accurate in predicting severity. |
Current Work | Accident severity prediction in Montreal | Montreal collision data (2012–2021) | XGBoost, CatBoost, RF, GB | The XGBoost model achieved the highest accuracy (96%) in predicting accident severity. |
Description | Value |
---|---|
Number of rows | 218,272 |
Number of columns | 68 |
Type of data | float64, int64, object |
Categorical variables | 15 (type object) |
Numerical variables | 53 (29 int64, 24 float64) |
Severity of the Accident | Numerical Coding |
---|---|
Damage Below Reporting Threshold | 0 |
Property Damage Only | 1 |
Minor | 2 |
Serious | 3 |
Fatal | 4 |
Table 4. Attributes with more than 50% missing values that were removed from the dataset.
Attribute | Number of Missing | Percentage (%) |
---|---|---|
kilometer_marker | 218,161 | 99.949146 |
road_direction | 217,882 | 99.821324 |
civic_number_suffix | 217,828 | 99.796584 |
road_number | 217,550 | 99.669220 |
construction_zone | 213,368 | 97.753262 |
special_situation | 213,077 | 97.619942 |
positioning | 169,056 | 77.451987 |
road_surface | 165,001 | 75.594213 |
distance_in_meters | 157,580 | 72.194326 |
cardinal_point_code | 150,120 | 68.776572 |
civic_number | 124,781 | 57.167662 |
Table 5. Attributes whose missing values were imputed with the value from the previous row.
Attribute | Number of Missing | Percentage (%) |
---|---|---|
type_of_marker | 82,307 | 37.708456 |
collision_near | 71,083 | 32.566248 |
road_configuration | 21,972 | 10.066339 |
longitudinal_location | 17,763 | 8.138011 |
weather_conditions | 13,602 | 6.231674 |
lighting | 12,919 | 5.918762 |
surface_condition | 12,760 | 5.845917 |
street_name | 12,298 | 5.634255 |
collision_type | 10,067 | 4.612135 |
road_aspect | 9917 | 4.543414 |
environment | 7055 | 3.232206 |
road_category | 6355 | 2.911505 |
detached_location | 19 | 0.008705 |
administrative_region | 8 | 0.003665 |
county_name | 8 | 0.003665 |
municipality_code | 7 | 0.003207 |
Table 6. Attributes whose missing values were imputed with the column mean.
Attribute | Number of Missing | Percentage (%) |
---|---|---|
authorized_speed | 80,885 | 37.056975 |
x_coordinate | 11 | 0.005040 |
y_coordinate | 11 | 0.005040 |
longitude | 11 | 0.005040 |
latitude | 11 | 0.005040 |
Balancing Algorithm | Accuracy |
---|---|
SMOTE-ENN | 0.985085 |
SMOTE-Tomek | 0.895400 |
SMOTE | 0.867106 |
ADASYN | 0.811218 |
Feature | Chi-Square Score | Percentage |
---|---|---|
Collision_Near | 495,129.765158 | 30.526103 |
Street_Name | 175,285.888584 | 10.806854 |
Num_Serious_Injuries | 169,805.678768 | 10.468984 |
Num_Deaths | 162,050.724638 | 9.990870 |
Total_Victims | 151,493.855851 | 9.340010 |
Num_Minor_Injuries | 150,924.871613 | 9.304931 |
Pedestrian_Deaths | 98,471.014493 | 6.071007 |
Total_Pedestrian_Victims | 31,182.458448 | 1.922484 |
Pedestrian_Injuries | 30,277.689032 | 1.866702 |
Longitudinal_Location | 24,494.472488 | 1.510151 |
Bicycle_Deaths | 20,159.420290 | 1.242883 |
Bicycle_Injuries | 16,883.042752 | 1.040886 |
Total_Bicycle_Victims | 16,847.829913 | 1.038715 |
X_Coordinate | 14,566.502900 | 0.898065 |
Motorcycle_Deaths | 11,630.434783 | 0.717048 |
Bicycle_Count | 10,794.691447 | 0.665522 |
Unspecified_Vehicle_Count | 8160.399934 | 0.503111 |
Y_Coordinate | 6016.332705 | 0.370923 |
Total_Motorcycle_Victims | 4873.843542 | 0.300486 |
Motorcycle_Injuries | 4830.242330 | 0.297798 |
Road_Category | 3951.186923 | 0.243601 |
Emergency_Vehicle_Count | 2492.677718 | 0.153680 |
Heavy_Trucks_Count | 2031.992082 | 0.125278 |
Motorcycle_Count | 1912.272853 | 0.117897 |
Light_Cars_Trucks_Count | 1911.036696 | 0.117821 |
Collision_Type | 1561.271344 | 0.096257 |
Hour | 1255.653908 | 0.077414 |
Authorized_Speed | 1127.238142 | 0.069497 |
Surface_Condition | 950.290263 | 0.058588 |
Weather_Conditions | 915.358973 | 0.056434 |
Class | Precision | Recall | F1 Score | Support | Accuracy |
---|---|---|---|---|---|
Results for XGBoost | |||||
Damage Below Reporting Threshold | 0.79 | 0.75 | 0.77 | 1385 | |
Property Damage Only | 0.66 | 0.56 | 0.61 | 907 | |
Minor | 0.89 | 0.84 | 0.86 | 3974 | |
Serious | 0.97 | 1.00 | 0.98 | 12953 | |
Fatal | 1.00 | 1.00 | 1.00 | 15346 | |
Weighted Avg | 0.96 | 0.96 | 0.96 | 34565 | 0.96 |
Results for CatBoost | |||||
Damage Below Reporting Threshold | 0.76 | 0.72 | 0.74 | 1385 | |
Property Damage Only | 0.62 | 0.43 | 0.51 | 907 | |
Minor | 0.86 | 0.78 | 0.82 | 3974 | |
Serious | 0.95 | 1.00 | 0.97 | 12953 | |
Fatal | 1.00 | 1.00 | 1.00 | 15346 | |
Weighted Avg | 0.94 | 0.95 | 0.94 | 34565 | 0.95 |
Results for RF | |||||
Damage Below Reporting Threshold | 0.75 | 0.70 | 0.73 | 1385 | |
Property Damage Only | 0.62 | 0.31 | 0.42 | 907 | |
Minor | 0.81 | 0.65 | 0.72 | 3974 | |
Serious | 0.91 | 0.99 | 0.95 | 12953 | |
Fatal | 0.99 | 1.00 | 1.00 | 15346 | |
Weighted Avg | 0.92 | 0.93 | 0.92 | 34565 | 0.93 |
Results for GB | |||||
Damage Below Reporting Threshold | 0.75 | 0.72 | 0.73 | 1385 | |
Property Damage Only | 0.56 | 0.42 | 0.48 | 907 | |
Minor | 0.76 | 0.59 | 0.67 | 3974 | |
Serious | 0.88 | 0.92 | 0.90 | 12953 | |
Fatal | 0.94 | 0.98 | 0.96 | 15346 | |
Weighted Avg | 0.88 | 0.89 | 0.88 | 34565 | 0.89 |
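The weighted-average rows can be cross-checked from the per-class rows: each metric is averaged with the class support as the weight. The sketch below does this for the XGBoost rows, with the values copied from the table. The dictionary and function names are illustrative.

```python
# (precision, recall, F1 score, support) per class, copied from the XGBoost rows.
XGBOOST_ROWS = {
    "Damage Below Reporting Threshold": (0.79, 0.75, 0.77, 1385),
    "Property Damage Only": (0.66, 0.56, 0.61, 907),
    "Minor": (0.89, 0.84, 0.86, 3974),
    "Serious": (0.97, 1.00, 0.98, 12953),
    "Fatal": (1.00, 1.00, 1.00, 15346),
}

def weighted_average(rows):
    """Support-weighted mean of precision, recall, and F1 score."""
    total = sum(row[3] for row in rows.values())
    return tuple(
        sum(row[i] * row[3] for row in rows.values()) / total
        for i in range(3)
    )

precision, recall, f1 = weighted_average(XGBOOST_ROWS)
print(round(precision, 2), round(recall, 2), round(f1, 2))
```

Because the two largest classes (Serious and Fatal) account for most of the support, the weighted averages sit close to their scores, while the weaker results on the smaller classes have little effect on the aggregate.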
Feature | Chi-Square Score | Percentage |
---|---|---|
Collision_Near | 495,129.765158 | 30.526103 |
Street_Name | 175,285.888584 | 10.806854 |
Longitudinal_Location | 24,494.472488 | 1.510151 |
Road_Category | 3951.186923 | 0.243601 |
Emergency_Vehicle_Count | 2492.677718 | 0.153680 |
Heavy_Trucks_Count | 2031.992082 | 0.125278 |
Motorcycle_Count | 1912.272853 | 0.117897 |
Light_Cars_Trucks_Count | 1911.036696 | 0.117821 |
Collision_Type | 1561.271344 | 0.096257 |
Surface_Condition | 950.290263 | 0.058588 |
Weather_Conditions | 915.358973 | 0.056434 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Muktar, B.; Fono, V. Toward Safer Roads: Predicting the Severity of Traffic Accidents in Montreal Using Machine Learning. Electronics 2024, 13, 3036. https://doi.org/10.3390/electronics13153036