Optimizing Kernel Density Estimation Bandwidth for Road Traffic Accident Hazard Identification: A Case Study of the City of London

Zheng, Minxue; Xie, Xintong; Jiang, Yutao; Shen, Qiu; Geng, Xiaolei; Zhao, Luyao; Jia, Feng

doi:10.3390/su16166969

by Minxue Zheng ¹,², Xintong Xie ¹, Yutao Jiang ¹, Qiu Shen ¹,², Xiaolei Geng ¹,², Luyao Zhao ¹,² and Feng Jia ¹,²,*

¹ School of the Environment and Safety Engineering, Jiangsu University, Zhenjiang 212013, China

² School of the Emergency Management, Jiangsu University, Zhenjiang 212013, China

* Author to whom correspondence should be addressed.

Sustainability 2024, 16(16), 6969; https://doi.org/10.3390/su16166969

Submission received: 27 June 2024 / Revised: 8 August 2024 / Accepted: 10 August 2024 / Published: 14 August 2024

(This article belongs to the Special Issue Research on Sustainable Transportation and Urban Traffic—2nd Edition)

Abstract

Road traffic accidents pose significant challenges to sustainable urban safety and intelligent transportation management. Effective identification of crash hazard hotspots is crucial for implementing targeted safety measures. A severity-weighted system was adopted to quantify crash hazard levels. Using 1059 valid crash records from the City of London, the spatial correlation of crash points was first examined via average nearest neighbor analysis. Then, the optimal kernel density estimation (KDE) bandwidth was determined via ArcGIS’s automatic extraction method, multi-distance spatial cluster analysis, and incremental spatial autocorrelation (ISA) analysis. The predictive accuracy index (PAI) was used to evaluate the accuracy of the KDE results at the various bandwidths. The results revealed a clustered spatial distribution of crash points. The optimized KDE bandwidth obtained via ISA analysis was 134 m, and it yielded a PAI of 4.381, the highest among the tested bandwidths, indicating better predictive accuracy and a balanced hotspot distribution that reflects both the local concentration and the overall continuity of crash hazard hotspots. Applying this bandwidth to the validation data successfully identified most high-risk areas, as well as potential crash hazard hotspots attributable to traffic environment factors; the method exhibits reliability, accuracy, and robustness over medium to long time scales. This workflow can serve as an analytical template to help planners improve the identification accuracy of hazard hotspots, thereby reducing crash occurrences, promoting sustainable traffic safety development, and providing valuable insights for targeted crash prevention and intelligent traffic safety management in urban areas.

Keywords:

urban traffic accidents; sustainable traffic safety; hotspot identification; hazard level; kernel density estimation; bandwidth optimization

1. Introduction

With accelerating urbanization and the continuous growth of transportation demand, traditional vehicles powered by fossil fuels are no longer sufficient to meet the needs of today’s society. A variety of transportation options have therefore emerged, such as hybrid cars and electric buses. However, these advances have also introduced more complex types of risk when crashes occur. Pedestrians, cyclists, and drivers of small motorized vehicles are particularly vulnerable in crashes; among pedestrians, children, the elderly, and people with disabilities are the most frequently involved [1]. Road traffic accidents claim approximately 1.19 million lives worldwide each year and cause economic losses amounting to about 3% of gross domestic product (GDP) in most countries [2]. Although the occurrence of traffic accidents is influenced by various factors, such as the natural environment, road network configuration, road design, and maintenance, crashes are not randomly distributed in space but are concentrated in certain areas [3,4,5]. Effectively identifying the spatial locations of high-risk areas for road traffic accidents is therefore of great significance for reducing crash rates, improving emergency response efficiency, and minimizing casualties and economic losses [6,7,8], while also supporting sustainable urban development goals. Consequently, hotspot identification has become an important research topic in the field of intelligent traffic management [9].

Current research on the characteristics of traffic accidents mainly focuses on identifying traffic accident hotspots, with less emphasis on how the choice of identification method affects the hotspots that are found [10,11]. Jointly examining crash severity and crash occurrence while accounting for spatial effects yields more informative results for practitioners [3]. Geurts et al. determined the weights of minor injuries, serious injuries, and fatalities in traffic accidents through a weighted analysis method, validated their model using Belgian traffic accident data, and constructed an urban traffic accident severity weighting system [12]. Le et al. used the Belgian government’s accident severity weighting system to effectively identify potential traffic accident hotspots in Hanoi and noted that hotspot locations shift with seasonal variations [13]. Choudhary et al. also successfully identified urban crash hotspots in India using the same weighting system [14]. Based on these studies, the Belgian government’s accident severity weighting system is considered applicable to hotspot identification with traffic accident data from different regions. The system also provides data-backed support for urban traffic planners and policymakers, enabling them to formulate transportation policies and plans that better align with sustainable development objectives.

There are various methods for identifying crash hotspots. Compared with other clustering techniques used for this purpose, planar kernel density estimation (KDE) in ArcGIS offers prominent advantages, such as excellent visualization capabilities and the ability to assess the spread of crash risk [15,16]. Moreover, planar KDE is more suitable when the study area has a dense traffic network with dispersed traffic accidents [17,18]. Before conducting a planar KDE analysis, a spatial correlation test must be performed on the research data, and spatial autocorrelation analysis provides effective indicators for this assessment [3,19,20]. Ye et al. and Hazaymeh et al. used planar KDE and spatial autocorrelation analysis in GIS to show that their research data exhibited a clustered distribution, and quantitatively described the spatial characteristics of complex traffic environments [21,22]. The effectiveness of planar KDE results depends largely on the choice of bandwidth, which controls the range of influence of the kernel function [23,24,25,26] and thereby determines the smoothness of the density estimate and the accuracy of hotspot identification. The bandwidth and cell size should be specified according to the characteristics of each study area rather than being the same for all areas [26]. Existing research often does not discuss how threshold distances are selected [27] or assumes them based on past experience [4,28], which may lead to imprecise results. Charpentier improved the accuracy of planar KDE results by adjusting the search radius using Ripley’s K-function [29]. Le et al. and Alam et al. determined the search radius using incremental spatial autocorrelation (ISA) analysis to enhance the accuracy of planar KDE identification [30,31]. Habib et al. used the G-function to calculate a network kernel density distance threshold of 2300 m for the city of San Francisco and employed cross-K validation to improve the accuracy of hotspot identification and distribution [32]. Loo et al. argue that a larger bandwidth is more appropriate for areas with low collision density [33]. Although existing research has proposed several methods for selecting the bandwidth, results obtained under different bandwidths are rarely compared, and there is insufficient evidence that bandwidth optimization improves the accuracy of the results. Chainey et al. first introduced the predictive accuracy index (PAI) in urban crime hotspot research [34]. This index measures and compares the performance of different hotspot methods and has recently been used in road safety research [34,35,36,37]. Studies have shown that combining various spatial statistical methods can describe the spatial distribution characteristics of traffic accidents more accurately, providing new insights for optimizing the selection of planar KDE bandwidths.

In summary, existing research on identifying road traffic accident hotspots has two shortcomings: (1) insufficient consideration of the effect of crash severity on hotspots, and (2) insufficient consideration of the effect of KDE bandwidth on identification results. This study addresses these issues by using the crash severity weighting system proposed by the Belgian government to quantify the risk of traffic accidents and achieve standardized identification of crash hazard hotspots. Taking the City of London as a case study, 655 valid traffic accident records from January 2018 to December 2020 were used as test data to investigate the hazard level of road traffic accidents. First, the spatial correlation of crash points within the study area was examined using average nearest neighbor analysis, a prerequisite for planar KDE analysis. Then, candidate KDE bandwidths were calculated from the test data using the ArcGIS automatic value selection method, multi-distance spatial cluster analysis (Ripley’s K-function), and ISA analysis. This produced KDE results at different bandwidths, and the PAI was introduced to determine the optimal bandwidth for planar KDE. Finally, 404 valid records from January 2021 to June 2023 in the City of London were used as validation data. The optimal bandwidth was used to compute planar KDE results for the validation data, thereby identifying crash hazard hotspots and validating the effectiveness and stability of the proposed method at medium-term to long-term scales. The method can be adjusted as urbanization progresses and traffic demand changes, providing a reference for developing effective urban road traffic management strategies and promoting a safer and more sustainable transportation environment.

The rest of the paper is organized as follows: Section 2 describes the data and methodology used in the study; Section 3 details the choice of bandwidth and presents the results of the hotspot identification; Section 4 discusses the results of the study and finally concludes the paper.

2. Materials and Methods

2.1. Study Area and Data

The study area was the City of London, located in southern England. It has a population density of 15,695 people per square kilometer, making it markedly populous, and it has a well-developed transportation system. However, the road environment differs considerably across the area. Traffic accidents occur frequently along major arteries; the spatial distribution of crashes in the study area is shown in Figure 1.

The City of London falls within the Central Activities Zone (CAZ) as defined in the Greater London Spatial Development Strategy 2021. It is a key planning area within the strategy’s objectives, which aim to achieve 95% green urban transportation by 2041 [38]. Accomplishing this goal requires scientifically informed planning of the transportation environment. Studying crash hazard hotspots in this area is therefore important for realizing sustainable and eco-friendly commuting goals.

The research data were sourced from the national traffic accident database published on the UK government’s official website (www.data.gov.uk, accessed on 20 November 2023), which records 600,000 traffic accident cases. The data include 34 fields, such as crash ID, geographic coordinates, time of occurrence, number of casualties, injury severity, and crash conditions (weather, lighting, road surface, road type, and road infrastructure). After the crash coordinates were projected using the projection tool in ArcMap and the records underwent preliminary screening and processing, 655 valid traffic accident cases from January 2018 to December 2020 were selected as test data, and 404 valid cases from January 2021 to June 2023 were selected as validation data. The distribution of the total number of crashes per year is shown in Figure 2.
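The temporal split into test and validation sets can be sketched as follows. This is a minimal illustration only: the record layout (a dict with hypothetical `id` and `date` keys) is an assumption and does not reflect the actual field names of the data.gov.uk files, and the projection and screening steps performed in ArcMap are not reproduced.

```python
from datetime import date

# Hypothetical record layout: each screened crash is a dict with a 'date' key.
# Real data.gov.uk field names differ; this only illustrates the period split.
TEST_START, TEST_END = date(2018, 1, 1), date(2020, 12, 31)
VALID_START, VALID_END = date(2021, 1, 1), date(2023, 6, 30)


def split_by_period(records):
    """Split crash records into test (2018-2020) and validation (2021 to June 2023) sets."""
    test, validation = [], []
    for rec in records:
        d = rec["date"]
        if TEST_START <= d <= TEST_END:
            test.append(rec)
        elif VALID_START <= d <= VALID_END:
            validation.append(rec)
        # records outside both windows are dropped
    return test, validation


records = [
    {"id": "a", "date": date(2018, 3, 2)},
    {"id": "b", "date": date(2021, 7, 15)},
    {"id": "c", "date": date(2023, 8, 1)},  # after June 2023: excluded
]
test, validation = split_by_period(records)
print([r["id"] for r in test], [r["id"] for r in validation])  # ['a'] ['b']
```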

2.2. Methodology

2.2.1. Overview of the Method

To accurately identify road traffic crash hazard hotspots, this paper proposes the research method illustrated in Figure 3, which comprises the following steps:

(1): Collection of traffic accident data, including crash coordinates, road network data, and casualty information, followed by data screening and preliminary processing;
(2): Determination of crash hazard level, which serves as the analytical indicator for the identification of crash hazard hotspots. Crash hotspots identified based on this indicator are defined as crash hazard hotspots;
(3): Verification of the spatial point distribution pattern using the nearest neighbor index (NNI) to test the clustering characteristics of crash points within the study area, which provides a prerequisite for planar KDE analysis;
(4): Determination of the optimal bandwidth: candidate bandwidths are calculated using ArcGIS’s automatic extraction method, multi-distance spatial cluster analysis (Ripley’s K-function), and ISA analysis; planar KDE results are obtained for each bandwidth; and the PAI is introduced to select the optimal bandwidth for planar KDE;
(5): Model validation and extension: identification of crash hazard hotspots using planar KDE with the optimized bandwidth, mapping of their distribution, and analysis of the accuracy of the results.

2.2.2. Crash Hazard Level

The hazard level is defined by the severity index of each crash, based on the Belgian government’s accident severity weighting system. The system assigns weights of 1, 3, and 5 to slight (minor) injuries, serious injuries, and fatalities, respectively, so that one death is equivalent to five slight injuries and one serious injury is equivalent to three slight injuries. The hazard level (SI) of a crash is calculated as follows:

SI = L + 3S + 5D

(1)

where L, S, and D are the total number of slight injuries, serious injuries, and deaths, respectively.
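Equation (1) translates directly into code. The sketch below assumes the casualty counts L, S, and D are already available for each crash; the function name is hypothetical.

```python
def hazard_level(slight: int, serious: int, deaths: int) -> int:
    """Crash hazard level SI = L + 3S + 5D (Equation (1)).

    slight, serious, deaths: casualty counts L, S, D for a single crash.
    """
    if min(slight, serious, deaths) < 0:
        raise ValueError("casualty counts must be non-negative")
    return slight + 3 * serious + 5 * deaths


# A crash with two slight injuries and one serious injury: SI = 2 + 3 = 5
print(hazard_level(2, 1, 0))  # 5
```

Note that a single fatality (SI = 5) and a crash with two slight injuries plus one serious injury (SI = 5) receive the same hazard level under this weighting.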

2.2.3. Average Nearest Neighbor (ANN) Analysis

The average nearest neighbor (ANN) analysis method was employed to verify the spatial distribution pattern of crashes and to assess whether the data are suitable for spatial analysis. In this method, the nearest neighbor index (NNI) is defined as the ratio of the average observed distance (D_o) to the average expected distance (D_e) between points under a random distribution, as shown in Equation (2). An NNI close to 1 indicates that the points are randomly distributed. When D_o is greater than D_e, i.e., NNI > 1, the points tend toward dispersion. When D_o is less than D_e, i.e., NNI < 1, the points are clustered, and the smaller the NNI, the stronger the clustering:

\mathrm{NNI} = D_{o} / D_{e}

(2)

D_{o} = \sum_{i = 1}^{n} D_{i} / n

(3)

D_{e} = \frac{0.5}{\sqrt{n / A}}

(4)

where D_{i} is the distance between feature i and its nearest neighboring feature, n is the total number of features, and A is the area of the minimum enclosing rectangle around all features or a user-specified area value.

The statistical significance of the ANN result is assessed with a Z-score, calculated using Equation (5). A Z-score is also used later to assess the statistical significance of the ISA analysis results (Equation (11)). The larger the absolute value of the Z-score, the more significant the departure from a random pattern.

Z\text{-score} = \frac{D_{o} - D_{e}}{SE}

(5)

SE = \frac{0.26136}{\sqrt{n^{2} / A}}

(6)

where SE represents the standard error.
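Equations (2) to (6) can be checked with a short NumPy sketch. It is not the ArcGIS tool: it computes nearest neighbor distances by brute force, and when no area is supplied it uses the axis-aligned bounding box of the points as a simplifying assumption in place of the minimum enclosing rectangle. The function name is hypothetical.

```python
import numpy as np


def average_nearest_neighbor(points, area=None):
    """Return (NNI, Z-score) following Equations (2)-(6).

    points: (n, 2) array of projected coordinates in metres.
    area:   study-area size A; if None, the axis-aligned bounding box of the
            points is used (a simplification of the minimum enclosing rectangle).
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    if area is None:
        width, height = pts.max(axis=0) - pts.min(axis=0)
        area = width * height
    # Pairwise distances; the diagonal is set to infinity so that a point
    # is never counted as its own nearest neighbour.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    d_obs = dist.min(axis=1).mean()        # Equation (3)
    d_exp = 0.5 / np.sqrt(n / area)        # Equation (4)
    se = 0.26136 / np.sqrt(n ** 2 / area)  # Equation (6)
    return d_obs / d_exp, (d_obs - d_exp) / se  # Equations (2) and (5)


# A regular 10 x 10 grid with 50 m spacing is dispersed (NNI > 1),
# whereas two tight synthetic clusters give NNI well below 1.
grid = np.array([(x, y) for x in range(10) for y in range(10)], dtype=float) * 50.0
print(average_nearest_neighbor(grid))
```

Because the brute-force distance matrix has n² entries, this is adequate for a few thousand crashes; a spatial index would be preferable for much larger datasets.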

2.2.4. Kernel Density Estimation (KDE)

The KDE method calculates a magnitude per unit area from point or line features based on a kernel function (K), fitting a smooth, tapered surface over each point or line [34]. It computes the density of features within a neighborhood, weighting each feature by its distance from the location being estimated, to produce a heat map of the spatial distribution [39,40]. Previous research indicates that the choice of bandwidth (h) has a greater impact on KDE results than the choice of K [41]. KDE is calculated as follows:

f_{n} (x) = \frac{1}{n h} \sum_{i = 1}^{n} K (\frac{x - x_{i}}{h})

(7)

where x_{i} is the value of variable x at point i, h is the bandwidth, and n is the number of features within bandwidth h of location x.

The Gaussian kernel function has unbounded support, so it assigns a weight to every data point, with points closer to the kernel’s center receiving greater weight. The Gaussian function was therefore selected in this study; it is given by the following:

K\left(\frac{x - x_{i}}{h}\right) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - x_{i}}{h}\right)^{2}\right]

(8)
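Equations (7) and (8) are written in one dimension. For crash points on a map, a common two-dimensional analogue uses a radially symmetric Gaussian kernel with normalization 1/(2π n h²). The sketch below evaluates that estimator on a regular grid with NumPy. It is an illustration under stated assumptions, not the ArcGIS implementation: the optional per-point weights (for example, the hazard level SI from Equation (1)) are an assumption about how severity enters the surface, and dividing by the sum of weights so that the surface integrates to 1 is a design choice of this sketch.

```python
import numpy as np


def gaussian_kde_grid(points, bandwidth, cell_size, weights=None, extent=None):
    """Evaluate a 2D Gaussian KDE on a regular grid.

    points:    (n, 2) projected coordinates in metres.
    bandwidth: h in metres.
    cell_size: output resolution in metres.
    weights:   optional per-point weights (assumption: e.g., SI values).
    extent:    (xmin, ymin, xmax, ymax); defaults to the points' bounding box.
    Returns (xs, ys, density), with density of shape (len(ys), len(xs)).
    """
    pts = np.asarray(points, dtype=float)
    w = np.ones(len(pts)) if weights is None else np.asarray(weights, dtype=float)
    if extent is None:
        xmin, ymin = pts.min(axis=0)
        xmax, ymax = pts.max(axis=0)
    else:
        xmin, ymin, xmax, ymax = extent
    xs = np.arange(xmin, xmax + cell_size, cell_size)
    ys = np.arange(ymin, ymax + cell_size, cell_size)
    gx, gy = np.meshgrid(xs, ys)  # grid node coordinates
    density = np.zeros_like(gx, dtype=float)
    for (px, py), wi in zip(pts, w):
        u2 = ((gx - px) ** 2 + (gy - py) ** 2) / bandwidth ** 2
        density += wi * np.exp(-0.5 * u2) / (2 * np.pi)  # 2D Gaussian kernel
    density /= w.sum() * bandwidth ** 2  # weighted 1/(n h^2) normalization
    return xs, ys, density


xs, ys, dens = gaussian_kde_grid([[0.0, 0.0]], bandwidth=10.0, cell_size=1.0,
                                 extent=(-60, -60, 60, 60))
```

At a 1 m resolution over a whole city the grid becomes large, so a coarser cell size is advisable when experimenting with this sketch.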

2.2.5. Multi-Distance Spatial Cluster Analysis (Ripley’s K-Function)

Ripley’s K-function is a multi-distance point pattern analysis method that can calculate the maximum spatial clustering distance [42]. This function accumulates the distribution frequency of distances between events at different neighborhood scales to determine whether the spatial aggregation or dispersion of events has statistical significance. To simplify interpretation, Ripley’s K-function is often transformed into the linear L(d) function as follows:

L (d) = \sqrt{\frac{A \sum_{i = 1}^{N} \sum_{j = 1, j \neq i}^{N} k (i, j)}{π N (N - 1)}}

(9)

where A is the total area of the study region, N is the total number of features, d is the distance, and k(i, j) is the weight of the feature pair. Without boundary correction, the weight is 1 when the distance between i and j is less than d and 0 otherwise; when boundary correction is applied, k(i, j) is adjusted accordingly.

Ripley’s K-function compares the observed count of point pairs with the count expected under a random distribution at various distance thresholds, making it possible to explore the spatial clustering patterns of traffic accidents at different distances. The expected value (ExpectedK) is the value predicted under the assumption of spatial randomness, and the observed value (ObservedK) is calculated from the actual data. DiffK is the observed value minus the expected value. A positive DiffK value significantly greater than zero suggests that the observed spatial distribution is more clustered than a distribution that would occur by chance, and larger positive DiffK values denote a more pronounced clustering effect. Therefore, the ObservedK value corresponding to the maximum DiffK value is selected as the bandwidth.
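The selection rule can be sketched with Equation (9) and no boundary correction. For the L transformation, the expected value under complete spatial randomness is the distance d itself, so DiffK = L(d) − d. The function below returns both the distance and the ObservedK value at the maximum DiffK, since the text selects ObservedK. The confidence envelope here is an assumption of this sketch (uniform random points in a rectangle) and may differ from how ArcGIS generates its permutations; all function names are hypothetical.

```python
import numpy as np


def ripley_l(points, distances, area):
    """L(d) from Equation (9) without boundary correction (k(i, j) = 1 if d_ij < d)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[:, None, :] - pts[None, :, :]
    dij = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dij, np.inf)  # exclude i == j pairs
    out = []
    for d in distances:
        pair_count = (dij < d).sum()  # ordered pairs (i, j) with i != j
        out.append(np.sqrt(area * pair_count / (np.pi * n * (n - 1))))
    return np.array(out)


def select_bandwidth_ripley(points, distances, area):
    """Return (distance, ObservedK, DiffK) at the maximum DiffK (ExpectedK = d)."""
    distances = np.asarray(distances, dtype=float)
    observed = ripley_l(points, distances, area)
    diffk = observed - distances
    k = int(np.argmax(diffk))
    return distances[k], observed[k], diffk[k]


def csr_envelope(n, distances, width, height, n_sim=99, seed=0):
    """Min/max L(d) over n_sim uniform random patterns in a width x height rectangle."""
    rng = np.random.default_rng(seed)
    sims = np.array([
        ripley_l(rng.uniform([0, 0], [width, height], size=(n, 2)), distances, width * height)
        for _ in range(n_sim)
    ])
    return sims.min(axis=0), sims.max(axis=0)
```

Observed L(d) values above the upper envelope would indicate significant clustering at that distance, mirroring the HiConfEnv comparison described in the results.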

2.2.6. Incremental Spatial Autocorrelation (ISA) Analysis

ISA analysis measures the spatial autocorrelation of features over a series of increasing distances. It considers not only the spatial positions of the features but also their attribute values, providing a more comprehensive description of the spatial autocorrelation of traffic accidents. This method was therefore adopted to select the bandwidth. It uses Moran’s I to describe the degree of spatial clustering, calculated as follows:

I = \frac{N \sum_{i} \sum_{j} W_{i, j} (X_{i} - \bar{X}) (X_{j} - \bar{X})}{\left(\sum_{i} \sum_{j} W_{i, j}\right) \sum_{i} (X_{i} - \bar{X})^{2}}

(10)

where N is the number of cases, X_{i} is the variable value at location i, X_{j} is the variable value at another location j, \bar{X} is the mean of the variable, and W_{i,j} is the weight applied to the comparison between locations i and j. W is a distance-based weight matrix in which W_{i,j} is the inverse distance between locations i and j (1/d_{ij}).

In this study, the attribute values are the crash hazard levels. Moran’s I with the spatial weight matrix W is used to assess the correlation between the hazard level at each location and those at neighboring locations. A Moran’s I value greater than its expected value E[I] (close to 0 for large samples) indicates positive spatial autocorrelation, a value less than E[I] indicates negative autocorrelation, and a value close to E[I] indicates no autocorrelation. The statistical significance is evaluated with a Z-score calculated from the expected value E[I] and the variance V[I]:

Z\text{-score} = \frac{I - E[I]}{\sqrt{V[I]}}

(11)

E[I] = -\frac{1}{N - 1}

(12)

V[I] = E[I^{2}] - E[I]^{2}

(13)
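The incremental procedure can be sketched in NumPy: Moran’s I (Equation (10)) is computed at each candidate distance and the distance with the peak Z-score is kept. Two assumptions should be noted. The weights are inverse distances 1/d_ij truncated at each threshold, following the description above. V[I] uses the analytic normality-assumption formula (S0, S1, S2 terms), which is a named substitute; the software used in the study may compute the variance differently. Function names are hypothetical, and every threshold must give at least some neighbor pairs (otherwise S0 = 0).

```python
import numpy as np


def morans_i(coords, values, threshold):
    """Global Moran's I with inverse-distance weights truncated at `threshold`.

    Returns (I, E[I], V[I], Z). V[I] uses the normality assumption.
    """
    xy = np.asarray(coords, dtype=float)
    x = np.asarray(values, dtype=float)
    n = len(x)
    d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1))
    with np.errstate(divide="ignore"):
        # zero weight on the diagonal, for coincident points, and beyond the threshold
        w = np.where((d > 0) & (d <= threshold), 1.0 / d, 0.0)
    z = x - x.mean()
    s0 = w.sum()
    i_stat = n * (z @ w @ z) / (s0 * (z @ z))       # Equation (10)
    e_i = -1.0 / (n - 1)                             # Equation (12)
    s1 = 0.5 * ((w + w.T) ** 2).sum()
    s2 = ((w.sum(axis=0) + w.sum(axis=1)) ** 2).sum()
    v_i = (n**2 * s1 - n * s2 + 3 * s0**2) / ((n**2 - 1) * s0**2) - e_i**2
    return i_stat, e_i, v_i, (i_stat - e_i) / np.sqrt(v_i)  # Equation (11)


def incremental_moran(coords, values, distances):
    """Evaluate Moran's I at each distance; return the peak-Z distance and all results."""
    results = [(dist, *morans_i(coords, values, dist)) for dist in distances]
    best = max(results, key=lambda r: r[4])  # r = (distance, I, E[I], V[I], Z)
    return best[0], results
```

In use, `coords` would hold the projected crash locations, `values` their SI values, and `distances` a sequence such as 110, 116, ..., 140 m.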

2.2.7. Predictive Accuracy Index (PAI)

The PAI can be used to quantitatively compare the abilities of different methods in identifying crash hazard hotspots. In this study, the PAI is defined as the ratio between the proportion of crashes occurring within identified hotspots and the proportion of the hotspot coverage area [26]. The calculation formula for PAI is as follows:

PAI = (n/N)/(m/M)

(14)

where n represents the number of crashes within the identified hotspots, N represents the total number of crashes, m represents the area of the hotspots, and M represents the total area of the study region. A higher PAI value indicates a stronger ability of the method to accurately identify the locations of crash hazard hotspots within the study region.
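Equation (14) is a ratio of two proportions, as the minimal sketch below shows (the function name and the example numbers are hypothetical). A PAI of 1 means the hotspots capture crashes only in proportion to their area; values above 1 mean crashes are concentrated in the identified hotspots.

```python
def pai(crashes_in_hotspots: int, total_crashes: int,
        hotspot_area: float, study_area: float) -> float:
    """Predictive accuracy index, Equation (14): (n / N) / (m / M)."""
    if total_crashes <= 0 or hotspot_area <= 0 or study_area <= 0:
        raise ValueError("total crashes and areas must be positive")
    return (crashes_in_hotspots / total_crashes) / (hotspot_area / study_area)


# Hypothetical example: 30% of crashes fall in hotspots covering 5% of the area,
# giving PAI = 0.30 / 0.05 = 6 (up to floating-point rounding).
print(pai(30, 100, 5.0, 100.0))
```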

3. Results

3.1. ANN Analysis Results

The ANN analysis was conducted on the 655 crashes recorded in the City of London from January 2018 to December 2020, and the results are shown in Figure 4. A Z-score between −1.65 and 1.65 indicates no significant departure from a random distribution, a Z-score > 1.65 indicates dispersion, and a Z-score < −1.65 indicates clustering. The p-value is the probability of observing a pattern at least as extreme as the observed one if the points were randomly distributed. At a significance level of 0.01 (Z-score < −2.58 or > 2.58), there is a 1% chance of mistakenly concluding that the spatial distribution is not random; at 0.05 (Z-score < −1.96 or > 1.96), the chance is 5%; and at 0.10 (Z-score < −1.65 or > 1.65), it is 10%. In this case, the Z-score was −27.007, far below −2.58, and the corresponding p-value was much smaller than 0.01, meaning that the likelihood of the observed pattern arising from a random distribution is extremely low; in other words, the spatial distribution of crash points was not random. The NNI calculated with Equation (2) was 0.447984, which is less than 1, indicating that the crash points exhibited clustering rather than dispersion. Taking both the Z-score and the NNI into account, the distribution of traffic accidents in the City of London followed a significant spatial clustering pattern, which served as a prerequisite for the subsequent hotspot analyses.
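The decision rule above can be written as a small helper. This is a sketch assuming a two-sided standard normal test, which matches the ±1.65, ±1.96, and ±2.58 critical values quoted in the text; the function names are hypothetical.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value of a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


def classify_ann(z: float) -> str:
    """Interpret an ANN Z-score with the 0.01 / 0.05 / 0.10 critical values."""
    for crit, level in ((2.58, "0.01"), (1.96, "0.05"), (1.65, "0.10")):
        if abs(z) > crit:
            pattern = "clustered" if z < 0 else "dispersed"
            return f"{pattern} (significant at {level})"
    return "random (not significant)"


print(classify_ann(-27.007))  # clustered (significant at 0.01)
```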

3.2. Multi-Distance Spatial Cluster Analysis (Ripley’s K-Function)

The ANN analysis revealed only the overall clustering of the crash points. To investigate the clustering patterns at different scales, this study applied Ripley’s K-function with 99 Monte Carlo simulations; the results are shown in Figure 5. The low-value confidence envelope (LwConfEnv) and the high-value confidence envelope (HiConfEnv) bound the range of L(d) values expected under a random distribution, as estimated from the simulations. Observed values above HiConfEnv indicate statistically significant clustering at that distance, whereas observed values below LwConfEnv indicate statistically significant dispersion. The x-axis and y-axis represent the distance and the value of the L(d) function, respectively. The observed values lie above HiConfEnv and well above LwConfEnv, confirming that the distribution of crashes exhibits clustering characteristics.

Table 1 presents the values of the L(d) function at various distances, including ExpectedK, ObservedK, DiffK, LwConfEnv, and HiConfEnv. ObservedK is consistently greater than both ExpectedK and HiConfEnv, indicating that all distance thresholds between 183.105733 m and 208.275673 m exhibit clustering. DiffK reaches its maximum value of 56.428114 at OBJECTID 3, meaning that ObservedK exceeds the ExpectedK value under a random distribution by up to 56.428114. This suggests that the clustering effect of crash points is most significant at 189.428 m. Therefore, 189.428 m was chosen as the bandwidth for the subsequent planar KDE analysis.

3.3. Incremental Spatial Autocorrelation Analysis

Ripley’s K-function analysis revealed the clustering characteristics of crash points at different scales without considering the effect of crash hazard levels on the clustering patterns. Therefore, an ISA analysis method was adopted to explore the spatial correlation of crash hazard levels at different distances. The calculated values of Moran’s I, expected I, variance, Z-score, and p-value are listed in Table 2. The curves depicting the changes in Moran’s I and Z-score are shown in Figure 6, where the x-axis represents the distance, the left y-axis represents the Z-score, and the right y-axis represents Moran’s I. Moran’s I measures the spatial correlation of attribute values, while the Z-score measures the statistical significance of Moran’s I. A higher Z-score indicates a more significant spatial autocorrelation. Therefore, the distance value corresponding to the peak Z-score was selected as the bandwidth value for the planar KDE.

As Table 2 and Figure 6 show, the Z-score decreases from 1.497267 at 110 m to 1.480205 at 116 m and then increases rapidly to its peak value of 2.140632 at 134 m; beyond 140 m, the change becomes more gradual. Similarly, Moran’s I decreases slowly from 0.031638 at 110 m to 0.030261 at 116 m and then increases rapidly to its peak value of 0.040079 at 134 m; beyond 140 m, it decreases gently. Both the Z-score and Moran’s I peak at 134 m, indicating that the spatial autocorrelation of crash hazard levels is most significant at this distance. This finding is consistent with the conclusions of Hazaymeh et al. on identifying traffic accident hotspots in Jordan [16], who also found an optimal distance for the spatial autocorrelation of crash hazard levels. Therefore, 134 m was selected as the bandwidth for the subsequent planar KDE analysis, as it is the distance at which the spatial autocorrelation of crash hazard levels is most pronounced. Compared with Ripley’s K-function analysis, ISA analysis accounts for hazard levels and thus captures the spatial effects on crash risk, providing an alternative approach to optimizing the bandwidth in planar KDE analyses.

3.4. ArcGIS Automated Extraction

This study used ArcGIS 10.2 to obtain the default bandwidth for the planar KDE analysis. In kernel density analysis, if the search radius is not set explicitly, the software uses a default bandwidth equal to the shorter of the analysis area’s length and width divided by 30. The default bandwidth from ArcGIS automatic extraction was therefore 52.8 m.
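The default rule described above is simple arithmetic. The sketch below restates it (the function name and the 3000 m side are hypothetical); reversing the rule, a default of 52.8 m corresponds to a shorter side of 52.8 × 30 = 1584 m for the analysis area.

```python
def default_bandwidth(width_m: float, height_m: float) -> float:
    """Default search radius as described in the text: min(width, height) / 30."""
    return min(width_m, height_m) / 30.0


print(default_bandwidth(3000.0, 1584.0))  # 52.8
```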

3.5. Bandwidth Optimization

To evaluate the effect of bandwidth on identifying crash hazard hotspots using planar KDE, this study compared three bandwidth selection methods: (1) ArcGIS automatic extraction, which gave a default bandwidth of 52.8 m; (2) multi-distance spatial cluster analysis (Ripley’s K-function), which gave an optimal bandwidth of 189.428 m; and (3) ISA analysis, which gave an optimal bandwidth of 134 m. The results are shown in Figure 7, where red represents high-density hotspot areas with higher crash risk levels. The risk level decreases gradually outward, as shown by the transition from red to blue, where blue indicates non-hotspot areas. The hotspot distribution varies significantly across the three bandwidths:

(a): With the default bandwidth, the hotspots are small and dispersed, potentially underestimating the spatial spillover effect of crash risks.
(b): The optimal bandwidth obtained from multi-distance spatial cluster analysis results in larger and overly smooth hotspots, masking local high-risk areas.
(c): With the optimal bandwidth obtained from ISA analysis, the hotspot distribution is relatively balanced, better reconciling the local concentration and the overall continuity of hotspots.

By comparing the three selected bandwidth methods, it was evident that the ISA analysis provided a more balanced and accurate representation of hotspot distributions, capturing both the local concentration and overall continuity of crash risks.

The PAI values for the three bandwidths in the planar KDE analysis are presented in Table 3. The highest PAI value, 4.381, was obtained using the optimal bandwidth from ISA analyses. This value surpassed the PAI values of 4.082 obtained from multi-distance spatial cluster analysis and 3.459 from the ArcGIS automatic extraction. This indicated that the ISA analysis, which considered crash hazard levels, could more accurately identify high-risk areas, and it enhanced the reliability of hotspot predictions. Therefore, the optimal bandwidth for the planar KDE analysis was chosen as 134 m.
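The comparison above selects the bandwidth with the highest PAI. Because the text does not state how hotspot cells were delineated from each KDE surface, the sketch below assumes a hypothetical rule: cells in the top 5% of density values are treated as hotspots. Under that rule, it computes a Gaussian KDE on a fixed grid for each candidate bandwidth, counts the crashes falling in hotspot cells, evaluates Equation (14), and keeps the bandwidth with the largest PAI. It illustrates the selection loop only and does not reproduce the ArcGIS workflow or the values in Table 3; all names are hypothetical.

```python
import numpy as np


def kde_on_cells(points, weights, h, extent, cell):
    """2D Gaussian KDE evaluated at the centres of square cells covering `extent`."""
    xmin, ymin, xmax, ymax = extent
    nx = int(np.ceil((xmax - xmin) / cell))
    ny = int(np.ceil((ymax - ymin) / cell))
    cx = xmin + cell * (np.arange(nx) + 0.5)
    cy = ymin + cell * (np.arange(ny) + 0.5)
    gx, gy = np.meshgrid(cx, cy)  # shape (ny, nx)
    dens = np.zeros_like(gx)
    for (px, py), w in zip(points, weights):
        dens += w * np.exp(-0.5 * ((gx - px) ** 2 + (gy - py) ** 2) / h ** 2)
    return dens / (2 * np.pi * h ** 2 * np.sum(weights))


def pai_at_bandwidth(points, weights, h, extent, cell, top_share=0.05):
    """PAI (Eq. (14)) when hotspots are the top `top_share` of cells by density."""
    pts = np.asarray(points, dtype=float)
    dens = kde_on_cells(pts, np.asarray(weights, dtype=float), h, extent, cell)
    hot = dens >= np.quantile(dens, 1.0 - top_share)  # hypothetical hotspot rule
    ix = np.clip(((pts[:, 0] - extent[0]) // cell).astype(int), 0, dens.shape[1] - 1)
    iy = np.clip(((pts[:, 1] - extent[1]) // cell).astype(int), 0, dens.shape[0] - 1)
    n_in = hot[iy, ix].sum()  # crashes located in hotspot cells
    return (n_in / len(pts)) / hot.mean()  # equal-size cells: m/M = share of hot cells


def best_bandwidth(points, weights, candidates, extent, cell, top_share=0.05):
    """Return the candidate bandwidth with the highest PAI and all scores."""
    scores = {h: pai_at_bandwidth(points, weights, h, extent, cell, top_share)
              for h in candidates}
    return max(scores, key=scores.get), scores
```

Fixing the hotspot share keeps m/M roughly constant across bandwidths, so differences in PAI reflect how many crashes each surface captures; a fixed density threshold would instead let the hotspot area vary with the bandwidth.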

3.6. Crash Hazard Hotspot Identification and Analysis (Test Data)

Based on the optimized bandwidth of 134 m and an output resolution of 1 m, the planar KDE method was applied to the 655 crash records from January 2018 to December 2020. As a result, nine crash hazard hotspots were identified in the City of London, as shown in Figure 8.

Table 4 summarizes the basic characteristics of the crash hazard hotspots, including their area, number of crashes, hotspot hazard level, and unit kernel density estimate. The hotspot hazard level is the sum of the hazard levels of the individual crashes within the hotspot and represents its overall crash hazard level. The unit kernel density estimate reflects relative traffic safety: areas with larger kernel density estimates have lower traffic safety levels.
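The aggregation behind the hotspot hazard level can be sketched as follows, assuming each crash has already been assigned to a hotspot (or to none). The input format of (hotspot ID, SI) pairs is hypothetical, and the example values are illustrative rather than taken from Table 4. The example shows how two hotspots with the same number of crashes can differ in hazard level.

```python
from collections import defaultdict


def summarize_hotspots(crashes):
    """Aggregate crash count and hotspot hazard level (sum of SI) per hotspot.

    crashes: iterable of (hotspot_id, si) pairs; hotspot_id is None for crashes
             outside every hotspot (hypothetical input format).
    """
    summary = defaultdict(lambda: {"crashes": 0, "hazard_level": 0})
    for hotspot_id, si in crashes:
        if hotspot_id is None:
            continue
        summary[hotspot_id]["crashes"] += 1
        summary[hotspot_id]["hazard_level"] += si
    return dict(summary)


# Illustrative values: both hotspots have two crashes but different hazard levels.
example = [("A", 1), ("A", 1), ("B", 3), ("B", 5), (None, 1)]
print(summarize_hotspots(example))
```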

Table 4 reveals significant differences among the nine crash hazard hotspots, indicating that crash risk is unevenly distributed in space. Hotspot ID1 has the largest area (59,964 m²), the largest number of crashes (65), the highest hotspot hazard level (115), and the largest unit kernel density estimate (0.046838), marking it as a high-risk hotspot with a high crash frequency that deserves special attention. ID3 and ID8 both have 51 crashes, but ID3 has a higher crash density, hotspot hazard level, and unit kernel density estimate than ID8, demonstrating that hotspots with equal numbers of crashes cannot be assumed to carry the same level of risk. Similarly, ID2 and ID9 have 12 and 5 crashes, respectively; ID2 has a higher crash frequency and density but a lower hotspot hazard level and unit kernel density estimate than ID9. The same pattern was observed for ID5, ID6, and ID7. These findings emphasize that considering both crash hazard levels and crash frequencies improves the accuracy of crash hazard hotspot identification.

3.7. Crash Hazard Hotspot Identification (Validation Data)

To validate the effectiveness and stability of the proposed method, a separate set of 404 crash records from January 2021 to June 2023 was used as validation data. Applying the planar KDE method with the optimized bandwidth to these records identified eight crash hazard hotspots, as shown in Figure 9.
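
The test/validation design is a purely temporal split of the crash records. A minimal sketch follows, assuming each record carries its crash date; the record layout and the function `split_by_period` are hypothetical, while the period boundaries are those stated in this study.

```python
from datetime import date
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")

# Study periods stated in the text (inclusive bounds).
TEST_PERIOD = (date(2018, 1, 1), date(2020, 12, 31))
VALIDATION_PERIOD = (date(2021, 1, 1), date(2023, 6, 30))


def split_by_period(records: Iterable[Tuple[date, T]]) -> Tuple[List[T], List[T]]:
    """Split (crash date, record) pairs into test and validation lists by date."""
    test: List[T] = []
    validation: List[T] = []
    for crash_date, record in records:
        if TEST_PERIOD[0] <= crash_date <= TEST_PERIOD[1]:
            test.append(record)
        elif VALIDATION_PERIOD[0] <= crash_date <= VALIDATION_PERIOD[1]:
            validation.append(record)
        # Records outside both periods are ignored.
    return test, validation


toy = [(date(2018, 1, 1), "a"), (date(2020, 12, 31), "b"),
       (date(2021, 1, 1), "c"), (date(2023, 7, 1), "d")]
print(split_by_period(toy))  # (['a', 'b'], ['c'])
```

Applied to the full dataset, this split should return the 655 test records and the 404 validation records that together make up the 1059 valid crashes.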

Table 5 summarizes the basic characteristics of the crash hazard hotspots identified from the validation data. Overall, their distribution was consistent with the test-data results in Section 3.6, indicating that the proposed method has good stability and predictive capability.

According to Table 5, ID1 has the most crashes (48) and the highest hotspot hazard level (90), so it remains a high-risk, high-frequency hotspot, consistent with the test-data results. ID3 and ID8 have the same number of crashes (31) and the same hotspot hazard level (78), but ID3 has the slightly higher unit kernel density estimate and therefore the higher risk. Owing to spatiotemporal characteristics and other factors, a new crash hazard hotspot, ID4*, emerged; it contains only eight crashes but has a higher hotspot hazard level and unit kernel density estimate than ID5, ID7, and ID2, confirming that a hotspot with fewer crashes does not necessarily have a lower hazard level. This suggests increasing traffic risk in the surrounding area and marks ID4* as a potential crash hazard hotspot. Conversely, the test-data hotspots ID4 and ID6 did not appear in the validation data. This resembles the time-dependent behavior of crash hazard hotspots reported by Bíl et al. in the Czech Republic, where hotspots evolved over time under the influence of traffic conditions and other factors [43]. Overall, the validation results show that the optimized planar KDE method identified most high-risk areas and that the hotspot distribution was consistent with the test data, indicating good stability and reliability over medium- to long-term time scales.
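
The persistence of hotspots between the two periods can be summarized with simple set operations on the hotspot IDs in Tables 4 and 5. This sketch assumes that a shared ID denotes the same location in both periods and treats ID4* as distinct from the test-data ID4; comparing the overlap of hotspot polygons would be a more rigorous alternative.

```python
# Hotspot IDs transcribed from Table 4 (test data) and Table 5 (validation data).
# ASSUMPTION: equal IDs denote the same location in both periods; "4*" is the
# new hotspot flagged in Table 5 and is kept distinct from test-data ID4.
TEST_IDS = {"1", "2", "3", "4", "5", "6", "7", "8", "9"}
VALIDATION_IDS = {"1", "2", "3", "4*", "5", "7", "8", "9"}

persistent = TEST_IDS & VALIDATION_IDS   # present in both periods
disappeared = TEST_IDS - VALIDATION_IDS  # test-data hotspots absent later
emerged = VALIDATION_IDS - TEST_IDS      # new hotspots in the validation period
persistence_rate = len(persistent) / len(TEST_IDS)

print(sorted(persistent), sorted(disappeared), sorted(emerged))
print(round(persistence_rate, 3))  # 0.778
```

Under these assumptions, seven of the nine test-data hotspots (about 78%) persist, ID4 and ID6 drop out, and ID4* is new.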

4. Discussion

4.1. Traffic Crash Data

From more than 600,000 traffic accident records published on the UK government’s official website, this study extracted 1059 valid crash records for the City of London covering January 2018 to June 2023. Average nearest neighbor analysis revealed an overall clustering tendency in crash locations, laying the foundation for crash hazard hotspot analysis. The analysis was primarily static and based on historical data. However, crash risk has spatiotemporal dynamics driven by factors such as traffic volume and road conditions. Future research could incorporate spatiotemporal correlation analysis to track how crash hazard hotspots evolve, providing more accurate support for the design of traffic management measures.
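
The average nearest neighbor statistic compares the observed mean distance from each crash to its nearest neighbor with the mean expected under complete spatial randomness. The sketch below follows the formulas documented for the ArcGIS Average Nearest Neighbor tool (expected mean 0.5/√(n/A), standard error 0.26136/√(n²/A)). It uses a brute-force O(n²) search, and the function name and toy points are hypothetical.

```python
import math
from typing import Sequence, Tuple


def average_nearest_neighbor(points: Sequence[Tuple[float, float]],
                             area: float) -> Tuple[float, float, float, float]:
    """Return (observed mean, expected mean, NNI, Z-score).

    points: projected crash coordinates in metres; area: study-area size in m².
    """
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    # Brute-force nearest-neighbour search: adequate for a few thousand points.
    nearest = []
    for i, (xi, yi) in enumerate(points):
        nearest.append(min(math.hypot(xi - xj, yi - yj)
                           for j, (xj, yj) in enumerate(points) if j != i))
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)          # mean under complete spatial randomness
    standard_error = 0.26136 / math.sqrt(n * n / area)
    nni = observed / expected                     # < 1 suggests clustering, > 1 dispersion
    z_score = (observed - expected) / standard_error
    return observed, expected, nni, z_score


# Four hypothetical crashes packed into a 20 m square inside a 1 km² area.
pts = [(0.0, 0.0), (20.0, 0.0), (0.0, 20.0), (20.0, 20.0)]
print(average_nearest_neighbor(pts, 1_000_000.0))
```

For the toy points the NNI is 0.08 with a negative Z-score, the same qualitative signature of clustering that the study reports for the crash data.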

4.2. Crash Hazard Level

To incorporate crash risk into urban traffic hotspot analysis, this study quantified the risk of each crash as a hazard level. Because the UK and Belgium are both European countries with similar recent trends in annual GDP change, and with comparable indicators of public transportation accessibility and open green space (Belgium: 95.3; London: 94.8) [44], the Belgian government’s severity-weighted system for assessing crash severity was adopted to calculate crash hazard levels. This approach provided a risk measure for hotspot analysis, improved the accuracy of crash hazard hotspot identification, and brought the identified hotspots closer to actual road safety conditions, thus providing useful insights for traffic management. Because of data limitations, the hazard level calculation considered only the number of people involved and the severity of their injuries. Future research could incorporate factors such as the economic losses caused by crashes to develop a more comprehensive severity-weighted system.
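
A severity-weighted hazard level of this kind can be sketched as a weighted count of casualties by injury severity. The weights below (1 per slightly injured, 3 per seriously injured, and 5 per fatally injured person) are those of the Flemish priority value discussed by Geurts et al. [12]; they are an assumption for illustration and may differ from the exact weights defined in the study’s methods. The `Crash` class and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

# ASSUMPTION: Flemish priority-value weights (slight = 1, serious = 3, fatal = 5).
DEFAULT_WEIGHTS: Dict[str, int] = {"slight": 1, "serious": 3, "fatal": 5}


@dataclass
class Crash:
    """Casualty counts of one crash by injury severity (hypothetical record type)."""
    slight: int = 0
    serious: int = 0
    fatal: int = 0


def crash_hazard_level(crash: Crash, weights: Dict[str, int] = DEFAULT_WEIGHTS) -> int:
    """Severity-weighted hazard level: each casualty counts by its injury weight."""
    return (weights["slight"] * crash.slight
            + weights["serious"] * crash.serious
            + weights["fatal"] * crash.fatal)


print(crash_hazard_level(Crash(slight=2, serious=1)))  # 5
```

Passing a different `weights` dictionary makes it easy to test how sensitive the hotspot ranking is to the chosen severity scheme.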

4.3. Kernel Density Bandwidth

Currently, planar KDE bandwidths are selected mainly by ArcGIS automatic extraction or from empirical values; relatively few studies derive them from multi-distance spatial cluster analysis (Ripley’s K-function) or ISA analysis. This study compared planar KDE results obtained with bandwidths from ArcGIS automatic extraction, Ripley’s K-function, and ISA analysis, highlighting the effect of bandwidth on hotspot identification. Comparing these candidates with the PAI yielded an optimal bandwidth of 134 m, which preserves the continuity of hotspots while still capturing most high-risk areas. The results indicate that combining several spatial statistical methods allows the bandwidth to be determined more scientifically and objectively, improving the reliability and robustness of the results. Future research could explore more rigorous computational methods for selecting planar KDE bandwidths.
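
The PAI used for this comparison is the share of crashes captured by the predicted hotspots divided by the share of the study area they occupy, PAI = (n/N × 100)/(a/A × 100). The sketch recomputes the three PAI values from the percentages listed in Table 3 and selects the largest; the function name is hypothetical.

```python
from typing import Dict, Tuple


def pai(crash_pct: float, area_pct: float) -> float:
    """Predictive accuracy index: % of crashes captured / % of area flagged."""
    return crash_pct / area_pct


# (percentage of crashes, percentage of area) transcribed from Table 3.
CANDIDATES: Dict[str, Tuple[float, float]] = {
    "ArcGIS automatic extraction (52.8 m)": (11.603, 3.354),
    "Ripley's K-function (189.428 m)": (44.885, 10.996),
    "ISA (134 m)": (34.962, 7.981),
}

scores = {name: pai(*pcts) for name, pcts in CANDIDATES.items()}
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")  # 3.459, 4.082, 4.381
print("best:", best)  # best: ISA (134 m)
```

Recomputing the ISA value from the rounded areas in Table 3 (0.231 of 2.892 km²) instead gives about 4.377, presumably because the published area percentages were derived from unrounded areas; the ranking of the three bandwidths is unchanged.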

4.4. Hotspot Identification

This paper presented a method for accurately identifying crash hazard hotspots for traffic accidents and quantitatively ranking them by risk level. The validation results confirmed that crash hazard hotspot ID1 in the City of London remained a high-risk area, while the risk levels of the other hotspots tended to stabilize. Influenced by traffic environment factors, some hotspot locations changed over time. Overall, however, the distribution of crash hazard hotspots in the validation data was broadly consistent with that in the test data. The results from both datasets indicate that crash counts alone cannot reflect the hazard level of a hotspot. Considering both crash hazard levels and crash frequencies improves the accuracy of hotspot identification, and the approach proved robust and reliable over medium- to long-term time scales.

This study analyzed the spatial distribution of traffic accidents with planar KDE. Because traffic accidents occur on roads, their spatial distribution is strongly shaped by the road network. Planar KDE, however, measures Euclidean distances and may therefore overestimate the clustering of crashes. Future work should strengthen the methodology, for example by identifying crash hazard hotspots precisely along the road traffic network.
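
A toy example shows how Euclidean distance can overstate proximity on a street network. Two hypothetical crashes 90 m apart in both x and y fall within the 134 m bandwidth as the crow flies, yet are 180 m apart for travel along an idealized orthogonal street grid. Manhattan distance is used here only as a stand-in for true network (shortest-path) distance, and the function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
BANDWIDTH_M = 134.0  # optimized bandwidth reported in this study


def euclidean(p: Point, q: Point) -> float:
    """Straight-line distance, as used by planar KDE."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def grid_network_distance(p: Point, q: Point) -> float:
    """HYPOTHETICAL proxy for road distance: travel along an orthogonal street grid."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


crash_a, crash_b = (0.0, 0.0), (90.0, 90.0)          # hypothetical crash locations (m)
d_planar = euclidean(crash_a, crash_b)               # about 127.3 m
d_network = grid_network_distance(crash_a, crash_b)  # 180 m
print(d_planar < BANDWIDTH_M, d_network < BANDWIDTH_M)  # True False
```

Under a planar kernel the two crashes reinforce each other's density, whereas a network-constrained kernel with the same bandwidth would treat them as unrelated.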

5. Conclusions

This study developed and applied a method for identifying urban traffic crash hazard hotspots using 1059 valid traffic accident records from the City of London between January 2018 and June 2023, and investigated the effect of KDE bandwidth optimization on the assessment of crash hazard levels. The main conclusions are as follows:

(1): The NNI of the test data within the study area was 0.448, which is less than 1; the corresponding Z-score was −27.007, and the p-value was far below 0.01. This indicates that crash points exhibit a significantly clustered, rather than dispersed, spatial pattern.
(2): The bandwidth values obtained from the test data using ArcGIS automatic extraction, multi-distance spatial cluster analysis (Ripley’s K-function), and ISA analysis were 52.8 m, 189.428 m, and 134 m, respectively, with corresponding PAI values of 3.459, 4.082, and 4.381. ISA analysis yielded the highest PAI, so 134 m was the optimal bandwidth among the three candidates. With this bandwidth, the planar KDE method accurately identified the traffic crash hazard hotspots in the City of London; the identified hotspots were evenly distributed and reflected both the local concentration and the overall continuity of hotspots. This suggests that the chosen bandwidth effectively captures the spatial patterns of crash risk, providing a reliable basis for sustainable road safety assessment and management in the study area.
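
The NNI and Z-score in (1) are linked: substituting D_O = NNI × D_E into Z = (D_O − D_E)/SE, with the ArcGIS-style expressions D_E = 0.5/√(n/A) and SE = 0.26136/√(n²/A), cancels the study area and leaves Z = (NNI − 1) × 0.5√n / 0.26136. Assuming n = 655 test-data crashes, the sketch below reproduces the reported Z-score to within the rounding of the NNI; the function names are hypothetical.

```python
import math

N_TEST = 655  # number of test-data crash records (January 2018 to December 2020)


def ann_z_from_nni(nni: float, n: int) -> float:
    """Z-score implied by an NNI for n points.

    D_E = 0.5 / sqrt(n / A) and SE = 0.26136 / sqrt(n**2 / A); with
    D_O = nni * D_E, the study area A cancels out.
    """
    return (nni - 1.0) * 0.5 * math.sqrt(n) / 0.26136


def nni_from_z(z: float, n: int) -> float:
    """Inverse of ann_z_from_nni."""
    return 1.0 + z * 0.26136 / (0.5 * math.sqrt(n))


print(round(ann_z_from_nni(0.448, N_TEST), 2))  # -27.03
print(round(nni_from_z(-27.007, N_TEST), 4))    # 0.4484
```

The unrounded NNI implied by the reported Z-score (about 0.4484) rounds to the published 0.448, so the two statistics are mutually consistent.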
(3): To quantitatively rank the identified crash hazard hotspots, several factors can be considered, including the number of crashes within each hotspot, the hazard level of the hotspots, and the kernel density estimates. These quantitative measures allow for a comparison of the crash risk levels among different hotspots, enabling the identification of areas requiring prioritization and providing targeted evidence for developing sustainable traffic safety improvement measures within them.
(4): Applying the optimal bandwidth of 134 m to the validation data identified most high-risk areas, and the distribution of crash hazard hotspots was largely consistent with the test-data results. The method also identified potential crash hazard hotspots attributable to traffic environment factors. These findings indicate that the proposed method is reliable, accurate, and robust over medium- to long-term time scales.
(5): Crash hazard hotspot identification can be integrated into intelligent transportation management to build a comprehensive traffic safety management model spanning the “prevention, control, and evaluation” stages, supporting sustainable transportation systems and continuously enhancing the safety and environmental adaptability of urban road traffic systems.

Author Contributions

Conceptualization, M.Z. and X.X.; data curation, X.X. and Y.J.; funding acquisition, M.Z.; formal analysis, Q.S. and X.G.; methodology, X.G. and L.Z.; writing—original draft preparation, M.Z. and X.X.; writing—review and editing, M.Z. and F.J.; supervision, F.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Special Scientific Research Project of School of the Emergency Management, Jiangsu University (KY-C-08, KY-D-10).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Some data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Tasic, I.; Elvik, R.; Brewer, S. Exploring the safety in numbers effect for vulnerable road users on a macroscopic scale. Accid. Anal. Prev. 2017, 109, 36–46.
2. World Health Organization (WHO). Global Status Report on Road Safety 2023; World Health Organization: Geneva, Switzerland, 2023; Available online: https://www.who.int/publications/i/item/9789240086456 (accessed on 13 December 2023).
3. Ziakopoulos, A.; Yannis, G. A review of spatial approaches in road safety. Accid. Anal. Prev. 2020, 135, 105323.
4. Harirforoush, H.; Bellalite, L. A new integrated GIS-based analysis to detect hotspots: A case study of the city of Sherbrooke. Accid. Anal. Prev. 2019, 130, 62–74.
5. Chang, H.; Xu, C.K.; Tang, T. Investigating the temporal dynamics of motor vehicle collision density patterns in urban road networks—A case study of New York. J. Saf. Res. 2024, 89, 116–134.
6. Kang, Y.; Cho, N.; Son, S. Spatiotemporal characteristics of elderly population’s traffic accidents in Seoul using space-time cube and space-time kernel density estimation. PLoS ONE 2018, 13, e0196845.
7. Li, X.; Mousavi, S.M.; Dadashova, B.; Lord, D.; Wolshon, B. Toward a crowdsourcing solution to identify high-risk highway segments through mining driving jerks. Accid. Anal. Prev. 2021, 155, 106101.
8. Xiao, G.; Xiao, Y.; Shu, Y.; Ni, A.; Jiang, Z. Technical and economic analysis of battery electric buses with different charging rates. Transp. Res. Part D Transp. Environ. 2024, 132, 104254.
9. Reza, S.; Oliveira, H.S.; Machado, J.J.M.; Tavares, J.M.R.S. Urban Safety: An Image-Processing and Deep-Learning-Based Intelligent Traffic Management and Control System. Sensors 2021, 21, 7705.
10. Ge, H.; Dong, L.; Huang, M.; Zang, W.; Zhou, L. Adaptive kernel density estimation for traffic accidents based on improved bandwidth research on black spot identification model. Electronics 2022, 11, 3604.
11. Aziz, S.; Ram, S. A Meta-analysis of the methodologies practiced worldwide for the identification of Road Accident Black Spots. Transp. Res. Procedia 2022, 62, 790–797.
12. Geurts, K.; Wets, G.; Brijs, T.; Vanhoof, K. Identification and Ranking of Black Spots: Sensitivity Analysis. Transp. Res. Rec. 2004, 1897, 34–42.
13. Le, K.G.; Liu, P.; Lin, L.-T. Determining the road traffic accident hotspots using GIS-based temporal-spatial statistical analytic techniques in Hanoi, Vietnam. Geo-Spat. Inf. Sci. 2020, 23, 153–164.
14. Choudhary, J.; Ohri, A.; Kumar, B. Spatial and statistical analysis of road accidents hot spots using GIS. In Proceedings of the 3rd Conference of Transportation Research Group of India, Kolkata, India, 17–20 December 2015.
15. Prasannakumar, V.; Vijith, H.; Charutha, R.; Geetha, N. Spatio-Temporal Clustering of Road Accidents: GIS Based Analysis and Assessment. Procedia Soc. Behav. Sci. 2011, 21, 317–325.
16. Kazmi, S.S.A.; Ahmed, M.; Mumtaz, R.; Anwar, Z. Spatiotemporal Clustering and Analysis of Road Accident Hotspots by Exploiting GIS Technology and Kernel Density Estimation. Comput. J. 2022, 65, 155–176.
17. Steenberghen, T.; Dufays, T.; Thomas, I.; Flahaut, B. Intra-urban location and clustering of road accidents using GIS: A Belgian example. Int. J. Geogr. Inf. Sci. 2004, 18, 169–181.
18. Shafabakhsh, G.; Famili, A.; Bahadori, M.S. GIS-based spatial analysis of urban traffic accidents: Case study in Mashhad, Iran. J. Traffic Transp. Eng. (Engl. Ed.) 2017, 4, 290–299.
19. Ouni, F.; Belloumi, M. Pattern of road traffic crash hot zones versus probable hot zones in Tunisia: A geospatial analysis. Accid. Anal. Prev. 2019, 128, 185–196.
20. Gedamu, W.T.; Plank-Wiedenbeck, U.; Wodajo, B.T. A spatial autocorrelation analysis of road traffic crash by severity using Moran’s I spatial statistics: A comparative study of Addis Ababa and Berlin cities. Accid. Anal. Prev. 2024, 200, 107535.
21. Ye, Q.; Li, Y.; Shen, W.; Xuan, Z. Division and Analysis of Accident-Prone Areas near Highway Ramps Based on Spatial Autocorrelation. Sustainability 2023, 15, 7942.
22. Hazaymeh, K.; Almagbile, A.; Alomari, A.H. Spatiotemporal Analysis of Traffic Accidents Hotspots Based on Geospatial Techniques. ISPRS Int. J. Geo-Inf. 2022, 11, 260.
23. Xie, Z.; Yan, J. Kernel Density Estimation of traffic accidents in a network space. Comput. Environ. Urban Syst. 2008, 32, 396–406.
24. Okabe, A.; Satoh, T.; Sugihara, K. A kernel density estimation method for networks, its computational method and a GIS-based tool. Int. J. Geogr. Inf. Sci. 2009, 23, 7–32.
25. Al-Aamri, A.K.; Hornby, G.; Zhang, L.C.; Al-Maniri, A.A.; Padmadas, S.S. Mapping road traffic crash hotspots using GIS-based methods: A case study of Muscat Governorate in the Sultanate of Oman. Spat. Stat. 2020, 42, 100458.
26. Loo, B.; Anderson, T. Spatial Analysis Methods of Road Traffic Collisions; CRC Press: Boca Raton, FL, USA, 2015; pp. 1–314.
27. Sun, X.; Hu, H.; Ma, S.; Lin, K.; Wang, J.; Lu, H. Study on the Impact of Road Traffic Accident Duration Based on Statistical Analysis and Spatial Distribution Characteristics: An Empirical Analysis of Houston. Sustainability 2022, 14, 14982.
28. Özcan, M.; Kucukonder, M. Investigation of Spatiotemporal Changes in the Incidence of Traffic Accidents in Kahramanmaraş, Turkey, Using GIS-Based Density Analysis. J. Indian Soc. Remote Sens. 2020, 48, 1045–1056.
29. Charpentier, A.; Gallic, E. Kernel density estimation based on Ripley’s correction. Geoinformatica 2015, 20, 95–116.
30. Le, K.G.; Liu, P.; Lin, L.T. Traffic accident hotspot identification by integrating kernel density estimation and spatial autocorrelation analysis: A case study. Int. J. Crashworthiness 2022, 27, 543–553.
31. Alam, M.S.; Tabassum, N.J. Spatial pattern identification and crash severity analysis of road traffic crash hot spots in Ohio. Heliyon 2023, 9, e16303.
32. Habib, M.F.; Bridgelall, R.; Motuba, D.; Rahman, B. Exploring the Robustness of Alternative Cluster Detection and the Threshold Distance Method for Crash Hot Spot Analysis: A Study on Vulnerable Road Users. Safety 2023, 9, 57.
33. Loo, B.P.Y.; Yao, S.; Wu, J. Spatial point analysis of road crashes in Shanghai: A GIS-based network kernel density method. In Proceedings of the 2011 19th International Conference on Geoinformatics, Shanghai, China, 24–26 June 2011; pp. 1–6.
34. Chainey, S.; Tompson, L.; Uhlig, S. The Utility of Hotspot Mapping for Predicting Spatial Patterns of Crime. Secur. J. 2008, 21, 4–28.
35. Van Patten, I.T.; McKeldin-Coner, J.; Cox, D. A microspatial analysis of robbery: Prospective hot spotting in a small city. Crime Mapp. A J. Res. Pract. 2009, 1, 7–32.
36. Hart, T.C.; Zandbergen, P.A. Effects of Data Quality on Predictive Hotspot Mapping; National Justice Research Service: Washington, DC, USA, 2012.
37. Thakali, L.; Kwon, T.J.; Fu, L. Identification of crash hotspots using kernel density estimation and kriging methods: A comparison. J. Mod. Transp. 2015, 23, 93–106.
38. Greater London Authority. The London Plan 2021. 2021. Available online: https://www.london.gov.uk/programmes-strategies/planning/london-plan/new-london-plan/london-plan-2021 (accessed on 1 March 2021).
39. Hashimoto, S.; Yoshiki, S.; Saeki, R.; Mimura, Y.; Ando, R.; Nanba, S. Development and application of traffic accident density estimation models using kernel density estimation. J. Traffic Transp. Eng. (Engl. Ed.) 2016, 3, 262–270.
40. Yu, H.; Liu, P.; Chen, J.; Wang, H. Comparative analysis of the spatial analysis methods for hotspot identification. Accid. Anal. Prev. 2014, 66, 80–88.
41. O’Sullivan, D.; Wong, D.W.S. A Surface-Based Approach to Measuring Spatial Segregation. Geogr. Anal. 2007, 39, 147–168.
42. Hohl, A.; Zheng, M.; Tang, W.; Delmelle, E.; Casas, I. Spatiotemporal Point Pattern Analysis Using Ripley’s K Function; CRC Press: Boca Raton, FL, USA, 2017; pp. 155–176.
43. Bíl, M.; Andrášik, R.; Sedoník, J. A detailed spatiotemporal analysis of traffic crash hotspots. Appl. Geogr. 2019, 107, 82–90.
44. United Nations Human Settlements Programme (UN-Habitat). World Cities Report 2022; UN-Habitat: Nairobi, Kenya, 2022; Available online: www.unhabitat.org (accessed on 23 August 2022).

Figure 1. Spatial distribution of traffic accidents in the City of London.

Figure 2. Total number of crashes per year (January 2018 to June 2023).

Figure 3. Flowchart of the research methodology.

Figure 4. Significance results of the ANN analysis.

Figure 5. Results of Ripley’s K-function analysis.

Figure 6. Moran’s I and Z-score results of the ISA analysis.

Figure 7. Planar KDE results under different bandwidth values. (a) ArcGIS automatic extraction (cell size: 6.28 m; bandwidth: 52.800 m), (b) Ripley’s K-function (cell size: 1 m; bandwidth: 189.428 m), and (c) ISA analysis (cell size: 1 m; bandwidth: 134.000 m).

Figure 8. Distribution of traffic crash hazard hotspots (test data).

Figure 9. Distribution of traffic crash hazard hotspots (validation data).

Table 1. ExpectedK, ObservedK, DiffK, LwConfEnv, and HiConfEnv values in Ripley’s K-function analysis.

OBJECTID	ExpectedK	ObservedK	DiffK	LwConfEnv	HiConfEnv
1	127	183.105733	56.105733	120.158208	126.275048
2	130	186.040716	56.040716	123.182899	129.179362
3	133	189.428114	56.428114	125.971586	131.997496
4	136	192.32785	56.327850	128.447979	134.406691
5	139	195.395571	56.395571	131.371488	137.289097
6	142	197.910726	55.910726	134.187466	140.385117
7	145	200.409007	55.409007	136.924051	143.414315
8	148	202.949072	54.949072	139.733479	146.058682
9	151	205.701196	54.701196	142.755882	149.15032
10	154	208.275673	54.275673	145.331213	152.159813
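
Table 1 suggests the selection rule behind the 189.428 m bandwidth: the largest DiffK (ObservedK − ExpectedK) occurs at ExpectedK = 133, and the ObservedK in that row equals the Ripley’s K-function bandwidth shown in Figure 7b. The sketch reproduces this rule from the transcribed rows; it is an interpretation of the table rather than a procedure stated here, and the function name is hypothetical.

```python
from typing import List, Tuple

# (ExpectedK, ObservedK) pairs transcribed from Table 1.
TABLE1: List[Tuple[float, float]] = [
    (127, 183.105733), (130, 186.040716), (133, 189.428114), (136, 192.32785),
    (139, 195.395571), (142, 197.910726), (145, 200.409007), (148, 202.949072),
    (151, 205.701196), (154, 208.275673),
]


def max_diffk_row(rows: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Return (ExpectedK, ObservedK, DiffK) for the row with the largest DiffK."""
    expected, observed = max(rows, key=lambda row: row[1] - row[0])
    return expected, observed, observed - expected


expected_k, observed_k, diff_k = max_diffk_row(TABLE1)
print(expected_k, observed_k, round(diff_k, 6))  # 133 189.428114 56.428114
```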

Table 2. Moran’s I, expected I, variance, Z-score, and p-value in ISA analysis.

OID	Distance (m)	Moran’s I	Expected I	Variance	Z-Score	p-Value
0	110	0.031638	−0.001560	0.000492	1.497267	0.134324
1	116	0.030261	−0.001553	0.000462	1.480205	0.138818
2	122	0.033616	−0.001550	0.000431	1.693919	0.090281
3	128	0.035998	−0.001548	0.000402	1.872998	0.061069
4	134	0.040079	−0.001546	0.000378	2.140632	0.032304
5	140	0.032162	−0.001541	0.000355	1.789074	0.073603
6	146	0.031515	−0.001538	0.000318	1.852972	0.063886
7	152	0.028327	−0.001536	0.000297	1.732003	0.083273

First peak and maximum peak (distance; Z-score): 134 m; 2.140632.
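
The peak reported under Table 2 can be located programmatically. The sketch scans the transcribed Z-scores for interior local maxima, that is, distances whose Z-score exceeds both neighbors; treating the end points as non-peaks is an assumption of this sketch, and the function name is hypothetical. Two peaks result, at 134 m and 146 m; the 134 m peak is both the first and the largest, and it is the only one significant at the 5% level (p = 0.032304 versus 0.063886).

```python
from typing import List, Tuple

# (distance in m, Z-score) pairs transcribed from Table 2.
TABLE2: List[Tuple[int, float]] = [
    (110, 1.497267), (116, 1.480205), (122, 1.693919), (128, 1.872998),
    (134, 2.140632), (140, 1.789074), (146, 1.852972), (152, 1.732003),
]


def interior_peaks(rows: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Rows whose Z-score exceeds both neighbours; the end points are never peaks."""
    return [rows[i] for i in range(1, len(rows) - 1)
            if rows[i][1] > rows[i - 1][1] and rows[i][1] > rows[i + 1][1]]


peaks = interior_peaks(TABLE2)
first_peak = peaks[0]
largest_peak = max(peaks, key=lambda row: row[1])
print(peaks)  # [(134, 2.140632), (146, 1.852972)]
```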

Table 3. Comparison of PAI values for planar KDE results under three bandwidths.

Method (Bandwidth)	Total Crashes	Crashes in Hotspots	Percentage of Crashes (%)	Total Area (km²)	Hotspot Area (km²)	Percentage of Area (%)	PAI
ArcGIS automatic extraction (52.8 m)	655	76	11.603	2.892	0.097	3.354	3.459
Ripley’s K-function (189.428 m)	655	294	44.885	2.892	0.318	10.996	4.082
ISA (134 m)	655	229	34.962	2.892	0.231	7.981	4.381

Table 4. Area, number of crashes, hazard level, and unit kernel density estimates of traffic crash hazard hotspots for test data.

No.	Crash Hazard Hotspot ID	Area (m²)	Number of Crashes in Hotspot	Hotspot Hazard Level	Unit Kernel Density Estimate
1	1	59,964	65	115	0.046838
2	3	47,665	51	103	0.038729
3	8	53,764	51	98	0.037935
4	5	18,730	16	26	0.030833
5	6	12,182	14	23	0.030304
6	7	16,167	13	36	0.031051
7	2	8433	12	17	0.028719
8	9	8047	5	22	0.030276
9	4	5579	2	2	0.027396

Table 5. Area, number of crashes, hazard level, and unit kernel density estimates of traffic crash hazard hotspots for validation data.

No.	Crash Hazard Hotspot ID	Area (m²)	Number of Crashes in Hotspot	Hotspot Hazard Level	Unit Kernel Density Estimate
1	1	39,466	48	90	0.024252
2	3	66,756	31	78	0.023539
3	8	53,029	31	78	0.023464
4	5	23,479	14	19	0.015603
5	7	27,996	10	21	0.015993
6	2	39,300	9	20	0.015723
7	4 *	28,397	8	23	0.018698
8	9	11,802	7	15	0.015225

* Represents a new crash hazard hotspot that emerged between January 2021 and June 2023.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zheng, M.; Xie, X.; Jiang, Y.; Shen, Q.; Geng, X.; Zhao, L.; Jia, F. Optimizing Kernel Density Estimation Bandwidth for Road Traffic Accident Hazard Identification: A Case Study of the City of London. Sustainability 2024, 16, 6969. https://doi.org/10.3390/su16166969

