Abstract
In recent years, multi-agent simulation (MAS) has attracted significant attention as a means of preventing pedestrian accidents and the spread of infectious diseases caused by overcrowding. In the MAS paradigm, each pedestrian is represented by a single agent. Control parameters for each agent need to be calibrated based on pedestrian traffic data to reproduce phenomena of interest accurately. However, observing all pedestrian traffic at large-scale events such as festivals and sports games is difficult. In such cases, parameter optimization is essential: the appropriate parameters are determined by solving an error minimization problem between the simulation results and incomplete observed pedestrian traffic data. We propose a benchmark problem, namely MAS-Bench, to evaluate the performance of MAS parameter calibration methods uniformly. Numerical experiments demonstrate the baseline performance of four well-known optimization methods on six different error minimization problems that are defined on MAS-Bench. Moreover, we investigate the validity of the error function in the calibration by evaluating the correlation between the calibration and estimation scores. These scores are error functions relating to the available and unavailable observations, respectively.
Introduction
Multi-agent simulation (MAS), which is a method for assessing the safety of traffic such as pedestrian flows, is an active research field at present. MAS offers the advantages of the safety of subjects and rapid evaluation of various actual situations compared to experiments based on actual pedestrian flows. MAS can be used to determine traffic bottlenecks, optimize the efficiency of signage to control pedestrian flows, and design building layouts to facilitate smooth exits in emergencies [1,2,3,4,5]. In recent years, with the increase in computational resources, research interest in the MAS field has shifted to large-scale pedestrian flows such as festivals and sports games. The parameters of the traffic volume and the pedestrian behavior model need to be calibrated properly in the MAS framework to reproduce large-scale events. The traffic volume refers to the number of people who pass through a given point at every time span. A behavior model is a mathematical expression that describes pedestrian behavior, such as the social force model [6] and velocity obstacle [7]. For example, the departure time and arrival-departure point of pedestrians in the MAS framework can be viewed as parameters relating to the traffic volume, whereas the formula type is a parameter relating to the behavior model.
The values of these parameters are generally determined by observations of the actual pedestrian flow. However, the parameter values sometimes cannot be determined directly owing to incomplete pedestrian flow data, especially in large-scale events. Unobserved pedestrian flow occurs during large-scale events because of the cost of measurement equipment and privacy concerns. For example, video cameras can only monitor traffic in a limited area, and a global positioning system (GPS) can only track a limited number of pedestrians. Therefore, the pedestrian flow must be estimated from the available observations.
One of the most straightforward means of estimating the entire pedestrian flow from limited observations is calibrating MAS parameters that can reproduce the observations appropriately. Optimization methods have been reported to avoid the complexity of manual calibration and automatically determine appropriate parameter settings. In [8, 9], a behavioral model was calibrated until the simulation results matched the pedestrians’ movements extracted from the video camera. An error minimization problem between the simulation results and observed pedestrian traffic needs to be solved to estimate the entire pedestrian flow for traffic volume parameters [10, 11].
Evolutionary algorithms are widespread in parameter calibration owing to their applicability to multiple parameter types. As the accuracy of a simulation is largely determined by the calibrated parameters, the effectiveness of MAS is inherently dependent on the performance of the applied parameter optimization algorithm. Despite this great demand for selecting appropriate parameter calibration methodologies, there has been little discussion on their characteristics in the MAS field. In this study, we propose a benchmark problem for calibrating MAS parameters, known as MAS-Bench. In MAS-Bench, the error function formulated based on large-scale events is minimized to estimate the actual traffic. MAS-Bench includes two traffic situations, namely the pedestrian flow returning from a fireworks event and a sports stadium, to evaluate parameter calibration methods in different situations. As the departure and arrival points are already determined as the event venue and nearest station, respectively, in the two cases, the departure time and behavior model are calibrated as MAS parameters in MAS-Bench. Specifically, these parameters are calibrated by the minimization of the errors between the actual observations and simulation results.
Numerical experiments confirm the baseline performance of several representative optimization methods and the validity of the error function in MAS-Bench. As mentioned previously, observing all pedestrian flows at large-scale events is difficult. Unobserved pedestrian flows are estimated by minimizing the error function using the available observations. However, determining how well the pedestrian flows can be estimated by minimizing the error function is challenging. That is, assessing the accuracy of these estimates is difficult owing to the unavailability of actual pedestrian flow data. To address this issue, we synthetically generate pedestrian flow data from specific parameter values. The synthetic pedestrian flows are employed to examine the fidelity of the error function. Specifically, we investigate the similarity between the calibrated parameter values and the original values. We also analyze the validity of the error function in the calibration by evaluating the correlation between the calibration and estimation scores. These scores are error functions of the available and unavailable observations, respectively.
The contributions of this study are as follows:
(1) MAS-Bench has been released as open source so that many users can work on parameter calibration problems in the MAS area with ease (Sections “MAS-Bench” and “Benchmark problems”).
(2) A baseline of the performance of four well-known optimization methods is established by evaluating these algorithms on six different benchmark problems consisting of two different traffic situations (Section “Comparison of parameter calibration methods”).
(3) We demonstrate that reducing the error between the available observations and simulation helps to estimate entire pedestrian flows by comparing the rank correlation coefficient of the calibration and estimation scores (Section “Estimation performance of entire pedestrian flow”).
MAS-Bench will be publicly available on GitHub (footnote 1).
Related work
A crowd simulation is often used to plan and manage an event or evacuation to mitigate traffic and crowd congestion [12]. As typical examples of these crowd simulations, a macro model focuses on the entire people flow, and a micro model focuses on individual behavior [13]. Multi-agent simulation (MAS), a representative example of micro models, enables more precise calculations than macro models. However, MAS has been used only in spaces of several hundred people owing to issues of computational resources. With the increased computational resources, research interest in MAS has shifted to large-scale pedestrian flows such as festivals and sports games. However, improving the accuracy of MAS is still a challenge. This section introduces the existing research on the calibration of MAS parameters and highlights the challenges of large-scale MAS research.
Numerous attempts have been made to automatically calibrate MAS parameters based on pedestrian flow data. Many parameter calibration studies have focused on aligning the pedestrian trajectory that is captured in camera images with the simulation results. For example, Johansson et al. [14] used an evolutionary algorithm to calibrate the parameters of the commonly used crowd model in MAS, namely the social force model. Zhong et al. [8] proposed a differential evolution-based optimization method to search for promising MAS parameter settings. Their optimization method exhibited high performance in the situation of pedestrians crossing a street. Wolinski et al. [15] proposed a benchmark for evaluating optimization methods in the context of hallway situations. Evolutionary computation algorithms achieved satisfactory performance in the parameter calibration task using this benchmark.
The above studies assumed small-scale events and the availability of comprehensive pedestrian information. For example, the benchmark of Wolinski et al. [15] involved a maximum of 150 pedestrians. In recent years, researchers have attempted to reproduce large-scale events, driven by the reduced cost of measurement equipment and the increasing magnitude of such events. Kiyotake et al. [10] used Bayesian optimization to calibrate the origin–destination traffic volume for an event that almost 2000 people joined. Makinoshima et al. [11] calibrated MAS parameters for 5000 evacuees. However, when observing public spaces, as opposed to confined spaces such as individual rooms or small event halls, ensuring a comprehensive trajectory of a pedestrian group is challenging owing to privacy and security concerns.
The methods employed for acquiring human flow data can be categorized into two approaches: directly measuring pedestrians using video cameras or sensors, and using GPS or smartphones to gather personal location information [16]. Video cameras and sensors enable the measurement of all pedestrians who pass through the designated area. However, as the measurement cost increases with the observation area, spatial gaps may arise in the flow data for large-scale events. Thus, pedestrian movement information cannot be obtained for a large portion of the pedestrian path. Conversely, GPS and smartphones offer the ability to capture the complete pedestrian trajectory. Nonetheless, owing to the challenges that are associated with acquiring personal information from a substantial number of pedestrians, measuring the location information for a large cohort of individuals during a large-scale event remains infeasible.
Estimating large-scale pedestrian information is also computationally expensive, as demonstrated by the approximately 60 h of evaluation time required in the studies conducted by Kiyotake et al. [10]. Furthermore, in scenarios with unobserved data, the reliability of the estimated pedestrian information is an issue. Although previous studies have evaluated the quality of pedestrian information based on the reproducibility error of the simulation, few studies have verified the validity of the error function, which the current study aims to address.
Parameter calibration in crowd simulations typically uses meta-heuristic optimization methods, which do not depend on a specific problem. Well-known parameter calibration and optimization methods include algorithms inspired by biological evolution, such as the genetic algorithm and differential evolution [8, 15]; algorithms inspired by the group behavior of biological organisms, such as ant colony optimization and particle swarm optimization [5, 17, 18]; and stochastic inference algorithms such as Bayesian optimization [10]. However, the relative performance of parameter calibration methods in crowd simulation is not well understood because few benchmarks are available. This study evaluates the performance of typical calibration methods by comparing four types of optimization methods.
MAS-Bench
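In MAS-Bench, the error minimization problem between the observation and simulation results is defined as follows:
\[{\varvec{\theta }}^{*} = \mathop {\mathrm {arg\,min}}\limits _{{\varvec{\theta }}} \; \epsilon ^\textrm{obs}({\varvec{O}}, S({\varvec{\theta }}))\]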
where \({\varvec{\theta }}\) denotes the MAS parameters, \({\varvec{O}}\) is the observation, \(S({{\varvec{\theta }}})\) represents the simulation results, and \(\epsilon ^\textrm{obs}({\varvec{O}}, S({\varvec{\theta }}))\) is the calibration score. The calibration score \(\epsilon ^\textrm{obs}({\varvec{O}}, S({\varvec{\theta }}))\) is an error function between the observation and simulation results. The purpose of this study is to determine the best MAS parameters \({\varvec{\theta }^*}\) that minimize the calibration score \(\epsilon ^\textrm{obs}({\varvec{O}}, S({\varvec{\theta }}))\).
The MAS parameters \({\varvec{\theta }}\) and simulation results \(S({{\varvec{\theta }}})\) are explained in Fig. 1. In this study, the MAS simulates the pedestrian flow from the departure point to the arrival point. In Fig. 1, the departure and arrival points represent the origin and destination, respectively. The GPS and video cameras can observe the pedestrian trajectory from the origin to the destination and the destination traffic volume. However, the origin traffic volume is not observed, because the origin is an outdoor environment, and it is difficult to measure pedestrians accurately [16]. Therefore, the origin traffic volume needs to be calibrated by the observation \({\varvec{O}}\) and simulation results \(S({{\varvec{\theta }}})\). The MAS parameters \({\varvec{\theta }}\) calibrate the origin traffic volume. The origin traffic volume is the number of passing pedestrians at the departure point at every time span. The origin traffic volume can be represented by a histogram of the number of people with classes of departure times. In Fig. 1, the number of passing pedestrians who select Routes 1 and 2 at 20:00 are three and two, respectively.
Figure 2 presents an overview of the parameter calibration in MAS-Bench. The parameter calibration can be divided into the MAS-Bench and user parts, which are bounded by red and black dashed lines, respectively. The MAS-Bench part accepts the MAS parameters \({\varvec{\theta }}\) from the user part and returns the calibration score \(\epsilon ^\textrm{obs}({\varvec{O}}, S({\varvec{\theta }}))\) to the user part. The user part receives the calibration score \(\epsilon ^\textrm{obs}({\varvec{O}}, S({\varvec{\theta }}))\) from the MAS-Bench part and suggests the subsequent MAS parameters \({\varvec{\theta }}\) to the MAS-Bench part. Benchmark users (e.g., optimization researchers) swap the optimizer in the user part when evaluating their optimization method.
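The exchange between the two parts can be illustrated with a minimal Python sketch. It is not the MAS-Bench interface: `toy_calibration_score` is a hypothetical stand-in for the MAS-Bench part (a toy distance replaces the crowd simulation and the error evaluation), and `random_search` is a placeholder optimizer playing the role of the user part.

```python
import random


def toy_calibration_score(theta):
    """Hypothetical stand-in for the MAS-Bench part: maps theta to a score.

    A distance to made-up target parameters replaces the simulation S(theta)
    and the comparison with the observation O.
    """
    target = [0.3, 0.6, 0.1]  # illustrative values only, not from the benchmark
    return sum((t - g) ** 2 for t, g in zip(theta, target)) ** 0.5


def random_search(score_fn, dim, budget, seed=0):
    """User part: suggest theta, receive its score, remember the best pair."""
    rng = random.Random(seed)
    best_theta, best_score = None, float("inf")
    for _ in range(budget):
        theta = [rng.random() for _ in range(dim)]  # suggest parameters
        score = score_fn(theta)                     # returned by the benchmark
        if score < best_score:
            best_theta, best_score = theta, score
    return best_theta, best_score


best_theta, best_score = random_search(toy_calibration_score, dim=3, budget=200)
```

Swapping the optimizer amounts to replacing `random_search` with any routine that consumes the same score function.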
MAS-Bench consists of three components: (i) the origin generator, (ii) the crowd simulator, and (iii) the performance evaluator. These components are described as follows:
(i) Origin generator: The origin generator generates the origin traffic volume, which sets the number of passing pedestrians in each behavior model. If the departure time and behavior model of every pedestrian were specified directly as MAS parameters, the optimization of \(N \times 2\) variables would be required for N pedestrians. However, optimizing \(N \times 2\) variables in events with thousands to tens of thousands of people is very difficult. Thus, we use a Gaussian mixture model (GMM) to reduce the number of MAS parameters. The details are presented in Section “Implementation of origin generator”.
(ii) Crowd simulator: The crowd simulator simulates the entire pedestrian flow. We use CrowdWalk [19] to simulate the pedestrian flow. The settings of the map and pedestrian information in CrowdWalk are explained in Section “Implementation of crowd simulator”.
(iii) Performance evaluator: The performance evaluator evaluates the error between the observation and simulation results. The root mean squared error (RMSE) is used to evaluate the destination traffic flow and pedestrian trajectory, which is described in Section “Implementation of performance evaluator”.
Implementation of origin generator
The origin generator generates the agents using the GMM, which is a model expressed as a combination of multiple Gaussian distributions. Parameter calibration using the GMM generates one Gaussian distribution per behavior model, because the number of people who depart per unit time can be represented by a probability distribution [20]. In this study, we assume that the behavior models of pedestrians are classified into a certain number of typical models. Nishida et al. [21] demonstrated that pedestrian behavior at intersections can be grouped by probabilistic models. We assume there are K typical behavior models. A single Gaussian distribution requires three parameters: the mean, variance, and proportional coefficient. Calibrating the GMM for K behavior models therefore requires the optimization of \(K \times 3\) variables. The number of behavior models K is substantially smaller than the total number of pedestrians N. Therefore, parameter calibration using the GMM involves fewer variables than the direct calibration of the departure times and behavior models.
In this definition, \(f(t; {\varvec{\theta }})\) represents the probability distribution using the GMM. The MAS parameters \({\varvec{\theta }}\) consist of \(\theta _{k}\) for \(k \in \{ 1,\dots , K \}\), where K is the number of behavior models. Each \(\theta _k\) includes the mean \(\mu _k\), variance \(\sigma _k\), and proportional coefficient \(\pi _k\). The Gaussian distribution \(f_{k}(t; \theta _k)\) of the k-th basis of \(f(t; {\varvec{\theta }})\) can be described as follows:
where \(C_{k}=1/\int _{0}^{T}f_{k}(t; \mu _{k}, \sigma _{k})dt\) denotes the normalization constant, T denotes the closing time for the simulation, and \(\sum _{k=1}^{K} \pi _{k} = 1\) is satisfied.
Subsequently, the Gaussian distribution is converted into a frequency distribution. The period from the start to the end of the simulation is divided into \(M^\textrm{ogn}\) time spans, which form the classes of the frequency distribution. The time span in the frequency distribution is \(\Delta t^\textrm{ogn} = \Delta t \times (T / M^\textrm{ogn})\), where \(\Delta t\) is the simulation time step. The fireworks event situation is simulated from 19:00 to 24:00 and the time span in this situation is \(\Delta t^\textrm{ogn} = \Delta t \times (18,000~\textrm{s} / 60~\textrm{min})\). The frequency distribution formula is as follows:
where the time \(t_m\) is denoted by \(m \Delta t^\textrm{ogn}\), in which \(N_{k}^\textrm{ogn}(t_m)\) is the number of people in the k-th behavioral model at time \(t_m\). The total number of people who depart from the origin by simulation time T must equal the number of people N. Therefore, the number of people is corrected using the calibration parameter \(\alpha\) so that \(\sum _{m = 1}^{M^\textrm{ogn}} N^\textrm{ogn}(t_m) = N\).
Finally, the number of departures in every time span \(\Delta t^\textrm{ogn}\) is capped so that it is less than or equal to R. The formula is as follows:
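The Python sketch below illustrates the three generator steps (truncated Gaussian components, scaling to N people, and the cap R). It is not the authors' formula or the MAS-Bench code. Several choices are assumptions made for illustration: the function name `origin_traffic_volume`, the discrete normalization standing in for \(C_k\), treating \(\sigma_k\) as a standard deviation, applying the cap to the total over behavior models, and leaving the counts as real numbers rather than rounding them to integers.

```python
import numpy as np


def origin_traffic_volume(mu, sigma, pi, T, M, N, R):
    """Return an array of shape (K, M): departures per behavior model and time span.

    mu, sigma, pi: length-K sequences (mean, spread, proportional coefficient).
    T: closing time [s]; M: number of time spans; N: total number of people;
    R: assumed upper limit on total departures per time span.
    """
    mu, sigma, pi = (np.asarray(x, dtype=float) for x in (mu, sigma, pi))
    t_m = np.linspace(0.0, T, M + 1)[1:]  # t_m = m * (T / M), m = 1..M
    # Unnormalized Gaussian of each behavior model evaluated at every t_m.
    dens = np.exp(-0.5 * ((t_m[None, :] - mu[:, None]) / sigma[:, None]) ** 2)
    # Discrete analogue of C_k: each component sums to 1 over [0, T].
    dens /= dens.sum(axis=1, keepdims=True)
    # Scale so that the grand total equals N when sum(pi) == 1.
    counts = N * pi[:, None] * dens
    # Cap the per-span total at R, shrinking all models proportionally.
    total = counts.sum(axis=0)
    scale = np.minimum(1.0, R / np.maximum(total, 1e-12))
    return counts * scale[None, :]


# Illustrative call: 4500 people over 18,000 s; all other values are made up.
counts = origin_traffic_volume(
    mu=[3600.0, 9000.0], sigma=[1200.0, 1800.0], pi=[0.4, 0.6],
    T=18000.0, M=60, N=4500, R=1e9,
)
```

With a binding cap the total falls below N, so an implementation that must preserve N would need to redistribute the excess; the sketch does not attempt this.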
Implementation of crowd simulator
The crowd simulator simulates the entire pedestrian flow using CrowdWalk [19]. CrowdWalk is a simulator that uses a one-dimensional network map to simplify the pedestrian position coordinates in two-dimensional space, which enables fast computation and saves memory in the simulation of large-scale pedestrian flows. This section describes the network map and calculation of the agent position setting in CrowdWalk.
The network map defines the moving path of pedestrians. In practice, the moving path is designed manually considering the conditions of the roads and interior structure of the building at the actual event to be simulated. The network map consists of links and nodes. A link is a rectilinear region with a length and width. A node is the termination or connection point of a link. The link and node settings express factors such as the path from the origin to destination, road width, and branches. That is, various road conditions can be expressed in a one-dimensional network map by setting the links and nodes. In this benchmark, the network map represents the path from the main venue to the nearest station. The network map is created using OpenStreetMap (footnote 2) data. The detailed map data that are included in this benchmark are described in Section “Benchmark problems”.
The positions of pedestrians are updated according to the social force model [6]. In this model, the motions of pedestrians are guided by virtual forces that can be expressed as:
where \(f_{i}^\textrm{dest}\) is the attractive force from the destination point, \(f_{i,j}^\textrm{ped}\) is the repulsive force from other pedestrians, and \(f_{i,w}^\textrm{wall}\) is the repulsive force from static obstacles such as walls. The simulator uses \(f_{i}\) as the acceleration and updates the pedestrian positions. The update formula for each walking velocity \(v_i\) is as follows:
The walking velocity \(v_i(t)\) at time \(t = 0\) is the free walking velocity \(v_i^0\). The free walking velocity \(v^0\) is different for each behavior model. Each free walking velocity \(v^0\) in this benchmark is described in Section “Benchmark problems”. The one-dimensional network map only considers forces that are parallel to the travel direction. Therefore, CrowdWalk assumes that the forces from the walls and obstacles \(f_{i,w}^\textrm{wall}\) are negligible. Readers are referred to [19] for further descriptions of the social force model in CrowdWalk.
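For orientation, one form consistent with the definitions above is sketched here. It is an assumption based on the standard social force model [6] and an explicit Euler step of length \(\Delta t\); the exact expressions used by CrowdWalk should be taken from [19].

\[f_{i} = f_{i}^\textrm{dest} + \sum _{j \ne i} f_{i,j}^\textrm{ped} + \sum _{w} f_{i,w}^\textrm{wall}\]

\[v_i(t + \Delta t) = v_i(t) + f_{i} \, \Delta t, \qquad v_i(0) = v_i^0\]

Under the one-dimensional assumption stated above, the wall term \(f_{i,w}^\textrm{wall}\) drops out of the first expression.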
Pedestrians flow in one direction from the start (origin) to the end (destination) in the benchmark problem. Pedestrians’ speed and route in the simulation are determined by the behavior model and traffic (or staff) control. For example, the walking velocity becomes \(v_i(t) = 0\) at a stop signal, and the pedestrian chooses the longest route at a branch signal. Depending on the settings of the behavior model, the longest route may not be selected. The behavior model requires information such as the free walking velocity \(v^0\) and the preferred route between origin and destination. Traffic control requires information on the times and locations of stop and branch signals. In this benchmark problem, we created the behavior model and traffic control based on on-site information. The behavior model and traffic control settings in each situation are explained in Section “Benchmark problems”.
Implementation of performance evaluator
The performance evaluator computes the error between the observation and simulation results. The observations used to calculate the error function are the destination traffic volume and pedestrian trajectory. The destination traffic volume refers to the number of people who pass a destination point, whereas the pedestrian trajectory is the moving distance of a pedestrian. Each of these two values is compared with its simulated counterpart using the RMSE. The calibration score \(\epsilon ^\textrm{obs}\) represents a linear combination of the two error functions and is calculated as follows:
where \(\beta\) is the weighting parameter that determines the importance of the destination traffic volume and pedestrian trajectory.
The destination traffic volume error is calculated as the error in the number of people at every time span. In this benchmark, we divide the period from the start to the end of the simulation into \(M^\textrm{dest}\) time spans. The time span in the destination traffic volume is denoted by \(\Delta t^\textrm{dest} = \Delta t \times (T / M^\textrm{dest})\), where T is the closing time for the simulation and \(\Delta t\) is the simulation time step. The fireworks event situation is recorded by the cameras from 19:00 to 24:00 and the time span is \(\Delta t^\textrm{dest} = 30~\textrm{sec}\). The destination traffic volume error is calculated as follows:
where the time \(t_m\) is denoted by \(m \Delta t^\textrm{dest}\), D is the number of total destination points, \(N_d^\textrm{dest}(t)\) is the observed number of pedestrians at the d-th destination point at time t, and \(\hat{N}_d^\textrm{dest}(t)\) is the simulated number of pedestrians at the d-th destination point at time t.
The pedestrian trajectory error is calculated as the error in the total distance traveled by each pedestrian. In this benchmark, the trip of the a-th pedestrian from departure to arrival is divided into \(T_a\) time spans. The number of time spans for a pedestrian is \(T_a = (t_{a}^\textrm{dest} - t_{a}^\textrm{ogn}) / \Delta t\), where \(\Delta t\) is the simulation time step, and \(t^\textrm{ogn}_a\) and \(t^\textrm{dest}_a\) represent the departure and arrival times, respectively. The pedestrian trajectory error is calculated as follows:
where A is the number of total tracked agents, \(P_a(t)\) is the observed total distance of the a-th pedestrian at time t, and \(\hat{P_a}(t)\) is the simulated total distance of the a-th pedestrian at time t.
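As an illustration of how the two errors might be combined, the following Python sketch computes an RMSE for each observation type and mixes them with \(\beta\). The weights \(\beta\) and \(1 - \beta\), the pooling of all tracked agents into a single RMSE, and the function names are assumptions; the benchmark's own formulas may differ, for example in how the two quantities (people and metres) are normalized.

```python
import numpy as np


def rmse(observed, simulated):
    """Root mean squared error between two equally shaped arrays."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return float(np.sqrt(np.mean((observed - simulated) ** 2)))


def calibration_score(dest_obs, dest_sim, traj_obs, traj_sim, beta=0.5):
    """Assumed form: beta * dest error + (1 - beta) * trajectory error.

    dest_obs, dest_sim: arrays of shape (D, M_dest), people per destination
        point and time span.
    traj_obs, traj_sim: lists of A one-dimensional arrays holding the total
        distance of each tracked agent per step (lengths T_a may differ).
    """
    eps_dest = rmse(dest_obs, dest_sim)
    # Pool every agent's samples before taking the RMSE (an assumption).
    eps_traj = rmse(np.concatenate(traj_obs), np.concatenate(traj_sim))
    return beta * eps_dest + (1.0 - beta) * eps_traj


# Tiny made-up example: one destination count differs by 2 people.
dest_obs = [[1, 2], [3, 4]]
dest_sim = [[1, 2], [3, 6]]
traj_obs = [np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.5])]
traj_sim = [np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.5])]
score = calibration_score(dest_obs, dest_sim, traj_obs, traj_sim, beta=0.5)
```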
The event participants’ cooperation is required to obtain a reliable pedestrian trajectory. The security of mobile devices has been strengthened to ensure the confidentiality of personal data, making it difficult to obtain accurate location information. In previous research, we hired part-time workers to acquire the pedestrian trajectory. Part-time workers are provided with GPS loggers for events to obtain accurate location information. In this study, we assumed that the pedestrian trajectory could be obtained by part-time workers.
Benchmark problems
MAS-Bench provides six benchmark problems that include two different situations, as shown in Table 1. Each situation provides three benchmark problems with different numbers of behavior models. The origin traffic volume for each of the K behavior models is generated by a Gaussian mixture model. The K-th proportional coefficient of the Gaussian distributions is determined by \(\pi _{K} = 1 - \sum _{k=1}^{K-1} \pi _{k}\). Therefore, the number of dimensions is \(K \times 3 - 1\). The design of the map and behavior model in each situation are based on actual events. The range of the MAS parameters and values of the constant parameters introduced in Section “MAS-Bench” are presented in Tables 2 and 3, respectively.
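The dimension count can be made concrete with a short Python sketch that expands a flat search vector of length \(K \times 3 - 1\) into the full GMM parameters. The ordering of the vector (all means, then all spreads, then the first \(K-1\) coefficients) and the name `unpack_theta` are hypothetical; MAS-Bench may lay out its parameters differently.

```python
def unpack_theta(theta, K):
    """Split a flat vector of length 3K - 1 into (mu, sigma, pi) lists.

    Assumed layout: K means, K spreads, then K - 1 proportional coefficients.
    The K-th coefficient is recovered from the sum-to-one constraint.
    """
    if len(theta) != 3 * K - 1:
        raise ValueError(f"expected {3 * K - 1} values, got {len(theta)}")
    mu = list(theta[:K])
    sigma = list(theta[K:2 * K])
    pi = list(theta[2 * K:])
    pi.append(1.0 - sum(pi))  # pi_K = 1 - sum of the other coefficients
    return mu, sigma, pi


# K = 2 behavior models: five free variables instead of six.
mu, sigma, pi = unpack_theta([1.0, 2.0, 3.0, 4.0, 0.25], K=2)
```

An optimizer working on this vector can still propose coefficients whose sum exceeds 1, in which case the recovered \(\pi_K\) is negative; such a proposal would have to be rejected or repaired.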
To address this benchmark problem, we created a pedestrian behavior model with on-site measurement information such as cameras, GPS, and laser radar. Because more observation data is required to develop a detailed behavior model, simple behavior models have been created. On the other hand, a behavior model in the target environment is significantly restricted by crowd management such as traffic lights and barricades. Therefore, we determined that a simple behavior model could represent the entire crowd flow. The two situations are described in the following.
Situation 1: fireworks event
At the fireworks event, pedestrians flow in one direction, from the event venue to the nearest station. The walking path and state of the simulation are depicted in Fig. 3a. The pedestrian departure time is from 19:00 to 24:00. Therefore, the simulation closing time T is 18,000 s. According to the camera measurements from 2012 to 2022, the number of people who passed through the nearest station ranged from 30,000 to 50,000. Thus, based on the number of people passing through the nearest station in past events, we set the number of people to N = 45,000. However, the simulation of 45,000 people requires a huge computational cost and the parameter calibration is challenging. According to the investigation by Kato et al. (footnote 3), the 1/10 scale simulation result and the original simulation result are almost the same. Therefore, we design the route width and number of people at the fireworks event on a 1/10 scale. The walking paths for the fireworks event constitute three routes. The GPS holders depart from three walking routes every 10 min. Therefore, the total number of tracking agents who depart within the simulation time is 90 (3 routes × 30 departures over 5 h).
Figure 4 shows three routes of the fireworks event from the origin to destination, which are 327 m (Route 1), 577 m (Route 2), and 755 m (Route 3). Security staff are stationed at certain points along the routes to lead pedestrians to prevent traffic congestion. Branch controls are performed at two locations: between Routes 1 and 2, and between Routes 2 and 3. The stop controls are set at seven locations: three along Route 1, two along Route 2, and two along Route 3. The pedestrians follow the guides and move forward along each route.
The pedestrians who participate in the fireworks event are divided into three behavior models, as follows.
- Guided model: In this model, branch controls and stop controls are followed, and it can occur on all routes (Routes 1, 2, and 3). The free walking velocity based on the Guided model is the default value \(v^0 = 1.02~\mathrm{m/s}\) of CrowdWalk.
- Busy model: This model aims to reach the destination point early. Only stop controls are followed and the shortest route is walked (Route 1). The free walking velocity based on the Busy model is \(v^0 = 1.12~\mathrm{m/s}\), which is 10% faster than the default value of CrowdWalk.
- Slow model: In this model, pedestrians slowly enjoy the fireworks at stalls. Branch controls and stop controls are followed but the shortest route is not walked. The free walking velocity in the Slow model is \(v^0 = 0.51~\mathrm{m/s}\) until reaching the second branch control point (between the first and second branch control points) and \(v^0 = 1.02~\mathrm{m/s}\) after passing the branch control point.
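The three models can be summarized as a small configuration table. The Python sketch below records only the values stated in the list above. The class and field names are hypothetical, and reading "the shortest route is not walked" as "Routes 2 and 3" is an assumption; neither comes from CrowdWalk or MAS-Bench.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

DEFAULT_V0 = 1.02  # CrowdWalk default free walking velocity [m/s], per the text


@dataclass(frozen=True)
class BehaviorModel:
    name: str
    follows_branch_control: bool
    follows_stop_control: bool
    routes: Tuple[int, ...]  # routes this model may use
    free_velocity: float     # v^0 [m/s] (final value if it changes)
    # v^0 before the second branch control point, when it differs.
    initial_free_velocity: Optional[float] = None


FIREWORKS_MODELS = (
    BehaviorModel("guided", True, True, (1, 2, 3), DEFAULT_V0),
    BehaviorModel("busy", False, True, (1,), 1.12),
    # Routes (2, 3) is an assumed reading of "shortest route is not walked".
    BehaviorModel("slow", True, True, (2, 3), DEFAULT_V0,
                  initial_free_velocity=0.51),
)
```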
Situation 2: sports stadium
At the sports stadium, pedestrians flow in one direction from the stadium to the nearest station. The walking path and state of the simulation are depicted in Fig. 3b. The pedestrian departure time is from 19:00 to 20:00. Therefore, the simulation closing time T is 3600 s.
According to J.League attendance records (footnote 4), the number of attendees in 2022 was between 14,000 and 23,000. Based on the attendance data for past events, we set the number of people to N = 17,600. We design the route width and number of people in the sports stadium on a 1/10 scale to reduce the computational cost. The walking paths for the sports stadium constitute two routes. The GPS holders depart from the walking routes every several minutes. In this study, the GPS holders depart from the two walking routes every 10 min. Therefore, the total number of tracking agents who depart within the simulation time is 12 (2 routes × 6 departures over 1 h).
Figure 5 depicts the division of the pedestrians into two routes from the origin to the destination in the sports stadium: 400 m (Route A) and 1,740 m (Route B). Three traffic signals (one along Route A and two along Route B) are set to control the pedestrians along the routes to prevent traffic congestion. The pedestrians follow the traffic signals as they move along the two routes.
The pedestrians in the sports stadium are divided into two behavior models:
- Route A model: The shortest route (Route A) from the main venue to the station is selected.
- Route B model: The longest route (Route B) from the main venue to the station is selected.
Experiments
We evaluated the advantages and disadvantages of the parameter calibration methods using MAS-Bench. Moreover, we verified whether the parameter calibration could estimate the entire pedestrian flow.
Comparison of parameter calibration methods
First, we evaluated four optimization methods in terms of the calibration score \(\epsilon ^\textrm{obs}\), which is the error between the observation and simulation results. The similarity between the two is higher as the calibration score \(\epsilon ^\textrm{obs}\) approaches 0. The experimental settings are described in Section “Experimental settings” and the evaluations of the calibration score for all benchmark problems are presented in Section “Experimental results”.
Experimental settings
We explain the baseline optimization methods, the number of evaluations, and the computing performance in the following.
The following optimization methods were used as baselines:
- Particle swarm optimization (PSO): This method is based on swarm intelligence, inspired by the group behavior of biological organisms [22]. The algorithm attracts the set of MAS parameters toward the best parameters found so far. The best MAS parameters are selected by sharing information among the set of MAS parameters. The next-generation set of MAS parameters is determined by the position and velocity information of each MAS parameter. PSO performs well on benchmark functions [23] and is one of the methods often used in the MAS research field [5, 17, 18].
- Covariance matrix adaptation evolution strategy (CMA-ES): This method is an evolutionary algorithm inspired by biological evolution [24]. The algorithm generates the set of MAS parameters using a multivariate normal distribution (MND). The position and size of the MND are determined by the set of high-ranking MAS parameters, which is selected by sharing information among the set of MAS parameters. The next-generation set of MAS parameters is then sampled from the updated MND. CMA-ES is known as a state-of-the-art method on benchmark functions [25].
- Tree-structured Parzen estimator (TPE): TPE is a Bayesian optimization method [26]. The algorithm determines the MAS parameters using a probability density function (PDF) for each parameter. The PDF for each parameter is estimated from the set of MAS parameters evaluated so far. The subsequent MAS parameters are drawn as pseudo-random numbers based on the PDF. TPE is known as a state-of-the-art method in machine-learning optimization [27].
- Random search (RS): RS generates the set of MAS parameters using a random number generator. The algorithm does not share information among the set of MAS parameters. Its advantage is that the entire parameter range can be explored uniformly. RS exhibits high performance in high-dimensional optimization [28].
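To make the contrast between an information-sharing method and RS concrete, the following is a minimal sketch and not the implementation used in MAS-Bench. It assumes a hypothetical two-dimensional error function `toy_error` in place of the calibration score, a unit-box search range, and commonly used but arbitrary PSO coefficients (`w`, `c1`, `c2`); all of these names and values are illustrative assumptions.

```python
import random


def toy_error(theta):
    """Hypothetical stand-in for the calibration score: squared distance to a hidden optimum."""
    target = [0.3, 0.7]
    return sum((t - g) ** 2 for t, g in zip(theta, target))


def random_search(error_fn, dim, budget, rng):
    """Sample parameters uniformly in [0, 1]^dim; no information is shared between samples."""
    best = float("inf")
    for _ in range(budget):
        theta = [rng.random() for _ in range(dim)]
        best = min(best, error_fn(theta))
    return best


def pso(error_fn, dim, budget, rng, swarm=20, w=0.7, c1=1.5, c2=1.5):
    """Minimal global-best PSO; the coefficients are illustrative assumptions."""
    pos = [[rng.random() for _ in range(dim)] for _ in range(swarm)]
    vel = [[0.0] * dim for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    pbest_val = [error_fn(p) for p in pos]
    evals = swarm
    g = min(range(swarm), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    # Run full generations while the evaluation budget allows.
    while evals + swarm <= budget:
        for i in range(swarm):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity is pulled toward the particle's own best and the swarm's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep the position inside the unit-box search range.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            val = error_fn(pos[i])
            evals += 1
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest_val


rs_best = random_search(toy_error, dim=2, budget=400, rng=random.Random(0))
pso_best = pso(toy_error, dim=2, budget=400, rng=random.Random(0))
```

Both functions consume the same number of evaluations, so their best scores can be compared directly, which mirrors the equal-budget comparison used in the experiments.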
Each optimization method was given the same budget (number of evaluations) of 4000 per trial. The optimization methods were compared using the median value over 10 trials.
We used an Intel Core i9-10900K CPU (10 cores, 20 threads) with 64.0 GB of memory in the computer experiments. The calculation time per trial for the fireworks event was approximately 200 min, whereas that for the sports stadium was approximately 30 min.
Experimental results
Table 4 presents the results of the four optimization methods on the six benchmark problems. The table shows the median, minimum, and interquartile range (IQR) of the calibration score in each case. Smaller median and minimum values indicate that the optimization method found a good solution, and a smaller IQR indicates that the solutions found were stable across trials. Figure 6 depicts these results as boxplots. The search performance of CMA-ES and TPE was better than that of the other optimization methods on almost all benchmark problems. Furthermore, the IQR of TPE was smaller than that of the other optimization methods; TPE was therefore a stable and efficient search method. However, the minimum of CMA-ES was better than that of TPE on all benchmark problems. In general, CMA-ES is superior to TPE for local searches, and it outperformed TPE when good local optima were found in a narrow search range. PSO exhibited good performance on low-dimensional benchmark problems: in the SS-1 benchmark problem, PSO found the true MAS parameters. However, its search performance was inferior to that of the other optimization methods on high-dimensional benchmark problems.
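The three summary statistics reported per method and problem can be computed as in the sketch below. The scores are made-up values for ten hypothetical trials, and the quartile convention (`method="inclusive"`) is an assumption, since the paper does not state which one was used.

```python
import statistics


def summarize(scores):
    """Return the median, minimum, and interquartile range of per-trial best scores."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "median": statistics.median(scores),
        "min": min(scores),
        "iqr": q3 - q1,
    }


# Hypothetical best calibration scores from 10 trials of one method.
scores = [0.12, 0.10, 0.15, 0.11, 0.30, 0.09, 0.13, 0.10, 0.14, 0.12]
summary = summarize(scores)
```

The single poor trial (0.30) barely affects the median or the IQR, which is one reason these statistics are a reasonable choice for comparing stochastic optimizers.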
Next, we consider the search performance of each optimization method using the history of the calibration score. Figure 7 shows the best calibration score found so far. The solid line indicates the median value for each optimization method. The curve descends when a newly evaluated calibration score \(\epsilon ^\textrm{obs}\) is lower than all previous scores and remains constant otherwise. The calibration scores of PSO hardly improved after 1000 evaluations, and those of CMA-ES and TPE hardly improved after 2000 evaluations. That is, apart from RS, the baseline optimization methods had almost converged by 2000 evaluations. This suggests that the optimization methods tend to become trapped in local solutions on the benchmark problems and that the quality of these local solutions differs significantly. In general, global search performance is important in problems with many local solutions. However, CMA-ES searched better than TPE in the case of FS-3. As illustrated in Fig. 7, both CMA-ES and TPE continued to find better solutions in FS-3 even after more than 3000 evaluations. Therefore, the better search performance of CMA-ES relative to TPE in this case might be due to the limited budget.
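The best-so-far curve described above is a running minimum over the evaluation history. The sketch below shows one way to compute it and to locate the last evaluation at which the score improved; the score sequence and the helper names are illustrative assumptions.

```python
from itertools import accumulate


def best_so_far(scores):
    """Running minimum: the best calibration score found up to each evaluation."""
    return list(accumulate(scores, min))


def last_improvement(history):
    """1-based index of the last evaluation at which the best score improved."""
    last = 1
    for i in range(1, len(history)):
        if history[i] < history[i - 1]:
            last = i + 1
    return last


# Hypothetical calibration scores in evaluation order.
history = best_so_far([0.9, 0.7, 0.8, 0.4, 0.6, 0.4, 0.3])
```

A long flat tail in `history` corresponds to the plateaus seen when a method has stopped improving.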
Estimation performance of entire pedestrian flow
We investigated the validity of the error function in the calibration by evaluating the correlation between the calibration and estimation scores. The correlation coefficient is an index of the relationship between two sets of data and takes real values in [-1, 1]. Section “Evaluation formula for calibration score” presents the formulas for the estimation score and the correlation coefficient. Section “Evaluation results” describes the evaluation of the correlation coefficients of all benchmark problems.
Evaluation formula for calibration score
The benchmark problems use a synthetic pedestrian flow as observation data for parameter calibration. The synthetic pedestrian flow is generated by giving a tentative value to the origin traffic volume. All pedestrian behavior is determined by the origin traffic volume. Therefore, the origin traffic volume was used to calculate the estimation score. The estimation score represents the error in the number of people at every time span \(\Delta t^\textrm{ogn}\) and is calculated as follows:
where K is the number of behavior models, \(N_k^\textrm{ogn}(t_m)\) is the observed number of behavior model k pedestrians at time \(t_{m}\), and \(\hat{N}_k^\textrm{ogn}(t_m)\) is the simulated number of the behavior model k pedestrians at time \(t_{m}\).
The similarity of the calibration score \(\epsilon ^\textrm{obs}\) and estimation score \(\epsilon ^\textrm{unobs}\) was determined by Spearman’s rank correlation coefficient, which is calculated as follows:
\[\rho = 1 - \frac{6\sum _{b=1}^{B} Dist(\epsilon _b^\textrm{unobs},\epsilon _b^\textrm{obs})^2}{B(B^2-1)}\]
where B is the total number of simulation results and \(Dist(\epsilon _b^\textrm{unobs},\epsilon _b^\textrm{obs})\) is the rank difference between the b-th estimation score \(\epsilon ^\textrm{unobs}\) and the b-th calibration score \(\epsilon ^\textrm{obs}\).
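As an illustration of how this rank correlation could be computed, the sketch below uses the standard tie-free form of Spearman’s coefficient, \(1 - 6\sum d^2 / (B(B^2-1))\), where d is the rank difference of each pair. It assumes that no two scores are equal; the function names and the sample values are hypothetical.

```python
def ranks(values):
    """Assign ranks 1..B by ascending value (assumes all values are distinct)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(eps_obs, eps_unobs):
    """Spearman's rank correlation between calibration and estimation scores."""
    b = len(eps_obs)
    if b != len(eps_unobs) or b < 2:
        raise ValueError("need two sequences of equal length >= 2")
    r_obs, r_unobs = ranks(eps_obs), ranks(eps_unobs)
    # Sum of squared rank differences over all simulation results.
    d2 = sum((u - o) ** 2 for u, o in zip(r_unobs, r_obs))
    return 1 - 6 * d2 / (b * (b * b - 1))


# Hypothetical scores for three simulation results.
rho = spearman_rho([0.1, 0.2, 0.3], [0.2, 0.1, 0.3])
```

When ties are present, a tie-aware implementation such as the average-rank variant is needed instead of this closed form.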
Evaluation results
We evaluated the correlation coefficients \(\rho\) of all benchmark problems using \(B = 40,000\) simulation results that were selected by random sampling. The experimental results of the RS method in Section “Comparison of parameter calibration methods” were used as the randomly sampled simulation results. Spearman’s rank correlation coefficient was used as the correlation coefficient. The rank correlation coefficient between the estimation and calibration scores indicates the validity of the error function in this paper. Guilford’s rule [29], which is summarized in Table 5, was used to interpret the correlation coefficient.
As can be observed from Table 6, the correlation coefficient \(\rho\) for all benchmark problems was at or above the moderate level. In general, two data groups can be said to have a positive correlation if the coefficient is at or above the moderate level. Thus, if the calibration score \(\epsilon ^\textrm{obs}\) decreases, the estimation score \(\epsilon ^\textrm{unobs}\) also tends to decrease. However, the correlation between the two data groups differed significantly depending on the number of behavior models K. In this benchmark, \(K=1\) exhibited a very strong positive correlation and \(K=2,3\) exhibited a moderate positive correlation.
Figure 8 depicts scatter plots of the ranks of the calibration and estimation scores. In the scatter plots with a correlation coefficient of 0.9 or greater, the points lie close to a straight line, which shows that the rank difference between the two data groups was small. In the scatter plots with a correlation coefficient of approximately 0.6, the points form an elliptical shape, which shows that the rank difference between the two data groups was somewhat more spread out. In this benchmark, estimating the entire pedestrian flow from the available observations became more difficult as the number of behavior models K increased.
Finally, Fig. 9 depicts the normal distributions given by the true MAS parameters \({\varvec{\theta }}^\textrm{true}\) and the best MAS parameters \({\varvec{\theta }}^*\). The true MAS parameters \({\varvec{\theta }}^\textrm{true}\) are the tentative values used to generate the synthetic pedestrian flow for the benchmark problem. The best MAS parameters \({\varvec{\theta }}^*\) are the parameters that achieved the best calibration score in RS. The true and best normal distributions for one and two behavior models were almost the same. The result for three behavior models also indicates that the true and best normal distributions did not differ in terms of shape. The graphs show that reducing the error between the available observations and the simulation helped to estimate the entire pedestrian flow.
Conclusions
We have proposed MAS-Bench for testing parameter calibration methods in MAS. MAS-Bench is formulated as an optimization problem to minimize the error between the observed pedestrian flow and simulated results. MAS-Bench allows users to apply optimization methods to evaluate their performance on the same simulation model and setup fairly. MAS-Bench includes three components: an origin generator, a crowd simulator, and a performance evaluator. In the origin generator, the number of required parameters to generate pedestrian flows is significantly reduced by approximating the traffic volume using a GMM. In the crowd simulator, two different large-scale pedestrian flows are modeled as test problems. The error function for observations from video cameras and GPS is formulated in the performance evaluator. The validity of the error function and the baseline performance of commonly used optimization methods were confirmed through numerical experiments.
Future enhancements of MAS-Bench will include improving the error function and replacing the components of MAS-Bench. As indicated in Section “Estimation performance of entire pedestrian flow”, the error function is suitable for estimating relatively simple pedestrian flows, but the correlation coefficient becomes weaker as the number of parameters increases. Improving the error function would allow additional parameters to be optimized efficiently. The other components of MAS-Bench (i.e., the origin generator and crowd simulator) are also replaceable. Another simulator (e.g., MATSim [30] or GAMA [31]) and another approximation method (e.g., an interpolation curve or a piecewise linear approximation) can be used. We plan to include other situations and pedestrian variations in the future so that a wide range of problems can be addressed. Finally, we believe that MAS-Bench can help researchers in the MAS field and developers of optimization methods to establish more efficient parameter calibration algorithms.
Data availability statement
The datasets used in this study are available at https://github.com/MAS-Bench/MAS-Bench
References
Fang, Z., Li, Q., Li, Q., Han, L. D., & Wang, D. (2011). A proposed pedestrian waiting-time model for improving space-time use efficiency in stadium evacuation scenarios. Building and Environment, 46(9), 1774–1784. https://doi.org/10.1016/j.buildenv.2011.02.005
Ha, V., & Lykotrafitis, G. (2012). Agent-based modeling of a multi-room multi-floor building emergency evacuation. Physica A: Statistical Mechanics and its Applications, 391(8), 2740–2751. https://doi.org/10.1016/j.physa.2011.12.034
Khamis, N., Selamat, H., Ismail, F. S., Lutfy, O. F., Haniff, M. F., & Nordin, I. N. A. M. (2020). Optimized exit door locations for a safer emergency evacuation using crowd evacuation model and artificial bee colony optimization. Chaos, Solitons & Fractals, 131, 109505. https://doi.org/10.1016/j.chaos.2019.109505
Zhang, Z., Jia, L., & Qin, Y. (2017). Optimal number and location planning of evacuation signage in public space. Safety Science, 91, 132–147. https://doi.org/10.1016/j.ssci.2016.07.021
Dubey, R. K., Khoo, W. P., Morad, M. G., Hölscher, C., & Kapadia, M. (2020). Autosign: A multi-criteria optimization approach to computer aided design of signage layouts in complex buildings. Computers & Graphics, 88, 13–23. https://doi.org/10.1016/j.cag.2020.02.007
Helbing, D., & Molnár, P. (1995). Social force model for pedestrian dynamics. Physical Review E, 51(5), 4282–4286. https://doi.org/10.1103/PhysRevE.51.4282
Fiorini, P., & Shiller, Z. (1998). Motion planning in dynamic environments using velocity obstacles. The International Journal of Robotics Research, 17(7), 760–772. https://doi.org/10.1177/027836499801700706
Zhong, J., & Cai, W. (2015). Differential evolution with sensitivity analysis and the Powell’s method for crowd model calibration. Journal of Computational Science, 9, 26–32. https://doi.org/10.1016/j.jocs.2015.04.013
Zhong, J., Hu, N., Cai, W., Lees, M., & Luo, L. (2015). Density-based evolutionary framework for crowd model calibration. Journal of Computational Science, 6, 11–22. https://doi.org/10.1016/j.jocs.2014.09.002
Kiyotake, H., Kohjima, M., Matsubayashi, T., & Toda, H. (2018). Multi agent flow estimation based on bayesian optimization with time delay and low dimensional parameter conversion. In: Proceedings of the 21st International Conference on Principles and Practice of Multi-Agent Systems pp. 53–69. https://doi.org/10.1007/978-3-030-03098-8_4
Makinoshima, F., & Oishi, Y. (2022). Crowd flow forecasting via agent-based simulations with sequential latent parameter estimation from aggregate observation. Scientific Reports, 12(1), 1–13. https://doi.org/10.1038/s41598-022-14646-4
Sidiropoulos, G., Kiourt, C., & Moussiades, L. (2020). Crowd simulation for crisis management: The outcomes of the last decade. Machine Learning with Applications, 2, 100009. https://doi.org/10.1016/j.mlwa.2020.100009
Yang, S., Li, T., Gong, X., Peng, B., & Hu, J. (2020). A review on crowd simulation and modeling. Graphical Models, 111, 101081. https://doi.org/10.1016/j.gmod.2020.101081
Johansson, A., Helbing, D., & Shukla, P. K. (2007). Specification of the social force pedestrian model by evolutionary adjustment to video tracking data. Advances in Complex Systems, 10(supp02), 271–288. https://doi.org/10.48550/arXiv.0810.4587
Wolinski, D., Guy, S. J., Olivier, A. H., Lin, M., Manocha, D., & Pettré, J. (2014). Parameter estimation and comparative evaluation of crowd simulations. Computer Graphics Forum, 33(2), 303–312. https://doi.org/10.1111/cgf.12328
Brunetti, A., Buongiorno, D., Trotta, G. F., & Bevilacqua, V. (2018). Computer vision and deep learning techniques for pedestrian detection and tracking: A survey. Neurocomputing, 300, 17–33. https://doi.org/10.1016/j.neucom.2018.01.092
Cristiani, E., & Peri, D. (2017). Handling obstacles in pedestrian simulations: Models and optimization. Applied Mathematical Modelling, 45, 285–302. https://doi.org/10.1016/j.apm.2016.12.020
Liu, H., Xu, B., Lu, D., & Zhang, G. (2018). A path planning approach for crowd evacuation in buildings based on improved artificial bee colony algorithm. Applied Soft Computing, 68, 360–376. https://doi.org/10.1016/j.asoc.2018.04.015
Yamashita, T., Okada, T., & Noda, I. (2013). Implementation of simulation environment for exhaustive analysis of huge-scale pedestrian flow. SICE Journal of Control, Measurement, and System Integration, 6(2), 137–146. https://doi.org/10.9746/jcmsi.6.137
Okukubo, T., Bando, Y., & Onishi, M. (2022). Traffic prediction during large-scale events based on pattern-aware regression. Journal of Information Processing, 30, 42–51. https://doi.org/10.2197/ipsjjip.30.42
Nishida, R., Onishi, M., & Hashimoto, K. (2019). Construction of a route choice model for application to a pedestrian flow simulation. In: 2019 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops) pp. 614–619. https://doi.org/10.1109/PERCOMW.2019.8730657
Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. In: Proceedings of ICNN’95-International Conference on Neural Networks 4, 1942–1948. https://doi.org/10.1109/ICNN.1995.488968
El-Abd, M., & Kamel, M. S. (2009). Black-box optimization benchmarking for noiseless function testbed using particle swarm optimization. In: Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers pp. 2269–2274. https://doi.org/10.1145/1570256.1570316
Hansen, N., & Ostermeier, A. (2001). Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2), 159–195. https://doi.org/10.1162/106365601750190398
Hansen, N., Auger, A., Ros, R., Finck, S., & Pošík, P. (2010). Comparing results of 31 algorithms from the black-box optimization benchmarking bbob-2009. In: Proceedings of the 12th Annual Conference Companion on Genetic and Evolutionary Computation pp. 1689–1696. https://doi.org/10.1145/1830761.1830790
Bergstra, J.S., Bardenet, R., Bengio, Y., & Kégl, B. (2011). Algorithms for hyper-parameter optimization. In: Proceedings of the 24th International Conference on Neural Information Processing Systems pp. 2546–2554
Bergstra, J., Komer, B., Eliasmith, C., Yamins, D., & Cox, D. D. (2015). Hyperopt: A python library for model selection and hyperparameter optimization. Computational Science & Discovery, 8(1), 014008. https://doi.org/10.1088/1749-4699/8/1/014008
Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(10), 281–305.
Guilford, J. P. (1950). Fundamental statistics in psychology and education. McGraw-Hill.
Horni, A., Nagel, K., & Axhausen, K. W. (2016). The multi-agent transport simulation MATSim. Ubiquity Press.
Taillandier, P., Vo, D. A., Amouroux, E., & Drogoul, A. (2012). Gama: A simulation platform that integrates geographical information data, agent-based modeling and multi-scale control. In: Proceedings of the 13th International Conference on Principles and Practice of Multi-Agent Systems pp. 242–258. https://doi.org/10.1007/978-3-642-25920-3_17
Shigenaka, S., Takami, S., Watanabe, S., Tanigaki, Y., Ozaki, Y., & Onishi, M. (2021). Mas-bench: Parameter optimization benchmark for multi-agent crowd simulation. In: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems pp. 1652–1654
Acknowledgements
Part of the work has been presented at the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS2021), Virtual, May 3–7, 2021 [32].
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
This work was done while the 4th author was at the University of Freiburg.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Shigenaka, S., Takami, S., Tanigaki, Y. et al. MAS-Bench: a benchmarking for parameter calibration of multi-agent crowd simulation. J Comput Soc Sc 7, 2121–2145 (2024). https://doi.org/10.1007/s42001-024-00302-6
DOI: https://doi.org/10.1007/s42001-024-00302-6