1. Introduction
Co-operative Intelligent Transportation Systems (C-ITS) technologies have attracted considerable R&D effort over the last decades and are becoming more pervasive in the context of Smart Roads, not only because of the deployment of connected roadside infrastructures, but especially because of their potential to provide information that can help drivers and other road users. C-ITS are equipped with traffic sensors and communication platforms able to extract useful information from road vehicles and transmit it to nearby vehicles and/or to traffic management centers.
The connected and sensor-equipped roadside infrastructures can assist real-time applications, such as route planning, traffic analysis, and statistics collection applications for highway management or other third parties’ purposes. As an example, a highway traffic radar can support autonomous vehicles’ maneuvers, but it can also count the number of vehicles entering the highway and monitor their behavior, and therefore, it can be used to assist traffic control prediction and congestion control decisions.
Traffic congestion is one of the main challenges faced today by drivers, motorway operators, and city managers, because it makes daily travel more complex, with negative impacts on the environment, time, and monetary costs for the users. Such a problem can be mitigated by providing drivers with traffic flow predictions, such as the probability of traffic congestion, helping them to avoid undesirable events by rerouting their trips, choosing another means of transport, or changing their travel times. Traffic flow predictions can also benefit road and city operators in the implementation of traffic planning and management strategies.
The platform used to gather monitoring data was developed under the scope of the PASMO project (An Open Platform for the development and experimentation of Mobility Solutions) [1], providing connectivity to the project sensors and the usage of telecommunications services. This project was developed by the Institute of Telecommunications at the University of Aveiro, with the purpose of providing solutions for intelligent mobility problems and an open platform for researchers to develop their own ideas. The project platform [2] includes a set of services related to road mobility that use a set of physical sensors (e.g., traffic classification radar, parking, meteorological, etc.) and communication networks to monitor traffic behavior. The devices composing the core of the platform were installed in the municipalities of Ílhavo and Aveiro, in Portugal, at strategic locations to capture the most valuable information about the environment around the beaches of Barra and Costa Nova, as shown by the map in Figure 1, drawn at an approximate scale of 1:10,000. The main local city is 10 km away toward the right of the map. Both covered places are summer resorts, where people go to the beach, as well as places of residence that act as urban dormitories for people who work in nearby cities. Vacationers and residents have different travel patterns, and their impact on the overall traffic differs from day to day and varies with the time of year.
Although a vast amount of work on data analysis and traffic forecasting has been published, for example [4,5,6,7,8,9,10,11,12,13], the datasets are not in the public domain. Publicly available dataset repositories such as OpenDataMonitor, Kaggle, and MDPI Data allow one to find several datasets related to traffic accidents, but none representing highway traffic behavior, especially Portuguese traffic. A search over the IEEE DataPort repository returned two references [14,15], published by Cruz et al., related to Wi-Fi data [14] and V2V communications [15] collected by driving vehicles in Porto city. To the best of our knowledge, there is no dataset that describes the traffic flow in a shoreline area and makes it possible to correlate the habits of travel to the beach with weather information. This makes the present dataset unique.
From the set of devices, we selected three traffic classification radars [16,17,18] covering all entries of the beaches of Barra and Costa Nova, whose data were collected from 2019 to 2021. This paper details the corresponding dataset, which aggregates traffic data with a time granularity of 10 min and details the number of detected vehicles, including 2-wheelers, and their maximum, minimum, and mean speed, in both directions. The goal is to enable further and better work by researchers. We also believe that the acquisition and processing methods can be helpful for others responsible for acquiring, cleaning, and processing IoT data from smart cities.
The paper continues in Section 2 with a description of the dataset and details the data collection and preparation methodologies in Section 3. Section 4 concludes the paper, and some additional information is provided in the appendices.
2. Data Description
The original telemetry dataset contains more than 170 million records (170,158,409) covering the years 2019, 2020, and 2021, and is composed of parking sensor and radar data. Section 3 presents the steps used to transform the original data, including the granularity and speed calculations related to the radar stations; the parking sensor data were not included in the present dataset.
The final dataset, summarized in Table A1 (Appendix A), has 74,305 records ordered and summarized by index columns (year, month, day, hour, and minute). In addition to the indexing columns, it contains the weekday (wd), added for ease of query writing; the mean speed of objects approaching radar i (Speed_med_ARi); the maximum speed of objects approaching radar i (Speed_max_ARi); the minimum speed of objects approaching radar i (Speed_min_ARi); the mean speed of objects leaving radar i (Speed_med_DRi); the maximum speed of objects leaving radar i (Speed_max_DRi); the minimum speed of objects leaving radar i (Speed_min_DRi); the traffic flow at Barra (TF_Barra); and the traffic flow at Costa Nova (TF_Costa). The minute attribute represents ten-minute intervals (0–10, 11–20, 21–30, 31–40, 41–50, 51–60) used to aggregate the original records. The dataset can be accessed at https://figshare.com/s/d324f5be912e7f7a0d21 (accessed on 23 June 2020).
Each record stores the minimum, maximum, and mean speed of the objects approaching and leaving each radar station. Furthermore, the traffic flow (TF) values for the two regions (Barra and Costa Nova) are given by the TF_Barra and TF_Costa attributes. The units of Speed and Traffic Flow are meters per second (m/s) and the number of objects per ten minutes (#obj/10 min), respectively. TF_Barra and TF_Costa can be positive or negative. A positive TF value represents an increase in the traffic flow for that region, and a negative value represents a reduction.
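As a minimal sketch of how the units and the sign convention above might be handled when reading a record, the snippet below converts a speed from m/s to km/h and labels the trend of a TF value. The helper names and the sample values are hypothetical and are not taken from the dataset.

```python
MS_TO_KMH = 3.6  # 1 m/s equals 3.6 km/h


def speed_kmh(speed_ms: float) -> float:
    """Convert a speed stored in m/s (the dataset unit) to km/h."""
    return speed_ms * MS_TO_KMH


def flow_trend(tf: int) -> str:
    """Label a traffic flow value (#obj/10 min) by its sign."""
    if tf > 0:
        return "increase"
    if tf < 0:
        return "reduction"
    return "balanced"


# Hypothetical record fragment, for illustration only.
record = {"Speed_med_AR1": 12.5, "TF_Barra": 14, "TF_Costa": -3}
barra_trend = flow_trend(record["TF_Barra"])
costa_trend = flow_trend(record["TF_Costa"])
mean_speed_kmh = speed_kmh(record["Speed_med_AR1"])
```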
Figure 2 presents the dataset; the x-axis represents the time and the y-axis the traffic flow values (Barra and Costa Nova).
Figure 3 presents the data distribution of the final dataset considering the values of Traffic Flow for each region.
The detailed statistics of the dataset are presented in
Table 1. Mean, Std (Standard Deviation), Min (Minimal), 25% (quartile 1), 50% (quartile 2), 75% (quartile 3), and Max (Maximal) are related to values of Traffic Flow for Barra and Costa Nova, and complete the description of the final dataset.
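The statistics listed above can be reproduced for any TF column with the Python standard library. The following is a sketch under two assumptions that the text does not confirm: that Std is the sample standard deviation and that quartiles use linear interpolation between data points. The input values are made up for illustration.

```python
import statistics


def describe(values):
    """Return mean, std, min, quartiles, and max for a list of numbers."""
    # method="inclusive" interpolates linearly between the observed values.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # assumption: sample standard deviation
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


# Made-up TF values, not taken from the dataset.
tf_values = [-4, 0, 3, 7, 12]
summary = describe(tf_values)
```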
Figure 4 shows the cumulative traffic count over the course of an entire day. As explained in Section 3, it is calculated as the cumulative sum of the difference between vehicles entering and exiting, with the values shifted so that the minimum is zero. The values, by themselves, do not provide much information, since it is not possible to know exactly how many cars were in each place. Although the values are not precise about the number of vehicles in the system, Figure 4 allows one to understand how the count evolves over the day and, for instance, to notice that the lowest number of vehicles in Barra was in the morning, while the highest was in the afternoon. Furthermore, when comparing the values, we can observe that the count for Barra goes from 476 to 0, and since these values were taken in the morning, we can conclude that the data represent people leaving home to go to work. For Costa Nova, the discrepancy in values for the same period is not so large: it falls from 101 to 6.
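The cumulative count described for Figure 4 can be sketched as a running sum of the net flow per interval, shifted so that its minimum is zero. The function name and the sample values are illustrative assumptions; only the cumulative-sum-and-shift idea comes from the text.

```python
from itertools import accumulate


def cumulative_count(net_flow):
    """Running sum of net flow per interval, shifted so the minimum is zero."""
    running = list(accumulate(net_flow))
    if not running:
        return []
    offset = min(running)
    return [value - offset for value in running]


# Made-up net flow (entering minus exiting) for five consecutive intervals.
net_flow = [-5, -3, 2, 4, 1]
counts = cumulative_count(net_flow)
```

Because of the shift, the result only shows how the count evolves during the day; it does not give the true number of vehicles present.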
3. Data Preparation Methods
The radar used in PASMO was a UMRR-0C Type 42, produced by smartmicro, that operates in the 24 GHz band for multilane, multiobject traffic tracking and is capable of measuring several parameters (such as range, angle, and radial speed) of moving targets. It has a bandwidth of 250 MHz and a maximum transmitted power of 20 dBm, and it uses a multiple Frequency Modulated Continuous Wave (FMCW) technique to acquire the relative speed and range of each target. The Type 42 integrated array of antennas allows for long-range and wide horizontal coverage. It also integrates tracking algorithms that can track up to 126 moving targets simultaneously, regardless of object speed, distance to the sensor, or azimuth angle. Table 2 summarizes the main characteristics of the radar used, and Correia [19] further details the data sensing and gathering process.
The original telemetry dataset contains more than 170 million records (170,158,409) over 3 years, and is composed of parking sensors and radar data. The dataset is a sample from the entire PASMO platform data, according to a specific method.
Figure 5 presents the complete method designed to prepare the final dataset, starting with the data selection applied to Parking and Radar data. For the goal of this dataset, just radar data were selected, resulting in 155,432,185 records.
Table 3 presents the attributes of the original data. Each record is produced at a sampling interval of 100 milliseconds and contains the identification of the moving object, id, and co-ordinates of the radar, timestamp, and x-y axis speed component. The dataset was aggregated considering a ten-minute granularity.
Before adjusting the granularity, other derived radar data were produced: the year, month, day, hour, weekday, and minute attributes were calculated from the timestamp. Furthermore, the xSpeed and ySpeed attributes were combined into the Speed measure. Negative values of Speed represent the speed of an object approaching the radar, and positive values represent an object moving away from it; the in_out logical attribute stores this situation. These steps are represented in Figure 5.
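The derivation step above can be sketched as follows. Several details are assumptions, since the text does not specify them: Speed is taken as the magnitude of the (xSpeed, ySpeed) vector carrying the sign of xSpeed, the weekday is encoded with Monday = 0, and the minute attribute is the lower bound of its ten-minute interval. The function name is hypothetical.

```python
from datetime import datetime
from math import copysign, hypot


def derive_attributes(timestamp: datetime, x_speed: float, y_speed: float) -> dict:
    """Derive the calendar attributes, Speed, and in_out for one radar sample."""
    # Assumption: Speed is the vector magnitude, signed like xSpeed, so that a
    # negative value means the object is approaching the radar.
    speed = copysign(hypot(x_speed, y_speed), x_speed)
    return {
        "year": timestamp.year,
        "month": timestamp.month,
        "day": timestamp.day,
        "hour": timestamp.hour,
        "wd": timestamp.weekday(),  # assumption: Monday = 0
        "minute": (timestamp.minute // 10) * 10,  # 0, 10, ..., 50
        "Speed": speed,
        "in_out": 1 if speed < 0 else 0,  # 1 = approaching, 0 = leaving
    }


sample = derive_attributes(datetime(2020, 7, 15, 9, 37, 12), -3.0, 4.0)
```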
Using the identification, speed, and direction of the moving object, it is possible to compute the number of vehicles that have passed, as well as their speed (maximum, mean, and minimum), at the radar level by year, month, day, hour, and minute (ten-minute intervals). Table 4 presents the resulting format of the processed radar data, where each record represents measures aggregated over a ten-minute interval within the hour.
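A minimal sketch of this aggregation is shown below. It assumes that, because each object is sampled every 100 milliseconds, the count is the number of distinct object identifiers per radar, interval, and direction, and that speeds are aggregated as absolute values. The tuple layout and the function name are illustrative, not the format used by the authors.

```python
from collections import defaultdict
from statistics import fmean


def aggregate(samples):
    """Group samples by (radar, interval, in_out) and summarize each group."""
    groups = defaultdict(lambda: {"ids": set(), "speeds": []})
    for radar, interval, obj_id, speed, in_out in samples:
        group = groups[(radar, interval, in_out)]
        group["ids"].add(obj_id)  # one object yields many samples
        group["speeds"].append(abs(speed))  # assumption: aggregate magnitudes
    return {
        key: {
            "count_obj": len(group["ids"]),
            "Speed_max": max(group["speeds"]),
            "Speed_med": fmean(group["speeds"]),
            "Speed_min": min(group["speeds"]),
        }
        for key, group in groups.items()
    }


# Made-up samples: (radar, (year, month, day, hour, minute), id, Speed, in_out)
interval = (2020, 7, 15, 9, 30)
samples = [
    (1, interval, 101, -10.0, 1),
    (1, interval, 101, -12.0, 1),
    (1, interval, 102, -14.0, 1),
    (1, interval, 201, 8.0, 0),
]
aggregated = aggregate(samples)
```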
Figure 6 depicts the localization of the radars. The first one is before entering the bridge, the second one is in the interconnection segment between Barra and Costa Nova, and the third one is at the urban limit to the south of Costa Nova.
This work presents a dataset that stores the traffic flow in Barra and Costa Nova. Therefore, two new measures were computed to represent the traffic flow in these regions, TF_Barra and TF_Costa, where QARi is the quantity of objects approaching radar i and QDRi is the quantity of objects leaving radar i. Both quantities are computed with i as the identification of the radar, j as the interval minute (0, 10, …, 50), obj_count_A as the value of the count_obj attribute where in_out = 1, and obj_count_D as the value of the count_obj attribute where in_out = 0.
Therefore, TF_Barra and TF_Costa can take positive or negative values. A positive TF value represents an increase in the traffic flow for that region, and a negative value represents a reduction. Additionally, new speed measures were computed for the approaching and leaving movements at each radar.
For the approaching direction, n is the quantity of records of the Speed_med attribute with in_out = 1. For the leaving direction, i is the identification of the radar, j is the interval minute, and n is the quantity of records of the Speed_med attribute with in_out = 0.
In Equations (5)–(10), i represents the identification of the radar and j the interval minute. Equations (7) and (9) consider the records with in_out = 1, while Equations (8) and (10) consider those with in_out = 0.
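To illustrate how the per-radar, per-direction speed measures map onto the column names of the final dataset, the sketch below flattens the aggregates of one interval into a single record. The input layout and the function name are assumptions made for the example; the column naming follows the Speed_med_ARi and Speed_med_DRi pattern given in Section 2.

```python
def to_wide(per_radar):
    """Flatten {(radar, in_out): stats} for one interval into named columns."""
    row = {}
    for (radar, in_out), stats in sorted(per_radar.items()):
        # in_out = 1 marks approaching objects (A), in_out = 0 leaving ones (D).
        direction = "A" if in_out == 1 else "D"
        for measure in ("med", "max", "min"):
            column = f"Speed_{measure}_{direction}R{radar}"
            row[column] = stats[f"Speed_{measure}"]
    return row


# Made-up aggregates for radar 1 in a single ten-minute interval.
per_radar = {
    (1, 1): {"Speed_med": 12.0, "Speed_max": 14.0, "Speed_min": 10.0},
    (1, 0): {"Speed_med": 8.0, "Speed_max": 9.0, "Speed_min": 7.0},
}
row = to_wide(per_radar)
```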
Table 5 presents the final processed radar data.
4. Conclusions
The present dataset was obtained from telemetry data from the PASMO project, which were cleaned and aggregated in order to obtain a summary of the passages at 10 min intervals. The aggregation process reduced 155,432,185 radar records to 74,305 records that allow us to characterize the traffic in the beach area in the years 2019, 2020, and 2021.
In addition to allowing the analysis of traffic distribution and the forecast of future traffic, the present dataset can be correlated with national/regional holidays (Table A2 in Appendix B) and a set of events related to the COVID-19 pandemic [20] and the circulation restrictions that occurred in 2020 and 2021 (Table A3 in Appendix C). Furthermore, it can be correlated with meteorological data through the use of an adequate dataset, since in the case of bathing areas, the meteorological factor has a strong impact on road travel.
This dataset can be used for a wide spectrum of situations related to smart cities and vehicular traffic. We highlight the identification of behaviors, trends, diurnal mobility patterns, and training of modelling and predictive algorithms.
In the near future, we plan to clean and aggregate the data produced by the remaining radars installed within the scope of the PASMO project and try to create a dataset that will allow us to characterize the pattern of entrances and exits in the city of Aveiro.