1. Introduction
Particulate matter (PM) concentrations have been increasing continuously with rapid economic growth and industrialization [1]. A statement from the Expert Panel on Population and Prevention Science of the American Heart Association indicates that PM, especially PM2.5, is harmful to human health [2,3] because it increases the risk of cardiovascular disease, respiratory disease, and cancer [4]. Most countries in the world currently suffer from PM2.5 pollution [5]. Many researchers address this problem with methods such as prediction or change point detection [6,7]. Such methods can provide a reference for people planning their travel and help reduce the harm of PM2.5 to human health. They can also provide a basis for government managers to address environmental problems.
In the existing literature, methods for predicting PM2.5 concentration fall into two categories: physical methods and statistical methods.
Physical methods directly simulate environmental processes using physical, chemical, biological, and other models. For instance, Woody et al. [8] used the Community Multiscale Air Quality-Advanced Plume Treatment model to predict the PM2.5 concentration caused by aviation activities. Geng et al. [9] employed the nested-grid GEOS-Chem model together with satellite data from the MODIS and MISR instruments to predict PM2.5 concentration. However, these methods have the drawback of consuming considerable computing resources and manpower.
Statistical methods, including machine learning, deep learning, and other statistical approaches, have been widely used for predicting PM2.5 concentration and overcome these shortcomings of physical models. Examples include the Markov model [10], the support vector regression model [11], alternating decision trees, and random forests [12]. However, the relationships between PM2.5 and many meteorological factors are nonlinear, and the above machine learning methods have difficulty capturing these nonlinear relationships, which results in relatively low prediction accuracy. Recently, deep learning models, including convolutional neural networks (CNN), recurrent neural networks (RNN), and their variants, have been adopted for predicting PM2.5 concentration. CNN can extract valid information from input features and discover deep connections between different feature elements [13]. RNN and its variants are very effective for processing sequential data; they can mine temporal and semantic information from data [14,15,16]. Therefore, these models handle nonlinear relationships well and compensate for the deficiencies of traditional machine learning methods. Among these models, the long short-term memory (LSTM) model is relatively popular. It is suitable for processing and predicting important events separated by relatively long intervals and delays in time series data. In addition, the bidirectional long short-term memory neural network (BiLSTM) connects two hidden layers that process the sequence in opposite directions to the same output. This structure allows the prediction model to use both past and future context within a given time range during training, which improves prediction accuracy to a certain extent. BiLSTM is widely used in text classification [17], speech recognition [18], and PM2.5 prediction [19]. Meanwhile, its extension, CNN-BiLSTM, is also widely used in many fields, such as heart disease diagnosis [20], video compression [21], and COVID-19 diagnosis [22]. However, this model requires a large amount of training data and cannot reflect the influence of different features on the prediction results, which matters especially for predicting PM2.5 concentration. At present, most methods that integrate CNN and LSTM do not take this into consideration. Therefore, an attention mechanism can be introduced into time series models to capture how strongly the feature states at different past times affect future PM2.5 concentration. An attention-based layer can automatically weight the past feature states to improve prediction accuracy, as shown in [17,23].
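The weighting step described above can be illustrated with a minimal NumPy sketch. It assumes an additive (Bahdanau-style) scoring function, score_t = u^T tanh(W h_t + b), followed by a softmax over time steps. The scoring function, the name `attention_pool`, and the sizes (24 time steps, 8-dimensional hidden states, 4 attention units) are illustrative assumptions, not the configuration used in this paper; the parameters are random, so only the mechanics are meaningful.

```python
import numpy as np

def attention_pool(h, w, b, u):
    """Additive attention over time steps (illustrative sketch).

    h: (T, d) hidden states, one row per past time step
    w: (d, a), b: (a,), u: (a,) attention parameters (random here, learned in practice)
    Returns the context vector (d,) and the attention weights (T,).
    """
    scores = np.tanh(h @ w + b) @ u                 # (T,) unnormalized relevance of each step
    scores = scores - scores.max()                  # shift for numerical stability; softmax unchanged
    alpha = np.exp(scores) / np.exp(scores).sum()   # softmax: positive weights that sum to 1
    context = alpha @ h                             # (d,) weighted sum of past states
    return context, alpha

rng = np.random.default_rng(0)
T, d, a = 24, 8, 4                                  # hypothetical sizes
h = rng.standard_normal((T, d))
context, alpha = attention_pool(h, rng.standard_normal((d, a)), np.zeros(a), rng.standard_normal(a))
print("largest weight at step", int(alpha.argmax()), "context shape", context.shape)
```

Time steps whose states score higher receive larger weights and therefore contribute more to the context vector that is passed on to the output layer.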
In this paper, a hybrid model named CNN-BiLSTM-Attention is proposed, consisting of a CNN layer, a BiLSTM layer, and an attention layer. The CNN extracts effective spatial features from all factors related to PM2.5. The BiLSTM mitigates the vanishing and exploding gradient problems along the time series and identifies temporal features in both directions of the hidden layer [24]. The attention mechanism analyzes the importance of all features and assigns a corresponding weight to each of them. By combining the respective advantages of these components, the proposed model improves the accuracy of PM2.5 concentration prediction.
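To make the layer ordering concrete, the following NumPy forward pass chains a 1-D convolution, a BiLSTM, additive attention, and a linear output. This is a hedged sketch, not the paper's implementation: the layer sizes (a 24-hour input window with 10 features, kernel width 3, 16 filters, 8 hidden units per direction), the choice to convolve along the time axis with a ReLU activation, the additive attention form, and all helper names are assumptions for illustration. The weights are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1d_relu(x, kernels, bias):
    """'Valid' 1-D convolution along time. x: (T, F), kernels: (k, F, C) -> (T-k+1, C)."""
    k = kernels.shape[0]
    windows = [np.tensordot(x[t:t + k], kernels, axes=([0, 1], [0, 1]))
               for t in range(x.shape[0] - k + 1)]
    return np.maximum(np.stack(windows) + bias, 0.0)   # ReLU

def lstm(x, W, U, b):
    """One-direction LSTM. x: (T, D), W: (D, 4H), U: (H, 4H), b: (4H,) -> (T, H)."""
    H = U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    outputs = []
    for x_t in x:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:H])               # input gate
        f = sigmoid(z[H:2 * H])          # forget gate
        g = np.tanh(z[2 * H:3 * H])      # candidate cell state
        o = sigmoid(z[3 * H:])           # output gate
        c = f * c + i * g                # additive cell update eases gradient flow
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def bilstm(x, fwd_params, bwd_params):
    """Forward and backward LSTMs over the same sequence, concatenated per step -> (T, 2H)."""
    h_fwd = lstm(x, *fwd_params)
    h_bwd = lstm(x[::-1], *bwd_params)[::-1]   # re-reverse so row t aligns with h_fwd[t]
    return np.concatenate([h_fwd, h_bwd], axis=1)

def attention_pool(h, w, b, u):
    """Additive attention: softmax over time of u^T tanh(h w + b); returns (context, weights)."""
    scores = np.tanh(h @ w + b) @ u
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    return alpha @ h, alpha

def init_lstm(D, H):
    return (0.1 * rng.standard_normal((D, 4 * H)),
            0.1 * rng.standard_normal((H, 4 * H)),
            np.zeros(4 * H))

# Hypothetical sizes: 24 hourly steps x 10 input features (PM2.5 history, weather, ...).
T, F, k, C, H, A = 24, 10, 3, 16, 8, 8
x = rng.standard_normal((T, F))

feat = conv1d_relu(x, 0.1 * rng.standard_normal((k, F, C)), np.zeros(C))   # (22, 16)
states = bilstm(feat, init_lstm(C, H), init_lstm(C, H))                     # (22, 16)
context, alpha = attention_pool(states, 0.1 * rng.standard_normal((2 * H, A)),
                                np.zeros(A), rng.standard_normal(A))       # (16,), (22,)
y_hat = context @ (0.1 * rng.standard_normal(2 * H))                        # untrained scalar output
print(feat.shape, states.shape, context.shape, float(y_hat))
```

In a trained model, the convolution, LSTM, attention, and output weights would be fitted jointly by backpropagation; this sketch only shows how the tensors flow through the layers in the order described above.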
The rest of this article is organized as follows. Section 2 presents the framework of the proposed model. Section 3 describes the experimental process and discusses the results. Section 4 presents the conclusions.
4. Conclusions
In recent years, predicting PM2.5 concentration has attracted the attention of many scholars, especially those committed to environmental protection. With increasing emphasis on urban air pollution prediction and control, air quality monitoring stations have been deployed in many cities. How to make effective use of the data collected by these stations to improve urban air quality is an important issue. In this paper, an intelligent PM2.5 concentration prediction model, CNN-BiLSTM-Attention, is proposed.
Taking Beijing as a case study, the model was applied to an hourly dataset from Shunyi District, Beijing, covering January 2013 to February 2017. The results show that:
- (1)
For deep learning models, tuning the network parameters is unavoidable. In this paper, the parameters were determined through extensive experiments and the use of tuning tools.
- (2)
The proposed hybrid CNN-BiLSTM-Attention model outperforms both traditional models and machine learning models for predicting PM2.5 concentration. It also outperforms models that integrate CNN and LSTM. This improvement is attributed to the attention mechanism, which captures how strongly the feature states at different times influence PM2.5 concentration: the attention-based layer automatically weights the past feature states.
- (3)
The short-term (48 h) and long-term (72 h, 96 h, 120 h, and 144 h) predictions carried out in this paper show that prediction performance is best for the next 48 h, with MAE, RMSE, and R2 of 2.366, 3.065, and 0.960, respectively. Moreover, the CNN-BiLSTM-Attention model's predictions for the next 144 h are still better than the other models' predictions for the next 48 h. Therefore, this hybrid model has good generalization ability and is effective at extracting long-term dependency features.
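For reference, the evaluation metrics reported above can be computed as follows. This plain-Python sketch assumes the second metric is the root mean squared error (RMSE) and uses the standard definition R2 = 1 - SS_res / SS_tot; the toy values are made up for illustration and are not from the Shunyi dataset.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy values (hypothetical concentrations), not results from the paper.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]
print(f"MAE={mae(y_true, y_pred):.3f} RMSE={rmse(y_true, y_pred):.3f} R2={r2(y_true, y_pred):.3f}")
# prints: MAE=2.000 RMSE=2.121 R2=0.964
```

Because RMSE squares the errors before averaging, it is always at least as large as MAE and penalizes occasional large deviations more heavily, which is consistent with the ordering of the reported values (3.065 versus 2.366).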
The proposed CNN-BiLSTM-Attention model is an intelligent PM2.5 concentration prediction model based on the analysis and modeling of historical air quality data. It can help environmental protection agencies implement measures to strengthen environmental protection. It also provides a reference for transportation-related departments taking measures to reduce related gas emissions. The model is grounded in real monitoring data: this paper analyzes and discusses PM2.5 issues in depth, establishes a corresponding model, and evaluates its prediction accuracy, showing that the model has good versatility and generalization. It could also be applied to predict other pollutants. With the large-scale deployment of air quality monitoring stations, the prediction model in this paper has application potential.
However, since air quality monitoring stations have only been deployed in recent years, the limited amount of data may affect model training. In the future, as more stations are deployed, longer data records will become available to optimize the prediction model. In addition, PM2.5 concentration is spatially correlated. Future work will incorporate PM2.5 concentration data and related factors from surrounding monitoring stations to further improve the prediction accuracy of the model.