Forecasting air quality time series using deep learning

Brian S Freeman; Graham Taylor; Bahram Gharabaghi; Jesse Thé

doi:10.1080/10962247.2018.1459956

J Air Waste Manag Assoc. 2018 Aug;68(8):866-886. doi: 10.1080/10962247.2018.1459956. Epub 2018 May 24.

Authors

Brian S Freeman¹, Graham Taylor¹, Bahram Gharabaghi¹, Jesse Thé¹ ²

Affiliations

¹ School of Engineering, University of Guelph, Guelph, Ontario, Canada.
² Lakes Environmental, Waterloo, Ontario, Canada.

PMID: 29652217
DOI: 10.1080/10962247.2018.1459956

Abstract

This paper presents one of the first applications of deep learning (DL) techniques to predict air pollution time series. Air quality management relies extensively on time series data captured at air monitoring stations as the basis for identifying population exposure to airborne pollutants and determining compliance with local ambient air standards. In this paper, 8-hr averaged surface ozone (O₃) concentrations were predicted using a recurrent neural network (RNN) with long short-term memory (LSTM). Hourly air quality and meteorological data were used to train the network and forecast values up to 72 hours ahead with low error rates. The LSTM was also able to forecast the duration of continuous O₃ exceedances. Prior to training the network, the dataset was reviewed for missing data and outliers. Missing data were imputed using a novel technique that averaged gaps of fewer than eight time steps with incremental steps based on first-order differences of neighboring time periods. The data were then used to train decision trees to evaluate input feature importance over different prediction horizons. The number of features used to train the LSTM model was reduced from 25 to 5, resulting in improved accuracy as measured by mean absolute error (MAE). Parameter sensitivity analysis showed that the look-back nodes associated with the RNN were a significant source of error when not aligned with the prediction horizon. Overall, MAE values below 2 were calculated for predictions out to 72 hours.
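To make the terms 8-hr average, look-back, prediction horizon, and MAE concrete, the following is a minimal sketch in plain Python. It is not the authors' code: the function names (`rolling_mean`, `make_windows`, `mean_absolute_error`) and the trailing-window convention for the 8-hr average are assumptions chosen for illustration, and no LSTM is trained here.

```python
from typing import List, Sequence, Tuple


def rolling_mean(values: Sequence[float], window: int = 8) -> List[float]:
    """Trailing mean over `window` steps (assumed convention for an 8-hr average).

    Positions that do not yet have a full window are skipped.
    """
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def make_windows(
    series: Sequence[float], look_back: int, horizon: int
) -> Tuple[List[List[float]], List[float]]:
    """Pair each look-back window with the value `horizon` steps after its last point."""
    inputs: List[List[float]] = []
    targets: List[float] = []
    last_start = len(series) - look_back - horizon
    for start in range(last_start + 1):
        end = start + look_back  # index just past the window
        inputs.append(list(series[start:end]))
        targets.append(series[end + horizon - 1])
    return inputs, targets


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between paired values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    hourly = [float(v) for v in range(10)]  # hypothetical hourly readings
    windows, targets = make_windows(hourly, look_back=3, horizon=2)
    # Persistence baseline: predict that the last observed value persists.
    baseline = [w[-1] for w in windows]
    print(mean_absolute_error(targets, baseline))
```

A persistence baseline like the one in the usage lines is a common reference point for a forecast model's MAE; the abstract does not state which baseline, if any, the authors used.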

Implications: Novel deep learning techniques were used to train an 8-hr averaged ozone forecast model. Missing data and outliers within the captured data set were replaced using a new imputation method that generated calculated values closer to the expected value based on the time and season. Decision trees were used to identify input variables with the greatest importance. The methods presented in this paper allow air managers to forecast long-range air pollution concentrations while monitoring only key parameters and without transforming the data set in its entirety, thus allowing real-time inputs and continuous prediction.
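The abstract does not give the imputation method in enough detail to reproduce it. As a rough illustration of the general idea of filling only short gaps, the sketch below substitutes plain linear interpolation between the neighboring observations, which is a stand-in and not the authors' first-order-difference technique. The function name `fill_short_gaps`, the use of `None` for missing readings, and the `max_gap` parameter are assumptions.

```python
from typing import List, Optional, Sequence


def fill_short_gaps(
    values: Sequence[Optional[float]], max_gap: int = 8
) -> List[Optional[float]]:
    """Linearly interpolate runs of None shorter than `max_gap` steps.

    Stand-in for illustration only. Gaps of `max_gap` steps or more, and gaps
    touching either end of the series, are left as None.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        gap = i - start  # number of consecutive missing steps
        has_neighbors = start > 0 and i < n
        if gap < max_gap and has_neighbors:
            left, right = filled[start - 1], filled[i]
            step = (right - left) / (gap + 1)  # equal increments across the gap
            for k in range(gap):
                filled[start + k] = left + step * (k + 1)
    return filled


if __name__ == "__main__":
    readings = [10.0, None, None, 16.0]  # hypothetical hourly values
    print(fill_short_gaps(readings))
```

The threshold of eight steps mirrors the gap length mentioned in the abstract; how the authors treated longer gaps is not stated there, so this sketch simply leaves them missing.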

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Air Pollutants / analysis
Air Pollution / analysis*
Deep Learning*
Environmental Monitoring / methods*
Forecasting / methods
Ozone / analysis
Particulate Matter / analysis
Seasons

Substances

Air Pollutants
Particulate Matter
Ozone