An integrated analysis of air pollution and meteorological conditions in Jakarta

Handhayani, Teny

doi:10.1038/s41598-023-32817-9

Download PDF

Article
Open access
Published: 09 April 2023

An integrated analysis of air pollution and meteorological conditions in Jakarta

Teny Handhayani¹Â

Scientific Reports volumeÂ 13, ArticleÂ number:Â 5798 (2023) Cite this article

6736 Accesses
12 Citations
16 Altmetric
Metrics details

Subjects

Abstract

Air pollution and climate change are general problems for society. This paper proposes an integrated analysis of the Air Quality Index (AQI) and meteorological conditions in Jakarta. The column-based data integration model is applied to create integrated data of the Air Quality Index and meteorological conditions. The integrated data is then used to generate a causal graph using the PC algorithm. The causal graph reveals that there exist causal relationships between pollutants and meteorological conditions, e.g, humidity, rainfall, wind speed, and duration of sunshine affect particulate matter 10 (PM$_{10}$); wind speed affects sulfur dioxide (SO$_2$); temperature affects ozone (O$_3$). The historical data records that the average wind speed is decreased and the number of unhealthy days has risen. Ozone and particulate matter are two pollutants that mainly influence poor air quality in Jakarta. The integrated data is also used to train Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) for forecasting. Experimental results show that LSTM using integrated data produces smaller errors for forecasting AQI and meteorological conditions.

Air quality assessment and pollution forecasting using artificial neural networks in Metropolitan Lima-Peru

Article Open access 20 December 2021

A novel hybrid model for six main pollutant concentrations forecasting based on improved LSTM neural networks

Article Open access 24 August 2022

An air quality index prediction model based on CNN-ILSTM

Article Open access 19 May 2022

Introduction

Poor air quality is dangerous to civilization and the environment^1,2,3,4,5,6. Air pollution is the largest cause of non-communicable diseases in some countries and regions, for instance, in Southeast Asia⁷. In 2020, the government of Republic Indonesia establishes that air quality is measured from the concentration of 7 parameters: particulate matter 10 (PM$_{10}$), particulate matter 2.5 (PM$_{2.5}$), sulfur dioxide (SO$_2$), nitrogen dioxide (NO$_2$), carbon monoxide (CO), ozone (O$_3$), and hydrocarbons (HC).

Air pollution might have a linkage to meteorological conditions or vice versa. Some research has been conducted to analyze the linkage of air pollutants and meteorological conditions^{8,9,10,11,12,13}. A study in Taiwan reveals that temperature was associated with the incidence of CO poisoning¹⁴. A Bayesian Network graphical model has been used to analyze the statistical dependencies between environmental parameters, air pollution variables, and health data¹⁵. A study found that the maximum aerosol optical depth (AOD) in Palangka Raya, Pontianak, and Jambi happened in the dry season from July to October¹⁶.

The historical data of Air Quality Index (AQI) and meteorological conditions in Jakarta record some important information^17,18. The increasing number of unhealthy days happened from 2010â2013, 2015â2018, and 2020â2021. It is understandable that in 2020, the air quality was getting better because of the limited activities during the Covid-19 outbreak. However, the number of unhealthy days raised in 2021. From 2010 to 2021, the number of unhealthy days is always higher than healthy days. This is an early warning for the society that poor air quality might worsen if it is not managed properly. The average temperature slightly increased around $0.55 ^{\circ }$C from 2013 to 2019 and the average wind speed decreased.

Correlation measures a relationship between variables. However, correlation does not imply causation¹⁹. It means that statistical properties alone do not determine causal structures. The causal learning methods are enable to analyze the dependence structures among variables. A study has been conducted to observe the performance of learning algorithms to learn Bayesian network structures from climate data²⁰. Some studies have been done to analyze the causal effects between pollution and health. A research has revealed the causal effects between local air pollution on daily deaths²¹. Gaussian process model and information geometric causal inference criterion have been implemented to obtain the correct causal directions between air pollutants²². A causal inference approach named Total Events Avoided (TEA) has been used for evaluating the health impacts of an air pollution regulation²³.

Analyzing the trend of air pollution is beneficial for the government and society to find the important factors that contribute to air quality. This research is conducted to study the causal relationships between air pollutants and meteorological conditions in Jakarta. The problem of this research is how to analyze the causal effect of air pollution and meteorological conditions in Jakarta. This paper proposes an integrated analysis of AQI and meteorological conditions using a causal learning approach. It implements the PC algorithm to generate a causal graph from a dataset. The causal graph is then used to analyze the cause and effect relationships among variables. The proposed method is useful to analyze the linkage of air pollution and meteorological conditions in Jakarta. The integrated data is also applied to train models for forecasting. This paper implements Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) to forecast AQI and meteorological conditions. The research contribution is an integration model to analyze the dependency relationships among variables and prediction of the future values of AQI and meteorological conditions, the case study in Jakarta.

Methods

Air Quality Index (AQI)

The Ministry of Environment and Forestry in the Republic of Indonesia measures the Air Quality Index (AQI) using equation (1), where I, $I_a$, $I_b$, $L_a$, $L_b$, and $L_x$ represent AQI score, upper limit AQI, lower limit AQI, upper limit ambient concentration, lower limit ambient concentration, and measurement results of real ambient concentration, respectively²⁴. Air Quality Index (AQI) standard values are categorized as good (1â50), moderate (51â100), unhealthy (101â200), very unhealthy (201â300), and hazardous ($\ge$301).

$$\begin{aligned} I = \frac{I_a - I_b}{L_a - L_b}(L_x - L_b) + I_b \end{aligned}$$

(1)

PC algorithm

A causal graph is a graphical model that represents cause and effect relationships among variables. Assume that causal information between variables can be represented by a directed acyclic graph (DAG) where the nodes represent random variables and the edges represent direct causal effects^25,26,27,28. Each causal DAG implies a set of conditional independence relationships²⁵. A simple graph $A \rightarrow B$ (i.e., A is a parent of B) represents that A is a direct cause of B. A is a (possibly indirect) cause of B only if there is a directed path from A to B (A is an ancestor of B). One of the algorithms for learning a causal graph from a dataset is the PC algorithm^26,28,29.

The PC algorithm applies conditional independence tests to generate a causal graph from a dataset²⁶. Suppose E, ${\hat{\rho }}$, $\alpha$ , n, and $\Phi (.)$ denotes the separation set, the partial correlation, the significance level, the number of samples, and the cumulative distribution function (cdf) of ${\mathcal {N}}(0,1)$, respectively. An equation (2) can be used to compute a conditional independence test for Gaussian data^30,31. It tests a question âis a variable $D_{u}$ conditionally independent $D_{v}$ of given $D_{E}$?â

$$\begin{aligned} D_{u}\bot D_{v} \mid D_{E} \Leftrightarrow \sqrt{n - \mid E \mid - 3} \left| \frac{1}{2} \log \left( \frac{1+{\hat{\rho }}_{uv \mid E}}{1-{\hat{\rho }}_{uv \mid E}} \right) \right| \le \Phi ^{-1}(1-\alpha /2). \end{aligned}$$

(2)

The correlation coefficient of two random variables X and Y is $\rho _{XY} = \frac{\sigma _{XY}}{\sigma _X \sigma _Y}$, where $\sigma$ is standard deviation³². The partial correlation can be computed from correlation matrix using Eq. (3), where A, B, and C are random variables³³.

$$\begin{aligned} {\hat{\rho }}_{AB.C} = \frac{\rho _{AB} - \rho _{AC} \rho _{CB}}{\sqrt{(1-\rho ^2_{AC})(1-\rho ^2_{CB})}} \end{aligned}$$

(3)

In general, the PC algorithm has two main steps: generating graph skeleton and orienting the edges³⁴. Suppose a dataset consists of v variables. The first step is generating a complete undirected network consisting of v vertices. The conditional independence tests are run for every triplet vertices. The output of the first step is a skeleton. The information of the conditional independence test in the first step is used to orient the edges. The output of the PC algorithm is a graph represented by a Completed Partially Directed Acyclic Graph (CPDAG)³⁰. The PC algorithm can be used to learn causal graphs by assuming there are no latent variables in the dataset.

Long short-term memory (LSTM)

Long short-term memory (LSTM) is an efficient gradient-based method^35,36. LSTM refers to a standard recurrent neural network (RNN) that has long-term memory and short-term memory. Suppose $\zeta$, $X_t$, ${\tilde{S}}_t$, $S_{t-1}$, $S_t$, $\circ$, $O_t$ denote the sigmoid function, the preprocessed data, the new state of memory cell, the previous state of the memory cell, the final state of memory cell, Hadamard product, and the final output of the memory unit, respectively. Let $i_t$, $f_t$, $o_t$ be the output of different gates and $W^{(i)}$, $W^{(f)}$, $W^{(o)}$, $W^{(c)}$, $U^{(i)}$, $U^{(f)}$, $U^{(o)}$, $U^{(c)}$ be coefficient matrices. The mathematical models related to the LSTM memory unit are defined by Eqs. (4â9)³⁷. LSTM networks work well for making predictions based on time series data^{38,39,40,41,42}.

$$\begin{aligned} i_t= & {} \zeta \left( W^{(i)} X_t + U^{(i)} S_{t-1}\right) \end{aligned}$$

(4)

$$\begin{aligned} f_t= & {} \zeta \left( W^{(f)} X_t + U^{(f)} S_{t-1}\right) \end{aligned}$$

(5)

$$\begin{aligned} o_t= & {} \zeta \left( W^{(o)} X_t + U^{(o)} S_{t-1} \right) \end{aligned}$$

(6)

$$\begin{aligned} {\tilde{S}}_t= & {} \tanh \left( W^{(c)} X_t + U^{(c)} S_{t-1}\right) \end{aligned}$$

(7)

$$\begin{aligned} S_t= & {} f \circ S_{t-1} + i_t \circ {\tilde{S}}_t \end{aligned}$$

(8)

$$\begin{aligned} O_t= & {} o_t \circ \tanh (S_t) \end{aligned}$$

(9)

Gated recurrent unit (GRU)

Gated recurrent unit (GRU) is recurrent neural networks (RNN) using gating mechanism^43,44. Let $W_z$, $W_r$, W, $U_z$, $U_r$, U, $b_z$, $b_r$, and b be model parameters. Suppose $\odot$ represents element-wise multiplication. For each j-th hidden unit, GRU has a reset gate $r^j_t$ and an update gate $z^j_t$ to control the hidden state $h^j_t$ at each time t which are computed using Eqs. (10â13). GRU has been successfully implemented for forecasting the time series datasets^45,46,47.

$$\begin{aligned} r_t= & {} \zeta \left( W_r x_t + U_r h_{t-1} + b_r \right) \end{aligned}$$

(10)

$$\begin{aligned} z_t= & {} \zeta \left( W_z x_t + U_z h_{t-1} + b_z \right) \end{aligned}$$

(11)

$$\begin{aligned} {\tilde{h}}_t= & {} \tanh \left( W x_t + U(r_t \odot h_{t-1}) + b \right) \end{aligned}$$

(12)

$$\begin{aligned} h_t= & {} (1-z_t) \odot h_{t-1} + z_t \odot {\tilde{h}}_t \end{aligned}$$

(13)

Evaluation metric

The evaluation metric for forecasting are mean absolute error (MAE), mean square error (MSE), and root mean square error (RMSE). MAE reflects the actual situation of the prediction error and RMSE evaluates the degree of change and accuracy of the data. Let $y'$, y, and n be the predicted value, true value, and the number of samples. Equations (14â16) are used to compute MAE, MSE, and RMSE, respectively^47,48.

$$\begin{aligned} MAE_{y', y}= & {} \frac{1}{n} \sum _{i = 1}^{n}|y'_i - y_i | \end{aligned}$$

(14)

$$\begin{aligned} MSE_{y', y}= & {} \frac{1}{n} \sum _{i = 1}^{n}(y'_i - y_i)^2 \end{aligned}$$

(15)

$$\begin{aligned} RMSE_{y', y}= & {} \sqrt{\frac{1}{n} \sum _{i = 1}^{n}(y'_i - y_i)^2} \end{aligned}$$

(16)

Dataset

This paper use AQI and meteorological conditions in Jakarta from public datasets. The AQI data owned by DKI Jakarta Provincial Government can be accessed at https://data.jakarta.go.id/organization/badan-pengelolaan-lingkungan-hidup-daerah¹⁸. The meteorological conditions dataset is obtained from an open dataset belonging to Indonesian Agency for Meteorological, Climatological and Geophysics (Badan Meteorologi, Klimatologi, dan Geofisika or simply BMKG) that is available at http://dataonline.bmkg.go.id/home⁴⁹.

The air quality dataset is daily records Air Quality Index (AQI) of PM$_{10}$, PM$_{2.5}$, SO$_2$, CO, O$_3$, and NO$_2$ from 2010 to 2021. PM$_{2.5}$ is only available from 2021. The meteorological conditions is a daily record of average temperature ($^\circ$C), average relative humidity (RH) (%), average rainfall (mm), average duration of sunshine (hours), and average wind speed (m/s) from 2010 to 2021.

The proposed method

This paper proposes an integrated analysis of air pollution data and meteorological condition to analyze the air quality in Jakarta. The proposed method is illustrated in Fig. 1. The stages of the proposed method are data integration, causal graph generation, and forecasting. The integration process of meteorological data and AQI data use column-based integration. The datasets are time series data with numerical values. The idea of data integration has been used to learn simultaneously from multiple data sources^50,51. The integration data requires not only the same date for each sample but also the same number of samples from all resources. In this paper, the integrated data is a single table containing variables from meteorological data and AQI data. This data is then used as input for generating a causal graph and forecasting. A causal graph is generated using the PC algorithm. LSTM and GRU are implemented for forecasting.

This paper uses the PC algorithm from bnlearn in the R package^52,53. A causal graph is generated in R Studio. It also implements LSTM and GRU from TensorFlow Keras. The forecasting is run in Jupyter Notebook for Python.

LSTM and GRU are implemented to forecast the prediction of AQI and meteorological conditions. The LSTM and GRU models consist of stacked layers with 128 and 64 units, dropout layer and dense layer. LSTM and GRU are run for 50 epochs and they implement the Softmax activation function. This paper uses multivariate forecasting. The experiments use integrated and not integrated data. The letter i and p indicate that the algorithm is implemented for forecasting using integrated data and not integrated data, respectively. A not-integrated data refers to AQI data or a meteorological conditions dataset. An integrated dataset is a dataset containing AQI and meteorological conditions obtained from data integration process. This paper runs multivariate forecasting in 3 different scenarios:

Experiment 1 using training set from 2010 to 2018 and testing set from 2019.
Experiment 2 using training set from 2010 to 2019 and testing set from 2020.
Experiment 3 using training set from 2010 to 2020 and testing set from 2021.

Results and discussion

The datasets are containing less than 5% missing values. The missing values are filled up using an average value of the observed variables from 7 days before the observed date. After preprocessing phase, it implements column-based integration to create a single formed data from AQI and meteorological condition datasets. The integrated data is used to generate a causal graph and to train models for forecasting.

Causality analysis

This paper examines the dependence relationships between air pollutants represented by AQI and meteorological conditions. A graph is generated from an integrated data of AQI and meteorological conditions from 2010 to 2021. The dataset consists of 4383 samples and 10 variables (temperature, humidity, rainfall, sunshine, wind speed, PM$_{10}$, SO$_2$, CO, O$_3$, and NO$_2$). PM$_{2.5}$ is not included to the experiments due to the samples are only available from 2021. Figure 2 shows a causal graph generated using the PC algorithm at significance level of $\alpha = 0.05$. The graph finds some information that will be explained as follows.

Humidity, rainfall, and duration of sunshine are causal parameters for PM$_{10}$. Those findings are corresponding to some previous studies. Humidity influences PMâs natural deposition process; moisture particles adhere to PM and accumulate atmospheric PM concentration⁹. The increasing humidity reduces PM$_{10}$ concentrations in the atmosphere because moisture particles grow in size to a point where âdry depositionâ happens. PM$_{10}$ continually reduced with humidity rising¹⁰. The precipitation has a certain wet scavenging effect on PM$_{2.5}$ and PM$_{10}$¹¹. Precipitation scavenging refers to the cleaning of gases and particles by cloud and precipitation elements. A study of ambient air quality in Jakarta found that the concentration of suspended particulate matter is decreased in the wet season (OctoberâMarch) and increased in the dry season (AprilâSeptember) because rainfall removes the pollutant in the atmosphere⁵⁴.
CO has a dependent relationship to humidity. The previous study shows that higher humidity has a negative effect on the adsorption of carbon monoxide⁵⁵.
Wind speed has dependence relationships to SO$_2$, NO$_2$, and PM$_{10}$.
Temperature has a causal relationship to O$_3$. The chemical reactions in the formation or destruction of O$_3$ are influenced by temperature, solar radiation, and wind speed⁵⁶. A study found that diurnal temperature range, precipitation, and wind speed had the largest impact on SO$_2$ in Shandong, China⁵⁷.
CO causes O$_3$ and it is similar to a study in Kota Bharu, Malaysia that discovers CO as a causal parameter for O$_3$⁵⁸.
PM$_{10}$, SO$_2$, and CO affect O$_3$. O$_3$ is an air pollutant that is formed in the atmosphere from a combination of nitrogen oxides, volatile organic compounds, CO, and methane in the presence of sunlight⁵⁹.
NO$_2$ affects PM$_{10}$ and SO$_2$.
Sunshine affects PM$_{10}$.

The causal graph in Fig. 2 explains the connection of certain parameters from meteorological conditions to air pollution. Those relationships are not revealed when the analysis is done separately.

Correlation analysis

This paper highlights correlation coefficient ($\rho$) with the values $\rho \ge \pm 0.2$. The correlation coefficient between variables except for PM$_{2.5}$ is computed from samples of 2010â2021. The correlation coefficient involving PM$_{2.5}$ is obtained from the dataset of 2021. Table 1 shows correlation coefficient between two variables computed using Pearson correlation. The longer the sunshine duration makes the higher temperature, lower humidity, and lower rainfall. The temperature and duration of sunshine have a positive correlation to PM$_{10}$ and PM$_{2.5}$. The more concentration of PM the higher temperature will be. Humidity has a negative correlation to PM$_{10}$ and PM$_{2.5}$. This is one of the possible ways to decrease PM concentration by increasing humidity. Higher rainfall increases humidity. Weather modification to create artificial rain is useful to decrease PM concentration. Humidity and CO have a positive correlation. Meanwhile, wind speed and SO$_2$ have a negative correlation.

The annual average AQI from 2010 to 2021 is illustrated in Fig. 3A. The highest exposure to O$_3$ happened in 2012. Figure 3B shows the monthly average AQI in Jakarta from 2010 - 2021. The top 3 air pollutant are O$_3$, PM$_{2.5}$ and PM$_{10}$. AQI score of O$_3$ is always higher than 50 and it reaches over 100 in October to November which is categorized as an unhealthy condition.

Table 1 Correlation coefficient.

Full size table

Sunrise and sunset in Jakarta are not significantly different every day throughout the year because it lies on a latitude of â 6$^\circ$12$^\prime$ 52.63$^\prime$ $^\prime$ S and a longitude of 106$^\circ$50$^\prime$ 42.47$^\prime$ $^\prime$ E. The length of daylight remains the same every day, so the duration of sunshine is mostly affected by clouds. In the last 10 years, the low average rainfall happens from May to September, and the lowest is around 1.8 mm in August. Meanwhile, the longest average sunshine duration occurs in August, September, and October at 5.7, 6.4, and 5.3 hours, respectively. The month of June to October has a high average level of PM$_{10}$ over 73 and the highest is 76.79 in August. The lowest average of PM$_{10}$ is 50.24 in January. This finding is closed to the previous study⁵⁴ which is states that the highest concentration of PM$_{10}$ occurs in September 2015 and the lowest one is in February 2017. In 2021, the two highest average AQI for PM$_{2.5}$ are 80.56 in June and 86.32 in July. In May and October, the average temperature is around 29.1 $^\circ$C which is higher than the overall average temperature of 28.49 $^\circ$C. The average humidity during JulyâOctober is around 70â74%. Since 2015, the average wind speed decreases around 1 m/s than that in 2010. In 2021, the correlation between wind speed and PM$_{2.5}$ $\rho$(wind speed and PM$_{2.5}$) is â 0.32. The decrement in wind speed contributes to increasing PM$_{2.5}$. Wind speed and SO$_2$ have a negative correlation, so decreasing wind speed rises SO$_2$. A positive correlation is obtained between SO$_2$ and NO$_2$ as 0.6, indicating that the concentration of those pollutants rises together. O$_3$ has a positive correlation to PM$_{10}$ and PM$_{2.5}$.

The historical data and forecasting models

A record of the number of unhealthy (U) and very unhealthy days (VU) in the year 2010â2021 is presented in Table 2. The historical data shows that O$_3$ is the pollutant that mostly causes unhealthy and very unhealthy days. There are 108 days where on the same day two pollutants have AQI scores over 100 but only 22 days were labeled as very unhealthy because they only pay attention to a pollutant that has the highest AQI scores on that days. In 2020, on three consecutive days, the three pollutants together (SO$_2$, O$_3$, and NO$_2$) have AQI scores of more than 100 and those are categorized as unhealthy. It needs further study for a case when more than two pollutants have AQI scores over 100 in a day. It is possible to be more hazardous when the concentration of multiple pollutants reaches the unhealthy limit at the same time, so the categories of air pollution levels need to be evaluated.

Table 2 Pollutants that affect unhealthy (U) and very unhealthy (VU) days in Jakarta.

Full size table

The previous studies reveal various effects of the pollutants. The ambient temperature increased acute cardiovascular-respiratory mortality effects of PM$_{2.5}$⁶⁰. Exposure to PM$_{10}$, NO$_2$, and O$_3$ generates a relative risk to human health⁶¹. The effect of humans inhaling O$_3$ possibly leads to acute lung function changes and inflammation⁶². PM$_{2.5}$ may contribute to the development of diabetes mellitus, increase cardiopulmonary morbidity and mortality, and cause adverse birth outcomes⁶³. Epidemiological evidence shows that PM$_{2.5}$ damage the human respiratory system⁶⁴. The accumulating of exposure to low concentrations of carbon monoxide can affect a number of organ systems⁶⁵.

The actual data and forecasting of AQI from 2010 to 2021 are described in Fig. 4 A and B, respectively. The performance of LSTM and GRU are evaluated using MAE and RMSE. According to the experimental results, LSTM using integrated data produces the smaller error. In general, LSTM and GRU show a good performance in forecasting PM$_{10}$, CO, and O$_3$.

The actual data and forecasting meteorological conditions are described in Fig. 5A and B, respectively. LSTM and GRU work well to forecast temperature, humidity, sunshine duration and wind speed. However, they are less accurate to predict rainfall.

The two highest AQI of PM$_{10}$ were in 2011 and 2013 when the averages were 76.59 and 78.21. The AQI of SO$_2$ was consistently rising around 3 times higher than in 2010. The AQI of CO increased from 2010 to 2017, but it decreased from 2020 to 2021. The AQI of O$_3$ was also rising and the highest was in 2012â2013. Figure 6 shows the values of MAE and RMSE for forecasting results. LSTM using integrated data produces smaller errors. In general, the forecasting results of AQI data from 2020 to 2021 have higher errors than that from 2019. It is suspected that major restrictions in some activities during the Covid-19 outbreak influence that condition, for instance, the national or local lockdown reduces the use of motor vehicles which decreases the CO level. There was a huge increase in SO$_2$ and NO$_2$ from September 2020âJanuary 2021 but the reason is unknown. It needs further study for investigation. Comparing to the other study which is forecasting the observed variables using not integration data⁶⁶, the forecasting using integration data produces slightly lower MAE and RMSE.

The findings in this paper are expected to enrich the knowledge of the linkage between air pollution and climate change. This contribution is beneficial to determining the proper handling of air pollution and climate change problems.

Conclusion

In conclusion, the integration analysis successfully discovers the linkage between air pollution and meteorological conditions in Jakarta. The integrated data is used to generate a causal graph and to train models for forecasting. A causal graph shows that there exist dependence relationships between AQI and meteorological conditions. This information is beneficial for handling air pollution and climate change. LSTM and GRU work well as models for forecasting PM$_{10}$, CO, O$_3$, temperature, humidity, sunshine duration, and wind speed. However, those models show less accurate to predict SO$_2$, NO$_2$, and rainfall. LSTM using integrated data produces a smaller error. The forecasting results of air pollution before the Covid-19 outbreak are more accurate. The Covid-19 outbreak influences human activities that probably affect air quality, e.g, decreasing CO, and increasing NO$_2$ and SO$_2$. The future work is implementing machine learning approach for an integrated analysis to find the connection of population growth, industries, human activities and air pollution to the climate change in Indonesia.

Data availibility

The datasets are available from the corresponding author by request for strong reasons.

References

Kan, H. et al. Part 1 a time-series study of ambient air pollution and daily mortality in Shanghai, China. Res. Rep. Health Eff. Inst. 154(1), 17â78 (2010).
Google ScholarÂ
Qian, Z. et al. Part 2 association of daily mortality with ambient air pollution, and effect modification by extremely high temperature in Wuhan, China. Res. Rep. Health Eff. Inst. 154(1), 91â217 (2010).
Google ScholarÂ
Tramuto, F. et al. Urban air pollution and emergency room admissions for respiratory symptoms: A casecrossover study in palermo, Italy. Environ. Health 10(31), 1â11 (2011).
Google ScholarÂ
Zhang, J., Wei, Y. & Fang, Z. Ozone pollutan a major health hazard worldwide. Front. Immunol. 10(1), 1â10 (2019).
Google ScholarÂ
Holm, S. M. & Balmes, J. R. Systematic review of ozone effects on human lung function, 2013 through 2020. Chest 161(1), 190â201 (2022).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Peng, H. et al. Relationship between meteorological factors, air pollutants and hand, foot and mouth disease from 2014 to 2020. BMC Public Health 22(1), 1â10 (2022).
ArticleÂ CASÂ Google ScholarÂ
Asian development bank. Air quality in Asia: Why is it important, and what can we do? (2022; accessed 20 Sept 2022); https://www.adb.org/sites/default/files/publication/780921/air-quality-asia.pdf.
He, J. et al. Air pollution characteristics and their relation to meteorological conditions during 2014â2015 in major chinese cities. Environ. Pollut. 223, 484â496 (2017).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Hernandez, G., Berryand, T.-A., Wallis, S. L. & Poyner, D. Temperature and humidity effects on particulate matter concentrations in a sub-tropical climate during winter. In International Proceedings of Chemical, Biological and Environmental Engineering 41â49 (2017).
Lou, C. et al. Relationships of relative humidity with pm2.5 and pm10 in the yangtze river delta, china. Environ. Monit. Assess. 189(582), 1â16 (2017).
CASÂ Google ScholarÂ
Yansui, L., Zhou, Y. & Lu, J. Exploring the relationship between air pollution and meteorological conditions in china under environmental governance. Sci. Rep. 10, 1â11 (2020).
Google ScholarÂ
Liu, Z. et al. Analysis of the influence of precipitation and wind on PM_2.5 and PM₁₀ in the atmosphere. Adv. Meteorol. 2020(1), 1â13 (2020).
CASÂ Google ScholarÂ
Hou, K. & Xu, X. Evaluation of the influence between local meteorology and air quality in Beijing using generalized additive models. Atmosphere 13(24), 1â14 (2021).
CASÂ Google ScholarÂ
Wang, C. H. et al. Quantifying the effects of climate factors on carbon monoxide poisoning a retrospective study in Taiwan. Front. Public Health 9(1), 1â7 (2021).
Google ScholarÂ
Vitolo, C., Scutari, M., Ghalaieny, M., Tucker, A. & Russell, A. Modeling air pollution, climate, and health datavusing bayesian networks: A case studyvof the English regions. Earth Space Sci. 5, 76â88 (2018).
ArticleÂ ADSÂ Google ScholarÂ
Kusumaningtyas, S. D. A. et al. Aerosols optical and radiative properties in Indonesia based on AERONET version 3. Atmos. Env. 282, 119174 (2022).
ArticleÂ CASÂ Google ScholarÂ
Dinas Lingkungan Hidup Provinsi DKI Jakarta: Laporan Kualitas Udara Jakarta (2022; accessed 10 Jul 2022); https://lingkunganhidup.jakarta.go.id/files/14477-2022-06-24-07-45-08.pdf.
Portal Data Terpadu Pemprov DKI Jakarta: Dataset Indeks Standar Pencemaran Udara (2022, accessed 24 Jun 2022); https://data.jakarta.go.id/group/lingkungan-hidup.
Peters, J., Janzing, D. & SchÃ¶lkopf, B. Elements of Causal Inference (The MIT Press, 2017).
MATHÂ Google ScholarÂ
Scutari, M., Graafland, C. E. & GutiÃ©rrez, J. M. Who learns better bayesian network structures: Accuracy and speed of structure learning algorithms. Int. J. Approx. Reason. 115, 235â253 (2019).
ArticleÂ MathSciNetÂ MATHÂ Google ScholarÂ
Schwartz, J., Bind, M. A. & Koutrakis, P. Estimating causal effects of local air pollution on daily deaths effect of low levels. Environ. Health Perspect. 125(1), 23â29 (2017).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Zhang, Y., Gen, Y. & Luo, G. Causal direction inference for air pollutants data. Comput. Electr. Eng. 68, 404â1411 (2018).
ArticleÂ Google ScholarÂ
Nethery, R. C., Mealli, F., Sacks, J. D. & Dominici, F. Evaluation of the health impacts of the 1990 clean air act amendments using causal inference and machine learning. J. Am. Stat. Assoc. 16(1), 1â12 (2020).
Google ScholarÂ
Kementrian Lingkungan Hidup dan Kehutanan: Indeks Standar Pencemaran Udara (ISPU) Sebagai Informasi Mutu Udara Ambien di Indonesia (2022, accessed 10 Jul 2022); https://ditppu.menlhk.go.id/portal/read/indeks-standar-pencemar-udara-ispu-sebagai-informasi-mutu-udara-ambien-di-indonesia.
Pearl, J. Causality Models, Reasoning and Inference (Cambridge University Press, 2000).
MATHÂ Google ScholarÂ
Spirtes, P., Glymour, C. & Scheines, R. Causation, Prediction, and Search (The MI Press, 2001).
BookÂ MATHÂ Google ScholarÂ
Pearl, J. Causal inference in statistics an overview. Stat. Surv. 3, 96â146 (2009).
ArticleÂ MathSciNetÂ MATHÂ Google ScholarÂ
Colombo, D., Maathuis, M. H., Kalisch, M. & Richardson, T. S. Learning high dimensional directed acyclic graphs with latent and selection variables. Ann. Stat. 40(1), 294â321 (2012).
ArticleÂ MathSciNetÂ MATHÂ Google ScholarÂ
Kalisch, M., MÃ¤chler, M., Colombo, D., Maathuis, M. H. & BÃ¼hlmann, P. Causal inference using graphical models with the R package pcalg. J. Stat. Softw. 47(11), 1â26. https://doi.org/10.18637/jss.v047.i11 (2012).
ArticleÂ Google ScholarÂ
Kalisch, M. & Buhlmann, P. Estimating high dimensional directed acyclic graphs with the PC algorithm. J. Mach. Learn. Res. 8, 613â636 (2007).
MATHÂ Google ScholarÂ
Cui, R., Groot, P. & Heskes, T. Copula PC algorithm for causal discovery from mixed data. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (2016).
Walpole, R. E., Mayers, R. H. & Myers, S. L. Probability and Statistics for Engineers and Scientists (Prentice Hall, New Jersey, 2011).
Google ScholarÂ
Meloun, M. & MilitkÃ½, J. Statistical Data Analysis A Practical Guide (India PVT LTD, 2011).
BookÂ MATHÂ Google ScholarÂ
Colombo, D. & Maathuis, M. H. Order independent constraint based causal structure learning. J. Mach. Learn. Res. 14(2014), 3921â3962 (2016).
MATHÂ Google ScholarÂ
Hochreiter, S. & Schmidhuber, J. Long Short Term Memory. Neural Comput. 9(8), 1735â1780 (1997).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Gers, F. A. & Schmidhuber, J. LSTM recurrent networks learn simple context free and context sensitive languages. IEEE Trans. Neural Netw. 12(6), 1333â1340 (2001).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Zhao, Z., Chen, W., Wu, X., Chen, P. C. Y. & Liu, J. LSTM network a deep learning approach for short term traffic forecast. IET Intel. Transport Syst. 11(2), 68â75 (2017).
ArticleÂ Google ScholarÂ
Tsai, Y.-T., Zeng, Y.-R. & Chang, Y.-S. Air pollution forecasting using RNN with LSTM. In 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress(DASC/PiCom/DataCom/CyberSciTech) 1074â1079 (2018).
Belavadi, S. V., Rajagopal, S., Ranjani, R. & Mohan, R. Air quality forecasting using LSTM RNN and wireless sensor networks. In The 11th International Conference on Ambient Systems, Networks and Technologies (ANT) April 6â9, 2020, Warsaw, Poland 241â248 (2020).
Poornima, S. & Pushpalatha, M. Prediction of rainfall using intensified LSTM based recurrent neural network with weighted linear units. Atmosphere 10, 1â18 (2019).
ArticleÂ Google ScholarÂ
Alhirmizy, S. & Qader, B. Multivariate time series forecasting with LSTM for Madrid, Spain pollution. In 2019 International Conference on Computing and Information Science and Technology and Their Applications ICCISTA 1â5 (2019).
Ghanbari, R. & Borna, K. Multivariate time series prediction using LSTM neural networks. In 2021 26th International Computer Conference, Computer Society of Iran CSICC 1â5 (2021).
Cho, K., Merrienboer, B. V., Bahdanau, D. & Bengio, Y. On the properties of neural machine translation encoder decoder approaches. In Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation SSST-8 103â111 (2014).
Che, Z., Purushotham, S., Cho, K., Sontag, D. & Liu, Y. Recurrent Neural Networks for multivariate time series with missing values. Sci. Rep. 8, 1â12 (2018).
ArticleÂ Google ScholarÂ
Zhou, X., Xu, J., Zeng, P. & Meng, X. Air pollutant concentration prediction based on GRU method. In IOP Conf. Series: Journal of Physics: Conf. Series 1â6 (2019).
Athira, V., Vinayakumar, R. & Kumar, P. S. Deepairnet applying recurrent networks for air quality prediction. In International Conference on Computational Intelligence and Data Science ICCIDS 2018 1394â1403 (2018).
Tao, Q., Li, Y. & Sidorov, D. Air pollution forecasting using a deep learning model based on 1d Convnets and Bidirectional GRU. IEEE Access 7, 76690â76698 (2019).
ArticleÂ Google ScholarÂ
Iglewicz, B. & Myers, R. H. Comparisons of approximations to the percentage points of the sample coefficient of variation. Technometrics 12(1), 166â169 (1970).
Google ScholarÂ
Pusat Database BMKG: Data Harian (2022; accessed 23 June 2022). http://dataonline.bmkg.go.id/home?language=indonesia.
Pavlidis, P., Weston, J., Cai, J. & Grundy, W. N. Gene functional classification from heterogeneous data. In Proceedings of the Fifth Annual International Conference on Computational Biology 249â255 (2001).
Daemen, A., Gevaert, O. & Moor, B. D. Integration of clinical and microarray data with kernel method. In Proceedings of the 29th Annual International Conference of the IEEE EMBS 5411â5415 (2007).
Scutari, M. bnlearnâan R package for Bayesian Network learning and inference (2010, accessed 2 October 2022); https://www.bnlearn.com/.
Scutari, M. Learning Bayesian Networks with the bnlearn R Package. J. Stat. Softw. 35(3), 1â22 (2010).
ArticleÂ MathSciNetÂ Google ScholarÂ
Kusumaningtyas, S. D. A., Aldrian, E., Wati, T. & Atmoko, D. Sunaryo: The recent state of ambient air quality in Jakarta. Aerosol Air Qual. Res. 18(9), 2343â2354 (2018).
ArticleÂ CASÂ Google ScholarÂ
Eslamian, M., Nadimi, E. & Salehi, A. Effect of humidity on gas sensing properties of tin dioxide toward carbon monoxide: A first principle study. In 2017 Iranian Conference on Electrical Engineering (ICEE) 276â278 (2017).
Alvim-Ferraz, M. C. M., Sousa, S. I. V., Pereira, M. C. & Martins, F. G. Contribution of anthropogenic pollutants to the increase of tropospheric ozone levels in the oporto metropolitan area, portugal since the 19th century. Environ. Pollut. 140, 516â524 (2006).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Wu, H., Hong, S., Hu, M., Li, Y. & Yun, W. Assessment of the factors influencing sulfur dioxide emission in Shandong, China. Atmosphere 13, 1â14 (2022).
Google ScholarÂ
Raffee, A. F., Hamid, H. A., Rahmat, S. N. & Jaffar, M. I. The cause-and-effect analysis of ground level ozone (O₃), air pollutants and meteorological parameters using the causal relationship approach. J. Eng. Res. 1, 1â21 (2022).
Google ScholarÂ
Schneidemesser, E. V. et al. Chemistry and the linkages between air quality and climate change. Chem. Rev. 115(10), 3856â3897 (2015).
ArticleÂ Google ScholarÂ
Li, Y., Ma, Z., Zheng, C. & Shang, Y. Ambient temperature enhanced acute cardiovascular-respiratory mortality effects of PM_2.5 in Beijing, China. Int. J. Biometeorol. 59, 1761â1770 (2015).
ArticleÂ ADSÂ PubMedÂ Google ScholarÂ
Khaniabadi, Y. O. et al. Exposure to PM₁₀, NO₂, and O₃ and impacts on human health. Environ. Sci. Pollut. Res. 24, 2781â2789 (2016).
ArticleÂ Google ScholarÂ
Bromberg, P. A. Mechanisms of the acute effects of inhaled ozone in humans. Biochem. Biophys. Acta 12, 2771â2781 (2016).
ArticleÂ Google ScholarÂ
Feng, S., Gao, D., Liao, F., Zhou, F. & Wang, X. The health effects of ambient PM_2.5 and potential mechanisms. Ecotoxicol. Environ. Saf. 128, 67â74 (2016).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Xing, Y.-F., Xu, Y.-H., Shi, M.-H. & Lian, Y.-X. The impact of PM_2.5 on the human respiratory system. J. Thorac. Dis. 8(1), 69â74 (2016).
Google ScholarÂ
Townsend, C. L. & Maynard, R. L. Effects on health of prolonged exposure to low concentrations of carbon monoxide. Occup. Environ. Med. 59(10), 708â711 (2022).
ArticleÂ Google ScholarÂ
Handhayani, T., Lewenusa, I., Herwindiati, D. E. & Hendryli, J. A comparison of LSTM and BiLSTM for forecasting the air pollution index and meteorological conditions in jakarta. In 5th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI) (eds Kartadie, R. & Wibowo, F.W.) 334â339 (2022).

Download references

Acknowledgements

The author thanks Dr. Siti Syuhaida Mohamed Yunus for the meaningful discussions and suggestions for this research.

Author information

Authors and Affiliations

Fakultas Teknologi Informasi, Universitas Tarumanagara, Jakarta, Indonesia
Teny Handhayani

Authors

Teny Handhayani
View author publications
You can also search for this author in PubMedÂ Google Scholar

Contributions

The author contributes to collecting the dataset, running the experiments, and preparing the manuscript.

Corresponding author

Correspondence to Teny Handhayani.

Ethics declarations

Competing interests

The author declares no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Handhayani, T. An integrated analysis of air pollution and meteorological conditions in Jakarta. Sci Rep 13, 5798 (2023). https://doi.org/10.1038/s41598-023-32817-9

Download citation

Received: 16 October 2022
Accepted: 03 April 2023
Published: 09 April 2023
DOI: https://doi.org/10.1038/s41598-023-32817-9

This article is cited by

Analysis of spatiotemporal distribution of air quality index (AQI) in the state of West Bengal, India from 2016 to 2021
- Buddhadev Ghosh
- Harish Chandra Barman
- Pratap Kumar Padhy
Discover Atmosphere (2023)