CO, NO2 AND NOx URBAN POLLUTION MONITORING
WITH ON FIELD CALIBRATED ELECTRONIC NOSE
Saverio De Vito°*, Marco Piga+, Luca Martinotto+, Girolamo Di Francia°
°ENEA, Centro Ricerche Portici, 80055 Portici (NA), Italy
+Pirelli Labs, Viale Sarca 222, 20126 Milano, Italy
Abstract
Low cost gas multisensor devices can represent an efficient solution for densifying the sparse urban air pollution monitoring mesh. In a previous work, we proposed and evaluated the calibration of such a device for benzene pollution quantification, using short-term recordings of on-field data. In this work, we present and discuss the results obtained with the same setup for estimating the concentrations of CO, NO2 and total NOx. A conventional air pollution monitoring station is used to provide reference data. We show how a multivariate calibration can be achieved using two weeks of on-field data recordings and neural regression systems. As with benzene, no significant performance gain was detectable for these pollutants when longer recordings were used. The influence of appropriate feature selection on achieving optimal performance is also discussed by comparing the long-term performance of the obtained calibrations. Benefits and issues of calibration based on multivariate correlation are evaluated over a one-year-long measurement campaign.
Keywords: Urban Air Pollution Monitoring; On-field calibration; Electronic nose;
Multisensor device; Feature Selection; Electronic nose design; Artificial neural
networks; Automatic Bayesian Regularization.
1. Introduction
Urban pollution monitoring is currently carried out with networks of stations based on industrial spectrometers. Their size and cost make it unrealistic to build a measurement mesh of appropriate density. Yet the dynamics of pollutant diffusion inside cities are very complex, owing to the intricate pattern of roads and buildings. In practice, the typical monitoring mesh granularity exceeds several kilometres, and this can limit how well the gathered data represent the actual distribution of pollutant concentrations across city areas. For example, roads surrounded by tall buildings can experience very high pollution levels due to the so-called canyon effect, even if a nearby monitoring station reports extremely low pollution levels [1-3]. On the other hand, real-time knowledge of the actual pollution concentration pattern in city areas is considered crucial for implementing informed traffic management and air pollution management policies [4].
In recent years, some authors have proposed the use of gas multisensor devices as a tool for densifying the urban pollution monitoring mesh, owing to their significantly lower cost per unit [4],[6-9]. Their small size also makes them suitable for historical city centres, where they can easily be hidden so as to preserve the enjoyment of the architectural heritage.
Unfortunately, the intrinsic long-term stability and selectivity issues of the solid state sensors these devices rely on can severely limit their reliability, in particular when compared with the well-established performance of conventional analyzers. Several studies confirm that, when calibrated in the laboratory for single-species quantification, these devices perform poorly in real operating conditions, revealing the low reliability of conventional in-lab calibration when dealing with complex mixtures [5-6].
* Corresponding author: tel. +390817723364, fax. +390817723344, e-mail: saverio.devito@portici.enea.it
Recently, some on-field calibration strategies have been proposed to overcome these issues [6-8]. In particular, our group proposed using a spectrometer-based station as a reference for the on-field calibration of a small multisensor device, obtaining very encouraging results for benzene quantification. We also focused on the optimal duration of the calibration process, finding that a neural network trained on a ten-day measurement set achieved a relative estimation error of less than 0.02 (2%) over more than 6 months [8]. A long-term performance degradation was observed and attributed to changes in the absolute and relative concentrations of the different species during winter. A recalibration performed every 6 months was found to restore the performance level effectively. In a similar framework, Tsujita et al. proposed an interesting automatic on-field recalibration scheme that could help address at least the sensor stability issues [9].
Pattern recognition algorithms are often used as automated multivariate analysis tools in gas sensing, but their performance is rarely discussed, especially for regression problems such as quantitative concentration estimation [10-11]. In these problems, sensor fusion subsystems are devised to approximate Ψ in:

$$C_j = \Psi\left(R_{\mathrm{Sens}_1}, R_{\mathrm{Sens}_2}, \ldots, R_{\mathrm{Sens}_n}\right) \qquad (1)$$

where $C_j$ is the true concentration of the j-th pollutant as measured by conventional stations and $R_{\mathrm{Sens}_i}$ is the resistance of the i-th sensor as measured by the multisensor device.
In a complex scenario such as urban pollution monitoring, an in-depth analysis of the results obtained by such a system can significantly help the feasibility assessment, guiding the development of appropriate monitoring strategies (device positioning, calibration methodology and long-term performance assessment) as well as the instrumentation design and validation process.
In this paper, we assess the performance of an on-field calibrated multisensor device, presenting the results obtained for CO, NOx and NO2 concentration estimation. We analyze the results in order to gain in-depth knowledge of the main performance drivers in the selected scenario, validating our hypotheses through correlation analysis, feature selection and long-term performance assessment.
2. Experimental Setup
A multisensor device developed by Pirelli Labs (see ref. [12]) was co-located with a conventional air pollution analyzer operated by the regional environmental protection agency (ARPA). The conventional analyzer response was used to provide the true concentration values of the target pollutants at the measurement site. These values were then used as a reference for tuning a multivariate regression system designed to calibrate the multisensor device response. The multisensor device was designed to host five custom metal oxide chemoresistive sensors plus two commercial solid state sensors for relative humidity and temperature. It has a very compact design (volume = 9.7×10⁻³ m³) and was easily deployed in the operative scenario owing to its limited weight (less than 2.5 kg). An on-board microcontroller provided the computing platform and controlled the analog-to-digital conversion, while an on-board secondary memory device provided local storage for up to 72 h of measurements at an 8 s sampling period. Hourly averages of the measurements were transmitted via a GPRS modem to PC-class data sinks. A more detailed description of the instrumentation and setup can be found in [8].
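As a rough illustration of the data reduction described above, an 8 s sampling period gives 3600/8 = 450 raw samples per hour, so the 72 h on-board buffer corresponds to 32 400 samples per channel. The sketch below collapses raw readings of one channel into the hourly means that are transmitted; the helper name, the use of NumPy and the assumption of a gap-free series aligned to the start of an hour are ours, not taken from the device firmware.

```python
import numpy as np

SAMPLING_PERIOD_S = 8                              # raw sampling period of the device
SAMPLES_PER_HOUR = 3600 // SAMPLING_PERIOD_S       # 450 raw samples per hour
BUFFER_HOURS = 72                                  # local storage capacity
BUFFER_SAMPLES = BUFFER_HOURS * SAMPLES_PER_HOUR   # 32 400 samples per channel


def hourly_means(raw: np.ndarray) -> np.ndarray:
    """Average consecutive raw readings of one channel into hourly means.

    Hypothetical helper: assumes `raw` starts at the beginning of an hour and
    has no gaps; a trailing incomplete hour is discarded.
    """
    n_hours = raw.size // SAMPLES_PER_HOUR
    trimmed = raw[: n_hours * SAMPLES_PER_HOUR]
    return trimmed.reshape(n_hours, SAMPLES_PER_HOUR).mean(axis=1)


# Three hours of synthetic readings, constant within each hour
raw = np.concatenate([np.full(SAMPLES_PER_HOUR, v) for v in (10.0, 12.0, 11.0)])
print(hourly_means(raw))  # [10. 12. 11.]
```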
The conventional fixed station provided reference concentration estimates for CO (mg/m³), non-methane hydrocarbons (NMHC) (µg/m³), C6H6 (µg/m³), NOx (ppb) and NO2 (µg/m³), recorded as hourly averages. Unfortunately, the NMHC analyzer went out of service after only 8 days. The multisensor device was sampled to provide the hourly average of the resistance of the CO-, NOx-, O3- and NO2-specific metal oxide (MOX) chemiresistors and of an NMHC-targeted MOX sensor, whose characteristics are also detailed in [8], plus the commercial temperature and relative humidity sensors. The testing site for the measurement campaign was one of the main streets in the centre of an Italian city, characterized by heavy car traffic. The data acquisition campaign lasted from March 2004 until April 2005, building up a suitable one-year-long data set for the devised application.
3. Experimental Scenario and Methods Discussion
According to the results of our previous investigation, the testing site was characterized by significant correlations among the time series of the recorded species concentrations [8]. In order to obtain a non-linear multivariate calibration for the multisensor device, a backpropagation neural network architecture was designed to solve a typical regression problem. The network was trained to estimate the hourly mean of the true pollutant concentrations, as provided by the conventional analyzer, using the hourly mean electrical resistances of the multisensor device chemiresistors as input features.
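A regressor of the kind described above, approximating Ψ in Eq. (1), can be sketched as a single hidden layer of hyperbolic tangent units followed by a linear output unit. This is a minimal sketch, not a trained calibration: the weights are random placeholders, the seven inputs (five MOX resistances plus temperature and relative humidity) are assumed to be already normalized, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

N_INPUTS = 7   # 5 MOX resistances + temperature + relative humidity
N_HIDDEN = 5   # hidden tanh neurons (placeholder value)


def init_params(n_in: int, n_hidden: int, rng: np.random.Generator) -> dict:
    """Small random weights; a real calibration obtains them by training."""
    return {
        "W1": rng.normal(scale=0.1, size=(n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(scale=0.1, size=n_hidden),
        "b2": 0.0,
    }


def psi(params: dict, x: np.ndarray) -> np.ndarray:
    """Estimate one pollutant concentration from normalized sensor features.

    x has shape (n_samples, n_inputs); the result has shape (n_samples,).
    """
    hidden = np.tanh(x @ params["W1"].T + params["b1"])  # tanh transfer function
    return hidden @ params["w2"] + params["b2"]          # linear output neuron


params = init_params(N_INPUTS, N_HIDDEN, rng)
x = rng.uniform(-1.0, 1.0, size=(24, N_INPUTS))  # one day of hourly feature vectors
print(psi(params, x).shape)  # (24,)
```

One such network is trained per target pollutant, which matches the separate regressors used later for CO, NO2 and NOx.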
In the framework of statistical pattern recognition, full regression problems are characterized by the need to approximate the a posteriori distribution of the target random variables given the values of the measured variables (see [10]). In practice, a typical statistical regression system approximates the functional relationships (if any) between measurements in the 'feature space' and the expected output values, starting from a predefined internal model and using a limited data set (the training set) of examples to tune it. If the training set is representative of the real statistical distribution of the data, we can expect the trained regressor to correctly estimate the sought functional relationships while retaining appropriate generalization properties. Of course, noise on both the measurement and target variables will affect the overall estimation performance to various extents.
In our pollutant concentration estimation scenario, we can expect reasonably good performance if the sensor pool that makes up the feature space includes, for each target pollutant, a sensor unit showing good selectivity and stability. In this case, univariate calibration can be a viable solution. Usually, however, this is very difficult to obtain because of the intrinsic properties and limitations of solid state sensors.
Good performance levels can be expected even if the target-specific sensor suffers from significant selectivity issues, provided that a properly devised sensor pool is capable of describing the influence of interferents on the target-specific sensor response. This can occur, for instance, if the response of a subset of the sensor pool is strongly correlated with interferent concentrations. In this case, a statistical regressor can extract sufficient knowledge about the interferent concentrations and their effects on the other gas sensors' responses to model such influences during the training phase and cancel the induced error. For instance, if CO concentration estimates are affected by the presence of a particular interferent species, say H2S, a sensor fusion subsystem can learn to estimate and subtract its influence on the CO-targeted sensor, provided that H2S concentrations can be measured and a representative set of training samples is available to model it.
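The error cancellation argument above can be illustrated with synthetic data. In the sketch below, a hypothetical CO-targeted sensor also responds to H2S, while a second hypothetical sensor responds to H2S alone; all coefficients and noise levels are invented for illustration. A plain linear least-squares calibration stands in for the neural regressor, since it is enough to show that adding the interferent-sensitive input lets the fit remove the cross-sensitivity.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Synthetic ground truth (arbitrary units): CO and an interferent, H2S
co = rng.uniform(0.5, 5.0, n)
h2s = rng.uniform(0.0, 2.0, n)

# Hypothetical sensor responses: the CO-targeted sensor is cross-sensitive to H2S,
# a second sensor responds to H2S only. Small Gaussian noise on both.
co_sensor = 1.0 * co + 0.8 * h2s + rng.normal(0, 0.05, n)
h2s_sensor = 1.2 * h2s + rng.normal(0, 0.05, n)


def fit_rmse(features: np.ndarray, target: np.ndarray) -> float:
    """Least-squares linear calibration with intercept; returns training RMSE."""
    design = np.column_stack([features, np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = target - design @ coef
    return float(np.sqrt(np.mean(residual ** 2)))


rmse_single = fit_rmse(co_sensor[:, None], co)                       # CO sensor only
rmse_fused = fit_rmse(np.column_stack([co_sensor, h2s_sensor]), co)  # plus interferent sensor
print(f"CO sensor only: {rmse_single:.3f}, with H2S sensor: {rmse_fused:.3f}")
```

With the CO sensor alone, the H2S contribution acts as unexplained noise; with both inputs, the fit learns to subtract it, leaving mostly the measurement noise.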
In our scenario, the usual presence of significant correlations among pollutant concentrations can also help a statistical regressor extract knowledge about a particular gas concentration even if the related sensor performs poorly or, ultimately, if no specifically targeted sensor is present in the sensor pool. In this case, the target gas concentration is estimated from the responses of sensors to gases whose concentrations are strongly correlated, not necessarily linearly, with the concentration of the target gas. This approach has a significant drawback: the relative concentrations of gases may change on a seasonal or spatial basis, as with the differential increase in major pollutant concentrations (see, for example, ref. [3]) and the increase in NOx emissions from domestic heating, both measurable in winter. Such changes could severely undermine the validity of the knowledge learned from the training set.
Our previous paper focused on the results of benzene concentration estimation. We showed that, in that case, it was possible to estimate the concentration of a pollutant for which no specific sensor was present in the selected sensor pool, using the information provided by the NMHC sensor. Indeed, benzene is the most significant component of NMHC-related pollution in the selected scenario. We also found a measurable performance degradation when a calibration obtained in spring was used in winter, due to the above-mentioned effect of domestic heating pollution.
In this work, we present results obtained by calibrating feed-forward neural networks in a regression scheme for NOx, NO2 and CO concentrations in the same application scenario, in order to check whether the benzene-related findings generalize. The relationship between performance and training length was investigated in order to compare the results with those obtained for benzene. The influences of feature selection and correlation-based learning were also investigated and evaluated.
Using the conventional station output as reference, neural networks were trained and simulated in the Matlab environment, with the hyperbolic tangent as the hidden neuron transfer function. The Levenberg-Marquardt training algorithm was chosen for the experiments, and early stopping and automatic Bayesian regularization (ABR) were used as complexity control algorithms to avoid overtraining [13-14]. Training sets were built from consecutive samples, i.e. a specific fixed-length interval of campaign data, while validation sets (when used for early stopping) were built by randomly choosing samples from the remaining data. Two approaches were devised for estimating the uncertainty on performance and for generalization purposes, one of them based on cross-validation.
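The data partitioning described above (a consecutive training block, an optional random validation subset drawn from the remaining samples, and the rest used for testing) can be sketched as follows. The helper name and the figure of 8760 hourly samples for one year without gaps are illustrative assumptions; the 40% validation fraction mirrors the preliminary experiment reported in Section 4, and a fraction of 0 corresponds to ABR training with no validation set.

```python
import numpy as np


def split_indices(n_total: int, train_start: int, train_len: int,
                  val_fraction: float, rng: np.random.Generator):
    """Consecutive training block; random validation subset of the rest; rest = test.

    Hypothetical helper. val_fraction = 0 yields an empty validation set.
    """
    train = np.arange(train_start, train_start + train_len)
    remaining = np.setdiff1d(np.arange(n_total), train)  # samples outside training
    n_val = int(round(val_fraction * remaining.size))
    shuffled = rng.permutation(remaining)
    val = np.sort(shuffled[:n_val])
    test = np.sort(shuffled[n_val:])
    return train, val, test


rng = np.random.default_rng(0)
# Two weeks of hourly samples for training, 40% of the remainder for validation
train, val, test = split_indices(n_total=8760, train_start=0, train_len=14 * 24,
                                 val_fraction=0.4, rng=rng)
print(len(train), len(val), len(test))  # 336 3370 5054
```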
Automatic Bayesian regularization is a capacity control technique for neural network training; it was introduced in the late 1990s, but its use is still uncommon in the electronic nose community. Training the regressor internal model usually involves the minimization of an error function. In conventional backpropagation training of neural networks, the error function takes into account only the empirical error (i.e. the error on the training set), while model capacity (i.e. the number of hidden neurons) is fixed and solution complexity must be carefully controlled. This training-data-centric procedure can easily lead to data overfitting. Approaches based on regularization theory (see ref. [10] for a detailed description) try to balance the empirical error with a term related to model complexity, their global error function being:

$$F(\mathbf{w}) = \beta \sum_{n=1}^{k} \varepsilon_n + \alpha \left\| \mathbf{w} \right\|^2 \qquad (2)$$

where n is the running training sample index, $\varepsilon_n$ is the empirical error function computed on the n-th of k total training samples, and α, β are the regularization parameters. In neural networks, the $\|\mathbf{w}\|^2$ term takes into account the values of the network weights, estimating network model complexity by their norm. Under appropriate conditions, minimizing (2) drives towards small values those superfluous connection weights that could otherwise produce overtraining. This typically also yields a smoother network response, with an effect similar to pruning superfluous network connections. As regards the network architecture, with regularization any modestly oversized network should be able to approximate the regression function while retaining good generalization properties.
In order to avoid both underfitting and overtraining, the ratio α/β must of course be carefully tuned, which adds free parameters to be optimized.
ABR is a regularization-based training methodology that estimates α and β in a Bayesian framework (see [10],[15]), starting from an initial hypothesis on the statistical distribution of the weights, while simultaneously minimizing (2) with the Levenberg-Marquardt network training algorithm. In the Bayesian framework, the weights are considered random variables whose posterior probability density can be described as:

$$P(\mathbf{w} \mid D, \alpha, \beta, M) = \frac{P(D \mid \mathbf{w}, \beta, M)\, P(\mathbf{w} \mid \alpha, M)}{P(D \mid \alpha, \beta, M)} \qquad (3)$$

D being the gathered data set and M the particular neural network model. Assuming Gaussian distributions for both the prior network weights and the training set noise, the optimal weights maximizing (3) also minimize the regularized error function given in (2).
In ABR, each network training step adds knowledge about the weight distribution, yielding an approximation, say $\mathbf{w}'$, of the weight values $\mathbf{w}_{MP}$ that maximize the posterior probability in (3). This knowledge can be used to update the suboptimal α and β values by maximizing

$$P(\alpha, \beta \mid D, M) = \frac{P(D \mid \alpha, \beta, M)\, P(\alpha, \beta \mid M)}{P(D \mid M)} \qquad (4)$$

Adding the plausible assumption that $P(\alpha, \beta \mid M)$ is uniform, after simple analytical steps approximations of $\alpha_{MP}$ and $\beta_{MP}$ can be found using (3):

$$\alpha' = \frac{\gamma}{2\left\| \mathbf{w}' \right\|^2}, \qquad \beta' = \frac{k - \gamma}{2 \sum_{n=1}^{k} \varepsilon_n(\mathbf{w}')} \qquad (5)$$

with γ being the so-called effective number of parameters, i.e. a measure of how many neural network parameters are effectively used in minimizing the error function. The value of γ is computed as

$$\gamma = N - 2\alpha\, \mathrm{tr}\left(\mathbf{H}^{-1}\right) \qquad (6)$$

where N is the total number of network parameters and H is the Hessian matrix of the objective function (2), whose Gauss-Newton approximation is readily available from the Levenberg-Marquardt training algorithm. The obtained approximations of $\alpha_{MP}$ and $\beta_{MP}$ in turn modify eq. (2), and hence the best w values obtainable by a further training iteration. The process is iterated until convergence. Note that, following network weight initialization, α and β are initialized at 0 and 1 respectively before the first step of the training algorithm.
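The iteration described above can be checked on the simplest case where every quantity is exact: a model that is linear in its parameters, y ≈ Φw. There, the Hessian of the objective F(w) = β Σ e² + α ‖w‖² is exactly H = 2βΦᵀΦ + 2αI (no Gauss-Newton approximation is needed), and the minimizer of F for fixed α, β has a closed form. The sketch below assumes ε_n is the squared error on the n-th sample and follows the update rules for α, β and γ together with the initialization α = 0, β = 1. It is not the Matlab implementation used in this work; for a neural network, the closed-form solve would be replaced by Levenberg-Marquardt steps.

```python
import numpy as np


def abr_linear(phi: np.ndarray, y: np.ndarray, n_iter: int = 50):
    """Bayesian regularization loop for a linear model (illustrative sketch).

    F(w) = beta * sum(e**2) + alpha * ||w||**2, with e = y - phi @ w.
    """
    k, n_params = phi.shape            # k training samples, N parameters
    alpha, beta = 0.0, 1.0             # initial values stated in the text
    for _ in range(n_iter):
        # Exact Hessian of F for a linear model
        hessian = 2.0 * beta * phi.T @ phi + 2.0 * alpha * np.eye(n_params)
        # Minimizer of F for fixed alpha, beta: H w = 2 * beta * phi^T y
        w = np.linalg.solve(hessian, 2.0 * beta * phi.T @ y)
        e_d = float(np.sum((y - phi @ w) ** 2))   # sum of squared errors
        e_w = float(np.sum(w ** 2))               # squared weight norm
        gamma = n_params - 2.0 * alpha * np.trace(np.linalg.inv(hessian))
        alpha = gamma / (2.0 * e_w)               # update for alpha
        beta = (k - gamma) / (2.0 * e_d)          # update for beta
    return w, alpha, beta, gamma


# Synthetic problem: 200 samples, 10 parameters, noise std 0.1
rng = np.random.default_rng(3)
phi = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = phi @ w_true + rng.normal(scale=0.1, size=200)
w, alpha, beta, gamma = abr_linear(phi, y)
print(f"gamma={gamma:.3f}, alpha={alpha:.3f}, noise var estimate={1 / (2 * beta):.4f}")
```

Since F uses β Σ e², the implied noise variance is 1/(2β), which should come out close to 0.1² = 0.01 here. γ stays just below N = 10 because all ten parameters are well determined by the data.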
4. Results and Discussion
Initially, in order to allow the reader to compare the error levels for CO, NOx and NO2 with the benzene-related findings reported in our previous work, a preliminary regression experiment was set up. A two-week-long data segment, starting from the first day of measurements and using all the sensor responses, was used as the training set. In this preliminary experiment only, a validation set consisting of 40% of the remaining samples was randomly extracted and reserved for the early stopping strategy. The remaining samples were used as the test set for performance evaluation. Three different networks were devised, each targeted to a specific analyte concentration estimation problem, and the Levenberg-Marquardt algorithm was used for training. The number of hidden neurons was empirically fixed at 25 for all three species, while the performance assessment was repeated 20 times and averaged in order to reduce the influence of local minima and of the initial weight choice. The Mean Relative Error (MRE), computed as

$$\mathrm{MRE} = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| y_i - x_i \right|}{y_i} \qquad (7)$$

where $y_i$ is the true concentration value at the i-th sample of the test set, $x_i$ the network concentration estimate and n the number of test samples involved, was chosen as the primary performance estimator both for this and for the following experiments. We found MRE(NO2) = 0.26, MRE(NOx) = 0.42 and MRE(CO) = 0.32. Although these MRE figures are significantly worse than those computed for benzene with similar settings (MRE = 0.02), here too the ratio between the Mean Absolute Error (MAE), computed as

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - x_i \right| \qquad (8)$$

and the yearly concentration range of each species was encouraging: 3% for CO, 8% for NO2 and 7% for NOx. For reference, the concentrations of these species are expressed in mg/m³, µg/m³ and ppb, respectively. The most relevant negative impact on the relative error was found at the low concentration levels typically reached at night.
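The two indicators, MRE and MAE, can be computed as in the minimal sketch below (NumPy is our choice; function names are illustrative). The toy values show why low night-time concentrations weigh heavily on the MRE: the same absolute error of 0.5 counts five times more, in relative terms, at a true value of 1 than at a true value of 5.

```python
import numpy as np


def mre(y_true, y_est) -> float:
    """Mean Relative Error: mean of |y_i - x_i| / y_i (assumes y_i > 0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_est = np.asarray(y_est, dtype=float)
    return float(np.mean(np.abs(y_true - y_est) / y_true))


def mae(y_true, y_est) -> float:
    """Mean Absolute Error: mean of |y_i - x_i|."""
    y_true = np.asarray(y_true, dtype=float)
    y_est = np.asarray(y_est, dtype=float)
    return float(np.mean(np.abs(y_true - y_est)))


# Same absolute error at a low (night-time) and a high concentration
y = [1.0, 5.0]
x = [1.5, 5.5]
print(mae(y, x), mre(y, x))  # 0.5 0.3
```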
It is also interesting to note that, for all three species, the performance obtained on the normalized training set in terms of Mean Squared Error (MSE) is typically three orders of magnitude worse than that obtained for benzene in the same setup. We explain this result by a significantly "noisy" behaviour of the feature space variables (sensor resistances) in response to the concentrations of these species, which confirms the definitely worse descriptive power of the sensor array, in operative conditions, as regards these species.
It should be stressed that the use of early stopping makes the above results depend on the knowledge contained in both the training and validation sets, and not on the training set alone. However, similar but more rigorous results on the influence of training set length on performance were obtained without using any validation set; they are reported in the following sections, which represent the core of this paper and form the basis for the conclusions drawn.
4.1 Optimal training length estimation
As a second investigation, we tested the optimal training duration when using consecutive samples. In the benzene case, a ten-day-long training set proved capable of capturing the expected daily and weekly cycles in the statistical distribution of the pollutant concentration. Our hypothesis was that the results obtained for benzene could be generalized to the other species as well.
For this reason, we set up a training-evaluation procedure using the whole sensor array response and different training set lengths for CO and NO2. Networks were trained with the Levenberg-Marquardt algorithm and Automatic Bayesian Regularization for complexity control, and hence without any validation set. Based on preliminary results, the number of hidden neurons was initially set to 5.
As a first experiment, we built training sets of consecutive samples with different lengths, all starting from the first sample of the available data set (1 March). For each selected training length, all the remaining samples were used as the test set. Each training duration thus produced a different regressor, whose performance we wanted to compare using test sets of different lengths. In such a setup, it is important to evaluate the uncertainty of the performance estimates in order to understand the significance of the reported differences and possibly generalize the findings. The primary source of uncertainty in the performance estimates is the use of a limited test set whose duration differs between regressors. It is also well known that the outcome of the neural network training process itself is subject to uncertainty due to the initial choice of weights and the related local minima issues.
So, for each trained network, characterized by its own (training, test) set pair, the concentration estimation errors on the test set were computed. For each training length, the procedure was repeated 20 times in order to reduce the impact of the initial choice of NN weights; however, the variability of the performance indicators was negligible, except for CO concentration estimation with a training length of 96 hours, for which the test results were unstable.
As regards uncertainty estimation, the error observations did not appear to be i.i.d.; they were significantly autocorrelated. Their autocorrelation function (ACF) was definitely not impulsive; instead, the ACF of the absolute errors decayed only slowly below the white noise confidence bounds. For this reason, confidence intervals on the sample means could not be computed using the CLT and the t statistic, because the observations failed to meet the i.i.d. (independent and identically distributed) requirement. In order to correctly compute the confidence intervals for the MRE and MAE, we used a novel algorithm proposed by Zhang in [16] for calculating the uncertainty on the sample mean of autocorrelated measurements. This methodology is based on a weak stationarity hypothesis that allows for Auto Regressive Moving Average (ARMA) modeling of the measurement process. It is interesting to note that the confidence intervals computed in this way are notably wider than the classic ones, as one can expect given the lower number of independent observations in autocorrelated time series. It should also be noted that the absolute error observations show a notable trend in time; in such a situation, Zhang [16] warns that using the averaged MAE value and the corresponding variance to characterize the regression systems may be misleading. For this reason, although both lead to the same conclusions, we based our comparisons on the computed MRE values as the primary performance indicator. Results for CO and NO2 are reported in Tables 1 and 2, respectively. They show that, for that year, no particular benefit could be achieved by extending the duration of a training set starting from the first available sample beyond two weeks, when estimating pollutant concentrations over the remaining samples of that year.
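Zhang's ARMA-based method [16] is not reproduced here. To show only why intervals on autocorrelated errors come out wider, the sketch below swaps in a deliberately simpler textbook technique: the AR(1) effective sample size n_eff = n(1 − ρ)/(1 + ρ), where ρ is the lag-1 autocorrelation, combined with a normal quantile of 1.96. The function name, the clipping of ρ and the synthetic AR(1) error series are all illustrative assumptions.

```python
import numpy as np


def mean_ci_halfwidth(x, autocorrelated: bool = True, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for the mean of x.

    With autocorrelated=True, n is replaced by the AR(1) effective sample size
    n_eff = n * (1 - rho) / (1 + rho). This is NOT Zhang's ARMA-based method,
    only a cruder illustration of the same effect.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    s = x.std(ddof=1)
    n_used = float(n)
    if autocorrelated:
        xc = x - x.mean()
        rho = float(np.sum(xc[1:] * xc[:-1]) / np.sum(xc * xc))  # lag-1 autocorrelation
        rho = min(max(rho, 0.0), 0.99)  # keep n_eff positive; never narrow the interval
        n_used = n * (1.0 - rho) / (1.0 + rho)
    return z * s / np.sqrt(n_used)


# Synthetic AR(1) error series with strong positive autocorrelation
rng = np.random.default_rng(2)
e = np.empty(2000)
e[0] = 0.0
for t in range(1, e.size):
    e[t] = 0.9 * e[t - 1] + rng.normal()
print(mean_ci_halfwidth(e, autocorrelated=False), mean_ci_halfwidth(e))
```

With ρ near 0.9, the effective sample size drops to roughly one nineteenth of n, and the interval widens by a factor of about four.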
A further experiment was conducted for the sake of comparison, i.e. to compare multiple training set durations in a slightly different setup. This time, we used a test set of constant duration at a fixed location in the data set time series: all the regressors were tested on the same test set, i.e. the final six months of the campaign. The training set start time, training procedures and regressor architecture were the same as in the previous experiment. Although this setup may rightly be perceived as penalizing short training sets, whose samples are all located long before the start of the test set, its results provide a meaningful comparison with those presented above. The results, presented in Table 3, substantially confirm the previous findings.
Locating the training and test sets at particular times, as in the above setup, may cast doubt on the possibility of generalizing the findings. Of course, the same applies to the choice of the training set. For this reason, in order to compare and generalize the performance figures resulting from different training set lengths, we applied a particular cross-validation approach that takes into account training (and test) samples located at different times. We set up the following cross-validation procedure. Over the entire campaign year, considering a fixed test set length h corresponding to 6 months, for each selected training set duration i, k mutually exclusive, consecutive training sets $Tr_m^i$ were considered, m being the training set index, with

$$k = \left\lfloor \frac{\mathrm{TotalDBLength} - h}{i} \right\rfloor \qquad (9)$$

For each training set $Tr_m^i$, made of i consecutive samples, the h immediately following consecutive samples were taken as its specific test set $Te_m^h$. For each ($Tr_m^i$, $Te_m^h$) pair, the neural network was trained with ABR 20 times. MRE, Relative Error Standard Deviation (STD_RE), MAE, Absolute Error Standard Deviation (STD_MAE) and Squared Correlation Coefficient (SCC), computed over each $Te_m^h$, were used as performance indexes. Here too, for each pair, the variability of the results due to NN training was found to be negligible, confirming the positive impact of ABR. Then, for each i, the k results coming from the different ($Tr_m^i$, $Te_m^h$) pairs were considered as i.i.d. observations of the performance figures; they were averaged and their uncertainty computed, so as to correctly compare the results of the regressors obtained with different values of i.
In this setup, we evaluate the estimation capabilities of various regressors, each trained with a training set of different length and tested on a fixed number of test samples immediately following the training set.
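The cross-validation scheme described above can be sketched as a generator of (training, test) index ranges: mutually exclusive consecutive training blocks of length i, each followed by its own test block of length h, with k taken as the integer part of (TotalDBLength − h)/i. The sizes in the example (8760 hourly samples, h = 4380 h for six months, i = 240 h) are illustrative assumptions, and the 20 repeated trainings per pair are not shown.

```python
def rolling_splits(total_len: int, train_len: int, test_len: int):
    """Consecutive, mutually exclusive training blocks, each followed by a test block.

    Returns k = (total_len - test_len) // train_len pairs of index ranges.
    """
    k = (total_len - test_len) // train_len
    splits = []
    for m in range(k):
        train = range(m * train_len, (m + 1) * train_len)                  # Tr_m^i
        test = range((m + 1) * train_len, (m + 1) * train_len + test_len)  # Te_m^h
        splits.append((train, test))
    return splits


# ~1 year of hourly data, 6-month test window, 10-day training blocks
splits = rolling_splits(total_len=8760, train_len=240, test_len=4380)
print(len(splits))  # 18
```

Because each test block starts right after its own training block, every pair evaluates a calibration over the six months that follow it, and the k pairs sample different seasons of the campaign year.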
The results obtained for CO and NO2 are shown in Tables 4 and 5, respectively. Fig. 1 shows the MAE, together with its confidence intervals, as a function of training length for the CO concentration estimation problem.
The individual performance figures were significantly worse than those found for benzene, for which the best MRE was 0.02. However, for these two species too, a relatively short training set appears to produce results very close to those obtainable with far longer data sets, while being significantly different from those obtained with the shortest training set length. With longer training sets a slow positive trend is observable, but the uncertainty grows rapidly beyond a length of 700 samples, where we can no longer reliably distinguish the average error performances. Indeed, treating the performance figure observations as normally distributed (Kolmogorov-Smirnov test), a two-sample right-tailed Welch test¹ against the null hypothesis H0 of no significant difference in the mean, conducted on the MAE observations for CO with training set lengths of 96 and 360 samples, resulted in p = 0.02, thus leading to the rejection of H0. Conversely, the null hypothesis cannot be rejected for any setting comparing the 360-sample results with those of longer training sets. Similar behaviour is found for the MRE estimates; e.g. comparing lengths of 72 and 360 samples resulted in p = 0.04. Similar tests were conducted for NO2 as well, substantially confirming these results.
¹ Behrens-Fisher-like setting, using the Satterthwaite approximation for the degrees of freedom, at α = 0.05.
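A right-tailed Welch test with Welch-Satterthwaite degrees of freedom, of the kind used above, can be sketched in plain Python. In this sketch, Simpson integration of the Student t density stands in for a statistics library's t distribution. The MAE values in the example are made-up placeholders, not data from this campaign; the function names are illustrative.

```python
import math


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((v - ma) ** 2 for v in a) / (na - 1)   # unbiased sample variances
    vb = sum((v - mb) ** 2 for v in b) / (nb - 1)
    sa, sb = va / na, vb / nb
    t = (ma - mb) / math.sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df


def t_sf(t, df, steps=20000):
    """P(T > t) for Student's t with df degrees of freedom (Simpson's rule)."""
    a = abs(t)
    if a == 0.0:
        return 0.5
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = a / steps                                   # steps must be even
    area = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3                                   # integral of the density on [0, |t|]
    return 0.5 - area if t > 0 else 0.5 + area


# Placeholder MAE observations for a short and a longer training length
short = [0.62, 0.70, 0.58, 0.75, 0.66, 0.71]
longer = [0.55, 0.57, 0.53, 0.60, 0.56]
t, df = welch_t(short, longer)
p = t_sf(t, df)  # right-tailed: H1 says the short-training MAE is larger
print(f"t={t:.2f}, df={df:.1f}, p={p:.4f}")
```

At α = 0.05, H0 is rejected when p < 0.05. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False, alternative="greater")` should give the same result.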
Overall, the results confirm the feasibility of the on-field approach for calibrating multisensor devices in the city air pollution scenario, showing how a limited training campaign can be used to compute an optimal calibration for several of the following months. The absolute performance level, however, depends on the descriptive power of the selected sensor array for the specific pollutant concentration estimation problem.
4.2 Feature Selection
A further investigation tested the influence of a feature selection procedure on the performance obtainable in our scenario. Feature selection can broaden the experimentalist's knowledge of the information contributed by each individual sensor to the concentration estimation problem for each species. This is particularly true if the selection is conducted with a brute force approach, i.e. exploring all available feature set combinations and their performance, although this can be very computationally expensive and hence is not always a viable choice.
In our scenario, we analysed the performance obtainable, in terms of MRE and MAE, by using different subsets of the original sensor array for estimating NO2, CO and NOx with separate regressors. The training set length was fixed at 10 days, starting from the first campaign day, in accordance with the previous experiments; the remaining samples were used as the test set. Confidence intervals were computed using Zhang's method for the uncertainty of sample means of autocorrelated data series. The Wilcoxon signed-rank test (a nonparametric paired-sample test with no normality assumption) was used to evaluate the significance of the observed performance differences at α = 0.05. In order to estimate the real operative system performance, feature selection validation and overall performance testing should be performed on different sets; as mentioned above, however, our goal was different, i.e. investigating whether feature selection has a significant effect in this particular scenario.
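A brute force search of the kind discussed above can be sketched as an enumeration of every non-empty subset of the seven inputs. The scoring function is a hypothetical stand-in for "train a regressor on this subset and return its test MRE". The count also shows the cost: 2⁷ − 1 = 127 subsets, which at 20 training runs each means 2540 trainings per target species. The toy score in the example is arbitrary and does not reproduce any result of this study.

```python
from itertools import combinations

SENSORS = ["CO", "NO2", "NOx", "O3", "NMHC", "T", "RH"]


def all_subsets(features):
    """Every non-empty feature subset, smallest first (2**n - 1 in total)."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)


def brute_force_select(features, score):
    """Evaluate score(subset) for every subset; lower is better (e.g. test MRE).

    `score` is a user-supplied, hypothetical callable that would train and test
    a regressor on the given subset.
    """
    results = {subset: score(subset) for subset in all_subsets(features)}
    best = min(results, key=results.get)
    return best, results


# Arbitrary toy score: distance from one particular subset, for illustration only
toy_score = lambda s: len(set(s) ^ {"NO2", "T", "RH"})
best, results = brute_force_select(SENSORS, toy_score)
print(len(results), best, results[best])  # 127 ('NO2', 'T', 'RH') 0
```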
Interestingly, for NO2, the best performance values were obtained using all the sensor responses: using the NOx and NO2 sensor responses, the CO and NMHC sensors, or the NOx, NO2 and CO sensors led to an MRE of 0.37, 0.29 and 0.30 respectively, while using the entire gas sensor array led to an MRE of 0.22 (see Table 6 and Fig. 2). Performance levels that are not significantly different can be obtained by coupling the NO2 sensor response with both the CO and NMHC sensor responses. This result confirms that suitable performance can be achieved, even when specific sensors show low overall performance, by coupling their responses with information from sensors whose response is directed towards species highly correlated with the species under analysis (see [8]).
Table 6 confirms the influence of temperature and humidity on the on-field calibration of specific sensor responses, in this case the NOx and NO2 sensors. Adding the temperature and humidity sensor responses led to a significant performance increase when coupled with the response of the NOx sensor alone, the NO2 sensor alone, the NOx+NO2 sensors and the NOx+NO2+O3 sensors. No improvement was recorded when the RH and T sensor responses were added to feature vectors containing the CO, NMHC and NO2 sensors.
The results of the sensor subset selection for CO, depicted in Table 7,
revealed that a nonlinear function of the NMHC sensor response is able to follow the CO
concentration significantly better than one based on the CO-targeted sensor response. This is
probably due, again, to the high correlation factor existing between CO and Benzene
concentrations in the selected scenario (r = 0.93, SCC = 0.86) and to the very good
performance of the NMHC-targeted sensor. The best performance is achieved by selecting a
subset of the sensor array that uses both the CO and NMHC sensor responses, while adding the
NO2 and NOx sensor responses led to worse performance values. Fig. 3 depicts the
differences between the concentration estimation provided by the model and the true
concentration as reported by the conventional analyzer, while Fig. 4 reports the MAE results
of this particular feature vector setup versus the training set length.
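The correlation argument above rests, for the r value, on Pearson's coefficient between the hourly reference concentrations of two species. A minimal stdlib sketch follows. Skipping hours in which either analyzer reports no value (represented as `None`) is our assumption about gap handling. The SCC index is not reproduced here.

```python
import math
from typing import Optional, Sequence


def pearson_r(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> float:
    """Pearson correlation between two hourly concentration series.

    Hours where either value is missing (None) are skipped; this gap
    handling is an assumption of the sketch.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two paired samples")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)
```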
Performance assessment was conducted using ABR according to the procedure
explained in the previous section, which significantly reduces the time needed by
conventional procedures. As a control check, we report the results of a conventional neural
network hyperparameter optimization procedure for the CO, NMHC feature vector.
Complexity control was achieved by choosing the target empirical error
in the set [10e-3, 2x10e-3, 4x10e-3], while the number of hidden neurons varied in the set [5, 10, 15, 20].
Early stopping was not used because it would have hampered the comparison of results
by modifying the test set duration. For each combination of target empirical error and number of hidden
neurons, results were averaged over 20 different training runs. The best
results were obtained for 10 neurons and a 2x10e-3 target empirical error, and were very
similar (MRE = 0.27, MAE = 0.36) to those obtained with ABR using the same feature
vector. The same procedure applied to the entire array again led to values
(MRE = 0.32, MAE = 0.47) similar to those obtained with ABR.
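The conventional control procedure described above can be sketched as a grid search that averages each grid point over repeated training runs. This is an illustration under stated assumptions. `train_and_score` is a hypothetical callback standing in for one network training. Reading the text's "10e-3" notation as 10^-3 is our interpretation.

```python
import itertools
import statistics
from typing import Callable, Dict, Sequence, Tuple

# Grid quoted in the text; reading "10e-3" as 10**-3 is our interpretation.
TARGET_ERRORS: Tuple[float, ...] = (1e-3, 2e-3, 4e-3)
HIDDEN_NEURONS: Tuple[int, ...] = (5, 10, 15, 20)
N_RUNS = 20  # training runs averaged for each grid point

GridKey = Tuple[float, int]


def grid_search(
    train_and_score: Callable[[float, int, int], float],
    target_errors: Sequence[float] = TARGET_ERRORS,
    hidden_neurons: Sequence[int] = HIDDEN_NEURONS,
    n_runs: int = N_RUNS,
) -> Tuple[GridKey, Dict[GridKey, float]]:
    """Return the (target error, hidden neurons) pair with lowest mean test error.

    `train_and_score(goal, n_hidden, seed)` is a hypothetical callback: it
    trains one network from a random initialisation fixed by `seed`, stops
    when the training error reaches `goal`, and returns the test error.
    Averaging over seeds smooths the run-to-run variance caused by weight
    initialisation.
    """
    mean_scores: Dict[GridKey, float] = {}
    for goal, n_hidden in itertools.product(target_errors, hidden_neurons):
        runs = [train_and_score(goal, n_hidden, seed) for seed in range(n_runs)]
        mean_scores[(goal, n_hidden)] = statistics.fmean(runs)
    best = min(mean_scores, key=mean_scores.__getitem__)
    return best, mean_scores
```

The 3 x 4 grid with 20 runs per point requires 240 trainings. This is the cost that ABR avoids by controlling complexity within a single training procedure.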
Summarizing, the major difference with respect to Benzene concentration estimation is
primarily due to the worse performance of the CO, NOx and NO2 sensors in estimating their
target gases. However, the presence of other sensors significantly contributes to mitigating
this issue as a result of correlation effects.
These results warn against the unselective use of all the sensor lines when training a
sensor fusion subsystem in this scenario. They also strongly suggest the use of
different sensor fusion subsystems, each one trained for the prediction of a single
pollutant. In this case, feature selection algorithms could help select the optimal
composition of the array to be used for each pollutant concentration prediction problem.
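The architecture suggested above, with one regressor per pollutant and each regressor seeing its own subset of sensor lines, can be sketched as a small container class. The `fit` training routine is hypothetical and stands in for, e.g., an ABR-trained network. The class and type names are illustrative, not taken from the paper.

```python
from typing import Callable, Dict, List, Sequence

Sample = Dict[str, float]                 # one hourly reading: sensor line -> response
Model = Callable[[List[float]], float]    # trained regressor: features -> concentration
FitFn = Callable[[List[List[float]], List[float]], Model]


class PerPollutantCalibrator:
    """One independently trained regressor per target pollutant.

    `feature_sets` maps each pollutant to the sensor lines its regressor
    sees; `fit` is a hypothetical training routine returning a callable model.
    """

    def __init__(self, feature_sets: Dict[str, Sequence[str]], fit: FitFn):
        self.feature_sets = {t: list(f) for t, f in feature_sets.items()}
        self.fit_fn = fit
        self.models: Dict[str, Model] = {}

    def _features(self, sample: Sample, target: str) -> List[float]:
        # Project the full sensor reading onto this pollutant's subset.
        return [sample[line] for line in self.feature_sets[target]]

    def fit(self, samples: Sequence[Sample], targets: Dict[str, Sequence[float]]) -> None:
        # Train each pollutant's regressor only on its own feature subset.
        for target, y in targets.items():
            X = [self._features(s, target) for s in samples]
            self.models[target] = self.fit_fn(X, list(y))

    def predict(self, sample: Sample) -> Dict[str, float]:
        return {t: m(self._features(sample, t)) for t, m in self.models.items()}
```

Following the results above, one would configure, for instance, `{"CO": ["CO", "NMHC"], "NO2": [all sensor lines]}`.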
In Tables 8 and 9 we report the results of the performance evaluation versus training set
duration for CO using the best feature vector composition, with the single test set
and the cross-validated approach described in Section 4.1, respectively. Results confirm
the optimality of a two-week training length regardless of feature vector composition,
while performance is, as expected, generally better.
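The single-test-set protocol behind these training-length evaluations can be sketched as follows. Training always starts at the first recorded hour, and all remaining hours form the test set. `fit_predict` is a hypothetical callback wrapping model training and prediction. We assume MRE is the mean of |y - ŷ|/y over the test hours, which requires strictly positive reference values.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# fit_predict(train_idx, test_idx) -> predictions for test_idx (hypothetical).
FitPredict = Callable[[List[int], List[int]], List[float]]


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def mre(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of |y - y_hat| / y; assumes strictly positive reference values."""
    return sum(abs(a - b) / a for a, b in zip(y_true, y_pred)) / len(y_true)


def sweep_training_length(
    y: Sequence[float], fit_predict: FitPredict, lengths: Sequence[int]
) -> Dict[int, Tuple[float, float]]:
    """(MRE, MAE) on the held-out remainder for each training length in hours."""
    results: Dict[int, Tuple[float, float]] = {}
    for h in lengths:
        if not 0 < h < len(y):
            raise ValueError(f"training length {h} leaves no test set")
        train, test = list(range(h)), list(range(h, len(y)))
        y_test = [y[i] for i in test]
        y_hat = fit_predict(train, test)
        results[h] = (mre(y_test, y_hat), mae(y_test, y_hat))
    return results
```

Note that each training length leaves a different test set, so this design trades exact comparability for using all remaining data. The cross-validated approach mentioned above addresses that trade-off differently.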
4.3 Performance estimation over time
Finally, starting from a ten-day training set, we investigated the MAE
performance index for CO on a week-by-week basis over 55 weeks, in order to check its
evolution over time. In addition, by using several feature vector compositions, we
investigated the influence of different sensor responses on the concentration
estimation over time.
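The week-by-week analysis amounts to computing the MAE over consecutive, non-overlapping 168-hour blocks of the test period. A minimal sketch follows. Keeping a trailing partial week as its own block and skipping hours where either series is missing (`None`) are our assumptions.

```python
import math
from typing import List, Optional, Sequence


def weekly_mae(
    y_true: Sequence[Optional[float]],
    y_pred: Sequence[Optional[float]],
    hours_per_week: int = 168,
) -> List[float]:
    """MAE for each consecutive block of `hours_per_week` hourly samples.

    A trailing partial week forms its own block; hours with a missing value
    in either series are skipped, and an all-missing block yields NaN.
    """
    out: List[float] = []
    for start in range(0, len(y_true), hours_per_week):
        block = zip(y_true[start:start + hours_per_week],
                    y_pred[start:start + hours_per_week])
        pairs = [(a, b) for a, b in block if a is not None and b is not None]
        out.append(sum(abs(a - b) for a, b in pairs) / len(pairs) if pairs else math.nan)
    return out
```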
As found for Benzene [8], the short-term performance indexes computed for this species
degrade over time; specifically, significant performance hits are detectable
after 6 months from the end of the calibration set (see Fig. 5). Experimentally, we found that,
with respect to the use of all available sensor responses, using only the CO sensor response
causes a small performance degradation in the Summer and a significant
performance boost in the Winter. In particular, we believe that this is explainable
by the marked changes in the concentration distributions of NOx and NO2 during the
Winter. These changes disrupt the relationships learnt by the regression system from
the Spring training set. Instead, the combined use of the CO and NMHC
sensor responses has a positive effect during the whole test set. Summarizing, the descriptive
power of the NOx and NO2 sensors, and the correlation of their target gases with CO, weigh
less in the overall performance than the misleading effects brought in by the
changes in their targets' distributions during the Winter (see Fig. 6 a, b and c).
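The paper does not state which non-parametric estimator produced the PDFs of Fig. 6. As an assumption, the sketch below uses a common default: a Gaussian-kernel density estimate with Silverman's rule-of-thumb bandwidth. Evaluating it separately on the Summer and Winter hourly reference values of a species, over a common grid, gives curves of the kind compared in Fig. 6.

```python
import math
import statistics
from typing import List, Optional, Sequence


def silverman_bandwidth(x: Sequence[float]) -> float:
    """Silverman's rule of thumb: 0.9 * min(std, IQR / 1.34) * n ** (-1/5)."""
    std = statistics.stdev(x)
    q1, _, q3 = statistics.quantiles(x, n=4)
    spread = min(std, (q3 - q1) / 1.34) or std  # fall back to std if IQR is 0
    return 0.9 * spread * len(x) ** (-0.2)


def gaussian_kde(
    x: Sequence[float], grid: Sequence[float], bandwidth: Optional[float] = None
) -> List[float]:
    """Gaussian-kernel density estimate of the samples `x`, evaluated on `grid`."""
    h = silverman_bandwidth(x) if bandwidth is None else bandwidth
    norm = 1.0 / (len(x) * h * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - xi) / h) ** 2) for xi in x) for g in grid]
```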
As regards the NO2 week-by-week estimation error (see Figs. 7 and 8), we see, as
mentioned above, that the all-sensor approach obtains the best performance by
exploiting positive correlation effects; even here, however, performance worsening in
Winter time is evident.
The combined findings reported in [8] and in the present paper lead us to consider
feasible the use of field data to obtain a suitable calibration for multisensor devices in a
city air pollution monitoring framework. These calibrations, however, suffer from long-term
performance degradation and are sensitive to changes in the relative concentration
distributions that appear on a seasonal basis or as a result of particular events. This is far more
evident when, due to poor specific sensor performance, the best statistical regressor has
to rely on scenario-specific species correlation factors. It remains to be proven whether a
site-specific calibration can be used for concentration estimation at nearby sites. In
this case, we believe that performance will ultimately be related to the site dependence
of the species correlation patterns, which should be investigated.
5. Conclusions and further work
In this work we have further investigated the feasibility of the on-field calibration
approach for a multisensor device in the air pollution monitoring scenario. The results
obtained for typical city air pollutants, i.e. CO, NO2, NOx and Benzene, have been
reported and discussed. Overall results confirm that a relatively short training set
duration, about two weeks, can cope with the cyclic behaviour of the concentration
distributions, leading to optimal performance while being sufficiently insensitive to
outliers. This is very encouraging for the possibility of using multisensor devices,
together with mobile conventional analyzers for on-field calibration, to
densify the sparse pollution monitoring network, especially in historical city
centres.
Feature selection results definitely suggest the use of multiple regressors, each one
specifically designed and trained for the concentration estimation of a particular species.
In this way the designer is free to select the best feature set for each specific regression
problem, obtaining optimal results.
Feature selection analysis has furthermore confirmed the influence of scenario-related
species concentration correlations and of specific sensor performance on the overall
estimation scores. When specific sensors fail to correctly follow their target species
concentration, the existence of a strong correlation between this species concentration
and another one whose targeted sensor shows good performance may help the array to
recover, ultimately leading to significantly better performance levels. However, in this
case, changes in the species' relative statistical distribution, such as those due to the
Winter house heating effect, may have a negative impact on the long-term
performance of a neural calibration. The same negative impact could probably be
observed when a calibration obtained at a specific site, showing a site-specific
multivariate concentration distribution, is used to operate the multisensor device at a
significantly distant site.
Acknowledgments.
This work has been partially funded by Pirelli Labs. The authors wish to thank Prof.
Ciro D'Elia of the University of Cassino for helpful discussions on autocorrelated time
series and cyclostationary processes.
References
[1] N.A. Mazzeo, L.E. Venegas, Evaluation of turbulence from traffic using
experimental data obtained in a street canyon, Int. J. Environ. Pollut. 25 (2005) 164-176.
[2] S. Vardoulakis, B. E. A. Fisher, K. Pericleous, N. Gonzalez-Flesca, Modelling air
quality in street canyons: a review, Atm. Environ. 37 (2003) 155-182.
[3] K. A. Kourtidis, I. Ziomas, C. Zerefos, E. Kosmidis, P. Symeonidis, E.
Christophilopoulos, S. Karathanassis, A. Mploutsos, Benzene, toluene, ozone, NO2 and
SO2 measurements in an urban street canyon in Thessaloniki, Greece, Atm. Environ. 36
(2002) 5355-5364.
[4] B. Croxford, A. Penn, B. Hiller, Spatial distribution of urban pollution: civilizing
urban traffic, 5th Symposium on Highway and urban pollution, Copenhagen, 1995.
[5] G. Martinelli, M. C. Carotta, G. Ghiotti, E. Traversa, Thick film gas sensors based
on nano-sized semiconducting oxide powders, MRS Bull. 24 (1999) 30-36.
[6] M. Kamionka, P. Breuil, C. Pijolat, Calibration of a multivariate gas sensing device
for atmospheric pollution measurement, Sens. Actuators B Chem. 118 (2006) 323-327.
[7] M.C. Carotta, G. Martinelli, L. Crema, C. Malagu, M. Merli, G. Ghiotti, E. Traversa,
Nanostructured thick-film gas sensors for atmospheric pollutant monitoring:
quantitative analysis on field tests, Sens. Actuators B Chem. 76 (2001) 336-342.
[8] S. De Vito, E. Massera, M. Piga, L. Martinotto, G. Di Francia, On field
calibration of an electronic nose for benzene estimation in an urban pollution
monitoring scenario, Sens. Actuators B Chem. 129 (2008) 750-757.
[9] W. Tsujita, A. Yoshino, H. Ishida, T. Moriizumi, Gas sensor network for air-pollution
monitoring, Sens. Actuators B Chem. 110 (2005) 304-311.
[10] C.M. Bishop, Pattern Recognition and Machine Learning, Springer Science, 2006,
ISBN 0-387-31073-8.
[11] M. Pardo, G. Sberveglieri, Remarks on the use of multilayer perceptrons for the
analysis of chemical sensor array data, IEEE Sens. J. 4 (2004) 355-363.
[12] http://www.pirellilabs.com.
[13] M.T. Hagan, M.B. Menhaj, Training feedforward networks with the Marquardt
algorithm, IEEE Trans. Neural Networks 5 (1994) 989-993.
[14] F.D. Foresee, M.T. Hagan, Gauss-Newton approximation to Bayesian
regularization, Proceedings of the 1997 International Joint Conference on Neural
Networks, 1997, pp. 1930-1935.
[15] D.J.C. MacKay, Bayesian interpolation, Neural Comput. 4 (1992) 415-447.
[16] N.F. Zhang, Calculation of the uncertainty of the mean of autocorrelated
measurements, Metrologia 43 (2006) 276-281.
Biographies
Saverio De Vito received his degree in Informatics Engineering from the University of
Naples “Federico II” in 1998. During 1998 and 1999 he was a research fellow at the
Artificial Vision and Intelligent Systems laboratory of the same university, working on
computer-aided diagnosis of breast cancer. From 1999 to 2004 he was with a software
house as an R&D technical manager in the framework of satellite-based telemedicine,
earth observation and distance learning projects. In June 2004 he joined ENEA as a
researcher. His research interests include statistical pattern recognition, electronic noses,
wireless sensor networks and computer-aided diagnosis. Since 2005 he has been a contract
professor of Applied Informatics at the University of Cassino.
Ettore Massera received his degree in Physics from the “Federico II” University of
Naples in May 1997. He has been working at the ENEA research center in Portici (NA)
since June 2003. At present he is in charge of research activity on gas sensor devices
based on nanostructured materials. Previously he worked on the study of the thermal and
optical properties of porous silicon at the Physics Department of the University of Naples.
Girolamo Di Francia received his degree in physics from the University of Naples
“Federico II”. In 1985 he started his research activity in the field of fabrication and
characterization of semiconductor solar cells (c-Si, GaAs), first at the Ansaldo
company in Genova and then at the ENEA research center of Rome, where he was
appointed full-time researcher in 1988. In 1991 he joined the ENEA research center
of Naples where, starting from 1992, he investigated porous silicon based devices. In
1996 he established there the Gas Sensor Laboratory, mainly devoted to the fabrication
and characterization of devices based on nanomaterials and polymer
nanocomposites.
Table 1: CO concentration estimation performance of the neural regression
scheme computed over different training set lengths. All sensor responses have been
used for feature vector composition, 5 neurons have been employed in the hidden
layer, and ABR has been used for complexity control. Only small performance enhancements
are obtained by using more than ten days of data recording. MAE figures are
expressed in mg/m3.
Hrs     MRE    STD_RE   MAE (mg/m3)   STD_MAE (mg/m3)   SCC
24      0.49   1.08     0.58          0.43              0.77
96      0.78   1.37     1.56          1.19              0.14
240     0.34   0.86     0.53          0.63              0.79
600     0.31   0.85     0.44          0.46              0.81
1200    0.34   0.98     0.41          0.42              0.85
2400    0.29   0.74     0.46          0.49              0.84
Table 2: NO2 concentration estimation performance of the neural regression
scheme computed over different training set lengths. All sensor responses have been
used for feature vector composition, 5 neurons have been employed in the hidden
layer, and ABR has been used for complexity control. For NO2, a small performance
worsening is observed when the training set is extended beyond ten days.
MAE figures are expressed in μg/m3.
Hrs     MRE    STD_RE   MAE (μg/m3)   STD_MAE (μg/m3)   SCC
24      0.61   1.04     49.4          35.0              0.04
96      0.31   0.68     30.2          25.4              0.36
240     0.22   0.62     23.7          21.0              0.66
600     0.23   0.66     27.1          25.9              0.62
1200    0.27   0.65     26.5          23.1              0.58
2400    0.27   0.68     28.2          23.2              0.54
Table 3: Performance evaluation for different feature vector compositions in the
NO2 concentration estimation problem. A ten-day training set has been used.
CO
X
NMHC
Feature Set
NOx
NO2
O3
MRE
T
RH
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
0.30
0.29
0.43
0.50
0.29
0.29
0.30
0.37
0.29
0.30
0.31
0.34
0.31
0.26
0.27
0.22
0.24
0.28
0.24
0.24
MAE
(μg/m3)
40
40
44
54
28
26
31
34
29
32
28
38
26
23
24
19
20
27
20
20
Table 4: Performance evaluation for different feature vector compositions in the
CO concentration estimation problem. A ten-day training set has been used.
CO
X
X
X
X
X
NMHC
X
X
X
X
X
Feature Set
NOx
NO2
O3
MRE
T
X
X
X
X
X
X
RH
X
0.41
0.27
0.27
0.35
0.35
0.35
MAE
(mg/m3)
0.72
0.35
0.35
0.48
0.55
0.55
Table 5: CO concentration estimation performance of the neural regression
scheme computed over different training set lengths, using the CO and NMHC solid
state sensor responses as the feature vector. 5 neurons have been employed in the hidden
layer, and ABR has been used for complexity control. No performance enhancement
is obtained by using more than ten days of data recording.
Hrs     MRE      STD_RE   MAE (mg/m3)   STD_MAE (mg/m3)   SCC
12      0.2542   0.6116   0.3882        0.3882            0.8736
24      0.3010   0.8119   0.3769        0.3935            0.8718
48      0.4146   1.0000   0.4443        0.3734            0.8707
72      0.4122   0.9936   0.4471        0.3746            0.8716
96      0.3991   0.9642   0.4508        0.3839            0.8713
168     0.2626   0.7360   0.3594        0.3887            0.8795
240     0.2664   0.7503   0.3518        0.3798            0.8794
360     0.3154   0.8258   0.3908        0.3766            0.8635
480     0.3202   0.8497   0.3841        0.3716            0.8686
540     0.3220   0.8490   0.3872        0.3726            0.8670
600     0.3047   0.8235   0.3718        0.3690            0.8712
700     0.2909   0.8062   0.3591        0.3654            0.8758
800     0.2916   0.8094   0.3582        0.3647            0.8770
900     0.2801   0.7879   0.3539        0.3689            0.8773
1200    0.2660   0.7590   0.3486        0.3735            0.8767
1600    0.2503   0.7017   0.3484        0.3794            0.8793
1800    0.2519   0.7073   0.3503        0.3832            0.8773
2000    0.2502   0.7023   0.3496        0.3885            0.8756
2200    0.2484   0.6935   0.3520        0.3930            0.8750
2400    0.2460   0.6873   0.3539        0.3985            0.8737
Figure 1: CO concentration estimation MAE, expressed in mg/m3, versus training
set length measured in samples (hours) with related confidence intervals. All
sensor responses have been used as feature vector (crossvalidation setting, see
Table 4 for details).
Figure 2: Hourly concentration estimation of NO2, expressed in μg/m3, over a one
week period. The blue dashed line represents the true concentration value as reported by
the conventional analyzer.
Figure 3: Hourly concentration estimation of CO, expressed in mg/m3, over a one
week period. The blue dashed line represents the true concentration value as reported by
the conventional analyzer.
Figure 4: CO concentration estimation MAE versus training set length measured
in samples (hours) with related confidence intervals; only the CO and NMHC sensor
responses have been used for feature vector composition (cross-validation setting,
see Table 9 for details).
Figure 5: Qualitative behaviour of the weekly mean absolute error (mg/m3) in the
CO concentration estimation problem. The blue line depicts the MAE obtained by using all
the sensor responses for the feature vector composition, while the red line depicts the MAE
obtained by using only the CO and NMHC sensor responses. While the all-sensor approach
has a small advantage during the summer, at the start of the winter it shows a
significantly higher error; this is very likely due to changes in the relative distribution
of NO2, NOx and CO concentrations.
Figure 6: Probability density function (PDF) non-parametric estimations in
summer time (blue, dash-dotted) and in winter time (red solid) for CO (a), NOx (b)
and NO2 (c) concentrations. Significant distribution changes are found for NOx
and NO2 leading to changes in concentration ratios with respect to CO. These
modifications are identified as the main driver for performance degradation in
winter time when NOx and NO2 sensor responses are used for CO estimation.
Figure 7: Weekly mean absolute error in the NO2 concentration estimation
problem. The blue line (dots) depicts the MAE obtained by using all the sensor responses
for the feature vector composition, while the green line (squares) depicts the MAE
obtained by using the NO2, NOx, O3, T and RH sensor responses. The black line (circles)
depicts the NO2, T and RH sensor based estimation. The all-sensor approach retains a
significant advantage over most of the test period and obtains the best overall
performance scores.
Figure 8: Weekly mean relative error in the NO2 concentration estimation
problem. The blue line (dots) depicts the MRE obtained by using all the sensor responses
for the feature vector composition, while the green line (squares) depicts the MRE
obtained by using the NO2, NOx, O3, T and RH sensor responses. The black line (circles)
depicts the NO2, T and RH sensor based estimation.