CO, NO2 AND NOx URBAN POLLUTION MONITORING WITH ON FIELD CALIBRATED ELECTRONIC NOSE

Saverio De Vito°*, Marco Piga+, Luca Martinotto+, Girolamo Di Francia°
°ENEA, Centro Ricerche Portici, 80055 Portici (NA), Italy
+ Pirelli Labs, Viale Sarca 222, 20126 Milano, Italy

Abstract
Low cost gas multisensor devices can represent an efficient solution for densifying the sparse urban air pollution monitoring mesh. In a previous work, we proposed and evaluated the calibration of such a device using short-term on-field recorded data for benzene pollution quantification. In this work, we present and discuss the results obtained for CO, NO2 and total NOx pollutant concentration estimation with the same setup. A conventional air pollution monitoring station is used to provide reference data. We show how a multivariate calibration can be achieved with the use of a two-week-long on-field data recording and neural regression systems. For these pollutants too, no significant performance boost was detectable when longer recordings were used. The influence of an appropriate feature selection on achieving optimal performance is also discussed by comparing the long-term performance of the calibrations obtained. Benefits and issues of multivariate correlation-based calibration are evaluated over a one-year-long measurement campaign.

Keywords: Urban Air Pollution Monitoring; On-field calibration; Electronic nose; Multisensor device; Feature Selection; Electronic nose design; Artificial neural networks; Automatic Bayesian Regularization.

1. Introduction
Urban pollution monitoring is currently carried out with networks of stations based on industrial spectrometers. Their size and cost make it unrealistic to build a measurement mesh of appropriate density. The pollution diffusion dynamics inside cities are indeed very complex due to the intricate pattern of roads and buildings.
In practice, the typical monitoring mesh granularity exceeds several kilometres, and this can hamper the representativeness of the gathered data with respect to the actual distribution of pollutant concentrations in city areas. As an example, roads surrounded by high buildings can undergo very high pollution levels due to the so-called canyon concentration effect even if a nearby monitoring station reports extremely low pollution levels [1-3]. On the other hand, real-time knowledge of the actual pollution concentration pattern in city areas is considered crucial for implementing savvy traffic management and air pollution management policies [4]. In recent years, some authors have proposed the use of gas multisensor devices as a tool for densifying the urban pollution monitoring mesh, owing to their significantly lower cost per unit [4],[6-9]. Their small size makes them suitable even for historical city centres, where they can be easily hidden, preserving the enjoyment of the architectural heritage. Unfortunately, the intrinsic long-term stability and selectivity issues that affect the solid state sensors they rely on can severely limit their reliability, in particular when compared with the well-established performance of conventional analyzers. Some studies confirm that, when calibrated in the laboratory for single-species quantification, they exhibit poor performance when operating in real conditions, revealing the low reliability of the conventional in-lab calibration process when dealing with complex mixtures [5-6]. Recently, some on-field calibration strategies have been proposed to overcome these issues [6-8]. In particular, our group proposed the use of a spectrometer-based station as a reference for the on-field calibration of a small multisensor device, obtaining very encouraging results for benzene quantification.

* Corresponding author: tel. +390817723364, fax +390817723344, e-mail: saverio.devito@portici.enea.it
We also focused on the optimal duration of the calibration process, finding that a neural network trained on a ten-day measurement set was capable of achieving a relative estimation error of less than 0.02 (2%) over more than 6 months [8]. A long-term performance degradation was observed and interpreted as caused by changes in the absolute and relative concentrations of the different species in the Winter time. A recalibration procedure, performed every 6 months, was found to effectively restore the performance level. In a similar frame, Tsujita et al. proposed an interesting automatic on-field recalibration scheme that could be useful for tackling at least the sensor stability issues [9]. Pattern recognition algorithms are often used as automated multivariate analysis tools in gas sensing, but their performance is rarely discussed, especially for regression problems, e.g. quantitative concentration estimation problems [10-11]. In these problems, sensor fusion subsystems are devised for approximating $\Psi$ in:

$$C_j = \Psi(R_{Sens_1}, R_{Sens_2}, \ldots, R_{Sens_n}) \qquad (1)$$

where $C_j$ is the real concentration of the j-th pollutant as measured by conventional stations and $R_{Sens_i}$ is the resistance of the i-th sensor as measured by the multisensor device. In a complex scenario such as urban pollution monitoring, an in-depth analysis of the results obtained by such a system can significantly help the feasibility assessment, guiding the development of appropriate monitoring strategies (device positioning, calibration methodology and long-term performance assessment) as well as the instrumentation design and validation process. In this paper, we study the performance of an on-field calibrated multisensor device, presenting the results obtained for CO, NOx and NO2 concentration estimation. We analyze the results seeking in-depth knowledge of the main performance drivers in the selected scenario, validating the hypotheses by correlation analysis, feature selection and long-term performance assessment.
2. Experimental Setup
A multisensor device, developed by Pirelli Labs (see ref. [12]), was co-located with a conventional air pollution analyzer operated by the regional environmental protection agency (ARPA). The conventional analyzer response was used to provide the true concentration values of the target pollutants at the measurement site. These values were then used as a reference for tuning a multivariate regression system designed to calibrate the multisensor device response. The multisensor device was designed to host five custom metal oxide chemoresistive sensors plus two commercial solid state sensors for relative humidity and temperature. It is built on a very compact design (volume = $9.7 \times 10^{-3}$ m3) and was easily deployed in the operative scenario thanks to its limited weight (less than 2.5 kg). An on-board microcontroller provided the computing platform, controlling analog-to-digital conversion, while an on-board secondary memory device provided local storage capacity for up to 72 hrs of measurement data, with an 8 s sampling period. Hourly averages of the measurements were transmitted via a GPRS modem to PC-class data sinks. A more detailed description of instrumentation and setup can be found in [8]. The conventional fixed station provided reference concentration estimates for CO (mg/m3), non-methane hydrocarbons (NMHC) (µg/m3), C6H6 (µg/m3), NOx (ppb) and NO2 (µg/m3). It was sampled by recording hourly averages of the concentration values. Unfortunately, after only 8 days the NMHC analyzer went out of service. The multisensor device was sampled to provide the hourly average of the resistance of the CO, NOx, O3 and NO2 specific Metal Oxide (MOX) chemiresistors, of an NMHC-targeted MOX sensor whose characteristics are also detailed in [8], and the readings of the commercial temperature and relative humidity sensors.
The measurement campaign took place at a testing site on one of the main streets in the centre of an Italian city, characterized by heavy car traffic. The data acquisition campaign lasted from March 2004 until April 2005, building up a one-year-long data set suitable for the devised application.

3. Experimental Scenario and methods discussion
According to the results of our previous investigation, the testing site was characterized by the presence of significant correlations among the recorded species concentration time series [8]. In order to generate a nonlinear multivariate calibration for the multisensor device, a back-propagation neural network architecture was designed to solve a typical regression problem. The system was trained to estimate the hourly mean of the real pollutant concentrations, as provided by the conventional analyzer, using as input features the hourly mean of the electrical resistance of the multisensor device chemiresistors. In the framework of statistical pattern recognition, full regression problems are characterized by the need to approximate the 'a posteriori' distribution of the target random variables given the values of the measured variables (see [10]). In practice, the typical statistical regression system works by approximating the functional relationships (if any exist) between measurements in the 'feature space' and expected output values, starting from a predefined internal model and using a limited data set (the training set) as examples to tune it. If the training set is representative of the statistical distribution of the real data, we can expect the trained regressor to correctly estimate the sought functional relationships while retaining appropriate generalization properties. Of course, noise on both the measurement and the target variables will affect the overall estimation performance to various extents.
In our pollutant concentration estimation scenario, we can expect reasonably good performance if the sensor pool that composes the feature space includes, for each target pollutant, a sensor unit showing good selectivity and stability. In this case univariate calibration can represent a viable solution. Usually, this is very difficult to obtain because of the intrinsic properties and limitations of solid state sensors. Good performance levels could still be expected even if the target-specific sensor is prone to significant selectivity issues, provided that a properly devised sensor pool is able to describe the influence of interferents on the target-specific sensor response. This can occur, for instance, if the response of a subset of the sensor pool is strongly correlated with the interferent concentrations. In this case, a statistical regressor could extract sufficient knowledge about the interferent concentrations and their effects on the other gas sensor responses to model such influences during the training phase and cancel the induced error. For instance, if CO concentration estimations are affected by the presence of a particular interferent species, say H2S, a sensor fusion subsystem can learn to estimate and subtract its influence on the CO-targeted sensor, provided that H2S concentrations can be measured and a representative set of training samples is available to model it. In our scenario, the usual presence of significant correlation among pollutant concentrations can also help a statistical regressor to extract knowledge about a particular gas concentration even if the related sensor performs poorly or, ultimately, if no specifically targeted sensor is present in the sensor pool. In this case the target gas concentration is estimated starting from the response of sensors towards gases whose concentrations are strongly correlated, not necessarily linearly, with the concentration of the target gas.
This feature can have a strong drawback due to possible changes in the relative concentrations of gases on a seasonal or spatial basis, such as the differential increase of major pollutant concentrations (see for example ref. [3]) and the increase of NOx emissions from domestic heating, both measurable in the Winter season. Such changes could severely hamper the significance of the knowledge derived from the training set. Our previous paper focused on the results of benzene concentration estimation. We showed how, in that case, it was possible to estimate the concentration of a pollutant for which no specific sensor was present in the selected sensor pool, using the information brought in by the NMHC sensor. Indeed, benzene is the most significant part of NMHC-related pollution in the selected scenario. We also found a measurable performance degradation when using a calibration obtained in Spring during Winter time, due to the above-mentioned domestic heating pollution effect. In this work, we show results obtained by calibrating feed-forward neural networks in a regression scheme for NOx, NO2 and CO pollutant concentrations in the same application scenario, so as to check whether the benzene-related findings generalize. The relationship between performance and training length has been investigated in order to compare the results with those obtained for benzene. The influences of feature selection and correlation-based learning have been investigated and evaluated. Using the conventional station output as reference, neural networks have been trained and simulated in the Matlab environment; the hyperbolic tangent has been selected as the hidden neuron transfer function. The Levenberg-Marquardt training algorithm was chosen for the experiments; early stopping and automatic Bayesian regularization (ABR) were used as complexity control algorithms to avoid overtraining issues [13-14]. Training sets were built using consecutive samples, i.e.
using fixed-length specific intervals of campaign data, while the validation set (when used for early stopping) was built by randomly choosing samples from the remaining data. Two approaches have been devised for estimating the uncertainty on performance and for generalization purposes, one of which is based on cross-validation. Automatic Bayesian regularization is a capacity control technique for neural network training; it was introduced in the late 90s, but its usage is still uncommon in the electronic nose community. Training the regressor internal model usually involves the minimization of an error function. In conventional neural network backpropagation training, the error function basically takes into account only the empirical error (i.e. the error on the training set), while model capacity (i.e. the number of hidden neurons) is fixed and solution complexity should be carefully controlled. This training-data-centric procedure can easily lead to data overfitting. Approaches based on regularization theory (see ref. [10] for a detailed description) try to balance the empirical error with a term related to model complexity, their global error function being described by:

$$\beta \sum_{n=1}^{k} \epsilon_n + \alpha \lVert w \rVert^2 \qquad (2)$$

where n is the running training sample index, $\epsilon_n$ the empirical error function computed on the n-th of k total training samples, and $\alpha$, $\beta$ are the regularization parameters. In neural networks, the w term takes into account the network weight values, trying to estimate network model complexity by their norm. Under appropriate conditions, pursuing the minimization of (2) shrinks the superfluous connection weight values that could produce overtraining. This typically also yields a smoother network response, and the effect is similar to that obtainable by pruning superfluous network connections.
In fact, as regards network architecture, by pursuing regularization any modestly oversized network should be able to approximate the regression function while retaining good generalization properties. In order to avoid both underfitting and overtraining issues, the $\alpha/\beta$ value should, of course, be carefully tuned, and this adds free parameters to be optimized. ABR is a regularization-based training methodology that tries to estimate $\alpha$ and $\beta$ in a Bayesian framework (see [10],[15]), starting from initial hypotheses on the statistical distribution of the weights, while simultaneously minimizing (2) by using the Levenberg-Marquardt network training algorithm. In the Bayesian framework, weights are considered as random variables and their posterior probability density can be described as:

$$P(w \mid D, \alpha, \beta, M) = \frac{P(D \mid w, \beta, M)\, P(w \mid \alpha, M)}{P(D \mid \alpha, \beta, M)} \qquad (3)$$

D being the gathered data set and M the particular neural network model. Assuming Gaussian distributions for the prior on network weights and for the training set noise, the optimal weights maximizing (3) also minimize the regularized error function given in (2). In ABR, each network training step adds knowledge on the weight distribution, practically obtaining an approximation, say $w'$, of the weight values $w^{MP}$ that maximize the posterior probability in (3). This knowledge can be used to update the suboptimal $\alpha$ and $\beta$ values by pursuing the maximization of

$$P(\alpha, \beta \mid D, M) = \frac{P(D \mid \alpha, \beta, M)\, P(\alpha, \beta \mid M)}{P(D \mid M)} \qquad (4)$$

In fact, adding the plausible assumption that $P(\alpha, \beta \mid M)$ is uniform, after simple analytical steps, approximations of $\alpha^{MP}$ and $\beta^{MP}$ can be found using (3):

$$\alpha' = \frac{\gamma}{2 \lVert w' \rVert^2}, \qquad \beta' = \frac{k - \gamma}{2 \sum_{n} \epsilon_n(w')} \qquad (5)$$

with $\gamma$ being the so-called effective number of parameters, i.e. a measure of the number of neural network parameters that are effectively in use in the minimization of the error function.
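As an illustration of the re-estimation step in eq. (5), the sketch below performs one update of the regularization parameters in plain Python. It is a minimal sketch under stated assumptions, not the Matlab implementation used in this work: the function name `abr_update` and its arguments are hypothetical, the per-sample empirical errors are passed in as a list, and the trace of the inverse Hessian that enters the effective number of parameters (eq. (6)) is supplied by the caller rather than derived from a Levenberg-Marquardt step.

```python
def abr_update(weights, sample_errors, alpha, trace_h_inv):
    """One re-estimation of the regularization parameters, eqs. (5)-(6).

    weights       -- current network weights w' as a flat list
    sample_errors -- empirical error eps_n(w') for each of the k training samples
    alpha         -- current value of the weight-related parameter
    trace_h_inv   -- trace of the inverse Hessian (assumed to be given)
    """
    n_params = len(weights)
    k = len(sample_errors)
    # Effective number of parameters, eq. (6).
    gamma = n_params - 2.0 * alpha * trace_h_inv
    sum_sq_weights = sum(w * w for w in weights)
    total_error = sum(sample_errors)
    # Updated regularization parameters, eq. (5).
    alpha_new = gamma / (2.0 * sum_sq_weights)
    beta_new = (k - gamma) / (2.0 * total_error)
    return gamma, alpha_new, beta_new


# Toy numbers (hypothetical): 2 weights, 4 training samples, alpha starting at 0.
gamma, alpha_new, beta_new = abr_update([1.0, 2.0], [0.5, 0.5, 1.0, 2.0], 0.0, 3.0)
print(gamma, alpha_new, beta_new)
```

With alpha equal to 0 on the first step, the effective number of parameters coincides with the total number of weights whatever the Hessian; later steps would reuse the updated alpha together with a freshly computed trace.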
The value of $\gamma$ is computed as

$$\gamma = N - 2\alpha\, \mathrm{tr}(H^{-1}) \qquad (6)$$

where N is the total number of network parameters and H is the Hessian matrix of the objective function, whose Gauss-Newton approximation is readily available when using the Levenberg-Marquardt training algorithm. The obtained approximations of $\alpha^{MP}$ and $\beta^{MP}$, in turn, modify eq. (2) and hence the best w values obtainable by a further training iteration. The process is iterated until convergence. Note that, following network weight initialization, $\alpha$ and $\beta$ are initialized at 0 and 1 respectively before taking the first step of the training algorithm.

4. Results and Discussion
Initially, in order to allow the reader to compare the error level for CO, NOx and NO2 with the benzene-related findings reported in our previous work, a preliminary regression experiment was set up. A two-week-long data segment, starting from the first day of measurements and using all the sensor responses, was used as training set. Only in this preliminary experiment, a validation set consisting of 40% of the remaining samples was randomly extracted and reserved for the implementation of the early stopping strategy. The remaining samples were used as test set for performance evaluation. Three different networks were devised, each one targeted to a specific analyte concentration estimation problem, and the Levenberg-Marquardt algorithm was used for training. The number of hidden-layer neurons was empirically fixed to 25 for all three species, while results assessment was repeated 20 times and averaged in order to reduce the influence of local minima and of the initial weight choice. The Mean Relative Error (MRE), computed as

$$MRE = \frac{1}{n} \sum_{i=1}^{n} \frac{|y_i - x_i|}{y_i} \qquad (7)$$

where $y_i$ is the true concentration value at the i-th sample belonging to the test set, $x_i$ the network concentration estimate and n the number of test samples involved, was set as the primary performance estimator both for this and for the following experiments. We found MRE(NO2) = 0.26, MRE(NOx) = 0.42 and MRE(CO) = 0.32.
Although these MRE figures are significantly worse than those computed for benzene with similar settings (MRE = 0.02), even in this case the ratio between the Mean Absolute Error (MAE), computed as

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - x_i| \qquad (8)$$

and the yearly concentration range of the single species was encouraging, settling at 3% for CO, 8% for NO2 and 7% for NOx. For reference, the units of measure of their concentrations were, respectively, mg/m3, µg/m3 and ppb. The most relevant negative impact on relative error was found at the low concentration levels typically reached at night. It is also interesting to note that, for all three species, the performance obtained on the normalized training set in terms of Mean Squared Error (MSE) is typically three orders of magnitude worse than that obtained for benzene in the same setup. This result is explained by a significantly "noisy" behaviour of the feature space variables (sensor resistances) in response to the concentrations of these species, confirming the definitely worse descriptive power, in operative conditions, of the sensor array with regard to these species. It should be underlined that the use of early stopping makes the above results depend on the merged knowledge of the training and validation sets, and not on the training set data alone. However, similar but more rigorous results on the influence of training set length on performance were obtained without the use of any validation set; they are reported in the following paragraphs, which represent the core of this paper and form the basis for the conclusions drawn.

4.1 Optimal training length estimation
As a second investigation, we wanted to test the optimal training duration when using consecutive samples. In the benzene case, a ten-day-long training set proved capable of capturing the expected daily and weekly cycles in the variation of the statistical distribution of the pollutant concentration.
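For concreteness, the two error indexes of eqs. (7) and (8), on which the following comparisons rely, can be computed as in the sketch below. The function names and the sample values are hypothetical and chosen only to show why low reference concentrations weigh so heavily on the relative error.

```python
def mean_relative_error(y_true, y_pred):
    """Eq. (7): mean of |y_i - x_i| / y_i over the test samples."""
    return sum(abs(y - x) / y for y, x in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Eq. (8): mean of |y_i - x_i| over the test samples."""
    return sum(abs(y - x) for y, x in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values: the same 0.1 absolute error at a high and a low level.
reference = [2.0, 0.2]
estimate = [2.1, 0.3]
mae = mean_absolute_error(reference, estimate)
mre = mean_relative_error(reference, estimate)
print(round(mae, 3), round(mre, 3))
```

In this toy case the low-concentration sample contributes ten times as much relative error as the high-concentration one, although both share the same absolute error. A reference value of exactly zero would raise a `ZeroDivisionError`, so such samples would need separate handling.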
Our hypothesis was that the results obtained for benzene could be generalized to the other species as well. For this reason, we set up a training-evaluation procedure using the whole sensor array response and different training set lengths for CO and NO2. Network training was conducted using the Levenberg-Marquardt algorithm with Automatic Bayesian Regularization for complexity control, hence without the use of any validation set. The number of hidden-layer neurons was initially set at 5 on the basis of preliminary results. As a first experiment, we set up training sets of consecutive samples of different lengths, all starting from the first sample of the available data set (March, 1st). For each selected training length, all the remaining samples were used as test set. Each training duration thus produced a different regressor, with different performance, evaluated on a test set of different length. In such a setup it is important to evaluate the uncertainty on the performance estimates in order to understand the significance of the reported differences and possibly generalize the findings. The primary source of uncertainty on performance estimation that should be tackled is the use of test sets of limited and differing duration. It is also well known that the outcome of the training process of a neural network is itself prone to uncertainty due to the initial choice of weights and the related local minima issues. So, for each trained network, characterized by its own (training, test) set pair, the concentration estimation errors on the test set were computed. For each training length the procedure was repeated 20 times in order to reduce the impact of the initial NN weight choice; however, the variability of the performance indicators was negligible, except in the case of CO concentration estimation with a training length of 96 hours, for which test results were unstable.
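The split used in this first experiment (a training set growing from the first sample, with everything after it as test set, and 20 repetitions per setting) can be sketched as follows. This is only an illustrative outline: the function names, the training lengths in the example and the `dummy` scoring callable are hypothetical stand-ins for the actual network training.

```python
def first_segment_splits(n_samples, train_lengths):
    """Return (train, test) index ranges; every training set starts at sample 0."""
    splits = []
    for length in train_lengths:
        if not 0 < length < n_samples:
            raise ValueError("training length must leave at least one test sample")
        splits.append((range(0, length), range(length, n_samples)))
    return splits


def averaged_score(train_and_score, train_idx, test_idx, repeats=20):
    """Average a score over repeated trainings with different initial weights."""
    scores = [train_and_score(train_idx, test_idx, seed) for seed in range(repeats)]
    return sum(scores) / len(scores)


# Hypothetical example: 1000 hourly samples and three training lengths.
splits = first_segment_splits(1000, [96, 240, 360])
# Stand-in for "train a network and return its test error" (assumption).
dummy = lambda train_idx, test_idx, seed: len(test_idx) / 1000.0
results = [averaged_score(dummy, tr, te) for tr, te in splits]
print(results)
```

Note that the test sets shrink as the training sets grow, which is the reason the uncertainty on each performance estimate has to be evaluated before comparing settings.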
As regards uncertainty estimation, the error observations appeared not to be i.i.d. (independent and identically distributed); they were in fact significantly autocorrelated. Their autocorrelation function (ACF) was found to be definitely not impulsive; instead, the absolute error ACF faded only slowly below the white noise confidence intervals for the ACF. For this reason, confidence intervals on the sample mean could not be computed using the central limit theorem (CLT) and the t statistic, because the observations failed to meet the i.i.d. requirement. In order to correctly compute the confidence intervals for the MRE and MAE, a novel algorithm, proposed by Zhang in [16] for the calculation of the uncertainty on the sample mean of autocorrelated measurements, was used. This methodology is based on a weak stationarity hypothesis that allows for an Auto Regressive Moving Average modeling of the measurement process. It is interesting to note that the confidence intervals so computed are notably wider than the classic ones, as one can expect given the lower number of independent observations in an autocorrelated time series. It should also be noted that the absolute error observations show a notable trend in time; in this situation, the author warns that the use of the MAE averaged value itself and the corresponding variance to characterize the regression systems may be misleading. For this reason, although they lead to the same conclusions, we based our comparisons on the computed MRE values as the primary performance indicator. Results for CO and NO2 are reported in tables 1 and 2, respectively. They show that, for that year, no particular benefit could be achieved by extending the duration of a training set starting from the first available sample beyond two weeks, when estimating pollutant concentrations over the remaining samples of that year. A further experiment has been conducted for the sake of comparison, i.e.
for comparing multiple training set durations in a slightly different setup. This time, we used a constant test set duration and a fixed test set location in the data set time series. All the regressors were tested on the same test set, i.e. the final six months of the campaign. Training set start time, training procedures and regressor architecture were the same as in the previous experiment. Although this setup may rightly be perceived as penalizing short training sets, whose samples are all located well before the start of the test set, the results offer a useful comparison with those presented above. The results, presented in table 3, substantially confirm what was already found. The localization of training and test sets at particular times in the above setups may cast doubts on the possibility of generalizing the findings. Of course, the same applies to the training set choice process. For this reason, in order to compare and generalize the performance figures resulting from different training set lengths, we decided to apply a particular cross-validation approach that could take into account training (and test) samples located at different times. We set up the following cross-validation procedure: over the entire campaign year, considering a fixed test set length h corresponding to 6 months, for each selected training set duration i, k mutually exclusive, consecutive training sets $Tr_m^i$, m being the training set index, have been considered, with

$$k = \frac{TotalDBLength - h}{i} \qquad (9)$$

For each training set $Tr_m^i$, made of i consecutive samples, the h immediately following consecutive samples have been considered as its specific test set $Te_m^h$. For each ($Tr_m^i$, $Te_m^h$) pair, the neural network has been trained with ABR 20 times. MRE, Relative Error Standard Deviation (STD_RE), MAE, Absolute Error Standard Deviation (STD_MAE) and Squared Correlation Coefficient (SCC), computed over each $Te_m^h$, were used as performance indexes.
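The construction of the (training, test) pairs in this cross-validation scheme can be sketched as below. The function name and the toy sizes are hypothetical, and rounding eq. (9) down to an integer is an assumption made here so that every test window fits inside the data set.

```python
def consecutive_cv_splits(total_len, train_len, test_len):
    """Build k consecutive, mutually exclusive training windows of train_len
    samples, each followed by its own test window of test_len samples."""
    if train_len <= 0 or test_len <= 0:
        raise ValueError("lengths must be positive")
    # Eq. (9), rounded down to an integer (assumption).
    k = (total_len - test_len) // train_len
    splits = []
    for m in range(k):
        train_start = m * train_len
        test_start = train_start + train_len
        splits.append((range(train_start, test_start),
                       range(test_start, test_start + test_len)))
    return splits


# Toy sizes (hypothetical): 100 samples, training length i = 10, test length h = 40.
splits = consecutive_cv_splits(total_len=100, train_len=10, test_len=40)
for train_idx, test_idx in splits:
    print(train_idx, test_idx)
```

Training windows never overlap each other, while the test windows of neighbouring pairs do overlap, which is consistent with each test set being defined as the h samples immediately following its training set.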
Even in this case, and for each pair, the variability of results due to NN training was found to be negligible, confirming the positive impact of ABR. Then, for each i, the k results coming from the different (training set, test set) pairs have been considered as i.i.d. observations of the performance figures; they were finally averaged and their relative uncertainty computed, so as to correctly compare results coming from the regressors obtained with different values of i. In this case, we are evaluating the estimation capabilities of various regressors, each trained with a training set of different length and tested on a fixed number of test samples immediately following the training set. The results obtained for CO and NO2 are shown in tables 4 and 5, respectively. Fig. 1 depicts the MAE behaviour, together with confidence intervals, as the training length changes for the CO concentration estimation problem. Single performance figures were significantly worse than those found for the benzene case, for which the best MRE was 0.02. However, even for these two species, a relatively short training set seems able to produce results that are very near to those obtainable with far longer data sets, while being significantly different from those obtainable at the shortest training set lengths. Using longer training sets, a slow positive trend is observable, but uncertainty grows rapidly beyond a length of 700 samples, after which we are no longer able to reliably distinguish among average error performances. In fact, considering the performance figure observations as normally distributed (Kolmogorov-Smirnov test), a two-sample right-tailed Welch test (footnote 1), testing against an H0 hypothesis of no significant difference in the mean, conducted on the MAE observations for CO with test set lengths of 96 and 360 samples, resulted in p=0.02, thus leading to the rejection of H0.
Conversely, the null hypothesis cannot be rejected for any setting using the 360-sample test set together with the results related to longer training sets. Similar behaviour is found for the MRE estimates, e.g. using test lengths equal to 72 and 360 resulted in p=0.04. Similar tests have also been conducted for NO2, resulting in a substantial confirmation. Overall, the results confirm the feasibility of the on-field approach for the calibration of multisensor devices in the city air pollution scenario, showing how a limited training campaign can be used for computing an optimal calibration for several of the following months. The absolute performance level, however, depends on the descriptive power of the selected sensor array for the specific pollutant concentration estimation problem.

1 Behrens-Fisher-like settings, using the Satterthwaite approximation for pooled variance at alpha=0.05.

4.2 Feature Selection
A further investigation involved testing the influence of a feature selection procedure on the performance obtainable in our scenario. Feature selection can broaden the experimentalist's knowledge about the information contributed by each single sensor to the concentration estimation problem for each species. This is particularly true if the selection is conducted using a brute-force approach, i.e. exploring all available feature set combinations and their performance, though this can be computationally very expensive and hence is not always a viable choice. In our scenario, we have analysed the performance obtainable, in terms of MRE and MAE, by using different subsets of the original sensor array for estimating NO2, CO and NOx with separate regressors. The training set length was fixed to 10 days, starting from the first campaign day, in accordance with the previous experiments; the remaining samples have been used as test set. Confidence intervals have been computed using the Zhang method for estimating the uncertainty of sample means of autocorrelated data series.
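The brute-force exploration mentioned above amounts to scoring every non-empty subset of the seven sensor lines. The following is a minimal sketch, not the procedure actually run for the tables: the sensor labels, `all_feature_subsets`, `best_subset` and the toy `score` function are hypothetical, and in practice the score would be an error index such as the MRE of a regressor trained on that subset.

```python
from itertools import combinations

# Labels for the five gas sensors plus temperature and relative humidity.
SENSORS = ("CO", "NMHC", "NOx", "NO2", "O3", "T", "RH")


def all_feature_subsets(features):
    """Yield every non-empty subset of the given features, smallest first."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)


def best_subset(features, score):
    """Return the subset with the lowest score (lower is better)."""
    return min(all_feature_subsets(features), key=score)


# Toy score (assumption): pretend that the (CO, NMHC) pair gives the lowest error.
score = lambda subset: 0.0 if set(subset) == {"CO", "NMHC"} else 1.0

n_subsets = sum(1 for _ in all_feature_subsets(SENSORS))
winner = best_subset(SENSORS, score)
print(n_subsets, winner)
```

Seven features already give 127 candidate subsets, each requiring its own repeated training runs, which illustrates why the exhaustive approach quickly becomes computationally expensive.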
The Wilcoxon signed rank test (a nonparametric paired-samples test with no normality assumption) has been used for evaluating the significance of the observed performance differences at alpha=0.05. In order to estimate the performance of a real operative system, feature selection validation and overall performance testing should be performed on different sets, but our goal, as mentioned above, was different, i.e. investigating whether feature selection has a significant effect in this particular scenario. Interestingly, for NO2, the best performance values have been obtained using all the sensor responses; e.g. using NOx and NO2 sensor responses, CO and NMHC sensors, or NOx, NO2 and CO led to MREs of 0.37, 0.29 and 0.30 respectively, while using the entire gas sensor array led to an MRE of 0.22 (see Table 6 and Fig. 2). Performance levels that are not significantly different can be obtained by coupling the NO2 sensor response with both the CO and NMHC sensor responses. This result confirms the possibility of achieving suitable performance, even when the specific sensors perform poorly overall, by coupling their responses with information coming from sensors whose response is directed towards species showing high correlation with the species under analysis (see [8]). Table 6 confirms the influence of temperature and humidity on the on-field calibration of the specific sensor responses, in this case where NOx and NO2 sensors are concerned. The temperature and humidity sensor responses led to a significant performance increase when coupled with the response of the single NOx sensor, the single NO2 sensor, the NOx+NO2 sensors, and the NOx+NO2+O3 sensors. No improvement has been recorded when coupling the RH and T sensor responses to feature vectors containing CO, NMHC and NO2. Results obtained during the selection of the sensor set for CO, which are reported in table 7, revealed that a nonlinear function of the NMHC sensor response is able to follow the CO concentration significantly better than one based on the CO-targeted sensor response.
This is probably due, again, to the high correlation existing between CO and Benzene concentrations in the selected scenario (r=0.93, SCC=0.86) and to the very good performance of the NMHC targeted sensor. The best performance is achieved by selecting a subset of the sensor array that uses both CO and NMHC sensor responses, while adding the NO2 and NOx sensor responses led to worse performance values. Fig. 3 depicts the differences between the concentration estimate provided by the model and the true concentration as reported by the conventional analyzer, while Fig. 4 reports the MAE results of this particular feature vector setup versus the choice of training set length.

Performance assessment was conducted by using ABR according to the procedure explained in the previous section, with a significant reduction of the time needed compared with conventional procedures. As a control check, we report the results of a conventional neural network hyperparameter optimization procedure for the CO, NMHC feature vector. Complexity control was achieved by choosing the target empirical error in the [10e-3, 2x10e-3, 4x10e-3] set while the number of hidden neurons varied in the [5, 10, 15, 20] set. Early stopping was not selected because it would have hampered the comparison of results by modifying the test set duration. For each combination of empirical error and number of hidden neurons, results had to be averaged over 20 different training runs. The best results were obtained for 10 neurons and a 2x10e-3 target empirical error and were very similar (MRE=0.27, MAE=0.36) to those obtained with ABR using the same feature vector. The same procedure was applied to the entire array and again led to values (MRE=0.32, MAE=0.47) similar to those obtained by ABR.

Summarizing, the major difference with respect to Benzene concentration evaluation is primarily due to the worse performance of the CO, NOx and NO2 sensors in estimating their target gases.
However, the presence of other sensors significantly contributes to mitigating this issue as a result of correlation effects. These results warn against the unselective use of all the sensor lines when training a sensor fusion subsystem in this scenario. They also strongly suggest the use of different sensor fusion subsystems, each one trained for the prediction of a single pollutant. In this case, feature selection algorithms could help to select the optimal composition of the array to be used for each pollutant concentration prediction problem. In tables 8 and 9 we report the results of the performance evaluation versus training set duration for CO with the best feature vector composition, using, respectively, the single test set and the crossvalidated approach described in section 4.1. Results confirm the optimality of a two-week length choice regardless of feature vector composition while, of course, performance is generally better.

4.3 Performance estimation over time

Finally, starting from a ten-day training set, we have investigated the MAE performance index for CO in a week-by-week fashion over 55 weeks so as to check its evolution over time. In addition, by using several feature vector compositions, we investigated the influence of different sensor responses on the concentration estimation over time. As we found for Benzene, the short-term performance indexes computed for this species degrade over time; specifically, significant performance hits are detectable 6 months after the end of the calibration set (see Fig. 5). Experimentally, we found that, with respect to the use of all available sensor responses, using only the CO sensor response causes a small performance degradation in the Summer time and a significant performance boost in the Winter time. In particular, we believe that this is explained by the marked changes in the concentration distribution of NOx and NO2 during Winter time.
These changes disrupt the relationships learnt by the regression system from the Spring time training set. Instead, the combined use of CO and NMHC sensor responses has a positive effect throughout the test set. Summarizing, the descriptive power of the NOx and NO2 sensors and the correlation of their target gases with CO weigh less in the overall performance than the misleading effects brought in by the changes in their target distributions in the Winter time (see Fig. 6 a, b and c). As regards the NO2 week-by-week estimation error (see Fig. 7 and 8), we see, as mentioned above, that the all-sensor approach obtains the best performance by exploiting the positive effects of correlation; even here, however, a performance worsening in Winter time is evident.

The combined findings reported in [8] and in the present paper lead us to consider the use of field data feasible for obtaining a suitable calibration for multisensor devices in the city air pollution monitoring framework. These calibrations, however, suffer from long-term performance degradation and are sensitive to changes in the relative concentration distribution that appear on a seasonal basis or as a result of particular events. This is far more evident when, due to poor specific sensor performance, the best statistical regressor has to rely on scenario-specific species correlation factors. It remains to be proven whether a site-specific calibration can be used for concentration estimation at nearby sites. In this case, we believe that performance will ultimately depend on the site dependence of the species correlation patterns, which should be investigated.

5. Conclusions and further works

In this work we have further investigated the feasibility of the on-field calibration approach for a multisensor device in the air pollution monitoring scenario. The results obtained for typical city air pollutants, i.e.
CO, NO2, NOx and Benzene, have been reported and discussed. Overall, the results confirm that a relatively short training set, about two weeks long, can cope with the cyclic behaviour of the concentration distributions, leading to optimal performance while being sufficiently insensitive to outliers. This is very encouraging for the possibility of using multisensor devices, together with mobile conventional analyzers for the purpose of on-field calibration, to densify the sparse pollution monitoring network, especially in historical city centres. Feature selection results definitely suggest the use of multiple regressors, each one specifically designed and trained for the concentration estimation of a particular species. In this way the designer is free to select the best feature set for the specific regression problem, obtaining optimal results. Feature selection analysis has furthermore confirmed the influence of scenario-related correlations among species concentrations, and of the performance of the specific sensors, on the overall estimation scores. When specific sensors fail to correctly follow the concentration of their target species, the existence of a strong correlation between this concentration and that of another species whose targeted sensor performs well may help the array to recover, ultimately leading to significantly better performance levels. However, in this case, changes in the relative statistical distribution of the species, such as those due to the Winter time house heating effect, may have a negative impact on the long-term performance of a neural calibration. The same negative impact could probably be observed when a calibration obtained for a specific site, showing a site-specific multivariate concentration distribution, is used to operate the multisensor device at a significantly distant site.

Acknowledgments. This work has been partially funded by Pirelli Labs. The authors wish to thank Prof.
Ciro D’Elia of the University of Cassino for helpful discussions on autocorrelated time series and cyclostationary processes.

References
[1] N.A. Mazzeo, L.E. Venegas, Evaluation of turbulence from traffic using experimental data obtained in a street canyon, Int. J. Environ. Pollut. 25 (2005) 164-176.
[2] S. Vardoulakis, B.E.A. Fisher, K. Pericleous, N. Gonzalez-Flesca, Modelling air quality in street canyons: a review, Atm. Environ. 37 (2003) 155-182.
[3] K.A. Kourtidis, I. Ziomas, C. Zerefos, E. Kosmidis, P. Symeonidis, E. Christophilopoulos, S. Karathanassis, A. Mploutsos, Benzene, toluene, ozone, NO2 and SO2 measurements in an urban street canyon in Thessaloniki, Greece, Atm. Environ. 36 (2002) 5355-5364.
[4] B. Croxford, A. Penn, B. Hiller, Spatial distribution of urban pollution: civilizing urban traffic, 5th Symposium on Highway and urban pollution, Copenhagen, 1995.
[5] G. Martinelli, M.C. Carotta, G. Ghiotti, E. Traversa, Thick film gas sensors based on nano-sized semiconducting oxide powders, MRS Bull. 24 (1999) 30-36.
[6] M. Kamionka, P. Breuil, C. Pijolat, Calibration of a multivariate gas sensing device for atmospheric pollution measurement, Sens. Actuators B Chem. 18 (2006) 323-327.
[7] M.C. Carotta, G. Martinelli, L. Crema, C. Malagu, M. Merli, G. Ghiotti, E. Traversa, Nanostructured thick-film gas sensors for atmospheric pollutant monitoring: quantitative analysis on field tests, Sens. Actuators B Chem. 76 (2001) 336-342.
[8] S. De Vito, E. Massera, M. Piga, L. Martinotto, G. Di Francia, On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario, Sens. Actuators B Chem. 129 (2008) 750-757.
[9] W. Tsujita, A. Yoshino, H. Ishida, T. Moriizumi, Gas sensor network for air-pollution monitoring, Sens. Actuators B Chem. 110 (2005) 304-311.
[10] C.M. Bishop, Pattern Recognition and Machine Learning, Springer Science, 2006, ISBN 0-387-31073-8.
[11] M. Pardo, G.
Sberveglieri, Remarks on the use of multilayer perceptrons for the analysis of chemical sensor array data, IEEE Sens. J. 4 (2004) 355-363.
[12] http://www.pirellilabs.com.
[13] M.T. Hagan, M.B. Menhaj, Training feedforward networks with the Marquardt algorithm, IEEE Trans. Neural Networks 5 (1994) 989-993.
[14] F.D. Foresee, M.T. Hagan, Gauss-Newton approximation to Bayesian regularization, Proceedings of the 1997 International Joint Conference on Neural Networks, 1997, pp. 1930-1935.
[15] D.J.C. MacKay, Bayesian interpolation, Neural Computation 4 (1992) 415-447.
[16] N.F. Zhang, Calculation of the uncertainty of the mean of autocorrelated measurements, Metrologia 43 (2006) 276-281.

Biographies

Saverio De Vito received his degree in Informatics Engineering from the University of Naples “Federico II” in 1998. During 1998 and 1999 he was a research fellow at the Artificial Vision and Intelligent Systems Laboratory of the same university, working on breast cancer computer aided diagnosis. From 1999 to 2004 he was with a software house as an R&D technical manager in the framework of satellite based telemedicine, earth observation and distance learning projects. In June 2004 he joined ENEA as a researcher. His research interests include statistical pattern recognition, electronic noses, wireless sensor networks and computer aided diagnosis. Since 2005 he has been a contract professor of Applied Informatics at the University of Cassino.

Ettore Massera received his degree in Physics from the “Federico II” University of Naples in May 1997. He has been working at the ENEA research center in Portici (NA) since June 2003. At present he is in charge of research activity on gas sensor devices based on nano-structured materials. Previously he worked on the study of thermal and optical properties of porous silicon at the Physics Department of the University of Naples.

Girolamo Di Francia received his degree in physics from the University of Naples “Federico II”.
In 1985 he started his research activity in the field of fabrication and characterization of semiconductor solar cells (c-Si, GaAs), first at the Ansaldo company in Genova and then at the ENEA research center of Rome, where he was appointed full time researcher in 1988. In 1991 he joined the ENEA research center of Naples where, starting from 1992, he investigated porous silicon based devices. In 1996 he established there the Gas Sensor Laboratory, mainly devoted to the fabrication and characterization of devices based on nanomaterials and on polymer nanocomposites.

Table 1: CO concentration estimation performance of the neural regression scheme computed over different training set lengths. All sensor responses have been used for feature vector composition, 5 neurons have been employed in the hidden layer, and ABR has been used for complexity control. Small performance enhancements are obtained by using more than ten days of data recording. MAE figures are expressed in mg/m3.

Hrs               24     96     240    600    1200   2400
MRE               0.49   0.78   0.34   0.31   0.34   0.29
STD_RE            1.08   1.37   0.86   0.85   0.98   0.74
MAE (mg/m3)       0.58   1.56   0.53   0.44   0.41   0.46
STD_MAE (mg/m3)   0.43   1.19   0.63   0.46   0.42   0.49
SCC               0.77   0.14   0.79   0.81   0.85   0.84

Table 2: NO2 concentration estimation performance of the neural regression scheme computed over different training set lengths. All sensor responses have been used for feature vector composition, 5 neurons have been employed in the hidden layer, and ABR has been used for complexity control. For NO2 a small performance worsening is observed when extending the ten-day training set. MAE figures are expressed in μg/m3.

Hrs               24     96     240    600    1200   2400
MRE               0.61   0.31   0.22   0.23   0.27   0.27
STD_RE            1.04   0.68   0.62   0.66   0.65   0.68
MAE (μg/m3)       49.4   30.2   23.7   27.1   26.5   28.2
STD_MAE (μg/m3)   35.0   25.4   21.0   25.9   23.1   23.2
SCC               0.04   0.36   0.66   0.62   0.58   0.54

Table 3: Performance evaluation for different feature vector compositions in the NO2 concentration estimation problem.
A ten-day training set has been used.

Feature Set columns: CO, NMHC, NOx, NO2, O3, T, RH [per-row selection marks not recoverable]

Row   MRE    MAE (μg/m3)
1     0.30   40
2     0.29   40
3     0.43   44
4     0.50   54
5     0.29   28
6     0.29   26
7     0.30   31
8     0.37   34
9     0.29   29
10    0.30   32
11    0.31   28
12    0.34   38
13    0.31   26
14    0.26   23
15    0.27   24
16    0.22   19
17    0.24   20
18    0.28   27
19    0.24   20
20    0.24   20

Table 4: Performance evaluation for different feature vector compositions in the CO concentration estimation problem. A ten-day training set has been used.

Feature Set columns: CO, NMHC, NOx, NO2, O3, T, RH [per-row selection marks not recoverable]

Row   MRE    MAE (mg/m3)
1     0.41   0.72
2     0.27   0.35
3     0.27   0.35
4     0.35   0.48
5     0.35   0.55
6     0.35   0.55

Table 5: CO concentration estimation performance of the neural regression scheme computed over different training set lengths by using the CO and NMHC solid state sensor responses as feature vector. 5 neurons have been employed in the hidden layer, and ABR has been used for complexity control. No performance enhancement is obtained by using more than ten days of data recording.
Hrs    MRE      STD_RE   MAE (mg/m3)   STD_MAE (mg/m3)   SCC
12     0.2542   0.6116   0.3882        0.3882            0.8736
24     0.3010   0.8119   0.3769        0.3935            0.8718
48     0.4146   1.0000   0.4443        0.3734            0.8707
72     0.4122   0.9936   0.4471        0.3746            0.8716
96     0.3991   0.9642   0.4508        0.3839            0.8713
168    0.2626   0.7360   0.3594        0.3887            0.8795
240    0.2664   0.7503   0.3518        0.3798            0.8794
360    0.3154   0.8258   0.3908        0.3766            0.8635
480    0.3202   0.8497   0.3841        0.3716            0.8686
540    0.3220   0.8490   0.3872        0.3726            0.8670
600    0.3047   0.8235   0.3718        0.3690            0.8712
700    0.2909   0.8062   0.3591        0.3654            0.8758
800    0.2916   0.8094   0.3582        0.3647            0.8770
900    0.2801   0.7879   0.3539        0.3689            0.8773
1200   0.2660   0.7590   0.3486        0.3735            0.8767
1600   0.2503   0.7017   0.3484        0.3794            0.8793
1800   0.2519   0.7073   0.3503        0.3832            0.8773
2000   0.2502   0.7023   0.3496        0.3885            0.8756
2200   0.2484   0.6935   0.3520        0.3930            0.8750
2400   0.2460   0.6873   0.3539        0.3985            0.8737

Figure 1: CO concentration estimation MAE, expressed in mg/m3, versus training set length measured in samples (hours) with related confidence intervals. All sensor responses have been used as feature vector (crossvalidation setting, see Table 4 for details).

Figure 2: Hourly concentration estimation of NO2, expressed in μg/m3, over a one-week period. The blue dashed line represents the true concentration value as reported by the conventional analyzer.

Figure 3: Hourly concentration estimation of CO, expressed in mg/m3, over a one-week period. The blue dashed line represents the true concentration value as reported by the conventional analyzer.

Figure 4: CO concentration estimation MAE versus training set length measured in samples (hours) with related confidence intervals; only the CO and NMHC sensor responses have been used for feature vector composition (crossvalidation settings, see table 9 for details).

Figure 5: Qualitative behaviour of the weekly mean absolute error (mg/m3) in the CO concentration estimation problem.
The blue line depicts the MAE obtained by using all the sensor responses for the feature vector composition, while the red line depicts the MAE obtained by using only the CO and NMHC sensor responses. While there is a small advantage for the all-sensor approach during the summer time, at the start of the winter time it shows a significantly higher error; this is very likely due to changes in the relative distribution of NO2, NOx and CO concentrations.

Figure 6: Non-parametric estimates of the probability density function (PDF) in summer time (blue, dash-dotted) and in winter time (red, solid) for CO (a), NOx (b) and NO2 (c) concentrations. Significant distribution changes are found for NOx and NO2, leading to changes in concentration ratios with respect to CO. These modifications are identified as the main driver of performance degradation in winter time when the NOx and NO2 sensor responses are used for CO estimation.

Figure 7: Weekly mean absolute error in the NO2 concentration estimation problem. The blue line (dots) depicts the MAE obtained by using all the sensor responses for the feature vector composition, while the green line (squares) depicts the MAE obtained by using the NO2, NOx, O3, T and RH sensor responses. The black line (circles) depicts the estimation based on the NO2, T and RH sensors. The all-sensor approach retains a significant advantage over most of the test period and obtains the best overall performance scores.

Figure 8: Weekly mean relative error in the NO2 concentration estimation problem. The blue line (dots) depicts the MRE obtained by using all the sensor responses for the feature vector composition, while the green line (squares) depicts the MRE obtained by using the NO2, NOx, O3, T and RH sensor responses. The black line (circles) depicts the estimation based on the NO2, T and RH sensors.