1. Introduction
Atmospheric methane (CH₄) plays a significant role in global warming. Methane is released into the atmosphere through a combination of natural sources (e.g., peatlands), human activities (e.g., agriculture, pipeline leaks) and the release of trapped stores due to rising global temperatures (e.g., permafrost melt) [1].
A variety of approaches exist for monitoring methane emissions. Beginning with the greatest coverage, satellites can provide global methane concentration data but are limited in either temporal or spatial resolution [2]. Existing satellites are also limited in the terrains over which methane concentration measurements can be taken; data cannot be provided for sea, snow or marshland. Higher-resolution surveys may be carried out by aircraft or ground teams, but these methods require additional labour for each new dataset collected.
Autonomous local sensors, such as weather stations, address these issues by continuously collecting data at high frequencies, in some cases every few seconds. However, many weather stations are permanent standalone installations based in cities and, as such, have limited spatial coverage. To compensate for this, wireless sensor networks and “Internet of Things” (IoT) devices are increasingly used to augment datasets with additional data collection sites, especially in the field of air pollution monitoring [3,4,5,6,7,8]. IoT devices are typically powered by batteries or small solar panels; therefore, minimising power consumption is crucial [9]. Much of the value of wireless sensor networks lies in their coverage and scalability, so enabling large fleet sizes by minimising the individual device cost is also a key consideration [5]. As such, the sensors included in these devices should accommodate these requirements.
Many categories of methane sensors have been developed: optical, capacitance-based, calorimetric, resonant, acoustic-based, pyroelectric, metal oxide semiconductor (MOS), and electrochemical [1,10,11]. Of these types, MOS sensors show particular potential for compact, low-power and low-cost applications, but additional steps must be taken to improve their performance in outdoor settings [10,12,13,14,15,16].
Machine learning techniques are often employed to improve the usability of gas sensor data by addressing either the selectivity of the sensor or the calibration accuracy [11,13,17]. Classification algorithms, such as support vector machines and neural networks, have been shown to improve the ability of both individual sensors and sensor arrays to identify specific target gases or gas mixtures [18,19,20,21,22,23]. This approach holds promise for a variety of emerging applications, ranging from disease diagnosis from the gas composition of human breath [22] to identifying specific sources of air pollution in urban environments. Regression machine learning can be used to calibrate the output of gas sensors in varying environmental conditions (i.e., target gas concentration, temperature and air humidity) and to offset long-term sensor drift [24].
The Figaro NGM2611-E13 (Figaro, Rolling Meadows, IL, USA) is a low-cost methane detection module based around the TGS2611-E00 MOS sensor [25]. The manufacturer provides sensitivity characteristics for methane concentrations above 300 ppm [26], but lower concentrations are typical in outdoor settings.
Several authors have investigated methods for calibrating this sensor at lower methane concentrations [14,15,16]. Results are consistently encouraging, with strong correlation between calibrated sensor output and true methane concentration achieved in all of these studies.
Van den Bossche et al. [14] calibrated a TGS2611-E00 methane sensor across 15–30 °C, 40–80% relative humidity and 2–9 ppm methane. Methane concentration was recorded using a Picarro G2301 Cavity Ringdown Gas Analyzer (Picarro, Santa Clara, CA, USA). For this limited methane concentration range, a linear fit was assumed for the sensor calibration, with temperature and humidity compensation applied separately. Within this range, a systematic error of −1.0 ppm and a variable error of ±1.7 ppm in estimated methane concentration were achieved. It should be noted that a linear fit cannot be assumed for wider ranges of methane concentration; an exponential relationship is visible in the 300–10,000 ppm range published by the manufacturer [26].
Bastviken et al. [16] investigated a calibration approach for the NGM2611-E13 using estimated background methane concentration in a chamber, followed by the injection of methane up to 719 ppm. They tested 15 model equations with the collected chamber data, achieving strong correlation between the model output and true methane concentration (R² = 0.99–1.00) and low error (RMSE = 9.8–20) over the full tested concentration range up to 719 ppm.
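For reference, the two agreement metrics quoted throughout this paper (RMSE and R²) can be computed as in the minimal sketch below. The function names and sample values are illustrative assumptions and are not data from any of the cited studies. R² is taken here as the coefficient of determination; some studies instead report the squared Pearson correlation, which can give a different value.

```python
import math

def rmse(true_ppm, predicted_ppm):
    """Root-mean-square error, in the same units as the inputs (ppm here)."""
    squared_errors = [(p - t) ** 2 for t, p in zip(true_ppm, predicted_ppm)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def r_squared(true_ppm, predicted_ppm):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(true_ppm) / len(true_ppm)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_ppm, predicted_ppm))
    ss_tot = sum((t - mean_true) ** 2 for t in true_ppm)
    return 1.0 - ss_res / ss_tot

# Hypothetical reference and model values, for illustration only.
true_ppm = [2.0, 50.0, 100.0, 150.0, 200.0]
predicted_ppm = [4.0, 48.0, 103.0, 149.0, 196.0]

if __name__ == "__main__":
    print(f"RMSE = {rmse(true_ppm, predicted_ppm):.2f} ppm")
    print(f"R2   = {r_squared(true_ppm, predicted_ppm):.4f}")
```

Because RMSE carries the units of the measured quantity, values are only comparable between studies that cover similar concentration ranges.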
These existing studies share two main limitations: (1) expensive reference instruments are used to measure methane concentration during sensor calibration, and (2) potential interactions between temperature, humidity and methane concentration are not addressed.
Collier-Oxandale et al. [15] calibrated the Figaro TGS 2600 by co-deploying sensors with reference-grade instruments in field deployments, during which methane concentrations remained below 6 ppm. Variable correlations were achieved between the sensors and reference measurements (R² = 0.625–0.812) for the best-performing model. Terms representing the interaction between temperature and methane concentration were considered, but all of the models which contained such a term also included a time-based term. As recurring diurnal emission cycles are not universal and will vary by site, time-based predictor variables are not applicable to pre-deployment sensor calibration.
This study presents an alternative calibration approach using 200 ppm methane-in-air calibration gas and machine learning models. The calibration conditions span 5–35 °C and 40–85% relative humidity. A range of nonlinear models derived from the sensor response to varying methane concentration, temperature and humidity were formulated and tested, including models with interaction terms. A calibration validation method using 200 ppm methane in air is also presented.
4. Discussion
This study aimed to investigate the practicality of using machine learning to calibrate low-cost methane sensors at lower methane concentrations than required for their typical applications.
The greatly improved performance of model Equation (11) (RMSE = 19.2, R² = 0.962) compared to Equations (5) and (6) (RMSE = 55 and 35.6, R² = 0.684 and 0.868, respectively) shows that both temperature and relative humidity need to be accounted for when calibrating the NGM2611-E13 sensor at methane concentrations below 200 ppm. The strong correlation of Equation (11) with true methane concentration also validates the approach of using an equation of this form to model the sensor readings.
Additional interactions between temperature, relative humidity and methane concentration are expected to affect the output voltage of the methane sensor. More complex machine learning models containing additional terms can capture these interactions and reduce the error in the model. However, increased complexity carries a greater risk of overfitting, which can render a model useless for making predictions from new data. The risk of overfitting can be reduced by expanding the training data or using more intensive model validation, both of which increase the computing load when training the model. Therefore, for any machine learning model, a balance exists between model robustness and detail.
In this study, increased model complexity broadly correlated with improved model performance, i.e., lower RMSE and higher R². However, diminishing returns from increasing model complexity are also clearly shown by the results in Table 4. The best-performing models, Equations (16), (18), (20) and (21), showed very similar performance (RMSE = 4.5–5.1 ppm, R² = 0.997–0.998). The model with the next lowest RMSE was Equation (12) with RMSE = 14.2 ppm, which is almost three times that of the top four models. For model Equations (15)–(20), including an offset term reduced the RMSE of each model by around 10 ppm. The additional improvement of Equations (16), (18), (20) and (21) over Equation (12) can be attributed to the inclusion of a temperature and sensor output voltage product term or a relative humidity and sensor output voltage product term. It is unsurprising that the effect of including either one of these terms, or both at the same time, is similar because relative humidity is roughly inversely proportional to air temperature in a sealed volume. Therefore, either term will effectively accommodate the same effect of the environmental conditions in the chamber.
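The coupling between relative humidity and temperature in a sealed chamber can be illustrated with a short calculation. The sketch below is not part of the study's method. It assumes a rigid sealed volume containing a fixed amount of water vapour that behaves as an ideal gas, and it uses the Magnus approximation for saturation vapour pressure with commonly quoted coefficients (6.112 hPa, 17.62 and 243.12 °C). The function names are hypothetical.

```python
import math

def saturation_vapour_pressure_hpa(temp_c):
    """Magnus approximation for saturation vapour pressure over water (hPa)."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))

def rh_after_temperature_change(rh_start, temp_start_c, temp_end_c):
    """Relative humidity (%) in a rigid sealed volume after a temperature change.

    Assumes a fixed amount of water vapour (ideal gas), so its partial
    pressure scales with absolute temperature. The result is capped at
    100%, beyond which condensation would occur.
    """
    vapour_start = rh_start / 100.0 * saturation_vapour_pressure_hpa(temp_start_c)
    vapour_end = vapour_start * (temp_end_c + 273.15) / (temp_start_c + 273.15)
    rh_end = 100.0 * vapour_end / saturation_vapour_pressure_hpa(temp_end_c)
    return min(100.0, rh_end)

if __name__ == "__main__":
    # Start from an assumed 60% relative humidity at 20 °C.
    for temp_end_c in (15.0, 20.0, 25.0, 30.0):
        rh = rh_after_temperature_change(60.0, 20.0, temp_end_c)
        print(f"{temp_end_c:4.1f} degC -> RH = {rh:5.1f}%")
```

Under these assumptions, relative humidity falls monotonically as the chamber warms. This is consistent with the two product terms carrying largely overlapping information.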
Despite containing almost twice as many terms, model Equation (21) (RMSE = 4.5 ppm, R² = 0.998) barely outperforms Equation (16) (RMSE = 5.09 ppm, R² = 0.997). As shown by Equation (22), a model of this complexity is also more vulnerable to overfitting (Figure A5).
The methane decay experiments highlight the importance of verifying machine-learning models beyond simply assessing the nominal model performance. It would not be possible to identify the overfitting of Equation (22) using the training results alone, and applying this model to field data would severely misrepresent the true methane concentration.
Of the tested model equations, Equation (16) (RMSE = 5.09 ppm, R² = 0.997, complexity = 12) offered the best compromise between performance and complexity.
4.1. Comparison to Related Studies
Compared with a similar study using the Figaro NGM2611-E13 by Bastviken et al. [16], the study presented here covered lower temperatures (5–30 °C vs. 10–42 °C) and higher relative humidities (40–85% vs. 18–70%). This study trained models on over 50,000 data points, whereas Bastviken et al. used an average of 619–930 data points per sensor. The models tested by Bastviken et al. included temperature and humidity compensation but did not test interactions between predictor variables. In this study, several models which included interaction terms outperformed those which did not, highlighting the importance of considering them in future models.
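The benefit of an interaction term can be reproduced on synthetic data. The sketch below is a minimal illustration and does not reproduce the model equations of this study. It assumes a hypothetical response in which methane concentration depends on sensor voltage V, temperature T and their product V·T. It then compares an additive model with one that includes the product term, using ordinary least squares. All coefficients, ranges and function names are invented for illustration.

```python
import math

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_least_squares(features, targets):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(features, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def model_rmse(features, targets, beta):
    """RMSE of the linear model defined by beta on the given data."""
    preds = [sum(b * f for b, f in zip(beta, row)) for row in features]
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets))

# Synthetic, noise-free data on a grid of voltages (V) and temperatures (degC).
samples = [(0.5 + 0.25 * i, 5.0 + 5.0 * j) for i in range(9) for j in range(7)]
# Hypothetical "true" response containing a V*T interaction (assumed values).
targets = [10.0 + 60.0 * v - 0.5 * t + 1.5 * v * t for v, t in samples]

additive = [[1.0, v, t] for v, t in samples]
interaction = [[1.0, v, t, v * t] for v, t in samples]

beta_add = fit_least_squares(additive, targets)
beta_int = fit_least_squares(interaction, targets)
rmse_add = model_rmse(additive, targets, beta_add)
rmse_int = model_rmse(interaction, targets, beta_int)

if __name__ == "__main__":
    print(f"additive model RMSE:    {rmse_add:.3f} ppm")
    print(f"interaction model RMSE: {rmse_int:.3e} ppm")
```

With noise-free data of this form, the additive model retains a residual error that the product term removes. With real sensor data the improvement would be smaller and should be checked against independent validation data.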
Bastviken et al. used two approaches to determine the methane concentration: direct measurement using a reference sensor (a Los Gatos Research ultraportable greenhouse gas analyzer), or estimating the background methane concentration. If validated and performed with care, background methane concentration estimations may simplify sensor calibration, but they have the potential to introduce systematic errors which may be of a similar order of magnitude to the methane concentrations that the calibrated sensors are intended to measure. This issue is more easily circumvented by supplying sensors with a known concentration of methane during calibration, as was achieved with the use of a reference gas in this study.
Collier-Oxandale et al. [15] employed co-location of the methane sensors with reference instruments during field deployment. As stated by Collier-Oxandale et al., this approach exposes sensors to representative field conditions, rather than constraining them to more conventional laboratory settings, which typically control environmental conditions more strictly than real-world settings. However, much longer co-deployments may be required to obtain a broad range of conditions. If no high-emission or extreme events occur during the co-deployment phase, sensor models calibrated in this way may be poorly calibrated for these scenarios, being weighted towards typical field concentrations. As previously mentioned, Collier-Oxandale et al. also used time-based predictor variables in most of their presented calibration models. As such, calibrations based on these models would not be transferable to sensors deployed in a different location. Laboratory-based calibration with a methane concentration range which extends beyond the expected range of field values can be more readily generalised and is less biased towards “normal” diurnal cycles or environmental conditions.
Van den Bossche et al. [14] calibrated a TGS2611-E00 methane sensor across 15–30 °C, 40–80% relative humidity and 2–9 ppm methane. Methane concentration was recorded using a Picarro G2301 Cavity Ringdown Gas Analyzer. The linear fit used for the sensor calibration is a reasonable simplification for narrow ranges of methane concentrations but would underestimate higher methane concentrations. As in the study by Bastviken et al., van den Bossche et al. applied temperature and humidity compensation but neglected interactions between environmental conditions and methane concentration. Arguably, this is less crucial at low concentrations but should be investigated as a potential method for further improving the performance of low-concentration calibrations.
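The underestimation that results from extrapolating a narrow-range linear calibration can be shown with a toy example. The sketch below assumes a hypothetical sublinear sensor response, V = gain × C^exponent, with arbitrary parameter values. This is an illustrative assumption and is neither the manufacturer's characteristic nor the model of van den Bossche et al. A straight line is fixed through the responses at 2 and 9 ppm and then applied at higher concentrations.

```python
def sensor_voltage(methane_ppm, gain=0.3, exponent=0.5):
    """Hypothetical sublinear sensor response: V = gain * C**exponent."""
    return gain * methane_ppm ** exponent

def make_linear_calibration(low_ppm, high_ppm):
    """Straight-line voltage-to-concentration map through two calibration points."""
    v_low, v_high = sensor_voltage(low_ppm), sensor_voltage(high_ppm)
    slope = (high_ppm - low_ppm) / (v_high - v_low)
    return lambda voltage: low_ppm + slope * (voltage - v_low)

# Linear calibration fixed at the ends of a narrow 2-9 ppm range.
calibrate = make_linear_calibration(2.0, 9.0)

if __name__ == "__main__":
    for true_ppm in (2.0, 9.0, 50.0, 200.0):
        estimate = calibrate(sensor_voltage(true_ppm))
        print(f"true = {true_ppm:6.1f} ppm, linear estimate = {estimate:6.1f} ppm")
```

Under this assumed response, the linear calibration reproduces the two calibration points exactly, but its estimates fall increasingly below the true concentration above the calibrated range.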
Due to the different concentration ranges used, it is difficult to make direct comparisons between the accuracy of different calibration approaches across these studies. However, a strong correlation between sensor response and true methane concentration below 300 ppm is consistently achieved, and the need to apply both temperature and humidity compensation is identified by all authors.
In the context of commodifying methane sensors for IoT applications, a shared limitation across all of these studies is the use of expensive reference sensors in the calibration approach. Such instruments often cost tens of thousands of pounds, placing them beyond the reach of many citizen scientists or smaller research groups. The calibration gas-based method presented in this study offers an alternative low-cost approach; the estimated total cost of the calibration setup (vacuum pump, chamber, Arduino datalogger and calibration gases) is under GBP 500.
4.2. Limitations and Future Work
The TGS2611-E00 sensor incorporates a charcoal filter which improves the sensor selectivity by reducing the influence of other gases, such as ethanol and iso-butane. However, the sensor is also sensitive to hydrogen, making it less appropriate for detecting low methane levels in environments where hydrogen may also be present. This is less likely to be an issue in outdoor settings where hydrogen levels are typically much lower than methane levels, but the incorporation of additional sensors to measure the concentration of interference gases should be considered in relevant applications.
The variable warm-up period for the NGM2611-E13 sensors may pose a barrier to their usage in low-power applications; future work should investigate the effect of this warm-up period on intermittently powered NGM2611-E13 sensors.
5. Conclusions
Overall, the experiments presented show that, at methane concentrations below 300 ppm, the NGM2611-E13 methane sensors exhibit a relationship between sensor resistance and methane concentration similar to that documented by the manufacturer for the 300–10,000 ppm range. The presented trained models show promise for calibrating the NGM2611-E13 methane sensors at low methane concentrations across a range of temperature and humidity conditions. The relative performance of different model equations highlights the importance of considering the interaction between predictor variables. For example, the inclusion of a temperature and sensor voltage interaction term was shown to reduce model error and improve the correlation between the model prediction and the true methane concentration.
The presented calibration approach offers an efficient method for calibrating NGM2611-E13 methane sensors using only two pre-balanced calibration gas mixtures and without depending on a more expensive state-of-the-art reference sensor to measure the methane concentration. Data collected at additional intermediate methane concentration levels could be used to further refine these models. Likewise, calibrating the sensors over an even broader range of temperature and humidity conditions may be valuable for environmental monitoring settings.
The methane decay validation experiment presents an intuitive method for identifying inadequacies in calibration models that may not be obvious from the performance of models on training data. For example, spikes in predicted methane concentration that coincide with spikes in temperature clearly indicate insufficient temperature compensation.
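One way to make this visual check quantitative is to correlate step-to-step changes in predicted methane concentration with step-to-step changes in temperature. The sketch below is an assumption-laden illustration and is not the validation procedure used in this study. The decay curve, the temperature spike, the residual temperature sensitivity (3 ppm per °C) and all function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def differences(series):
    """Step-to-step changes of a time series."""
    return [b - a for a, b in zip(series, series[1:])]

def temperature_coupling(predicted_ppm, temp_c):
    """Correlation between changes in predicted methane and in temperature."""
    return pearson(differences(predicted_ppm), differences(temp_c))

# Synthetic decay from 200 ppm with a brief 5 degC temperature spike (assumed).
true_ppm = [200.0 * math.exp(-0.05 * i) for i in range(60)]
temp_c = [20.0 + (5.0 if 20 <= i < 25 else 0.0) for i in range(60)]

# A well-compensated model tracks the smooth decay; a poorly compensated one
# leaks an assumed 3 ppm per degC of temperature change into its prediction.
well_compensated = list(true_ppm)
poorly_compensated = [c + 3.0 * (t - 20.0) for c, t in zip(true_ppm, temp_c)]

if __name__ == "__main__":
    print(f"well compensated:   r = {temperature_coupling(well_compensated, temp_c):+.2f}")
    print(f"poorly compensated: r = {temperature_coupling(poorly_compensated, temp_c):+.2f}")
```

In this synthetic case, the well-compensated prediction shows almost no coupling to temperature, whereas the poorly compensated prediction is strongly correlated with it. Any threshold used to flag a model in practice would need to be chosen from the experimental data.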
This approach to calibrating gas sensors below their intended application concentration ranges may be extended to other low-cost sensors in the future, with the potential to broaden the range of pollutants that can be monitored by wireless sensor networks.