Application of Hybrid Ensemble Machine Learning Approach For Prediction of Residential Natural Gas Demand and Consumption
ISSN No:-2456-2165
Abstract:- The main byproducts of burning natural gas are carbon dioxide, water vapor, and very small amounts of nitrogen oxide, making it the cleanest fossil fuel on the planet. A wide range of consumer products, such as stoves, dryers, fireplaces, and furnaces, are also powered by natural gas. At least one of your appliances undoubtedly runs on natural gas. In this work, the demand for residential natural gas was forecasted using a hybrid ensemble regression machine learning approach. Accurate forecasting of the demand for natural gas is crucial for effective energy management and resource allocation. The hybrid ensemble approach combines several regression algorithms, including linear regression (LR), decision tree regression (DTR), support vector regression (SVR), and K-nearest neighbor (KNN), to take advantage of the benefits of each individual model and improve prediction performance. The hybrid ensemble regression model's process has two steps. In the first stage, distinct regression models are trained on the dataset. The second stage involves evaluating each model's predictions. To evaluate the effectiveness of the hybrid ensemble model, a range of measures, including mean absolute error (MAE), mean squared error (MSE), coefficient of determination (R-squared), and accuracy, are computed and compared to those of the individual regression models. The predictive accuracy of the model is further assessed using cross-validation techniques to ensure robustness. The results of the experiment demonstrated that the hybrid ensemble regression technique consistently outperformed individual regression models in terms of prediction accuracy. Combining numerous models captures the various correlations and patterns contained in the data, enhancing the model's overall performance.

Keywords:- Ensemble, Hybrid, Machine Learning, Natural Gas, Prediction.

I. INTRODUCTION

Natural gas has long been used in all aspects of human endeavor, but particularly in home construction. Natural gas is the cleanest fossil fuel on earth because the main byproducts of burning it are carbon dioxide, water vapor, and very small amounts of nitrogen oxide. Natural gas is also used to power a sizable variety of consumer goods, including stoves, dryers, fireplaces, and furnaces. At least one of your appliances undoubtedly runs on natural gas. Like almost all other energy sources, natural gas can be dangerous if used improperly. A few simple safety measures and knowledge of what to do in the event of a gas leak can help you protect yourself and those you love. Therefore, a well-designed forecasting model is crucial to managing energy policy successfully by providing energy diversity and energy requirements that adapt to the dynamic structure of a country, region, or the entire world, in line with unprecedented increases in energy demand (Faruk et al., 2019).

Natural gas is one of the fossil fuels. Using natural gas to supply energy requirements for industry, transportation, and other uses is environmentally friendly. Between a fifth and a third of the energy used worldwide is consumed by buildings. Natural gas accounts for more than a third of the energy used in European residential buildings (Then et al., 2020). The strategic growth of nations' economies and societies depends on energy. The basic purpose of data mining is to create models using preprocessed or existing data to find patterns that the data set's attributes have shown. Energy plays a crucial role in the strategic development of society and economies. To uncover patterns that the data set's features have revealed, data mining's
This study aims to address the problems raised by Shapi et al. (2020), who built a prediction model for energy utilization on Microsoft Azure's cloud-based machine learning platform. A case study involving two tenants of a commercial building is used to highlight real-world applications in Malaysia. The collected data is evaluated and prepared before being used to train and test models. RMSE, NRMSE, and MAPE metrics are used to compare the performance of each method. The experiment's findings show that the distribution of energy use differs from tenant to tenant.

The consumption of natural gas in Istanbul's Bahçeşehir was predicted accurately using a range of powerful machine learning techniques. Final performance evaluations showed that XGBoost outperformed MLP, Forest Regression, and Linear Regression by 0.04 Mean Absolute Error each. XGBoost performs better at prediction because it is highly scalable, cuts down on compute time, and makes efficient use of memory (Ahmed et al., 2021). Accurate forecasts prevent economic loss and keep supply and demand in balance.

Liu et al. (2022) calculated the amount of natural gas used by data centers. Assuming that both electricity and gas serve as energy sources, and viewing the problem from the two viewpoints of energy supply and energy consumption, the data center energy scheduling model is built by taking into account the service level of the data center. The lower-level model is the scheduling calculation model used in the data center. The upper-level model is the data center energy supply scheduling model. Particle swarm optimization is used to simulate the schedule. The results show that, while accounting for the degree of data center service delay, using natural gas as an additional energy source can significantly reduce the data center's overall energy consumption.

Driven et al. (2019) investigate data-driven models for predicting natural gas prices based on well-known machine learning techniques, such as artificial neural networks (ANN), support vector machines (SVM), gradient boosting machines (GBM), and Gaussian process regression (GPR). For evaluation and for cross-validated model training, they use monthly Henry Hub natural gas spot price data from January 2001 to October 2018. The results show that these four machine learning methods perform differently when attempting to forecast natural gas prices. In general, SVM, GBM, and GPR all predict less accurately than ANN does. This suggests that machine learning models can accurately predict outcomes for many tasks.

The research could be used to considerably enhance natural gas consumption forecast systems, despite its limitations due to the small number of articles it looked at. For supply-demand equilibrium and investment purposes, accurate predictions of natural gas consumption are crucial. Accurate and exact forecasts prevent economic loss and keep supply and demand in equilibrium.

III. METHODOLOGY

The proposed system, which is represented in the accompanying figure, will forecast natural gas demand and consumption for both residential and commercial components using two separate datasets, and will be implemented using a data mining approach based on machine learning techniques. The model would attempt to fix the issue identified with the current system. A noise reduction strategy is added for the dataset because noisy data will lead to biased predictions, which could in turn lower prediction accuracy. Two datasets would be developed from different sources in order to further ensure good and reliable performance.
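
As a rough illustration of the two-stage hybrid ensemble idea described in the abstract (train several regression models, then combine their predictions), the following minimal sketch uses only the Python standard library. It is not the authors' implementation: the source does not state how the individual predictions are combined, so simple averaging is an assumption here, only two of the four base models (LR and KNN) are hand-written for brevity, and the names `fit_linear`, `fit_knn`, `fit_ensemble`, `hdd`, and `gas`, as well as the toy data, are hypothetical.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Single-feature least-squares linear regression; returns a predict function."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=3):
    """K-nearest-neighbor regression: average the targets of the k closest points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


def fit_ensemble(xs, ys, fitters):
    """Stage 1: train every base model. Stage 2: average their predictions.

    Averaging is an assumed combination rule, chosen for illustration only.
    """
    models = [fit(xs, ys) for fit in fitters]
    return lambda x: mean(model(x) for model in models)


# Hypothetical toy data: one feature (e.g. heating degree days) and gas use.
hdd = [0, 5, 10, 15, 20, 25, 30]
gas = [10, 20, 30, 40, 50, 60, 70]

ensemble = fit_ensemble(hdd, gas, [fit_linear, fit_knn])
print(ensemble(15))
```

With scikit-learn, which the paper names as its toolkit, the same idea is usually expressed by wrapping the four estimators in `sklearn.ensemble.VotingRegressor`; the hand-written version above only makes the combining step explicit.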
A. Natural Gas Dataset
The dataset will be collected from kaggle.com, which is the largest community of data scientists (https://www.kaggle.com/datasets).

B. Data Preprocessing
Pre-processing refers to the modifications applied to the data before it is fed to the algorithm. Data preprocessing is a method for transforming unclean data into clean data sets. In other words, whenever data is acquired from various sources, it arrives in a raw form that makes analysis impractical. Data cleaning, dimensionality reduction, feature engineering, data sampling, data transformation, and the handling of imbalanced data are a few of the crucial data preprocessing approaches.

This crucial stage is used to improve the data's quality in order to support the extraction of valuable insights. To prevent the proposed model from overfitting or underfitting, it is also important to make sure that there isn't too much noise in the dataset. Preprocessing boosts the model's accuracy and computational efficiency and gets the dataset ready for prediction.

C. Feature Extraction
Feature extraction is crucial for reducing overfitting, increasing prediction accuracy, and shortening model training time. The properties of the dataset that would be used to train the machine learning models have a significant impact on the algorithm's performance. Model performance may be negatively impacted by irrelevant, unsuitable, or only partially relevant features. Feature selection will be applied to the data in order to automatically choose the features in the dataset that contribute the most to the output or prediction variable of interest. Unrelated elements in the data might make models less accurate, especially those that use linear techniques like logistic and linear regression.

D. Fine Tuning
Fine-tuning often refers to the process of further training a machine learning model that has already been trained on a particular task or dataset, in order to enhance its performance. This is done to make use of the data-driven knowledge and information the model has previously gathered during the pre-training phase. It enables you to apply prior knowledge to a new task, where you can continue to train and adjust the model to boost its effectiveness and accuracy.

These are the stages involved in fine tuning:
Load the pre-trained model.
Freeze most if not all layers in the model to prevent them from further training.
Replace the final layer or layers of the model with new ones that are specific to the task.
Train the model on the dataset using a lower learning rate than in the pre-training phase.
Analyze the performance of the improved model and make any necessary hyper-parameter adjustments.

E. Prediction Tools and Methods
Throughout this study, the Python programming language would be used. To assist with the experiment, the scikit-learn libraries would be used. Prediction model-specific libraries and extensions are abundant in the Python programming language. One of the finest places to get machine learning algorithms is scikit-learn (https://scikit-learn.org), where almost every form of algorithm is easily accessible and can be evaluated quickly and easily. The data will be handled using NumPy and pandas. Jupyter Notebooks would be used for debugging and for their ability to present code elegantly.

F. System Specification/Configuration
A laptop computer with the following specifications would be used to perform and/or implement these research experiments: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, RAM 4.00 GB, 64-bit operating system, x64-based processor.

G. Performance Evaluation Technique
The main objective of building a prediction model using machine learning techniques is generalization beyond the training datasets. Machine learning models should be able to perform reasonably well on real data. The data will be separated into two groups, training data and testing data. Training data will be used to train the models, whilst testing data will be used to put them to the test.

H. Choice of Evaluation Metrics
Evaluation metrics are employed to measure the magnitude of errors in the performance of the prediction models. They aid in properly determining which of the results acquired is more accurate and reliable for application and
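
The train/test separation and the error metrics named in the abstract (MAE, MSE, and R-squared) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the study's evaluation code: the 80/20 split ratio, the fixed seed, the function names, and the toy values in `actual` and `predicted` are all hypothetical and do not come from the paper.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out the last fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]


def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: average squared error, penalizing large misses."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) when all true values are identical.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical held-out targets and model predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 8.0, 9.5]

scores = {
    "MAE": mae(actual, predicted),
    "MSE": mse(actual, predicted),
    "R2": r2(actual, predicted),
}
print(scores)

train_rows, test_rows = train_test_split(list(range(10)))
print(len(train_rows), len(test_rows))
```

Lower MAE and MSE and an R-squared closer to 1 indicate a better fit. In a scikit-learn workflow the same quantities are available as `mean_absolute_error`, `mean_squared_error`, and `r2_score` in `sklearn.metrics`, and the split as `sklearn.model_selection.train_test_split`.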
Fig 5: R^2 score

The outcomes of the R-squared evaluation for all models are reported in Figure 5. R-squared indicates how well a set of predictions fits the actual values. It is employed to further examine and confirm the accuracy of the prediction performance results obtained from the created model.

Further Discussions
A hybrid ensemble regression machine learning approach has some notable benefits and consequences for predicting residential natural gas demand and consumption. The discussion examines the approach's main principles and any potential repercussions for the field. When compared to individual regression models, the hybrid ensemble regression approach showed greater prediction accuracy. The method captures a wider range of patterns and correlations contained in the data by merging different algorithms, such as LR, SVR, DTR, and KNN. As a result, estimates of residential natural gas demand become more precise and trustworthy, which is very advantageous for energy management firms, decision-