Hydrogen Demand in the Industrial Sector
Using Data Science
Leah Amor Mamanao (170476)
Max Menk (166428)
Tim Tinnacher (166426)


Topic focus:
Hydrogen Usage and Demand
Data science approaches to forecast Hydrogen Demand


Hydrogen Usage and Demand
the most abundant chemical element, estimated to make up about 75% of the normal (baryonic) mass of the universe


Hydrogen Usage and Demand


Data science approaches to forecast Hydrogen Demand:
Time Series Forecasting
Machine Learning Regression
Deep Learning Neural Networks
Hybrid methods
Forecasting with External Factors
Demand Segmentation
Real-time Data Integration
Cross-validation and Evaluation
Continuous Monitoring and Adjustment


Time Series Forecasting:
Exponential smoothing or exponential moving average
$F_t = F_{t-1} + \alpha (A_{t-1} - F_{t-1})$
$F_t$ : new forecast
$F_{t-1}$ : previous period forecast
$A_{t-1}$ : previous period actual demand
$\alpha$ : smoothing (weighting) constant


Exponential smoothing or exponential moving average (EMA) in Excel:
Year | Hydrogen demand (tonnes) | Smoothed level | Standard error | Forecast
2010 | 1000 | #N/A | #N/A | 1045.150918
2011 | 1200 | 1000 | #N/A | 1119.467323
2012 | 1300 | 1140 | #N/A | 1199.068063
2013 | 1100 | 1252 | #N/A | 1284.328886
2014 | 1400 | 1145.6 | 171.9534821 | 1375.652258
2015 | 1500 | 1323.68 | 194.4422451 | 1473.46926
2016 | 1600 | 1447.104 | 199.0913546 | 1578.241629
2017 | 1700 | 1554.1312 | 199.3199181 | 1690.463933
2018 | 1800 | 1656.23936 | 158.8958108 | 1810.665906
The smoothed levels follow the formula above with α = 0.7 (an Excel damping factor of 0.3): each row's level is the forecast for that year made from the previous year's data, e.g. 1000 + 0.7 × (1200 - 1000) = 1140.
[Chart: Exponential Smoothing by Data Point, with trendlines Poly. (Actual): y = 3.5714x² + 59.286x + 990.48 (R² = 0.909) and Linear (Actual)]
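The smoothing formula and the Excel output can be reproduced in a few lines of plain Python. This is a minimal sketch, assuming α = 0.7 (the value implied by the smoothed levels) and that the first forecast is seeded with the first actual value, as Excel's Exponential Smoothing tool does; the 3-period window for the standard errors is inferred from the table, and the function names are illustrative only.

```python
import math

# Hydrogen demand (tonnes), 2010-2018, from the Excel example above.
DEMAND = [1000, 1200, 1300, 1100, 1400, 1500, 1600, 1700, 1800]


def exponential_smoothing(actuals, alpha):
    """One-step-ahead forecasts F_t = F_{t-1} + alpha * (A_{t-1} - F_{t-1}).

    Assumption: the first forecast equals the first actual value (Excel-style seed).
    forecasts[k] is the forecast for period k + 1, so the last element is the
    forecast for the period after the data ends.
    """
    forecasts = [float(actuals[0])]
    for actual in actuals[1:]:
        previous = forecasts[-1]
        forecasts.append(previous + alpha * (actual - previous))
    return forecasts


def rolling_standard_errors(actuals, forecasts, window=3):
    """Root mean square of the previous `window` one-step forecast errors.

    Assumes forecasts[k - 1] is the forecast for period k (as returned above);
    the window of 3 is inferred from the 'Standard errors' column.
    """
    errors = {k: actuals[k] - forecasts[k - 1] for k in range(1, len(actuals))}
    return [
        math.sqrt(sum(errors[k] ** 2 for k in range(row - window, row)) / window)
        for row in range(window + 1, len(actuals))
    ]


levels = exponential_smoothing(DEMAND, alpha=0.7)
print([round(x, 5) for x in levels[:-1]])  # smoothed levels shown for 2011-2018
print(round(levels[-1], 2))                 # one-step forecast for 2019
print([round(se, 2) for se in rolling_standard_errors(DEMAND, levels)])  # 2014-2018
```

A larger α reacts faster to recent changes in demand; a smaller α smooths out noise but lags behind trends.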


Time Series Forecasting:
ARIMA (Autoregressive Integrated Moving Average)
$y'_t = c + \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t$
$y'_t$ : value of the time series at time $t$ (after differencing $d$ times)
$c$ : constant term
$\phi_1, \ldots, \phi_p$ : autoregressive parameters of order $p$
$\theta_1, \ldots, \theta_q$ : moving average parameters of order $q$
$\varepsilon_t$ : error term at time $t$
1. AR (Autoregressive): represents the relationship between an observation and a number of lagged observations (previous time steps)
> Order p: number of lag observations included in the model
> Parameters: $\phi_1, \phi_2, \ldots, \phi_p$
2. I (Integrated): refers to the number of differences (the value of d) taken to make the time series stationary. It is the order of differencing.
> For example: if d = 1, it refers to first-order differencing
3. MA (Moving Average): represents the relationship between an observation and a residual error from a moving average model applied to lagged observations
> Order q: number of lagged forecast errors in the prediction equation
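The three components can be seen in a deliberately small sketch of an ARIMA(1, 1, 0) model (p = 1, d = 1, q = 0) on the 2010-2018 demand figures. Assumptions: the AR(1) part is fitted by ordinary least squares on the differenced series (conditional least squares, not the maximum-likelihood estimation most statistics packages use), there are no MA terms, and all function names are hypothetical.

```python
# Hydrogen demand (tonnes), 2010-2018, from the Excel example.
DEMAND = [1000, 1200, 1300, 1100, 1400, 1500, 1600, 1700, 1800]


def difference(series):
    """First-order differencing (d = 1): y'_t = y_t - y_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(series):
    """Least-squares fit of y'_t = c + phi_1 * y'_{t-1} + e_t."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def arima_110_forecast(series, steps):
    """Difference (I), fit AR(1) (AR), then integrate the predicted changes back."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    level, change = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        change = c + phi * change   # predicted next change; future errors set to 0
        level += change             # undo the differencing
        forecasts.append(level)
    return c, phi, forecasts


c, phi, forecasts = arima_110_forecast(DEMAND, steps=3)
print(f"c = {c:.3f}, phi_1 = {phi:.3f}")
print([round(f, 1) for f in forecasts])
```

Nine annual observations are far too few for a reliable ARIMA model; the numbers only illustrate the difference, fit, integrate sequence.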


Time Series Forecasting:
Prophet
- a time series forecasting tool created by Facebook’s Core Data Science team
- the technique is based on the assumption that time series data can be described as a mixture of several components, such as trends, seasonality, and holidays
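For reference, Prophet's published model is usually written as an additive decomposition of exactly these components; this is a summary of the model form, not the library's full specification:

$$y(t) = g(t) + s(t) + h(t) + \varepsilon_t$$

where $g(t)$ is the trend (non-periodic changes), $s(t)$ the periodic seasonality, $h(t)$ the effect of holidays, and $\varepsilon_t$ the error term. With yearly data such as the 2010-2018 table, there is no within-year pattern for $s(t)$ to capture.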
Machine Learning Regression:
Linear regression (or linear model)
- The mathematical formula of linear regression can be written as follows:
$y = b_0 + b_1 x + e$
The figure above illustrates a simple linear regression model, where:
- the best-fit regression line is in blue
- the intercept ($b_0$) and the slope ($b_1$) are shown in green
- the error terms ($e$) are represented by vertical red lines
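A minimal sketch of fitting $y = b_0 + b_1 x + e$ by ordinary least squares, using the deck's 2010-2018 demand figures with the year encoded as an index (2010 → 1). The encoding and the function name are illustrative assumptions.

```python
# Year index (2010 -> 1, ..., 2018 -> 9) and hydrogen demand (tonnes) from the Excel example.
X = list(range(1, 10))
Y = [1000, 1200, 1300, 1100, 1400, 1500, 1600, 1700, 1800]


def fit_simple_linear_regression(x, y):
    """Ordinary least squares estimates of the intercept b0 and slope b1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


b0, b1 = fit_simple_linear_regression(X, Y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(X, Y)]  # the error terms e
print(f"b0 = {b0:.1f}, b1 = {b1:.1f}")                   # b0 = 925.0, b1 = 95.0
print(f"forecast for 2019 (x = 10): {b0 + b1 * 10:.1f}")  # 1875.0
```

With an intercept in the model, the OLS residuals always sum to zero; the vertical red lines in the figure are these residuals.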


Machine Learning Regression:
Logistic regression
- a special case of regression analysis
- used when the dependent variable is nominally scaled (for example, a binary yes/no outcome)
- the counterpart of linear regression for categorical outcomes


Machine Learning Regression:
Support Vector Regression (SVR)
- the regression variant of the Support Vector Machine (SVM) in machine learning
- SVMs are supervised learning models with associated learning algorithms that analyze data for classification and regression
Deep Learning Neural Networks
A neural network is a method in artificial intelligence that teaches computers to process data in a way that is inspired by the human brain.
It is a type of machine learning process, called deep learning, that uses interconnected nodes or neurons in a layered structure that resembles the human brain.
Neural networks attempt to solve complicated problems, like summarizing documents or recognizing faces, with greater accuracy.


Deep Learning Neural Networks
Recurrent Neural Networks
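What makes a network recurrent is the hidden state that is fed back at every time step. The sketch below shows only the forward pass of a simple (Elman-style) RNN cell in plain Python. Assumptions: the weights are hypothetical and untrained, demand is scaled to thousands of tonnes, and training (backpropagation through time) is omitted.

```python
import math

# Hydrogen demand in thousand tonnes, 2010-2018, from the Excel example.
SERIES = [1.0, 1.2, 1.3, 1.1, 1.4, 1.5, 1.6, 1.7, 1.8]


def rnn_forward(inputs, w_x, w_h, b):
    """Simple RNN: h_t = tanh(w_x * x_t + W_h @ h_{t-1} + b) for a scalar input x_t."""
    size = len(b)
    h = [0.0] * size  # initial hidden state
    states = []
    for x in inputs:
        # The comprehension reads the previous h before the new list replaces it.
        h = [
            math.tanh(w_x[i] * x + sum(w_h[i][j] * h[j] for j in range(size)) + b[i])
            for i in range(size)
        ]
        states.append(h)
    return states


# Hypothetical, untrained weights for a 2-unit hidden state (illustration only).
W_X = [0.5, -0.3]
W_H = [[0.2, 0.1], [-0.4, 0.3]]
B = [0.0, 0.1]

states = rnn_forward(SERIES, W_X, W_H, B)
# Same final input, different history: the final hidden states differ (memory).
other = rnn_forward([0.0, 0.0, 1.8], W_X, W_H, B)
print(states[-1], other[-1])
```

In a forecasting model, a trained output layer would map the final hidden state to next year's demand.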


Deep Learning Neural Networks
Long Short-Term Memory networks
- usually just called “LSTMs”, are a special kind of RNN, capable of learning long-term dependencies.
- LSTMs are explicitly designed to avoid the long-term dependency problem.
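For reference, the commonly used LSTM cell equations are given below (standard formulation, not taken from any study in this deck). $\sigma$ is the logistic sigmoid, $\odot$ element-wise multiplication, and $[h_{t-1}, x_t]$ the concatenation of the previous hidden state and the current input:

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{forget gate} \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{input gate} \\
\tilde{C}_t &= \tanh(W_C [h_{t-1}, x_t] + b_C) && \text{candidate cell state} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{cell state update} \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{output gate} \\
h_t &= o_t \odot \tanh(C_t) && \text{hidden state}
\end{aligned}
$$

The additive update of $C_t$ is the key step: when $f_t$ stays close to 1, information in the cell state can pass through many time steps without being repeatedly squashed, which is how LSTMs mitigate the long-term dependency problem of plain RNNs.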


Hybrid methods
techniques that use multiple approaches, often of different paradigms, to solve a problem or achieve a goal
Ensemble methods
the idea is that by combining the strengths and compensating for the weaknesses of different models, the ensemble can achieve better accuracy, stability, and generalization than any single model
Bootstrap Aggregating
also known as bagging, bootstrap aggregating is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It reduces variance and helps to avoid overfitting.
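A minimal sketch of bagging: draw bootstrap samples (sampling with replacement), fit one base model per sample, and average the predictions. Assumptions: the base model is a simple least-squares line on the deck's 2010-2018 demand figures, and the number of models and the seed are arbitrary illustrative choices.

```python
import random

X = list(range(1, 10))  # year index, 2010 -> 1
Y = [1000, 1200, 1300, 1100, 1400, 1500, 1600, 1700, 1800]  # demand (tonnes)


def fit_line(xs, ys):
    """OLS line y = b0 + b1 * x; falls back to a flat line if all xs are equal."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b1 * mx, b1


def bagged_forecast(xs, ys, x_new, n_models=50, seed=0):
    """Fit one line per bootstrap sample and average their predictions at x_new."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]  # sample with replacement
        b0, b1 = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        predictions.append(b0 + b1 * x_new)
    return sum(predictions) / n_models, predictions


mean_pred, member_preds = bagged_forecast(X, Y, x_new=10)  # x = 10 is 2019
print(round(mean_pred, 1), round(min(member_preds), 1), round(max(member_preds), 1))
```

Bagging pays off most with high-variance base learners such as decision trees; with a linear model the averaging effect is modest, so the sketch mainly shows the mechanics.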


Bootstrap Aggregating


Forecasting with External Factors
Economic Factors:
Energy Prices
Global Economic Growth
Policy Factors:
Government Subsidies and Incentives
Emission Regulations
Hydrogen Standards and …


Forecasting with External Factors
Technological Factors:
Hydrogen Production
Hydrogen Storage and Transport Technologies
Fuel Cell Technologies
Environmental Factors:
Climate Change Concerns
Air Quality Regulations


Demand Segmentation
Segmenting by application
Segmenting by region
Segmenting by industry
> Segmentation can improve forecasting accuracy by isolating the unique characteristics of each segment


Real-time Data Integration
> Incorporating real-time data, such as weather conditions, electricity prices, or production capacity, can enhance forecasting accuracy by capturing up-to-date information that may influence hydrogen demand.
Benefits of Real-time Data Integration
1. Improved Production Efficiency
2. Enhanced Supply Chain Management
3. Optimized Energy Management
4. Predictive Maintenance and Asset Management
5. Sustainability Advancements


Cross-validation and Evaluation
1. K-Fold Cross-validation
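A minimal sketch of k-fold cross-validation in plain Python: the data are split into k folds, the model is trained on k - 1 folds and scored on the held-out fold, and this repeats for every fold. Assumptions: a least-squares line as the model, mean squared error as the score, and the deck's 2010-2018 demand figures as data; names and the seed are illustrative.

```python
import random

# Year index (2010 -> 1, ..., 2018 -> 9) and hydrogen demand (tonnes) from the Excel example.
X = list(range(1, 10))
Y = [1000, 1200, 1300, 1100, 1400, 1500, 1600, 1700, 1800]


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (needs at least two distinct xs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1


def k_fold_mse(xs, ys, k, seed=0):
    """Mean squared error on each of k held-out folds (requires 2 <= k <= len(xs))."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)       # random assignment of points to folds
    folds = [idx[f::k] for f in range(k)]  # k roughly equal folds
    scores = []
    for held_out in folds:
        train = [i for i in idx if i not in held_out]
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_errors = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in held_out]
        scores.append(sum(sq_errors) / len(sq_errors))
    return scores


scores = k_fold_mse(X, Y, k=3)
print([round(s, 1) for s in scores], round(sum(scores) / len(scores), 1))
```

Setting k equal to the number of observations gives leave-one-out cross-validation. Shuffled folds ignore time order, so for forecasting a forward-chaining split (train on earlier years, test on later ones) is often the safer choice.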


Cross-validation and Evaluation
2. Leave-One-Out Cross-validation
- this method leaves out a single data point from the training set in each iteration and trains the model on the remaining data.
- In LOOCV, the model is fitted on the remaining n - 1 observations and used to predict the single held-out observation, which serves as the validation set; this is repeated for each of the n observations.
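The LOOCV estimate of test error is the average of the n single-observation errors (standard textbook definition, shown here with squared error):

$$\mathrm{CV}_{(n)} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_{(-i)} \right)^2$$

where $\hat{y}_{(-i)}$ is the prediction for observation $i$ from the model fitted on the other $n - 1$ observations.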


Cross-validation and Evaluation
3. Repeated K-Fold Cross-validation
- method combines K-fold
cross-validation with multiple
repetitions, averaging the
performance measures across
all repetitions
[Figure: Repeated K-Fold Cross Validation of IgG Proposed Model]
[Figure 5: Repeated K-Fold Cross Validation of IgA Proposed Model]
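Repeated k-fold simply reruns k-fold cross-validation with a different random split each time and pools the scores. A minimal sketch, assuming the same least-squares line and demand data as a stand-in model; the number of repeats and the seeds are illustrative choices.

```python
import random
import statistics

X = list(range(1, 10))  # year index, 2010 -> 1
Y = [1000, 1200, 1300, 1100, 1400, 1500, 1600, 1700, 1800]  # demand (tonnes)


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1


def k_fold_mse(xs, ys, k, seed):
    """One round of k-fold CV with a seeded shuffle; returns the MSE of each fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    scores = []
    for held_out in (idx[f::k] for f in range(k)):
        train = [i for i in idx if i not in held_out]
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(sum((ys[i] - (b0 + b1 * xs[i])) ** 2 for i in held_out) / len(held_out))
    return scores


def repeated_k_fold_mse(xs, ys, k, repeats):
    """Run k-fold CV `repeats` times with different shuffles and pool every fold score."""
    scores = []
    for r in range(repeats):
        scores.extend(k_fold_mse(xs, ys, k, seed=r))
    return scores


scores = repeated_k_fold_mse(X, Y, k=3, repeats=5)
print(len(scores), round(statistics.mean(scores), 1), round(statistics.stdev(scores), 1))
```

Averaging over repetitions reduces the dependence of the estimate on one particular split, and the spread of the pooled scores gives a rough sense of its stability.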


Continuous Monitoring and Adjustment
Examples of where continuous monitoring and adjustment (CMA) is used:
IT systems
Manufacturing processes
Business operations
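Applied to demand forecasting, continuous monitoring and adjustment can be as simple as updating the forecast whenever a new observation arrives and flagging periods where recent accuracy degrades. A minimal sketch, assuming exponential smoothing with α = 0.7 as in the Excel example, a 3-period rolling mean absolute percentage error (MAPE), and a hypothetical review threshold of 10%.

```python
# Hydrogen demand (tonnes), 2010-2018, from the Excel example.
YEARS = list(range(2010, 2019))
DEMAND = [1000, 1200, 1300, 1100, 1400, 1500, 1600, 1700, 1800]


def monitor(actuals, alpha=0.7, window=3, mape_limit=0.10):
    """Update an exponential-smoothing forecast as each actual arrives and flag
    periods where the rolling MAPE exceeds mape_limit (hypothetical threshold).

    Returns one (rolling_mape, needs_review) pair per period after the first.
    """
    forecast = float(actuals[0])  # seed, as in the Excel example
    pct_errors = []
    report = []
    for actual in actuals[1:]:
        # Monitor: compare the forecast made for this period with the actual value.
        pct_errors.append(abs(actual - forecast) / actual)
        recent = pct_errors[-window:]
        rolling_mape = sum(recent) / len(recent)
        report.append((rolling_mape, rolling_mape > mape_limit))
        # Adjust: fold the new observation into the next forecast.
        forecast += alpha * (actual - forecast)
    return report


for year, (mape, review) in zip(YEARS[1:], monitor(DEMAND)):
    print(year, f"{mape:.1%}", "REVIEW" if review else "ok")
```

In practice a flagged period would trigger re-tuning (for example, a different α or a different model) rather than just a printed message.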


Choosing the appropriate data science methodology for hydrogen demand forecasting depends on the
RNN: RNNs have been shown to be effective for forecasting hydrogen demand in the industrial sector.
One study found that an RNN model was able to outperform traditional forecasting methods, such as
ARIMA and exponential smoothing, by up to 10%.
Research has shown that ensemble models can outperform single-method approaches in forecasting
complex phenomena like hydrogen demand.
When applying ensemble methods, it's important to use diverse base models to capture different patterns
and sources of variation in the data.
Cross-validation techniques can be employed to evaluate the performance of the ensemble and its
individual components.
Forecasting with External Factors: By incorporating these external factors into demand forecasting models,
hydrogen producers, infrastructure providers, and policy makers can gain a more comprehensive
understanding of the factors influencing hydrogen demand and make informed decisions for future
development and investment.


Demand Segmentation: By using demand segmentation, forecasters can develop more accurate and
granular forecasts that can be used to inform strategic decisions about hydrogen production, distribution,
and use. This is essential for the development of a successful hydrogen economy.
Real-time Data Integration: Real-time data integration is a critical enabler for accurate hydrogen demand
forecasting in the industrial sector. By leveraging real-time data from various sources and employing
advanced analytics tools, industries can optimize hydrogen production, transportation, and storage,
ensuring a consistent and reliable supply for their operations. This approach contributes to improved
production efficiency, enhanced supply chain management, and optimized energy utilization, while also
advancing sustainability goals.
Continuous Monitoring and Adjustment: Continuous monitoring and adjustment is an essential tool for
ensuring that systems and processes are operating effectively. By collecting, analyzing, and acting on
data, CMA can help to improve efficiency, effectiveness, uptime, and customer satisfaction. As the world
becomes increasingly complex and interconnected, CMA will become even more important for
organizations of all sizes.






Thank you

Service Management: Forecasting Hydrogen Demand

