On Deep Machine Learning & Time Series Models: A Case Study With The Use of Keras


Presentation · June 2017

DOI: 10.13140/RG.2.2.34543.94888


1 author:


Carlin Chun-fai Chu

The Open University of Hong Kong



1st International Conference on Econometrics and Statistics

Session EO256: Business analytics

Carlin Chu - The Open University of Hong Kong (Hong Kong)


• Why deep learning?
• What is LSTM?
• What is Keras? Characteristics of Keras
• Suggested steps for LSTM coding
• Example code
• Bollerslev's time series model
• Work in progress

Why deep learning?
• Deep learning
– Artificial neural networks with more than one hidden layer
– Typically requires a large amount of training data
– Learns representations at different levels of abstraction

Why Deep Learning? (Slide by Yann LeCun)
Applications of deep learning

• Ancient Chinese board game: Go

• Number of possible board positions > number of atoms in the observable universe
– Exhaustive search is not possible.
• Deep neural networks + advanced tree search + reinforcement learning (AlphaGo)

Akita et al. (2016)
News (Textual) + Stock Price (Numerical)

• Info from news article stream + daily open price → close price

• Predictions for 10 companies using input dimension = 1000

• Recurrent NN: Long Short-Term Memory (LSTM)

• Deep Learning for Stock Prediction Using Numerical and Textual Information (Akita et al. 2016)

What is LSTM?
Traditional Artificial Neural Network (ANN)
• No notion of time ordering
• Maps the current input feature(s) to the predicted target variable(s)

Recurrent Neural Network (RNN)

• Contains 'loops' that allow information to persist
• Can be viewed as multiple copies of the same network, each passing a message to its successor
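The 'loop' can be made concrete with a minimal NumPy sketch of a vanilla (Elman-style) RNN. This is an illustration under assumptions, not code from the talk: the names rnn_forward, W_x, W_h and b are hypothetical, and the weights are random. The same weights are reused at every time step, and the hidden state h is the message each copy of the network passes to the next.

```python
import numpy as np

def rnn_forward(x_seq, W_x, W_h, b):
    """Run a vanilla RNN over x_seq of shape (time_steps, n_features).

    The same parameters (W_x, W_h, b) are reused at every step; the hidden
    state h carries information from each step to the next.
    """
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in x_seq:
        h = np.tanh(W_x @ x_t + W_h @ h + b)  # new state depends on input AND previous state
        states.append(h)
    return np.array(states)  # shape (time_steps, n_hidden)

rng = np.random.default_rng(0)
n_features, n_hidden, time_steps = 1, 4, 5
W_x = rng.normal(scale=0.5, size=(n_hidden, n_features))
W_h = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)
x_seq = np.arange(time_steps, dtype=float).reshape(time_steps, n_features)
states = rnn_forward(x_seq, W_x, W_h, b)
print(states.shape)  # (5, 4)
```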

Long Short Term Memory network (LSTM)

• A special kind of RNN with
– an adaptive forget gate that can throw away information
– the ability to keep information across time gaps of unknown or varying size
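To show what the forget gate does, here is a minimal NumPy sketch of one step of the standard LSTM cell (input, forget and output gates plus a cell candidate, as described in "Understanding LSTM Networks" in the references). The function names and the gate ordering inside the stacked weight matrices are arbitrary choices for this sketch, not Keras internals. Saturating the forget gate open keeps the old cell state; saturating it shut throws it away.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W: (4n, d) input weights, U: (4n, n) recurrent weights, b: (4n,) bias,
    stacked (by our own convention) as [input, forget, candidate, output].
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:n])          # input gate: how much new information to write
    f = sigmoid(z[n:2 * n])      # forget gate: how much of c_prev to keep (0 = throw away)
    g = np.tanh(z[2 * n:3 * n])  # candidate values for the cell state
    o = sigmoid(z[3 * n:4 * n])  # output gate
    c_t = f * c_prev + i * g     # cell state: the long-term memory
    h_t = o * np.tanh(c_t)       # hidden state passed to the next step
    return h_t, c_t

n, d = 3, 1
W = np.zeros((4 * n, d))
U = np.zeros((4 * n, n))
x_t = np.array([0.5])
h_prev = np.zeros(n)
c_prev = np.array([1.0, -2.0, 0.5])

# Forget gate saturated open (bias +20), input gate shut (bias -20): memory is kept
b_keep = np.zeros(4 * n)
b_keep[0:n] = -20.0
b_keep[n:2 * n] = 20.0
_, c_keep = lstm_step(x_t, h_prev, c_prev, W, U, b_keep)

# Forget gate saturated shut (bias -20): the old cell state is thrown away
b_forget = np.zeros(4 * n)
b_forget[n:2 * n] = -20.0
_, c_forget = lstm_step(x_t, h_prev, c_prev, W, U, b_forget)
print(c_keep)    # approximately equal to c_prev
print(c_forget)  # approximately zero
```

Because the gate values are learned, the network itself decides how long to keep each piece of information, which is what allows time gaps of varying size.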

Characteristics of Keras
• High-level neural networks API, written in Python

• Runs on top of TensorFlow, Theano or CNTK

• Runs on both CPU and GPU

• Part of AlphaGo is written in TensorFlow (for distributed computing)

Backend: Theano or TensorFlow?

• Which one is better?

– Distributed setting / newer software → TensorFlow
– Recurrent networks / legacy applications → Theano

• How to switch the backend?

– Locate the Keras configuration file (keras.json, by default in ~/.keras/)
– Change the 'backend' field
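The two steps above can be scripted with the Python standard library. This is a sketch assuming the multi-backend Keras configuration file at its default location ~/.keras/keras.json; the helper name set_keras_backend is hypothetical, and the demonstration writes to a temporary file rather than the real configuration. Keras reads the file when it is imported, so a fresh Python session is needed afterwards; setting the KERAS_BACKEND environment variable also overrides the file.

```python
import json
import os
import tempfile

def set_keras_backend(config_path, backend):
    """Rewrite the 'backend' field of a Keras configuration file (hypothetical helper).

    config_path would normally be os.path.expanduser('~/.keras/keras.json').
    All other fields in the file are preserved.
    """
    with open(config_path) as f:
        config = json.load(f)
    config["backend"] = backend
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config

# Demonstration on a temporary file instead of the real ~/.keras/keras.json
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "keras.json")
with open(demo_path, "w") as f:
    json.dump({"backend": "tensorflow", "floatx": "float32", "epsilon": 1e-07}, f)

updated = set_keras_backend(demo_path, "theano")
print(updated["backend"])  # theano
```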

Suggested steps for LSTM coding
1. Normalize the data (Transformation)
2. Data preparation to a 3D dataset
3. Model specification
4. Model training (tackle over-fitting)
5. Prediction
6. Inverse transformation

Suggested steps for LSTM coding (1)

• Normalize the data (Transformation)

• Transformation of input and target variables
– tends to make the training process better behaved by improving the numerical conditioning of the optimization problem
– ensures that the various default values involved in initialization and termination are appropriate
from sklearn.preprocessing import MinMaxScaler

# normalize the dataset
# input_data: 2D array of shape (samples, features), e.g. one column for a univariate series
scaler = MinMaxScaler(feature_range=(-1, 1))
normalized_data = scaler.fit_transform(input_data)
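What MinMaxScaler(feature_range=(-1, 1)) computes can be written out in a few lines of NumPy: each column is rescaled with x_std = (x - min) / (max - min), then mapped to the target range. The function names below are hypothetical and the sketch assumes max > min in every column (the scikit-learn class also handles constant columns). Note that the minimum and maximum are learned from the training data only, so later data outside that range maps outside (-1, 1).

```python
import numpy as np

def minmax_fit(train, feature_range=(-1.0, 1.0)):
    """Learn per-column min and max from the training data only."""
    return train.min(axis=0), train.max(axis=0), feature_range

def minmax_transform(x, params):
    """Scale x to feature_range using the statistics learned by minmax_fit."""
    data_min, data_max, (lo, hi) = params
    x_std = (x - data_min) / (data_max - data_min)  # assumes max > min in every column
    return x_std * (hi - lo) + lo

input_data = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0]])
params = minmax_fit(input_data)
normalized = minmax_transform(input_data, params)
print(normalized.ravel())  # evenly spaced from -1.0 to 1.0 in steps of 0.25

# A new value outside the training range is NOT clipped
print(minmax_transform(np.array([[10.0]]), params))  # 1.25
```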

Suggested steps for LSTM coding (2)

• Data preparation to a 3D dataset

Time series data → 3D dataset:
a) Pad the original data series with duplicated/repeated values
b) Separate the input feature data and target value data
c) Reshape the padded input data to a 3-dimensional dataset: [samples, time steps, features]

Suggested steps for LSTM coding (2)

• Data preparation to a 3D dataset

import numpy

# Procedure a & b: Padding and Separate the data
# Utility function: convert an array of values into input features and target values
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)

look_back = 3
trainX, trainY = create_dataset(normalized_data, look_back)

# Procedure c: Reshape into 3D dataset
# [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], look_back, 1))
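To see what procedures a to c produce, the same utility can be run on a toy series 1, 2, ..., 6 (an illustrative input, not data from the talk). Each value appears in up to look_back windows, which is the duplication referred to in step a).

```python
import numpy as np

def create_dataset(dataset, look_back=1):
    """Slide a window of length look_back over column 0 of a 2D array."""
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back):
        dataX.append(dataset[i:(i + look_back), 0])  # inputs: values i .. i+look_back-1
        dataY.append(dataset[i + look_back, 0])      # target: the next value
    return np.array(dataX), np.array(dataY)

series = np.arange(1.0, 7.0).reshape(-1, 1)  # [[1.], [2.], ..., [6.]]
look_back = 3
X, y = create_dataset(series, look_back)
# X rows: [1 2 3], [2 3 4], [3 4 5]; targets y: 4, 5, 6
X_3d = np.reshape(X, (X.shape[0], look_back, 1))  # [samples, time steps, features]
print(X.shape, y.shape, X_3d.shape)  # (3, 3) (3,) (3, 3, 1)
```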

Suggested steps for LSTM coding (3)

• Model specification
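from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# create and compile the LSTM network
# input: [samples, time steps = look_back, features = 1]
model = Sequential()
model.add(LSTM(4, input_shape=(look_back, 1)))
model.add(Dense(1))  # one output: the next value of the series
model.compile(loss='mean_squared_error', optimizer='adam')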
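A quick way to check a model specification is to count its trainable parameters. The sketch below assumes a standard LSTM layer with biases (four gate blocks, each with an input kernel, a recurrent kernel and a bias); the totals should agree with what model.summary() reports for such a layer, though the helper names are hypothetical. The count does not depend on the number of time steps, only on the number of features per step.

```python
def lstm_param_count(units, input_dim):
    """Trainable weights in a standard LSTM layer with biases.

    Each of the 4 gate blocks has an input kernel (input_dim x units),
    a recurrent kernel (units x units) and a bias vector (units).
    """
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(units, input_dim):
    """Kernel plus bias of a fully connected layer."""
    return units * input_dim + units

# LSTM(4) reading 1 feature per time step, followed by Dense(1)
total = lstm_param_count(4, 1) + dense_param_count(1, 4)
print(lstm_param_count(4, 1), dense_param_count(1, 4), total)  # 96 5 101
```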

Suggested steps for LSTM coding (4 & 5)

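• Model training (tackle over-fitting)

# Model training (without validation dataset)
model.fit(trainX, trainY, epochs=100, batch_size=100)

# Model training (with validation dataset): monitor the validation loss and
# stop once it no longer improves, to limit over-fitting.
# x_val, y_val: held-out samples prepared exactly like trainX, trainY
from keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss', patience=10)
model.fit(trainX, trainY, epochs=100, batch_size=100,
          validation_data=(x_val, y_val), callbacks=[early_stop])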
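The slide does not show where x_val and y_val come from. One common choice for time series, assumed here, is to hold out the most recent samples rather than a random subset, so the model is validated on data that comes after its training data. The function name chronological_split is hypothetical; Keras's validation_split argument to fit likewise takes the last fraction of the arrays, before any shuffling.

```python
import numpy as np

def chronological_split(X, y, val_fraction=0.2):
    """Hold out the most recent samples for validation (no shuffling)."""
    n_val = int(round(len(X) * val_fraction))
    split = len(X) - n_val
    return X[:split], y[:split], X[split:], y[split:]

X = np.arange(10 * 3, dtype=float).reshape(10, 3, 1)  # 10 samples, 3 time steps, 1 feature
y = np.arange(10, dtype=float)
x_tr, y_tr, x_val, y_val = chronological_split(X, y, val_fraction=0.2)
print(x_tr.shape, x_val.shape)  # (8, 3, 1) (2, 3, 1)
```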

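• Prediction
# Model prediction: testX must be normalized with the same scaler
# (scaler.transform, not fit_transform) and reshaped to 3D like trainX
testPredict = model.predict(testX)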

Suggested steps for LSTM coding (6)

• Inverse transformation

# inverse transformation
testPredict = scaler.inverse_transform(testPredict)
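The inverse transformation undoes the min-max scaling: x = (x_scaled - lo) / (hi - lo) * (max - min) + min, using the minimum and maximum learned from the training data. Below is a NumPy sketch with hypothetical names, assuming a single feature and feature_range=(-1, 1). The prediction array has shape (samples, 1), which is the 2D, one-column layout that scaler.inverse_transform expects for a scaler fitted on one feature.

```python
import numpy as np

def minmax_inverse(x_scaled, data_min, data_max, feature_range=(-1.0, 1.0)):
    """Map values from feature_range back to the original units."""
    lo, hi = feature_range
    x_std = (x_scaled - lo) / (hi - lo)
    return x_std * (data_max - data_min) + data_min

train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
data_min, data_max = train.min(axis=0), train.max(axis=0)
scaled = (train - data_min) / (data_max - data_min) * 2.0 - 1.0  # forward transform to (-1, 1)

# A hypothetical model output of shape (samples, 1), as returned by predict
pred_scaled = np.array([[0.5], [1.0]])
print(minmax_inverse(pred_scaled, data_min, data_max))  # values 4.0 and 5.0
```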


• How to frame the data appropriately for sequence learning
– Time steps vs features

• Normalization gives better performance
– Fewer epochs are needed for training
– E.g. >300 epochs without normalization vs 100 epochs with normalization
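The time steps vs features distinction comes down to which axis of the 3D array the window is placed on. A reshape does not change the numbers or their order, only how the LSTM interprets them. The example window values are illustrative.

```python
import numpy as np

window = np.array([[0.1, 0.2, 0.3, 0.4]])  # one sample: 4 consecutive normalized values

# Framing 1: 1 time step carrying 4 features.
# The LSTM sees the whole window at once, so its recurrence over time is never used.
v1 = np.reshape(window, (1, 1, 4))

# Framing 2: 4 time steps carrying 1 feature each.
# The LSTM reads the values in order, updating its state after each one.
v2 = np.reshape(window, (1, 4, 1))

print(v1.shape, v2.shape)  # (1, 1, 4) (1, 4, 1)
print(v2[0, :, 0])         # the window values, in time order
```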

import numpy
input_data = numpy.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0]])

#%% Step 1: normalize the dataset
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(-1, 1))
normalized_data = scaler.fit_transform(input_data)
plt.plot(normalized_data)

#%% Step 2: Data preparation to a 3D dataset
# Utility function
# convert an array of values into input features and target values
def data_preparation(input_data, model_input_length=1):
    dataX, target = [], []
    for i in range(len(input_data)-model_input_length):
        dataX.append(input_data[i:i+model_input_length, 0])
        target.append(input_data[i + model_input_length, 0])
    return numpy.array(dataX), numpy.array(target)

# Procedure a & b: Padding and Separate the data
model_input_length = 4  # the length of data used for modeling
trainX, trainY = data_preparation(normalized_data, model_input_length)

# Procedure c: Reshape into 3D dataset
# [samples, time steps, features]
trainX_3D_v1 = numpy.reshape(trainX, (trainX.shape[0], 1, model_input_length))  # misuse: 1 time step, 4 features
trainX_3D_v2 = numpy.reshape(trainX, (trainX.shape[0], model_input_length, 1))  # correct: 4 time steps, 1 feature

#%% Step 3: Model specification
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# create and compile the LSTM networks
# Version 1: misuse
model_v1 = Sequential()
model_v1.add(LSTM(4, input_shape=(1, model_input_length)))
model_v1.add(Dense(1))
model_v1.compile(loss='mean_squared_error', optimizer='sgd')

# Version 2: correct usage
model_v2 = Sequential()
model_v2.add(LSTM(4, input_shape=(model_input_length, 1)))
model_v2.add(Dense(1))
model_v2.compile(loss='mean_squared_error', optimizer='sgd')

#%% Step 4: Model training
# Model training (without validation dataset)
model_v1.fit(trainX_3D_v1, trainY, epochs=200, batch_size=100, verbose=2)
model_v2.fit(trainX_3D_v2, trainY, epochs=200, batch_size=100, verbose=2)

#%% Step 5: Model prediction
testX = numpy.array([[3.0], [4.0], [5.0], [6.0]])
# pay special attention here: reuse the scaler fitted on the training data
normalized_testX = scaler.transform(testX)  # do not use fit_transform
testX_3D_v1 = numpy.reshape(normalized_testX, (1, 1, model_input_length))
testPredict_v1 = model_v1.predict(testX_3D_v1)
testX_3D_v2 = numpy.reshape(normalized_testX, (1, model_input_length, 1))
testPredict_v2 = model_v2.predict(testX_3D_v2)

#%% Step 6: inverse transformation
testPredict_final_v1 = scaler.inverse_transform(testPredict_v1)
testPredict_final_v2 = scaler.inverse_transform(testPredict_v2)
print('*** Final result: Version 1 ***')
print(testPredict_final_v1)
print('*** Final result: Version 2 ***')
print(testPredict_final_v2)

Example code of time series modeling using Keras (1)

# LSTM for international airline passengers problem with window regression framing
import numpy
import matplotlib.pyplot as plt
import pandas
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)

# fix random seed for reproducibility
numpy.random.seed(7)
# load the dataset
dataframe = pandas.read_csv('international-airline-passengers.csv', usecols=[1],
                            engine='python', skipfooter=3)
dataset = dataframe.values
dataset = dataset.astype('float32')
# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
# reshape into X=t and Y=t+1
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
# reshape input to be [samples, time steps, features]
# (window framing: 1 time step carrying look_back features)
trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
# create and fit the LSTM network
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=2)

# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# invert predictions (inverse_transform expects a 2D array with one column)
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform(trainY.reshape(-1, 1))
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform(testY.reshape(-1, 1))
# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY[:, 0], trainPredict[:, 0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[:, 0], testPredict[:, 0]))
print('Test Score: %.2f RMSE' % (testScore))
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict
# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(dataset)-1, :] = testPredict
# plot baseline and predictions
plt.plot(scaler.inverse_transform(dataset))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()

Example code of time series modeling using Keras (2)

import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils

# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))

#%% Preparing training data
# prepare the dataset of input to output pairs encoded as integers
seq_length = 3
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)

#%% We need to reshape the NumPy array into the format expected by the LSTM network,
# that is [samples, time steps, features].
X1 = numpy.reshape(dataX, (len(dataX), seq_length, 1))
# Once reshaped, we normalize the input integers to the range 0-to-1
X1 = X1 / float(len(alphabet))
# Finally, we can think of this problem as a sequence classification task, where each of
# the 26 letters represents a different class, so we one-hot encode the output variable (y)
y1 = np_utils.to_categorical(dataY)

# %% create and fit the model
model1 = Sequential()
model1.add(LSTM(32, input_shape=(X1.shape[1], X1.shape[2])))
model1.add(Dense(y1.shape[1], activation='softmax'))
model1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
#numpy.random.seed(176)
model1.fit(X1, y1, epochs=500, batch_size=1, verbose=2)

# After we fit the model we can evaluate and summarize its performance
scores = model1.evaluate(X1, y1, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))

# %% We can then re-run the training data through the network and generate predictions,
# converting both the input and output pairs back into their original character format
# to get a visual idea of how well the network learned the problem.
# demonstrate some model predictions
for pattern in dataX:
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(len(alphabet))
    prediction = model1.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)
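The data preparation for the alphabet example can be reproduced in plain NumPy to check the shapes before training. The one-hot step below mirrors what to_categorical does: it infers the number of classes as the largest label plus one, which here is 26 because 'Z' (label 25) occurs as a target. The manual encoding is our own sketch, not the Keras implementation.

```python
import numpy as np

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
char_to_int = {c: i for i, c in enumerate(alphabet)}
seq_length = 3

dataX = [[char_to_int[c] for c in alphabet[i:i + seq_length]]
         for i in range(len(alphabet) - seq_length)]
dataY = [char_to_int[alphabet[i + seq_length]] for i in range(len(alphabet) - seq_length)]

# [samples, time steps, features], scaled to [0, 1) by dividing by the alphabet size
X1 = np.reshape(dataX, (len(dataX), seq_length, 1)) / float(len(alphabet))

# One-hot encoding: row k has a single 1 in column dataY[k]
n_classes = max(dataY) + 1  # same rule as to_categorical when num_classes is not given
y1 = np.zeros((len(dataY), n_classes))
y1[np.arange(len(dataY)), dataY] = 1.0

print(X1.shape, y1.shape)  # (23, 3, 1) (23, 26)
```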

Example code of time series modeling using Keras (3)

Suggested settings for LSTM hyperparameters
• For LSTMs, use the softsign activation function instead of tanh: it is faster and
less prone to saturation (vanishing, near-zero gradients).
• https://deeplearning4j.org/lstm.html
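The saturation argument can be checked by comparing the derivatives of the two functions: tanh'(x) = 1 - tanh(x)^2 decays exponentially in |x|, while softsign(x) = x / (1 + |x|) has derivative 1 / (1 + |x|)^2, which decays only polynomially. Near zero the softsign gradient is actually the smaller of the two; the advantage is in the tails. This sketch says nothing about the speed claim, which is taken from the Deeplearning4j guide. In Keras the choice corresponds to passing activation='softsign' to the LSTM layer.

```python
import numpy as np

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

def softsign(x):
    return x / (1.0 + np.abs(x))

def softsign_grad(x):
    return 1.0 / (1.0 + np.abs(x)) ** 2

x = np.array([0.0, 1.0, 2.0, 5.0])
for xi, gt, gs in zip(x, tanh_grad(x), softsign_grad(x)):
    print(f"x={xi:3.0f}  tanh'={gt:.2e}  softsign'={gs:.2e}")
```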

Bollerslev, Patton and Quaedvlieg (2016)

• Improved version of time series modeling of realized volatility

• Improved Heterogeneous Autoregressive (HAR) model

Typical HAR: daily, weekly and monthly components
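The typical HAR regression forecasts next-day realized volatility from its daily value and its weekly and monthly averages: RV_{t+1} = b0 + b_d RV_t + b_w RV_t^(w) + b_m RV_t^(m) + e_{t+1}. Below is a minimal NumPy sketch of this basic HAR fitted by ordinary least squares, assuming 5 and 22 trading days for the weekly and monthly windows. It is not the improved model of Bollerslev, Patton and Quaedvlieg, which additionally lets the daily coefficient vary with realized quarticity. The series is synthetic and the function names are hypothetical.

```python
import numpy as np

def har_features(rv, week=5, month=22):
    """Build HAR regressors from a 1D array of daily realized volatility.

    Row t holds [1, RV_t, mean(RV_{t-4..t}), mean(RV_{t-21..t})];
    the target is RV_{t+1}.
    """
    rv = np.asarray(rv, dtype=float)
    rows, targets = [], []
    for t in range(month - 1, len(rv) - 1):
        daily = rv[t]
        weekly = rv[t - week + 1:t + 1].mean()
        monthly = rv[t - month + 1:t + 1].mean()
        rows.append([1.0, daily, weekly, monthly])  # leading 1.0 is the intercept
        targets.append(rv[t + 1])
    return np.array(rows), np.array(targets)

def fit_har(rv):
    """Ordinary least squares estimate of [b0, b_d, b_w, b_m]."""
    X, y = har_features(rv)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
rv_synthetic = np.abs(rng.normal(1.0, 0.2, size=300))  # synthetic data, for illustration only
beta = fit_har(rv_synthetic)
print(beta)
```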

Work in progress
Long Short Term Memory network (LSTM)
• Adaptive forget gate that can throw away information
• Keeps information across time gaps of unknown or varying size

• Is it possible to extract features from different time horizons?
– Daily, weekly, monthly, intraday
• Model structure?
– Number of layers? Activation functions?
• How to prevent over-fitting?
– Types of loss function
• What types of information can be used?
– Numerical data, news, comments from social networks
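One possible way to present several time horizons to an LSTM is to stack them as features of each time step. The sketch below is exploratory and is not a result from this work. It computes daily, weekly-average and monthly-average series (with an assumed 5 and 22 trading days, as in HAR) and cuts windows of shape [samples, time steps, features], with next-day volatility as the target. The function name and window length are hypothetical, and intraday features are not covered.

```python
import numpy as np

def multi_horizon_windows(rv, time_steps=10, week=5, month=22):
    """Stack daily, weekly-average and monthly-average series as 3 features,
    then cut windows of shape [samples, time steps, features]."""
    rv = np.asarray(rv, dtype=float)
    feats = []
    for t in range(month - 1, len(rv)):
        feats.append([rv[t],
                      rv[t - week + 1:t + 1].mean(),
                      rv[t - month + 1:t + 1].mean()])
    feats = np.array(feats)  # shape (n_days, 3)
    X = np.array([feats[i:i + time_steps] for i in range(len(feats) - time_steps)])
    y = feats[time_steps:, 0]  # daily value on the day after each window
    return X, y

rng = np.random.default_rng(1)
rv_synthetic = np.abs(rng.normal(1.0, 0.2, size=120))  # synthetic data, for illustration only
X, y = multi_horizon_windows(rv_synthetic)
print(X.shape, y.shape)  # (89, 10, 3) (89,)
```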

Make use of different types of information?
• Stock market: price, volatility, volume
• Economic factors: CPI, GDP, retail sales
• Company factors: P/E ratio, ROE, debt ratio
• Personal blogs: multimedia commentary
• News media: Bloomberg, releases from the stock exchange

→ Volatility model, built with either:
– Machine learning techniques (more flexible)
– Time series approach (more rigid)

If everything goes right …

Why Deep Learning? (Slide by Andrew Ng, Stanford University)
Thank you for your kind attention.
I hope you found this presentation interesting.

• A Beginner's Guide to Recurrent Networks and LSTMs - Deeplearning4j: Open-source, Distributed Deep Learning for the JVM
• https://deeplearning4j.org/lstm.html
• Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras (by Jason Brownlee)
• http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
• Understanding Stateful LSTM Recurrent Neural Networks in Python with Keras (by Jason Brownlee)
• http://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/
• Keras recurrent tutorial
• https://github.com/Vict0rSch/deep_learning/tree/master/keras/recurrent
• Deep learning - Wikipedia
• https://en.wikipedia.org/wiki/Deep_learning
• What Is The Difference Between Deep Learning, Machine Learning and AI?
• https://www.forbes.com/sites/bernardmarr/2016/12/08/what-is-the-difference-between-deep-learning-machine-learning-and-
• What is Deep Learning? (by Jason Brownlee)
• http://machinelearningmastery.com/what-is-deep-learning/
• Do you recommend using Theano or Tensor Flow as Keras' backend? - Quora
• https://www.quora.com/Do-you-recommend-using-Theano-or-Tensor-Flow-as-Keras-backend
• Backend - Keras Documentation
• https://keras.io/backend/
• Ill-Conditioning in Neural Networks
• ftp://ftp.sas.com/pub/neural/illcond/illcond.html
• Understanding LSTM Networks
• http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• Getting started with the Keras Sequential model
• https://keras.io/getting-started/sequential-model-guide/
• Bollerslev, T., Patton, A. J., & Quaedvlieg, R. (2016). Exploiting the errors: A simple approach for improved volatility forecasting. Journal of Econometrics, 192(1), 1-18.

