On Deep Machine Learning & Time Series Models: A Case Study With The Use of Keras

Presentation · June 2017

DOI: 10.13140/RG.2.2.34543.94888
Carlin Chun-fai Chu

The Open University of Hong Kong
All content following this page was uploaded by Carlin Chun-fai Chu on 16 May 2018.

On deep machine learning & time series models:
A case study with the use of Keras
1st International Conference on Econometrics and Statistics

Session EO256: Business analytics
Carlin Chu - The Open University of Hong Kong (Hong Kong)

ccfchu@ouhk.edu.hk
Content
• Why deep learning?
• What is LSTM?
• What is Keras? Characteristics of Keras
• Suggested steps for LSTM coding
• Example code
• Bollerslev's time series model
• Work in progress
Why deep learning?
• Deep learning
– Artificial neural networks with more than one hidden layer
– Requires a large amount of training data
– Learns representations at different levels of abstraction
Figure: Why Deep Learning? (Slide by Yann LeCun)
The picture is extracted from: http://machinelearningmastery.com/what-is-deep-learning/
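To make "more than one hidden layer" concrete, here is a minimal NumPy sketch (not Keras) of a forward pass through a network with two hidden layers. The layer sizes, the tanh activation and the random weights are illustrative assumptions. Each hidden layer re-represents the output of the previous one, which is one informal way to read "different levels of abstraction".

```python
import numpy as np

def init_layer(rng, n_in, n_out):
    """Random weights and zero biases for one fully connected layer."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

def forward(x, layers):
    """Apply tanh after every hidden layer; keep the output layer linear."""
    h = x
    for W, b in layers[:-1]:
        h = np.tanh(h @ W + b)   # hidden layer: affine map + nonlinearity
    W, b = layers[-1]
    return h @ W + b             # linear output, as for regression

rng = np.random.default_rng(0)
sizes = [3, 8, 8, 1]             # 3 inputs, two hidden layers of 8 units, 1 output
layers = [init_layer(rng, m, n) for m, n in zip(sizes[:-1], sizes[1:])]
x = rng.normal(size=(5, 3))      # a batch of 5 samples
y = forward(x, layers)
print(y.shape)                   # (5, 1)
```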
Applications of deep learning
• Ancient Chinese board game: Go
• The number of possible board configurations exceeds the number of atoms in the observable universe
– Exhaustive search is not possible.
• Deep neural networks + advanced tree search + reinforcement learning
Akita et al. (2016)
News (textual) + stock price (numerical)
• Information from the news article stream + daily open price → close price
• Prediction for 10 companies with input dimension = 1000
• Recurrent NN: Long Short-Term Memory (LSTM)
• Deep Learning for Stock Prediction Using Numerical and Textual Information (Akita et al. 2016)
What is LSTM?
Traditional Artificial Neural Network (ANN)
• No notion of time ordering
• Maps the current input feature(s) to the predicted target variable(s)
Recurrent Neural Network (RNN)
• Contains 'loops' that allow information to persist
• Can be viewed as multiple copies of the same network, each passing a message to its successor
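The "loop" can be sketched in a few lines of NumPy. The same weights are applied at every time step, and the hidden state h carries information forward according to h_t = tanh(x_t W_x + h_{t-1} W_h + b). This is an illustrative vanilla (Elman) RNN, not Keras's internal code; the weight shapes and values are assumptions chosen for the demo.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b):
    """Run a vanilla RNN over one sequence.

    xs: array of shape (time_steps, n_features)
    Returns the hidden states, shape (time_steps, n_hidden).
    """
    h = np.zeros(W_h.shape[0])          # initial state: nothing remembered yet
    states = []
    for x_t in xs:                      # the 'loop': same weights at every step
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.array(states)

rng = np.random.default_rng(1)
n_features, n_hidden, time_steps = 1, 4, 6
W_x = rng.normal(scale=0.5, size=(n_features, n_hidden))
W_h = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)
xs = np.linspace(-1, 1, time_steps).reshape(time_steps, n_features)
states = rnn_forward(xs, W_x, W_h, b)
print(states.shape)                     # (6, 4)
```

Because h is fed back, a change to the first input still affects the last hidden state.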
Long Short-Term Memory network (LSTM)
• A special kind of RNN with
– an adaptive forget gate that decides which information to throw away
– the ability to keep information across time gaps of unknown or varying size
A short review can be found in ‘A Beginner’s Guide to Recurrent Networks and LSTMs’
• https://deeplearning4j.org/lstm.html
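The gating described above can be written out as one step of a standard LSTM cell. This NumPy sketch follows the usual textbook equations (see, e.g., "Understanding LSTM Networks" in the reference list). The stacked-weight layout and the gate order are our own choices, not necessarily what Keras uses internally. The demo forces the gates open or closed to show the forget gate keeping or discarding the memory c.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W: (n_features, 4*n_hidden), U: (n_hidden, 4*n_hidden), b: (4*n_hidden,)
    Gate order in the stacked weights (an arbitrary choice here):
    input, forget, cell candidate, output.
    """
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates take values in (0, 1)
    g = np.tanh(g)                                # candidate memory content
    c_t = f * c_prev + i * g   # forget gate scales old memory, input gate admits new
    h_t = o * np.tanh(c_t)     # output gate decides what is exposed
    return h_t, c_t

n_features, n_hidden = 1, 3
W = np.zeros((n_features, 4 * n_hidden))
U = np.zeros((n_hidden, 4 * n_hidden))

def bias(i, f, g, o):
    """Hypothetical helper: build a bias vector that saturates chosen gates."""
    return np.concatenate([np.full(n_hidden, v) for v in (i, f, g, o)])

c_prev, h_prev, x = np.array([0.5, -1.0, 2.0]), np.zeros(n_hidden), np.array([0.3])
_, c_keep = lstm_step(x, h_prev, c_prev, W, U, bias(-50, 50, 0, 0))    # forget gate open
_, c_forget = lstm_step(x, h_prev, c_prev, W, U, bias(-50, -50, 0, 0)) # forget gate closed
print(c_keep)    # memory kept: equal to c_prev
print(c_forget)  # memory thrown away: approximately zero
```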
Characteristics of Keras
• High-level neural networks API, written in Python
• Runs on top of TensorFlow, Theano or CNTK
• Can use both CPU and GPU
• Part of AlphaGo is written in TensorFlow (for distributed computing)
Backend: Theano or TensorFlow?
• Which one is better?
– Distributed setting / newer software → TensorFlow
– Recurrent networks / legacy applications → Theano
• How to switch the backend?
– Locate the keras.json configuration file (by default $HOME/.keras/keras.json)
– Change the "backend" field
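The two steps above can be scripted. The helper below is hypothetical (not part of Keras). It edits only the "backend" field of a keras.json file and preserves every other field. According to the Keras backend documentation, setting the KERAS_BACKEND environment variable before importing Keras overrides the file, which is handy for one-off runs.

```python
import json
import os

def set_keras_backend(config_path, backend):
    """Rewrite the 'backend' field of a keras.json file (hypothetical helper).

    All other fields are preserved. Returns the updated configuration dict.
    """
    with open(config_path) as f:
        config = json.load(f)
    config["backend"] = backend
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config

# Default location used by Keras; adjust if your installation differs.
default_path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
# Example (modifies your real config, so run deliberately):
# set_keras_backend(default_path, "theano")
```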
Suggested steps for LSTM coding
1. Normalize the data (transformation)
2. Prepare the data as a 3D dataset
3. Model specification
4. Model training (tackling over-fitting)
5. Prediction
6. Inverse transformation
Suggested steps for LSTM coding (1)
• Normalize the data (transformation)
• Transforming the input and target variables
– tends to make training better behaved by improving the numerical conditioning of the optimization problem
– ensures that the default values used for initialization and termination are appropriate
– ftp://ftp.sas.com/pub/neural/illcond/illcond.html
from sklearn.preprocessing import MinMaxScaler
# normalize the dataset to the range [-1, 1]
# input_data: 2D array of shape (n_samples, n_features)
scaler = MinMaxScaler(feature_range=(-1, 1))
normalized_data = scaler.fit_transform(input_data)
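For reference, min-max scaling to a range [a, b] maps x to a + (x − min)(b − a)/(max − min), column by column. The NumPy sketch below writes out the forward and inverse maps that MinMaxScaler's transform and inverse_transform perform. The function names are our own, and unlike scikit-learn the sketch has no guard for constant columns (max = min).

```python
import numpy as np

def minmax_fit(data):
    """Column-wise minimum and maximum, as learned by fit()."""
    return data.min(axis=0), data.max(axis=0)

def minmax_transform(data, data_min, data_max, feature_range=(-1, 1)):
    # assumes data_max > data_min in every column
    a, b = feature_range
    return a + (data - data_min) * (b - a) / (data_max - data_min)

def minmax_inverse(scaled, data_min, data_max, feature_range=(-1, 1)):
    a, b = feature_range
    return data_min + (scaled - a) * (data_max - data_min) / (b - a)

input_data = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
lo, hi = minmax_fit(input_data)
scaled = minmax_transform(input_data, lo, hi)    # values: -1, -0.5, 0, 0.5, 1
restored = minmax_inverse(scaled, lo, hi)        # back to 1, 2, 3, 4, 5
```

Test data must be transformed with the minimum and maximum learned on the training data. A new extreme such as 6.0 therefore maps outside [-1, 1] (to 1.5), which is expected behaviour rather than a bug.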
• Data preparation to a 3D dataset
Time series data → 3D dataset:
a) Pad the original data series with duplicated/repeated values (overlapping windows)
b) Separate the input feature data and the target value data
c) Reshape the padded input data to a 3-dimensional dataset
[samples, time steps, features]
• Data preparation to a 3D dataset
import numpy
# Utility function
# convert an array of values into input features and target values
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back):
        a = dataset[i:(i+look_back), 0]          # look_back consecutive values as input
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])  # the following value as target
    return numpy.array(dataX), numpy.array(dataY)
# Procedure a & b: Padding and Separate the data
look_back = 3
trainX, trainY = create_dataset(normalized_data, look_back)
# Procedure c: Reshape into 3D dataset
# [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], look_back, 1))
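As a worked example of steps a) to c), here is a vectorized equivalent of the create_dataset loop on the series 1, 2, …, 7 with look_back = 3. It assumes NumPy ≥ 1.20 for sliding_window_view. Each window of look_back + 1 consecutive values is split into look_back inputs and one target.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

series = np.arange(1.0, 8.0).reshape(-1, 1)   # 7 observations, 1 feature
look_back = 3

# all windows of look_back+1 consecutive values: first look_back = input, last = target
windows = sliding_window_view(series[:, 0], look_back + 1)
X = windows[:, :look_back].copy()             # copy: sliding_window_view returns a read-only view
y = windows[:, look_back].copy()
X_3d = X.reshape(X.shape[0], look_back, 1)    # [samples, time steps, features]

print(X_3d.shape)   # (4, 3, 1)
```

The windows produced are [1, 2, 3] → 4, [2, 3, 4] → 5, [3, 4, 5] → 6 and [4, 5, 6] → 7. The sample count is len(series) − look_back, as in the loop version.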
• Model specification
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
# create the LSTM network: 4 LSTM units followed by one output neuron
# input_shape = (time steps, features), matching the [samples, look_back, 1] reshape above
model = Sequential()
model.add(LSTM(4, input_shape=(look_back, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
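A quick sanity check on the specification is to count its trainable parameters, which is what model.summary() reports. Each of the LSTM's four gate blocks has input weights, recurrent weights and a bias. The number of time steps does not appear in the count, because the same weights are reused at every step. The sketch below assumes the default use_bias=True.

```python
def lstm_param_count(units, n_features):
    # 4 gate blocks (input, forget, cell candidate, output), each with
    # input weights, recurrent weights and a bias vector
    return 4 * (n_features * units + units * units + units)

def dense_param_count(n_in, n_out):
    return n_in * n_out + n_out

total = lstm_param_count(4, 1) + dense_param_count(4, 1)
print(total)   # 101  (96 for the LSTM layer + 5 for the Dense layer)
```

If the same window is instead fed as 3 features of a single time step, the LSTM layer has lstm_param_count(4, 3) = 128 parameters. That is more weights but no recurrence across the window.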
Suggested steps for LSTM coding (4 & 5)
• Model training (tackling over-fitting)
# Model training (without validation dataset)
model.fit(trainX, trainY, epochs=100, batch_size=100)
# Model training (with validation dataset, to monitor over-fitting:
# a validation loss that rises while the training loss falls signals over-fitting)
model.fit(trainX, trainY, epochs=100, batch_size=100,
          validation_data=(x_val, y_val))
• Prediction
# Model prediction (testX must be normalized with the same scaler and reshaped to 3D)
testPredict = model.predict(testX)
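Monitoring the validation loss becomes a way to prevent over-fitting once training is stopped when that loss stops improving. Keras provides this as the callback keras.callbacks.EarlyStopping(monitor='val_loss', patience=...), passed through callbacks=[...] in fit. The function below is a simplified, hypothetical re-implementation of the patience rule. It shows the logic and is not the callback's actual code.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses.

    Training stops once the loss has failed to improve by more than min_delta
    for `patience` consecutive epochs.
    """
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0   # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch            # never triggered

val_losses = [1.0, 0.8, 0.6, 0.55, 0.56, 0.58, 0.6, 0.65]
stop, best = early_stopping_epoch(val_losses, patience=3)
print(stop, best)   # 6 3
```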
• Inverse transformation
# inverse transformation: map the predictions back to the original scale
testPredict = scaler.inverse_transform(testPredict)
Observations
• How to frame the data appropriately for sequence learning
– Time steps vs. features
• Normalization gives better performance
– Fewer epochs are needed for training
– E.g. more than 300 epochs without normalization vs. 100 with it
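The time-steps-versus-features question comes down to which axis the window lands on. The LSTM iterates over axis 1 of [samples, time steps, features]. Reshaping a window of 4 values to (1, 4) gives a single step that sees all 4 values at once, while (4, 1) gives 4 recurrent steps of one value each. The toy values below are arbitrary.

```python
import numpy as np

windows = np.array([[0.1, 0.2, 0.3, 0.4],
                    [0.2, 0.3, 0.4, 0.5]])   # 2 samples, window length 4

as_features = windows.reshape(2, 1, 4)    # 1 time step carrying 4 features
as_time_steps = windows.reshape(2, 4, 1)  # 4 time steps carrying 1 feature each

# number of recurrent steps the LSTM takes per sample under each framing
print(as_features.shape[1], as_time_steps.shape[1])   # 1 4
# at the first step the network sees the whole window vs. a single value
print(as_features[0, 0], as_time_steps[0, 0])
```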
import numpy
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
input_data = numpy.array([[1.0], [2.0], [3.0], [4], [5], [6], [7], [8], [9]])
#%% Step 1: normalize the dataset
scaler = MinMaxScaler(feature_range=(-1, 1))
normalized_data = scaler.fit_transform(input_data)
plt.plot(normalized_data)
#%% Step 2: Data preparation to a 3D dataset
# Utility function
# convert an array of values into input features and target values
def data_preparation(input_data, model_input_length=1):
    dataX, target = [], []
    for i in range(len(input_data)-model_input_length):
        dataX.append(input_data[i:i+model_input_length, 0])
        target.append(input_data[i + model_input_length, 0])
    return numpy.array(dataX), numpy.array(target)
# Procedure a & b: Padding and Separate the data
model_input_length = 4  # the length of data used for modeling
trainX, trainY = data_preparation(normalized_data, model_input_length)
# Procedure c: Reshape into 3D dataset
# [samples, time steps, features]
trainX_3D_v1 = numpy.reshape(trainX, (trainX.shape[0], 1, model_input_length))  # misuse: 1 time step, 4 features
trainX_3D_v2 = numpy.reshape(trainX, (trainX.shape[0], model_input_length, 1))  # correct: 4 time steps, 1 feature
#%% Step 3: Model specification
# Version 1: misuse
model_v1 = Sequential()
model_v1.add(LSTM(4, input_shape=(1, model_input_length)))
model_v1.add(Dense(1))
model_v1.compile(loss='mean_squared_error', optimizer='sgd')
# Version 2: correct usage
model_v2 = Sequential()
model_v2.add(LSTM(4, input_shape=(model_input_length, 1)))
model_v2.add(Dense(1))
model_v2.compile(loss='mean_squared_error', optimizer='sgd')
#%% Step 4: Model training
# Model training (without validation dataset)
model_v1.fit(trainX_3D_v1, trainY, epochs=200, batch_size=100, verbose=2)
model_v2.fit(trainX_3D_v2, trainY, epochs=200, batch_size=100, verbose=2)
#%% Step 5: Model prediction
testX = numpy.array([[3.0], [4.0], [5.0], [6.0]])
# pay special attention here: reuse the scaler fitted on the training data
normalized_testX = scaler.transform(testX)  # do not use fit_transform
testX_3D_v1 = numpy.reshape(normalized_testX, (1, 1, model_input_length))
testPredict_v1 = model_v1.predict(testX_3D_v1)
testX_3D_v2 = numpy.reshape(normalized_testX, (1, model_input_length, 1))
testPredict_v2 = model_v2.predict(testX_3D_v2)
#%% Step 6: inverse transformation
testPredict_final_v1 = scaler.inverse_transform(testPredict_v1)
testPredict_final_v2 = scaler.inverse_transform(testPredict_v2)
print('*** Final result: Version 1 ***')
print(testPredict_final_v1)
print('*** Final result: Version 2 ***')
print(testPredict_final_v2)
Example code of time series modeling using Keras (1)
# LSTM for international airline passengers problem with window regression framing
import numpy
import matplotlib.pyplot as plt
import pandas
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)
# fix random seed for reproducibility
numpy.random.seed(7)
# load the dataset
dataframe = pandas.read_csv('international-airline-passengers.csv', usecols=[1],
                            engine='python', skipfooter=3)
dataset = dataframe.values
dataset = dataset.astype('float32')
# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
# reshape into X=t and Y=t+1
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
# reshape input to be [samples, time steps, features]
# window framing: the look_back past values are fed as features of a single time step
trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
# create and fit the LSTM network
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=2)
# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# invert predictions
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])
# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:,0]))
print('Test Score: %.2f RMSE' % (testScore))
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict
# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(dataset)-1, :] = testPredict
# plot baseline and predictions
plt.plot(scaler.inverse_transform(dataset))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
Source: http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import to_categorical
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
#%% Preparing training data
# prepare the dataset of input to output pairs encoded as integers
seq_length = 3
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)
#%% We need to reshape the NumPy array into the format expected by the LSTM network,
# that is [samples, time steps, features].
X1 = numpy.reshape(dataX, (len(dataX), seq_length, 1))
# Once reshaped, normalize the input integers to the range 0-to-1,
# the range of the sigmoid activation functions used by the LSTM network.
X1 = X1 / float(len(alphabet))
# Finally, treat the problem as sequence classification, where each of the 26
# letters represents a different class, so one-hot encode the output variable (y).
y1 = to_categorical(dataY)
#%% create and fit the model
model1 = Sequential()
model1.add(LSTM(32, input_shape=(X1.shape[1], X1.shape[2])))
model1.add(Dense(y1.shape[1], activation='softmax'))
model1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
#numpy.random.seed(176)
model1.fit(X1, y1, epochs=500, batch_size=1, verbose=2)
# After fitting the model, evaluate and summarize its performance
scores = model1.evaluate(X1, y1, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
#%% Re-run the training data through the network and generate predictions,
# converting both the input and output pairs back into their original character
# format to get a visual idea of how well the network learned the problem.
for pattern in dataX:
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(len(alphabet))
    prediction = model1.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)
Source: http://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/
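The one-hot step can be checked without Keras. The helper below is hypothetical and is intended to mirror what to_categorical produces: by default it creates max(label) + 1 columns. Since the alphabet targets run from 3 ('D') to 25 ('Z'), y1 ends up with 26 columns, which is why the softmax layer is Dense(y1.shape[1]), i.e. 26 units.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Integer class labels -> one row per label with a single 1.0 (hypothetical helper)."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        num_classes = labels.max() + 1          # same default as to_categorical
    encoded = np.zeros((labels.size, num_classes))
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

dataY = list(range(3, 26))   # targets D..Z of the 3-letter windows
y = one_hot(dataY)
print(y.shape)               # (23, 26)
```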
Source: https://github.com/Vict0rSch/deep_learning/tree/master/keras/recurrent
Suggested settings for LSTM hyperparameter tuning
• For LSTMs, use the softsign activation function instead of tanh: it is faster and less prone to saturation (near-zero, vanishing gradients).
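The saturation argument can be checked numerically. The derivative of tanh decays exponentially in |x|, whereas softsign(x) = x / (1 + |x|) has derivative 1 / (1 + |x|)², which decays only polynomially. At x = 5, for example, the tanh gradient is about 0.0002 and the softsign gradient about 0.028. In Keras the choice is made through the activation argument of the layer, e.g. LSTM(4, activation='softsign'). The sketch below only compares the two gradients.

```python
import numpy as np

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

def softsign(x):
    return x / (1.0 + np.abs(x))

def softsign_grad(x):
    return 1.0 / (1.0 + np.abs(x)) ** 2

for x in [0.0, 2.0, 5.0]:
    # both gradients equal 1 at x = 0; tanh's shrinks much faster as |x| grows
    print(x, tanh_grad(x), softsign_grad(x))
```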
Bollerslev, Patton and Quaedvlieg (2016)
• Improved time series model of realized variance
• Improved Heterogeneous Autoregressive (HAR) regression
• Typical HAR components: daily, weekly, monthly
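The "typical HAR" regresses today's realized variance on yesterday's value and on its averages over the past week (5 trading days) and month (22 trading days): RV_t = β0 + β_d RV_{t−1} + β_w RV^(w)_{t−1} + β_m RV^(m)_{t−1} + u_t. This is the standard HAR specification (Corsi, 2009). As far as we understand, the improved HARQ model of Bollerslev, Patton and Quaedvlieg additionally lets the daily coefficient vary with realized quarticity, and this sketch does not implement that. The code builds the HAR regressors and fits them by ordinary least squares. It is checked on synthetic data only, not on real realized-variance data.

```python
import numpy as np

def har_design(rv):
    """HAR regressors: constant, lagged daily RV, 5-day and 22-day averages of lagged RV."""
    rv = np.asarray(rv, dtype=float)
    X, y = [], []
    for t in range(22, len(rv)):
        X.append([1.0, rv[t - 1], rv[t - 5:t].mean(), rv[t - 22:t].mean()])
        y.append(rv[t])
    return np.array(X), np.array(y)

def fit_har(rv):
    """Ordinary least squares estimate of (beta_0, beta_d, beta_w, beta_m)."""
    X, y = har_design(rv)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def simulate_har(beta, n, noise_sd=0.1, seed=0):
    """Synthetic series that follows the HAR recursion (hypothetical test data)."""
    rng = np.random.default_rng(seed)
    rv = list(1.0 + noise_sd * rng.standard_normal(22))   # start-up values
    for t in range(22, n):
        x = np.array([1.0, rv[t - 1], np.mean(rv[t - 5:t]), np.mean(rv[t - 22:t])])
        rv.append(float(x @ beta) + noise_sd * rng.standard_normal())
    return np.array(rv)

true_beta = np.array([0.1, 0.4, 0.3, 0.2])   # daily + weekly + monthly = 0.9 < 1 (stationary)
estimate = fit_har(simulate_har(true_beta, n=20000))
print(np.round(estimate, 2))                 # should be close to true_beta
```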
Work in progress
Long Short-Term Memory network (LSTM)
• Adaptive forget gate that throws away information
• Keeps information across time gaps of unknown or varying size
Investigation:
• Is it possible to extract features from different time horizons?
– Daily, weekly, monthly, intraday
• Model structure?
– Number of layers? Activation functions?
• How to prevent over-fitting?
– Types of loss function
• What types of information can be used?
– Numerical, news, comments from social networks
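One possible way to expose several time horizons to an LSTM, offered as an assumption rather than the author's design, is to compute trailing HAR-style averages as feature columns and then cut them into [samples, time steps, features] windows. Both helpers below are hypothetical. Each target is the value immediately after its window, so no future information leaks into the inputs. Intraday horizons would need intraday data and are not covered.

```python
import numpy as np

def multi_horizon_features(rv, horizons=(1, 5, 22)):
    """Trailing averages over several horizons as columns.

    Row j corresponds to time t = max(horizons) - 1 + j and holds the mean of
    rv over the last h observations up to and including t, for each h.
    """
    rv = np.asarray(rv, dtype=float)
    start = max(horizons) - 1
    cols = [np.array([rv[t - h + 1:t + 1].mean() for t in range(start, len(rv))])
            for h in horizons]
    return np.column_stack(cols)

def to_lstm_input(features, targets, time_steps):
    """Cut (n, n_features) into [samples, time steps, features]; target = next value."""
    X, y = [], []
    for i in range(len(features) - time_steps):
        X.append(features[i:i + time_steps])
        y.append(targets[i + time_steps])
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
rv = np.abs(rng.normal(1.0, 0.2, size=300))   # stand-in data, not real realized variance
features = multi_horizon_features(rv)          # shape (279, 3): daily, weekly, monthly
X, y = to_lstm_input(features, rv[21:], time_steps=10)
print(X.shape)                                 # (269, 10, 3)
```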
Make use of different types of information?
Sources of information feeding a volatility model:
• Stock market: price, volatility, volume
• Economic factors: CPI, GDP, retail sales
• Company factors: P/E ratio, ROE, debt ratio
• News media: Bloomberg, releases from the stock exchange
• Personal blogs: multimedia commentary
Machine learning techniques (more flexible) vs. time series approach (more rigid)
If everything goes right …
Figure: Why Deep Learning? (Slide by Andrew Ng, Stanford University)
The picture is extracted from: http://machinelearningmastery.com/what-is-deep-learning/
Thank you for your kind attention.
Hope you find this presentation interesting.
References
• A Beginner's Guide to Recurrent Networks and LSTMs. Deeplearning4j: Open-source, Distributed Deep Learning for the JVM
– https://deeplearning4j.org/lstm.html
• Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras (by Jason Brownlee)
– http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
• Understanding Stateful LSTM Recurrent Neural Networks in Python with Keras (by Jason Brownlee)
– http://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/
• Keras recurrent tutorial
– https://github.com/Vict0rSch/deep_learning/tree/master/keras/recurrent
• Deep learning, Wikipedia
– https://en.wikipedia.org/wiki/Deep_learning
• What Is The Difference Between Deep Learning, Machine Learning and AI? (Forbes)
– https://www.forbes.com/sites/bernardmarr/2016/12/08/what-is-the-difference-between-deep-learning-machine-learning-and-ai/#496301bc26cf
• What is Deep Learning? (by Jason Brownlee)
– http://machinelearningmastery.com/what-is-deep-learning/
• Do you recommend using Theano or Tensor Flow as Keras' backend? (Quora)
– https://www.quora.com/Do-you-recommend-using-Theano-or-Tensor-Flow-as-Keras-backend
• Backend, Keras Documentation
– https://keras.io/backend/
• Ill-Conditioning in Neural Networks
– ftp://ftp.sas.com/pub/neural/illcond/illcond.html
• Understanding LSTM Networks
– http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• Getting started with the Keras Sequential model
– https://keras.io/getting-started/sequential-model-guide/
• Tim Bollerslev, Andrew J. Patton, Rogier Quaedvlieg (2016). Exploiting the errors: A simple approach for improved volatility forecasting. Journal of Econometrics, Volume 192, Issue 1, Pages 1-18.