
On Deep Machine Learning & Time Series Models: A Case Study With The Use of Keras


Presentation · June 2017
DOI: 10.13140/RG.2.2.34543.94888


On deep machine learning & time series models:
A case study with the use of Keras

1st International Conference on Econometrics and Statistics


Session EO256: Business analytics

Carlin Chu - The Open University of Hong Kong (Hong Kong)


ccfchu@ouhk.edu.hk

1
Content
• Why deep learning ?
• What is LSTM ?
• What is Keras ? Characteristics of Keras
• Suggested steps for LSTM coding
• Example codes
• Bollerslev's time series model
• Work-in-progress

2
Why deep learning ?
• Deep learning
– Artificial Neural Networks with more than one hidden layer
– Requires a lot of data for training
– Learns different levels of abstraction

The picture is extracted from: http://machinelearningmastery.com/what-is-deep-learning/


Why Deep Learning? (Slide by Yann LeCun)
3
Applications of deep learning

• Ancient Chinese board game: Go

• Number of possible board positions > total number of atoms in the universe
– Exhaustive search is not possible.
• Deep Neural Network + Advanced tree search + Reinforcement learning

4
Akita et al. (2016)
News (Textual) + Stock Price (Numerical)

• Information from a news article stream + daily open price → close price

• Prediction for 10 companies using input dimension = 1000

• Recurrent NN: Long Short-Term Memory (LSTM)

• Deep Learning for Stock Prediction Using Numerical and Textual Information (Akita et al. 2016)

5
What is LSTM ?
Traditional Artificial Neural Network (ANN)
• no notion of time ordering
• map the current input feature(s) to the predicted target variable(s)

Recurrent Neural Network (RNN)


• with 'loops' which allow information to persist.
• multiple copies of the same network, each passing a message to a successor.

Long Short Term Memory network (LSTM)


• special kind of RNN with
• an adaptive forget gate that throws away stale information
• the ability to keep information across time gaps of unknown/varying length

A short review can be found in ‘A Beginner’s Guide to Recurrent Networks and LSTMs’
• https://deeplearning4j.org/lstm.html
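For reference, the "adaptive forget gate" above can be written down in the standard LSTM formulation (a sketch of the usual equations, consistent with the review linked above):

f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)            (forget gate)
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)            (input gate)
c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)   (cell state)

When f_t is close to 0 the old cell content is discarded; when it is close to 1 the information is carried across long, irregular time gaps.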

6
Characteristics of Keras
• high-level neural networks API, written in Python

• runs on top of TensorFlow, Theano, or CNTK

• runs on both CPU and GPU

• part of AlphaGo was implemented with TensorFlow (for distributed computing)

7
Backend: Theano or TensorFlow ?

• Which one is better ?
– Distributed setting / newer software → TensorFlow
– Recurrent network / legacy application → Theano

• How to switch the backend ?
– Locate the .json config file
– Change the 'backend' field
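As a concrete sketch (hedged: the exact file location and behaviour depend on the Keras version and platform; ~/.keras/keras.json and the KERAS_BACKEND environment variable are the documented mechanisms, see https://keras.io/backend/):

# Option 1: edit the Keras config file, typically ~/.keras/keras.json,
# and set its "backend" field to "theano", "tensorflow" or "cntk".
#
# Option 2: override the backend for a single run via an environment
# variable, set *before* keras is imported:
import os
os.environ['KERAS_BACKEND'] = 'theano'   # or 'tensorflow'
import keras                             # the import reports which backend is in use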

8
Suggested steps for LSTM coding
1. Normalize the data (Transformation)
2. Data preparation to a 3D dataset
3. Model specification
4. Model training (tackle over-fit issue)
5. Prediction
6. Inverse transformation

9
Suggested steps for LSTM coding (1)

• Normalize the data (Transformation)


• Transformation of input and target variables
– tends to make the training process better behaved by improving the numerical condition of the optimization problem
– ensures that the various default values used in initialization and termination are appropriate
– ftp://ftp.sas.com/pub/neural/illcond/illcond.html

from sklearn.preprocessing import MinMaxScaler

# normalize the dataset


scaler = MinMaxScaler(feature_range=(-1, 1))
normalized_data = scaler.fit_transform(input_data)

10
Suggested steps for LSTM coding (2)

• Data preparation to a 3D dataset

Time series data
a) Pad the original data series with duplicated/repeated values
b) Separate the input feature data and the target value data
c) Reshape the padded input data into a 3-dimensional dataset
   [samples, time steps, features]

→ 3D dataset

11
Suggested steps for LSTM coding (2)

• Data preparation to a 3D dataset

# Procedure a & b: Padding and Separate the data # Utility function


# convert an array of values into a input feature and target value
look_back = 3
def create_dataset(dataset, look_back=1):
trainX, trainY = create_dataset(normalized_data, look_back)
dataX, dataY = [], []
for i in range(len(dataset)-look_back):
# Procedure c: Reshape into 3D dataset a = dataset[i:(i+look_back), 0]
# [samples, time steps, features] dataX.append(a)
dataY.append(dataset[i + look_back, 0])
trainX = numpy.reshape(trainX, (trainX.shape[0], look_back, 1))
return numpy.array(dataX), numpy.array(dataY)

12
Suggested steps for LSTM coding (3)

• Model specification
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# create and fit the LSTM network


model = Sequential()
model.add(LSTM(4, input_dim=look_back))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
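Note that the slides use the Keras 1 style API (input_dim here, nb_epoch later). As a hedged sketch only, roughly the same specification under the newer Keras 2 API uses input_shape=(time steps, features), assuming the window of look_back values is framed as look_back time steps of one feature each:

from keras.models import Sequential
from keras.layers import Dense, LSTM

model = Sequential()
model.add(LSTM(4, input_shape=(look_back, 1)))   # (time steps, features)
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
# later: model.fit(trainX, trainY, epochs=100, batch_size=100)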

13
Suggested steps for LSTM coding (4 & 5)

• Model training (tackle over-fit issue)


# Model training (without validation dataset)
model.fit(trainX, trainY, nb_epoch=100, batch_size=100)

# Model training (with validation dataset, to help prevent over-fitting)
model.fit(trainX, trainY, nb_epoch=100, batch_size=100,
          validation_data=(x_val, y_val))
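One common way to act on the validation signal is early stopping (a hedged sketch; EarlyStopping is part of keras.callbacks, and x_val/y_val are assumed to be a held-out validation set as above):

from keras.callbacks import EarlyStopping

# stop training once the validation loss has not improved for 5 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=5)
model.fit(trainX, trainY, nb_epoch=100, batch_size=100,
          validation_data=(x_val, y_val), callbacks=[early_stop])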

• Prediction
# Model prediction
testPredict = model.predict(testX)

14
Suggested steps for LSTM coding (6)

• Inverse transformation

# inverse transformation
testPredict = scaler.inverse_transform(testPredict)

15
Observations

• How to frame the data in an appropriate way for sequence learning
– Time steps vs features (see the sketch below)

• Normalization gives better performance
– Fewer epochs are needed for training
– e.g. >300 epochs vs 100 epochs
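A minimal sketch of the two framings (values are illustrative; the next slide works through a full comparison):

import numpy

window = numpy.array([[0.1, 0.2, 0.3]])        # one sample, window length 3
time_steps_framing = window.reshape(1, 3, 1)   # [samples=1, time steps=3, features=1]
features_framing   = window.reshape(1, 1, 3)   # [samples=1, time steps=1, features=3]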

16
import numpy
input_data = numpy.array([[1.0], [2.0], [3.0], [4], [5], [6], [7], [8], [9]])

#%% Step 1: normalize the dataset
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(-1, 1))
normalized_data = scaler.fit_transform(input_data)
plt.plot(normalized_data)

#%% Step 2: Data preparation to a 3D dataset
# Utility function
# convert an array of values into input features and target values
def data_preparation(input_data, model_input_length=1):
    dataX, target = [], []
    for i in range(len(input_data) - model_input_length):
        dataX.append(input_data[i:i + model_input_length, 0])
        target.append(input_data[i + model_input_length, 0])
    return numpy.array(dataX), numpy.array(target)

# Procedure a & b: Padding and separating the data
model_input_length = 4  # the length of data used for modeling
trainX, trainY = data_preparation(normalized_data, model_input_length)

# Procedure c: Reshape into a 3D dataset
# [samples, time steps, features]
trainX_3D_v1 = numpy.reshape(trainX, (trainX.shape[0], 1, model_input_length))  # misuse
trainX_3D_v2 = numpy.reshape(trainX, (trainX.shape[0], model_input_length, 1))  # correct

#%% Step 3: Model specification
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# create and fit the LSTM network
# Version 1: misuse (window length treated as the feature dimension)
model_v1 = Sequential()
model_v1.add(LSTM(4, input_dim=model_input_length))
model_v1.add(Dense(1))
model_v1.compile(loss='mean_squared_error', optimizer='sgd')

# Version 2: correct usage (one feature per time step)
model_v2 = Sequential()
model_v2.add(LSTM(4, input_dim=1))
model_v2.add(Dense(1))
model_v2.compile(loss='mean_squared_error', optimizer='sgd')

#%% Step 4: Model training
# Model training (without validation dataset)
model_v1.fit(trainX_3D_v1, trainY, nb_epoch=200, batch_size=100, verbose=2)
model_v2.fit(trainX_3D_v2, trainY, nb_epoch=200, batch_size=100, verbose=2)

#%% Step 5: Model prediction
testX = numpy.array([[3.0], [4.0], [5.0], [6.0]])
# pay special attention here: reuse the scaler fitted on the training data
normalized_testX = scaler.transform(testX)  # do not use fit_transform

testX_3D_v1 = numpy.reshape(normalized_testX, (1, 1, model_input_length))
testPredict_v1 = model_v1.predict(testX_3D_v1)

testX_3D_v2 = numpy.reshape(normalized_testX, (1, model_input_length, 1))
testPredict_v2 = model_v2.predict(testX_3D_v2)

#%% Step 6: inverse transformation
testPredict_final_v1 = scaler.inverse_transform(testPredict_v1)
testPredict_final_v2 = scaler.inverse_transform(testPredict_v2)

print('*** Final result: Version 1 ***')
print(testPredict_final_v1)
print('*** Final result: Version 2 ***')
print(testPredict_final_v2)
17
Example code of time series modeling using Keras (1)

18
# LSTM for international airline passengers problem with window regression framing
import numpy
import matplotlib.pyplot as plt
import pandas
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)

# fix random seed for reproducibility
numpy.random.seed(7)

# load the dataset
dataframe = pandas.read_csv('international-airline-passengers.csv', usecols=[1],
                            engine='python', skipfooter=3)
dataset = dataframe.values
dataset = dataset.astype('float32')

# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)

# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]

# reshape into X=t and Y=t+1
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)

# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

# create and fit the LSTM network
model = Sequential()
model.add(LSTM(4, input_dim=look_back))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, nb_epoch=100, batch_size=1, verbose=2)

# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

# invert predictions
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])

# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:,0]))
print('Test Score: %.2f RMSE' % (testScore))

# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict

# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(dataset)-1, :] = testPredict

# plot baseline and predictions
plt.plot(scaler.inverse_transform(dataset))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()

http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/ 19
Ref: http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
20
Example code of time series modeling using Keras (2)

21
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils

# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))

#%% Preparing training data
# prepare the dataset of input to output pairs encoded as integers
seq_length = 3
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)

# We need to reshape the NumPy array into the format expected by LSTM networks,
# that is [samples, time steps, features].
X1 = numpy.reshape(dataX, (len(dataX), seq_length, 1))

# Once reshaped, normalize the input integers to the range 0-to-1,
# the range of the sigmoid activation functions used by the LSTM network.
X1 = X1 / float(len(alphabet))

# Finally, we can treat this problem as a sequence classification task,
# where each of the 26 letters represents a different class.
# As such, convert the output (y) to a one hot encoding.
y1 = np_utils.to_categorical(dataY)

#%% create and fit the model
model1 = Sequential()
model1.add(LSTM(32, input_shape=(X1.shape[1], X1.shape[2])))
model1.add(Dense(y1.shape[1], activation='softmax'))
model1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

#numpy.random.seed(176)
model1.fit(X1, y1, nb_epoch=500, batch_size=1, verbose=2)

# After fitting the model we can evaluate and summarize its performance
scores = model1.evaluate(X1, y1, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))

#%% Re-run the training data through the network and generate predictions,
# converting the input and output pairs back into their original character
# format to get a visual idea of how well the network learned the problem.
for pattern in dataX:
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(len(alphabet))
    prediction = model1.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)

http://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/ 22
23
Example code of time series modeling using Keras (3)

24
https://github.com/Vict0rSch/deep_learning/tree/master/keras/recurrent 25
26
Suggested setting for LSTM Hyperparameter
Tuning
• For LSTMs, consider the softsign activation function over tanh: it is faster and
  less prone to saturation, i.e. vanishing (near-zero) gradients.
• https://deeplearning4j.org/lstm.html
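A minimal sketch of how this could be applied in Keras (hedged: the LSTM layer accepts an activation argument; the layer size and the look_back window used here are illustrative and follow the earlier slides):

from keras.models import Sequential
from keras.layers import Dense, LSTM

model = Sequential()
# replace the default tanh cell activation with softsign
model.add(LSTM(4, input_shape=(look_back, 1), activation='softsign'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')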

27
Bollerslev, Patton and Quaedvlieg (2016)

• An improved version of time series models for realized variance

• An improved Heterogeneous Autoregressive (HAR) regression

Typical HAR components: Daily, Weekly, Monthly (see the sketch below)
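As a hedged sketch, the baseline HAR specification (Corsi-type) that this builds on regresses next-day realized variance on daily, weekly and monthly averages of past realized variance:

RV_{t+1} = \beta_0 + \beta_D \, RV_t + \beta_W \, RV_{t-4:t} + \beta_M \, RV_{t-21:t} + \varepsilon_{t+1},
\quad\text{where } RV_{t-h:t} = \frac{1}{h+1} \sum_{j=0}^{h} RV_{t-j}.

Bollerslev, Patton and Quaedvlieg (2016) improve on this by letting the coefficients adjust for the measurement error in realized variance.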

28
Work in progress
Long Short Term Memory network (LSTM)
• adaptive forget gate that throws away stale information
• keeps information across time gaps of unknown/varying length

Investigation:
• Possible to extract features from different time horizons ?
– Daily, Weekly, Monthly, Intraday
• Model structure ?
– Number of layers ? Activation functions
• How to prevent over-fitting ?
– Types of loss function
• What types of information can be used ?
– Numerical, News, Comment from Social network

29
Make use of different types of information ?
Stock market: Price, Volatility, Volume
Personal blogs: Multimedia commentary
Economic factors: CPI, GDP, Retail sales
News media: Bloomberg, Releases from the stock exchange
Company factors: P/E ratio, ROE, Debt ratio

→ Volatility Model

Machine Learning techniques (more flexible) vs Time series approach (more rigid)
30
If everything goes right …

The picture is extracted from: http://machinelearningmastery.com/what-is-deep-learning/


Why Deep Learning? (Slide by Andrew Ng, Stanford University)
31
Thank you for your kind attention.
Hope you find this presentation interesting.

32
Reference
• A Beginner's Guide to Recurrent Networks and LSTMs - Deeplearning4j: Open-source, Distributed Deep Learning for the JVM
• https://deeplearning4j.org/lstm.html
• Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras (by Jason Brownlee)
• http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
• Understanding Stateful LSTM Recurrent Neural Networks in Python with Keras (by Jason Brownlee)
• http://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/
• Keras recurrent tutorial
• https://github.com/Vict0rSch/deep_learning/tree/master/keras/recurrent
• Deep learning Wikipedia
• https://en.wikipedia.org/wiki/Deep_learning
• What Is The Difference Between Deep Learning, Machine Learning and AI?
• https://www.forbes.com/sites/bernardmarr/2016/12/08/what-is-the-difference-between-deep-learning-machine-learning-and-ai/#496301bc26cf
• What is Deep Learning? (by Jason Brownlee)
• http://machinelearningmastery.com/what-is-deep-learning/
• Do you recommend using Theano or Tensor Flow as Keras' backend? - Quora
• https://www.quora.com/Do-you-recommend-using-Theano-or-Tensor-Flow-as-Keras-backend
• Backend - Keras Documentation
• https://keras.io/backend/
• Ill-Conditioning in Neural Networks
• ftp://ftp.sas.com/pub/neural/illcond/illcond.html
• Understanding LSTM Networks
• http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• Getting started with the Keras Sequential model
• https://keras.io/getting-started/sequential-model-guide/
• Tim Bollerslev, Andrew J. Patton, Rogier Quaedvlieg (2016). Exploiting the errors: A simple approach for improved volatility forecasting. Journal of Econometrics, Volume 192, Issue 1, Pages 1-18.

33
